Question

(This question is related to a previous question I posted on Stack Overflow. Here is the link:

Extracting Values From an XML File Either using XPath, SAX or DOM for this Specific Scenario)

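Keeping the scenario above in mind, what if I want the words each participant wrote across all sentences, rather than the sentences themselves? For example, say the word "budget" is used ten times in total: seven times by participant "Dolske" and three times by others. So I need a list of all words along with the number of times each participant wrote them. And ideally a list of the words in each turn as well.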

What is the best strategy to achieve this? Any sample code?

The XML is attached here (you can also see it in the referenced question):

"(495584) Firefox - search suggestions passes wrong previous result to form history"

<Turn>
  <Date>'2009-06-14 18:55:25'</Date>
  <From>'Justin Dolske'</From>
  <Text>
    <Sentence ID = "3.1"> Created an attachment (id=383211) [details] Patch v.2</Sentence>
    <Sentence ID = "3.2"> Ah. So, there's a ._formHistoryResult in the....</Sentence>
    <Sentence ID = "3.3"> The simple fix it to just discard the service's form history result.</Sentence>
    <Sentence ID = "3.4"> Otherwise it's trying to use a old form history result that no longer applies for the search string.</Sentence>
  </Text>
</Turn>

<Turn>
  <Date>'2009-06-19 12:07:34'</Date>
  <From>'Gavin Sharp'</From>
  <Text>
    <Sentence ID = "4.1"> (From update of attachment 383211 [details])</Sentence>
    <Sentence ID = "4.2"> Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
  </Text>
</Turn>

<Turn>
  <Date>'2009-06-19 13:17:56'</Date>
  <From>'Justin Dolske'</From>
  <Text>
    <Sentence ID = "5.1"> (In reply to comment #3)</Sentence>
    <Sentence ID = "5.2"> &amp;gt; (From update of attachment 383211 [details] [details])</Sentence> 
    <Sentence ID = "5.3"> &amp;gt; Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
    <Sentence ID = "5.4"> Good point.</Sentence>
    <Sentence ID = "5.5"> I renamed the one in the wrapper to _formHistResult. </Sentence>
    <Sentence ID = "5.6"> fhResult seemed maybe a bit too short.</Sentence>
  </Text>
</Turn>

..... and so on

Help would be highly appreciated......

Answer 1

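To get all the values, you are better off using a StAX parser, which is well suited to this kind of task. Then split all the sentences into words and do whatever you want with them. For example, create a model with a class Turn that stores the author and the sentences, write a service for that class, and go on from there. :)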
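A minimal sketch of the StAX approach described above, not a definitive implementation. The Turn and TurnParser class names and the stripQuotes helper are hypothetical, and the code assumes that all <Turn> elements sit under a single root element (the excerpt in the question does not show one) and that the <From> value is wrapped in single quotes as in the sample.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical model class: one <Turn> with its author and sentences.
class Turn {
    String author = "";
    final List<String> sentences = new ArrayList<>();
}

// Hypothetical parser class built on the standard StAX cursor API.
class TurnParser {
    // Reads every <Turn> from the stream. Assumes the turns share one root element.
    static List<Turn> parse(Reader xml) {
        List<Turn> turns = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            Turn current = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only element starts are interesting here
                }
                String name = reader.getLocalName();
                if (name.equals("Turn")) {
                    current = new Turn();
                    turns.add(current);
                } else if (current != null && name.equals("From")) {
                    current.author = stripQuotes(reader.getElementText());
                } else if (current != null && name.equals("Sentence")) {
                    current.sentences.add(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped in an unchecked exception to keep callers short in this sketch.
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
        return turns;
    }

    // Removes the single quotes that surround names in the sample, e.g. 'Justin Dolske'.
    static String stripQuotes(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }
}
```

Calling TurnParser.parse(new FileReader("turns.xml")) (the file name is only a placeholder) would then give you one Turn object per turn, each holding its author and its sentences, ready for the word splitting described next.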

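To split a sentence into words, use split() or StringTokenizer, although StringTokenizer is a legacy class and its use is discouraged in new code. To use split, create a temporary array, like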

String[] stringArray = sentence.toString().split(" ");

or something like sentence.getValue(), whatever fits your model.

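The method parameter is a regex. In your case it is a simple space, since that is what separates the words in a sentence. Then you can go through the words and count whatever you need.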
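As a rough sketch of the counting step, assuming a nested map from word to author to count (the WordCounter class and addSentence method are hypothetical names, not from any library). Two choices here go beyond what is described above and are assumptions: words are lowercased, and leading or trailing punctuation is stripped so that "point." and "point" count as the same word. The sentences in the sample start with a space, so split(" ") would return an empty first element; trimming and splitting on "\\s+" avoids that.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how often each author used each word.
class WordCounter {
    // counts maps word -> (author -> number of uses).
    static void addSentence(Map<String, Map<String, Integer>> counts,
                            String author, String sentence) {
        String[] tokens = sentence.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (String token : tokens) {
            // Assumption: drop punctuation at the edges of a token, keep it inside.
            String word = token.replaceAll("^\\W+|\\W+$", "");
            if (word.isEmpty()) {
                continue; // token was pure punctuation or the sentence was empty
            }
            counts.computeIfAbsent(word, k -> new TreeMap<>())
                  .merge(author, 1, Integer::sum);
        }
    }
}
```

You would call addSentence once per sentence with the author of the enclosing turn. Afterwards counts.get("budget") would hold the per-participant totals for that word, and summing its values gives the overall total. For per-turn word lists, the same method could be run against a fresh map for each turn.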

If you have an ArrayList, use List.toArray() to get the list as an array.
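A small illustration of that conversion, with hypothetical class and method names. Note that the no-argument toArray() returns Object[], so passing a typed array is the usual way to get a String[] back.

```java
import java.util.List;

// Hypothetical helper showing the List-to-array conversion.
class ListToArrayExample {
    static String[] toWordArray(List<String> words) {
        // The typed overload returns String[] instead of Object[].
        return words.toArray(new String[0]);
    }
}
```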

Extracting word counts from an XML file
