Question

Background

I have a text string and a hash set that contains the words I am looking for.

Given

String doc = "one of the car and bike and one of those";
String [] testDoc = doc.split("\\s+");
HashSet<String> setW = new HashSet<>();
setW.add("and");
setW.add("of");
setW.add("one");

Goal

The goal is to scan the string and, every time we come across a word that is in the hash set, store that word together with its starting index.

In the case above, we should be able to store the following:

one-->0
of-->4
and-->15
and-->24
one-->28
of-->32

Attempt

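So far, the only problem with my approach is that the indexOf method only finds the first occurrence of a word, so I'm not sure how to proceed. If I keep trimming the string after scanning each word, I lose the index positions of the words in the original string.

I would appreciate some input here.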

Answer 1

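indexOf() has an overloaded version, indexOf(String str, int fromIndex), that takes the index at which to start searching (see the documentation). You can use it to search the same string repeatedly until you reach the end.

Note that you can remove the contains() test so that you don't search the string twice.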
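A minimal sketch of that idea: for each word, call indexOf(word) and then indexOf(word, i + 1) until it returns -1. Because indexOf matches substrings, the sketch adds a hypothetical isWholeWord helper so that, for example, "one" inside "someone" is skipped, and it collects the hits in a TreeMap so they come out ordered by index. The IndexOfScan class and its method names are illustrative only, not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class IndexOfScan {

    // Hypothetical helper: true if doc[start, start + len) is not glued to a
    // letter or digit on either side, so "one" inside "someone" is rejected.
    static boolean isWholeWord(String doc, int start, int len) {
        int end = start + len;
        boolean leftOk = start == 0 || !Character.isLetterOrDigit(doc.charAt(start - 1));
        boolean rightOk = end == doc.length() || !Character.isLetterOrDigit(doc.charAt(end));
        return leftOk && rightOk;
    }

    // Maps each start index to the word found there; TreeMap keeps the keys sorted.
    static Map<Integer, String> findAll(String doc, Set<String> words) {
        Map<Integer, String> hits = new TreeMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // an empty word matches everywhere and would loop forever at the end
            }
            // indexOf(str, fromIndex) resumes the search just after the previous hit.
            for (int i = doc.indexOf(word); i >= 0; i = doc.indexOf(word, i + 1)) {
                if (isWholeWord(doc, i, word.length())) {
                    hits.put(i, word);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "one of the car and bike and one of those";
        Set<String> setW = new HashSet<>(Arrays.asList("and", "of", "one"));
        for (Map.Entry<Integer, String> e : findAll(doc, setW).entrySet()) {
            System.out.println(e.getValue() + "-->" + e.getKey());
        }
    }
}
```

This scans the string once per word in the set, so it trades a single pass for simplicity; with a small set of search words that is usually fine.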

Answer 2

Convert the list of words into a regex and let the regex do the searching for you.

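For example, your 3 words would become a regex like this:

and|of|one

Of course, you don't want to match partial words, so you need to add word-boundary checks:

\b(and|of|one)\b

Since the entire match is the word, there is no need to capture it again, so use a non-capturing group: \b(?:and|of|one)\b. You can also easily make the word search case-insensitive.

Although plain words (all letters) will never cause a problem, it is better to protect the regex by quoting each word with Pattern.quote().

Example

import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String doc = "one of the car and bike and one of those";
String[] words = { "and", "of", "one" };

// Build regex: \b(?:<word1>|<word2>|...)\b with every word quoted
StringJoiner joiner = new StringJoiner("|", "\\b(?:", ")\\b");
for (String word : words)
    joiner.add(Pattern.quote(word));
String regex = joiner.toString();

// Find words: each find() advances to the next match, start() is its index
for (Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(doc); m.find(); )
    System.out.println(m.group() + "-->" + m.start());

Output

one-->0
of-->4
and-->15
and-->24
one-->28
of-->32

If you want to compact (obfuscate) the code a bit, you can write it as a single statement in Java 9+; the output is the same.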
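One plausible way to write that single-statement Java 9+ version is sketched below. It assumes Matcher.results() (added in Java 9) and builds the same \b(?:...)\b pattern with Collectors.joining instead of StringJoiner. The RegexOneLiner class and its find method are illustrative names, and the answerer's exact statement may differ.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RegexOneLiner {

    // One expression: quote each word, join into \b(?:w1|w2|...)\b,
    // stream every match (Matcher.results(), Java 9+) and format it.
    static List<String> find(String doc, String... words) {
        return Pattern.compile(
                    Stream.of(words)
                          .map(Pattern::quote)
                          .collect(Collectors.joining("|", "\\b(?:", ")\\b")),
                    Pattern.CASE_INSENSITIVE)
                .matcher(doc)
                .results()
                .map(r -> r.group() + "-->" + r.start())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String doc = "one of the car and bike and one of those";
        String[] words = { "and", "of", "one" };
        find(doc, words).forEach(System.out::println);
    }
}
```

Returning a list rather than printing inside the stream keeps the hits reusable; main simply prints them.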

Answer 3

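Well, if you want to reduce the number of iterations, here is another solution that traverses the string only once. I visit the string one character at a time, appending each character to a StringBuilder; whenever I reach a space, I add the accumulated word to the answer list and record its starting index. Since each character is visited only once, the time complexity of this code is O(n).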

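import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SinglePass {
    public static void main(String[] args) {
        Set<String> setW = new HashSet<>();
        setW.add("and");
        setW.add("of");
        setW.add("one");
        String doc = "one of the car and bike and one of those";

        List<String> answer = new ArrayList<>();  // every word, in order
        List<Integer> index = new ArrayList<>();  // start index of each word
        StringBuilder sb = new StringBuilder();
        int start = 0;                            // start index of the word being built

        // Visit each character once; the extra step at i == doc.length()
        // flushes the last word.
        for (int i = 0; i <= doc.length(); i++) {
            if (i == doc.length() || doc.charAt(i) == ' ') {
                if (sb.length() > 0) {            // skip empty words from repeated spaces
                    answer.add(sb.toString());
                    index.add(start);
                    sb.setLength(0);
                }
                start = i + 1;                    // next word starts after the space
            } else {
                sb.append(doc.charAt(i));
            }
        }

        // Print only the words that are in the set
        for (int i = 0; i < answer.size(); i++) {
            if (setW.contains(answer.get(i))) {
                System.out.println(answer.get(i) + "-->" + index.get(i));
            }
        }
    }
}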

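Based on this idea, I got the expected output; I'm posting this answer to offer another possible solution. (The answer list collects every word in the string along with its index, not only the words in setW; if you don't want those extra words, filter them out with a setW.contains(answer.get(i)) check, as the final loop does.)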

Output

one-->0
of-->4
and-->15
and-->24
one-->28
of-->32

Find multiple occurrences of words in a string and store their respective starting indexes
