Question

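I have a text file containing a large number of words. I want to remove the words that contain a repeated letter (for example, zoos, which contains two o's). What is the best way to do this?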

Answer 1

Something like this:

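import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Matches a word containing the same letter twice in a row (e.g. "zoo")
Pattern p = Pattern.compile("([a-zA-Z])*([a-zA-Z])\\2([a-zA-Z])*");
Matcher m = p.matcher("zoo");
System.out.println(m.matches()); // true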

Then just add a loop that tries every word in the file and deletes the word if m.matches() returns true.
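A minimal sketch of that loop, assuming the file holds one word per line. The class and method names (`RemoveDoubledLetterWords`, `keepWordsWithoutDoubles`, `filterFile`) are hypothetical, and writing the result to a second file is an assumption rather than part of the answer. `main` uses temporary files so the example runs without a real word list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RemoveDoubledLetterWords {
    // Same pattern as the answer: some letter immediately followed by the same letter
    private static final Pattern DOUBLED =
        Pattern.compile("([a-zA-Z])*([a-zA-Z])\\2([a-zA-Z])*");

    // Hypothetical helper: keeps only the words that do NOT contain a doubled letter
    static List<String> keepWordsWithoutDoubles(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!DOUBLED.matcher(word).matches()) {
                kept.add(word);
            }
        }
        return kept;
    }

    // Hypothetical helper: assumes one word per line and writes the survivors to output
    static void filterFile(Path input, Path output) throws IOException {
        List<String> words = Files.readAllLines(input);
        Files.write(output, keepWordsWithoutDoubles(words));
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real word list
        Path input = Files.createTempFile("words", ".txt");
        Path output = Files.createTempFile("words-filtered", ".txt");
        Files.write(input, Arrays.asList("zoos", "cat", "apple", "dog"));
        filterFile(input, output);
        System.out.println(Files.readAllLines(output)); // [cat, dog]
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every word, which matters for a large file.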

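By the way, this won't work if you need case-insensitive matching: the back-reference only matches the exact same character, so a word like "zOo" is not caught.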
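One way to handle mixed case, sketched under the assumption that "Oo" should count as a repeated letter: compile the same pattern with `Pattern.CASE_INSENSITIVE`, which in Java also makes back-references compare case-insensitively. Lowercasing each word before matching would work as well. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDoubles {
    // CASE_INSENSITIVE also applies to the back-reference \2, so "zOo" counts as a doubled 'o'
    private static final Pattern DOUBLED =
        Pattern.compile("([a-zA-Z])*([a-zA-Z])\\2([a-zA-Z])*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the word contains two adjacent copies of a letter, ignoring case
    static boolean hasDoubledLetter(String word) {
        return DOUBLED.matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledLetter("zOo")); // true
        System.out.println(hasDoubledLetter("Zoo")); // true
        System.out.println(hasDoubledLetter("cat")); // false
    }
}
```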

Answer 2

Here is an example using a regular expression and the Stream API:

package demo;

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Demonstration
{
    public static void main(String[] args)
    {
        List<String> input = Arrays.asList( //
            new String[] {"a", "bb", "ccc", "ded", "ff", "ghi", "jkll"});

        // Prints [a, ded, ghi]
        System.out.println(removeWordsWithRepetitiveCharacters(input));
    }

    private static List<String> removeWordsWithRepetitiveCharacters(List<String> words)
    {
        return words.stream() //
            .filter(word -> !word.matches(".*(\\w)\\1+.*")) //
            .collect(Collectors.toList());
    }
}
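The regex in this answer only removes words whose repeated characters are adjacent, so "ded" survives. If "repeated letters" should also cover non-adjacent repeats, one possible variant (an assumption about the intended meaning, not part of the answer) allows a gap between the captured character and its back-reference. Note that `\w` also matches digits and underscores. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AnyRepeatDemo {
    // Removes words in which any character appears twice, adjacent or not:
    // the ".*" between (\w) and \1 allows other characters in between
    static List<String> removeWordsWithAnyRepeat(List<String> words) {
        return words.stream()
            .filter(word -> !word.matches(".*(\\w).*\\1.*"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "bb", "ccc", "ded", "ff", "ghi", "jkll");
        System.out.println(removeWordsWithAnyRepeat(input)); // [a, ghi]
    }
}
```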

How do I remove words that meet a condition from a text?
