Question

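The sample code I wrote returns how many characters two strings have in common, but how can I make it stop counting duplicate matches? For example, say the inputs are 1: "hello" and 2: "lend". As the code iterates, it sees two matching l's in total. How can I make it count only one "l" in this case?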

static int compare(String input1, String input2) {
    int count = 0;

    for (int i = 0; i < input1.length(); i++) {
        if (input2.contains(String.valueOf(input1.charAt(i)))) {
            count++;
        }
    }

    return count;
}

Answer 1

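For best performance when the strings may be long and you need to support all Unicode characters, use a Set<Integer> and retainAll(), where the integer values are Unicode code points.

In Java 8, this can be done with the following code: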

import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

private static int countDistinctCommonChars(String s1, String s2) {
    Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toSet());
    Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toSet());
    set1.retainAll(set2);
    return set1.size();
}

If you want the common characters themselves returned, you can do it like this:

private static String getDistinctCommonChars(String s1, String s2) {
    Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toSet());
    Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toSet());
    set1.retainAll(set2);
    int[] codePoints = set1.stream().mapToInt(Integer::intValue).toArray();
    Arrays.sort(codePoints);
    return new String(codePoints, 0, codePoints.length);
}

Test

public static void main(String[] args) {
    test("hello", "lend");
    test("lend", "hello");
    test("mississippi", "expressionless");
    test("expressionless", "comprehensible");
    test("", ""); // Extended, i.e. 2 chars per code point
}
private static void test(String s1, String s2) {
    System.out.printf("Found %d (\"%s\") common chars between \"%s\" and \"%s\"%n",
                      countDistinctCommonChars(s1, s2),
                      getDistinctCommonChars(s1, s2),
                      s1, s2);
}

Output

Found 2 ("el") common chars between "hello" and "lend"
Found 2 ("el") common chars between "lend" and "hello"
Found 3 ("ips") common chars between "mississippi" and "expressionless"
Found 8 ("eilnoprs") common chars between "expressionless" and "comprehensible"
Found 2 ("") common chars between "" and ""

Note that the last test uses Unicode characters from the 'Domino Tiles' Unicode block (U+1F030 to U+1F09F), i.e. characters that are stored in a Java string as surrogate pairs.
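To illustrate why code points matter here, the following is a minimal sketch (the class name CodePointDemo and the constant TILE are made up for this example) showing that a single Domino Tiles character occupies two char values in a Java string but counts as one code point:

```java
class CodePointDemo {
    // U+1F030 is the first code point of the Domino Tiles block.
    // Character.toChars converts it into its UTF-16 surrogate pair.
    static final String TILE = new String(Character.toChars(0x1F030));

    public static void main(String[] args) {
        // Number of UTF-16 code units (char values): 2
        System.out.println(TILE.length());
        // Number of Unicode code points: 1
        System.out.println(TILE.codePointCount(0, TILE.length()));
        // A char-based loop would see two separate surrogate halves,
        // while codePoints() yields the single value 0x1F030.
        System.out.println(TILE.codePoints().count());
    }
}
```

A char-by-char comparison would treat the two surrogate halves as separate "characters", which is why the answer above collects s.codePoints() instead.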

Answer 2

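Another approach is to make a copy of input2 and remove each character from it once it has been found. I believe this should be more efficient than using an ArrayList.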

static int compare(String input1, String input2) {
    int count = 0;

    String check = input2;
    for (int i = 0; i < input1.length(); i++) {
        if (check.contains(String.valueOf(input1.charAt(i)))) {
            check = check.replace(String.valueOf(input1.charAt(i)), "");
            count++;
        }
    }
    return count;
}

Answer 3

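I'd suggest creating an ArrayList to keep track of the characters that have already been "caught" while you iterate over the string.

Initialize the ArrayList of characters before the for loop. When a character from input1 is also found in input2, add it to the ArrayList. Change the if statement inside the for loop of your current code so that it also checks whether the ArrayList already contains that letter; if it does, skip it, so the count is not incremented again for a repeated character.

Keep in mind that characters are case-sensitive.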
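The steps above could look roughly like this. It is only a sketch of the described idea, not the answerer's own code; the class name SeenListDemo and the variable name seen are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class SeenListDemo {
    static int compare(String input1, String input2) {
        // Characters that have already been counted once.
        List<Character> seen = new ArrayList<>();
        int count = 0;

        for (int i = 0; i < input1.length(); i++) {
            char c = input1.charAt(i);
            // Count c only if it occurs in input2 and was not counted before.
            if (input2.indexOf(c) >= 0 && !seen.contains(c)) {
                seen.add(c);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(compare("hello", "lend"));  // e and l are shared
        System.out.println(compare("Hello", "hello")); // 'H' does not match 'h'
    }
}
```

Because the comparison is case-sensitive, "Hello" and "hello" share only e, l and o. Lowercasing both inputs first would be one way to change that, if it fits your use case.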

Answer 4

If you are dealing with all kinds of characters, then using a Set is probably easiest. If you are only dealing with a-z, you can do this:

int count = 0;
for (char c = 'a'; c <= 'z'; c++)
    if (input1.indexOf(c) >= 0 && input2.indexOf(c) >= 0)
        count++;
return count;
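The loop above scans both strings once per letter. As a possible variation under the same a-z assumption, each string can be scanned once into a 26-slot table. This is a sketch only; LetterTableDemo and lettersIn are hypothetical names, and any character outside a-z is simply ignored:

```java
class LetterTableDemo {
    // Marks which of the letters a-z occur in s; other characters are ignored.
    static boolean[] lettersIn(String s) {
        boolean[] present = new boolean[26];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                present[c - 'a'] = true;
            }
        }
        return present;
    }

    static int compare(String input1, String input2) {
        boolean[] in1 = lettersIn(input1);
        boolean[] in2 = lettersIn(input2);
        int count = 0;
        // A letter is common if it is marked in both tables.
        for (int i = 0; i < 26; i++) {
            if (in1[i] && in2[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(compare("hello", "lend"));
        System.out.println(compare("mississippi", "expressionless"));
    }
}
```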

Answer 5

You can use this regular expression to remove any duplicate characters from a string.

(.)(?=.*\1)

Once all the duplicates have been removed, you can carry on as usual.

static int compare(String input1, String input2){
    int count = 0;

    input1 = input1.replaceAll("(.)(?=.*\\1)", "");
    input2 = input2.replaceAll("(.)(?=.*\\1)", "");


    for(int i = 0; i < input1.length(); i++) 

    ...
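For reference, here is a self-contained sketch of the regex idea. It assumes the elided part of the loop is the same contains-style check used in the question; RegexDedupDemo and dedupe are hypothetical names. The pattern removes a character whenever the same character appears again later, so only the last occurrence of each character survives:

```java
class RegexDedupDemo {
    // Removes every character that occurs again later in the string,
    // keeping only the last occurrence of each character.
    static String dedupe(String s) {
        return s.replaceAll("(.)(?=.*\\1)", "");
    }

    static int compare(String input1, String input2) {
        String unique1 = dedupe(input1);
        int count = 0;
        // Each distinct character of input1 is now checked exactly once.
        for (int i = 0; i < unique1.length(); i++) {
            if (input2.indexOf(unique1.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(dedupe("hello"));          // helo
        System.out.println(compare("hello", "lend")); // 2
    }
}
```

In this sketch only input1 is deduplicated, since duplicates in input2 do not affect a membership check. Note also that `.` does not match line terminators by default, so the pattern is best suited to single-line input.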

Answer 6

static int compare(String input1, String input2) {
    int count = 0;
    String taken = "";

    for (int i = 0; i < input1.length(); i++) {
        if (input2.contains(String.valueOf(input1.charAt(i)))) {
            if (!taken.contains(String.valueOf(input1.charAt(i)))) {
                taken = taken.concat(String.valueOf(input1.charAt(i)));
                count++;
            }
        }
    }

    return count;
}

How can I return a common value only once?
