Question

What is the simplest way, using the least code, to compare two strings while ignoring the following differences:

"hello  world" == "hello world"                   // spaces
"hello-world"  == "hello world"                   // hyphens
"Hello World"  == "hello worlD"                   // case
"St pierre"    == "saint pierre" == "St. Pierre"  // word replacement

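I'm sure this has been done before and that there are libraries for this kind of thing, but I don't know of any. C++ is preferred, but if there is a very short option in another language, I'd like to hear about it too.

Alternatively, I'd also be interested in any library that can give a percentage match. Say, hello-world and hello wolrd are 97% likely to mean the same thing, differing only by a hyphen and a misspelling.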

Answer 1

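Remove the spaces from both strings.
Remove the hyphens from both strings.
Convert both strings to lower case.
Convert all occurrences of "saint" and "st." to "st".
Compare the strings as usual.

For example: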

#include <cctype>
#include <string>
#include <algorithm>
#include <iostream>

static void remove_spaces_and_hyphens(std::string &s)
{
    s.erase(std::remove_if(s.begin(), s.end(), [](char c) {
                return c == ' ' || c == '-';
            }), s.end());
}

static void convert_to_lower_case(std::string &s)
{
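    // Cast to unsigned char first: passing a negative char value to
    // std::tolower is undefined behavior.
    for (auto &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));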
}

static void
replace_word(std::string &s, const std::string &from, const std::string &to)
{
    size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}

static void replace_words(std::string &s)
{
    replace_word(s, "saint", "st");
    replace_word(s, "st.", "st");
}

int main()
{
    // Given two strings:
    std::string s1 = "Hello, Saint   Pierre!";
    std::string s2 = "hELlO,St.PiERRe!";

    // Remove spaces and hyphens.
    remove_spaces_and_hyphens(s1);
    remove_spaces_and_hyphens(s2);

    // Convert to lower case.
    convert_to_lower_case(s1);
    convert_to_lower_case(s2);

    // Replace words...
    replace_words(s1);
    replace_words(s2);

    // Compare.
    std::cout << (s1 == s2 ? "Equal" : "Doesn't look like equal") << std::endl;
}

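Of course, this could be coded more efficiently, but I suggest you start with something that works and optimize only if it proves to be a bottleneck.

It also sounds like you may be interested in string similarity algorithms such as the Levenshtein distance. Search engines and editors, for example, use similar algorithms to offer spelling-correction suggestions.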

Answer 2

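I don't know of any library, but for equality, if speed is not a problem, you can do a char-by-char comparison and ignore the "special" characters (advancing the iterator further in the respective text).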
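
A minimal sketch of that char-by-char idea, assuming the "special" characters are just spaces and hyphens and that case should be ignored as well. The names `is_ignored` and `loosely_equal` are made up for illustration; word replacement such as "St" vs "saint" is not handled here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: the characters to skip during comparison.
// Assumption: only spaces and hyphens count as "special".
static bool is_ignored(char c)
{
    return c == ' ' || c == '-';
}

// Compare two strings character by character, skipping ignored
// characters and treating upper and lower case as equal.
bool loosely_equal(const std::string &a, const std::string &b)
{
    std::size_t i = 0, j = 0;
    while (true) {
        // Advance each index past any ignored characters.
        while (i < a.size() && is_ignored(a[i])) ++i;
        while (j < b.size() && is_ignored(b[j])) ++j;

        // Both exhausted: equal. Only one exhausted: different.
        if (i == a.size() || j == b.size())
            return i == a.size() && j == b.size();

        // Cast to unsigned char before std::tolower to avoid undefined
        // behavior for negative char values.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[j])))
            return false;
        ++i;
        ++j;
    }
}
```

Because nothing is copied or modified, this works on const strings, at the cost of also treating "helloworld" and "hello world" as equal.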

As for comparing the texts approximately, you can use a simple Levenshtein distance.
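
For reference, here is a sketch of the classic dynamic-programming Levenshtein distance, plus one possible way to turn it into a percentage. The percentage formula (one minus distance divided by the longer length) is my own assumption, not a standard definition, and `similarity_percent` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions turning a into b.
std::size_t levenshtein(const std::string &a, const std::string &b)
{
    // prev[j]: distance between the first i-1 chars of a and the first j of b.
    // cur[j]:  distance between the first i chars of a and the first j of b.
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,          // deletion
                               cur[j - 1] + 1,       // insertion
                               prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Assumed similarity measure: 100% for identical strings, 0% when every
// position would have to change.
double similarity_percent(const std::string &a, const std::string &b)
{
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0)
        return 100.0;
    return 100.0 * (1.0 - static_cast<double>(levenshtein(a, b)) / longest);
}
```

With this particular formula, "hello-world" vs "hello wolrd" has distance 3 (plain Levenshtein counts the swapped letters as two substitutions), which gives roughly 73% rather than the 97% from the question. Normalizing the strings first, or using a variant that counts transpositions as one edit, would raise the score.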

Answer 3

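For the spaces and hyphens, just replace all spaces/hyphens in the strings and compare. For case, convert all text to upper or lower case and compare. For word replacement, you need a dictionary of words in which the key is the abbreviation and the value is the replacement word. You might also consider the Levenshtein distance algorithm to show how similar one phrase is to another. If you want a statistical probability of how close one word/phrase is to another, you will need sample data to compare against.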
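
As a rough sketch of the dictionary idea, assuming words are separated by whitespace or hyphens: the table contents and the names `abbreviations` and `normalize` are hypothetical, and a real dictionary would need many more entries.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical dictionary: key is the abbreviation, value is the
// replacement word. Only the example from the question is included.
static const std::unordered_map<std::string, std::string> &abbreviations()
{
    static const std::unordered_map<std::string, std::string> table = {
        {"st", "saint"},
        {"st.", "saint"},
    };
    return table;
}

// Lower-case the input, treat hyphens as spaces, collapse repeated
// spaces, and expand any word found in the dictionary.
std::string normalize(const std::string &input)
{
    std::string cleaned;
    for (char ch : input) {
        unsigned char c = static_cast<unsigned char>(ch);
        cleaned += (ch == '-') ? ' ' : static_cast<char>(std::tolower(c));
    }

    // operator>> splits on whitespace, so runs of spaces collapse.
    std::istringstream words(cleaned);
    std::string word, result;
    while (words >> word) {
        if (!result.empty())
            result += ' ';
        auto it = abbreviations().find(word);
        if (it != abbreviations().end())
            result += it->second;
        else
            result += word;
    }
    return result;
}
```

Two strings can then be compared with `normalize(a) == normalize(b)`. Unlike removing spaces outright, this keeps word boundaries, so "St" is only expanded when it is a whole word.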

Answer 4

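QRegExp is what you're looking for. It won't print out a percentage, but you can come up with some pretty slick ways of comparing one string to another and of finding the number of matches of one string in another.

Regular expressions are available in almost every language. I like GSkinner's RegEx page for learning regular expressions.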

http://qt-project.org/doc/qt-4.8/qregexp.html
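
If Qt is not available, the same kind of normalization can be sketched with the standard library's `std::regex` instead of QRegExp (a swapped-in technique, not the Qt API). The function name `normalize_with_regex` and the two patterns are assumptions for illustration only.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Normalize a string with regular expressions so that the variants from
// the question compare equal afterwards.
std::string normalize_with_regex(std::string s)
{
    // Lower-case first so the patterns only need to handle one case.
    for (auto &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    // Collapse any run of whitespace and hyphens into a single space.
    static const std::regex separators("(\\s|-)+");
    s = std::regex_replace(s, separators, " ");

    // Rewrite the whole words "saint" and "st", with an optional
    // trailing dot, to "st".
    static const std::regex saint("\\b(saint|st)\\b\\.?");
    s = std::regex_replace(s, saint, "st");

    return s;
}
```

Comparing `normalize_with_regex(a) == normalize_with_regex(b)` then covers the four cases in the question, though it gives a yes/no answer rather than a percentage.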

Hope that helps.

Answer 5

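For the first 3 requirements:

Remove all spaces/hyphens from the strings (or replace them with a character, e.g. ''): "hello world" -> "helloworld"
Compare them ignoring case; see "Case insensitive string comparison in C++".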

The last requirement is more complicated. First, you need a dictionary, in a key-value structure:
'St': 'saint' 'Mr': 'mister'

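Second, use the Boost tokenizer to split the string into tokens, look each token up in the key-value store, and replace the token in the string with the stored value. Note that this may perform poorly: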

http://www.boost.org/doc/libs/1_53_0/libs/tokenizer/tokenizer.htm
