Fuzzy matching in C#

Date: 2015-11-12 06:57:59

Tags: c# regex string

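My problem is this. Suppose I have a string:

"Quick Brown Fox Jumps over the lazy dog"

It has 8 words. I also have some other strings that I have to compare against the string above. These strings are:

  1. This is un-match string with above string.

  2. Quick Brown fox Jumps.

  3. brown fox jumps over the lazy.

  4. quick brown fox over the dog.

  5. fox jumps over the lazy dog.

  6. jumps over the.

  7. lazy dog.

Suppose the user gives a threshold (the percentage of the string that must match) of 60%. That means

    = 8 * 60/100 (here 8 is the total number of words in the string above and 60 is the threshold)

    = 4.8

So at least 4 words should match, which means the result should be:

  1. Quick Brown fox Jumps.

  2. quick brown fox over the dog.

  3. brown fox jumps over the lazy.

  4. fox jumps over the lazy dog.

How can I do this kind of fuzzy matching in C#? Please help.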
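The threshold arithmetic in the question can be sketched as follows. This is only an illustration: the names `ThresholdDemo` and `RequiredWords` are hypothetical, and rounding 4.8 down to 4 is taken from the question's own wording ("at least 4 words").

```csharp
using System;

public static class ThresholdDemo
{
    // Minimum number of matching words for a given threshold,
    // rounded down as in the question (8 * 60 / 100 = 4.8 -> 4).
    public static int RequiredWords(int totalWords, int thresholdPercent)
    {
        return (int)Math.Floor(totalWords * thresholdPercent / 100.0);
    }

    public static void Main()
    {
        Console.WriteLine(RequiredWords(8, 60)); // prints 4
    }
}
```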

2 answers:

Answer 0 (score: 6)

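I suggest comparing dictionaries, not strings. There are a few things to take care of:

  1. What if the sentences contain the same words, e.g. "fox jumps over the dog"?
  2. Punctuation: full stops, commas etc.
  3. Case, e.g. "Fox", "fox", "FOX" etc.

So the implementation: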

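    // Requires: using System; using System.Collections.Generic; using System.Linq;
    // Counts how often each word occurs, ignoring case and surrounding punctuation.
    public static Dictionary<String, int> WordsToCounts(String value) {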
      if (String.IsNullOrEmpty(value))
        return new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    
      return value
        .Split(' ', '\r', '\n', '\t')
        .Select(item => item.Trim(',', '.', '?', '!', ':', ';', '"'))
        .Where(item => !String.IsNullOrEmpty(item))
        .GroupBy(item => item, StringComparer.OrdinalIgnoreCase)
        .ToDictionary(chunk => chunk.Key, 
                      chunk => chunk.Count(), 
                      StringComparer.OrdinalIgnoreCase);
    }
    
    public static Double DictionaryPercentage(
      IDictionary<String, int> left,
      IDictionary<String, int> right) {
    
      if (null == left)
        if (null == right)
          return 1.0;
        else
          return 0.0;
      else if (null == right)
        return 0.0;
    
      int all = left.Sum(pair => pair.Value);
    
      if (all <= 0)
        return 0.0;
    
      double found = 0.0;
    
      foreach (var pair in left) {
        int count;
    
        if (!right.TryGetValue(pair.Key, out count))
          count = 0;
    
        found += count < pair.Value ? count : pair.Value;
      }
    
      return found / all;
    }
    
    public static Double StringPercentage(String left, String right) {
      return DictionaryPercentage(WordsToCounts(left), WordsToCounts(right));
    }
    

With the sample you provided:

      // Requires: using System.Globalization; (for CultureInfo)
      String original = "Quick Brown Fox Jumps over the lazy dog";
    
      String[] extracts = new String[] {
        "This is un-match string with above string.",
        "Quick Brown fox Jumps.",
        "brown fox jumps over the lazy.",
        "quick brown fox over the dog.",
        "fox jumps over the lazy dog.",
        "jumps over the.",
        "lazy dog.",
      };
    
      var data = extracts
        .Select(item => new {
          text = item,
          perCent = StringPercentage(original, item) * 100.0
        })
        //.Where(item => item.perCent >= 60.0) // uncomment this to apply threshold
        .Select(item => String.Format(CultureInfo.InvariantCulture, 
          "\"{0}\" \t {1:F2}%", 
          item.text, item.perCent));
    
      String report = String.Join(Environment.NewLine, data);
    
      Console.Write(report);
    

The report:

      "This is un-match string with above string."   0.00%
      "Quick Brown fox Jumps."                      50.00%
      "brown fox jumps over the lazy."              75.00%
      "quick brown fox over the dog."               75.00%
      "fox jumps over the lazy dog."                75.00%
      "jumps over the."                             37.50%
      "lazy dog."                                   25.00%
    
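Note that filtering the report at `perCent >= 60.0` would drop "Quick Brown fox Jumps." (50%), while the question rounds 4.8 down and expects 4 matching words to be enough. A minimal sketch of that word-count threshold is shown below. It is a simplification, not the code above: it counts distinct words instead of occurrences, and the names `WordThresholdFilter`, `MatchedWords` and `Passes` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordThresholdFilter
{
    private static readonly char[] Separators = { ' ', '\r', '\n', '\t' };
    private static readonly char[] Punctuation = { ',', '.', '?', '!', ':', ';', '"' };

    // Split into words, strip surrounding punctuation, drop empty items.
    private static IEnumerable<string> Words(string value)
    {
        return (value ?? string.Empty)
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word.Trim(Punctuation))
            .Where(word => word.Length > 0);
    }

    // Number of distinct words of 'original' that also occur in 'candidate' (case-insensitive).
    public static int MatchedWords(string original, string candidate)
    {
        var candidateWords = new HashSet<string>(Words(candidate), StringComparer.OrdinalIgnoreCase);
        return Words(original)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count(word => candidateWords.Contains(word));
    }

    // True when the matched word count reaches the threshold, rounded down as in the question.
    public static bool Passes(string original, string candidate, int thresholdPercent)
    {
        int total = Words(original).Distinct(StringComparer.OrdinalIgnoreCase).Count();
        int required = (int)Math.Floor(total * thresholdPercent / 100.0);
        return MatchedWords(original, candidate) >= required;
    }

    public static void Main()
    {
        string original = "Quick Brown Fox Jumps over the lazy dog";
        string[] extracts =
        {
            "This is un-match string with above string.",
            "Quick Brown fox Jumps.",
            "brown fox jumps over the lazy.",
            "quick brown fox over the dog.",
            "fox jumps over the lazy dog.",
            "jumps over the.",
            "lazy dog."
        };

        // Prints the four candidates that share at least 4 words with the original.
        foreach (string item in extracts.Where(e => Passes(original, e, 60)))
            Console.WriteLine(item);
    }
}
```

With a 60% threshold this keeps the four strings the question lists as the expected result.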

Answer 1 (score: 0)

The regular expression should look something like this:

    (\bWord1\b|\bWord2\b|\bWord3\b|\betc\b)

Then you just count the matches and compare that count with the number of words.

    // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
    string sentence = "Quick Brown Fox Jumps over the lazy dog";
    string[] words = sentence.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);

    // Build (\bQuick\b|\bBrown\b|...) from the words of the sentence.
    Regex regex = new Regex("(" + string.Join("|", words.Select(x => @"\b" + x + @"\b")) + ")", RegexOptions.IgnoreCase);

    string input = "Quick Brown fox Jumps";
    int threshold = 60;

    var matches = regex.Matches(input);

    // Integer arithmetic: 8 * 60 / 100 = 4, so 4 matches are enough.
    bool isMatch = words.Length*threshold/100 <= matches.Count;

    Console.WriteLine(isMatch);
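Two caveats with the regex approach: a word containing regex metacharacters would change the pattern, and `Matches` counts every occurrence, so an input that repeats one word several times could reach the threshold. A hedged sketch that escapes each word with `Regex.Escape` and counts distinct matched words is shown below. The names `RegexWordMatch` and `IsMatch` are hypothetical, and counting distinct words is an assumption about the intended behaviour, not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordMatch
{
    public static bool IsMatch(string sentence, string input, int thresholdPercent)
    {
        // Distinct words of the reference sentence, ignoring case.
        string[] words = sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        // Escape each word so that regex metacharacters are matched literally.
        string pattern = @"\b(" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")\b";
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);

        // Count each matched word once, however often it occurs in the input.
        int distinctMatches = regex.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count();

        // Integer arithmetic rounds down, e.g. 8 * 60 / 100 = 4.
        return words.Length * thresholdPercent / 100 <= distinctMatches;
    }

    public static void Main()
    {
        string sentence = "Quick Brown Fox Jumps over the lazy dog";
        Console.WriteLine(IsMatch(sentence, "Quick Brown fox Jumps", 60)); // prints True
        Console.WriteLine(IsMatch(sentence, "fox fox fox fox", 60));       // prints False
    }
}
```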