Sort a list by tokens with the smallest distance

Time: 2012-08-28 09:41:03

Tags: c# sorting

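I want to sort a list of strings. I have 1000 addresses (custom address data separated by spaces). The second input is my search query. I want to take all the word tokens of the query (ignoring numbers) and sort the addresses by their minimal distance to those tokens.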

E.g.:

string query = "123 HAM";
// 1. get only "HAM" token
// 2. count distances
// 3. sort by them
//distance("HAM", "12 HAM DRIVE") -> 0
//distance("HAM", "13 HAM DRIVE") -> 0
//distance("HAM", "14 HAMER DRIVE") -> 2
//distance("HAM", "37 HAMMERSMITH AVENUE") -> 8

If my query token is HAM, then the distance between HAM and HAM is 0, the distance between HAM and HAMER is 2 (because HAMER has 2 more letters), and so on.

I get the 'word' tokens like this:

private static IEnumerable<string> GetLetterTokens(string location)
{
    string[] words = location.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
    return words.Where(word => Regex.IsMatch(word.Trim(), @"^[a-zA-Z]+$"));
}

Now, for each address, I want to compute these distances and sort by them. Is there a fast way to do this?

Any advice is welcome :)

2 answers:

Answer 0 (score: 1)

  

I think you can use the Levenshtein distance. - L.B

var result = addresses.OrderBy(a => 
         string.Join(" ", GetLetterTokens(a))
       , new LevenshteinDistance());

public class LevenshteinDistance : IComparer<String>
{
    /// <summary>
    /// Compute the distance between two strings.
    /// </summary>
    public int Compare(string s, string t)
    {
        int n = s.Length;
        int m = t.Length;
        int[,] d = new int[n + 1, m + 1];

        // Step 1: if either string is empty, the distance is the other's length
        if (n == 0)
        {
            return m;
        }

        if (m == 0)
        {
            return n;
        }

        // Step 2: initialize the first column and the first row
        for (int i = 0; i <= n; d[i, 0] = i++)
        {
        }

        for (int j = 0; j <= m; d[0, j] = j++)
        {
        }

        // Step 3
        for (int i = 1; i <= n; i++)
        {
            // Step 4
            for (int j = 1; j <= m; j++)
            {
                // Step 5: substitution costs 0 when the characters match
                int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

                // Step 6: cheapest of deletion, insertion, substitution
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        // Step 7
        return d[n, m];
    }
}
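
One caveat with the snippet above: OrderBy expects its IComparer<string> to return a negative, zero, or positive value that orders two keys, whereas an edit distance is never negative, so passing the distance as the comparer may not give a meaningful order. Below is a minimal sketch of an alternative that uses the distance as the sort key instead, assuming the goal is to rank each address by the smallest Levenshtein distance between the query token and any of its letter-only words. The DistanceSort class and its method names are hypothetical and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DistanceSort
{
    // Same tokenizer as in the question: keep only purely alphabetic words.
    public static IEnumerable<string> GetLetterTokens(string location)
    {
        return location
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => Regex.IsMatch(word, @"^[a-zA-Z]+$"));
    }

    // Dynamic-programming Levenshtein distance, keeping only two rows.
    public static int Levenshtein(string s, string t)
    {
        var previous = new int[t.Length + 1];
        var current = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }
            // Swap the rows so "previous" holds the row just computed.
            var swap = previous;
            previous = current;
            current = swap;
        }
        return previous[t.Length];
    }

    // Sort by the smallest distance between the token and any word of the
    // address. Addresses without a letter-only word sort last.
    public static List<string> SortByDistance(IEnumerable<string> addresses, string token)
    {
        return addresses
            .OrderBy(a => GetLetterTokens(a)
                .Select(w => Levenshtein(token, w))
                .DefaultIfEmpty(int.MaxValue)
                .Min())
            .ToList();
    }
}
```

Note that Levenshtein distance counts insertions, deletions, and substitutions, so it can rank addresses differently from the pure length-difference distance described in the question. For example, an unrelated short word can end up closer to the token than a long word that starts with it.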

Answer 1 (score: 1)

I think this is what you are looking for:

    string token = "HAM";
    List<string> addresses = new List<string>
    {
        "12 HAM DRIVE",
        "13 HAM DRIVE",
        "14 HAMER DRIVE",
        "37 HAMMERSMITH AVENUE",
        "15 HAM HAMER DRIVE",
    };

    var result = from a in addresses
                 let tokens = GetLetterTokens(a)
                 let distances = from t in tokens
                                 where t.Contains(token)
                                 select t.Length - token.Length
                 where distances.Any()
                 let distance = distances.Min()
                 orderby distance
                 select new
                 {
                     Address = a,
                     Distance = distance,
                 };

If you only want words that start with the token, use StartsWith instead of Contains.
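
Here is a minimal sketch of the StartsWith variant that also derives the token from the raw query ("123 HAM") instead of hard-coding it, as step 1 of the question asks. The PrefixDistanceSort class, the Rank method, and the case-insensitive ordinal comparison are assumptions added for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PrefixDistanceSort
{
    // Same tokenizer as in the question: keep only purely alphabetic words.
    public static IEnumerable<string> GetLetterTokens(string text)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => Regex.IsMatch(word, @"^[a-zA-Z]+$"));
    }

    // Returns (address, distance) pairs ordered by the smallest length
    // difference between a query token and an address word starting with it.
    // Addresses with no matching word are left out.
    public static List<KeyValuePair<string, int>> Rank(
        IEnumerable<string> addresses, string query)
    {
        // "123 HAM" -> ["HAM"]; numeric tokens are dropped.
        var queryTokens = GetLetterTokens(query).ToList();

        return (from a in addresses
                let words = GetLetterTokens(a).ToList()
                let distances = (from q in queryTokens
                                 from w in words
                                 where w.StartsWith(q, StringComparison.OrdinalIgnoreCase)
                                 select w.Length - q.Length).ToList()
                where distances.Any()
                orderby distances.Min()
                select new KeyValuePair<string, int>(a, distances.Min()))
               .ToList();
    }
}
```

Calling PrefixDistanceSort.Rank(addresses, "123 HAM") yields each matching address together with its distance; an address such as "15 HAM HAMER DRIVE" is ranked by its best word, so it gets distance 0.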