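I want to sort a list of strings. I have 1000 addresses (custom address data with fields separated by spaces). The second input is my search query. I want to take only the word tokens (no numbers) and sort the addresses by minimum distance.
For example: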
string query = "123 HAM";
// 1. get only "HAM" token
// 2. count distances
// 3. sort by them
//distance("HAM", "12 HAM DRIVE") -> 0
//distance("HAM", "13 HAM DRIVE") -> 0
//distance("HAM", "14 HAMER DRIVE") -> 2
//distance("HAM", "37 HAMMERSMITH AVENUE") -> 8
If my query token is HAM, then the distance between HAM and HAM is 0, the distance between HAM and HAMER is 2 (because HAMER has 2 more letters), and so on.
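I get the "word" tokens like this:
private static IEnumerable<string> GetLetterTokens(string location)
{
    // Split on spaces and keep only tokens made entirely of letters (drops "123", "12A", ...)
    string[] words = location.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
    return words.Where(word => Regex.IsMatch(word.Trim(), @"^[a-zA-Z]+$"));
}
Now, for each address, I want to compute these distances and sort by them. Is there a fast way to do that?
Any suggestions? :)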
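For step 1 of the question (reducing "123 HAM" to "HAM"), the same letter-token filter can be applied to the query itself. A minimal sketch, assuming the hypothetical wrapper class `AddressTokens` and helper `QueryTokens` (neither is from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AddressTokens
{
    // Same filter as GetLetterTokens in the question: split on spaces,
    // keep only tokens made entirely of ASCII letters.
    public static IEnumerable<string> GetLetterTokens(string location)
    {
        string[] words = location.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Where(word => Regex.IsMatch(word.Trim(), @"^[a-zA-Z]+$"));
    }

    // Hypothetical helper for step 1: "123 HAM" -> ["HAM"].
    public static string[] QueryTokens(string query) =>
        GetLetterTokens(query).ToArray();
}
```

Usage: `AddressTokens.QueryTokens("123 HAM")` yields a single token, `HAM`. Note the comparison that follows is case-sensitive, so mixed-case data may need normalizing first.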
Answer 0 (score: 1)
I think you can use the Levenshtein distance. - L.B
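string token = "HAM";   // query token, e.g. from GetLetterTokens("123 HAM")

// Rank each address by the smallest Levenshtein distance between the query
// token and any of the address's letter tokens.
var result = addresses.OrderBy(a =>
    GetLetterTokens(a)
        .Select(t => LevenshteinDistance.Compute(token, t))
        .DefaultIfEmpty(int.MaxValue)   // addresses without letter tokens go last
        .Min());

// A plain distance function rather than an IComparer<string>: a comparer must
// return the relative order of two keys, not how far apart they are.
public static class LevenshteinDistance
{
    /// <summary>
    /// Compute the edit distance between two strings.
    /// </summary>
    public static int Compute(string s, string t)
    {
        int n = s.Length;
        int m = t.Length;
        // Step 1: an empty string is "length of the other" edits away
        if (n == 0)
        {
            return m;
        }
        if (m == 0)
        {
            return n;
        }
        int[,] d = new int[n + 1, m + 1];
        // Step 2: first column and first row
        for (int i = 0; i <= n; i++)
        {
            d[i, 0] = i;
        }
        for (int j = 0; j <= m; j++)
        {
            d[0, j] = j;
        }
        // Step 3
        for (int i = 1; i <= n; i++)
        {
            // Step 4
            for (int j = 1; j <= m; j++)
            {
                // Step 5: substitution cost
                int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
                // Step 6: deletion, insertion or substitution
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        // Step 7
        return d[n, m];
    }
}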
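The matrix version above allocates an (n+1) x (m+1) array per call. A common variant of the same dynamic program keeps only two rows, which helps when ranking many addresses. This is a sketch; `Lev`, `Distance` and `SortByDistance` are hypothetical names, and the tokenizer inlines the GetLetterTokens filter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Lev
{
    // Levenshtein distance with two rows of the DP table: same result, O(m) memory.
    public static int Distance(string s, string t)
    {
        var prev = new int[t.Length + 1];
        var curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;   // row 0: j insertions

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;                                   // column 0: i deletions
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);                   // last row is now in prev
        }
        return prev[t.Length];
    }

    // Hypothetical helper: order addresses by the smallest distance between the
    // query token and any letter-only token (OrderBy is stable, so ties keep input order).
    public static List<string> SortByDistance(IEnumerable<string> addresses, string token)
    {
        return addresses
            .OrderBy(a => a.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                           .Where(w => Regex.IsMatch(w, @"^[a-zA-Z]+$"))
                           .Select(w => Distance(token, w))
                           .DefaultIfEmpty(int.MaxValue)
                           .Min())
            .ToList();
    }
}
```

One caveat: a minimum over all tokens lets unrelated words compete. For "37 HAMMERSMITH AVENUE" the key is 6 (HAM vs AVENUE), not the 8 the question expects. The relative order of the question's examples is unchanged, but filtering tokens first, as the next answer does with Contains, avoids this.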
Answer 1 (score: 1)
I think this is what you are looking for:
string token = "HAM";
List<string> addresses = new List<string>
{
    "12 HAM DRIVE",
    "13 HAM DRIVE",
    "14 HAMER DRIVE",
    "37 HAMMERSMITH AVENUE",
    "15 HAM HAMER DRIVE",
};
var result = from a in addresses
             let tokens = GetLetterTokens(a)
             // extra letters in every token that contains the query token
             let distances = from t in tokens
                             where t.Contains(token)
                             select t.Length - token.Length
             where distances.Any()           // skip addresses with no matching token
             let distance = distances.Min()
             orderby distance
             select new
             {
                 Address = a,
                 Distance = distance,
             };
If you only want tokens that start with the query token, use StartsWith instead of Contains.
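Applying that suggestion, a method-syntax version of the same query with StartsWith might look like the sketch below. `PrefixRanking.Rank` is a hypothetical name, and the tokenizer mirrors GetLetterTokens from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixRanking
{
    // Letter-only tokens, as in GetLetterTokens from the question.
    static IEnumerable<string> LetterTokens(string s) =>
        s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
         .Where(w => Regex.IsMatch(w, @"^[a-zA-Z]+$"));

    // Keep only tokens that start with the query token; distance = extra letters
    // in the shortest such token. Addresses with no match are dropped.
    public static List<(string Address, int Distance)> Rank(IEnumerable<string> addresses, string token)
    {
        return addresses
            .Select(a => (Address: a,
                          Distances: LetterTokens(a)
                              .Where(t => t.StartsWith(token, StringComparison.Ordinal))
                              .Select(t => t.Length - token.Length)
                              .ToList()))
            .Where(x => x.Distances.Count > 0)
            .Select(x => (Address: x.Address, Distance: x.Distances.Min()))
            .OrderBy(x => x.Distance)
            .ToList();
    }
}
```

With StartsWith, an address such as "99 GRAHAM ROAD" is excluded, whereas Contains would keep it with distance 3.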