SQL: matching a string by percentage

Time: 2020-10-22 15:19:12

Tags: sql database entity-framework linq linq-to-entities

I have a SQL table with two columns, ID and Value.

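I need to build a query that checks whether my string is found in the VALUE column of the table or, if I split the string into words on the underscore, whether it matches at least 70%. For example, my search string should match the row with ID = 1 in the table below, because 90% of its "words" (after splitting on underscores) are the same, which is above 70%: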

ID   VALUE
1    008_ADL_S81_PCIE_L2_B2B_Cycling_Failure_Phystatus_PP 
2    008_ADL_S81_ABC
3    008_ADL_DEF 
4    008_ADL_XYZ

How can I do this?

1 answer:

Answer 0 (score: 0)

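Assuming your table is small enough that loading it locally on the client is acceptable, and assuming you are using LINQ against the database and want a LINQ query, you can use: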

var target = "008_ADL_A0_S81_PCIE_L2_B2B_Cycling_Failure_Phystatus_PP";
var targetHS = target.Split('_', StringSplitOptions.RemoveEmptyEntries).ToHashSet();

var matchingIDs = db.AsEnumerable()
                    .Select(s => new { s.ID, words = s.VALUE.Split('_', StringSplitOptions.RemoveEmptyEntries) })
                    // Cast to double first: int / int truncates, so 3 / 4 would be 0
                    .Where(s => (double)s.words.Count(w => targetHS.Contains(w)) / s.words.Length >= 0.70)
                    .Select(s => s.ID)
                    .ToList();
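
To try the filter without a database, here is a minimal sketch that swaps db for an in-memory list holding the question's sample rows and the answer's search string; the names rows, Words, RowRatio and TargetRatio are illustrative only. RowRatio is the answer's rule (with the cast to double). TargetRatio is an assumption, not part of the answer: it divides by the number of words in the search string, which is one possible reading of the question's "90%".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's sample rows, standing in for the database table.
var rows = new List<(int ID, string VALUE)>
{
    (1, "008_ADL_S81_PCIE_L2_B2B_Cycling_Failure_Phystatus_PP"),
    (2, "008_ADL_S81_ABC"),
    (3, "008_ADL_DEF"),
    (4, "008_ADL_XYZ"),
};

var target = "008_ADL_A0_S81_PCIE_L2_B2B_Cycling_Failure_Phystatus_PP";
var targetHS = target.Split('_', StringSplitOptions.RemoveEmptyEntries).ToHashSet();

// The answer's rule: share of the row's words that occur in the search string.
var byRowWords = rows
    .Where(r => RowRatio(r.VALUE, targetHS) >= 0.70)
    .Select(r => r.ID)
    .ToList();

// Assumed alternative: share of the search string's words that occur in the row.
var byTargetWords = rows
    .Where(r => TargetRatio(r.VALUE, targetHS) >= 0.70)
    .Select(r => r.ID)
    .ToList();

Console.WriteLine($"By row words:    {string.Join(", ", byRowWords)}");
Console.WriteLine($"By target words: {string.Join(", ", byTargetWords)}");

static string[] Words(string s) => s.Split('_', StringSplitOptions.RemoveEmptyEntries);

static double RowRatio(string value, HashSet<string> targetWords)
{
    var words = Words(value);
    if (words.Length == 0) return 0.0;
    // Cast before dividing: int / int would truncate 3 / 4 to 0.
    return (double)words.Count(w => targetWords.Contains(w)) / words.Length;
}

static double TargetRatio(string value, HashSet<string> targetWords)
{
    if (targetWords.Count == 0) return 0.0;
    var rowWords = Words(value).ToHashSet();
    return (double)targetWords.Count(w => rowWords.Contains(w)) / targetWords.Count;
}
```

With these rows the answer's rule returns IDs 1 and 2, because row 2 shares 3 of its 4 words (75%) with the search string. Dividing by the search string's 11 words instead gives 10/11 ≈ 91% for row 1 and 3/11 for row 2, so only ID 1 passes; that is closer to the question's "90%", but which denominator is right depends on what the asker intends.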

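Alternatively, if the table is large, you can filter it on the server down to the rows that contain at least one matching word, and then compute the percentage on the client.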

Use some LINQKit-based extension methods to build the query predicate:

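using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using LinqKit;

public static class LinqKitExt {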
    // string fieldExpr(T row) - function returning multiple value string field to test
    // delimiter - string separator between values in test field
    // value - string value to find in values of test field
    // r => fieldExpr(r).Split(delimiter).Contains(value)
    public static Expression<Func<T, bool>> SplitContains<T>(this Expression<Func<T, string>> fieldExpr, string delimiter, string value) {
        var pred = PredicateBuilder.New<T>(r => fieldExpr.Invoke(r) == value);
        pred = pred.Or(r => fieldExpr.Invoke(r).StartsWith(value + delimiter));
        pred = pred.Or(r => fieldExpr.Invoke(r).EndsWith(delimiter + value));
        pred = pred.Or(r => fieldExpr.Invoke(r).Contains(delimiter + value + delimiter));

        return pred;
    }

    // values - string values, one of which to find in values of test field
    // string fieldExpr(T row) - function returning multiple value string field to test
    // delimiter - string separator between values in test field
    // r => values.Any(value => fieldExpr(r).Split(delimiter).Contains(value))
    public static Expression<Func<T, bool>> AnySplitContains<T>(this IEnumerable<string> values, Expression<Func<T, string>> fieldExpr, string delimiter) {
        var pred = PredicateBuilder.New<T>();
        foreach (var value in values)
            pred = pred.Or(fieldExpr.SplitContains(delimiter, value));

        return pred;
    }

    // values - string values, one of which to find in values of test field
    // string fieldExpr(T row) - function returning multiple value string field to test
    // delimiter - string separator between values in test field
    // dbq.Where(r => values.Any(value => fieldExpr(r).Split(delimiter).Contains(value)))
    public static IQueryable<T> WhereAnySplitContains<T>(this IQueryable<T> dbq, IEnumerable<string> values, Expression<Func<T, string>> fieldExpr, string delimiter) =>
        dbq.AsExpandable().Where(values.AnySplitContains(fieldExpr, delimiter));
}
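
The four patterns in SplitContains (equal to the value, starting with value plus delimiter, ending with delimiter plus value, or containing delimiter plus value plus delimiter) are meant to reproduce a whole-token test using only string operations that a SQL provider can typically translate into LIKE conditions. The sketch below checks that equivalence on the sample rows in plain C#, without LinqKit or a database; PatternContains and TokenContains are hypothetical helper names. It assumes the searched value never contains the delimiter itself, which holds here because the search words come from splitting on "_".

```csharp
using System;
using System.Linq;

// Sample VALUE strings from the question.
string[] values =
{
    "008_ADL_S81_PCIE_L2_B2B_Cycling_Failure_Phystatus_PP",
    "008_ADL_S81_ABC",
    "008_ADL_DEF",
    "008_ADL_XYZ",
};

// Probe words without the delimiter: some are whole tokens, some only substrings.
string[] probes = { "008", "ADL", "S81", "ABC", "PP", "AD", "S8", "A0" };

int mismatches = 0;
foreach (var v in values)
    foreach (var p in probes)
        if (PatternContains(v, "_", p) != TokenContains(v, "_", p))
            mismatches++;

Console.WriteLine($"Mismatches: {mismatches}");
Console.WriteLine(PatternContains("008_ADL_S81_ABC", "_", "AD")); // whole-token test
Console.WriteLine("008_ADL_S81_ABC".Contains("AD"));              // plain substring test

// The same four cases that SplitContains ORs together, on ordinary strings.
static bool PatternContains(string field, string delimiter, string value) =>
    field == value
    || field.StartsWith(value + delimiter, StringComparison.Ordinal)
    || field.EndsWith(delimiter + value, StringComparison.Ordinal)
    || field.Contains(delimiter + value + delimiter);

// Reference behavior: split the field and look for an exact token.
static bool TokenContains(string field, string delimiter, string value) =>
    field.Split(delimiter).Contains(value);
```

It prints "Mismatches: 0", then False and True: the token test rejects "AD" even though it appears inside "ADL", which a plain Contains would accept. Note also that every sample row starts with 008_ADL, so for this particular data the server-side pre-filter keeps all four rows; it only saves work when many rows share no word at all with the search string.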

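A row with no matching word can never reach 70%, so you can filter the rows on the server first, then pull the possible matches to the client and compute the percentage there: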

var matchingIDs = db.WhereAnySplitContains(targetHS, r => r.VALUE, "_")
                    .AsEnumerable()
                    .Select(s => new { s.ID, words = s.VALUE.Split('_', StringSplitOptions.RemoveEmptyEntries) })
                    // Cast to double first: int / int truncates, so 3 / 4 would be 0
                    .Where(s => (double)s.words.Count(w => targetHS.Contains(w)) / s.words.Length >= 0.70)
                    .Select(s => s.ID)
                    .ToList();