I have the following code, which works:
string[] userSelect = new string[] {"the", "sled", "had", "not", "moved", ";", "the", "driver", "was", "surprised", "."};
string[] original = new string[] {"the", "driver", "was", "surprised", ",", "too", ";", "the", "sled", "had", "not", "moved", "."};
var matches =
    (from l in userSelect.Select((s, i) => new { s, i })
     join r in original.Select((s, i) => new { s, i })
     on l.s equals r.s
     // an equal offset (r.i - l.i) means the words line up at the same shift
     group l by r.i - l.i into g
     // within one offset, l.i - j stays constant across a run of consecutive indexes
     from m in g.Select((l, j) => new { l.i, j = l.i - j, k = g.Key })
     group m by new { m.j, m.k } into h
     select h.Select(t => t.i).ToArray())
    .ToArray();
// drop any match whose indexes are all contained in an earlier match
int take = 0;
var filtered = matches.Where(m => !matches.Take(take++)
                                          .Any(n => m.All(i => n.Contains(i))))
                      .ToArray();
With the code above I get this result:
{{0,1,2,3,4}, {6,7,8,9}, {5,6}, {10}}
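Note the overlap at index 6. It occurs because both {"the", "driver", "was", "surprised"} and {";", "the"} appear in the original sentence.
For cases like this I need a secondary filter. It should find every index value that overlaps and extract those values into their own arrays, so that no index value appears more than once. The output should separate the overlaps like this: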
{{0,1,2,3,4}, {7,8,9}, {10}, {6}, {5}}
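One possible shape for such a secondary filter is sketched below. It is only a sketch under a few assumptions: `OverlapFilter` and `SplitOverlaps` are hypothetical names, not part of the code above; an index counts as overlapping when it occurs in more than one array; what is left of each array stays together as one array even if removing an index opens a gap in the run; and the single-element arrays are appended at the end, so the order of the arrays can differ from the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverlapFilter
{
    // Hypothetical helper (not from the original code): pulls every index that
    // occurs in more than one array out into its own single-element array.
    public static int[][] SplitOverlaps(int[][] groups)
    {
        // Indexes that occur in more than one array.
        var shared = new HashSet<int>(
            groups.SelectMany(g => g.Distinct())
                  .GroupBy(i => i)
                  .Where(c => c.Count() > 1)
                  .Select(c => c.Key));

        // Each array minus the shared indexes; arrays that become empty are dropped.
        var remainders = groups
            .Select(g => g.Where(i => !shared.Contains(i)).ToArray())
            .Where(g => g.Length > 0);

        // Each shared index becomes its own array, in ascending order (an assumption).
        var singles = shared.OrderBy(i => i).Select(i => new[] { i });

        return remainders.Concat(singles).ToArray();
    }
}
```

Calling `OverlapFilter.SplitOverlaps(filtered)` on the result shown earlier should give {{0,1,2,3,4}, {7,8,9}, {5}, {10}, {6}}: the same arrays as the desired output, in a different order. If a removed index could fall in the middle of a run, the remainder would need a further split into consecutive runs, which this sketch does not attempt.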