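I want to build an application that generates Solr queries from a list of strings. I have a long list of hostnames and need to condense it into as few wildcard patterns as possible. Here is an example of what I need to accomplish.
Given the following data set: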
AAABBBCCC-1234
AAABBBCBC-1334
AAABBCCBC-1324
QEUVWISKPWW1114
QEUSPISGPWW2114
QEUSPISTPWW1614
The output should look like this:
AAABB?C?C-1??4
QEU??IS?PWW???4
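At first I tried .GroupBy(item => item.Substring(0, 5)), but the hostnames vary too much for that to produce accurate groups. Now I am trying to come up with a way to loop through the list, find the items that share the longest run of consecutive characters, and group them together. That would at least be a good starting point. Then, within each group, I would compare every index of the string against all the other items in the group and replace the positions that do not match with ?.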
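The per-index comparison described above can be sketched as a small helper. This is only an illustration under assumptions: the class name `WildcardSketch` and the method `Mask` are hypothetical, the strings passed in are assumed to already belong to the same group, and a string that is shorter than the longest one is treated as a mismatch at the positions it lacks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class WildcardSketch
{
    // Hypothetical helper: collapses one group of strings into a single pattern,
    // keeping a character only where every string agrees at that index.
    public static string Mask(IReadOnlyList<string> group)
    {
        if (group.Count == 0) return string.Empty;

        int maxLength = group.Max(s => s.Length);
        var pattern = new StringBuilder(maxLength);

        for (int i = 0; i < maxLength; i++)
        {
            // Every string must be long enough to have a character at index i,
            // and that character must equal the one in the first string.
            bool allMatch = group.All(s => i < s.Length)
                            && group.All(s => s[i] == group[0][i]);

            pattern.Append(allMatch ? group[0][i] : '?');
        }

        return pattern.ToString();
    }
}
```

For the first three hostnames in the data set, `WildcardSketch.Mask(new[] { "AAABBBCCC-1234", "AAABBBCBC-1334", "AAABBCCBC-1324" })` returns `AAABB?C?C-1??4`. Deciding which hostnames belong in the same group is a separate step that this sketch does not address.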
Answer 0 (score: 0)
I managed to find a solution to the problem here, but if anyone has a better approach, I will gladly change the accepted answer.
//Variables class I created elsewhere
Vars.hostnameInput =
@"AAABBBCCC-1234
AAABBBCBC-1334
AAABBCCBC-1324
QEUVWISKPWW1114
QEUSPISGPWW2114
QEUSPISTPWW1614";
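//Split the string into a list; accept both "\r\n" and "\n" because the line endings
//inside a verbatim string literal depend on how the source file was saved
var hostnameList = Vars.hostnameInput.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);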
//Create groups where the first three characters match
var groups = from item in hostnameList
             group item by item.Substring(0, 3)
             into g
             select g;
//Iterate through each group
foreach (var _group in groups)
{
    var wildcard = "";
    //Order the list so that the longest string in the group is at the top
    var hostnames = _group.OrderByDescending(t => t.Length).ToList();
    //Split longest string in the group into a Char array to compare to the rest in the group
    var hostnameChars = hostnames[0].ToCharArray();
    for (var i = 0; i < hostnameChars.Length; i++)
    {
        //Assume a match until one hostname disagrees at index i
        bool charMatch = true;
        foreach (var hostname in hostnames)
        {
            //If the current string is shorter, the extra characters should result in a '?';
            //checking the length avoids relying on IndexOutOfRangeException for control flow
            if (i >= hostname.Length || hostnameChars[i] != hostname[i])
            {
                charMatch = false;
                break;
            }
        }
        //If all characters at index i match, leave it in, if not, replace with '?'
        wildcard += charMatch ? hostnameChars[i].ToString() : "?";
    }
    //Add new wildcard terms to output
    Vars.solrScript += $"{wildcard}{Environment.NewLine}";
    foreach (var hostname in _group)
    {
        Vars.solrScript += $" {hostname}{Environment.NewLine}";
    }
}
Output:
AAABB?C?C-1??4
AAABBBCCC-1234
AAABBBCBC-1334
AAABBCCBC-1324
QEU??IS?PWW??14
QEUVWISKPWW1114
QEUSPISGPWW2114
QEUSPISTPWW1614
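Since the goal stated in the question is a Solr query, the wildcard lines above could be combined into a single query string along these lines. This is a minimal sketch, not part of the original answer: the class `SolrQuerySketch`, the method `BuildQuery`, and the field name `hostname` are assumptions, and the patterns are not escaped, so characters that are special to the query parser in use may need extra handling.

```csharp
using System;
using System.Collections.Generic;

public static class SolrQuerySketch
{
    // Hypothetical helper: joins wildcard patterns with OR under one field.
    // The field name is supplied by the caller; no escaping is performed here.
    public static string BuildQuery(string field, IEnumerable<string> patterns)
    {
        return $"{field}:({string.Join(" OR ", patterns)})";
    }
}
```

Calling `SolrQuerySketch.BuildQuery("hostname", new[] { "AAABB?C?C-1??4", "QEU??IS?PWW??14" })` returns `hostname:(AAABB?C?C-1??4 OR QEU??IS?PWW??14)`.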