Generate wildcard terms from a list of similar strings

Date: 2018-07-18 20:14:30

Tags: c# .net

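I want to build an application that generates Solr queries from a list of strings. I have a long list of hostnames and need to collapse them into as few wildcard terms as possible. Here is an example of what I need to accomplish.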

Given the following data set:

AAABBBCCC-1234
AAABBBCBC-1334
AAABBCCBC-1324
QEUVWISKPWW1114
QEUSPISGPWW2114
QEUSPISTPWW1614

The output should look like this:

AAABB?C?C-1??4
QEU??IS?PWW???4

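First I tried .GroupBy(item => item.Substring(0, 5)), but the hostnames vary too much for a fixed-length prefix to group them accurately. Now I am trying to find a way to loop through the list, find the longest runs of consecutive matching characters, and group those hostnames together. That would at least be a good starting point. Then, within each group, I would compare each character position against every other item in the group and replace the positions that don't match with ?.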
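One possible reading of the "longest run of consecutive matching characters" idea is to sort the hostnames so similar ones sit next to each other, then start a new group whenever a hostname shares fewer than some minimum number of leading characters with the first hostname of the current group. This is a minimal sketch under that assumption; `GroupBySharedPrefix`, `CommonPrefixLength` and the `minPrefix` threshold are hypothetical names, and the per-position ? substitution would then run on each resulting group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Number of leading characters two strings have in common.
static int CommonPrefixLength(string a, string b)
{
    int n = Math.Min(a.Length, b.Length);
    int i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
}

// Hypothetical helper: sort ordinally, then open a new group whenever a hostname
// shares fewer than minPrefix leading characters with the group's first hostname.
static List<List<string>> GroupBySharedPrefix(IEnumerable<string> hostnames, int minPrefix)
{
    var groups = new List<List<string>>();
    foreach (var host in hostnames.OrderBy(h => h, StringComparer.Ordinal))
    {
        var current = groups.Count > 0 ? groups[groups.Count - 1] : null;
        if (current != null && CommonPrefixLength(current[0], host) >= minPrefix)
            current.Add(host);
        else
            groups.Add(new List<string> { host });
    }
    return groups;
}

var input = new[]
{
    "AAABBBCCC-1234", "AAABBBCBC-1334", "AAABBCCBC-1324",
    "QEUVWISKPWW1114", "QEUSPISGPWW2114", "QEUSPISTPWW1614",
};

foreach (var group in GroupBySharedPrefix(input, minPrefix: 3))
    Console.WriteLine(string.Join(", ", group));
// prints:
// AAABBBCBC-1334, AAABBBCCC-1234, AAABBCCBC-1324
// QEUSPISGPWW2114, QEUSPISTPWW1614, QEUVWISKPWW1114
```

The threshold is the knob: with minPrefix 3 the sample splits into the AAA and QEU groups, while minPrefix 4 splits QEUVWISKPWW1114 off into its own group, trading more terms for fewer ? characters per term.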

1 answer:

Answer 0 (score: 0)

I managed to find a solution to my problem, but if anyone has a better approach I will gladly change the accepted answer.

        using System;
        using System.Linq;
        using System.Text;

        //Input hostnames, one per line
        var hostnameInput =
@"AAABBBCCC-1234
AAABBBCBC-1334
AAABBCCBC-1324
QEUVWISKPWW1114
QEUSPISGPWW2114
QEUSPISTPWW1614";
        var solrScript = new StringBuilder();

        //Split the string into a list (accept both \r\n and \n, since the literal's line endings follow the source file)
        var hostnameList = hostnameInput.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        //Create groups where the first three characters match (guarding against hostnames shorter than 3)
        var groups = from item in hostnameList
            group item by item.Substring(0, Math.Min(3, item.Length))
            into g
            select g;

        //Iterate through each group
        foreach (var _group in groups)
        {
            var wildcard = new StringBuilder();
            //Order the list so that the longest string in the group is at the top
            var hostnames = _group.OrderByDescending(t => t.Length).ToList();
            //Use the longest string in the group as the template to compare the rest against
            var template = hostnames[0];
            for (var i = 0; i < template.Length; i++)
            {
                //Index i matches only if every hostname is long enough and has the same character there,
                //so positions past the end of a shorter string become '?' (no exception handling needed)
                bool charMatch = hostnames.All(h => i < h.Length && h[i] == template[i]);
                //If all characters at index i match, leave it in; if not, replace it with '?'
                wildcard.Append(charMatch ? template[i] : '?');
            }
            //Add the new wildcard term to the output, followed by the hostnames it covers
            solrScript.AppendLine(wildcard.ToString());
            foreach (var hostname in _group)
            {
                solrScript.AppendLine($"  {hostname}");
            }
        }

        Console.Write(solrScript);

Output:

        AAABB?C?C-1??4
          AAABBBCCC-1234
          AAABBBCBC-1334
          AAABBCCBC-1324
        QEU??IS?PWW??14
          QEUVWISKPWW1114
          QEUSPISGPWW2114
          QEUSPISTPWW1614
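To turn the wildcard terms into an actual Solr query, one option is to OR them together against the field that stores the hostnames. This is a sketch under stated assumptions: the field name `hostname` and the helpers `BuildSolrQuery` and `EscapeExceptWildcards` are hypothetical, and it assumes the standard Lucene query syntax, where ? matches exactly one character. Other query-syntax special characters (such as -) are escaped with a backslash so that only the ? wildcards stay active. How wildcard terms match also depends on the field type in your schema, so check it against your own index.

```csharp
using System;
using System.Linq;
using System.Text;

// Escape Lucene/Solr query-syntax characters, leaving the ? and * wildcards active.
static string EscapeExceptWildcards(string term)
{
    const string special = "+-&|!(){}[]^\"~:\\/";
    var sb = new StringBuilder();
    foreach (var c in term)
    {
        if (special.IndexOf(c) >= 0 || char.IsWhiteSpace(c))
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.ToString();
}

// Hypothetical helper: OR the wildcard terms together against a single field.
static string BuildSolrQuery(string field, string[] wildcardTerms) =>
    string.Join(" OR ", wildcardTerms.Select(t => $"{field}:{EscapeExceptWildcards(t)}"));

var terms = new[] { "AAABB?C?C-1??4", "QEU??IS?PWW??14" };
Console.WriteLine(BuildSolrQuery("hostname", terms));
// prints: hostname:AAABB?C?C\-1??4 OR hostname:QEU??IS?PWW??14
```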