Question

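Hello, I am working with a List<T> of results that contain strings. To keep it simple, let me use words like the ones below; the scheme is the same.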

01:01 A car consists of : wheels, engine, seats, 2 screws, a cotton lamp
01:02 A bike consists of : wheels
01:03 A car consists of : wheels, engine, seats, speakers, 5 screws, an indicator light
01:04 A small truck consists of : wheels, engine, seats, bed

So the pseudo matcher and the desired output would be:

00-99:0-99(space)A|An(space){get the car/bike or any other as object}(space)consists(space)of(space):{get the elements in here exploding the commas as attributes}

Right now I use a foreach loop that goes through my list and writes each line to a text box.

foreach (Message _msg in _objects.Messages) {
    richTextBox1.AppendText(_msg.Text);
}

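As a pseudo display, rather than adding the whole sentence to my text box, I would pass it through a parse function first:

foreach (Message _msg in _objects.Messages) {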
    richTextBox1.AppendText(parsefunction(_msg.Text));
}

parse function
{
    count(the exploded elements, and list them)
    remove the unwanted parts of the text
}

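After extracting the objects and attributes, I want to sum them, taking into account whether they carry a count, and strip the a/an from them. This is the part where I am stuck.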

The desired output sums any duplicates and the number of occurrences:

2x Car
4x Wheels
3x Engine
3x Seats
7x Screws
1x Cotton Lamp
1x Bike
1x Speakers
1x Indicator Light
1x Small Truck
1x Bed

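Could you point me at least to the Regex? Maybe I can work out the rest myself and share it when I am done. I think it has to be a function that is called inside the loop.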

Answer 1

Here is what I came up with (I am sure it can be improved):

// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
public static List<KeyValuePair<string, string[]>> ParseData(List<string> data)
{
    Regex regex = new Regex(@"^[\d]{2}:[\d]{2} A[n]? ([a-zA-Z\s]+) consists of : ([a-zA-Z,\s0-9]+)$");
    var elementMap = new List<KeyValuePair<string, string[]>>();

    for (int i = 0; i < data.Count; i++)
    {
        var match = regex.Match(data[i]);

        // Skip lines that do not fit the pattern before reading any groups.
        if (!match.Success)
            continue;

        // Group 1 is the object name, group 2 the comma-separated attribute list.
        var attributes = match.Groups[2].Value.Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
        elementMap.Add(new KeyValuePair<string, string[]>(match.Groups[1].Value, attributes));
    }

    return elementMap;
}

public static Dictionary<string, int> GetIndexedData(List<KeyValuePair<string, string[]>> data)
{
    Dictionary<string, int> displayObjects = new Dictionary<string, int>();

    foreach (KeyValuePair<string, string[]> item in data)
    {
        if (displayObjects.ContainsKey(item.Key))
            displayObjects[item.Key]++;
        else
            displayObjects.Add(item.Key, 1);

        foreach (string key2 in item.Value)
        {
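            // Split into at most two parts so "2 big screws" keeps "big screws" together.
            string[] attributeValues = key2.Split(new[] { ' ' }, 2);
            int add = 1;
            string addValue = key2;
            int c = 0;

            // A leading integer is a quantity; otherwise the attribute counts once.
            if (attributeValues.Length > 1 && int.TryParse(attributeValues[0], out c))
            {
                add = c;
                addValue = attributeValues[1];
            }

            // Strip a leading article. Unlike Substring, StartsWith does not throw on short strings.
            if (addValue.StartsWith("a ", StringComparison.Ordinal))
                addValue = addValue.Substring(2);
            else if (addValue.StartsWith("an ", StringComparison.Ordinal))
                addValue = addValue.Substring(3);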

            if (displayObjects.ContainsKey(addValue))
                displayObjects[addValue] += add;
            else
                displayObjects.Add(addValue, add);
        }
    }

    return displayObjects;
}

Usage:

List<string> data = new List<string>();
data.Add("01:01 A car consists of : wheels, engine, seats, 2 screws, a cotton lamp");
data.Add("01:02 A bike consists of : wheels");
data.Add("01:03 A car consists of : wheels, engine, seats, speakers, 5 screws, an indicator light");
data.Add("01:04 A small truck consists of : wheels, engine, seats, bed");
var elementMap = ParseData(data);

var displayObjects = GetIndexedData(elementMap);

foreach (string key in displayObjects.Keys)
{
    Console.WriteLine(key + ": " + displayObjects[key]);
}
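
The loop above prints entries such as car: 2, while the question asks for 2x Car. A minimal sketch of that last formatting step follows, written as a C# 9 top-level program. FormatCount is a hypothetical helper that is not part of the answer, and the small dictionary only stands in for the result of GetIndexedData.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: turns ("small truck", 1) into "1x Small Truck".
static string FormatCount(string name, int count)
{
    // ToTitleCase capitalizes the first letter of every word.
    string title = CultureInfo.InvariantCulture.TextInfo.ToTitleCase(name);
    return count + "x " + title;
}

// Stand-in for the dictionary returned by GetIndexedData.
var displayObjects = new Dictionary<string, int>
{
    { "car", 2 },
    { "screws", 7 },
    { "small truck", 1 }
};

foreach (KeyValuePair<string, int> entry in displayObjects)
{
    Console.WriteLine(FormatCount(entry.Key, entry.Value));
}
```

In a WinForms application the same string could be passed to richTextBox1.AppendText instead of Console.WriteLine.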

Basically, this Regex pattern (^[\d]{2}:[\d]{2} A[n]? ([a-zA-Z\s]+) consists of : ([a-zA-Z,\s0-9]+)$) will match any line built exactly the way you indicated. All you have to do is:

var match = regex.Match(data[i]);
// 'match.Groups[1].Value' is the name of the item
// 'match.Groups[2].Value' is the comma-separated list

// The following line will split all the attributes on ', ' therefore leaving them as just the words. (`wheels`, `engine`, `seats`)
var attributes = match.Groups[2].Value.Split(new string[] { ", " }, StringSplitOptions.RemoveEmptyEntries);

Do whatever you want with all of that information.

This makes the following assumptions:

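The data will always start with two digits ([\d]{2}), a colon (:), two more digits ([\d]{2}), a space, a capital A with an optional n ([n]?) (for A or An), and another space; all of this sits at the very beginning of the line (^).
The name of the object (([a-zA-Z\s]+)) may contain:
1. Letters (a-z, A-Z)
2. Whitespace (\s)
3. At least one such character, and as many as possible
The next words will be a space, consists of, a space, and a colon followed by a space (: ).
The words of the attributes (([a-zA-Z,\s0-9]+)) may contain:
1. Letters (a-z, A-Z)
2. Commas (,)
3. Whitespace (\s)
4. Digits (0-9)
5. At least one such character, and as many as possible
The attributes run to the end of the string ($).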

Lastly, it assumes that attributes is not null or nothing: there is at least one character in attributes.
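
To make these assumptions concrete, here is a small sketch (a C# 9 top-level program) that runs the same pattern against a few lines. Only the first sample comes from the question; the others are made-up inputs that each break one assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in ParseData.
var regex = new Regex(@"^[\d]{2}:[\d]{2} A[n]? ([a-zA-Z\s]+) consists of : ([a-zA-Z,\s0-9]+)$");

string[] samples =
{
    "01:04 A small truck consists of : wheels, engine, seats, bed", // fits every assumption
    "01:05 a bike consists of : wheels",    // lowercase article: the pattern expects a capital A
    "1:05 A bike consists of : wheels",     // one leading digit: two are required
    "01:06 A go-kart consists of : wheels", // a hyphen is not in [a-zA-Z\s]
    "01:07 A bike consists of : "           // no attributes: + needs at least one character
};

foreach (string sample in samples)
{
    // Prints True for the first sample and False for the rest.
    Console.WriteLine(regex.IsMatch(sample) + "  " + sample);
}
```

If inputs like these can occur in your data, the pattern would need to be relaxed, for example with RegexOptions.IgnoreCase or a wider character class.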

Also, there is no error checking here. You should add it as needed.
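
One possible shape for that error checking, as a sketch only (C# 9 top-level program): TryParseLine is a hypothetical helper, not part of the code above, that reports failure instead of silently skipping a line, so the caller can collect or log whatever did not fit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var regex = new Regex(@"^[\d]{2}:[\d]{2} A[n]? ([a-zA-Z\s]+) consists of : ([a-zA-Z,\s0-9]+)$");

// Hypothetical helper: returns false when the line is empty or does not fit the pattern.
bool TryParseLine(string line, out string name, out string[] attributes)
{
    name = "";
    attributes = Array.Empty<string>();

    if (string.IsNullOrWhiteSpace(line))
        return false;

    Match match = regex.Match(line);
    if (!match.Success)
        return false;

    name = match.Groups[1].Value;
    attributes = match.Groups[2].Value.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
    return true;
}

var rejected = new List<string>();
foreach (string line in new[] { "01:02 A bike consists of : wheels", "garbage", "" })
{
    if (TryParseLine(line, out string objectName, out string[] parts))
        Console.WriteLine(objectName + " -> " + string.Join(" | ", parts));
    else
        rejected.Add(line); // keep the bad line so it can be reported later
}

Console.WriteLine("Rejected lines: " + rejected.Count);
```

Whether rejected lines should be logged, shown to the user, or treated as an error depends on where the messages come from.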
