Group multiple lines that contain an index and create a custom object for each index

Time: 2018-03-06 11:51:07

Tags: c# string linq parsing

I have a list of strings (read from a file) in the following order and format, and I need to convert it into a list of class objects.

1.0.1.0.1, Type: DateTime, Value: 06/03/2013 11:06:10
1.0.1.0.2, Type: DateTime, Value: 06/03/2014 11:06:10
1.0.1.0.3, Type: DateTime, Value: 06/03/2015 11:06:10
1.0.1.0.4, Type: DateTime, Value: 06/03/2016 11:06:10
1.0.1.0.5, Type: DateTime, Value: 06/03/2017 11:06:10
1.0.1.1.1, Type: Integer, Value: 1
1.0.1.1.2, Type: Integer, Value: 2
1.0.1.1.3, Type: Integer, Value: 3
1.0.0.1.4, Type: Integer, Value: 4
1.0.1.1.5, Type: Integer, Value: 5
1.0.1.2.1, Type: String, Value: Hello
1.0.1.2.2, Type: String, Value: Hello1
1.0.1.2.3, Type: String, Value: Hello2
1.0.1.2.4, Type: String, Value: Hello3
1.0.1.2.5, Type: String, Value: Hello4

Here is my class:

public class MyData
{
    public DateTime DateTime {get;set;}
    public int Index {get;set;}
    public string Value {get;set;}
}

Now what I want is to convert this into a list of C# objects,

like this...

List<MyData> myDataList = new List<MyData>();

MyData data1 = new MyData();
data1.DateTime = DateTime.ParseExact("06/03/2013 11:06:10", "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
data1.Index = 1;
data1.Value = "Hello";
myDataList.Add(data1);

MyData data2 = new MyData();
data2.DateTime = DateTime.ParseExact("06/03/2014 11:06:10", "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture);
data2.Index = 2;
data2.Value = "Hello1";
myDataList.Add(data2);

and so on...

Here is what I have tried so far.

 List<List<string>> allLists = lines
        .Select(str => new { str, token = str.Split('.') })
        .Where(x => x.token.Length >= 4)
        .GroupBy(x => string.Concat(x.token.Take(4)))
        .Select(g => g.Select(x => x.str).ToList())
        .ToList();

Do I really need to iterate, or can I modify my LINQ query to get the desired output? Here is my iteration.

    foreach (var list in allLists)
    {
          MyData data = new MyData();
          var splittedstring = list[0].Split(',').ToList();
          if (splittedstring.Count == 3)
          {
               var valueData = splittedstring [2];
               var indexof = valueData.IndexOf(':');
               var value = valueData.Substring(indexof + 1);
               // But Over here, how will get DateTime and Index ?
               data.Value = value;
          }
    }

5 answers:

Answer 0 (score: 1)

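Here is my solution, using regular expressions. It could be improved with a conditional regex match that picks named groups based on the matched type (a string), but I think the concept is clearer this way and the regex is easier to work with. As it stands, the date format is not validated; dates are assumed to be exactly in the format the OP wrote.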

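This solution tolerates some extra whitespace and values that contain commas, but not inexact matches, such as fields being added to or removed from the lines in the future.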

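The idea is to first parse the lines into a "friendlier" format, then group the friendly rows by index, and return one MyData row per group by iterating over the groups in index order.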

Sample

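using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

Regex r = new Regex(@"^(?<fieldName>(\d\.)+(?<index>\d*)), *Type: *(?<dataType>.*), *Value: (?<dataValue>.*)$");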

public class MyData
{
    public DateTime DateTime { get; set; }
    public int Index { get; set; }
    public string Value { get; set; }
}

class LogRow
{
    public int Index { get; set; }
    public string Type { get; set; }
    public string Value { get; set; }
}

// In a parser I'd rather not be too defensive; I let exceptions bubble up.
IEnumerable<LogRow> ParseRows(IEnumerable<string> lines)
{
    foreach (var line in lines)
    {
        // Single() throws when a line does not match, which is the intended behavior here.
        var match = r.Matches(line).Cast<Match>().Single();
        yield return new LogRow()
        {
            Index = int.Parse(match.Groups["index"].Value),
            Type = match.Groups["dataType"].Value,
            Value = match.Groups["dataValue"].Value
        };
    }
}

IEnumerable<MyData> RowsToData(IEnumerable<LogRow> rows)
{
    var byIndex = rows.GroupBy(b => b.Index).OrderBy(b => b.Key);
    // Assume that rows exist for all MyData fields for a given index.
    foreach (var group in byIndex)
    {
        var rawRow = group.ToDictionary(g => g.Type, g => g);
        var date = DateTime.ParseExact(rawRow["DateTime"].Value, "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture);

        yield return new MyData() { Index = group.Key, DateTime = date, Value = rawRow["String"].Value };
    }
}

Usage:

var myDataList = RowsToData(ParseRows(File.ReadAllLines("input.txt"))).ToList();
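To make the named groups of the regex in this answer concrete, here is a small sketch that applies the same pattern to single lines. The class and method names (LogLineRegexDemo, Describe) are hypothetical, and returning a nullable tuple instead of throwing is a choice made only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class LogLineRegexDemo
{
    // Same pattern as in the answer above.
    static readonly Regex Pattern = new Regex(
        @"^(?<fieldName>(\d\.)+(?<index>\d*)), *Type: *(?<dataType>.*), *Value: (?<dataValue>.*)$");

    // Returns the "index", "dataType" and "dataValue" groups of one line,
    // or null when the line does not match the pattern at all.
    public static (string Index, string DataType, string DataValue)? Describe(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success) return null;
        return (m.Groups["index"].Value, m.Groups["dataType"].Value, m.Groups["dataValue"].Value);
    }
}
```

For example, `Describe("1.0.1.2.3, Type: String, Value: Hello2")` yields the index "3", the type "String" and the value "Hello2": the greedy `(\d\.)+` consumes "1.0.1.2." so only the last number lands in the `index` group.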

Answer 1 (score: 1)

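First, fix your GroupBy: string.Concat(x.token.Take(4)) can produce the same key for different identifiers, because the boundaries between the dot-separated numbers are lost. For example, 1.23.4.5 and 12.3.4.5 both produce the string "12345". Use string.Join with a non-digit separator instead: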

.GroupBy(x => string.Join("|", x.token.Take(4)))
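A minimal sketch of the collision described above. The class and method names (GroupKeyDemo, ConcatKey, JoinKey) are hypothetical, and the keys are built from bare dotted identifiers rather than from full input lines.

```csharp
using System.Linq;

public static class GroupKeyDemo
{
    // Key built the way the question does it: the first four dot-separated
    // tokens glued together with no separator.
    public static string ConcatKey(string id) =>
        string.Concat(id.Split('.').Take(4));

    // Key built with string.Join and a separator that cannot occur inside a number.
    public static string JoinKey(string id) =>
        string.Join("|", id.Split('.').Take(4));
}
```

`ConcatKey("1.23.4.5")` and `ConcatKey("12.3.4.5")` both return "12345", so those lines would land in the same group, whereas `JoinKey` returns "1|23|4|5" and "12|3|4|5".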

Now, for the main part of the question: a simple fix is to add a static method that parses a list of three strings, and use it in the LINQ query:

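List<MyData> dataList = lines
    // Split the dotted identifier (the text before the first comma) into its numbers.
    .Select(str => new { str, token = str.Split(',')[0].Split('.') })
    .Where(x => x.token.Length == 5)
    // Group by the last number (the object index) so that each group holds the
    // DateTime, Integer and String lines that belong to one MyData object.
    .GroupBy(x => x.token[4].Trim())
    .Select(g => g.Select(x => x.str).ToList())
    .Where(list => list.Count == 3)
    .Select(MyDataFromList)
    .ToList();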
...
private static MyData MyDataFromList(List<string> parts) {
    if (parts.Count != 3) {
        throw new ArgumentException("Expected exactly three lines.", nameof(parts));
    }
    var byType = parts
        .Select(ToTypeAndValue)
        .ToDictionary(t => t.Item1, t => t.Item2);
    return new MyData {
        DateTime = DateTime.Parse(byType["DateTime"])
    ,   Index = int.Parse(byType["Integer"])
    ,   Value = byType["String"]
    };
}
private static Tuple<string,string> ToTypeAndValue(string s) {
    var tokens = s.Split(',');
    if (tokens.Length != 3) return null;
    var typeParts = tokens[1].Split(':');
    if (typeParts.Length != 2 || typeParts[0].Trim() != "Type") return null;
    // Split only at the first ':' because time values such as 11:06:10 contain colons.
    var valueParts = tokens[2].Split(new[] { ':' }, 2);
    if (valueParts.Length != 2 || valueParts[0].Trim() != "Value") return null;
    return Tuple.Create(typeParts[1].Trim(), valueParts[1].Trim());
}

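Note that the code above assumes that the three types are unique within a group (hence the Dictionary<string,string>). This is necessary because the data structure provides no other way to bind a value to a MyData field.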
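A small sketch of that uniqueness assumption, with the type/value pairs modelled as value tuples rather than the answer's Tuple<string,string>; the class and method names are hypothetical. ToDictionary builds the lookup when each type occurs once and throws ArgumentException as soon as a type repeats, so a group with, say, two String lines cannot be mapped onto MyData.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UniqueTypeDemo
{
    // Builds the Type -> Value lookup. ToDictionary throws ArgumentException
    // as soon as the same type key appears twice.
    public static Dictionary<string, string> ByType(IEnumerable<(string Type, string Value)> parts) =>
        parts.ToDictionary(p => p.Type, p => p.Value);

    // True when every type occurs at most once, i.e. when ByType would succeed.
    public static bool TypesAreUnique(IEnumerable<(string Type, string Value)> parts) =>
        parts.GroupBy(p => p.Type).All(g => g.Count() == 1);
}
```

Checking `TypesAreUnique` first is one way to skip or report a malformed group instead of letting the exception escape from the LINQ pipeline.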

Answer 2 (score: 1)

You can do this with regular expressions. It would look something like this:

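using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public List<MyData> GetData(string str){
    // [^\r\n]* keeps a trailing '\r' (Windows line endings) out of the captured value.
    var regexDate = new Regex(@"\d\.\d\.\d\.\d\.(?<id>\d).*DateTime.*Value:\s*(?<val>[^\r\n]*)");
    var regexInteger = new Regex(@"\d\.\d\.\d\.\d\.(?<id>\d).*Integer.*Value:\s*(?<val>[^\r\n]*)");
    var regexString = new Regex(@"\d\.\d\.\d\.\d\.(?<id>\d).*String.*Value:\s*(?<val>[^\r\n]*)");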

    var dict = new Dictionary<int, MyData>();

    foreach (Match myMatch in regexDate.Matches(str))
    {
        if (!myMatch.Success) continue;

        var index = int.Parse(myMatch.Groups["id"].Value);
        dict[index] = new MyData()
        {
            Index = index,
            DateTime = DateTime.ParseExact(myMatch.Groups["val"].Value, "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture)
        };
    }

    foreach (Match myMatch in regexInteger.Matches(str))
    {
        if (!myMatch.Success) continue;

        var index = int.Parse(myMatch.Groups["id"].Value);
        dict[index].Index = Int32.Parse(myMatch.Groups["val"].Value);
    }

    foreach (Match myMatch in regexString.Matches(str))
    {
        if (!myMatch.Success) continue;

        var index = int.Parse(myMatch.Groups["id"].Value);
        dict[index].Value = myMatch.Groups["val"].Value;
    }

    return new List<MyData>(dict.Values);
}

Answer 3 (score: 1)

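I'd just take the manual approach... and since the numbers at the start of each line contain both the object index and the property index, it's logical to use those rather than the type strings.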

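With a Dictionary, you can use the object index to create a new object the first time any of its properties is found, and store it under that index. Whenever another property with the same index is encountered, the object is retrieved and that property is filled in.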

public static List<MyData> getObj(String[] lines)
{
    Dictionary<Int32, MyData> myDataDict = new Dictionary<Int32, MyData>();
    const String valueStart = "Value: ";
    foreach (String line in lines)
    {
        String[] split = line.Split(',');
        // Too many fail cases; I just ignore any line that stops matching at any point.
        if (split.Length < 3)
            continue;
        String[] numData = split[0].Trim().Split('.');
        if (numData.Length < 5)
            continue;
        // Using the 4th number as property identifier. Could also use the
        // type string, but switch/case on a numeric value is more elegant.
        Int32 prop;
        if (!Int32.TryParse(numData[3], out prop))
            continue;
        // Object index, used to reference the objects in the Dictionary.
        Int32 index;
        if (!Int32.TryParse(numData[4], out index))
            continue;
        String typeDef = split[1].Trim();
        String val = split[2].TrimStart();
        if (!val.StartsWith(valueStart))
            continue;
        val = val.Substring(valueStart.Length);
        MyData data;
        if (myDataDict.ContainsKey(index))
            data = myDataDict[index];
        else
        {
            data = new MyData();
            myDataDict.Add(index, data);
        }
        switch (prop)
        {
            case 0:
                if (!"Type: DateTime".Equals(typeDef))
                    continue;
                DateTime dateVal;
                // Don't know if this date format is correct; adapt as needed.
                if (!DateTime.TryParseExact(val, "dd/MM/yyyy HH:mm:ss", System.Globalization.CultureInfo.InvariantCulture, System.Globalization.DateTimeStyles.None, out dateVal))
                    continue;
                data.DateTime = dateVal;
                break;
            case 1:
                if (!"Type: Integer".Equals(typeDef))
                    continue;
                Int32 numVal;
                if (!Int32.TryParse(val, out numVal))
                    continue;
                data.Index = numVal;
                break;
            case 2:
                if (!"Type: String".Equals(typeDef)) continue;
                data.Value = val;
                break;
        }
    }
    return new List<MyData>(myDataDict.Values);
}
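The ContainsKey/indexer lookup in getObj can be factored into a reusable get-or-create helper. This is a minimal sketch; the extension method name GetOrCreate is hypothetical (it is not part of the .NET Dictionary API), and it uses a single TryGetValue call instead of ContainsKey followed by the indexer.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryExtensions
{
    // Returns the value stored under key; if there is none yet, creates one
    // with factory(), stores it under key, and returns it.
    public static TValue GetOrCreate<TKey, TValue>(
        this IDictionary<TKey, TValue> dict, TKey key, Func<TValue> factory)
    {
        if (!dict.TryGetValue(key, out TValue value))
        {
            value = factory();
            dict.Add(key, value);
        }
        return value;
    }
}
```

Inside the loop, the whole if/else block would then shrink to `MyData data = myDataDict.GetOrCreate(index, () => new MyData());`.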

Answer 4 (score: 1)

Here is my solution to your problem. I have tested it, and you can test it here: Raw To Custom List

string text = rawData;

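// rawData is the exact text read from the file, without modifications.
List<MyData> myDataList = new List<MyData>();

// RemoveEmptyEntries skips a trailing blank line, which would otherwise make
// the Split(...)[1] below throw an IndexOutOfRangeException.
string[] eElco = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
var tmem = eElco.Length;
// The file holds three equally sized blocks: dates, then integers, then strings.
var eachP = tmem / 3;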

List<string> unDefVal = new List<string>();
foreach (string rw in eElco)
{
    String onlyVal = rw.Split(new[] { "Value: " } , StringSplitOptions.None)[1];
    unDefVal.Add(onlyVal);
}

for (int i = 0; i < eachP; i++)
{
    int ind = Int32.Parse(unDefVal[i + eachP]);
    // HH (24-hour clock) rather than hh, so afternoon times such as 13:06:10 also parse.
    DateTime oDate = DateTime.ParseExact(unDefVal[i], "dd/MM/yyyy HH:mm:ss", System.Globalization.CultureInfo.InvariantCulture);

    MyData data1 = new MyData();
    data1.DateTime = oDate;
    data1.Index = ind;
    data1.Value = unDefVal[i + eachP + eachP];
    myDataList.Add(data1);

    Console.WriteLine("Val1 = {0}, Val2 = {1}, Val3 = {2}",
    myDataList[i].Index,
    myDataList[i].DateTime,
    myDataList[i].Value);    
}
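This answer pairs values purely by position, so it depends on the lines arriving as three equally sized blocks (all DateTime lines, then all Integer lines, then all String lines) in the same index order. Below is a guarded sketch of the same positional idea that fails loudly when the block sizes do not line up, instead of silently pairing the wrong lines. The class and helper names are hypothetical, and the dd/MM/yyyy date format is an assumption carried over from the other answers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class MyData
{
    public DateTime DateTime { get; set; }
    public int Index { get; set; }
    public string Value { get; set; }
}

public static class PositionalParser
{
    private const string Marker = "Value: ";

    // Returns the text after "Value: ", trimmed (this also drops a stray '\r').
    private static string ValueOf(string line)
    {
        int at = line.IndexOf(Marker, StringComparison.Ordinal);
        if (at < 0) throw new FormatException("No 'Value: ' field in line: " + line);
        return line.Substring(at + Marker.Length).Trim();
    }

    // Assumes the block layout from the question: n DateTime lines, then
    // n Integer lines, then n String lines, each block in the same index order.
    public static List<MyData> Parse(IEnumerable<string> lines)
    {
        List<string> values = lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(ValueOf)
            .ToList();
        if (values.Count % 3 != 0)
            throw new FormatException("Expected three equally sized blocks of lines.");

        int n = values.Count / 3;
        return Enumerable.Range(0, n)
            .Select(i => new MyData
            {
                DateTime = DateTime.ParseExact(values[i], "dd/MM/yyyy HH:mm:ss", CultureInfo.InvariantCulture),
                Index = int.Parse(values[n + i], CultureInfo.InvariantCulture),
                Value = values[2 * n + i]
            })
            .ToList();
    }
}
```

With the question's file this could be called as `PositionalParser.Parse(File.ReadAllLines("input.txt"))`. If the file order cannot be guaranteed, grouping by the last number of the identifier, as in the other answers, is the safer design.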