How do I split a string that contains multiple delimiters, for each line of a text file?

Time: 2015-08-14 07:57:30

Tags: c# regex string split

This is the input my file contains:

50|Hallogen|Mercury|M:4;C:40;A:1
90|Oxygen|Mars|M:10;C:20;A:00
5|Hydrogen|Saturn|M:33;C:00;A:3

Now I want to split each line of my text file and store the pieces in my class, like this:

Expected output

Planets[0]:
{
    Number: 50
    name: Hallogen
    object: Mercury
    proportion[0]:
    {
        Number: 4
    },
    proportion[1]:
    {
        Number: 40
    },
    proportion[2]:
    {
        Number: 1
    }
}

and so on...

My class that stores all of these values:

public class Planets
{
    public int Number { get; set; }    // First cell of every row: 50, 90, 5
    public string name { get; set; }   // Second cell of every row: Hallogen, Oxygen, Hydrogen
    public string object { get; set; } // Third cell of every row: Mercury, Mars, Saturn
    public List<proportion> proportion { get; set; } // All proportions for this planet object.
    // For Hallogen it will store 4, 40, 1. Store only the number; ignore the M, C, A initials.
    // For Oxygen it will store 10, 20, 00. Store only the number; ignore the M, C, A initials.
}

public class proportion
{
    public int Number { get; set; }
}

This is what I have done so far:

List<Planets> Planets = new List<Planets>();
using (StreamReader sr = new StreamReader(args[0]))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        string[] parts = Regex.Split(line, @"(?<=[|;-])");
        foreach (var item in parts)
        {
            var Obj = new Planets(); // I don't see how to store the values here, and parts does not hold the output I expect
        }

        Console.WriteLine(line);
    }
}

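5 answers:

Answer 0 (score: 1)

As I understand it, the multiple delimiters are there to express a nested structure.

You need to split the whole string on the pipe first, then on the semicolon, and finally on the colon.

The order of the splits matters here. I don't think you can split on all three delimiters in one pass and still know which level each token belongs to.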
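The three-level split described above can be sketched as follows. This is only an illustration, assuming every line has exactly four pipe-separated fields and that the last field holds `key:value` pairs; the class and method names (`NestedSplitSketch`, `ProportionsOf`) are hypothetical and not part of the question.

```csharp
using System;
using System.Collections.Generic;

public static class NestedSplitSketch
{
    // Returns the numeric proportion values of one input line.
    // Assumes the line looks like "50|Hallogen|Mercury|M:4;C:40;A:1".
    public static List<int> ProportionsOf(string line)
    {
        var result = new List<int>();

        // Level 1: the pipe separates the record's fields.
        string[] fields = line.Split('|');

        // Level 2: the semicolon separates the pairs inside the last field.
        string[] pairs = fields[3].Split(';');

        foreach (string pair in pairs)
        {
            // Level 3: the colon separates the initial (M, C, A) from its number.
            string[] keyValue = pair.Split(':');
            result.Add(int.Parse(keyValue[1]));
        }

        return result;
    }
}
```

A single flat call such as `line.Split('|', ';', ':')` still returns every token, but the result is one array in which nothing marks where the proportion pairs begin, which is why the splits are nested here.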

Answer 1 (score: 1)

Try this code, which uses the same kind of data:

var values = new List<string>
{
    "50|Hallogen|Mercury|M:4;C:40;A:1",
    "90|Oxygen|Mars|M:10;C:20;A:00",
    "5|Hydrogen|Saturn|M:33;C:00;A:3"
};
foreach (var value in values)
{
    var pipeSplit = value.Split('|');
    var firstNumber = pipeSplit[0];
    var name = pipeSplit[1];
    var objectName = pipeSplit[2];
    // Split only the last field on ';' so that each piece is a clean "key:value" pair.
    var semiSplit = pipeSplit[3].Split(';');
    var secondNumber = semiSplit[0].Split(':')[1];
    var thirdNumber = semiSplit[1].Split(':')[1];
    var lastNumber = semiSplit[2].Split(':')[1];
}


Answer 2 (score: 1)

If I understand correctly, your input is always well-formed. In that case you can use something like this:

string[] parts = Regex.Split(line, @"[|;-]");
var planet =  new Planets(parts);


...

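public Planets(string[] parts) {
    // TryParse needs an out variable; a property cannot be passed as out.
    int number;
    int.TryParse(parts[0], out number);
    this.Number = number;
    this.name = parts[1];
    // "object" is a C# keyword, so the property must be declared and used as @object (or renamed).
    this.@object = parts[2];
    this.proportion = new List<proportion>();
    Regex propRegex = new Regex(@"\d+");
    for (int i = 3; i < parts.Length; i++) {
        // Pick the digits out of pieces such as "M:4".
        Match propMatch = propRegex.Match(parts[i]);
        if (propMatch.Success) {
            this.proportion.Add(new proportion { Number = int.Parse(propMatch.Value) });
        }
    }
}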

Answer 3 (score: 1)

If you don't have to change any logic in the Planets class, my quick solution to your problem would look like this:

List<Planets> Planets = new List<Planets>();
using (StreamReader sr = new StreamReader(args[0]))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        Planets planet = new Planets();
        String[] parts = line.Split('|');
        planet.Number = Convert.ToInt32(parts[0]);
        planet.name = parts[1];
        planet.obj = parts[2];

        String[] smallerParts = parts[3].Split(';');
        planet.proportion = new List<proportion>();
        foreach (var item in smallerParts)
        {
            proportion prop = new proportion();
            prop.Number = Convert.ToInt32(item.Split(':')[1]);
            planet.proportion.Add(prop);
        }
        Planets.Add(planet);
    }
}

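Oh, and before I forget: you shouldn't name the property in Planets "object", because "object" is the keyword for the base class of everything. Use something like "obj", "myObject" or "planetObject" rather than "object". Your compiler will tell you the same ;)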
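For illustration, the two ways around the keyword clash look roughly like this. The class name `PlanetNamingSketch` and the property name `PlanetObject` are hypothetical; the `@` prefix is C#'s verbatim-identifier syntax, which lets a keyword be used as a name.

```csharp
public class PlanetNamingSketch
{
    // "public string object { get; set; }" does not compile.
    // The @ prefix turns the keyword into an ordinary identifier.
    public string @object { get; set; }

    // Renaming avoids the keyword altogether and is usually easier to read.
    public string PlanetObject { get; set; }
}
```

A property declared as `@object` has to be written as `planet.@object` wherever it is used, so renaming it is probably the less surprising choice.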

Answer 4 (score: 1)

The most straightforward solution is to use a regular expression in which each (sub)field is matched inside a group:

var subjectString = @"50|Hallogen|Mercury|M:4;C:40;A:1
90|Oxygen|Mars|M:10;C:20;A:00
5|Hydrogen|Saturn|M:33;C:00;A:3";

Regex regexObj = new Regex(@"^(.*?)\|(.*?)\|(.*?)\|M:(.*?);C:(.*?);A:(.*?)$", RegexOptions.Multiline);
Match match = regexObj.Match(subjectString);
while (match.Success) {
    // Dump() is LINQPad's output helper; outside LINQPad, use Console.WriteLine instead.
    match.Groups[1].Value.Dump();
    match.Groups[2].Value.Dump();
    match.Groups[3].Value.Dump();
    match.Groups[4].Value.Dump();
    match.Groups[5].Value.Dump();
    match.Groups[6].Value.Dump();

    match = match.NextMatch();
}
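To go from printed groups to stored objects, the same pattern could be used along these lines. This is a sketch under assumptions: `ParsedPlanet` and `RegexGroupSketch` are hypothetical names, every line is assumed to match the pattern, and an optional `\r?` has been added before `$` because with `RegexOptions.Multiline` the `$` anchor matches just before `\n`, so a `\r` from Windows line endings would otherwise land in the last group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container used only for this sketch.
public sealed class ParsedPlanet
{
    public int Number { get; set; }
    public string Name { get; set; }
    public string PlanetObject { get; set; }
    public List<int> Proportions { get; set; }
}

public static class RegexGroupSketch
{
    // Same groups as the pattern above, plus \r? to tolerate CRLF line endings.
    private static readonly Regex LinePattern = new Regex(
        @"^(.*?)\|(.*?)\|(.*?)\|M:(.*?);C:(.*?);A:(.*?)\r?$",
        RegexOptions.Multiline);

    public static List<ParsedPlanet> Parse(string text)
    {
        var planets = new List<ParsedPlanet>();
        foreach (Match match in LinePattern.Matches(text))
        {
            planets.Add(new ParsedPlanet
            {
                Number = int.Parse(match.Groups[1].Value),
                Name = match.Groups[2].Value,
                PlanetObject = match.Groups[3].Value,
                // Groups 4 to 6 hold the M, C and A numbers, in that order.
                Proportions = new List<int>
                {
                    int.Parse(match.Groups[4].Value),
                    int.Parse(match.Groups[5].Value),
                    int.Parse(match.Groups[6].Value)
                }
            });
        }
        return planets;
    }
}
```

Because the pattern hard-codes the `M`, `C`, `A` order, a line with a different number or order of pairs simply produces no match; the split-based answers are more forgiving in that respect.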