Regex split while reading from a file

Date: 2015-08-04 03:32:06

Tags: c# asp.net regex string

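I have a text file that I am reading line by line.

I want to split each line on ','.

But I want to skip any commas that appear inside double quotes.

I tried the following regular expression, but it does not work correctly.

How can I do this?

The file contents are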

"Mobile","Custom1","Custom2","Custom3","First Name"
"61402818083","service","in Portsmith","is","First Name"
"61402818083","service","in Parramatta Park","is","First Name"
"61402818083","services","in postcodes 3000, 4000","are","First Name"
"61402818083","services","in postcodes 3000, 4000, 5000","are","First Name"
"61402818083","services",,"are","First Name"

The regular expression is as follows

,(?=([^\"]*\"[^\"]*\")*[^\"]*$)

For line 5 of the file, this regular expression produces the following output

"61402818083"
,"First Name"
"services"
,"First Name"
"in postcodes 3000, 4000, 5000"
,"First Name"
"are"
"First Name"
"First Name"

The result should be as follows

"61402818083"
"services"
"in postcodes 3000, 4000, 5000"
"are"
"First Name"

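4 Answers:

Answer 0 (score: 5)

Don't reinvent the wheel. It looks like you are trying to parse a comma-separated file (even if the file extension is not csv). Try this.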

// Requires: using Microsoft.VisualBasic.FileIO;
using (TextFieldParser reader = new TextFieldParser(@"c:\yourpath\file.csv"))
{
    reader.TextFieldType = FieldType.Delimited;
    reader.SetDelimiters(",");
    while (!reader.EndOfData) 
    {
        //Processing a line of the file
        string[] fields = reader.ReadFields();
        // now fields contains 5 elements, e.g.
        // fields[0] = "61402818083"
        // fields[1] = "services"
        // fields[2] = "in postcodes 3000, 4000, 5000"
        // fields[3] = "are"
        // fields[4] = "First Name"
    }
}

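Note

You need to add a reference to Microsoft.VisualBasic in your project. TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace.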
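If you are already reading the file line by line and only want to try the parser on a single line, a minimal sketch along these lines may help. It assumes the `TextFieldParser(TextReader)` constructor and wraps the line in a `StringReader`; the class name `TextFieldParserDemo` and the helper `ParseLine` are made up for illustration and are not part of the answer above. The sample line is the one from the file that contains an empty field.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Hypothetical helper: parses one delimited line that is held in memory
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Already the default; stated here so the intent is visible
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        string line = "\"61402818083\",\"services\",,\"are\",\"First Name\"";
        string[] fields = ParseLine(line);
        Console.WriteLine(fields.Length);
        foreach (string field in fields)
        {
            // Brackets make an empty field visible in the console
            Console.WriteLine("[" + field + "]");
        }
    }
}
```

Note that the parser returns the field values without the surrounding quotes, which differs from the regex-based approaches that keep them.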

Answer 1 (score: 3)

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000\",\"are\",\"First Name\"";
        var reg = new Regex("\".*?\"");
        var matches = reg.Matches(line);
        foreach (var item in matches)
        {
            Console.WriteLine(item.ToString());
        }
    }
}

Output:

"61402818083"
"services"
"in postcodes 3000, 4000"
"are"
"First Name"

https://dotnetfiddle.net/5GxxIo

Another possible solution:

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000\",\"are\",\"First Name\"";
        Console.WriteLine(line);
        var reg = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
        // Each match is one field; every match after the first starts with the separating comma
        foreach (Match match in reg.Matches(line))
        {
            Console.WriteLine(match.Value.TrimStart(','));
        }
    }
}

https://dotnetfiddle.net/rRml2D

Answer 2 (score: 1)

I think you can do this by joining the strings together one by one.

Example (untested)

using System.IO;
using System.Text;

StringBuilder newString = new StringBuilder();

// The using block closes the file even if reading throws
using (StreamReader file = new StreamReader("c:\\test.txt"))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        newString.Append(line + ",");
    }
}

// TrimEnd returns a new string, so the result has to be kept
string result = newString.ToString().TrimEnd(',');

Answer 3 (score: 1)

,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

     ^^

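Your regex is correct. It just has an unnecessary capturing group, which turns out to be the culprit: in .NET, Regex.Split includes the text of any capturing group in the array it returns, and that is where the extra entries come from. Making the group non-capturing with ?: fixes it. See the demo.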

https://regex101.com/r/fM9lY3/10
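To try this in C#, here is a minimal sketch. It assumes the line is split with `Regex.Split`, since the question does not show the calling code; the class name `CsvSplitDemo` and the helper `SplitLine` are made up for illustration. With the non-capturing group nothing is captured, so only the fields themselves end up in the array.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvSplitDemo
{
    // Matches a comma only when an even number of quotes follows it,
    // that is, when the comma sits outside any quoted field
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: splits one line of the file into its fields
    public static string[] SplitLine(string line)
    {
        return Separator.Split(line);
    }

    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000, 5000\",\"are\",\"First Name\"";
        foreach (string field in SplitLine(line))
        {
            Console.WriteLine(field);
        }
    }
}
```

The fields keep their surrounding quotes, as in the expected result in the question; call `Trim('"')` on each field if you want them removed.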