Parse a text file and remove commas inside double quotes

Time: 2012-03-27 12:06:20

Tags: c# vb.net-2010

I have a text file that needs to be converted to a CSV file. My plan is:

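  • Parse the file line by line
  • Search for commas inside double quotes and replace them with a space
  • Then remove all double quotes
  • Append the line to a new CSV file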

Problem: I need a function that recognizes a comma inside double quotes and replaces it.

Here is a sample line:

"MRS Brown","4611 BEAUMONT ST","","WARRIOR RUN,PA"

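8 Answers:

Answer 0 (score: 4)

Your file already appears to be in a CSV-compliant format. Any good CSV reader should be able to read it correctly.

If your problem is only about reading the field values correctly, then you need to read the file the right way.

Here is one way to do it: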

using Microsoft.VisualBasic.FileIO; 


    private void button1_Click(object sender, EventArgs e)
    {
        TextFieldParser tfp = new TextFieldParser("C:\\Temp\\Test.csv");
        tfp.Delimiters = new string[] { "," };
        tfp.HasFieldsEnclosedInQuotes = true;
        while (!tfp.EndOfData)
        {
            string[] fields = tfp.ReadFields();

            // do whatever you want to do with the fields now...
            // e.g. remove the commas and double-quotes from the fields.
            for (int i = 0; i < fields.Length;i++ )
            {
                fields[i] = fields[i].Replace(","," ").Replace("\"","");
            }

            // this is to show what we got as the output
            textBox1.AppendText(String.Join("\t", fields) + "\n");
        }
        tfp.Close();
    }

Edit

I just noticed that this question is tagged both C# and VB.NET-2010. Here is the VB.NET version, in case you are coding in VB.

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim tfp As New FileIO.TextFieldParser("C:\Temp\Test.csv")
    tfp.Delimiters = New String() {","}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields() As String = tfp.ReadFields

        '' do whatever you want to do with the fields now...
        '' e.g. remove the commas and double-quotes from the fields.
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Replace(",", " ").Replace("""", "")
        Next
        '' this is to show what we got as the output
        TextBox1.AppendText(Join(fields, vbTab) & vbCrLf)
    End While
    tfp.Close()
End Sub
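
To complete the conversion the question asks for, the cleaned fields can be joined back together with commas. Below is a minimal sketch, not a definitive implementation. It assumes the `TextFieldParser` constructor that accepts a `TextReader`, so the example runs on an in-memory string instead of a file; the names `FieldParserDemo` and `ConvertLine` are made up for illustration.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FieldParserDemo
{
    // Hypothetical helper: parses one CSV line, replaces commas inside
    // each field with a space, and re-joins the fields with commas.
    public static string ConvertLine(string csvLine)
    {
        using (var parser = new TextFieldParser(new StringReader(csvLine)))
        {
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing left to read.
            string[] fields = parser.ReadFields();
            if (fields == null)
            {
                return string.Empty;
            }

            for (int i = 0; i < fields.Length; i++)
            {
                // The parser has already stripped the enclosing quotes.
                fields[i] = fields[i].Replace(",", " ");
            }
            return string.Join(",", fields);
        }
    }

    public static void Main()
    {
        string sample = "\"MRS Brown\",\"4611 BEAUMONT ST\",\"\",\"WARRIOR RUN,PA\"";
        Console.WriteLine(ConvertLine(sample));
    }
}
```

Each converted line could then be appended to the new file, for example with `File.AppendAllText`.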

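Answer 1 (score: 2)

Here is a simple function that removes commas embedded between two double quotes in a string. You can pass in a long string containing multiple occurrences such as "abc,123",10/13/12,"some description"... and so on. It also removes the double quotes.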

Private Function ParseCommasInQuotes(ByVal arg As String) As String

    Dim foundEndQuote As Boolean = False
    Dim foundStartQuote As Boolean = False
    Dim output As New StringBuilder()

    '44 = comma
    '34 = double quote

    For Each element As Char In arg

        If foundEndQuote Then
            foundStartQuote = False
            foundEndQuote = False
        End If

        If element.Equals(Chr(34)) And (Not foundEndQuote) And foundStartQuote Then
            foundEndQuote = True
            Continue For
        End If


        If element.Equals(Chr(34)) And Not foundStartQuote Then
            foundStartQuote = True
            Continue For
        End If


        If (element.Equals(Chr(44)) And foundStartQuote) Then
            'skip the comma...its between double quotes
        Else
            output.Append(element)
        End If

    Next

    Return output.ToString()

End Function

Answer 2 (score: 2)

Thanks to Baz and to Glockster's answer in VB; I just converted it to C#, and it works well. With this code you don't need any third-party parser.

string line = reader.ReadLine();                    
line = ParseCommasInQuotes(line);

private string ParseCommasInQuotes(string arg)
{

  bool foundEndQuote = false;
  bool foundStartQuote = false;
  StringBuilder output = new StringBuilder();

  //44 = comma
  //34 = double quote

  foreach (char element in arg)
  {
    if (foundEndQuote)
    {
      foundStartQuote = false;
      foundEndQuote = false;
    }

    if (element.Equals((Char)34) & (!foundEndQuote) & foundStartQuote)
    {
      foundEndQuote = true;
      continue;
    }

    if (element.Equals((Char)34) & !foundStartQuote)
    {
      foundStartQuote = true;
      continue;
    }

    if ((element.Equals((Char)44) & foundStartQuote))
    {
      //skip the comma...its between double quotes
    }
    else
    {
      output.Append(element);
    }
  }
  return output.ToString();
}
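
Note that the function above drops the comma entirely, so the sample field "WARRIOR RUN,PA" comes out as WARRIOR RUNPA. If a space is wanted instead, as in the question's plan, a variant could look like the sketch below. It assumes quotes are balanced and never escaped by doubling; the names `QuotedCommaCleaner` and `Clean` are illustrative only.

```csharp
using System;
using System.Text;

public static class QuotedCommaCleaner
{
    // Replaces commas inside double-quoted spans with a space and drops
    // the quotes. Assumes balanced quotes and no "" escape sequences.
    public static string Clean(string line)
    {
        var output = new StringBuilder(line.Length);
        bool insideQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // Toggle the state; the quote character itself is not copied.
                insideQuotes = !insideQuotes;
            }
            else if (c == ',' && insideQuotes)
            {
                output.Append(' ');
            }
            else
            {
                output.Append(c);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        string sample = "\"MRS Brown\",\"4611 BEAUMONT ST\",\"\",\"WARRIOR RUN,PA\"";
        Console.WriteLine(Clean(sample));
        // prints: MRS Brown,4611 BEAUMONT ST,,WARRIOR RUN PA
    }
}
```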

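Answer 3 (score: 0)

It sounds like what you are describing will end up as a CSV file anyway, but to answer your question, here is how I would do it.

First, you need to get the text file into something usable that you can loop over, like this: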

    public static List<String> GetTextListFromDiskFile(String fileName)
    {
        List<String> list = new List<String>();
        try
        {
            //load the file into the streamreader 
            System.IO.StreamReader sr = new System.IO.StreamReader(fileName);

            //loop through each line of the file
            while (sr.Peek() >= 0)
            {
                list.Add(sr.ReadLine());
            }
            sr.Close();
        }
        catch (Exception ex)
        {
            list.Add("Error: Could not read file from disk. Original error: " + ex.Message);
        }

        return list;
    }

Then loop over the list with a simple foreach loop and run Replace on each item, like this:

        foreach (String item in list)
        {
            String x = item.Replace("\",\"", "\" \"");
            x = x.Replace("\"", "");
        }

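After this, you need to build the CSV file line by line. I would again use a StringBuilder and simply call sb.AppendLine(x) to build the String that will become the text file, then write it to disk with something like this.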

    public static void SaveFileToDisk(String filePathName, String fileText)
    {
        using (StreamWriter outfile = new StreamWriter(filePathName))
        {
            outfile.Write(fileText);
        }
    }
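
The foreach loop earlier in this answer computes x but never stores it, so the StringBuilder step is what actually collects the result. A rough sketch of that step follows, under some assumptions: `CsvTextBuilder`, `Build`, and the `cleanLine` parameter are hypothetical names, and the transform passed in `Main` is simply the two Replace calls from that loop.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvTextBuilder
{
    // Applies cleanLine (a caller-supplied, hypothetical per-line transform)
    // to every line and collects the results with AppendLine.
    public static string Build(IEnumerable<string> lines, Func<string, string> cleanLine)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.AppendLine(cleanLine(line));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var lines = new List<string> { "\"A\",\"B\"", "\"C\",\"D\"" };

        // Same two Replace calls as in the foreach loop of this answer.
        string text = Build(lines, s => s.Replace("\",\"", "\" \"").Replace("\"", ""));

        // The resulting string is what would be passed to SaveFileToDisk.
        Console.Write(text);
    }
}
```

Passing the transform as a delegate keeps the file handling separate from the cleaning rule, so a different rule can be swapped in without touching the loop.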

Answer 4 (score: 0)

I didn't understand your question before. Now I'm fairly sure I've got it right:

TextFieldParser parser = new TextFieldParser(@"c:\file.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData) 
{
    //Processing row
    string[] fields = parser.ReadFields();
    foreach (string field in fields) 
    {
        //TODO: Do whatever you need
    }
}
parser.Close();

Answer 5 (score: 0)

var result = Regex.Replace(input,
                           @"[^\""]([^\""])*[^\""]", 
                           m => m.Value.Replace(",", " ") );
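
The pattern above matches any run of two or more non-quote characters, so it relies on every field being quoted, and it leaves the quotes in place. A sketch that anchors on the quotes explicitly and then strips them might look like this; `RegexQuotedCommas` and `Clean` are illustrative names, and doubled quotes ("") used as escapes are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexQuotedCommas
{
    public static string Clean(string line)
    {
        // "[^"]*" matches one quoted field: an opening quote, any run of
        // non-quote characters, and the closing quote.
        string spaced = Regex.Replace(
            line,
            "\"[^\"]*\"",
            m => m.Value.Replace(",", " "));

        // Remove the quotes once the inner commas have been replaced.
        return spaced.Replace("\"", "");
    }

    public static void Main()
    {
        string sample = "\"MRS Brown\",\"4611 BEAUMONT ST\",\"\",\"WARRIOR RUN,PA\"";
        Console.WriteLine(Clean(sample));
        // prints: MRS Brown,4611 BEAUMONT ST,,WARRIOR RUN PA
    }
}
```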

Answer 6 (score: 0)

This worked for me. Hope it helps someone.

Private Sub Command1_Click()
Open "c:\\dir\file.csv" For Input As #1
Open "c:\\dir\file2.csv" For Output As #2
Do Until EOF(1)
Line Input #1, test$
99
c = InStr(test$, """""")
If c > 0 Then
test$ = Left$(test$, c - 1) + Right$(test$, Len(test$) - (c + 1))
GoTo 99
End If
Print #2, test$
Loop
Close #1
Close #2
End Sub

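Answer 7 (score: 0)

I would do all of this before starting to process the file line by line. Also, check out CsvHelper; it is fast and easy. Just put your result into a TextReader and pass it to CsvReader.

Here is your comma-in-double-quotes replacer, followed by the double-quote stripper.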

        using (TextReader reader = File.OpenText(file))
        {
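            // replace commas inside each quoted field, then remove the double quotes
            // "[^"]*" matches one quoted field at a time, so delimiter commas are untouched
            var pattern = @"""[^""]*""";
            var results = Regex.Replace(reader.ReadToEnd(), pattern, match => match.Value.Replace(",", " "));
            results = results.Replace("\"", "");
            // results now holds the cleaned text; use or return it before leaving this block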
         }