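I'm using Microsoft.VisualBasic.FileIO.TextFieldParser to parse CSV data. Compared with the free libraries I found for parsing CSV, it's very good: it does everything I think it should with respect to CSV, except that it doesn't preserve leading/trailing whitespace in fields enclosed in quotes. It does if I set TrimWhiteSpace to false, but then it also stops trimming whitespace from fields that are not enclosed in quotes. For CSV, I want it to trim unquoted fields and leave quoted fields untrimmed.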
This is how I use the class:
var parser = new TextFieldParser(textReader) {Delimiters = new[] {","}};
//TrimWhiteSpace is true by default
var row1 = parser.ReadFields();
var row2 = parser.ReadFields();
Consider this data:
1 , 2
" 1 ", " 2 "
With TrimWhiteSpace == true, row1 and row2 are both ["1","2"]. With TrimWhiteSpace == false, row1 and row2 are both [" 1 "," 2 "].
What I want is row1 == ["1","2"] and row2 == [" 1 "," 2 "].
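Stated as code, the rule being asked for looks roughly like this. It is a minimal sketch, not part of TextFieldParser: `CsvFieldRule.Normalize` is a hypothetical helper that takes one raw field (the exact text found between two delimiters) and assumes the row has already been split correctly, so it does not handle delimiters or line breaks inside quotes.

```csharp
using System;

// Hypothetical helper (not part of TextFieldParser): applies the desired
// trimming rule to one raw field, i.e. the untrimmed text between delimiters.
public static class CsvFieldRule
{
    public static string Normalize(string rawField)
    {
        // Unquoted fields, and any padding outside a quoted field, are trimmed.
        string trimmed = rawField.Trim();

        // Treat the field as quoted only if it starts and ends with a double quote.
        bool isQuoted = trimmed.Length >= 2
            && trimmed[0] == '"'
            && trimmed[trimmed.Length - 1] == '"';
        if (!isQuoted)
        {
            return trimmed;
        }

        // Keep everything between the enclosing quotes verbatim (including
        // leading/trailing spaces) and turn each escaped "" into a single ".
        string inner = trimmed.Substring(1, trimmed.Length - 2);
        return inner.Replace("\"\"", "\"");
    }
}
```

For example, `CsvFieldRule.Normalize(" 1 ")` returns `"1"`, while `CsvFieldRule.Normalize(" \" 1 \" ")` returns `" 1 "`.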
Answer:
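This answer comes very late, but I found the question interesting and upvoted it, since IMO it's surprising that there is no built-in way to keep the whitespace under the conditions described.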
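Assume the same input as in the question, plus one more line that also contains escaped double quotes (a double quote immediately followed by another double quote):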
1 , 2
" 1 ", " 2 "
" a ""quoted"" word ", " hello world "
Set HasFieldsEnclosedInQuotes to false and use a simple Regex to process any fields enclosed in quotes:
// Needs: using System; using System.IO; using System.Text.RegularExpressions;
// using Microsoft.VisualBasic.FileIO; inputPath is the path to the CSV file.
var separator = new string('=', 40);
Console.WriteLine(separator);
// demo only - show the input lines read from a text file
var text = File.ReadAllText(inputPath);
var lines = text.Split(
    new string[] { Environment.NewLine },
    StringSplitOptions.None
);
using (var textReader = new StringReader(text))
{
    using (var parser = new TextFieldParser(textReader))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        // trims unquoted fields; for quoted fields only the padding outside
        // the quotes is removed, because the quotes are still part of the text
        parser.TrimWhiteSpace = true;
        // quotes become ordinary characters, so a delimiter inside quotes
        // would now split the field
        parser.HasFieldsEnclosedInQuotes = false;
        // remove double quotes, since HasFieldsEnclosedInQuotes is false
        var regex = new Regex(@"
            # match double quote
            \""
            # if not immediately followed by a double quote
            (?!\"")
            ",
            RegexOptions.IgnorePatternWhitespace
        );
        var rowStart = 0;
        while (parser.PeekChars(1) != null)
        {
            Console.WriteLine(
                "row {0}: {1}", parser.LineNumber, lines[rowStart]
            );
            var fields = parser.ReadFields();
            for (int i = 0; i < fields.Length; ++i)
            {
                Console.WriteLine(
                    "parsed field[{0}] = [{1}]", i,
                    regex.Replace(fields[i], "")
                );
            }
            ++rowStart;
            Console.WriteLine(separator);
        }
    }
}
Output:
========================================
row 1: 1 , 2
parsed field[0] = [1]
parsed field[1] = [2]
========================================
row 2: " 1 ", " 2 "
parsed field[0] = [ 1 ]
parsed field[1] = [ 2 ]
========================================
row 3: " a ""quoted"" word ", " hello world "
parsed field[0] = [ a "quoted" word ]
parsed field[1] = [ hello world ]
========================================
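One caveat about the pattern `"(?!")`: it deletes every quote that is not directly followed by another quote, so an escaped quote sitting right next to an enclosing quote is mishandled. The sketch below isolates that step so it can be checked without a file. `QuotedFieldCleanup` and its method names are hypothetical; `ByPosition` is an alternative (drop the enclosing quotes by position, then collapse `""`) that assumes the field was already trimmed by TrimWhiteSpace and, if quoted, is well-formed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedFieldCleanup
{
    // The answer's pattern without IgnorePatternWhitespace:
    // a double quote that is not immediately followed by another double quote.
    private static readonly Regex LoneQuote = new Regex("\"(?!\")");

    // The step used in the answer: delete every such lone quote.
    public static string WithRegex(string field) => LoneQuote.Replace(field, "");

    // Hypothetical alternative: strip the enclosing quotes by position,
    // then turn each escaped "" into a single ".
    public static string ByPosition(string field)
    {
        bool quoted = field.Length >= 2
            && field[0] == '"'
            && field[field.Length - 1] == '"';
        return quoted
            ? field.Substring(1, field.Length - 2).Replace("\"\"", "\"")
            : field;
    }
}
```

Both methods give the same results for the three sample rows above, but for the trimmed field `"say ""hi"""` the regex version returns `say "hi""`, while `ByPosition` returns `say "hi"`.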