Correcting a regex pattern across languages

Time: 2013-05-08 09:10:12

Tags: c# .net regex

I found this regex pattern at http://gskinner.com/RegExr/:

,(?=(?:[^"]*"[^"]*")*(?![^"]*"))

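It matches the delimiters in CSV data (more specifically, the separating commas, which can then be split on), and it works well on that site with my test data. While testing you can see what I assume is a JavaScript implementation in the bottom panel of the linked site.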

But when I try to implement it in C#/.NET, the matching does not work correctly. My implementation:

Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))", RegexOptions.ECMAScript);
//get data...
foreach (string match in r.Split(sr.ReadLine()))
{
    //lblDev.Text = lblDev.Text + match + "<br><br><br><p>column:</p><br>";
    dtF.Columns.Add(match);
}

//more of the same to get rows

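On some rows of data the results exactly match those generated on the site above, but on others the first six or so entries either fail to split or are simply not present in the split array.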

Can anyone tell me why the pattern does not seem to behave the same way?

My test data:

CategoryName,SubCategoryName,SupplierName,SupplierCode,ProductTitle,Product Company ,ProductCode,Product_Index,ProductDescription,Product BestSeller,ProductDimensions,ProductExpressDays,ProductBrandName,ProductAdditionalText ,ProductPrintArea,ProductPictureRef,ProductThumnailRef,ProductQuantityBreak1 (QB1),ProductQuantityBreak2 (QB2),ProductQuantityBreak3 (QB3),ProductQuantityBreak4 (QB4),ProductPlainPrice1,ProductPlainPrice2,ProductPlainPrice3,ProductPlainPrice4,ProductColourPrice1,ProductColourPrice2,ProductColourPrice3,ProductColourPrice4,ProductExtraColour1,ProductExtraColour2,ProductExtraColour3,ProductExtraColour4,SellingPrice1,SellingPrice2,SellingPrice3,SellingPrice4,ProductCarriageCost1,ProductCarriageCost2,ProductCarriageCost3,ProductCarriageCost4,BLACK,BLUE,WHITE,SILVER,GOLD,RED,YELLOW,GREEN,ProductOtherColors,ProductOrigination,ProductOrganizationCost,ProductCatalogEntry,ProductPageNumber,ProductPersonalisationType1 (PM1),ProductPrintPosition,ProductCartonQuantity,ProductCartonWeight,ProductPricingExpering,NewProduct,ProductSpecialOffer,ProductSpecialOfferEnd,ProductIsActive,ProductRepeatOrigination,ProductCartonDimession,ProductSpecialOffer1,ProductIsExpress,ProductIsEco,ProductIsBiodegradable,ProductIsRecycled,ProductIsSustainable,ProductIsNatural
Audio,Speakers and Headphones,The Prime Time Company,CM5064:In-ear headphones,Silly Buds,,10058,372,"Small, trendy ear buds with excellent sound quality and printing area actually on each ear- piece. Plastic storage box, with room for cables be wrapped around can also be printed.",FALSE,70 x 70 x 20mm,,,,10mm dia,10058.jpg,10058.jpg,100,250,500,1000,2.19,2.13,2.06,1.99,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,3.81,3.71,3.42,3.17,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,Earpiece,200,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
Audio,Speakers and Headphones,The Prime Time Company,CM5058:Headstart,Head Start,,10060,372,"Lightweight, slimline, foldable and patented headphones ideal for the gym or exercise. These
headphones uniquely hang from the ears giving security, comfort and an excellent sound quality. There is also a secret cable winding facility.",FALSE,130 x 85 x 45mm,,,,30mm dia,10060.jpg,10060.jpg,100,250,500,1000,5.6,5.43,5.26,5.09,0.1,0.1,0.05,0.05,0.1,0.1,0.05,0.05,9.47,8.96,8.24,7.97,0,0,0,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,30,,TRUE,24,Screen Printed,print plate on ear (s),100,11,,TRUE,,,TRUE,15,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE

2 Answers:

Answer 0: (score: 3)

Use the right tool for the job. Regex is not suited to parsing CSV, which can contain an unlimited number of nested quotes.

Use this instead:

A Fast CSV Reader

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

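We use it in production code. It works great and makes you appreciate how complex the parsing is. For more on that complexity, take a look at the 800+ unit tests included in the solution.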
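
To give a rough idea of the state a dedicated parser has to track, here is a minimal sketch of a character-by-character reader. It is not the code of the linked library; `MiniCsv` and `ReadRecord` are hypothetical names made up for this illustration, and it only assumes comma-separated fields where a quote inside a quoted field is written as two quotes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of any library.
public static class MiniCsv
{
    // Reads the next record from the reader, or returns null at end of input.
    public static List<string> ReadRecord(TextReader reader)
    {
        if (reader.Peek() == -1) return null;

        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        while (true)
        {
            int c = reader.Read();
            if (c == -1) break;                      // end of input ends the record

            if (inQuotes)
            {
                if (c == '"')
                {
                    if (reader.Peek() == '"')        // "" inside quotes is a literal quote
                    {
                        reader.Read();
                        field.Append('"');
                    }
                    else
                    {
                        inQuotes = false;            // closing quote
                    }
                }
                else
                {
                    field.Append((char)c);           // commas and line breaks are data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;                     // opening quote
            }
            else if (c == ',')
            {
                fields.Add(field.ToString());        // field separator
                field.Clear();
            }
            else if (c == '\r')
            {
                if (reader.Peek() == '\n') reader.Read();
                break;                               // record separator (CR or CRLF)
            }
            else if (c == '\n')
            {
                break;                               // record separator (LF)
            }
            else
            {
                field.Append((char)c);
            }
        }

        fields.Add(field.ToString());                // last field of the record
        return fields;
    }
}
```

Calling `MiniCsv.ReadRecord(reader)` in a loop until it returns null yields one list of fields per record, including records whose quoted fields span several physical lines. A real library also covers many cases this sketch ignores (other delimiters, malformed quoting, encodings, large files), which is where those unit tests come from.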

Answer 1: (score: 0)

Your C# regex works fine for me in LinqPad, but your data does contain a line break inside the last "row" of data. So you cannot use sr.ReadLine() to read the data.
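
If you want to keep the regex, one possible workaround is to keep appending physical lines until the quotes are balanced and only then split. The following is a minimal sketch under that assumption, not a tested drop-in replacement; `CsvRecordReader`, `ReadRecords` and `SplitFields` are hypothetical names, and the line break is re-inserted as "\n" regardless of what the file originally used.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CsvRecordReader
{
    // Same pattern as in the question: a comma that is followed by
    // an even number of quote characters up to the end of the record.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Yields one logical record at a time, joining physical lines while
    // a quoted field is still open (odd number of quote characters so far).
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string record = line;
            while (record.Count(ch => ch == '"') % 2 == 1)
            {
                string next = reader.ReadLine();
                if (next == null) break;     // unterminated quote at end of input
                record += "\n" + next;       // ReadLine strips the terminator, so put one back
            }
            yield return record;
        }
    }

    // Splits one complete record on the separating commas.
    // Surrounding quotes are left in place, as with the original Split call.
    public static string[] SplitFields(string record)
    {
        return Separator.Split(record);
    }
}
```

In the loop from the question you would then iterate over `CsvRecordReader.ReadRecords(sr)` and call `SplitFields` on each record instead of splitting the result of sr.ReadLine() directly. Counting quotes works because an escaped quote is written as two quotes, so it does not change the parity.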