Python - How do I read a comma-delimited CSV file whose values themselves contain commas?

Time: 2016-07-08 15:33:19

Tags: python csv

The URLs in the file contain commas. For example: ~oref=https://tuclothing.tests.co.uk/c/Girls/Girls_Underwear_Socks&Tights?INITD=GNav-CW-GrlsUnderwear&title=Underwear,+Socks+&+Tights

The comma between Underwear and +Socks is making my life difficult.

Is there a way to tell the reader (pandas, csv reader, etc.) that the whole URL is a single value?

Here is a larger sample with the columns and values:

Event Time,User ID,Advertiser ID,TRAN Value,Other Data,ORD Value,Interaction Time,Conversion ID,Segment Value 1,Floodlight Configuration,Event Type,Event Sub-Type,DBM Auction ID,DBM Request Time,DBM Billable Cost (Partner Currency),DBM Billable Cost (Advertiser Currency),
1.47E+15,CAESEKoMzQamRFTrkbdTDT5F-gM,2934701,,~oref=https://tuclothing.tests.co.uk/c/NewIn/NewIn_Womens?q=%3AnewArrivals&page=2&size=24,4.60E+12,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,,0,0,
1.47E+15,CAESEKQhGXdLq0FitBKF5EPPfgs,2934701,,~oref=https://tuclothing.tests.co.uk/c/Women/Women_Accessories?INITD=GNav-WW-Accesrs&q=%3AnewArrivals&title=Accessories&mkwid=sv5biFf2y_dm&pcrid=90361315613&pkw=leather%20bag&pmt=e&med=Search&src=Google&adg=Womens_Accessories&kw=leather+bag&cmp=TU_Women_Accessories&adb_src=4,4.73E+12,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,,0,0,
1.47E+15,CAESEEpNRaLne21k6juip9qfAos,2934701,,num=16512910;~oref=https://tuclothing.tests.co.uk/,1,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,,0,0,
1.47E+15,CAESEJ3a2YRrPSSeeRUFHDSoXNQ,2934701,,~oref=https://tuclothing.tests.co.uk/c/Girls/Girls_Underwear_Socks&Tights?INITD=GNav-CW-GrlsUnderwear&title=Underwear,+Socks+&+Tights,8.12E+12,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,0,0,0
1.47E+15,CAESEGmwaNjTvIrQ3MoIvqiRC8U,2934701,,~oref=https://tuclothing.tests.co.uk/login/checkout,1.75E+12,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,,0,0,
1.47E+15,CAESEM3G-Nh6Q0OhboLyOhtmtiI,2934701,,~oref=https://3984747.fls.doubleclick.net/activityi;~oref=http%3A%2F%2Fwww.tests.co.uk%2Fshop%2Fgb%2Fgroceries%2Ffrozen-%2Fbeef--pork---lamb,3.74E+12,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,,0,0,
1.47E+15,CAESENlK7oc-ygl637Y2is3a90c,2934701,,~oref=https://tuclothing.tests.co.uk/,5.10E+12,1.47E+15,1,0,940892,CONVERSION,POSTCLICK,,,0,0,

1 answer:

Answer 0 (score: 1)

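It looks like the only commas causing trouble in this case are the ones inside the URL. You could run the CSV file through a preprocessing step that either removes the commas from the URL or URL-encodes them.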

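Personally, I would go with the URL-encoding approach and convert each comma to %2C. That way the URL contains no commas by the time you read the CSV row values, yet it still works as a link to the referring/destination page.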
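As a quick check of the encoding itself, the sketch below uses the standard library's `urllib.parse` to show that a comma percent-encodes to `%2C` (`%2E` is the period) and that decoding restores the original text. The URL is a shortened, illustrative variant of the one in the question, not a value taken from the file.

```python
from urllib.parse import quote, unquote

# Shortened, illustrative variant of the URL from the question.
url = "https://tuclothing.tests.co.uk/c/Girls?title=Underwear,+Socks+&+Tights"

# A comma percent-encodes to %2C; %2E decodes to a period instead.
comma_code = quote(",", safe="")
period = unquote("%2E")

# Encode only the commas and leave the rest of the URL untouched.
encoded = url.replace(",", comma_code)

# Decoding gives back the original URL, so the link keeps working.
decoded = unquote(encoded)
```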

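If you had this problem in other fields (not the URL), or at unknown/random positions within a CSV row, the solution would not be easy at all. But since you know exactly where the problem occurs every time, you can do a static search for that character and replace it within that specific field.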
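One way to sketch such a preprocessing step in Python is shown below. It rests on two assumptions that are not stated in the question: that only one known column can contain stray commas, and that every row is meant to have the same number of columns as the header. The helper name `encode_commas_in_field` and the four-column sample rows are hypothetical, shortened from the data in the question.

```python
import csv

def encode_commas_in_field(line, field_index, total_fields):
    """Percent-encode stray commas inside one known field of a raw CSV line.

    Assumes only the field at field_index can contain commas and that a
    well-formed row has exactly total_fields columns.
    """
    parts = line.rstrip("\r\n").split(",")
    extra = len(parts) - total_fields
    if extra <= 0:
        return ",".join(parts)  # row already has the expected column count
    # The stray commas split the field into extra + 1 pieces; glue them back.
    end = field_index + extra + 1
    fixed = "%2C".join(parts[field_index:end])
    return ",".join(parts[:field_index] + [fixed] + parts[end:])

# Hypothetical sample, shortened to four columns for illustration.
raw_lines = [
    "Event Time,User ID,Other Data,ORD Value",
    "1.47E+15,CAESEJ3a,~oref=https://tuclothing.tests.co.uk/c/Girls?title=Underwear,+Socks+&+Tights,8.12E+12",
    "1.47E+15,CAESENlK,~oref=https://tuclothing.tests.co.uk/,5.10E+12",
]

cleaned = [encode_commas_in_field(l, field_index=2, total_fields=4) for l in raw_lines]

# csv.reader accepts any iterable of strings, so the cleaned lines parse directly.
rows = list(csv.reader(cleaned))
```

The cleaned lines can also be joined with newlines and handed to `pandas.read_csv` through `io.StringIO`. Because the helper detects damaged rows only by their field count, a row that is also missing a column somewhere else would pass through unrepaired, so it is worth checking the column count of the real file first.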