I am new to Python. I have a CSV file whose tweet entries are formatted like this:
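15,Oct 11,785816454042124288,/realDonaldTrump/status/785816454042124288,FALSE,"Despite winning the second debate in a landslide (every poll), it is hard to do well when Paul Ryan and others give zero support!",DonaldTrump
and another one:
16,Oct 10,785563318652178432,/realDonaldTrump/status/785563318652178432,FALSE,"Wow, @CNN got caught fixing their ""focus group"" in order to make Crooked Hillary look better. Really pathetic and totally dishonest!",DonaldTrump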
In Python, I load the content with Pandas like this:
data = pd.read_csv(arg, sep=',')
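Now I want to clean up the CSV file and keep only the user ID (the 3rd entry on each line) and the tweet itself (the 6th entry, I think). As you can see, I split on sep=','. The problem is that some tweets contain commas, and I don't want those characters to be lost because of the split. It would be much easier if the separator between the tweet number, date, user_id, and so on were something other than a comma. Any suggestions on how to do this? I just want a new CSV file without the information I don't need.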
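Answer 0 (score: 0)
"The problem is that some tweets contain commas, and I don't want those characters to be lost because of the split."
The csv module in the Python standard library handles this case well, because it treats a comma inside a double-quoted field as part of that field rather than as a separator: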
>>> import csv
>>> s = '''15,Oct 11,785816454042124288,/realDonaldTrump/status/785816454042124288,False,"Despite winning the second debate in a landslide (every poll), it is hard to do well when Paul Ryan and others give zero support!",DonaldTrump
16,Oct 10,785563318652178432,/realDonaldTrump/status/785563318652178432,False,"Wow, @CNN got caught fixing their ""focus group"" in order to make Crooked Hillary look better. Really pathetic and totally dishonest!",DonaldTrump
'''.splitlines()
>>> for fields in csv.reader(s):
...     print(fields[2], fields[5])
785816454042124288 Despite winning the second debate in a landslide (every poll), it is hard to do well when Paul Ryan and others give zero support!
785563318652178432 Wow, @CNN got caught fixing their "focus group" in order to make Crooked Hillary look better. Really pathetic and totally dishonest!
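To get the new, smaller CSV file the question asks for, the same reader can be paired with csv.writer. The following is only a minimal sketch: the helper name extract_id_and_text is made up for illustration, and it uses in-memory io.StringIO objects in place of real files, on the assumption that every row has the same seven-column layout as the two samples above.

```python
import csv
import io


def extract_id_and_text(src, dst):
    """Copy only the 3rd column (ID) and 6th column (tweet text) from src to dst.

    src and dst are file-like objects; the name and signature are
    hypothetical, chosen just for this example.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for fields in reader:
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        writer.writerow([fields[2], fields[5]])


# The two sample rows from the question, used as stand-in input.
sample = (
    '15,Oct 11,785816454042124288,/realDonaldTrump/status/785816454042124288,'
    'FALSE,"Despite winning the second debate in a landslide (every poll), '
    'it is hard to do well when Paul Ryan and others give zero support!",'
    'DonaldTrump\n'
    '16,Oct 10,785563318652178432,/realDonaldTrump/status/785563318652178432,'
    'FALSE,"Wow, @CNN got caught fixing their ""focus group"" in order to make '
    'Crooked Hillary look better. Really pathetic and totally dishonest!",'
    'DonaldTrump\n'
)

out = io.StringIO()
extract_id_and_text(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, open both with newline='' as the csv documentation recommends, for example open('tweets.csv', newline='') for reading and open('clean.csv', 'w', newline='') for writing (both file names are placeholders). The writer re-quotes any field that contains a comma or a double quote, so the tweet text survives a round trip. Pandas follows the same quoting rules by default, so pd.read_csv(arg, header=None, usecols=[2, 5]) followed by to_csv should also work if you prefer to stay in Pandas.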