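I am loading Twitter data into a pandas DataFrame. After preprocessing, I store the result in a CSV file. When I do this, the lists are stored as strings, which makes it hard to process the CSV file further. I want to avoid storing the lists as strings and would like them stored as lists in the CSV instead. How can I do this?
Before storing as CSV, the output of cleanedData.head(3).to_dict():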
{'id': {0: 1042616899408945154, 1: 1042592536769044487, 2: 1042587702040903680}, 'month': {0: 9, 1: 9, 2: 9}, 'hour': {0: 3, 1: 1, 2: 1}, 'text': {0: [['are', 'red', 'violets', 'are', 'blue', 'if', 'you', 'want', 'to', 'buy', 'us', 'here', 'is', 'a', 'clue', 'our', 'eye', 'amp', 'cheek', 'palette', 'is', 'al']], 1: [['is', 'it', 'too', 'late', 'now', 'to', 'say', 'sorry']], 2: [['oh', 'no'], ['please', 'email', 'your', 'order', 'to', 'social', 'amp', 'we', 'can', 'help'], ['this', 'is', 'a', 'newest', 'offer'], []]}, 'hasMedia': {0: 0, 1: 1, 2: 0}, 'hasHashtag': {0: 1, 1: 1, 2: 0}, 'followers_count': {0: 801745, 1: 801745, 2: 801745}, 'retweet_count': {0: 17, 1: 94, 2: 0}, 'favourite_count': {0: 181, 1: 408, 2: 0}, 'sentiments': {0: {'neg': 0.0, 'neu': 0.949, 'pos': 0.051, 'compound': 0.0772}, 1: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}, 2: {'neg': 0.1, 'neu': 0.634, 'pos': 0.266, 'compound': 0.5684}}, 'text_posTagged': {0: [[('are', 'VBP'), ('red', 'JJ'), ('violets', 'NNS'), ('are', 'VBP'), ('blue', 'JJ'), ('if', 'IN'), ('you', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('buy', 'VB'), ('us', 'PRP'), ('here', 'RB'), ('is', 'VBZ'), ('a', 'DT'), ('clue', 'JJ'), ('our', 'PRP$'), ('eye', 'NN'), ('amp', 'NN'), ('cheek', 'NN'), ('palette', 'NN'), ('is', 'VBZ'), ('al', 'JJ')]], 1: [[('is', 'VBZ'), ('it', 'PRP'), ('too', 'RB'), ('late', 'RB'), ('now', 'RB'), ('to', 'TO'), ('say', 'VB'), ('sorry', 'NN')]], 2: [[('oh', 'UH'), ('no', 'DT')], [('please', 'VB'), ('email', 'VB'), ('your', 'PRP$'), ('order', 'NN'), ('to', 'TO'), ('social', 'JJ'), ('amp', 'IN'), ('we', 'PRP'), ('can', 'MD'), ('help', 'VB')], [('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('newest', 'NN'), ('offer', 'NN')], []]}}
Storing the data as CSV:
cleanedData.to_csv('preprocessed_data.csv', sep=',')
A few rows from preprocessed_data.csv:
1,1042592536769044487,9,1,"[['is', 'it', 'too', 'late', 'now', 'to', 'say', 'sorry']]",1,1,801745,94,408,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}","[[('is', 'VBZ'), ('it', 'PRP'), ('too', 'RB'), ('late', 'RB'), ('now', 'RB'), ('to', 'TO'), ('say', 'VB'), ('sorry', 'NN')]]"
2,1042587702040903680,9,1,"[['oh', 'no'], ['please', 'email', 'your', 'order', 'to', 'social', 'amp', 'we', 'can', 'help'], ['this', 'is', 'a', 'newest', 'offer'], []]",0,0,801745,0,0,"{'neg': 0.1, 'neu': 0.634, 'pos': 0.266, 'compound': 0.5684}","[[('oh', 'UH'), ('no', 'DT')], [('please', 'VB'), ('email', 'VB'), ('your', 'PRP$'), ('order', 'NN'), ('to', 'TO'), ('social', 'JJ'), ('amp', 'IN'), ('we', 'PRP'), ('can', 'MD'), ('help', 'VB')], [('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('newest', 'NN'), ('offer', 'NN')], []]"
3,1042587263643930626,9,1,"[['its', 'best', 'applied', 'with', 'our', 'buffer', 'brush']]",0,0,801745,0,0,"{'neg': 0.0, 'neu': 0.64, 'pos': 0.36, 'compound': 0.6696}","[[('its', 'PRP$'), ('best', 'JJS'), ('applied', 'VBN'), ('with', 'IN'), ('our', 'PRP$'), ('buffer', 'NN'), ('brush', 'NN')]]"
4,1042586780292276230,9,1,[['dead']],0,0,801745,0,14,"{'neg': 0.834, 'neu': 0.166, 'pos': 0.0, 'compound': -0.7213}","[[('dead', 'JJ')]]"
In the CSV file above, the lists and dictionaries are stored as strings. I want to avoid this.
Answer 0 (score: 2)
Something like this?
import csv
df.to_csv("preprocess.csv", quoting=csv.QUOTE_NONE, escapechar=' ')
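A different approach, not the one from the answer above: since CSV is a plain-text format, every cell ends up as text in the file regardless of the quoting options, so it may be simpler to leave the file as it is and convert the nested columns back when reading. The sketch below assumes the cells contain only Python literals (lists, tuples, dicts, strings, numbers), which appears to be the case for the sample rows. The small frame and the in-memory buffer are hypothetical stand-ins for cleanedData and preprocessed_data.csv.

```python
import ast
import io

import pandas as pd

# Hypothetical miniature of the frame from the question (values shortened).
df = pd.DataFrame({
    "id": [1042592536769044487, 1042587702040903680],
    "text": [[["is", "it", "too", "late"]], [["oh", "no"], []]],
    "sentiments": [{"neg": 0.0, "pos": 0.0}, {"neg": 0.1, "pos": 0.266}],
    "text_posTagged": [
        [[("is", "VBZ"), ("it", "PRP")]],
        [[("oh", "UH"), ("no", "DT")], []],
    ],
})

# Write to an in-memory buffer instead of a file on disk (assumption, for a
# self-contained example); nested values are written as their string form.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# On read, parse the string form of each nested column back into Python
# objects. ast.literal_eval only evaluates literals, not arbitrary code.
nested_columns = ["text", "sentiments", "text_posTagged"]
restored = pd.read_csv(
    buffer,
    converters={name: ast.literal_eval for name in nested_columns},
)

print(type(restored.loc[0, "text"]))
print(type(restored.loc[0, "sentiments"]))
```

If the file does not need to be CSV, a format with native nesting such as JSON could avoid the conversion step, at the cost of tuples coming back as lists.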