How can I escape the # and = characters with csv.writer?

Asked: 2017-10-12 16:35:30

Tags: python csv escaping export-to-csv

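I am using Tweepy's StreamListener (with Python 3.6) to stream Twitter data. So far, so good. Unfortunately, when the data is converted to a CSV file, certain characters (in particular = and #) seem to be misinterpreted, so the text gets split and the columns end up misaligned. More specifically, the "text" column, which holds the tweet message, is often split at a # or an = in the sentence, and the rest of the sentence then spills into the following columns. I have read that using csv.writer with its writerow()/writerows() methods and some form of escaping option would most likely prevent this. Can someone help me work out how to incorporate that into the two pieces of code below?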
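For reference, here is a minimal sketch of what the csv module itself does with these characters. The sample rows and the helper name `round_trip` are made up for illustration and are not part of the original code. With the default settings, `csv.writer` leaves `#` and `=` untouched and only quotes fields that contain the delimiter, the quote character, or a line break, so a write followed by a read should return the same cells.

```python
import csv
import io

# Made-up sample rows: one with '=' and '#', one with the characters
# that actually need quoting in CSV (comma, double quote, line break).
sample_rows = [
    ["id", "text"],
    ["1", "price = 5 #deal"],
    ["2", 'commas, "quotes" and\nline breaks'],
]


def round_trip(rows):
    """Write rows with csv.writer, then parse them back with csv.reader."""
    buf = io.StringIO(newline="")  # newline="" leaves line endings untranslated
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


print(round_trip(sample_rows) == sample_rows)
```

If this prints `True`, the cells survived unchanged, which suggests that any splitting seen later comes from commas or line breaks written without quoting, or from the way the file is imported into another program.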

Code 1: user timeline (relevant part)

outtweets = [[tweet.created_at, tweet.place, tweet.lang, tweet.retweeted, tweet.retweet_count, tweet.favorite_count, tweet.entities, tweet.id_str, tweet.text] for tweet in alltweets]

# write the csv
with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(["created_at", "place", "language", "retweet", "retweet_count", "favorite_count", "entities", "tweet_id", "text"])
    writer.writerows(outtweets)
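One possible adjustment to Code 1, offered as a sketch rather than a confirmed fix: open the file with `newline=''`, as the csv module documentation recommends, and pass `quoting=csv.QUOTE_ALL` so that every field is wrapped in double quotes. The function name `write_tweets_csv` and its parameters are hypothetical; the header is copied from the code above.

```python
import csv

HEADER = ["created_at", "place", "language", "retweet", "retweet_count",
          "favorite_count", "entities", "tweet_id", "text"]


def write_tweets_csv(path, rows, header=HEADER):
    """Write header and rows to path with every field quoted."""
    # newline='' stops Python from translating the '\r\n' that csv.writer
    # emits, which otherwise can produce blank lines on Windows.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(header)
        writer.writerows(rows)
```

Under these assumptions the call would look like `write_tweets_csv('%s_tweets.csv' % screen_name, outtweets)`. If the columns still look wrong in a spreadsheet program, reading the file back with `csv.reader` is one way to check whether the file or the program's import settings (for example an unexpected delimiter) are responsible.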

Code 2: StreamListener (this code converts an existing JSONL file of tweets to CSV)

import json
import csv
import io

'''
creates a .csv file using a Twitter .json file
the fields have to be set manually
'''

def extract_json(fileobj):
    """
    Iterates over an open JSONL file and yields
    decoded lines.  Closes the file once it has been
    read completely.
    """
    with fileobj:
        for line in fileobj:
            yield json.loads(line)    


data_json = io.open('stream_____.jsonl', mode='r', encoding='utf-8-sig') # opens the JSONL file
data_python = extract_json(data_json)

csv_out = io.open('tweets_out_utf8.csv', mode='w', encoding='utf-8-sig') #opens csv file


fields = u'created_at,text,screen_name,followers,friends,rt,fav' #field names
csv_out.write(fields)
csv_out.write(u'\n')

for line in data_python:

    #writes a row and gets the fields from the json object
    #screen_name and followers/friends are found on the second level hence two get methods
    row = [line.get('created_at'),
           '"' + line.get('text').replace('"','""') + '"', #creates double quotes
           line.get('user').get('screen_name'),
           str(line.get('user').get('followers_count')),
           str(line.get('user').get('friends_count')),
           str(line.get('retweet_count')),
           str(line.get('favorite_count'))]

    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')

csv_out.close()
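Code 2 builds each row by hand with `','.join(row)` and only quotes the text column, so a comma in any other field would shift the columns. A hedged sketch of the same conversion with `csv.writer` doing the quoting follows. The column names come from the code above; the helper names `tweet_to_row`, `jsonl_to_csv` and `convert` are hypothetical, and the handling of lines without a `user` object is an assumption about the stream data.

```python
import csv
import json

FIELDS = ["created_at", "text", "screen_name", "followers", "friends", "rt", "fav"]


def tweet_to_row(tweet):
    """Pick the columns used in Code 2 from one decoded tweet dict."""
    # Assumption: some lines in the stream may carry no 'user' object.
    user = tweet.get("user") or {}
    return [
        tweet.get("created_at"),
        tweet.get("text"),
        user.get("screen_name"),
        user.get("followers_count"),
        user.get("friends_count"),
        tweet.get("retweet_count"),
        tweet.get("favorite_count"),
    ]


def jsonl_to_csv(jsonl_lines, csv_file):
    """Read JSON lines from an iterable and write them as CSV rows."""
    writer = csv.writer(csv_file)  # quotes commas, quotes and line breaks as needed
    writer.writerow(FIELDS)
    for line in jsonl_lines:
        if line.strip():  # skip blank lines
            writer.writerow(tweet_to_row(json.loads(line)))


def convert(src_path, dst_path):
    """Convert a JSONL file on disk to a CSV file on disk."""
    with open(src_path, encoding="utf-8-sig") as src, \
            open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        jsonl_to_csv(src, dst)
```

With these names, `convert('stream_____.jsonl', 'tweets_out_utf8.csv')` would stand in for the loop above. Missing values are written as empty cells, because `csv.writer` writes `None` as an empty string.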

0 answers:

No answers yet