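I'm using tweepy to capture some tweets in Portuguese (Portugal) and saving them to a CSV file. All of the tweet text we save comes out with escaped special characters, and I can't convert it back to the correct form.
My code for capturing the tweets is: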
csvFile = open('ua.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.user_timeline, id=usuario, count=10,
                           lang="en",
                           since="2018-12-01").items():
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
I'm reading the result like this:
test = pd.read_csv('ua.csv', header=None)
test.columns = ["date", "text"]
result = test['text'][0]
print(result)
'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
The result I need is:
print(result)
'Aproveita essa promoção aqui!'
I tried the following code to convert it:
print(result.decode('utf-8'))
and got the following error message:
AttributeError: 'str' object has no attribute 'decode'
What am I doing wrong?
Answer 0 (score: 1)
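The problem is that you are creating a bytes object when you call .encode on the tweet text, and you don't need to do that. The csv.writer object will coerce whatever you pass to it into a string, so the bytes object ends up in the file as its repr, b'...'.
Observe: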
In [1]: import csv
In [2]: s = 'Aproveita essa promoção aqui!'
In [3]: print(s)
Aproveita essa promoção aqui!
In [4]: print(s.encode())
b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
In [5]: with open('test.txt', 'a') as f:
   ...:     writer = csv.writer(f)
   ...:     writer.writerow([1, 3.4, 'Aproveita essa promoção aqui!'.encode()])
   ...:
In [6]: !cat test.txt
1,3.4,b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'
So just use:
csvWriter.writerow([tweet.created_at, tweet.text])
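For rows that were already written the old way, the text can probably still be recovered. What follows is only a minimal sketch: it assumes each affected cell holds exactly the repr of a bytes object (as in the `cat` output above), and `recover_text` is a hypothetical helper name, not part of csv, pandas, or tweepy.

```python
import ast


def recover_text(cell: str) -> str:
    """Turn a cell like "b'...'" back into text; leave other cells unchanged."""
    if cell.startswith(("b'", 'b"')):
        # literal_eval safely parses the bytes literal; decode then yields str.
        return ast.literal_eval(cell).decode("utf-8")
    return cell


# The cell exactly as it was stored in the CSV (backslashes are literal characters).
stored = r"b'Aproveita essa promo\xc3\xa7\xc3\xa3o aqui!'"
print(recover_text(stored))
```

This also explains the AttributeError in the question: in Python 3 the value read back from the file is already a str, and str has no decode method. Only the bytes object produced by `ast.literal_eval` can be decoded.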
Answer 1 (score: 0)
pandas read_csv has an encoding parameter:
Encoding to use for UTF when reading/writing (ex. 'utf-8').
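To illustrate what an encoding setting controls, here is a small standard-library sketch (not pandas itself; the variable names are made up for illustration) that decodes the same bytes with two different codecs. Note that this only matters when the file contains real UTF-8 bytes; it would not, on its own, undo a cell that was saved as the text b'...'.

```python
# The tweet text as UTF-8 bytes, i.e. what a correctly written file contains.
raw = 'Aproveita essa promoção aqui!'.encode('utf-8')

# Decoding with the codec that was used for writing restores the text.
right = raw.decode('utf-8')

# Decoding with a different codec succeeds but produces mojibake.
wrong = raw.decode('latin-1')

print(right)
print(wrong)
```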
Answer 2 (score: 0)
Open the file with the encoding you want to use, and don't encode the text manually (Zen of Python: explicit is better than implicit):
# newline='' per csv documentation
# encoding='utf-8-sig' if you plan on using Excel to read the csv, else 'utf8' is fine.
with open('ua.csv', 'a', encoding='utf-8-sig', newline='') as csvFile:
    csvWriter = csv.writer(csvFile)
    for tweet in tweepy.Cursor(api.user_timeline, id=usuario, count=10,
                               lang="en",
                               since="2018-12-01").items():
        # Pass the str directly; the file object handles the encoding.
        csvWriter.writerow([tweet.created_at, tweet.text])
Here's a working example:
import csv
import pandas as pd
with open('ua.csv', 'w', encoding='utf-8-sig', newline='') as csvFile:
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(['timestamp', 'Aproveita essa promoção aqui!'])
test = pd.read_csv('ua.csv', encoding='utf-8-sig', header=None)
print(test)
Output:
           0                              1
0  timestamp  Aproveita essa promoção aqui!
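As a side note on the 'utf-8-sig' choice, the short sketch below (standard library only; the variable names are illustrative) shows what that codec adds compared with plain 'utf-8': a byte order mark at the start of the data, which is what Excel looks for. Reading with 'utf-8-sig' strips it again, while reading with plain 'utf-8' leaves a stray U+FEFF character at the front of the text.

```python
import codecs

text = 'Aproveita essa promoção aqui!'

# 'utf-8-sig' prepends the UTF-8 byte order mark (EF BB BF) when encoding.
with_bom = text.encode('utf-8-sig')
print(with_bom[:3] == codecs.BOM_UTF8)

# Decoding with 'utf-8-sig' removes the mark.
clean = with_bom.decode('utf-8-sig')

# Decoding with plain 'utf-8' keeps it as a leading U+FEFF character.
leftover = with_bom.decode('utf-8')

print(clean == text)
print(leftover == '\ufeff' + text)
```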