Question

I am using tweepy to download tweets in Spanish and then write them to a CSV file. I do this with the code below:

# Python 2; tweets and csvWriter are created earlier (not shown)
while True:
    try:
        for tweet in tweets:
            print tweet.created_at, tweet.text.encode('utf-8')
            csvWriter.writerow([
                tweet.created_at, tweet.id_str,
                tweet.author.name.encode('utf-8'), tweet.author.screen_name.encode('utf-8'),
                tweet.user.location.encode('utf-8'), tweet.coordinates,
                tweet.text.encode('utf-8'), tweet.retweet_count, tweet.favorite_count])
    except tweepy.TweepError:
        break  # placeholder: the handler body is missing from the post
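
For comparison, here is a minimal sketch of the same write step in Python 3, where the csv module takes text and the encoding is set once when the file is opened. The helper names write_rows and read_rows are hypothetical, and the choice of utf-8-sig (UTF-8 with a byte order mark) is an assumption: some spreadsheet applications use the mark as a hint that the file is UTF-8.

```python
import csv


def write_rows(path, rows):
    """Write rows of text to a CSV file encoded as UTF-8 with a BOM.

    Hypothetical helper. In Python 3 the fields stay as str; calling
    .encode() on them first would write the b'...' representation instead.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the file back; utf-8-sig strips the BOM if one is present."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.reader(f))
```

Reading the file back with read_rows and comparing it to the input is a quick way to confirm that nothing was altered on the way to disk.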

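Now the lines that contain the tweet text contain weird characters. For example, México, D.F. appears as M©xico, D.F. I tried converting the exported data to UTF-8 in Numbers, but that changes the same string to: Mí©xico, D.F.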

For other tweets I also get results like this: RT @taniarin: _ôÖ‰_ôÖ‰_ôÖ‰_ôÖ‰#UberSeQueda.
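
Output of this kind usually appears when UTF-8 bytes are decoded with a different single-byte codec somewhere along the way. The sketch below reproduces the effect with Latin-1 as the wrong codec. That choice is an assumption, since the post does not say which codec the viewer actually used, but the stray © in the garbled text is consistent with it.

```python
# Sketch: UTF-8 bytes read back with the wrong codec (Latin-1 is assumed here).
original = "México, D.F."

utf8_bytes = original.encode("utf-8")    # 'é' becomes the two bytes C3 A9
misread = utf8_bytes.decode("latin-1")   # each byte is shown as its own character
print(misread)                           # MÃ©xico, D.F.

# If nothing was lost, the mistake can be undone by reversing both steps.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == original)              # True
```

If the reversal fails or produces more garbage, the text probably went through a different codec or through more than one round of conversion.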

I am using pandas to read the file:

pd.read_csv("uber_dataFULL_utf8.csv", encoding='utf-8')

But it does not seem to work.

I don't know exactly what the problem is or might be. I used chardet, and it detected the text as UTF-8 encoded.
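
A detection result of UTF-8 does not by itself rule out garbled content: text that was already misread once and then saved again as UTF-8 is still valid UTF-8. The sketch below illustrates this with the standard library only. The function is_valid_utf8 is a hypothetical helper, not part of chardet, and Latin-1 is again an assumed stand-in for the wrong codec.

```python
def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


clean = "México".encode("utf-8")
# Misread the bytes as Latin-1, then save the garbled text as UTF-8 again.
double_encoded = clean.decode("latin-1").encode("utf-8")

print(is_valid_utf8(clean), is_valid_utf8(double_encoded))  # True True
print(double_encoded.decode("utf-8"))                        # MÃ©xico
```

Inspecting the raw bytes of one affected field (for example with open(path, "rb")) shows which of the two cases a file is in.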

Thanks!

Weird characters in a UTF-8 encoded file

0 answers: