Question

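I'm trying to do some data mining with Twitter, but I've run into this problem. When I try to write a tweet's large numeric ID to a CSV file, Python keeps converting it to scientific notation. For example, if the ID is 9381435503399854, Python converts it to 9.381435503399854E+17. I tried format(int(tweet.id), ".0f"), but it gave me the same result. format(int(tweet.id), "f") seems to work, but it appends ".000000" to the end of the ID. Any suggestions would be appreciated. Here is some sample code: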

writeExtended(count, tweet.id, tweet.full_text.encode('utf8'), tweet.display_text_range, tweet.created_at)

def writeExtended(id, idstr, full_text, display_text_range, created_at):
    #Write Extended tweet details to CSV file
    with open('Extended.csv', mode='a+') as employee_file:
        employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        employee_writer.writerow([id,idstr, full_text, display_text_range, created_at])

Answer 1

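My guess is that your script works fine, and that the effect you are seeing (the conversion to scientific notation) comes from opening the CSV file in Excel or another spreadsheet application. Try opening the CSV file in a text editor such as Notepad.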
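
One way to check this without opening any spreadsheet is to look at the raw text that the csv module produces. The following is a minimal sketch, not the asker's script: the ID value is hypothetical, and an in-memory buffer stands in for Extended.csv.

```python
import csv
import io

# Hypothetical tweet ID, chosen only for illustration.
tweet_id = 938143550339985400

# In-memory stand-in for Extended.csv, so nothing is written to disk.
buffer = io.StringIO()
csv.writer(buffer).writerow([1, tweet_id, "example text"])

# The raw CSV text as it would appear in a plain text editor.
raw_line = buffer.getvalue()
print(repr(raw_line))
```

If the digits appear in full here, the scientific notation is being introduced by whatever program displays the file, not by Python.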

As an odd workaround, you can convert the ID to text and prefix it with a tab character. That should stop the conversion:

import csv

def writeExtended(id, idstr, full_text, display_text_range, created_at):
    # Write extended tweet details to the CSV file
    with open('Extended.csv', mode='a+', newline='', encoding='utf-8') as employee_file:
        employee_writer = csv.writer(employee_file)
        # The leading tab makes spreadsheet applications treat the ID as text
        employee_writer.writerow([id, '\t{}'.format(idstr), full_text, display_text_range, created_at])

# Pass full_text as str: in Python 3, encoded bytes would be written as b'...'
writeExtended(count, tweet.id, tweet.full_text, tweet.display_text_range, tweet.created_at)

Answer 2

MS Excel, LibreOffice, and Google Sheets import digit-only cells as numbers, so a long string of digits such as a tweet ID ends up displayed in scientific notation.

One workaround is to append a non-numeric character (for example, an underscore _) to the number when writing the CSV file from Python.
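
The underscore workaround could look roughly like the sketch below. The helper names protect_id and restore_id are hypothetical and introduced only for illustration, as is the ID value; an in-memory buffer again stands in for the real CSV file.

```python
import csv
import io

def protect_id(value, suffix="_"):
    """Return the ID as text with a non-numeric suffix appended."""
    return "{}{}".format(value, suffix)

def restore_id(cell, suffix="_"):
    """Strip the suffix again and return the ID as an int."""
    return int(cell.rstrip(suffix))

# Hypothetical tweet ID, chosen only for illustration.
tweet_id = 938143550339985400

# Write one row with the protected ID to an in-memory CSV.
buffer = io.StringIO()
csv.writer(buffer).writerow([1, protect_id(tweet_id), "example text"])

# Read the row back and recover the original integer.
buffer.seek(0)
row = next(csv.reader(buffer))
restored = restore_id(row[1])
print(row[1], restored)
```

The trade-off is that every consumer of the file has to know about the suffix and remove it before treating the value as an ID.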

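Another solution, which works only with LibreOffice Calc: when you open a CSV file, Calc shows an import dialog before loading it. Select the column you want treated as text, change "Column type" to "Text" in the menu above it, and click OK. The column is then loaded without being converted to scientific notation. See the figure below for an illustration.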

Python converts tweepy tweet and user IDs to scientific notation
