Question

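My text file holds one list per line, with the tweet text as the second element of each list.

I want to read the second element of every list, that is, collect all the tweet texts into an array.

I wrote:

tweets = []
for line in open('tweets.txt').readlines():
    print line[1]
    tweets.append(line)

But when I look at the output, it only prints the second character of each line.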

Answer 1

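You shouldn't guess at the format of your data; you should find out what it is.

If you're generating it yourself and don't know how to parse what you're creating, change your code to generate something that can be parsed easily with the same library that wrote it, such as JSON Lines or CSV.
If you're pulling it from some API, read that API's documentation and parse it the way it's documented.
If someone handed you the file and told you to parse it, ask that person what format it's in.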
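
As a minimal sketch of the first case, assuming you control the code that writes the file: writing one JSON text per line with the standard json module means the same module can read it back. The records and the in-memory buffer below are hypothetical stand-ins for your own data and for tweets.txt.

```python
import io
import json

# Hypothetical records; the real fields depend on your own data.
records = [
    ["id-1", "first example tweet"],
    ["id-2", 'text with "quotes" and a\nnewline'],
]

# Write one JSON text per line. json.dumps escapes embedded newlines,
# so each record stays on a single physical line.
buffer = io.StringIO()  # stands in for open('tweets.txt', 'w')
for record in records:
    buffer.write(json.dumps(record) + "\n")

# Read the data back with the same library.
buffer.seek(0)
restored = [json.loads(line) for line in buffer]

print(restored == records)
```

The round trip only works this simply because the writer and the reader agree on the format, which is the point of choosing it up front.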

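Occasionally, you do have to deal with an old file in some format that was never documented and that nobody remembers. In that case, you have to reverse engineer it. But what you want to do then is guess at the likely possibilities, and try to parse the file with as much validation and error handling as possible, to verify that you guessed right.

In this case, the format looks a lot like either JSON Lines or ndjson. These are two slightly different ways of encoding multiple objects as one JSON text per line, with specific restrictions on those texts, on how they're encoded, and on the whitespace between them.

So, while a quick-and-dirty parser like this will probably work: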

import json

with open('tweets.txt') as f:
    for line in f:
        tweet = json.loads(line)  # each line is one JSON text
        dosomething(tweet)  # placeholder for your own processing

You may want to use a library like jsonlines instead:

import jsonlines  # third-party package

with jsonlines.open('tweets.txt') as f:
    for tweet in f:
        dosomething(tweet)  # placeholder for your own processing

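The fact that the quick-and-dirty parser works on JSON Lines is, of course, part of the point of that format. But if you don't actually know whether you have JSON Lines, you're better off making sure.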
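
One way to "make sure" is to validate every line as you parse it. The following is only a sketch under stated assumptions: that each line is a JSON list whose second element is the tweet text, and that blank lines may be skipped. The helper names `parse_tweet_line` and `read_tweets` and the sample lines are hypothetical.

```python
import json


def parse_tweet_line(line, lineno):
    """Parse one line, failing loudly if it is not what we guessed."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"line {lineno}: not valid JSON: {exc}") from exc
    # Assumption: every record is a list whose second element is the text.
    if not isinstance(record, list) or len(record) < 2:
        raise ValueError(f"line {lineno}: expected a list of 2+ items")
    if not isinstance(record[1], str):
        raise ValueError(f"line {lineno}: second element is not a string")
    return record[1]


def read_tweets(lines):
    """Return the tweet text from each non-blank line."""
    # Skipping blank lines is a choice made for this sketch.
    return [
        parse_tweet_line(line, lineno)
        for lineno, line in enumerate(lines, start=1)
        if line.strip()
    ]


# Hypothetical sample lines standing in for the contents of tweets.txt.
sample = ['["id-1", "hello"]\n', '["id-2", "world"]\n']
print(read_tweets(sample))
```

With a real file you would pass the open file object, for example `read_tweets(open('tweets.txt'))`. A wrong guess about the format then surfaces as an error naming the offending line instead of as silently wrong data.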

Answer 2

Since your input lines look like Python expressions, I would use ast.literal_eval to parse them.

Here is an example:

import ast

with open('tweets.txt') as fp:
    tweets = [ast.literal_eval(line)[1] for line in fp]

print(tweets)

Output:

['we break dance not hearts by Short Stack is my ringtone.... i LOVE that !!!.....\n', 'I want to write a . I think I will.\n', '@va_stress broke my twitter..\n', '" &quot;Y must people insist on talking about stupid politics on the comments of a bubblegum pop . Sorry\n', 'aww great  &quot;Picture to burn&quot;\n', '@jessdelight I just played ur joint two s ago. Everyone in studio was feeling it!\n', 'http://img207.imageshack.us/my.php?image=wpcl10670s.jpg her s are so perfect.\n', 'cannot hear the new  due to geographic location. i am geographically undesirable. and tune-less\n', '" couples in public\n', "damn wendy's commerical got that damn  in my head.\n", 'i swear to cheese &amp; crackers @zyuuup is in Detroit like every 2 months &amp; i NEVER get to see him!  i swear this blows monkeyballs!\n', '" getting ready for school. after i print out this\n']
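
A short sketch of how `ast.literal_eval` behaves, using a made-up line in the assumed format (a Python list literal). It also shows two things that may be useful here: the trailing newline visible in the output above can be removed with `rstrip()`, and input that is not a plain literal is rejected rather than executed.

```python
import ast

# Hypothetical line in the style this answer assumes: a Python list literal
# whose second element ends with an escaped newline.
line = "['id-1', 'an example tweet\\n']\n"

record = ast.literal_eval(line)
print(record[1].rstrip())

# Anything that is not a plain literal raises an error instead of running.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```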

Answer 3

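When you read a text file in Python, the lines are just strings. They are not automatically converted to any other data structure.

In your case, each line of the file looks like it contains a JSON list. If so, you can parse the line with json.loads() first. That converts the string into a Python list, and you can then take its second element: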

import json
with open('tweets.txt') as fp:
    tweets = [json.loads(line)[1] for line in fp]
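
To make the difference concrete, here is a small sketch using a made-up line in the assumed format (a JSON list per line). Indexing the raw string returns a character, which is what the loop in the question was printing; indexing the parsed list returns an element.

```python
import json

# Hypothetical line, as it would arrive when iterating over the file.
line = '["id-1", "an example tweet"]\n'

# Indexing the raw string gives its second character.
print(repr(line[1]))

# Indexing the parsed list gives its second element.
record = json.loads(line)
print(repr(record[1]))
```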

How do I read list elements from a text file in Python?

3 answers: