Question

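I am querying the Twitter API with a list of tweet IDs. I need to loop over these IDs and fetch the corresponding data from Twitter. I then need to store the JSON in a txt file, with each tweet's JSON data on its own line. Later, I will have to read the txt file line by line to build a pandas DataFrame from it.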

Here is some fake data to show you the structure.

twt.tweet_id.head()

0    000000000000000001
1    000000000000000002
2    000000000000000003
3    000000000000000004
4    000000000000000005
Name: tweet_id, dtype: int64

I don't know how to share the JSON files, or even whether I can. Calling tweet._json gives me the tweet's JSON data.

drop_lst = []     # this is needed to collect the IDs which don't work


for i in twt.tweet_id:   # twt.tweet_id is the pd.series with the IDs
    try:
        tweet = api.get_status(i)
        with open('tweet_json.txt', 'a') as f:
            f.write(str(tweet._json)+'\n')  #  tweet._json is the JSON file I need

    except tp.TweepError:
        drop_lst.append(i)

The above works, but I think I have lost the JSON structure that I will need later to create the DataFrame.
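
A minimal illustration of that concern, using a made-up dictionary in place of tweet._json (the field names are hypothetical): str() writes the Python representation of the dict (single quotes, False, None), which json.loads cannot parse, whereas json.dumps writes valid JSON.

```python
import json

# Hypothetical stand-in for tweet._json; real tweets have many more fields.
record = {"id": 1, "text": "hello", "truncated": False, "place": None}

as_repr = str(record)         # Python repr: single quotes, False, None
as_json = json.dumps(record)  # valid JSON: double quotes, false, null


def parses_as_json(s):
    """Return True if s can be decoded by json.loads."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(as_repr))  # False
print(parses_as_json(as_json))  # True
```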

drop_lst = []

for i in twt.tweet_id:
    try:
        tweet = api.get_status(i)
        with open('data.txt', 'a') as outfile:  
            json.dump(tweet._json, outfile)

    except tp.TweepError:
        drop_lst.append(i)

The above does not put each record on its own line.

I hope I have given you enough information to help me.

Thanks in advance for your help.

Answer 1

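Appending to a file with json.dump does not add a newline, so all the records end up together on the same line. I suggest collecting all the JSON records in a list, then joining them with newlines and writing the result to the file: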

tweets, drop_lst = [], []

for i in twt.tweet_id:
    try:
        tweet = api.get_status(i)
        tweets.append(tweet._json)

    except tp.TweepError:
        drop_lst.append(i)

with open('data.txt', 'a') as fh:
    fh.write('\n') # to ensure that the json is on its own line to start
    fh.write('\n'.join(json.dumps(tweet) for tweet in tweets)) # this will concatenate the tweets into a newline delimited string
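
If you would rather not hold every tweet in memory, one possible variation is to write each record as soon as it arrives, followed by its own newline. The sketch below assumes a helper named append_json_line (a hypothetical name, not part of tweepy or the json module) and uses made-up dictionaries in place of tweet._json.

```python
import json
import os
import tempfile


def append_json_line(path, record):
    """Append one record to path as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Made-up records standing in for tweet._json.
records = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

# A temporary directory keeps the example from touching real files.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
for record in records:
    append_json_line(path, record)

with open(path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()

print(len(lines))  # one line per record
```

In the loop above, this would presumably mean calling append_json_line('data.txt', tweet._json) inside the try block instead of appending to the list.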

Then, to create the DataFrame, you can read the file back and stitch everything together:

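import json
import pandas as pd

with open("data.txt") as fh:
    # line.strip() skips blank lines; a bare "if line" would not, since each line keeps its '\n'
    tweets = [json.loads(line) for line in fh if line.strip()]

df = pd.DataFrame(tweets)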

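This assumes that no serialized record contains a literal newline. Tweet text can contain newlines, so be careful, particularly if the records are ever written with anything other than the default json.dumps settings (for example with indent).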
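
A quick check of that assumption, using a made-up record: with its default settings, json.dumps escapes a newline inside a string, so the serialized record stays on one line. Passing indent is what would spread a record over several lines.

```python
import json

# Made-up record whose text contains a real newline character.
record = {"id": 3, "text": "line one\nline two"}

compact = json.dumps(record)           # default settings, no indent
pretty = json.dumps(record, indent=2)  # pretty-printed over several lines

print("\n" in compact)                # False: the newline is escaped
print("\n" in pretty)                 # True: indent adds real newlines
print(json.loads(compact) == record)  # True: the text round-trips intact
```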

Writing JSON to a txt file, with each JSON record on its own line
