Question

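I'm trying to analyze a JSON file of data collected from Twitter, but when I try to look up a key, it says the key can't be found, even though I can see that it exists. I've tried two different ways, which I'll post below. Any suggestions would be great.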

Attempt #1:

import sys
import os
import numpy as np
import scipy
import matplotlib.pyplot as plt
import json
import pandas as pan

tweets_data = []
tweets_file = open('twitter_data.txt', "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue
tweets = pan.DataFrame()
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)

Attempt #2: the earlier steps are the same, but using a loop instead

t=tweets[0]
tweet_text = [t['text'] for t in tweets]

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
KeyError: 'text'

If I print tweets_data, this is what I see. 'text' and the rest are definitely there. Am I missing a character?

>>> print(tweet_data[0])   
    {u'contributors': None, u'truncated': False, u'text': u'RT
    @iHippieVibes: \u2b50\ufe0fFAV For This Lace Cardigan \n\nUSE Discount
    code for 10% off: SOLO\n\nFree Shipping\n\nhttp://t.co/d8kiIt3J5f
    http://t.c\u2026', u'in_reply_to_status....

(Only part of the output is pasted.)

Thanks! Any suggestions would be greatly appreciated.

Answer 1

Not all of your tweets have a 'text' key. Either filter those out, or use dict.get() to return a default value:

tweet_text = [t['text'] for t in tweets if 'text' in t]

or

tweet_text = [t.get('text', '') for t in tweets]
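
To see which records are causing the KeyError, it may help to load the file and look at the ones without a 'text' key. The sketch below is only an illustration: the sample lines are made up and stand in for twitter_data.txt, and the helper name load_records is hypothetical. It assumes the data came from the streaming API, which can mix non-tweet messages (such as delete or limit notices) in with the tweets.

```python
import json

# Made-up sample lines standing in for the contents of twitter_data.txt.
# Only two of them are actual tweets with a 'text' key.
sample_lines = [
    '{"text": "RT @someone: hello", "id": 1}',
    '{"delete": {"status": {"id": 2, "user_id": 3}}}',
    '',
    '{"limit": {"track": 5}}',
    '{"text": "second tweet", "id": 4}',
]


def load_records(lines):
    """Parse one JSON object per line, skipping blank or invalid lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except ValueError:  # json raises ValueError (or a subclass) on bad input
            continue
    return records


tweets_data = load_records(sample_lines)

# Records that would raise KeyError: 'text'
missing = [t for t in tweets_data if 'text' not in t]

# Option 1: keep only records that have a 'text' key
tweet_text = [t['text'] for t in tweets_data if 'text' in t]

# Option 2: keep every record, substituting '' where 'text' is absent
tweet_text_default = [t.get('text', '') for t in tweets_data]
```

Printing missing shows what the offending records look like. Filtering (option 1) is usually the better fit when the result feeds a DataFrame column of tweet text, while dict.get() (option 2) keeps the list the same length as tweets_data.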

KeyErrors when reading a Twitter JSON file in Python