Question

I have a file of tweets in JSON format, which I read into a list in Python as follows:

import codecs
import json

data5 = []
with codecs.open('twitFile_5.txt', 'r', encoding='utf-8') as file5:
    for line in file5:
        data5.append(json.loads(line))  # one JSON tweet per line

I can select the "text" of a single, chosen tweet.

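But I don't know how to:

1) list just the "text" items of all the tweets

2) search the list of "text" items and count how many times each of a list of phrases is mentioned in the texts, e.g. ['apple', 'orange fruit', 'bunch of bananas'].

Thanks.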

Answer 1

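It sounds like map and reduce can solve both of these problems.

For example:

texts = list(map(lambda x: x['text'], data5))  # list() so the result can be reused on Python 3

and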

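from functools import reduce  # reduce is no longer a builtin on Python 3

texts = ['apple test', 'test orange fruit']

init = {'apple': 0, 'orange fruit': 0, 'bunch of bananas': 0}

def aggregate(agg, x):
    # Add 1 to every phrase that occurs as a substring of the text x.
    for k in agg:
        if k in x:
            agg[k] += 1
    return agg

counts = reduce(aggregate, texts, init)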
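
One thing to watch for in the reduce version above: aggregate updates and returns the same dictionary it was given, so init itself ends up holding the counts, and running the reduction a second time keeps adding to them. Below is a minimal sketch of a variant that leaves the starting dictionary untouched. It reuses the sample texts from above. The names phrases and count_mentions are illustrative and not part of the original answer.

```python
from functools import reduce

def count_mentions(texts, phrases):
    """Count, for each phrase, how many texts contain it as a substring."""
    def aggregate(agg, text):
        # Build a fresh dict at each step instead of mutating the accumulator.
        return {k: n + (1 if k in text else 0) for k, n in agg.items()}
    return reduce(aggregate, texts, {k: 0 for k in phrases})

texts = ['apple test', 'test orange fruit']
phrases = ['apple', 'orange fruit', 'bunch of bananas']
counts = count_mentions(texts, phrases)
print(counts)  # {'apple': 1, 'orange fruit': 1, 'bunch of bananas': 0}
```

Because a new dictionary is built on every call, calling count_mentions repeatedly on the same data gives the same result each time.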

Edit

Per the comments:

from functools import reduce  # needed on Python 3

values = [
    {'text': 'apple test', 'user': 'A'},
    {'text': 'test orange fruit', 'user': 'B'},
]

init = {'apple': [], 'orange fruit': [], 'bunch of bananas': []}

def aggregate(agg, x):
    # Collect every tweet whose text mentions the phrase.
    for k in agg:
        if k in x['text']:
            agg[k].append(x)
    return agg

counts = reduce(aggregate, values, init)

Answer 2

1) Use a list comprehension:

texts = [d["text"] for d in data5]

2) Again, a list comprehension:

count = len([t for t in texts if 'apple' in t])

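I'm interpreting your post to mean that you want to count the number of texts that mention "apple". If you instead want to count the total number of times "apple" occurs, you can use: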

count = sum([t.count('apple') for t in texts])
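
Both counts extend to a whole list of phrases with a dict comprehension. The sketch below assumes data5 is a list of dicts, as loaded in the question. The sample tweets, the phrases list, and the check that skips entries without a "text" key are illustrative assumptions. Matching is case-sensitive substring matching, just like the in and count calls above.

```python
# Hypothetical sample standing in for the tweets loaded into data5.
data5 = [
    {'text': 'apple test apple'},
    {'text': 'test orange fruit'},
    {'user': 'B'},  # assumed: an entry with no 'text' key
]
phrases = ['apple', 'orange fruit', 'bunch of bananas']

# 1) All "text" items, skipping entries that have none.
texts = [d['text'] for d in data5 if 'text' in d]

# 2a) Number of texts that mention each phrase at least once.
mentions = {p: sum(1 for t in texts if p in t) for p in phrases}

# 2b) Total number of occurrences of each phrase across all texts.
occurrences = {p: sum(t.count(p) for t in texts) for p in phrases}

print(mentions)     # {'apple': 1, 'orange fruit': 1, 'bunch of bananas': 0}
print(occurrences)  # {'apple': 2, 'orange fruit': 1, 'bunch of bananas': 0}
```

If the match should ignore case, lowercasing both the text and the phrase before comparing is one option.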

Beginner Python query - selecting items in a JSON list
