Importing a JSON file for analysis in Python

Time: 2014-03-31 16:30:16

Tags: python json twitter analysis

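I am trying to import a JSON file into a Python editor so that I can run some analysis on the data. I am new to Python, so I am not sure how to go about this. My JSON file is full of tweet data, for example: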

{"id":441999105775382528,"score":0.0,"text":"blablabla","user_id":1441694053,"created":"Fri Mar 07 18:09:33 GMT 2014","retweet_id":0,"source":"<a href=\"http://twitterfeed.com\" rel=\"nofollow\">twitterfeed</a>","geo_long":null,"geo_lat":null,"location":"","screen_name":"SevenPS4","name":"Playstation News","lang":"en","timezone":"Amsterdam","user_created":"2013-05-19","followers":463,"hashtags":"","mentions":"","following":1062,"urls":"http://bit.ly/1lcbBW6","media_urls":"","favourites_count":4514,"reply_status_id":0,"reply_user_id":0,"is_truncated":false,"is_retweet":false,"original_text":null,"status_count":4514,"description":"Tweeting the latest Playstation news!","url":null,"utc_offset":3600}

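My questions:

How do I import the JSON file so that I can analyze it in a Python editor?

How do I run the analysis on only a limited amount of the data (i.e. 100 or 200 records rather than all of it)?

Is there a way to get rid of certain fields, such as score, user_id, created, etc., without manually editing all of the data?

Some tweets contain invalid/unusable symbols. Is there a way to get rid of those without doing it by hand?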

1 answer:

Answer 0 (score: 1)

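I would use Pandas for this job, since you will not only load the JSON but also perform some data analysis tasks on it. Depending on the size of your JSON file, this should do it: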

import pandas as pd
import json

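# read a sample JSON file (replace "yourfilename" with your file location);
# this assumes one JSON object per line, as in the sample tweet above
with open("yourfilename") as f:
    tweets = [json.loads(line) for line in f if line.strip()]
# select the relevant keys before constructing the data frame
# (use keys that exist in your data; the sample tweet has no "retweet_count")
keys = ["id", "retweet_count"]
df = pd.DataFrame([{k: t.get(k) for k in keys} for t in tweets])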
# select a subset (the first five rows)
df.iloc[:5]
# do some analysis
df.retweet_count.sum()
>>> 200
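
The remaining questions (reading only the first 100 or 200 records, dropping fields, and removing unusable symbols) could be handled before the data frame is built. The following is only a rough sketch using the standard library, under several assumptions: the file holds one JSON object per line, "invalid symbols" means non-ASCII characters, and the helper names `load_tweets` and `strip_non_ascii` as well as the sample records are made up for illustration.

```python
import io
import json
from itertools import islice


def load_tweets(lines, limit=None, drop=()):
    """Parse up to `limit` non-empty lines (one JSON object each), omitting keys in `drop`."""
    non_empty = (line for line in lines if line.strip())
    tweets = []
    for line in islice(non_empty, limit):
        tweet = json.loads(line)
        tweets.append({k: v for k, v in tweet.items() if k not in drop})
    return tweets


def strip_non_ascii(text):
    """Drop characters outside the ASCII range; leave non-strings (e.g. None) untouched."""
    if not isinstance(text, str):
        return text
    return text.encode("ascii", errors="ignore").decode("ascii")


# Stand-in for an open file: three made-up tweets, one JSON object per line.
sample = io.StringIO(
    '{"id": 1, "score": 0.0, "user_id": 7, "text": "caf\\u00e9 news", "followers": 10}\n'
    '{"id": 2, "score": 0.0, "user_id": 8, "text": "plain text", "followers": 20}\n'
    '{"id": 3, "score": 0.0, "user_id": 9, "text": "third", "followers": 30}\n'
)

# Keep only the first two tweets and discard the unwanted fields.
tweets = load_tweets(sample, limit=2, drop=("score", "user_id"))

# Remove non-ASCII characters from the tweet text.
for tweet in tweets:
    tweet["text"] = strip_non_ascii(tweet["text"])
```

With a real file, `sample` would be replaced by the object returned from `open("yourfilename", encoding="utf-8")`, and the resulting list of dictionaries can be passed to `pd.DataFrame(tweets)` for the analysis step.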