(Originally from a previous question, but reframed as a more general problem.)
This is a sample JSON file with the 2 records I am working with:
[{"Time":"2016-01-10",
"ID":13567,
"Content":{
"Event":"UPDATE",
"Id":{"EventID":"ABCDEFG"},
"Story":[{
"@ContentCat":"News",
"Body":"Related Meeting Memo: Engagement with target firm for potential M&A. Please be on call this weekend for news updates.",
"BodyTextType":"PLAIN_TEXT",
"DerivedId":{"Entity":[{"Id":"Amy","Score":70}, {"Id":"Jon","Score":70}]},
"DerivedTopics":{"Topics":[
{"Id":"Meeting","Score":70},
{"Id":"Performance","Score":70},
{"Id":"Engagement","Score":100},
{"Id":"Salary","Score":70},
{"Id":"Career","Score":100}]
},
"HotLevel":0,
"LanguageString":"ENGLISH",
"Metadata":{"ClassNum":50,
"Headline":"Attn: Weekend",
"WireId":2035,
"WireName":"IIS"},
"Version":"Original"}
]},
"yyyymmdd":"20160110",
"month":201601},
{"Time":"2016-01-12",
"ID":13568,
"Content":{
"Event":"DEAL",
"Id":{"EventID":"ABCDEFG2"},
"Story":[{
"@ContentCat":"Details",
"Body":"Test email contents",
"BodyTextType":"PLAIN_TEXT",
"DerivedId":{"Entity":[{"Id":"Bob","Score":100}, {"Id":"Jon","Score":70}, {"Id":"Jack","Score":60}]},
"DerivedTopics":{"Topics":[
{"Id":"Meeting","Score":70},
{"Id":"Engagement","Score":100},
{"Id":"Salary","Score":70},
{"Id":"Career","Score":100}]
},
"HotLevel":0,
"LanguageString":"ENGLISH",
"Metadata":{"ClassNum":70,
"Headline":"Attn: Weekend",
"WireId":2037,
"WireName":"IIS"},
"Version":"Original"}
]},
"yyyymmdd":"20160112",
"month":201602}]
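I am trying to get a DataFrame at the entity-ID level (extracting Amy and Jon from record 1, and Bob, Jon, and Jack from record 2). How can I do this?
To clarify, the nesting levels are Content > Story > DerivedId > Entity > Id.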
Answer 0 (score: 2)
Using a list comprehension, you can reach into that structure, for example:
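import json
import pandas as pd

with open('test.json') as f:  # the old 'rU' mode was removed in Python 3.11
    data = json.load(f)
# sum(..., []) concatenates the per-record Entity lists into one flat list
df = pd.DataFrame(sum([i['Content']['Story'][0]['DerivedId']['Entity']
                       for i in data], []))
print(df)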
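The `sum(..., [])` idiom can look odd at first: it starts from an empty list and uses `+` to append each record's `Entity` list in turn. A minimal sketch on made-up stand-in lists (the names `per_record` and `flat` are illustrative and not part of the original data file):

```python
# Hypothetical stand-in for the Entity lists pulled from each record
per_record = [
    [{"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}],
    [{"Id": "Bob", "Score": 100}],
]

# sum() starts from [] and repeatedly applies +, i.e. [] + list1 + list2
flat = sum(per_record, [])

print([row["Id"] for row in flat])
```

Each `+` builds a new list, so the cost grows quadratically with the number of records. That is fine for small inputs.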
Alternatively, if you have a lot of data and would rather avoid the clunky sum(), use itertools.chain.from_iterable, for example:
import itertools as it
# chain.from_iterable lazily yields each entity dict, with no intermediate lists
df = pd.DataFrame.from_records(it.chain.from_iterable(
    i['Content']['Story'][0]['DerivedId']['Entity'] for i in data))
Results:
     Id  Score
0   Amy     70
1   Jon     70
2   Bob    100
3   Jon     70
4  Jack     60
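Both snippets in this answer read only `Story[0]` and drop the record-level fields. If a record can hold several stories, or you want to know which record each entity came from, a generator can follow the full Content > Story > DerivedId > Entity path. This is only a sketch: the helper name `iter_entities`, the `RecordID` column, and the use of `.get()` to skip missing levels are assumptions rather than part of the original answer, and the inline `data` is a trimmed copy of the sample records.

```python
def iter_entities(records):
    """Yield one flat dict per entity, tagged with its parent record's ID."""
    for rec in records:
        # .get() with empty defaults lets records missing a level be skipped
        for story in rec.get("Content", {}).get("Story", []):
            for ent in story.get("DerivedId", {}).get("Entity", []):
                yield {"RecordID": rec.get("ID"), **ent}


# Trimmed copy of the two sample records (only the fields used here)
data = [
    {"ID": 13567, "Content": {"Story": [{"DerivedId": {"Entity": [
        {"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}]}}]}},
    {"ID": 13568, "Content": {"Story": [{"DerivedId": {"Entity": [
        {"Id": "Bob", "Score": 100}, {"Id": "Jon", "Score": 70},
        {"Id": "Jack", "Score": 60}]}}]}},
]

rows = list(iter_entities(data))
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` should give the same `Id` and `Score` columns as in the results table, plus a `RecordID` column identifying the source record.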