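I have a DataFrame with a column, event_json, that holds JSON objects as strings (the objects do not all have the same keys). I want to split this column into new columns, one per JSON key.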
Creating the df:
import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

d = [{'event_datetime': '2019-01-08 00:09:30',
      'event_json': '{"lvl":"450","tok":"1212","snum":"257","udid":"122112"}',
      'event_name': 'AdsClick'},
     {'event_datetime': '2019-01-08 00:43:21',
      'event_json': '{"lvl":"902","udid":"3123","tok":"4214","snum":"1387"}',
      'event_name': 'AdsClick'},
     # the third record has no event_name, so that cell becomes NaN
     {'event_datetime': '2019-02-08 00:05:01',
      'event_json': '{"lvl":"1415","udid":"214124","tok":"213123","snum":"2416","col12":"2416","col13":"2416"}'}]
df12 = json_normalize(d)
Example:
        event_datetime                                         event_json event_name
0  2019-01-08 00:09:30  {"lvl":"450","tok":"1212","snum":"257","udid":...   AdsClick
1  2019-01-08 00:43:21  {"lvl":"902","udid":"3123","tok":"4214","snum"...   AdsClick
2  2019-02-08 00:05:01  {"lvl":"1415","udid":"214124","tok":"213123","...        NaN
Right now I am using this code:
df12 = df12.merge(df12['event_json'].apply(lambda x: pd.Series(json.loads(x))), left_index=True, right_index=True)
Result:
        event_datetime                                         event_json event_name   lvl  snum     tok    udid col12 col13
0  2019-01-08 00:09:30  {"lvl":"450","tok":"1212","snum":"257","udid":...   AdsClick   450   257    1212  122112   NaN   NaN
1  2019-01-08 00:43:21  {"lvl":"902","udid":"3123","tok":"4214","snum"...   AdsClick   902  1387    4214    3123   NaN   NaN
2  2019-02-08 00:05:01  {"lvl":"1415","udid":"214124","tok":"213123","...        NaN  1415  2416  213123  214124  2416  2416
But it is very slow. Do you have any ideas for faster code?
Answer 0 (score: 1)
Use a list comprehension with the DataFrame constructor, and add the result to the original DataFrame with DataFrame.join:
df = df12.join(pd.DataFrame([json.loads(x) for x in df12['event_json']]))
print(df)
        event_datetime                                         event_json  \
0  2019-01-08 00:09:30  {"lvl":"450","tok":"1212","snum":"257","udid":...
1  2019-01-08 00:43:21  {"lvl":"902","udid":"3123","tok":"4214","snum"...
2  2019-02-08 00:05:01  {"lvl":"1415","udid":"214124","tok":"213123","...

  event_name col12 col13   lvl  snum     tok    udid
0   AdsClick   NaN   NaN   450   257    1212  122112
1   AdsClick   NaN   NaN   902  1387    4214    3123
2        NaN  2416  2416  1415  2416  213123  214124
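To check whether the list comprehension really is faster on your own data, a rough timing sketch like the one below may help. The synthetic rows, the row count of 2000, and the variable names are assumptions made for illustration; absolute timings will vary by machine and pandas version.

```python
import json
import time

import pandas as pd

# Synthetic stand-in for the event_json column (hypothetical data).
rows = ['{"lvl":"%d","tok":"%d"}' % (i, i * 2) for i in range(2000)]
frame = pd.DataFrame({"event_json": rows})

# Approach from the question: build one pd.Series per row inside apply.
start = time.perf_counter()
slow = frame["event_json"].apply(lambda s: pd.Series(json.loads(s)))
apply_seconds = time.perf_counter() - start

# Approach from this answer: parse everything first, build one DataFrame.
start = time.perf_counter()
fast = pd.DataFrame([json.loads(s) for s in frame["event_json"]])
listcomp_seconds = time.perf_counter() - start

print(f"apply + pd.Series:  {apply_seconds:.4f} s")
print(f"list comprehension: {listcomp_seconds:.4f} s")

# Both approaches should produce the same columns and values.
assert list(slow.columns) == list(fast.columns)
assert slow.to_numpy().tolist() == fast.to_numpy().tolist()
```

The apply version constructs a separate Series object for every row, while the list comprehension hands all parsed dicts to the DataFrame constructor in a single call, which is the likely source of any difference you measure.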
If you also need to remove the source column, use DataFrame.pop:
df = df12.join(pd.DataFrame([json.loads(x) for x in df12.pop('event_json')]))
print(df)
        event_datetime event_name col12 col13   lvl  snum     tok    udid
0  2019-01-08 00:09:30   AdsClick   NaN   NaN   450   257    1212  122112
1  2019-01-08 00:43:21   AdsClick   NaN   NaN   902  1387    4214    3123
2  2019-02-08 00:05:01        NaN  2416  2416  1415  2416  213123  214124
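Two details are worth keeping in mind with the snippets above: DataFrame.join aligns rows on the index, and DataFrame.pop removes the column from df12 itself. The sketch below wraps the same idea in a helper that reuses the original index and leaves the input untouched. The function name expand_json_column, its drop parameter, and the sample data are hypothetical, introduced only for this example.

```python
import json

import pandas as pd


def expand_json_column(frame, column="event_json", drop=False):
    """Return a new DataFrame with the JSON strings in `column` expanded into columns."""
    parsed = [json.loads(s) for s in frame[column]]
    # Reuse the caller's index so join lines rows up correctly even when
    # the frame was filtered or re-indexed and is no longer 0..n-1.
    expanded = pd.DataFrame(parsed, index=frame.index)
    # drop() returns a copy, so the caller's frame is not modified.
    base = frame.drop(columns=column) if drop else frame
    return base.join(expanded)


# Hypothetical sample with a deliberately non-default index.
sample = pd.DataFrame(
    {
        "event_name": ["AdsClick", "AdsClick", None],
        "event_json": [
            '{"lvl":"450","tok":"1212"}',
            '{"lvl":"902","tok":"4214"}',
            '{"lvl":"1415","tok":"213123","col12":"2416"}',
        ],
    },
    index=[10, 20, 30],
)

result = expand_json_column(sample, drop=True)
print(result)
```

If a JSON key has the same name as an existing column, join raises a ValueError unless you pass lsuffix or rsuffix, so you may need to rename or prefix the expanded columns first.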