I have some data from which I need to extract specific information. The data looks like this:
pid log Date
91 json D1
189 json D2
276 json D3
293 json D4
302 json D5
302 json D6
343 json D7
The log column holds a JSON string stored in a column of an Excel file, like this:
{"Before":{"freq_term":"Daily","ideal_pmt":"246.03","datetime":"2015-01-08 06:26:11},"After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}
{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
Now, what I want is output like this:
{
  "pid": 91,
  "Date": "2016-05-15 03:54:24",
  "Before": {
    "freq_term": "Daily"
  },
  "After": {
    "freq_term": "Weekly"
  }
}
Basically, from the log I only want "freq_term" and "datetime" under "Before" and "After". Here is the code I have so far. When I run it, I get the error: list object is not callable. Any help is appreciated. Thanks.
import pandas as pd
data = pd.read_excel("C:\\Users\\Desktop\\dealChange.xlsx")
df = pd.DataFrame(data, columns = ['pid', 'log', 'date'])
li = df.to_dict('records')
dict(kv for d in li for kv in d.iteritems()) # error: list obj is not callable
How can I convert the list to a dictionary so that I can access only the data I need?
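Calling `dict(...)` only raises "'list' object is not callable" when the name `dict` has been rebound to a list. The full session isn't shown, so that is an assumption: somewhere earlier, something like `dict = [...]` was run. The sketch below reproduces the error. It also shows that, even with the built-in restored (and Python 3's `items()` in place of Python 2's `iteritems()`), merging every record into one dict keeps only the last row, because later keys overwrite earlier ones. The `"<json 1>"` values are placeholders.

```python
# Assumption: earlier in the session the built-in name `dict` was rebound to a list.
dict = ["shadowing", "the", "built-in"]   # `dict` now refers to a list
try:
    dict(a=1)
except TypeError as exc:
    message = str(exc)                    # "'list' object is not callable"
del dict                                  # drop the global; `dict` resolves to the built-in again

# Even with the name restored, merging all records into a single dict lets
# later rows overwrite earlier ones, so only the last record survives.
li = [
    {"pid": 91, "log": "<json 1>", "Date": "D1"},
    {"pid": 189, "log": "<json 2>", "Date": "D2"},
]
merged = dict(kv for d in li for kv in d.items())
print(message)
print(merged)   # only the second record's values remain
```

So the fix is both to stop shadowing `dict` and to process each record separately rather than flattening the whole list.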
Answer

I believe you need:
import pandas as pd
df = pd.DataFrame({'log':['{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}','{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}']})
print (df)
log
0 {"Before":{"freq_term":"Daily","ideal_pmt":"63...
1 {"Before":{"buy_rate":"1.180","irr":"31.63","u...
First convert the values to nested dictionaries, then filter them with a nested dictionary comprehension:
import json
df['log'] = df['log'].apply(json.loads)
L1 = ['Before','After']
L2 = ['freq_term','datetime']
f = lambda x: {k:{k1:v1 for k1,v1 in v.items() if k1 in L2} for k,v in x.items() if k in L1}
df['new'] = df['log'].apply(f)
print (df)
log \
0 {'After': {'ideal_pmt': '3346.88', 'freq_term'...
1 {'After': {'ideal_pmt': '2583.33', 'freq_term'...
new
0 {'After': {'freq_term': 'Weekly', 'datetime': ...
1 {'After': {'freq_term': 'Bi-Monthly'}, 'Before...
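As a minimal, pandas-free sketch of the same convert-then-filter idea, the snippet below uses the standard `json` module and builds records in the shape the question asks for (`pid`, `Date`, `Before`, `After`). The pairing of pids and the `D1`/`D2` dates with these two log strings is made up for illustration, and the names `pick`, `KEEP_OUTER` and `KEEP_INNER` are hypothetical.

```python
import json

KEEP_OUTER = ("Before", "After")          # top-level sections to keep
KEEP_INNER = ("freq_term", "datetime")    # fields to keep inside each section

def pick(log: dict) -> dict:
    """Keep only the KEEP_INNER fields inside the KEEP_OUTER sections."""
    return {
        section: {k: v for k, v in fields.items() if k in KEEP_INNER}
        for section, fields in log.items()
        if section in KEEP_OUTER
    }

# Illustrative rows: (pid, Date, raw log string); the pairing is hypothetical.
rows = [
    (91, "D1", '{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},'
               '"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}'),
    (189, "D2", '{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"},'
                ' "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}'),
]

# One output record per row: pid and Date first, then the filtered sections.
result = [{"pid": pid, "Date": date, **pick(json.loads(log))} for pid, date, log in rows]
print(json.dumps(result[0], indent=2))
```

With a DataFrame, the same function could be applied per row, e.g. `df['log'].apply(json.loads).apply(pick)`, and then combined with the `pid` and date columns.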
Edit:
To find all rows whose values cannot be parsed, you can use:
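import ast

def f(x):
    # Return the parsed value, or 1 as a marker when the string cannot be parsed.
    # Apply this to the raw 'log' strings, i.e. before the conversion to dictionaries above.
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return 1

print (df[df['log'].apply(f) == 1])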
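Because the log column holds JSON, a variant of this check could parse with `json.loads` instead of `ast.literal_eval`. `literal_eval` rejects JSON's `true`/`false`/`null`, while `json.loads` rejects Python-only syntax such as single-quoted strings. Here is a minimal sketch with a hypothetical helper name, `find_unparsable`. It flags the first sample string from the question, whose `datetime` value is missing its closing quote. `TypeError` is caught too, because pandas reads empty Excel cells as `NaN` (a float).

```python
import json

def find_unparsable(logs):
    """Return (index, error message) for every entry json.loads rejects."""
    bad = []
    for i, text in enumerate(logs):
        try:
            json.loads(text)
        except (json.JSONDecodeError, TypeError) as exc:  # TypeError: non-string cells such as NaN
            bad.append((i, str(exc)))
    return bad

logs = [
    # first sample from the question: the closing quote after 06:26:11 is missing
    '{"Before":{"freq_term":"Daily","ideal_pmt":"246.03","datetime":"2015-01-08 06:26:11},"After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}',
    '{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}',
]
print(find_unparsable(logs))
```

With pandas, the raw column could be passed directly, e.g. `find_unparsable(df['log'].tolist())`, and the returned positions used to inspect those rows.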