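There are many questions about converting JSON to a pandas DataFrame, but none of them solves my problem. I am practicing on this nested JSON file, which looks like this: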
{
"type" : "FeatureCollection",
"features" : [ {
"Id" : 265068000,
"type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [ 22.170376666666666, 65.57273333333333 ]
},
"properties" : {
"timestampExternal" : 1529151039629
}
}, {
"Id" : 265745760,
"type" : "Feature",
"geometry" : {
"type" : "Point",
"coordinates" : [ 20.329506666666667, 63.675425000000004 ]
},
"properties" : {
"timestampExternal" : 1529151278287
}
} ]
}
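I want to use pd.read_json() to convert this JSON directly into a pandas DataFrame. My main goal is to extract Id, Coordinates and timestampExternal. Because the JSON is deeply nested, a plain pd.read_json() call does not give the correct output. Can you suggest how to solve this? The expected output looks like this: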
Id,Coordinates,timestampExternal
265068000,[22.170376666666666, 65.57273333333333],1529151039629
265745760,[20.329506666666667, 63.675425000000004],1529151278287
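Answer 0 (score: 4)
You can load the JSON file into a dictionary with the json module. Then use a list comprehension that builds one dict per feature to extract the attributes you want as columns: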
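import json
import pandas as pd

# json.load() expects an open file object, not a path string
with open('/path/to/json') as f:
    _json = json.load(f)

# Build one dict per feature, keeping only the fields of interest
df_dict = [{'id': item['Id'], 'coordinates': item['geometry']['coordinates'],
            'timestampExternal': item['properties']['timestampExternal']} for item in _json['features']]
extracted_df = pd.DataFrame(df_dict)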
>>>
coordinates id timestampExternal
0 [22.170376666666666, 65.57273333333333] 265068000 1529151039629
1 [20.329506666666667, 63.675425000000004] 265745760 1529151278287
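The column order in the output above depends on the pandas version: older releases sorted dict keys alphabetically, while newer ones keep insertion order. If you need the exact header from the question, here is a minimal sketch that sets the order explicitly through the columns argument. The sample data is copied from the question and inlined so the snippet runs without a file; the names raw, rows and ordered_df are only illustrative.

```python
import json
import pandas as pd

# Sample copied from the question, inlined so the sketch runs without a file
raw = '''
{"type": "FeatureCollection",
 "features": [
  {"Id": 265068000, "type": "Feature",
   "geometry": {"type": "Point",
                "coordinates": [22.170376666666666, 65.57273333333333]},
   "properties": {"timestampExternal": 1529151039629}},
  {"Id": 265745760, "type": "Feature",
   "geometry": {"type": "Point",
                "coordinates": [20.329506666666667, 63.675425000000004]},
   "properties": {"timestampExternal": 1529151278287}}
 ]}
'''
features = json.loads(raw)['features']

# One dict per feature, using the column names requested in the question
rows = [{'Id': f['Id'],
         'Coordinates': f['geometry']['coordinates'],
         'timestampExternal': f['properties']['timestampExternal']}
        for f in features]

# columns= pins the column order regardless of pandas version
ordered_df = pd.DataFrame(rows, columns=['Id', 'Coordinates', 'timestampExternal'])
print(ordered_df)
```

Each Coordinates cell still holds a Python list, as in the expected output from the question.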
Answer 1 (score: 1)
You can read the JSON directly and then pass the features array to pandas as a list of dicts:
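import json
import pandas as pd

# Plain 'r' mode: the old 'rU' mode is rejected by recent Python 3 versions
with open('test.json', 'r') as f:
    data = json.load(f)

# One dict per feature; each dict becomes a row
df = pd.DataFrame([dict(id=datum['Id'],
                        coords=datum['geometry']['coordinates'],
                        ts=datum['properties']['timestampExternal'],
                        )
                   for datum in data['features']])
print(df)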
coords id ts
0 [22.170376666666666, 65.57273333333333] 265068000 1529151039629
1 [20.329506666666667, 63.675425000000004] 265745760 1529151278287
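As a possible alternative to hand-written comprehensions, pandas can flatten the nested dicts itself with pd.json_normalize. This is a sketch that assumes pandas 1.0 or later, where json_normalize is available at the top level; it is not part of either answer above. The data dict is inlined from the question, and the names flat and result are only illustrative.

```python
import pandas as pd

# Same structure as the JSON in the question, written as a Python dict
data = {
    "type": "FeatureCollection",
    "features": [
        {"Id": 265068000, "type": "Feature",
         "geometry": {"type": "Point",
                      "coordinates": [22.170376666666666, 65.57273333333333]},
         "properties": {"timestampExternal": 1529151039629}},
        {"Id": 265745760, "type": "Feature",
         "geometry": {"type": "Point",
                      "coordinates": [20.329506666666667, 63.675425000000004]},
         "properties": {"timestampExternal": 1529151278287}},
    ],
}

# Nested dicts become dotted column names; list values are kept as lists
flat = pd.json_normalize(data["features"])

# Keep only the three wanted columns and give them the names from the question
result = flat[["Id", "geometry.coordinates", "properties.timestampExternal"]].rename(
    columns={"geometry.coordinates": "Coordinates",
             "properties.timestampExternal": "timestampExternal"})
print(result)
```

This avoids spelling out every nested key lookup, at the cost of a rename step to get rid of the dotted column names.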