Loading JSON data into a pandas DataFrame and creating custom columns

Date: 2019-02-13 00:59:34

Tags: python json pandas dataframe

I have been able to extract the columns I want, apart from trouble with 'lat_long'. Here is a sample of the JSON I am working with:

{
    ":@computed_region_amqz_jbr4": "587",
    ":@computed_region_d3gw_znnf": "18",
    ":@computed_region_nmsq_hqvv": "55",
    ":@computed_region_r6rf_p9et": "36",
    ":@computed_region_rayf_jjgk": "295",
    "arrests": "1",
    "county_code": "44",
    "county_code_text": "44",
    "county_name": "Mifflin",
    "fips_county_code": "087",
    "fips_state_code": "42",
    "incident_count": "1",
    "lat_long": {
      "type": "Point",
      "coordinates": [
        -77.620031,
        40.612749
      ]
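    }
}

The code I use to select the columns is:

# PRINTS OUT SPECIFIED COLUMNS
col_titles = ['county_name', 'incident_count', 'lat_long']
df = df.reindex(columns=col_titles)

However, with this code 'lat_long' is not added to the DataFrame in a usable form.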

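My thinking is that once I figure out how to add the coordinates to the DataFrame correctly, I can create two separate columns, one for latitude and one for longitude.

Any help with this would be greatly appreciated. Thank you.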

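2 Answers:

Answer 0 (score: 0)

If I have not misunderstood your requirement, you can try it this way with json_normalize. I have only added a demo for a single JSON object; for multiple records you can use apply with a lambda.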

import pandas as pd

# Sample record from the question
data = {":@computed_region_amqz_jbr4":"587",":@computed_region_d3gw_znnf":"18",":@computed_region_nmsq_hqvv":"55",":@computed_region_r6rf_p9et":"36",":@computed_region_rayf_jjgk":"295","arrests":"1","county_code":"44","county_code_text":"44","county_name":"Mifflin","fips_county_code":"087","fips_state_code":"42","incident_count":"1","lat_long":{"type":"Point","coordinates":[-77.620031,40.612749]}}

# Flatten nested keys into dotted column names such as 'lat_long.coordinates'.
# pd.json_normalize is the top-level name since pandas 1.0; older versions
# exposed it as pandas.io.json.json_normalize.
df = pd.json_normalize(data)
# .copy() avoids a SettingWithCopyWarning when the new columns are added
df_modified = df[['county_name', 'incident_count', 'lat_long.type']].copy()
# The sample coordinates are ordered [longitude, latitude]
df_modified['lng'] = df['lat_long.coordinates'][0][0]
df_modified['lat'] = df['lat_long.coordinates'][0][1]
print(df_modified)
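
For more than one record, the apply/lambda idea mentioned above might look like the sketch below. It assumes the records are already parsed into a list of dicts; the second record and the names `records`, `flat`, and `result` are made up for illustration and are not part of the original data.

```python
import pandas as pd

# First record is trimmed from the question; the second is invented
# purely so there is more than one row to work with.
records = [
    {"county_name": "Mifflin", "incident_count": "1",
     "lat_long": {"type": "Point", "coordinates": [-77.620031, 40.612749]}},
    {"county_name": "Example", "incident_count": "2",
     "lat_long": {"type": "Point", "coordinates": [-75.0, 40.0]}},
]

# One row per record; nested keys become dotted column names
flat = pd.json_normalize(records)

result = flat[["county_name", "incident_count"]].copy()
coords = flat["lat_long.coordinates"]
# Each cell holds a [longitude, latitude] list, so index into it per row
result["lng"] = coords.apply(lambda c: c[0])
result["lat"] = coords.apply(lambda c: c[1])
print(result)
```

This assumes every record has a 'lat_long' entry; rows without coordinates would need a guard inside the lambda.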

Answer 1 (score: 0)

Here is another way you can do it:

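import pandas as pd

# df is the parsed JSON (a dict or a list of dicts) from the question
df1 = pd.json_normalize(df)

# Expand each [longitude, latitude] list into its own columns,
# then drop the original nested columns
result = pd.concat([df1, df1['lat_long.coordinates'].apply(pd.Series) \
  .rename(columns={0: 'long', 1: 'lat'})], axis=1) \
  .drop(columns=['lat_long.coordinates', 'lat_long.type'])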
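
As a variation on the same idea, the coordinate lists can be expanded by building a small DataFrame from them instead of calling apply(pd.Series). This is only a sketch under the assumption that every row holds a two-element [longitude, latitude] list; `record`, `coords`, and `out` are illustrative names.

```python
import pandas as pd

# Trimmed version of the record from the question
record = {"county_name": "Mifflin", "incident_count": "1",
          "lat_long": {"type": "Point", "coordinates": [-77.620031, 40.612749]}}

df1 = pd.json_normalize(record)

# Turn the column of [longitude, latitude] lists into two named columns,
# keeping the same index so the join lines up row by row
coords = pd.DataFrame(df1["lat_long.coordinates"].tolist(),
                      columns=["long", "lat"], index=df1.index)

out = df1.drop(columns=["lat_long.coordinates", "lat_long.type"]).join(coords)
print(out)
```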