How to flatten a JSON field in pandas

Time: 2018-12-29 05:31:03

Tags: python pandas

I am loading a JSON file that contains nested values. This is how I load it:

>>> pd.read_json('/Users/david/Desktop/validate_headers/json/example_array2.json')
                                  address firstname   lastname  zip_code
0     {'state': 'MI', 'town': 'Dearborn'}    Jimmie  Barninger     12345
1  {'state': 'CA', 'town': 'Los Angeles'}      John        Doe     90027

I would like to flatten the nested objects so that the final DataFrame looks like this:

firstname   lastname    zip_code    address.state   address.town
Jimmie      Barninger   12345       MI              Dearborn
John        Doe         90027       CA              Los Angeles

How can I do this? That is, if a DataFrame column holds an object, split that column into multiple columns, and keep doing so until no JSON objects remain.

4 answers:

Answer 0: (score: 2)

If the values in your address column are strings rather than dictionaries, you can convert them to dictionaries like this:

import ast
df.address = [ast.literal_eval(df.address[i]) for i in df.index]

Then:

df.address.apply(pd.Series)

    state   town
0   MI  Dearborn
1   CA  Los Angeles

I don't know how large your dataset is, but the conversion can also be done as follows:

def literal_return(val):
    # Parse the value as a Python literal; return it unchanged if parsing fails.
    try:
        return ast.literal_eval(val)
    except (ValueError, SyntaxError):
        return val

df.address.apply(literal_return)

>>%timeit [ast.literal_eval(df.address[i]) for i in df.index]
144 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>%timeit df.address.apply(literal_return)
454 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
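
Putting the two steps above together, here is a minimal sketch that starts from string-encoded dictionaries and ends with the flattened frame. The sample data is copied from the question, and the `address.` prefix is an assumption chosen to match the desired output shown there.

```python
import ast

import pandas as pd

# Sample data from the question, with the address stored as a string.
df = pd.DataFrame({
    "address": ["{'state': 'MI', 'town': 'Dearborn'}",
                "{'state': 'CA', 'town': 'Los Angeles'}"],
    "firstname": ["Jimmie", "John"],
    "lastname": ["Barninger", "Doe"],
    "zip_code": [12345, 90027],
})

# Step 1: turn each string into a real dict.
df["address"] = df["address"].apply(ast.literal_eval)

# Step 2: expand the dicts into columns, prefix them with the parent
# column name, and attach them to the remaining columns.
expanded = df["address"].apply(pd.Series).add_prefix("address.")
flat = df.drop(columns="address").join(expanded)
print(flat)
```

If the column may contain values that are not valid literals, `literal_return` from above can be passed to `apply` in step 1 instead of `ast.literal_eval`.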

Answer 1: (score: 1)

Much simpler:

df = pd.DataFrame({'address': [{'state': 'MI', 'town': 'Dearborn'} , {'state': 'CA', 'town': 'Los Angeles'}], 'name':['John', 'Jane']})

df = df.join(df['address'].apply(pd.Series))

Then drop the original column (drop returns a new DataFrame, so assign the result):

df = df.drop(columns='address')
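
The question also asks to repeat the split until no JSON objects remain. One possible way to extend the join approach above is a loop like the sketch below. `flatten_dict_columns` is a hypothetical helper name, not a pandas function, and the sketch assumes that every cell in a nested column is either a dict or missing. The `geo` values in the sample are made-up placeholders.

```python
import pandas as pd


def flatten_dict_columns(df, sep="."):
    """Expand dict-valued columns repeatedly until none are left."""
    df = df.copy()
    while True:
        # A column counts as nested if any of its values is a dict.
        nested = [c for c in df.columns
                  if df[c].apply(lambda v: isinstance(v, dict)).any()]
        if not nested:
            return df
        for col in nested:
            # Assumption: non-dict cells in a nested column are missing
            # values, so they are treated as empty dicts (all-NaN rows).
            dicts = df[col].apply(lambda v: v if isinstance(v, dict) else {})
            expanded = pd.DataFrame(dicts.tolist(), index=df.index)
            expanded = expanded.add_prefix(f"{col}{sep}")
            df = df.drop(columns=col).join(expanded)


# Two levels of nesting; the geo values are placeholders for illustration.
df = pd.DataFrame({
    "address": [
        {"state": "MI", "town": "Dearborn", "geo": {"lat": 1.0, "lon": 2.0}},
        {"state": "CA", "town": "Los Angeles", "geo": {"lat": 3.0, "lon": 4.0}},
    ],
    "name": ["John", "Jane"],
})
flat = flatten_dict_columns(df)
print(flat.columns.tolist())
```

Each pass removes one level of nesting, so the loop runs once per level in the deepest column.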

Answer 2: (score: 1)

Here is one approach. It assumes the address values are strings in the form shown in the question:

# After splitting on ',', index 0 holds the 'state' entry and index 1 the 'town' entry.
df['state'] = df.address.apply(lambda x: x.split(',')[0].split(':')[1].replace("'", "").replace("}", "").strip())
df['town'] = df.address.apply(lambda x: x.split(',')[1].split(':')[1].replace("'", "").replace("}", "").strip())
df.drop(columns=['address'], inplace=True)

Answer 3: (score: 0)

Use the following (reference

from pandas.io.json import json_normalize
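
In pandas 1.0 and later the same function is exposed at the top level as `pandas.json_normalize`. A minimal sketch, assuming the file content can be loaded as a list of dicts (the records below are copied from the question):

```python
import pandas as pd

records = [
    {"address": {"state": "MI", "town": "Dearborn"},
     "firstname": "Jimmie", "lastname": "Barninger", "zip_code": 12345},
    {"address": {"state": "CA", "town": "Los Angeles"},
     "firstname": "John", "lastname": "Doe", "zip_code": 90027},
]

# Nested keys are joined with sep, giving address.state and address.town.
flat = pd.json_normalize(records, sep=".")
print(flat)
```

If the data is already in a DataFrame, `pd.json_normalize(df.to_dict(orient="records"))` should give the same result, since `json_normalize` flattens nested dicts at every level by default.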