Question

I have a DataFrame containing (record-formatted) JSON strings, as follows:

In[9]: pd.DataFrame( {'col1': ['A','B'], 'col2': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]', 
                                                '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']})

Out[9]: 
  col1                                               col2
0    A  [{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"2...
1    B  [{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"1...

I want to extract the JSON and add a new row to the DataFrame for each record:

    col1  t         v
0   A     05:15:00  20
1   A     05:20:00  25
2   B     05:15:00  10
3   B     05:20:00  15

I have been trying the following code:

def json_to_df(x):
    df2 = pd.read_json(x.col2)
    return df2

df.apply(json_to_df, axis=1)

But the resulting DataFrames come back packed together like a tuple instead of becoming new rows. Any suggestions?

Answer 1

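The problem with apply is that you need to return multiple rows per input row, while it expects only one. A possible solution:

from io import StringIO

def json_to_df(row):
    _, row = row                                 # iterrows() yields (index, Series) pairs
    df_json = pd.read_json(StringIO(row.col2))   # parse this row's JSON records
    col1 = pd.Series([row.col1] * len(df_json), name='col1')
    return pd.concat([col1, df_json], axis=1)

frames = [json_to_df(r) for r in df.iterrows()]  # a list of dataframes, one per original row
df = pd.concat(frames)                           # glues them together (DataFrame.append was removed in pandas 2.0)
df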

col1    t   v
0   A   05:15   20
1   A   05:20   25
0   B   05:15   10
1   B   05:20   15
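
As an alternative to building one DataFrame per row, the same reshaping can be sketched with `json.loads` plus `DataFrame.explode`. This is not the method from the answer above, just a minimal sketch that assumes pandas 1.1 or newer (for the `ignore_index` argument of `explode`) and that every cell holds a valid JSON list of flat records. The names `exploded`, `records` and `result` are made up for illustration.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B'],
    'col2': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]',
             '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]'],
})

# Parse each JSON string into a list of dicts, then give every dict its own row.
exploded = df.assign(col2=df['col2'].map(json.loads)).explode('col2', ignore_index=True)

# Turn the dicts into columns and put them next to col1.
records = pd.DataFrame(exploded['col2'].tolist())
result = pd.concat([exploded[['col1']], records], axis=1)

# json.loads keeps "20.0" as a string, so convert v to numbers explicitly.
result['v'] = pd.to_numeric(result['v'])
```

Because `ignore_index=True` renumbers the rows, `result` gets a continuous 0..3 index rather than the repeated 0, 1, 0, 1 shown above.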

Answer 2

OK, taking a bit of inspiration from hellpanderrr's answer above, I came up with the following:

In [92]:
pd.DataFrame( {'X': ['A','B'], 'Y': ['fdsfds','fdsfds'], 'json': ['[{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"25.0"}]', 
                                                                       '[{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"15.0"}]']},)
Out[92]:
X   Y   json
0   A   fdsfds  [{"t":"05:15","v":"20.0"}, {"t":"05:20","v":"2...
1   B   fdsfds  [{"t":"05:15","v":"10.0"}, {"t":"05:20","v":"1...

In [93]:
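from io import StringIO

dfs = []
def json_to_df(row, json_col):
    # Parse this row's JSON records into their own DataFrame
    json_df = pd.read_json(StringIO(row[json_col]))
    # Copy the row's remaining columns (X, Y) onto every record, then collect the result
    dfs.append(json_df.assign(**row.drop(json_col)))

# `_` is IPython's last output, i.e. the DataFrame shown in Out[92]
_.apply(json_to_df, axis=1, json_col='json')
pd.concat(dfs)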

Out[93]:
t   v   X   Y
0   05:15   20  A   fdsfds
1   05:20   25  A   fdsfds
0   05:15   10  B   fdsfds
1   05:20   15  B   fdsfds

Pandas: parse JSON in a column and expand it into new rows of the DataFrame
