Question

I have read an Excel file into a pandas DataFrame using the read_excel function. One of the columns, 'filter', has dtype 'object' and actually contains JSON.

I tried using json_normalize as follows:

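import json
import pandas as pd
from pandas import json_normalize
# df is the DataFrame returned by read_excel; row is a single row of df
df.columns = df.columns.astype(str)
json_normalize(data=json.loads(row['filter'])['and']).ffill().bfill().drop_duplicates(keep='first')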

For an explanation of why ffill and bfill are used, see: How to combine multiple rows in a pandas dataframe which have only 1 non-null entry per column into one row?
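To illustrate that step, here is a minimal sketch on made-up data (the `sparse` frame and its column names are hypothetical, chosen to resemble what json_normalize produces for a list of single-key dicts). It assumes each column holds at most one distinct non-null value; if a column holds two different values, the rows will not collapse into one.

```python
import pandas as pd

# Hypothetical data: every row carries a value in only one column,
# and every column has exactly one non-null entry.
sparse = pd.DataFrame(
    {
        "label_1.eq": ["approved", None, None],
        "species.eq": [None, None, "k"],
        "cur.eq": [None, "JPY", None],
    }
)

# ffill copies each value downward and bfill copies it upward, so all rows
# become identical; drop_duplicates then keeps only the first of them.
collapsed = sparse.ffill().bfill().drop_duplicates(keep="first")
print(collapsed)
```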

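The structure of the JSON keeps changing, with multiple "category" keys inside the "or" key. In some cases there is no "or" key at all. Examples: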

{"and":[{"label_1":{"eq":"approved"}},{"or":[{"category":{"eq":"x > y"}},{"category":{"eq":"a > b > c"}}]},{"species":{"eq":"k"}}]}

{"and":[{"label_0":{"gt":4}},{"price":{"gt":999}},{"price":{"lt":199000}},{"label_1":{"eq":"other"}},{"cur":{"eq":"JPY"}}]}

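Idea of the expected result from parsing: I need all the keys from all observations as columns. If a key is not present in a particular observation, then its value should be NaN.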

How can I parse a column of nested dictionaries in a pandas DataFrame that was read from an .xlsx file?
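One possible approach, sketched under several assumptions rather than offered as a definitive answer: walk each JSON object recursively, treat "and" and "or" as lists of conditions, and record every other key as a "key.operator" column. The helper name `flatten_filter`, the "key.operator" naming, and the choice to gather repeated keys (such as the two "category" entries) into a list are all illustrative. The sketch also assumes every cell in 'filter' is a valid JSON string, and it discards the and/or logic itself.

```python
import json
import pandas as pd


def flatten_filter(raw):
    """Flatten one filter JSON string into a {"key.op": value} dict.

    Assumption: "and"/"or" map to lists of conditions, and every other key
    maps to a dict of operator -> operand.
    """
    out = {}

    def walk(node):
        for key, value in node.items():
            if key in ("and", "or"):
                for child in value:
                    walk(child)
            else:
                for op, operand in value.items():
                    out.setdefault(f"{key}.{op}", []).append(operand)

    walk(json.loads(raw))
    # Unwrap single values; keep a list only where a key occurred repeatedly.
    return {k: v[0] if len(v) == 1 else v for k, v in out.items()}


# The two example filters from the question, standing in for the Excel column.
df = pd.DataFrame(
    {
        "filter": [
            '{"and":[{"label_1":{"eq":"approved"}},{"or":[{"category":{"eq":"x > y"}},'
            '{"category":{"eq":"a > b > c"}}]},{"species":{"eq":"k"}}]}',
            '{"and":[{"label_0":{"gt":4}},{"price":{"gt":999}},{"price":{"lt":199000}},'
            '{"label_1":{"eq":"other"}},{"cur":{"eq":"JPY"}}]}',
        ]
    }
)

# Building a DataFrame from a list of dicts takes the union of all keys as
# columns and fills keys missing from a given row with NaN.
parsed = pd.DataFrame([flatten_filter(s) for s in df["filter"]], index=df.index)
print(parsed)
```

Because the result shares the index of `df`, it could be attached back with `df.join(parsed)` if the original columns are still needed.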

0 Answers: