Expand a column containing key-value pairs into their own columns

Time: 2019-09-20 23:33:19

Tags: python pandas

I have a pandas DataFrame that looks like this:

df = pd.DataFrame({'x':['''[{"key":"Gender","value":["Men"]},
  {"key":"Shoe Size","value":["M"]},
  {"key":"Shoe Category","value":["Men's Shoes"]},
  {"key":"Color","value":["Multicolor"]},
  {"key":"Manufacturer Part Number","value":["8190-W-NAVY-7.5"]},
  {"key":"Brand","value":["Josmo"]}]''',

  '''[{"key":"Gender","value":["Women"]},
  {"key":"Size","value":["XL"]},
 {"key":"Heel Height","value":["1 Inches"]}]'''], 

  'y':['A','B']})

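Essentially, column x holds a list of key-value pairs that I would like to extract into their own columns, and the keys are not consistent from row to row.

Any suggestions?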

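2 answers:

Answer 0 (score: 1)

Here is one possible solution. However, you have to work out all the possible keys beforehand. I suppose that could be done programmatically, but I have hard-coded them here. Also, if a value contains more than one item, only the first one is taken.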

import pandas as pd
import json

# original dataframe
df = pd.DataFrame({'x':['''[{"key":"Gender","value":["Men"]},
  {"key":"Shoe Size","value":["M"]},
  {"key":"Shoe Category","value":["Men's Shoes"]},
  {"key":"Color","value":["Multicolor"]},
  {"key":"Manufacturer Part Number","value":["8190-W-NAVY-7.5"]},
  {"key":"Brand","value":["Josmo"]}]''',

  '''[{"key":"Gender","value":["Women"]},
  {"key":"Shoe Size","value":["M"]},
  {"key":"Shoe Category","value":["Women's Shoes"]},
  {"key":"Color","value":["Multicolor"]},
  {"key":"Manufacturer Part Number","value":["8190-W-NAVY-7.5"]}]'''], 

  'y':['A','B']})

expanded_columns = ['Gender', 'Shoe Size', 'Shoe Category', 'Color',
                    'Manufacturer Part Number', 'Brand']

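# Parse the JSON text and return one value per expected column, in order
def json_to_cols(s):
  pairs = json.loads(s)
  d = {col: None for col in expanded_columns}

  for pair in pairs:
    # Skip keys that are not listed in expanded_columns; otherwise the
    # result would hold more values than there are column names
    if pair['key'] in d:
      d[pair['key']] = pair['value'][0]

  return list(d.values())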

# Create new dataframe with expanded columns
df1 = df.apply(lambda row: pd.Series(json_to_cols(row['x']), index=expanded_columns),
            axis=1)    
new_df = df.join(df1)
print(new_df)
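
The answer mentions that the list of keys could be found programmatically instead of being hard-coded. A minimal sketch of one way to do that follows, assuming every cell in x is valid JSON in the same list-of-pairs shape. The helper name collect_keys and the shortened sample data are illustrative and not part of the original answer.

```python
import json
import pandas as pd

# Shortened sample data in the same shape as the question (illustrative only)
df = pd.DataFrame({
    'x': ['[{"key":"Gender","value":["Men"]},{"key":"Brand","value":["Josmo"]}]',
          '[{"key":"Gender","value":["Women"]},{"key":"Size","value":["XL"]}]'],
    'y': ['A', 'B'],
})

def collect_keys(json_strings):
    """Return every distinct key found in the JSON strings, in first-seen order."""
    seen = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for s in json_strings:
        for pair in json.loads(s):
            seen.setdefault(pair['key'], None)
    return list(seen)

expanded_columns = collect_keys(df['x'])
print(expanded_columns)  # ['Gender', 'Brand', 'Size']
```

The resulting list can be used in place of the hard-coded expanded_columns, at the cost of one extra pass over the column.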

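Answer 1 (score: 0)

It is not entirely clear what you want, but the following code produces a DataFrame whose column names are taken from y, whose index is taken from the keys in x, and whose values are taken from the values in x, with NaN for any key that does not appear in a given row.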

import ast

# Build {y value: {key: first value}}; pandas aligns the keys as the index
output_df = pd.DataFrame(
    {
        row['y']: {
            pair['key']: pair['value'][0]
            for pair in ast.literal_eval(row['x'])
        }
        for _, row in df.iterrows()
    }
)

Output:

                                   A         B
Brand                               Josmo       NaN
Color                          Multicolor       NaN
Gender                                Men     Women
Heel Height                           NaN  1 Inches
Manufacturer Part Number  8190-W-NAVY-7.5       NaN
Shoe Category                 Men's Shoes       NaN
Shoe Size                               M       NaN
Size                                  NaN        XL
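
If the goal is instead one row per original record, with each key as its own column (which is how the question reads), a sketch along the following lines may be closer. It assumes each cell in x is valid JSON and that every value is a list of strings; multiple items are joined with a comma rather than dropped. The helper name pairs_to_dict and the shortened sample data are illustrative only.

```python
import json
import pandas as pd

# Shortened sample data in the same shape as the question (illustrative only)
df = pd.DataFrame({
    'x': ['[{"key":"Gender","value":["Men"]},{"key":"Brand","value":["Josmo"]}]',
          '[{"key":"Gender","value":["Women"]},{"key":"Size","value":["XL"]}]'],
    'y': ['A', 'B'],
})

def pairs_to_dict(s):
    """Turn one JSON list of {"key": ..., "value": [...]} pairs into a flat dict."""
    # Assumption: every value is a list of strings, so they can be joined
    return {pair['key']: ', '.join(pair['value']) for pair in json.loads(s)}

# One dict per row; pandas fills keys missing from a row with NaN
expanded = pd.DataFrame([pairs_to_dict(s) for s in df['x']], index=df.index)

# Keep the other original columns and attach the expanded ones
result = df.drop(columns='x').join(expanded)
print(result)
```

Because the columns come from the union of the keys across all rows, no list of keys has to be known in advance.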