Question

I have a pandas DataFrame with several id fields, plus another field that holds a dictionary of additional values that I need to attach to the id fields.

The code below does what I want, but it is slow. Is there a more efficient way to get the same result?

import pandas as pd

# Create sample table
a=[{'Feature1': 'aa1','Feature2': 'bb1','Feature3': 'cc2' },
 {'Feature1': 'aa2','Feature2': 'bb2', 'Feature3': 'abc' },
 {'Feature1': 'aa1','Feature2': 'cc1', 'Feature3': 'xyz' }
 ]
b=['num1','num2','num3']
c=['numa', 'numb', 'numc']

df = pd.DataFrame({'id1':b, 'id2':c, 'dic':a })

# Specify fields to construct the empty dataframe
cols = [
    'id1',
    'id2',
    'Feature1',
    'Feature2',
    'Feature3'
    ]
results = pd.DataFrame(columns=cols)

# Iterate through each row and grab values
for idx, row in df.iterrows():
    id_records = list(row[['id1', 'id2']])
    other_vals = list(row['dic'].values())
    results.loc[idx] = id_records+other_vals

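Edit: In my actual use case, some of the dictionaries are missing some keys. For example, the second row might not have 'Feature2'. I would like that field to be empty for that record. I'm not sure how to do this even in an inefficient way.

The code below defines data that is closer to my actual data.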

# Create sample table
a=[{'Feature1': 'aa1','Feature2': 'bb1','Feature3': 'cc2' },
 {'Feature1': 'aa2', 'Feature3': 'abc' },
 {'Feature1': 'aa1','Feature2': 'cc1', 'Feature3': 'xyz' }
 ]
b=['num1','num2','num3']
c=['numa', 'numb', 'numc']

df = pd.DataFrame({'id1':b, 'id2':c, 'dic':a })

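Solution:

A note in addition to the answers below: my actual data was not stored as real dictionaries. It was stored as strings that looked like dictionaries. I had to convert those strings to dictionaries first, and after that the answers below worked.

This is how I did it: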

import json

def convert_to_dict(string):
    return(json.loads(string))

df['fieldName'] = df.fieldName.apply(convert_to_dict)

After doing this, Andy's solution worked well for me.
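For readers with the same problem, here is a minimal end-to-end sketch of that conversion step, assuming the column holds valid JSON text. The sample values and the column name 'dic' are made up for illustration. Note that json.loads only accepts valid JSON (double-quoted keys and strings); if the strings are Python-style literals with single quotes, ast.literal_eval from the standard library is the usual alternative.

```python
import json

import pandas as pd

# Hypothetical sample: the 'dic' column holds JSON text, not dict objects.
df = pd.DataFrame({
    'id1': ['num1', 'num2'],
    'id2': ['numa', 'numb'],
    'dic': ['{"Feature1": "aa1", "Feature2": "bb1"}',
            '{"Feature1": "aa2"}'],
})

# Parse each string into a real dict.
df['dic'] = df['dic'].apply(json.loads)

# Expand the dicts into columns and attach them to the id columns.
# Keys missing from a given dict end up as NaN in that row.
expanded = pd.DataFrame(df['dic'].tolist(), index=df.index)
df_final = df.drop(columns='dic').join(expanded)
print(df_final)
```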

Answer 1

Edit: You updated your sample. A missing Feature is filled in as NaN.

Try this:

# In pandas 2.0+ the axis can no longer be passed positionally, so use columns=
df_final = df.drop(columns='dic').join(pd.DataFrame.from_dict(df.dic.to_dict(),
                                                              orient='index'))

Out[1082]:
    id1   id2 Feature1 Feature2 Feature3
0  num1  numa      aa1      bb1      cc2
1  num2  numb      aa2      NaN      abc
2  num3  numc      aa1      cc1      xyz
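To make the steps explicit, here is a self-contained sketch of the same approach on the updated sample from the question. The intermediate name `features` is only for illustration. `df['dic'].to_dict()` maps each row label to that row's dictionary, and `orient='index'` turns each outer key into a row and each inner key into a column, so a key that is absent from one dictionary becomes NaN in that row.

```python
import pandas as pd

a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature3': 'abc'},  # no 'Feature2' here
     {'Feature1': 'aa1', 'Feature2': 'cc1', 'Feature3': 'xyz'}]
df = pd.DataFrame({'id1': ['num1', 'num2', 'num3'],
                   'id2': ['numa', 'numb', 'numc'],
                   'dic': a})

# {0: {...}, 1: {...}, 2: {...}}: row label -> that row's dictionary
by_row = df['dic'].to_dict()

# Outer keys become the index, inner keys become the columns.
features = pd.DataFrame.from_dict(by_row, orient='index')

# join() aligns on the index, so each row gets its own feature values.
df_final = df.drop(columns='dic').join(features)
print(df_final)
```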

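A few other ways:

From @Shubham Sharma (the output below is for the original sample, in which every dictionary has all three keys):

df_final = df.drop(columns='dic').join(pd.DataFrame.from_dict(df.dic.to_dict(),
                                                              orient='index'))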

Out[1060]:
    id1   id2 Feature1 Feature2 Feature3
0  num1  numa      aa1      bb1      cc2
1  num2  numb      aa2      bb2      abc
2  num3  numc      aa1      cc1      xyz

From @anky:

df_final = df.drop(columns='dic').join(pd.DataFrame(df['dic'].tolist()))
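One thing to watch with the tolist() variant: pd.DataFrame(list_of_dicts) gets a fresh 0..n-1 index, while join() aligns on index labels. That is fine for the sample data, which uses the default index, but if your frame has been filtered or re-indexed, passing index=df.index keeps the rows lined up. A small sketch, with a made-up non-default index:

```python
import pandas as pd

df = pd.DataFrame(
    {'id1': ['num1', 'num2', 'num3'],
     'id2': ['numa', 'numb', 'numc'],
     'dic': [{'Feature1': 'aa1', 'Feature2': 'bb1'},
             {'Feature1': 'aa2'},
             {'Feature1': 'aa3', 'Feature2': 'cc1'}]},
    index=[10, 20, 30],  # hypothetical non-default index
)

# Reuse df.index so that join() matches each dict to its original row.
features = pd.DataFrame(df['dic'].tolist(), index=df.index)
df_final = df.drop(columns='dic').join(features)
print(df_final)
```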
