I have a CSV that I'm reading into a pandas DataFrame. However, one of the columns is in the form of a dictionary. Here is an example:
ColA, ColB, ColC, ColdD
20, 30, {"ab":"1", "we":"2", "as":"3"},"String"
How can I convert it into a DataFrame that looks like this:
ColA, ColB, AB, WE, AS, ColdD
20, 30, "1", "2", "3", "String"
Edit: I figured out the issue. It looks like this, but the column is a string that needs to be parsed, not a dict object.
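A minimal sketch of parsing the column while reading the file. It assumes the dict field is wrapped in double quotes in the real CSV (otherwise the commas inside the braces would be read as extra column separators) and uses single quotes internally; the inline `csv_text` and the uppercase renaming are illustrative assumptions. The `converters` argument of `pd.read_csv` runs `ast.literal_eval` on each `ColC` cell during loading:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV text: the dict-like field is quoted so that its inner commas
# are not treated as column separators (an assumption about the real file).
csv_text = """ColA,ColB,ColC,ColdD
20,30,"{'ab': '1', 'we': '2', 'as': '3'}",String
"""

# converters= applies ast.literal_eval to every ColC cell while reading,
# so the column holds real dict objects instead of strings.
df = pd.read_csv(io.StringIO(csv_text), converters={"ColC": ast.literal_eval})

# One new column per dict key, aligned on the original row index.
expanded = pd.DataFrame(df["ColC"].tolist(), index=df.index)
expanded.columns = [c.upper() for c in expanded.columns]

# Place the new columns between ColB and ColdD, dropping ColC.
result = pd.concat([df[["ColA", "ColB"]], expanded, df[["ColdD"]]], axis=1)
result = result[["ColA", "ColB", "AB", "WE", "AS", "ColdD"]]
print(result)
```

If the real file quotes the inner keys with double quotes instead, they would need to be doubled (`""ab""`) inside the quoted field for the CSV parser to accept them.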
Answer 0 (score: 11)
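As per https://stackoverflow.com/a/38231651/454773, you can use .apply(pd.Series) to map the column containing the dict onto new columns, and then concatenate those new columns back onto the original DataFrame, minus the column that held the original dict: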
import pandas as pd

dw = pd.DataFrame([[20, 30, {"ab":"1", "we":"2", "as":"3"}, "String"]],
                  columns=['ColA', 'ColB', 'ColC', 'ColdD'])
pd.concat([dw.drop(['ColC'], axis=1), dw['ColC'].apply(pd.Series)], axis=1)
Returns:
ColA ColB ColdD ab as we
20 30 String 1 3 2
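If `ColC` holds strings rather than dicts, as the question's edit says, the same `.apply(pd.Series)` approach should still work once each string is parsed first. A sketch, assuming every cell is a valid Python literal (the JSON-style text from the question is one):

```python
import ast

import pandas as pd

# Same one-row frame as above, except ColC holds the *string* form of the dict.
dw = pd.DataFrame(
    [[20, 30, '{"ab":"1", "we":"2", "as":"3"}', "String"]],
    columns=["ColA", "ColB", "ColC", "ColdD"],
)

# Parse each string into a dict, then let apply(pd.Series) spread the keys into columns.
parsed = dw["ColC"].apply(ast.literal_eval).apply(pd.Series)

# Drop the original string column and append the new ones.
out = pd.concat([dw.drop(["ColC"], axis=1), parsed], axis=1)
print(out)
```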
Answer 1 (score: 9)
Start with your one-row df:
Col A Col B Col C Col D
0 20 30 {u'we': 2, u'ab': 1, u'as': 3} String1
Edit: based on the OP's comment, I'm assuming we first need to convert the strings:
import ast
# Parse each string in "Col C" into a dict (column name must match the df above)
df["Col C"] = df["Col C"].map(ast.literal_eval)
Then we convert Col C to a dict, transpose it, and join it to the original df:
dfNew = df.join(pd.DataFrame(df["Col C"].to_dict()).T)
dfNew
This gives you:
Col A Col B Col C Col D ab as we
0 20 30 {u'we': 2, u'ab': 1, u'as': 3} String1 1 3 2
Then we just select the columns we want from dfNew:
dfNew[["Col A", "Col B", "ab", "we", "as", "Col D"]]
Col A Col B ab we as Col D
0 20 30 1 2 3 String1
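The steps of this answer can be combined into one self-contained sketch. It assumes the column is named "Col C" everywhere (matching the df shown above) and that its cells start out as strings; the intermediate names `nested` and `wide` are only for illustration:

```python
import ast

import pandas as pd

# One-row frame mirroring the answer, with "Col C" still stored as a string.
df = pd.DataFrame(
    [[20, 30, "{'we': 2, 'ab': 1, 'as': 3}", "String1"]],
    columns=["Col A", "Col B", "Col C", "Col D"],
)

# Step 1: parse the strings into dicts.
df["Col C"] = df["Col C"].map(ast.literal_eval)

# Step 2: to_dict() gives {row_index: {key: value}}. Building a DataFrame from it
# makes each row index a column, so .T turns those back into rows.
nested = df["Col C"].to_dict()
print(nested)  # {0: {'we': 2, 'ab': 1, 'as': 3}}
wide = pd.DataFrame(nested).T

# Step 3: join on the shared index, then keep and order the wanted columns.
dfNew = df.join(wide)
print(dfNew[["Col A", "Col B", "ab", "we", "as", "Col D"]])
```

The transpose is needed only because `pd.DataFrame(dict_of_dicts)` treats the outer keys (row indices) as columns.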
Answer 2 (score: 3)
Something like this:
import pandas as pd
# Create mock dataframe
df = pd.DataFrame([
[20, 30, {'ab':1, 'we':2, 'as':3}, 'String1'],
[21, 31, {'ab':4, 'we':5, 'as':6}, 'String2'],
[22, 32, {'ab':7, 'we':8, 'as':9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])
# Create dataframe where you'll store the dictionary values
ddf = pd.DataFrame(columns=['AB','WE','AS'])
# Populate ddf dataframe
for i, r in df.iterrows():
    e = r['Col C']
    ddf.loc[i] = [e['ab'], e['we'], e['as']]
# Replace df with the output of concat(df, ddf)
df = pd.concat([df, ddf], axis=1)
# New column order, also drops old Col C column
df = df[['Col A', 'Col B', 'AB', 'WE', 'AS', 'Col D']]
print(df)
Output:
   Col A  Col B AB WE AS    Col D
0     20     30  1  2  3  String1
1     21     31  4  5  6  String2
2     22     32  7  8  9  String2
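The `iterrows` loop fills `ddf` one row at a time. A swapped-in alternative, not part of the original answer, is to build all the new columns at once from the list of dicts. A sketch on the same mock data:

```python
import pandas as pd

# Same mock data as the answer above.
df = pd.DataFrame([
    [20, 30, {'ab': 1, 'we': 2, 'as': 3}, 'String1'],
    [21, 31, {'ab': 4, 'we': 5, 'as': 6}, 'String2'],
    [22, 32, {'ab': 7, 'we': 8, 'as': 9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])

# Build every new column in one call; reusing df.index keeps rows aligned for concat.
ddf = pd.DataFrame(df['Col C'].tolist(), index=df.index)
ddf = ddf.rename(columns=str.upper)

# Same concat and column selection as the loop version (this also drops Col C).
df = pd.concat([df, ddf], axis=1)
df = df[['Col A', 'Col B', 'AB', 'WE', 'AS', 'Col D']]
print(df)
```

Because the dict values here are plain integers, the new columns come out with an integer dtype, and the explicit column selection keeps the final order independent of how the dict keys are ordered.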