Append columns from a list of column/value pairs

Time: 2017-06-12 09:02:20

Tags: python pandas dataframe

My DataFrame df contains columns A and B:

A       |     B
---------------
1       |     2
4       |     3

I want to apply a function getData that takes the value of A and returns a list of tuples (column/value pairs).

Example, for the first row:

[('C', 5), ('D', 1), ('Z', 1)]

and for the second row:

[('E', 5), ('Z', 3)]

My goal is a resulting DataFrame like this (with missing values replaced by 0):

A       |     B    |     C    |     D   |     E    |     Z
----------------------------------------------------------
1       |     2    |     5    |     1   |     0    |     1
4       |     3    |     0    |     0   |     5    |     3

Is there a short solution?

1 answer:

Answer 0 (score: 2):

If you can modify the function, convert the key/value pairs to a dict and then to a Series:

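import pandas as pd

df = pd.DataFrame({'A': [1, 4], 'B': [2, 3]})

def getData(x):
    if x == 1:
        a = [('C', 5), ('D', 1), ('Z', 1)]
    else:
        a = [('E', 5), ('Z', 3)]

    # dict -> Series: apply() then turns the keys into column labels
    return pd.Series(dict(a))

df1 = df['A'].apply(getData)
print(df1)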
     C    D    E    Z
0  5.0  1.0  NaN  1.0
1  NaN  NaN  5.0  3.0

Alternatively, if getData returns the plain list of tuples, use a list comprehension with the DataFrame constructor:

s = df['A'].apply(getData)
print (s)
0    [(C, 5), (D, 1), (Z, 1)]
1            [(E, 5), (Z, 3)]
Name: A, dtype: object

df1 = pd.DataFrame([dict(x) for x in s])
print (df1)

     C    D    E  Z
0  5.0  1.0  NaN  1
1  NaN  NaN  5.0  3

Finally, join to the original DataFrame, replace NaN with 0 and convert to int:

df1 = df.join(df1).fillna(0).astype(int)
print (df1)
   A  B  C  D  E  Z
0  1  2  5  1  0  1
1  4  3  0  0  5  3
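
The list-comprehension route and the final join can be put together in one runnable sketch. The helper name append_pairs and its parameters are hypothetical (not part of pandas), and getData here is only a stand-in that returns the pairs from the question. Recent pandas versions appear to keep dict keys in first-seen order rather than sorting them, so the sketch sorts the new columns explicitly to get the C, D, E, Z layout.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4], 'B': [2, 3]})

def getData(x):
    # Stand-in for the question's function: returns (column, value) pairs.
    if x == 1:
        return [('C', 5), ('D', 1), ('Z', 1)]
    return [('E', 5), ('Z', 3)]

def append_pairs(frame, source_col, func, fill=0):
    # Hypothetical helper, not a pandas API.
    pairs = frame[source_col].apply(func)
    # One dict per row; keys missing from a row become NaN.
    wide = pd.DataFrame([dict(p) for p in pairs], index=frame.index)
    # Sort the new columns so the layout does not depend on key order.
    wide = wide.reindex(columns=sorted(wide.columns))
    # Align on the index, fill the gaps, and go back to integers.
    return frame.join(wide).fillna(fill).astype(int)

result = append_pairs(df, 'A', getData)
print(result)
```

Passing index=frame.index keeps the join aligned even when the original DataFrame does not use the default 0..n-1 index.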

EDIT:

NumPy solution (getData returns the plain list of tuples here, and column A is overwritten with those lists):

df['A'] = df['A'].apply(getData)
print (df)
                          A  B
0  [(C, 5), (D, 1), (Z, 1)]  2
1          [(E, 5), (Z, 3)]  3

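import numpy as np

tid1 = df.index
lens = [len(i) for i in df['A'].values]
# repeat each row label once per (column, value) pair in that row
tid2 = tid1.repeat(lens)
# mixed str/int tuples stack into a string array, so prob holds numeric strings
cat, prob = np.concatenate(df['A'].values).T
ucat, inv = np.unique(cat, return_inverse=True)
data = np.zeros((len(tid1), len(ucat)), dtype=float)
# scatter each value into its (row, column) slot; unfilled slots stay 0
data[tid2, inv] = prob.astype(float)
df1 = pd.DataFrame(data, tid1, ucat)
print (df1)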
     C    D    E    Z
0  5.0  1.0  0.0  1.0
1  0.0  0.0  5.0  3.0
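
To get the table asked for in the question, the NumPy result still has to be joined back to A and B. The following is a variant sketch, not the code above: it leaves column A untouched and collects names and values in separate arrays, so the values stay integers instead of passing through a string array. The variable names (pairs, rows, names, values, wide, result) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4], 'B': [2, 3]})

def getData(x):
    # Stand-in for the question's function: returns (column, value) pairs.
    if x == 1:
        return [('C', 5), ('D', 1), ('Z', 1)]
    return [('E', 5), ('Z', 3)]

pairs = [getData(a) for a in df['A']]        # df['A'] itself is not modified
lens = [len(p) for p in pairs]
# Positional row number for every pair, e.g. [0, 0, 0, 1, 1].
rows = np.repeat(np.arange(len(df)), lens)
names = np.array([k for p in pairs for k, _ in p])
values = np.array([v for p in pairs for _, v in p])
# Unique column names plus, for each pair, the position of its column.
ucols, inv = np.unique(names, return_inverse=True)
data = np.zeros((len(df), len(ucols)), dtype=values.dtype)
data[rows, inv] = values                     # scatter values; the rest stays 0
wide = pd.DataFrame(data, index=df.index, columns=ucols)
result = df.join(wide)
print(result)
```

Because the zeros are written up front, no fillna or astype step is needed afterwards.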