Join two pivot tables and get multiple values per cell in pandas

Date: 2017-09-07 13:15:34

Tags: python pandas join pivot-table

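I have two pivot tables with the same rows and columns. I need to create a single table in which each cell holds the values from both tables at the same row and column, separated by commas.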

For example:

Table 1

       1    2    3    4
a     t1a1 t1a2 t1a3 t1a4
b     t1b1 t1b2 t1b3 t1b4

Table 2

       1    2    3    4
a     t2a1 t2a2 t2a3 t2a4
b     t2b1 t2b2 t2b3 t2b4

I want:

Result table

      1            2            3            4
a     (t1a1,t2a1)  (t1a2,t2a2)  (t1a3,t2a3)  (t1a4,t2a4)
b     (t1b1,t2b1)  (t1b2,t2b2)  (t1b3,t2b3)  (t1b4,t2b4)

The concat function returns the two tables side by side instead:

        1    2    3    4   1    2    3    4
a     t1a1 t1a2 t1a3 t1a4  t2a1 t2a2 t2a3 t2a4
b     t1b1 t1b2 t1b3 t1b4  t2b1 t2b2 t2b3 t2b4  

I am using the pandas library in Python.

Thanks

2 answers:

Answer 0 (score: 3)

If you need string output, you can concatenate the DataFrames element-wise as strings:

df = '(' + df1 + ' , ' + df2 + ')'
# If the columns are numeric, cast them to str first:
#df = '(' + df1.astype(str) + ' , ' + df2.astype(str) + ')'
print (df)
               1              2              3              4
a  (t1a1 , t2a1)  (t1a2 , t2a2)  (t1a3 , t2a3)  (t1a4 , t2a4)
b  (t1b1 , t2b1)  (t1b2 , t2b2)  (t1b3 , t2b3)  (t1b4 , t2b4)
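
The snippet above assumes df1 and df2 already exist. As a minimal self-contained sketch, the frames below are hypothetical stand-ins for Table 1 and Table 2 from the question, and the separator is a plain comma (without surrounding spaces) to match the requested result. The addition is aligned on row and column labels, so both frames are assumed to share the same labels.

```python
import pandas as pd

# Hypothetical sample frames mirroring Table 1 and Table 2 from the question.
df1 = pd.DataFrame(
    [["t1a1", "t1a2", "t1a3", "t1a4"],
     ["t1b1", "t1b2", "t1b3", "t1b4"]],
    index=["a", "b"],
    columns=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    [["t2a1", "t2a2", "t2a3", "t2a4"],
     ["t2b1", "t2b2", "t2b3", "t2b4"]],
    index=["a", "b"],
    columns=[1, 2, 3, 4],
)

# Element-wise string concatenation; astype(str) is a no-op for string cells
# and makes the same line work for numeric cells.
result = "(" + df1.astype(str) + "," + df2.astype(str) + ")"
print(result)
```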

If you need tuples:

df = pd.concat([df1, df2],  keys=['a','b']) \
       .groupby(level=1).agg(lambda x: tuple(x))
print (df)
              1             2             3             4
a  (t1a1, t2a1)  (t1a2, t2a2)  (t1a3, t2a3)  (t1a4, t2a4)
b  (t1b1, t2b1)  (t1b2, t2b2)  (t1b3, t2b3)  (t1b4, t2b4)
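
As an alternative sketch for the tuple case (not taken from the answer), the tuples can also be built column by column with zip. This assumes that both frames have identical row and column labels in the same order, because zip pairs values by position rather than by label. The sample frames are hypothetical stand-ins for the tables in the question.

```python
import pandas as pd

# Hypothetical sample frames mirroring Table 1 and Table 2 from the question.
df1 = pd.DataFrame(
    [["t1a1", "t1a2", "t1a3", "t1a4"],
     ["t1b1", "t1b2", "t1b3", "t1b4"]],
    index=["a", "b"],
    columns=[1, 2, 3, 4],
)
df2 = pd.DataFrame(
    [["t2a1", "t2a2", "t2a3", "t2a4"],
     ["t2b1", "t2b2", "t2b3", "t2b4"]],
    index=["a", "b"],
    columns=[1, 2, 3, 4],
)

# Pair the values of each column positionally; every cell becomes a tuple.
result = pd.DataFrame(
    {col: list(zip(df1[col], df2[col])) for col in df1.columns},
    index=df1.index,
)
print(result)
```

If the two frames might be ordered differently, reindexing one to the other first (for example with df2.reindex_like(df1)) would be a reasonable precaution.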

Answer 1 (score: 1)

Here is a simple approach:

import numpy as np
import pandas as pd

# Sample data: one frame per table, keyed by the 'name' column
df1 = pd.DataFrame(np.array([
    ['a', 't1a1', 't1a2', 't1a3', 't1a4'],
    ['b', 't1b1', 't1b2', 't1b3', 't1b4'],
    ['c', 't1c1', 't1c2', 't1c3', 't1c4']]),
    columns=['name', 'attr11', 'attr12', 'attr13', 'attr14'])
df2 = pd.DataFrame(np.array([
    ['a', 't2a1', 't2a2', 't2a3', 't2a4'],
    ['b', 't2b1', 't2b2', 't2b3', 't2b4'],
    ['c', 't2c1', 't2c2', 't2c3', 't2c4']]),
    columns=['name', 'attr21', 'attr22', 'attr23', 'attr24'])

# Join the two frames on the shared key column
df3 = pd.merge(df1, df2, on='name')

# Build each combined column as a "(left,right)" string
df3["attr1"] = '(' + df3['attr11'] + ',' + df3['attr21'] + ')'
df3["attr2"] = '(' + df3['attr12'] + ',' + df3['attr22'] + ')'
df3["attr3"] = '(' + df3['attr13'] + ',' + df3['attr23'] + ')'
df3["attr4"] = '(' + df3['attr14'] + ',' + df3['attr24'] + ')'
print(df3[['name', 'attr1', 'attr2', 'attr3', 'attr4']])

Output

  name        attr1        attr2        attr3        attr4
0    a  (t1a1,t2a1)  (t1a2,t2a2)  (t1a3,t2a3)  (t1a4,t2a4)
1    b  (t1b1,t2b1)  (t1b2,t2b2)  (t1b3,t2b3)  (t1b4,t2b4)
2    c  (t1c1,t2c1)  (t1c2,t2c2)  (t1c3,t2c3)  (t1c4,t2c4)
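
The four near-identical assignments in this answer could be written as a loop. The sketch below is one possible variant, not the answer's own code. It assumes both frames use the same column names (attr1 to attr4 here, a hypothetical naming) so that merge can tell them apart with its suffixes parameter. The suffix strings "_t1" and "_t2" are likewise arbitrary choices.

```python
import pandas as pd

# Hypothetical frames that share the key column 'name' and the value column names.
df1 = pd.DataFrame({
    "name": ["a", "b", "c"],
    "attr1": ["t1a1", "t1b1", "t1c1"],
    "attr2": ["t1a2", "t1b2", "t1c2"],
    "attr3": ["t1a3", "t1b3", "t1c3"],
    "attr4": ["t1a4", "t1b4", "t1c4"],
})
df2 = pd.DataFrame({
    "name": ["a", "b", "c"],
    "attr1": ["t2a1", "t2b1", "t2c1"],
    "attr2": ["t2a2", "t2b2", "t2c2"],
    "attr3": ["t2a3", "t2b3", "t2c3"],
    "attr4": ["t2a4", "t2b4", "t2c4"],
})

# Overlapping value columns get the suffixes, e.g. attr1_t1 and attr1_t2.
merged = df1.merge(df2, on="name", suffixes=("_t1", "_t2"))

# Combine each pair of suffixed columns into one "(left,right)" string column.
value_cols = [c for c in df1.columns if c != "name"]
result = merged[["name"]].copy()
for col in value_cols:
    result[col] = "(" + merged[f"{col}_t1"] + "," + merged[f"{col}_t2"] + ")"
print(result)
```

Because the merge joins on name, rows are matched by key rather than by position, so the two frames do not need to be sorted the same way.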