Merging row data with pandas in Python

Time: 2018-05-22 15:10:39

Tags: python pandas

I'm trying to write a small Python application that creates a CSV file containing data for a recipe system.

Imagine the Excel data has the following structure:

Manufacturer    Product Data 1  Data 2  Data 3
Test 1  Product 1   1   2   3
Test 1  Product 2   4   5   6
Test 2  Product 1   1   2   3
Test 3  Product 1   1   2   3
Test 3  Product 1   4   5   6
Test 3  Product 1   7   8   9
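The answers below operate on this table as a DataFrame named df. A minimal sketch that builds the same rows inline, assumed here only so the snippets can run without the original Excel file (in practice the data would come from something like pd.read_excel):

```python
import pandas as pd

# Same rows as the table above, built inline instead of being read from Excel.
df = pd.DataFrame(
    [
        ["Test 1", "Product 1", 1, 2, 3],
        ["Test 1", "Product 2", 4, 5, 6],
        ["Test 2", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 4, 5, 6],
        ["Test 3", "Product 1", 7, 8, 9],
    ],
    columns=["Manufacturer", "Product", "Data 1", "Data 2", "Data 3"],
)
print(df)
```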

After merging, I would like the data to look like this:

Test 1  Product 1   1   2   3   0   0   0   0   0   0
Test 1  Product 2   4   5   6   0   0   0   0   0   0
Test 2  Product 1   1   2   3   0   0   0   0   0   0
Test 3  Product 1   1   2   3   4   5   6   7   8   9
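Put as a rule: rows sharing a (Manufacturer, Product) pair are concatenated in their original order, and every merged row is padded with zeros to the width of the longest one. A plain-Python sketch of that rule, without pandas, using a hypothetical helper merge_rows and writing header-less CSV to an in-memory buffer (the input tuples are assumed to mirror the table columns):

```python
import csv
import io


def merge_rows(rows):
    """Hypothetical helper: concatenate the data values per (manufacturer, product), zero-padded."""
    groups = {}  # dicts keep insertion order (Python 3.7+), so groups stay in first-seen order
    for manufacturer, product, *values in rows:
        groups.setdefault((manufacturer, product), []).extend(values)
    width = max(len(vals) for vals in groups.values())
    # Pad each merged row with zeros up to the longest group's width.
    return [[m, p] + vals + [0] * (width - len(vals)) for (m, p), vals in groups.items()]


rows = [
    ("Test 1", "Product 1", 1, 2, 3),
    ("Test 1", "Product 2", 4, 5, 6),
    ("Test 2", "Product 1", 1, 2, 3),
    ("Test 3", "Product 1", 1, 2, 3),
    ("Test 3", "Product 1", 4, 5, 6),
    ("Test 3", "Product 1", 7, 8, 9),
]
merged = merge_rows(rows)

buf = io.StringIO()
csv.writer(buf).writerows(merged)  # no header row, matching the desired output
print(buf.getvalue())
```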

Any help would be much appreciated. So far I can read the dataset with pandas and convert it to CSV.

Regards, Li

3 answers:

Answer 0 (score: 2)

Using melt, groupby, pd.Series and unstack:

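import pandas as pd

# df is the question's table: Manufacturer, Product, Data 1, Data 2, Data 3
(df.melt(['Manufacturer','Product'])               # one row per (key, Data column, value)
  .groupby(['Manufacturer','Product'])['value']
  .apply(lambda x: pd.Series(x.tolist()))           # number the values 0..n-1 within each group
  .unstack(fill_value=0)                            # positions become columns, gaps filled with 0
  .reset_index())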

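Output (note that for Test 3 the values come out column by column, 1 4 7 2 5 8 3 6 9, rather than row by row as the question asks, because melt lists every Data 1 value before any Data 2 value):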

  Manufacturer    Product  0  1  2  3  4  5  6  7  8
0       Test 1  Product 1  1  2  3  0  0  0  0  0  0
1       Test 1  Product 2  4  5  6  0  0  0  0  0  0
2       Test 2  Product 1  1  2  3  0  0  0  0  0  0
3       Test 3  Product 1  1  4  7  2  5  8  3  6  9
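To get the row-by-row order the question asks for, one option is to keep the original row labels through melt and stable-sort on them, so each source row's three values sit together before grouping. A sketch of that variant, assuming pandas 1.1 or later (for melt(..., ignore_index=False)) and rebuilding df inline:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Test 1", "Product 1", 1, 2, 3],
        ["Test 1", "Product 2", 4, 5, 6],
        ["Test 2", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 4, 5, 6],
        ["Test 3", "Product 1", 7, 8, 9],
    ],
    columns=["Manufacturer", "Product", "Data 1", "Data 2", "Data 3"],
)
cols = ["Manufacturer", "Product"]

out = (
    df.melt(cols, ignore_index=False)          # keep the original row labels 0..5
      .sort_index(kind="mergesort")            # stable: row 0's Data 1..3, then row 1's, ...
      .groupby(cols)["value"]
      .apply(lambda x: pd.Series(x.tolist()))  # number the values 0..n-1 within each group
      .unstack(fill_value=0)
      .reset_index()
)
print(out)
```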

Answer 1 (score: 2)

cols = ['Manufacturer', 'Product']
d = df.set_index(cols + [df.groupby(cols).cumcount()]).unstack(fill_value=0)
d

which gives me

                       Data 1       Data 2       Data 3      
                            0  1  2      0  1  2      0  1  2
Manufacturer Product                                         
Test 1       Product 1      1  0  0      2  0  0      3  0  0
             Product 2      4  0  0      5  0  0      6  0  0
Test 2       Product 1      1  0  0      2  0  0      3  0  0
Test 3       Product 1      1  4  7      2  5  8      3  6  9
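The trick is the extra index level built from df.groupby(cols).cumcount(): it numbers the rows within each (Manufacturer, Product) group, so the three Test 3 rows get positions 0, 1 and 2 and can be unstacked side by side instead of colliding. A small sketch showing those positions for this data (df rebuilt inline as an assumption):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Test 1", "Product 1", 1, 2, 3],
        ["Test 1", "Product 2", 4, 5, 6],
        ["Test 2", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 4, 5, 6],
        ["Test 3", "Product 1", 7, 8, 9],
    ],
    columns=["Manufacturer", "Product", "Data 1", "Data 2", "Data 3"],
)
cols = ["Manufacturer", "Product"]

pos = df.groupby(cols).cumcount()  # 0 for a group's first row, 1 for its second, ...
print(pos.tolist())  # [0, 0, 0, 0, 1, 2]
```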

then follow up with

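# order columns by within-group position, renumber them 0..n-1, then move the keys back to columns
d.sort_index(axis=1, level=1).pipe(lambda d: d.set_axis(range(d.shape[1]), axis=1).reset_index())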

  Manufacturer    Product  0  1  2  3  4  5  6  7  8
0       Test 1  Product 1  1  2  3  0  0  0  0  0  0
1       Test 1  Product 2  4  5  6  0  0  0  0  0  0
2       Test 2  Product 1  1  2  3  0  0  0  0  0  0
3       Test 3  Product 1  1  2  3  4  5  6  7  8  9
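The follow-up works because sorting the columns by level 1 (the within-group position) and then by the remaining level (the Data column name) puts each source row's Data 1, Data 2 and Data 3 together, in row order. A sketch that prints the first few columns after that sort, written with the keyword form sort_index(axis=1, level=1) and rebuilding df inline:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Test 1", "Product 1", 1, 2, 3],
        ["Test 1", "Product 2", 4, 5, 6],
        ["Test 2", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 4, 5, 6],
        ["Test 3", "Product 1", 7, 8, 9],
    ],
    columns=["Manufacturer", "Product", "Data 1", "Data 2", "Data 3"],
)
cols = ["Manufacturer", "Product"]

d = df.set_index(cols + [df.groupby(cols).cumcount()]).unstack(fill_value=0)
# Before sorting the columns run (Data 1, 0), (Data 1, 1), (Data 1, 2), (Data 2, 0), ...
ordered = d.sort_index(axis=1, level=1).columns.tolist()
print(ordered[:4])
```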

Or

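import pandas as pd

cols = ['Manufacturer', 'Product']
# one flattened (row-by-row) array of Data values per group, then padded with 0 to equal width
pd.Series({
    n: d.values.ravel() for n, d in df.set_index(cols).groupby(cols)
}).apply(pd.Series).fillna(0).astype(int).rename_axis(cols).reset_index()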

  Manufacturer    Product  0  1  2  3  4  5  6  7  8
0       Test 1  Product 1  1  2  3  0  0  0  0  0  0
1       Test 1  Product 2  4  5  6  0  0  0  0  0  0
2       Test 2  Product 1  1  2  3  0  0  0  0  0  0
3       Test 3  Product 1  1  2  3  4  5  6  7  8  9
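This variant keeps the row order because d.values.ravel() flattens each group's block of Data values row by row (NumPy's default C order). A small NumPy sketch contrasting that with column-major (order="F") flattening, which matches the order produced by the melt-based answer:

```python
import numpy as np

# The three Test 3 / Product 1 rows as a 3x3 block of Data values.
block = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(block.ravel().tolist())           # [1, 2, 3, 4, 5, 6, 7, 8, 9]  (row by row)
print(block.ravel(order="F").tolist())  # [1, 4, 7, 2, 5, 8, 3, 6, 9]  (column by column)
```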

Using defaultdict and itertools.count:

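from itertools import count
from collections import defaultdict

import pandas as pd

c = defaultdict(count)  # an independent 0, 1, 2, ... counter per (Manufacturer, Product)
pd.Series({(
    m, p, next(c[(m, p)])): v
    for _, m, p, *V in df.itertuples()  # skip the index, unpack the keys and the Data values
    for v in V
}).unstack(fill_value=0)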

                  0  1  2  3  4  5  6  7  8
Test 1 Product 1  1  2  3  0  0  0  0  0  0
       Product 2  4  5  6  0  0  0  0  0  0
Test 2 Product 1  1  2  3  0  0  0  0  0  0
Test 3 Product 1  1  2  3  4  5  6  7  8  9
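Here defaultdict(count) hands each new (Manufacturer, Product) key its own itertools.count() iterator, so next(c[key]) yields 0, 1, 2, ... independently per key; that running position becomes the third index level that unstack spreads into columns. A standalone sketch of just that mechanism, with made-up keys "a" and "b":

```python
from collections import defaultdict
from itertools import count

c = defaultdict(count)  # each missing key gets a fresh count() starting at 0
positions = [next(c[key]) for key in ["a", "a", "b", "a", "b"]]
print(positions)  # [0, 1, 0, 2, 1]
```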

Answer 2 (score: 2)

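Using groupby (note that, as with the melt answer, the Test 3 values come out column by column, and fillna(0) leaves the result as floats):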

df.groupby(['Manufacturer','Product']).agg(tuple).sum(1).apply(pd.Series).fillna(0)
Out[85]: 
                         0    1    2    3    4    5    6    7    8
Manufacturer Product                                              
Test1        Product1  1.0  2.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
             Product2  4.0  5.0  6.0  0.0  0.0  0.0  0.0  0.0  0.0
Test2        Product1  1.0  2.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0
Test3        Product1  1.0  4.0  7.0  2.0  5.0  8.0  3.0  6.0  9.0
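Summing the per-column tuples concatenates them column by column, which is why Test 3 reads 1 4 7 2 5 8 3 6 9. A sketch of one way to get the row-by-row order while still starting from groupby(...).agg(...): it uses agg(list), re-interleaves the per-column lists with zip, and pads with zeros explicitly so the result stays integer (df is rebuilt inline as an assumption):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Test 1", "Product 1", 1, 2, 3],
        ["Test 1", "Product 2", 4, 5, 6],
        ["Test 2", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 1, 2, 3],
        ["Test 3", "Product 1", 4, 5, 6],
        ["Test 3", "Product 1", 7, 8, 9],
    ],
    columns=["Manufacturer", "Product", "Data 1", "Data 2", "Data 3"],
)
cols = ["Manufacturer", "Product"]

g = df.groupby(cols).agg(list)  # each cell: one column's values in the group, e.g. [1, 4, 7]

# zip(*row) regroups the per-column lists into the original rows: (1, 2, 3), (4, 5, 6), ...
merged = [[v for rec in zip(*row) for v in rec] for row in g.itertuples(index=False)]
width = max(len(vals) for vals in merged)
out = pd.DataFrame([vals + [0] * (width - len(vals)) for vals in merged], index=g.index)
print(out)
```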