Grouping DataFrame rows by what they have in common

Time: 2018-01-29 05:19:14

Tags: python pandas pandas-groupby

I have a pandas DataFrame with more than 1000 rows that looks something like this:

Copy    name        type    ntv
G1       BA          X      0.45
G1       BB          X      0.878
G1       C           Z      0.19
G1       LA1         Y      1.234
G1       L           Y      0.09
G1       LB          Y      1.056
F2       BA1         X      -7.890
F2       BB          X      2.345
F2       MA          Y      -0.871
F2       LB1         Y      0.737

In the example above (df1), the 'Copy' column has two groups, G1 and F2, each with various names and three types: X, Y and Z.
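To make the sample reproducible, df1 can be built directly from the ten rows shown. This is only a sketch of the sample, not the full data of more than 1000 rows.

```python
import pandas as pd

# The ten sample rows from the table above, in the same order.
df1 = pd.DataFrame({
    "Copy": ["G1"] * 6 + ["F2"] * 4,
    "name": ["BA", "BB", "C", "LA1", "L", "LB", "BA1", "BB", "MA", "LB1"],
    "type": ["X", "X", "Z", "Y", "Y", "Y", "X", "X", "Y", "Y"],
    "ntv": [0.45, 0.878, 0.19, 1.234, 0.09, 1.056, -7.890, 2.345, -0.871, 0.737],
})
print(df1)
```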

I want to create another DataFrame (df2) that looks like the one below, in which rows are paired up as X-Y or Z-Y.

Model      ntv_1       ntv_2    
G1BA-LA1   0.45        1.234        
G1BB-LB    0.878       1.056    
G1C-L      0.19        0.09    
F2BA1-MA   -7.890      -0.871       
F2BB-LB1   2.345       0.737    

For the X-Y pairs, what the rows have in common is the second character of df1['name']. So I decided to do this:

c = df1[(df1['name'].str[0]=='B') & (df1['ntv'] != 0.0)]
h = df1[((df1['name'].str[0]=='L')|(df1['name'].str[0]=='M')) & (df1['ntv'] != 0.0)]
b = (c.loc[:,c['name'].str[1]] == h.loc[:,h['name'].str[1]]).groupby('Copy')
df2['Model'] = c['Copy'].astype(str) + c['name'].astype(str) + '-' + h['name'].astype(str)
df2['ntv_1'] = c['ntv']
df2['ntv_2'] = h['ntv']

I got a KeyError. So I decided to do this instead:

ca = c['name'].str[1].dropna()
ha = h['name'].str[1].dropna()
if ca == ha:
  df2['Model'] = c['Copy'].astype(str) + c['name'].astype(str) + '-' + h['name'].astype(str)
  df2['ntv_1'] = c['ntv']
  df2['ntv_2'] = h['ntv']

But then I got a ValueError: "Series lengths must match to compare."
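The failure can be reproduced with just the second characters from the sample rows. The sketch below hard-codes those characters together with the row positions they have in df1. It shows that `==` between two Series with different index labels raises a ValueError (the exact wording of the message may differ between pandas versions), while a comparison by position goes through when the lengths agree. The positional comparison is only an illustration, not a suggested fix.

```python
import pandas as pd

# Second characters of the 'B' names and of the 'L'/'M' names in the sample.
# The index labels are the row positions those names have in df1.
ca = pd.Series(["A", "B", "A", "B"], index=[0, 1, 6, 7])
ha = pd.Series(["A", "B", "A", "B"], index=[3, 5, 8, 9])

try:
    ca == ha  # element-wise comparison of differently labelled Series
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Comparing the underlying arrays ignores the labels entirely.
same = ca.to_numpy() == ha.to_numpy()
print(same)
```

Even when the comparison succeeds, the result is an array of booleans, so it cannot be used directly as the condition of a plain `if`.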

How can I group the DataFrame into the X-Y or Z-Y form? Thanks in advance!

1 Answer:

Answer 0: (score: 1)

The problem is that c and h are not aligned, because they have different indexes and possibly different lengths:

#extra condition: drop rows whose name has no second character
c = df1[(df1['name'].str[0]=='B') & (df1['ntv'] != 0.0) &
        (df1['name'].str[1].notnull())].copy()

#build a MultiIndex (second character, running count per character) so duplicates can be aligned
ca = c['name'].str[1]
c.index = [ca, c.groupby(ca).cumcount()]

#extra condition: drop rows whose name has no second character
h = df1[((df1['name'].str[0]=='L')|(df1['name'].str[0]=='M')) & 
         (df1['ntv'] != 0.0) & (df1['name'].str[1].notnull())].copy()

#build a MultiIndex (second character, running count per character) so duplicates can be aligned
ha = h['name'].str[1]
h.index = [ha, h.groupby(ha).cumcount()]
print (c)
       copy name type    ntv
name                        
A    0   G1   BA    X  0.450
B    0   G1   BB    X  0.878
A    1   F2  BA1    X -7.890
B    1   F2   BB    X  2.345

print (h)
       copy name type    ntv
name                        
A    0   G1  LA1    Y  1.234
B    0   G1   LB    Y  1.056
A    1   F2   MA    Y -0.871
B    1   F2  LB1    Y  0.737
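
With both frames keyed the same way, the pairs can be assembled into df2. The following is a minimal sketch of one possible way to finish, not necessarily the approach intended above. Instead of a MultiIndex it stores the same two keys as ordinary columns (hypothetical names `key` and `n`) and uses `merge`. It also matches on `Copy`, on the assumption that a pair should never span two Copy groups. The helper `with_keys` is likewise made up for illustration.

```python
import pandas as pd

# Sample rows from the question.
df1 = pd.DataFrame({
    "Copy": ["G1"] * 6 + ["F2"] * 4,
    "name": ["BA", "BB", "C", "LA1", "L", "LB", "BA1", "BB", "MA", "LB1"],
    "type": ["X", "X", "Z", "Y", "Y", "Y", "X", "X", "Y", "Y"],
    "ntv": [0.45, 0.878, 0.19, 1.234, 0.09, 1.056, -7.890, 2.345, -0.871, 0.737],
})

def with_keys(frame):
    """Add the pairing keys: second character and a running count per (Copy, key)."""
    out = frame.copy()
    out["key"] = out["name"].str[1]  # NaN when the name has a single character
    out = out[out["key"].notna() & (out["ntv"] != 0.0)].copy()
    out["n"] = out.groupby(["Copy", "key"]).cumcount()
    return out

first = df1["name"].str[0]
c = with_keys(df1[first == "B"])
h = with_keys(df1[first.isin(["L", "M"])])

# Inner merge keeps only rows that have a partner in the other frame.
pairs = c.merge(h, on=["Copy", "key", "n"], suffixes=("_1", "_2"))

df2 = pd.DataFrame({
    "Model": pairs["Copy"] + pairs["name_1"] + "-" + pairs["name_2"],
    "ntv_1": pairs["ntv_1"],
    "ntv_2": pairs["ntv_2"],
})
print(df2)
```

This covers only the X-Y pairs. The Z-Y row from the question (G1C-L) is not produced, because neither name has a second character to match on, so it would need a separate rule.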