Merge specific rows of a pandas df

Time: 2018-07-27 01:36:19

Tags: python pandas sorting dataframe merge

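I'm currently merging all values in a pandas df row that come before any 4-letter string. However, I want to apply this to specific rows rather than to all rows. Specifically, I only want to apply it to the row directly beneath an X in Col A. So if a value in Col A is X, apply the function to the row below it.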

import pandas as pd

d = ({
    'A' : ['X','Foo','No','X','Foo','X','F'],           
    'B' : ['','Bar','Merge','','Barr','','oo'],
    'C' : ['','XXXX','XXXX','','','','B'],
    'D' : ['','','','','','','ar'],
    'E' : ['','','','','','','XXXX'],          
    })

df = pd.DataFrame(data=d)

This code merges all values that come before any 4-letter string:

mask = (df.iloc[:, 1:].applymap(len) == 4).cumsum(1) == 0
df.A = df.A + df.iloc[:, 1:][mask].fillna('').apply(lambda x: x.sum(), 1)
df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')
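To see why the three lines above work, it can help to look at the intermediate mask. The sketch below is a minimal illustration, assuming the same sample data; it computes string lengths with Series.str.len instead of applymap (which pandas 2.1+ deprecates in favour of DataFrame.map), which should give the same lengths for these all-string columns. The names lengths and hits are introduced here for illustration only.

```python
import pandas as pd

d = {
    'A': ['X', 'Foo', 'No', 'X', 'Foo', 'X', 'F'],
    'B': ['', 'Bar', 'Merge', '', 'Barr', '', 'oo'],
    'C': ['', 'XXXX', 'XXXX', '', '', '', 'B'],
    'D': ['', '', '', '', '', '', 'ar'],
    'E': ['', '', '', '', '', '', 'XXXX'],
}
df = pd.DataFrame(d)

# Length of every string in columns B..E.
lengths = df.iloc[:, 1:].apply(lambda col: col.str.len())

# True at each 4-character string; the running sum along each row makes
# every cell from the first 4-character string onwards >= 1.
hits = (lengths == 4).cumsum(axis=1)

# Cells still at 0 come before the first 4-character string: these get merged.
mask = hits == 0
print(mask)
```

Row 4 is all False because its very first string, Barr, already has 4 characters, which is why Foo and Barr stay separate in the output.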

Output:

         A     B     C D     E
0        X                    
1   FooBar        XXXX        
2  NoMerge        XXXX        
3        X                    
4      Foo  Barr              
5        X                    
6   FooBar                XXXX

As you can see, this merges every row. I'm trying to apply it only to the rows directly beneath an X in Col A. I think I need something like:

# Pseudocode: if the value in Col A is 'X',
# apply the following to the row directly beneath it:
mask = (df.iloc[:, 1:].applymap(len) == 4).cumsum(1) == 0
df.A = df.A + df.iloc[:, 1:][mask].fillna('').apply(lambda x: x.sum(), 1)
df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')
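A fairly literal translation of that pseudocode into runnable code could look like the sketch below. It assumes the merge should stop at the first string of exactly 4 characters in columns B..E (the same rule as the mask-based code); merge_row and x_positions are hypothetical names introduced here. It loops over rows in plain Python, so it favours readability over speed on large frames.

```python
import pandas as pd

d = {
    'A': ['X', 'Foo', 'No', 'X', 'Foo', 'X', 'F'],
    'B': ['', 'Bar', 'Merge', '', 'Barr', '', 'oo'],
    'C': ['', 'XXXX', 'XXXX', '', '', '', 'B'],
    'D': ['', '', '', '', '', '', 'ar'],
    'E': ['', '', '', '', '', '', 'XXXX'],
}
df = pd.DataFrame(d)

def merge_row(row):
    """Hypothetical helper: fold the cells of B..E that precede the first
    4-character string into column A and blank the folded cells."""
    row = row.copy()
    for col in row.index[1:]:
        if len(row[col]) == 4:
            break  # stop at the first 4-character string
        row['A'] += row[col]
        row[col] = ''
    return row

# Positions of rows whose value in A is 'X'; the target is the next position.
x_positions = [i for i, v in enumerate(df['A']) if v == 'X']
for pos in x_positions:
    target = pos + 1
    if target < len(df):  # an X in the last row has no row beneath it
        df.iloc[target] = merge_row(df.iloc[target])

print(df)
```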

Expected output:

        A      B     C D     E
0       X                     
1  FooBar         XXXX        
2      No  Merge  XXXX        
3       X                     
4     Foo   Barr              
5       X                     
6  FooBar                 XXXX

2 answers:

Answer 0 (score: 0)

IIUC

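s = df.A == 'X'
s2 = df.C.str.len() == 4
s2 = s2[s2].index
ind = s[s].index + 1
# Index.intersection instead of `ind & s2`: set-style & between Index objects
# is deprecated/changed in newer pandas versions
rows = ind.intersection(s2)
df.loc[rows, 'A'] = df.loc[rows, 'A'] + df.loc[rows, 'B']
df.loc[rows, 'B'] = ''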
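The intersection step can be checked on its own. The sketch below, assuming the question's sample data, prints which row labels end up selected. With this data only label 1 qualifies, because the test looks at column C alone and row 6's 4-character string sits in column E rather than C.

```python
import pandas as pd

d = {
    'A': ['X', 'Foo', 'No', 'X', 'Foo', 'X', 'F'],
    'B': ['', 'Bar', 'Merge', '', 'Barr', '', 'oo'],
    'C': ['', 'XXXX', 'XXXX', '', '', '', 'B'],
    'D': ['', '', '', '', '', '', 'ar'],
    'E': ['', '', '', '', '', '', 'XXXX'],
}
df = pd.DataFrame(d)

s = df.A == 'X'
ind = s[s].index + 1          # labels of rows directly beneath an X: 1, 4, 6
s2 = df.C.str.len() == 4
s2 = s2[s2].index             # labels of rows whose column C has 4 characters: 1, 2

# Explicit set intersection of the two label sets.
rows = ind.intersection(s2)
print(list(rows))
```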


    A       B       C
0   X       
1   FooBar          XXXX
2   No      Merge   XXXX
3   X       
4   Foo     Barr    XXX
5   X       
6   FooBar          XXXX

Answer 1 (score: 0)

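We also need a mask for the row-under-X condition. I built a Series, maskX, for that and then used it to update the mask you prepared. The end result is the desired output.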

import pandas as pd

d = ({
    'A' : ['X','Foo','No','X','Foo','X','F'],
    'B' : ['','Bar','Merge','','Barr','','oo'],
    'C' : ['','XXXX','XXXX','','','','B'],
    'D' : ['','','','','','','ar'],
    'E' : ['','','','','','','XXXX'],
    })


df = pd.DataFrame(data=d)
print(df)

#Create the mask (as series) to handle the row-under-X condition
maskX = df.iloc[:,0].apply(lambda x: x=='X')

#Shift the mask down one row (relabel the index, prepend False for row 0, drop the overflow label) so that each row directly beneath an X is marked True
maskX.index += 1

maskX = pd.concat([pd.Series([False]), maskX])
maskX = maskX.drop(len(maskX)-1)


mask = (df.iloc[:, 1:].applymap(len) == 4).cumsum(1) == 0
#combine the effect of two masks
for i,v in maskX.items():
    mask.iloc[i,:] = mask.iloc[i,:].apply(lambda x: x and v)

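#Use .loc rather than chained assignment (df.A[maskX] = ...), which may not write back to df
df.loc[maskX, 'A'] = df.A + df.iloc[:, 1:][mask].fillna('').apply(lambda x: x.sum(), 1)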
df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')
print(df)
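The relabel/concat/drop steps and the per-row loop that combines the two masks can both be expressed with vectorised pandas calls. This is a sketch assuming the frame has the same layout as the sample (A followed by the string columns). Series.shift(1, fill_value=False) stands in for the index juggling, DataFrame.where replaces the [mask].fillna('') indexing, and below_x, lengths and rest are hypothetical names.

```python
import pandas as pd

d = {
    'A': ['X', 'Foo', 'No', 'X', 'Foo', 'X', 'F'],
    'B': ['', 'Bar', 'Merge', '', 'Barr', '', 'oo'],
    'C': ['', 'XXXX', 'XXXX', '', '', '', 'B'],
    'D': ['', '', '', '', '', '', 'ar'],
    'E': ['', '', '', '', '', '', 'XXXX'],
}
df = pd.DataFrame(d)

# True for each row directly beneath an 'X' (row 0 has nothing above it).
below_x = df['A'].eq('X').shift(1, fill_value=False)

# Cells of B..E that come before the first 4-character string in their row.
lengths = df.iloc[:, 1:].apply(lambda col: col.str.len())
mask = (lengths == 4).cumsum(axis=1) == 0

# Combine the masks without a Python loop: AND every column with below_x,
# aligned on the row index.
mask = mask.apply(lambda col: col & below_x)

rest = df.iloc[:, 1:]
# Append the masked cells of each row to A, then blank those cells.
df['A'] = df['A'] + rest.where(mask, '').apply(''.join, axis=1)
df.iloc[:, 1:] = rest.where(~mask, '')
print(df)
```

Joining with ''.join keeps the concatenation explicit instead of relying on sum() over object columns.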