I have a DataFrame that looks like this:
dcc3 manager1 manager2
party_num
L21635789 SBAS01030 A22677981 NaN
L21635789 SBAS02030 NaN A22810282
L21635789 SBAS03030 NaN A21721880
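I'm trying to "overwrite" one of the existing manager2 values (it doesn't matter which one) into the row that has a manager1 value but a blank/NaN manager2, like this: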
dcc3 manager1 manager2
party_num
L21635789 SBAS01030 A22677981 A22810282
L21635789 SBAS02030 NaN NaN
L21635789 SBAS03030 NaN NaN
OR
dcc3 manager1 manager2
party_num
L21635789 SBAS01030 A22677981 A21721880
L21635789 SBAS02030 NaN NaN
L21635789 SBAS03030 NaN NaN
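Obviously we would need to re-index on dcc3, but then what? It only needs to overwrite these two columns (and if other columns exist, it is fine to overwrite those too).
I could really use some help. Thanks in advance.
Sorry if I wasn't clear: this is the basic case. In some cases there may be just one value (in which case this does not apply), or at most 5-6. I used 3 rows as an example.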
Answer 0 (score: 1)
You can do this with np.where:
df['manager2'] = np.where(df['manager1'].notnull() & df['manager2'].isnull(),
                          df['manager2'].dropna().iloc[0],  # use .iloc[1] for the other value
                          np.nan)
df
Out[1]:
dcc3 manager1 manager2
party_num
L21635789 SBAS01030 A22677981 A22810282
L21635789 SBAS02030 NaN nan
L21635789 SBAS03030 NaN nan
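One thing to watch in the output above: the lowercase nan suggests that np.where turned the missing marker into the string 'nan', which can happen in some NumPy versions when one branch is a string and the other is np.nan. Here is a minimal sketch of the same rule that keeps real missing values, assuming the question's three-row frame. The names donor and needs_fill are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# The question's frame, rebuilt by hand with party_num as the index.
df = pd.DataFrame(
    {
        "dcc3": ["SBAS01030", "SBAS02030", "SBAS03030"],
        "manager1": ["A22677981", np.nan, np.nan],
        "manager2": [np.nan, "A22810282", "A21721880"],
    },
    index=pd.Index(["L21635789"] * 3, name="party_num"),
)

# First existing manager2 value; use .iloc[1] to pick the other one.
donor = df["manager2"].dropna().iloc[0]

# Rows that have a manager1 but no manager2 (plain boolean array,
# so the duplicated index labels play no role).
needs_fill = (df["manager1"].notna() & df["manager2"].isna()).to_numpy()

# Series.where keeps the donor where the mask is True and puts NaN elsewhere.
df["manager2"] = pd.Series(donor, index=df.index).where(needs_fill)
print(df)
```

As in the np.where version, every row outside the mask ends up as NaN, including any row that already had both managers filled in.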
Answer 1 (score: 1)
These two lines of code should do the trick.
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan  # np.NaN was removed in NumPy 2.0
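To see why the two lines work, it may help to look at the intermediate column. This is a minimal sketch using the question's three rows; the frame construction and the name filled are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["L21635789", "SBAS01030", "A22677981", np.nan],
        ["L21635789", "SBAS02030", np.nan, "A22810282"],
        ["L21635789", "SBAS03030", np.nan, "A21721880"],
    ],
    columns=["party_num", "dcc3", "manager1", "manager2"],
)

# Step 1: bfill pulls the next non-null value upward, then ffill covers
# any trailing gaps, so every row holds some existing manager2 value.
filled = df["manager2"].bfill().ffill()
print(filled.tolist())

# Step 2: keep the filled value only on rows where manager1 is present.
df["manager2"] = filled
df.loc[df["manager1"].isna(), "manager2"] = np.nan
print(df)
```

Because bfill runs first, a row with a missing manager2 takes the nearest value below it, and only falls back to a value above it when nothing follows.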
Below are a few scenarios I tried; the code is the same in each. See whether this is what you want.
import pandas as pd
import numpy as np
c=['party_num','dcc3','manager1','manager2']
Row 1: manager1 = NaN, manager2 = value
Result: a manager2 value is assigned to row 2
print('\nScenario 1')
print('row 1: manager 1: NaN, manager 2: value; pick row2 manager 1 value')
d = [['L21635789', 'SBAS01030', np.nan, 'A22810282'],
     ['L21635789', 'SBAS02030', 'A22677981', np.nan],
     ['L21635789', 'SBAS03030', np.nan, 'A21721880']]
df = pd.DataFrame(data=d, columns=c)
print(df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print()
print(df)
Output for scenario 1:
Scenario 1
row 1: manager 1: NaN, manager 2: value; pick row2 manager 1 value
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN A22810282
1 L21635789 SBAS02030 A22677981 NaN
2 L21635789 SBAS03030 NaN A21721880
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 A22677981 A21721880
2 L21635789 SBAS03030 NaN NaN
Row 1: manager1 = value, manager2 = NaN
Result: a manager2 value is assigned to row 1
print('\nScenario 2')
print('row 1: manager 1: value, manager 2: NaN; pick row2 manager 2 value')
d = [['L21635789', 'SBAS01030', 'A22677981', np.nan],
     ['L21635789', 'SBAS02030', np.nan, 'A22810282'],
     ['L21635789', 'SBAS03030', np.nan, 'A21721880']]
df = pd.DataFrame(data=d, columns=c)
print(df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print()
print(df)
Output for scenario 2:
Scenario 2
row 1: manager 1: value, manager 2: NaN; pick row2 manager 2 value
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 A22677981 NaN
1 L21635789 SBAS02030 NaN A22810282
2 L21635789 SBAS03030 NaN A21721880
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 A22677981 A22810282
1 L21635789 SBAS02030 NaN NaN
2 L21635789 SBAS03030 NaN NaN
Row 1: manager1 = NaN, manager2 = NaN
Row 2: manager1 = value, manager2 = NaN; row 3: manager2 = value
Result: row 3's manager2 value is assigned to row 2
print('\nScenario 3')
print('row 1: manager 1: NaN, manager 2: NaN; pick row2 manager 1 & row 3 manager 2')
d = [['L21635789', 'SBAS01030', np.nan, np.nan],
     ['L21635789', 'SBAS02030', 'A22677981', np.nan],
     ['L21635789', 'SBAS03030', np.nan, 'A21721880']]
df = pd.DataFrame(data=d, columns=c)
print(df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print()
print(df)
Output for scenario 3:
Scenario 3
row 1: manager 1: NaN, manager 2: NaN; pick row2 manager 1 & row 3 manager 2
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 A22677981 NaN
2 L21635789 SBAS03030 NaN A21721880
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 A22677981 A21721880
2 L21635789 SBAS03030 NaN NaN
Row 1: manager1 = NaN, manager2 = value
Row 3: manager1 = value, manager2 = value
Result: rows 1 and 2 are ignored, because row 3 already has values for both manager1 and manager2
print('\nScenario 4')
print('row 1: manager 1: NaN, manager 2: value; row3 has both manager 1 & manager 2')
d = [['L21635789', 'SBAS01030', np.nan, 'A21721880'],
     ['L21635789', 'SBAS02030', np.nan, np.nan],
     ['L21635789', 'SBAS03030', 'A22677981', 'A21721882']]
df = pd.DataFrame(data=d, columns=c)
print(df)
df.manager2 = df.manager2.bfill().ffill()
df.loc[df.manager1.isnull(), 'manager2'] = np.nan
print()
print(df)
Output for scenario 4:
Scenario 4
row 1: manager 1: NaN, manager 2: value; row3 has both manager 1 & manager 2
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN A21721880
1 L21635789 SBAS02030 NaN NaN
2 L21635789 SBAS03030 A22677981 A21721882
party_num dcc3 manager1 manager2
0 L21635789 SBAS01030 NaN NaN
1 L21635789 SBAS02030 NaN NaN
2 L21635789 SBAS03030 A22677981 A21721882
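All four scenarios use a single party_num. If the real frame holds several parties, which the question does not state, the fill would presumably need to stay inside each party so that one party's manager2 never leaks into another. A rough sketch under that assumption follows; the helper name spread_manager2 and the second party's rows (L99999999, B0000000x) are made up for illustration.

```python
import numpy as np
import pandas as pd


def spread_manager2(df, key="party_num"):
    """Hypothetical helper: apply the bfill/ffill trick within each `key` group."""
    out = df.copy()
    # Fill manager2 gaps using only values from the same group.
    out["manager2"] = out.groupby(key)["manager2"].transform(
        lambda s: s.bfill().ffill()
    )
    # Blank manager2 again on rows that have no manager1.
    out.loc[out["manager1"].isna(), "manager2"] = np.nan
    return out


df = pd.DataFrame(
    [
        ["L21635789", "SBAS01030", "A22677981", np.nan],
        ["L21635789", "SBAS02030", np.nan, "A22810282"],
        ["L99999999", "SBAS01030", "B00000001", np.nan],  # invented second party
        ["L99999999", "SBAS02030", np.nan, "B00000002"],  # invented second party
    ],
    columns=["party_num", "dcc3", "manager1", "manager2"],
)

result = spread_manager2(df)
print(result)
```

The helper works on a copy, so the input frame is left unchanged; here party_num is a regular column, as in the scenarios above, rather than the index.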