I have a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'First': ['Sam', 'Greg', 'Steve', 'Sam',
              'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens',
             'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
    'Address': ['112 Fake St',
                '13 Crest St',
                '14 Main St',
                '112 Fake St',
                '2 Morningwood',
                '7 Cotton Dr',
                '14 Main St',
                '20 Main St',
                '7 Cotton Dr',
                '7 Cotton Dr'],
    'Status': ['Infected', '', 'Infected', '', '', '', '', '', '', 'Infected'],
})
I apply the following groupby code:
df_index = df.groupby(['Address', 'Last']).filter(lambda x: (x['Status'] == 'Infected').any()).index
df.loc[df_index, 'Status'] = 'Infected'
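This marks every row in a matching group as "Infected". Is there a way to select which values get updated, so that the newly flagged rows are marked as something else instead? For example: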
df2 = df.copy(deep=True)
df2['Status'] = ['Infected', '', 'Infected', 'Infected2', '', 'Infected2', '', '', 'Infected2', 'Infected']
Answer 0 (score: 0)
I think this gets the result you want, although it takes a different approach:
def infect_new_people(group):
    # Work on a copy so the assignment below never writes into a view of df
    group = group.copy()
    if (group['Status'] == 'Infected').any():
        # Only affect people not already infected
        group.loc[group['Status'] != 'Infected', 'Status'] = 'Infected2'
    return group['Status']

# Need group_keys=False so that the result keeps the same index
# as the original dataframe
df['Status'] = df.groupby(['Address', 'Last'], group_keys=False).apply(infect_new_people)
df
Out[36]:
         Address    First        Last     Status
0    112 Fake St      Sam     Stevens   Infected
1    13 Crest St     Greg  Hamcunning
2     14 Main St    Steve     Strange   Infected
3    112 Fake St      Sam     Stevens  Infected2
4  2 Morningwood     Jill      Vargas
5    7 Cotton Dr     Bill       Simon  Infected2
6     14 Main St      Nod      Purple
7     20 Main St  Mallory       Green
8    7 Cotton Dr     Ping       Simon  Infected2
9    7 Cotton Dr    Lamar       Simon   Infected
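
A possible alternative, offered only as a sketch and not as part of the original answer: the same labelling can probably be done without apply by building a boolean mask with groupby(...).transform('any'). The names infected and group_has_infected are introduced here for illustration; the data is the DataFrame from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['Sam', 'Greg', 'Steve', 'Sam',
              'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens',
             'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Infected', '', '', '', '', '', '', 'Infected'],
})

# Rows that are already infected (hypothetical helper name)
infected = df['Status'].eq('Infected')

# True for every row whose (Address, Last) group contains at least one
# infected row; transform('any') broadcasts the group result back to each row
group_has_infected = infected.groupby([df['Address'], df['Last']]).transform('any')

# Label only the rows that share a group with an infected row
# but are not themselves infected
df.loc[group_has_infected & ~infected, 'Status'] = 'Infected2'

print(df)
```

Because the mask keeps the original index, no group_keys handling is needed, and the rows that were already 'Infected' are left untouched.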