我有数据框,即
Input Dataframe
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 Nan salem
3 III A Eng 80 gphss Nan
4 III A Mat 45 Nan salem
5 III A Eng 40 gphss Nan
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss Nan
当“班级”和“部分”列中的值匹配时,我需要替换“学校”和“城市”中的“南”。最终的结果应该是 输入数据框
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 jghss salem
3 III A Eng 80 gphss salem
4 III A Mat 45 gphss salem
5 III A Eng 40 gphss salem
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss salem
有人可以帮我吗?
答案 0 :(得分:7)
在DataFrame.groupby
列表中指定的列中使用lambda function
每组向前和向后填充缺失值-对于每种组合,每组相同的值是必需的:
cols = ['school','city']
df[cols] = df.groupby(['class','section'])[cols].apply(lambda x: x.ffill().bfill())
print (df)
class section sub marks school city
0 I A Eng 80 jghss salem
1 I A Mat 90 jghss salem
2 I A Eng 50 jghss salem
3 III A Eng 80 gphss salem
4 III A Mat 45 gphss salem
5 III A Eng 40 gphss salem
6 III A Eng 20 gphss salem
7 III A Mat 55 gphss salem
答案 1 :(得分:1)
假设每对class
和section
对应于一对唯一的school
和city
,我们可以使用groupby
:
# create a dictionary of class and section with school and city
# here we assume that for each pair and class there's a row with both school and city
# if that's not the case, we can separate the two series
school_city_dict = df[['class', 'section','school','city']].dropna().\
groupby(['class', 'section'])[['school','city']].\
max().to_dict()
# school_city_dict = {'school': {('I', 'A'): 'jghss', ('III', 'A'): 'gphss'},
# 'city': {('I', 'A'): 'salem', ('III', 'A'): 'salem'}}
# set index, prepare for map function
df.set_index(['class','section'], inplace=True)
df.loc[:,'school'] = df.index.map(school_city_dict['school'])
df.loc[:,'city'] = df.index.map(school_city_dict['city'])
# reset index to the original
df.reset_index()