I have a DataFrame DF with 2 columns:
CLASS STUDENT
'Sci' 'Francy'
'Sci' Vacant
'math' 'Alex'
'math' 'Arthur'
'math' 'Katy'
'eng' 'Jack'
'eng' Vacant
'eng' 'Francy'
'Hist' 'Francy'
'Hist' 'Francy'
I need every class to have exactly one Vacant student. Some of them already have one.
Desired result:
CLASS STUDENT
'Sci' 'Francy'
'Sci' Vacant
'math' 'Alex'
'math' 'Arthur'
'math' 'Katy'
'math' Vacant
'eng' 'Jack'
'eng' Vacant
'eng' 'Francy'
'Hist' 'Francy'
'Hist' 'Francy'
'Hist' Vacant
I tried:
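# One row per class; reset the index so it lines up with vacant_column below
unique_class = DF['CLASS'].drop_duplicates().reset_index(drop=True)
vacant_column = pd.Series(['Vacant'] * unique_class.shape[0])
temp_df = pd.concat([unique_class, vacant_column], axis=1, ignore_index=True)
temp_df.columns = DF.columns
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
DF = pd.concat([DF, temp_df], ignore_index=True)
DF.drop_duplicates(inplace=True)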
It works, but it seems like too much. Is there a better way?
Answer 0 (score: 2)
Here is another approach:
import pandas as pd

# Copy of your data
df = pd.DataFrame({
"class": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
"student": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"]
})
# An identical DF with all students equal to "vacant"
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})
# Remove existing 'vacant' from original DF and concatenate with de-duplicated vacant dataframe (to avoid duplicate 'vacant' entries)
final_df = pd.concat([df.loc[df.student != "vacant"], vacant_df.drop_duplicates("class")])
Original DF:
class student
8 Hist Francy
9 Hist Francy
0 Sci Francy
1 Sci vacant
5 eng Jack
6 eng vacant
7 eng Francy
2 math Alex
3 math Arthur
4 math Katy
Final DF:
class student
8 Hist Francy
9 Hist Francy
8 Hist vacant
0 Sci Francy
0 Sci vacant
5 eng Jack
7 eng Francy
5 eng vacant
2 math Alex
3 math Arthur
4 math Katy
2 math vacant
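The tables in this answer appear grouped by class, which the snippet itself does not do. As a minimal sketch (the names `sorted_df` and `tidy_df` are illustrative, not part of the answer), a stable sort on `class` should reproduce that view, and `reset_index` can optionally drop the repeated index labels that `pd.concat` leaves behind:

```python
import pandas as pd

df = pd.DataFrame({
    "class": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
    "student": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"],
})

# Same steps as in the answer: one "vacant" row per class, existing ones replaced.
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})
final_df = pd.concat([df.loc[df.student != "vacant"], vacant_df.drop_duplicates("class")])

# kind="stable" keeps the concat order inside each class, so the
# "vacant" row ends up last in its group.
sorted_df = final_df.sort_values("class", kind="stable")

# Optional: replace the duplicated index labels with a fresh 0..n-1 index.
tidy_df = sorted_df.reset_index(drop=True)
```

Note that the sort is by string value, so capitalized class names ("Hist", "Sci") come before lowercase ones ("eng", "math").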
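Answer 1 (score: 2)
For the record, there is nothing wrong with your solution. You can get the same result as a "one-liner" using almost the same approach: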
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df = pd.concat([df, df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')]).drop_duplicates()
[out]
CLASS STUDENT
0 Sci Francy
1 Sci Vacant
2 math Alex
3 math Arthur
4 math Katy
5 eng Jack
6 eng Vacant
7 eng Francy
8 Hist Francy
2 math Vacant
8 Hist Vacant
If you like, you can chain on sort_values and reset_index to make the table easier to read:
df = (pd.concat([df, df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')])
      .drop_duplicates()
      .sort_values('CLASS')
      .reset_index(drop=True))
[out]
CLASS STUDENT
0 Hist Francy
1 Hist Vacant
2 Sci Francy
3 Sci Vacant
4 eng Jack
5 eng Vacant
6 eng Francy
7 math Alex
8 math Arthur
9 math Katy
10 math Vacant
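One thing to watch with the approach above: `drop_duplicates()` also removes genuine repeats, which is why the second 'Hist' 'Francy' row from the question is missing from the output. If those rows need to survive, a possible alternative is to add a 'Vacant' row only for classes that lack one. This is a minimal sketch; the names `has_vacant`, `missing` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
    "STUDENT": ["Francy", "Vacant", "Alex", "Arthur", "Katy", "Jack", "Vacant", "Francy", "Francy", "Francy"],
})

# Classes that already contain a 'Vacant' row.
has_vacant = df.loc[df["STUDENT"] == "Vacant", "CLASS"]

# One new row for each class that does not have one yet.
missing = (df.loc[~df["CLASS"].isin(has_vacant), ["CLASS"]]
             .drop_duplicates()
             .assign(STUDENT="Vacant"))

# No drop_duplicates needed, so repeated student rows are kept.
result = pd.concat([df, missing], ignore_index=True)
```

Here `result` keeps all ten original rows and gains one row each for 'math' and 'Hist'.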
Answer 2 (score: 2)
Using pd.merge:
df_new = pd.DataFrame({'CLASS': df['CLASS'].unique(), 'STUDENT':'vacant'})
df_new.merge(df, how='outer', on=['CLASS','STUDENT'])
# Use `.sort_values(by='CLASS')` if a sorted df is needed
Output:
CLASS STUDENT
0 Sci vacant
1 math vacant
2 eng vacant
3 Hist vacant
4 Sci Francy
5 math Alex
6 math Arthur
7 math Katy
8 eng Jack
9 eng Francy
10 Hist Francy
11 Hist Francy
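The merged result lists the 'vacant' rows separately from the students of each class. If the rows should instead be grouped by class in order of first appearance, with 'vacant' last as in the question's desired result, one possible sketch is to sort on an ordered categorical plus a helper flag. It assumes `df` spells the placeholder as lowercase 'vacant' (as the output above suggests); `class_order`, `IS_VACANT` and `grouped` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
    "STUDENT": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"],
})

# Same merge as in the answer.
df_new = pd.DataFrame({"CLASS": df["CLASS"].unique(), "STUDENT": "vacant"})
merged = df_new.merge(df, how="outer", on=["CLASS", "STUDENT"])

# Order classes by first appearance in df rather than alphabetically.
class_order = pd.CategoricalDtype(df["CLASS"].unique(), ordered=True)

grouped = (merged
           .assign(CLASS=merged["CLASS"].astype(class_order),
                   IS_VACANT=merged["STUDENT"].eq("vacant"))
           .sort_values(["CLASS", "IS_VACANT"])   # False sorts before True
           .drop(columns="IS_VACANT")
           .reset_index(drop=True))
```

After this, `CLASS` has a categorical dtype; `grouped["CLASS"].astype(str)` converts it back to plain strings if that matters downstream.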