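I currently have a dataset that tracks the completion of 5 tests, but it only lists people who have completed a test, not those who have yet to take one. Example below: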
Name Test Completed
John Math-Test1 Yes
John Math-Test2 Yes
John Math-Test3 Yes
John Math-Test4 Yes
John Math-Test5 Yes
Lauren Math-Test1 Yes
Lauren Math-Test2 Yes
Lauren Math-Test3 Yes
Tom Math-Test1 Yes
Tom Math-Test2 Yes
Tom Math-Test3 Yes
Tom Math-Test4 Yes
Tom Math-Test5 Yes
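As you can see, Lauren has not taken 'Math-Test4' or 'Math-Test5', so there are no rows for her on those tests. I would like to add those rows automatically, with the Completed column set to "No" whenever someone has not completed a test.
The desired output is: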
Name Test Completed
John Math-Test1 Yes
John Math-Test2 Yes
John Math-Test3 Yes
John Math-Test4 Yes
John Math-Test5 Yes
Lauren Math-Test1 Yes
Lauren Math-Test2 Yes
Lauren Math-Test3 Yes
*Lauren Math-Test4 No* - Add these rows automatically
*Lauren Math-Test5 No*
Tom Math-Test1 Yes
Tom Math-Test2 Yes
Tom Math-Test3 Yes
Tom Math-Test4 Yes
Tom Math-Test5 Yes
How can I do this with Python / Pandas / NumPy?
Thanks to anyone who can help!
Edit / update: after trying @Scott Boston's code, I got:
idx = pd.MultiIndex.from_product([df['Name'].unique(),
                                  df['Test'].unique()],
                                 names=['Name', 'Test'])
newidx = idx[~idx.isin(df.set_index(['Name', 'Test']).index)]
pd.concat([df,
           newidx.to_series().reset_index().assign(Completed="No*")[['Name', 'Test', 'Completed']]],
          ignore_index=True)
Output:
Name1 Test Completed
John Math-Test1 Yes
John Math-Test2 Yes
John Math-Test3 Yes
John Math-Test4 Yes
John Math-Test5 Yes
Lauren Math-Test1 Yes
Lauren Math-Test2 Yes
Lauren Math-Test3 Yes
Tom Math-Test1 Yes
Tom Math-Test2 Yes
Tom Math-Test3 Yes
Tom Math-Test4 Yes
Tom Math-Test5 Yes
John Math-Test3 No*
John Math-Test4 No*
John Math-Test5 No*
John Math-Test2 No*
Lauren Math-Test3 No*
Lauren Math-Test4 No*
Lauren Math-Test5 No*
Lauren Math-Test2 No*
Lauren Math-Test5 No*
Lauren Math-Test1 No*
Lauren Math-Test2 No*
Lauren Math-Test4 No*
Lauren Math-Test5 No*
Now I just need a way to remove the unwanted rows to get the desired output.
Answer 0 (score: 3)
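Try this: build a MultiIndex with from_product, then use set_index and reindex.
This approach works for every value that is "seen" in the data. If a value never appears in the data, you need to pass a hard-coded list to from_product instead: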
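import pandas as pd

# Every (Name, Test) combination that should exist in the result
idx = pd.MultiIndex.from_product([df['Name'].unique(),
                                  df['Test'].unique()],
                                 names=['Name', 'Test'])
# Reindex onto the full grid; combinations missing from df get 'No*'
df.set_index(['Name', 'Test']).reindex(idx, fill_value='No*').reset_index()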
Output:
Name Test Completed
0 John Math-Test1 Yes
1 John Math-Test2 Yes
2 John Math-Test3 Yes
3 John Math-Test4 Yes
4 John Math-Test5 Yes
5 Lauren Math-Test1 Yes
6 Lauren Math-Test2 Yes
7 Lauren Math-Test3 Yes
8 Lauren Math-Test4 No*
9 Lauren Math-Test5 No*
10 Tom Math-Test1 Yes
11 Tom Math-Test2 Yes
12 Tom Math-Test3 Yes
13 Tom Math-Test4 Yes
14 Tom Math-Test5 Yes
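For reference, here is a minimal self-contained sketch of the reindex approach with a hard-coded list of tests. The sample frame is rebuilt from the question's data. The names all_tests and result are made up for illustration, and the fill value is "No" rather than "No*" to match the wording in the question.

```python
import pandas as pd

# Rebuild the question's sample data: Lauren has only taken tests 1-3.
rows = [("John", f"Math-Test{i}") for i in range(1, 6)]
rows += [("Lauren", f"Math-Test{i}") for i in range(1, 4)]
rows += [("Tom", f"Math-Test{i}") for i in range(1, 6)]
df = pd.DataFrame(rows, columns=["Name", "Test"]).assign(Completed="Yes")

# Hypothetical hard-coded list: useful when a test might not appear in df at all.
all_tests = [f"Math-Test{i}" for i in range(1, 6)]

# Full grid of (Name, Test) pairs, then reindex so missing pairs are filled with "No".
idx = pd.MultiIndex.from_product([df["Name"].unique(), all_tests],
                                 names=["Name", "Test"])
result = (df.set_index(["Name", "Test"])
            .reindex(idx, fill_value="No")
            .reset_index())

print(result)
```

Because the rows follow the order of idx, each person's tests stay grouped together without an extra sort.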
Update:
idx = pd.MultiIndex.from_product([df['Name'].unique(),
                                  df['Test'].unique()],
                                 names=['Name', 'Test'])
newidx = idx[~idx.isin(df.set_index(['Name', 'Test']).index)]
pd.concat([df,
           newidx.to_series().reset_index().assign(Completed="No*")[['Name', 'Test', 'Completed']]],
          sort=True, ignore_index=True)
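The concat variant appends the missing rows at the end of the frame, so the rows need sorting afterwards to get each person's tests grouped together. The sketch below is one way to do that. The answer does not say why the update was added; a plausible reason (an assumption here) is that reindex generally refuses an index with duplicate (Name, Test) pairs, while concat does not care. To show that case, the sample data includes one made-up duplicate row for Tom, and missing_rows and result are illustrative names.

```python
import pandas as pd

# Question's sample data plus one made-up duplicate row (Tom, Math-Test1).
rows = [("John", f"Math-Test{i}") for i in range(1, 6)]
rows += [("Lauren", f"Math-Test{i}") for i in range(1, 4)]
rows += [("Tom", f"Math-Test{i}") for i in range(1, 6)]
rows += [("Tom", "Math-Test1")]
df = pd.DataFrame(rows, columns=["Name", "Test"]).assign(Completed="Yes")

# Full grid of (Name, Test) pairs built from the values seen in df.
idx = pd.MultiIndex.from_product([df["Name"].unique(), df["Test"].unique()],
                                 names=["Name", "Test"])

# Keep only the pairs that do not already exist in df.
newidx = idx[~idx.isin(df.set_index(["Name", "Test"]).index)]

# Turn the missing pairs into rows and mark them as not completed.
missing_rows = newidx.to_frame(index=False).assign(Completed="No")

# Append, then sort rows so each person's tests appear together.
result = (pd.concat([df, missing_rows], ignore_index=True)
            .sort_values(["Name", "Test"])
            .reset_index(drop=True))

print(result)
```

If unexpected "No" rows show up for pairs that look present, as in the question's edit, it may be worth checking that the Name and Test values in the real data match exactly (for example, no stray whitespace), since isin compares values literally.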