I have the following df:
df = {'Modality': {('002_S_0413', '1', '6/21/2017', 'DTI'): 1,
('002_S_0413', '1', '6/21/2017', 'FLAIR'): 1,
('002_S_0413', '1', '6/21/2017', 'T1'): 1,
('002_S_0413', '3', '8/27/2019', 'DTI'): 1,
('002_S_0413', '3', '8/27/2019', 'FLAIR'): 1,
('002_S_0413', '3', '8/27/2019', 'T1'): 1,
('002_S_1261', '1', '3/15/2017', 'DTI'): 1,
('002_S_1261', '1', '3/15/2017', 'FLAIR'): 1,
('002_S_1261', '1', '3/15/2017', 'T1'): 1,
('002_S_1261', '2', '4/24/2018', 'DTI'): 1,
('002_S_1261', '2', '4/24/2018', 'FLAIR'): 1,
('002_S_1261', '2', '4/24/2018', 'T1'): 1,
('002_S_1261', '3', '5/01/2019', 'DTI'): 1,
('002_S_1261', '3', '5/01/2019', 'FLAIR'): 1,
('002_S_1261', '3', '5/01/2019', 'T1'): 1,
('002_S_1280', '1', '3/13/2017', 'DTI'): 1,
('002_S_1280', '1', '3/13/2017', 'FLAIR'): 1,
('002_S_1280', '3', '3/06/2019', 'DTI'): 1,
('002_S_4213', '1', '8/14/2017', 'FLAIR'): 1,
('002_S_4213', '1', '8/14/2017', 'T1'): 1},
'Phase': {('002_S_0413', '1', '6/21/2017', 'DTI'): 1,
('002_S_0413', '1', '6/21/2017', 'FLAIR'): 1,
('002_S_0413', '1', '6/21/2017', 'T1'): 1,
('002_S_0413', '3', '8/27/2019', 'DTI'): 1,
('002_S_0413', '3', '8/27/2019', 'FLAIR'): 1,
('002_S_0413', '3', '8/27/2019', 'T1'): 1,
('002_S_1261', '1', '3/15/2017', 'DTI'): 1,
('002_S_1261', '1', '3/15/2017', 'FLAIR'): 1,
('002_S_1261', '1', '3/15/2017', 'T1'): 1,
('002_S_1261', '2', '4/24/2018', 'DTI'): 1,
('002_S_1261', '2', '4/24/2018', 'FLAIR'): 1,
('002_S_1261', '2', '4/24/2018', 'T1'): 1,
('002_S_1261', '3', '5/01/2019', 'DTI'): 1,
('002_S_1261', '3', '5/01/2019', 'FLAIR'): 1,
('002_S_1261', '3', '5/01/2019', 'T1'): 1,
('002_S_1280', '1', '3/13/2017', 'DTI'): 1,
('002_S_1280', '1', '3/13/2017', 'FLAIR'): 1,
('002_S_1280', '3', '3/06/2019', 'DTI'): 1,
('002_S_4213', '1', '8/14/2017', 'FLAIR'): 1,
('002_S_4213', '1', '8/14/2017', 'T1'): 1}}
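For anyone reproducing this: passing a dict like the one above to pd.DataFrame turns the tuple keys into a four-level MultiIndex. A minimal sketch with a shortened copy of the data follows. The level names are an assumption: they are the names used later in this thread, and their order is inferred from the tuple values.

```python
import pandas as pd

# Shortened copy of the dict above (one subject, one visit) to keep the sketch small.
keys = [
    ('002_S_0413', '1', '6/21/2017', 'DTI'),
    ('002_S_0413', '1', '6/21/2017', 'FLAIR'),
    ('002_S_0413', '1', '6/21/2017', 'T1'),
]
data = {
    'Modality': {k: 1 for k in keys},
    'Phase': {k: 1 for k in keys},
}

# Tuple keys in the inner dicts become the rows of a MultiIndex.
df = pd.DataFrame(data)

# Assumed level names; adjust to match the real column headers.
df.index.names = ['Subject ID', 'Visit', 'Study Date', 'Description']
print(df)
```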
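Sorry, I can't show the headers, but they are the same as in this image:
I'm stuck at this step and would really appreciate your help!
I need code that, for each Subject ID, looks at the Description column: if DTI, T1 and FLAIR are all present in a single visit, keep that visit and drop the rest; if they are all present in more than one visit, keep the visit with the smallest value and drop the rest. If DTI, T1 and FLAIR are not all present in a single visit, drop those rows as well. In short, for each Subject ID I want the smallest Visit that has all three Description values (DTI, T1 and FLAIR), with everything else removed.
My output should look like this:
Thanks!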
Answer 0: (score: 0)
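Assuming the test needs to confirm that 'Description' is exactly the three values 'DTI', 'FLAIR' and 'T1', and that merely checking that a given group contains n values is not enough: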
# Remove Description and Visit from MultiIndex
new_df = df.reset_index(['Visit', 'Description'])
# Create Set of Values to Check against
check_values = {'DTI', 'FLAIR', 'T1'}
# Create Boolean Index
m = (
    new_df.groupby(level=[0, 1])['Description'].transform(
        lambda g: set(g) == check_values and len(g) == len(check_values)
    )
    & new_df.groupby(level=0)['Visit'].transform('min').eq(new_df['Visit'])
)
# Filter Dataframe with Index and Fix MultiIndex
new_df = new_df[m].set_index(['Visit', 'Description'], append=True)
The two parts of the mask:
set(g) == check_values and len(g) == len(check_values)
new_df.groupby(level=0)['Visit'].transform('min').eq(new_df['Visit'])
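To see what each part contributes, the two conditions can be evaluated separately. The sketch below uses a shortened copy of the question's data (two subjects) and assumes the index levels are named 'Subject ID', 'Visit', 'Study Date' and 'Description'. The names has_all and is_min_visit are illustrative only.

```python
import pandas as pd

# Shortened copy of the question's data; level names are assumed.
rows = [
    ('002_S_0413', '1', '6/21/2017', 'DTI'),
    ('002_S_0413', '1', '6/21/2017', 'FLAIR'),
    ('002_S_0413', '1', '6/21/2017', 'T1'),
    ('002_S_0413', '3', '8/27/2019', 'DTI'),
    ('002_S_0413', '3', '8/27/2019', 'FLAIR'),
    ('002_S_0413', '3', '8/27/2019', 'T1'),
    ('002_S_1280', '1', '3/13/2017', 'DTI'),
    ('002_S_1280', '1', '3/13/2017', 'FLAIR'),
    ('002_S_1280', '3', '3/06/2019', 'DTI'),
]
index = pd.MultiIndex.from_tuples(
    rows, names=['Subject ID', 'Visit', 'Study Date', 'Description']
)
df = pd.DataFrame({'Modality': 1, 'Phase': 1}, index=index)

new_df = df.reset_index(['Visit', 'Description'])
check_values = {'DTI', 'FLAIR', 'T1'}

# Part 1: True for every row of a (Subject ID, Study Date) group whose
# descriptions are exactly the three required values, with no duplicates.
has_all = new_df.groupby(level=[0, 1])['Description'].transform(
    lambda g: set(g) == check_values and len(g) == len(check_values)
).astype(bool)

# Part 2: True for rows whose Visit equals the subject's smallest Visit.
# Visit is a string here, so 'min' compares lexicographically.
is_min_visit = new_df.groupby(level=0)['Visit'].transform('min').eq(new_df['Visit'])

print(has_all.tolist())
print(is_min_visit.tolist())
print((has_all & is_min_visit).tolist())
```

Only rows where both parts are True survive the filter; in this sample that is visit 1 of 002_S_0413.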
Output (new_df):
                                         Modality  Phase
Subject ID Study Date Visit Description
002_S_0413 6/21/2017  1     DTI                 1      1
                            FLAIR               1      1
                            T1                  1      1
002_S_1261 3/15/2017  1     DTI                 1      1
                            FLAIR               1      1
                            T1                  1      1
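One caveat: the second part of the mask compares against each subject's overall smallest Visit, so a subject whose earliest visit is incomplete but whose later visit has all three descriptions would be dropped entirely. The sample data has no such subject. If the goal is the earliest complete visit, one possible variant is to filter down to complete visits first and only then take the minimum. The sketch below uses hypothetical subjects 'A' and 'B' with made-up dates, assumes the same index level names, and converts Visit to int so that the minimum is numeric rather than lexicographic.

```python
import pandas as pd

# Hypothetical data: subject 'A' has an incomplete visit 1 and a complete
# visit 2; subject 'B' is complete at visits 1 and 3.
rows = [
    ('A', '1', '1/01/2017', 'DTI'),
    ('A', '1', '1/01/2017', 'T1'),
    ('A', '2', '1/01/2018', 'DTI'),
    ('A', '2', '1/01/2018', 'FLAIR'),
    ('A', '2', '1/01/2018', 'T1'),
    ('B', '1', '2/01/2017', 'DTI'),
    ('B', '1', '2/01/2017', 'FLAIR'),
    ('B', '1', '2/01/2017', 'T1'),
    ('B', '3', '2/01/2019', 'DTI'),
    ('B', '3', '2/01/2019', 'FLAIR'),
    ('B', '3', '2/01/2019', 'T1'),
]
index = pd.MultiIndex.from_tuples(
    rows, names=['Subject ID', 'Visit', 'Study Date', 'Description']
)
df = pd.DataFrame({'Modality': 1, 'Phase': 1}, index=index)

required = {'DTI', 'FLAIR', 'T1'}
flat = df.reset_index()
flat['Visit'] = flat['Visit'].astype(int)  # assumes visits are integer-like strings

# Step 1: keep only visits whose descriptions are exactly the required set.
complete = flat.groupby(['Subject ID', 'Visit'])['Description'].transform(
    lambda g: set(g) == required and len(g) == len(required)
).astype(bool)
flat = flat[complete]

# Step 2: among the complete visits, keep each subject's smallest visit.
earliest = flat.groupby('Subject ID')['Visit'].transform('min')
result = flat[flat['Visit'].eq(earliest)].set_index(
    ['Subject ID', 'Study Date', 'Visit', 'Description']
)
print(result)
```

Here subject 'A' keeps visit 2 and subject 'B' keeps visit 1, whereas the combined mask above would return no rows for 'A'.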