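I'm a fairly new Python user and I'm stuck on a problem. Any guidance would be greatly appreciated.
I have a pandas DataFrame with three columns: 'ID', 'Intervention', and 'GradeLevel'. See the code below: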
import pandas as pd
data = [[100,'Long', 0], [101,'Short', 1],[102,'Medium', 2],[103,'Long', 0],[104,'Short', 1],[105,'Medium', 2]]
intervention_df = pd.DataFrame(data, columns = ['ID', 'Intervention', 'GradeLevel'])
I then created a dictionary of DataFrames, grouped by 'Intervention'. See the code below:
intervention_dict = {Intervention: dfi for Intervention, dfi in intervention_df.groupby('Intervention')}
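My question is: can you iterate through the values of the dictionary and manipulate each one? Specifically, I am trying to reference a lookup table, which can be thought of as a roster. My goal is to flag everyone on the roster as either 'Yes - Intervention Name' or 'No - Intervention Name'. It gets tricky because, for example, the Long intervention applies only to GradeLevel 0. That means I want to flag anyone in GradeLevel 0 who is in the intervention as 'Yes - Long', and anyone in GradeLevel 0 who is not in the intervention as 'No - Long'. This would become a new column called 'Value'. I also need to create another variable, 'Category', which holds the intervention name ('Long' in this example).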
lookup_data = [[100, 0], [101, 1],[102, 2],[103, 0],[104, 1],[105, 2], [106, 0], [107, 0],[108, 2],[109, 1]]
lookup_df = pd.DataFrame(lookup_data, columns = ['ID', 'GradeLevel'])
For example, the 'Long' dictionary entry would look like this after processing:
longint_data = [[100,'Long', 'Yes - Long'],[103,'Long', 'Yes - Long'], [106,'Long', 'No - Long'], [107,'Long', 'No - Long']]
longint_df = pd.DataFrame(longint_data, columns = ['ID','Category', 'Value'])
The desired final output after all the manipulation looks like this:
result_data = [[100,'Long', 'Yes - Long'] , [101,'Short','Yes - Short'], [102,'Medium','Yes - Medium'], [103,'Long', 'Yes - Long'], [104,'Short','Yes - Short'] , [105, 'Medium','Yes - Medium'], [106,'Long', 'No - Long'], [107,'Long', 'No - Long'], [108,'Medium','No - Medium'], [109,'Short','No - Short']]
result_df = pd.DataFrame(result_data, columns = ['ID','Category', 'Value'])
Thanks!
Answer 0 (score: 2)
This is what I think you want, but without a clearer explanation I'm not sure.
data = [[100,'Long', 0], [101,'Short', 1],[102,'Medium', 2],[103,'Long', 0],[104,'Short', 1],[105,'Medium', 2]]
intervention_df = pd.DataFrame(data, columns = ['ID', 'Intervention', 'GradeLevel'])
lookup_data = [[100, 0], [101, 1],[102, 2],[103, 0],[104, 1],[105, 2], [106, 0], [107, 0],[108, 2],[109, 1]]
lookup_df = pd.DataFrame(lookup_data, columns = ['ID', 'GradeLevel'])
df= pd.merge(intervention_df.assign(y='Yes'), lookup_df, on=['ID', 'GradeLevel'], how='outer')
df.loc[df.y.isnull(), 'y'] = 'No'
    ID Intervention  GradeLevel    y
0  100         Long           0  Yes
1  101        Short           1  Yes
2  102       Medium           2  Yes
3  103         Long           0  Yes
4  104        Short           1  Yes
5  105       Medium           2  Yes
6  106          NaN           0   No
7  107          NaN           0   No
8  108          NaN           2   No
9  109          NaN           1   No
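If the goal is the exact ID / Category / Value layout from the question, one possible way to get there is to look up each roster row's intervention by its GradeLevel. This is only a sketch: it assumes every GradeLevel belongs to exactly one intervention (true for the sample data), and the names grade_to_intervention and enrolled are made up for illustration.

```python
import pandas as pd

data = [[100, 'Long', 0], [101, 'Short', 1], [102, 'Medium', 2],
        [103, 'Long', 0], [104, 'Short', 1], [105, 'Medium', 2]]
intervention_df = pd.DataFrame(data, columns=['ID', 'Intervention', 'GradeLevel'])

lookup_data = [[100, 0], [101, 1], [102, 2], [103, 0], [104, 1],
               [105, 2], [106, 0], [107, 0], [108, 2], [109, 1]]
lookup_df = pd.DataFrame(lookup_data, columns=['ID', 'GradeLevel'])

# Assumption: each GradeLevel maps to exactly one Intervention.
# (hypothetical helper name) Series indexed by GradeLevel -> Intervention.
grade_to_intervention = (
    intervention_df.drop_duplicates('GradeLevel')
    .set_index('GradeLevel')['Intervention']
)

result_df = lookup_df.copy()
# Category: the intervention that covers this roster row's grade level.
result_df['Category'] = result_df['GradeLevel'].map(grade_to_intervention)
# enrolled (hypothetical name): True when the roster ID appears in intervention_df.
enrolled = result_df['ID'].isin(intervention_df['ID'])
result_df['Value'] = enrolled.map({True: 'Yes - ', False: 'No - '}) + result_df['Category']
result_df = result_df[['ID', 'Category', 'Value']]

print(result_df)
```

If a grade level could belong to several interventions, the map step would no longer be enough and a merge on GradeLevel would be the safer choice.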
Answer 1 (score: 1)
This solution does not use the dictionary intervention_dict. Here is the data I get from your commands:
In [1048]: intervention_df
Out[1048]:
    ID Intervention  GradeLevel
0  100         Long           0
1  101        Short           1
2  102       Medium           2
3  103         Long           0
4  104        Short           1
5  105       Medium           2
In [1049]: lookup_df
Out[1049]:
    ID  GradeLevel
0  100           0
1  101           1
2  102           2
3  103           0
4  104           1
5  105           2
6  106           0
7  107           0
8  108           2
9  109           1
Step 1: Do an outer merge between lookup_df and intervention_df, create the column Value, and set_index to GradeLevel.
In [1059]: df = lookup_df.merge(intervention_df, on=['ID', 'GradeLevel'], how='outer').assign(Value='Yes - '+intervention_df['Intervention']).set_index('GradeLevel')
In [1060]: df
Out[1060]:
             ID Intervention         Value
GradeLevel
0           100         Long    Yes - Long
1           101        Short   Yes - Short
2           102       Medium  Yes - Medium
0           103         Long    Yes - Long
1           104        Short   Yes - Short
2           105       Medium  Yes - Medium
0           106          NaN           NaN
0           107          NaN           NaN
2           108          NaN           NaN
1           109          NaN           NaN
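Note that the assign call in Step 1 lines up 'Yes - ' + intervention_df['Intervention'] with the merged frame by index label, which works here because the first six merged rows happen to carry the same labels (0 to 5) as intervention_df. A slightly more defensive sketch, assuming the same two frames, builds Value from the merged frame's own Intervention column instead, so row order no longer matters:

```python
import pandas as pd

data = [[100, 'Long', 0], [101, 'Short', 1], [102, 'Medium', 2],
        [103, 'Long', 0], [104, 'Short', 1], [105, 'Medium', 2]]
intervention_df = pd.DataFrame(data, columns=['ID', 'Intervention', 'GradeLevel'])

lookup_data = [[100, 0], [101, 1], [102, 2], [103, 0], [104, 1],
               [105, 2], [106, 0], [107, 0], [108, 2], [109, 1]]
lookup_df = pd.DataFrame(lookup_data, columns=['ID', 'GradeLevel'])

# Outer merge keeps every roster row; Intervention is NaN for IDs 106-109.
df = lookup_df.merge(intervention_df, on=['ID', 'GradeLevel'], how='outer')

# String concatenation propagates NaN, so unmatched rows stay NaN in Value.
df['Value'] = 'Yes - ' + df['Intervention']
df = df.set_index('GradeLevel')

print(df)
```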
Step 2: Create df_fillna, which will be used to fill the NaN values in df.
In [1063]: df_fillna = intervention_df.groupby('Intervention').head(1).assign(Value='No - '+intervention_df['Intervention']).set_index('GradeLevel')
In [1064]: df_fillna
Out[1064]:
             ID Intervention        Value
GradeLevel
0           100         Long    No - Long
1           101        Short   No - Short
2           102       Medium  No - Medium
Step 3 (final): Use combine_first to fill the NaN values in df from df_fillna, then call sort_values on ID and reset_index to drop GradeLevel.
In [1068]: df.combine_first(df_fillna).sort_values('ID').reset_index(drop=True)
Out[1068]:
    ID Intervention         Value
0  100         Long    Yes - Long
1  101        Short   Yes - Short
2  102       Medium  Yes - Medium
3  103         Long    Yes - Long
4  104        Short   Yes - Short
5  105       Medium  Yes - Medium
6  106         Long     No - Long
7  107         Long     No - Long
8  108       Medium   No - Medium
9  109        Short    No - Short
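As for the original question of whether you can loop over the dictionary and process each DataFrame in it: yes, that also works. Below is a rough sketch of that route, not a definitive implementation. It assumes an intervention covers exactly the grade levels that appear in its group, and the names pieces, roster and enrolled are invented for the example.

```python
import pandas as pd

data = [[100, 'Long', 0], [101, 'Short', 1], [102, 'Medium', 2],
        [103, 'Long', 0], [104, 'Short', 1], [105, 'Medium', 2]]
intervention_df = pd.DataFrame(data, columns=['ID', 'Intervention', 'GradeLevel'])

lookup_data = [[100, 0], [101, 1], [102, 2], [103, 0], [104, 1],
               [105, 2], [106, 0], [107, 0], [108, 2], [109, 1]]
lookup_df = pd.DataFrame(lookup_data, columns=['ID', 'GradeLevel'])

# One DataFrame per intervention, keyed by the intervention name.
intervention_dict = {name: dfi for name, dfi in intervention_df.groupby('Intervention')}

pieces = []
for name, dfi in intervention_dict.items():
    # Assumption: the intervention applies to the grade levels seen in its group.
    grades = dfi['GradeLevel'].unique()
    roster = lookup_df[lookup_df['GradeLevel'].isin(grades)]
    # True for roster members who are actually in this intervention.
    enrolled = roster['ID'].isin(dfi['ID'])
    pieces.append(pd.DataFrame({
        'ID': roster['ID'],
        'Category': name,
        'Value': enrolled.map({True: f'Yes - {name}', False: f'No - {name}'}),
    }))

# Stack the per-intervention pieces back into one frame ordered by ID.
result_df = pd.concat(pieces).sort_values('ID').reset_index(drop=True)

print(result_df)
```

Each element of pieces has the same shape as the longint_df example in the question, so the per-intervention frames can also be kept in a dictionary if they are needed separately.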