I have a dataframe:
Drug_Name  CurrentYear  PMPM  Cost
Drug A     201901       25    10
Drug B     201902       25    20
Drug C     201903       50    30
Drug D     202001       75    25
Drug E     202002       100   100
I want to transform it into:
Drug_Name  Current Year  ComparisionYear  Measure_Name  Measure_Value_Current  Measure_Value_Comparision
Drug A     201901        201901           PMPM          25                     25
Drug A     201901        201901           Cost          10                     10
Drug B     201902        201901           PMPM          25                     25
Drug B     201902        201901           Cost          20                     10
Drug C     201903        201901           PMPM          50                     25
Drug C     201903        201901           Cost          30                     10
Drug C     201903        201902           PMPM          50                     25
Drug C     201903        201902           Cost          30                     20
Drug D     202001        201901           PMPM          75                     25
Drug D     202001        201901           Cost          25                     10
Drug D     202001        201902           PMPM          75                     25
Drug D     202001        201902           Cost          25                     20
Drug D     202001        201903           PMPM          75                     50
Drug D     202001        201903           Cost          25                     30
Drug E     202002        201901           PMPM          100                    25
Drug E     202002        201901           Cost          100                    10
Drug E     202002        201902           PMPM          100                    25
Drug E     202002        201902           Cost          100                    20
Drug E     202002        201903           PMPM          100                    50
Drug E     202002        201903           Cost          100                    30
Drug E     202002        202001           PMPM          100                    75
Drug E     202002        202001           Cost          100                    25
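The idea is not just to pivot, but also to add these derived comparison columns for every possible combination: each drug is paired with every drug from an earlier CurrentYear, and Drug A, which has no earlier drug, is paired with itself.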
Answer (score: 1)
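We'll do this in 3 steps.
- Step 1: get all the information for every pair of drugs -
First, we build every combination of two drugs and save it in a df called allCombo.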
import itertools
import pandas as pd
# Every ordered (Drug1, Drug2) pair, including each drug paired with itself
allCombo = pd.DataFrame(list(itertools.product(df['Drug_Name'], repeat=2)), columns=['Drug1', 'Drug2'])
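To see what this step produces, here is a self-contained sketch that rebuilds the question's df by hand (values copied from the input table) and inspects allCombo. With 5 drugs, product(..., repeat=2) yields 5 x 5 = 25 ordered pairs, and that includes each drug paired with itself, which is what later allows the Drug A vs Drug A rows.

```python
import itertools

import pandas as pd

# The question's input table, typed in by hand
df = pd.DataFrame({
    'Drug_Name': ['Drug A', 'Drug B', 'Drug C', 'Drug D', 'Drug E'],
    'CurrentYear': [201901, 201902, 201903, 202001, 202002],
    'PMPM': [25, 25, 50, 75, 100],
    'Cost': [10, 20, 30, 25, 100],
})

# Every ordered (Drug1, Drug2) pair, self-pairs included
allCombo = pd.DataFrame(list(itertools.product(df['Drug_Name'], repeat=2)),
                        columns=['Drug1', 'Drug2'])

print(len(allCombo))  # 25
print(allCombo.head(3))
```

Most of these 25 pairs are thrown away later; building all of them first keeps the merge logic simple at the cost of some extra rows.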
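Next, we convert the Drug_Name column of the original df to an ordered categorical dtype. The categories are taken in the order the drugs appear in df, which here is chronological, so comparing two drug names with < tells us which one comes earlier. This greatly simplifies the time-based filtering later on.
We also melt your original df so that PMPM and Cost end up as rows in the final df.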
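# Ordered categorical: categories follow the row order of df (chronological here)
df['Drug_Name'] = df['Drug_Name'].astype(pd.api.types.CategoricalDtype(df['Drug_Name'].to_list(), ordered=True))
# Wide to long: one row per (Drug_Name, CurrentYear, Measure_Name)
df = df.melt(['Drug_Name', 'CurrentYear'], var_name='Measure_Name', value_name='Measure_Value')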
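A small sketch of both operations on the question's data (df is rebuilt by hand; long_df is a name chosen here so the wide frame stays available). It shows that < on the ordered categorical follows the order in which the drugs were listed, and that melt turns the two measure columns into 5 x 2 = 10 rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Drug_Name': ['Drug A', 'Drug B', 'Drug C', 'Drug D', 'Drug E'],
    'CurrentYear': [201901, 201902, 201903, 202001, 202002],
    'PMPM': [25, 25, 50, 75, 100],
    'Cost': [10, 20, 30, 25, 100],
})

# Categories listed in row order; ordered=True makes < and > follow that order
drug_type = pd.api.types.CategoricalDtype(df['Drug_Name'].to_list(), ordered=True)
df['Drug_Name'] = df['Drug_Name'].astype(drug_type)
print((df['Drug_Name'] < 'Drug C').to_list())  # [True, True, False, False, False]

# Wide -> long: one row per (drug, measure); all PMPM rows first, then all Cost rows
long_df = df.melt(['Drug_Name', 'CurrentYear'],
                  var_name='Measure_Name', value_name='Measure_Value')
print(long_df.shape)  # (10, 4)
print(long_df.head())
```

The category order only means "earlier in time" because df is already sorted by CurrentYear; if it were not, the rows would presumably need sorting by CurrentYear before the categories are built.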
Next, we merge the melted df onto allCombo twice, once on Drug1 and once on Drug2. After this, each row holds all the information for the two drugs being compared.
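# Columns present on both sides get pandas' default suffixes: _x from the Drug1 merge, _y from the Drug2 merge
merged = allCombo.merge(df, left_on='Drug1', right_on='Drug_Name').merge(df, left_on='Drug2', right_on='Drug_Name')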
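A sketch of the double merge on the question's data, to make the row count and the _x/_y naming visible. The categorical conversion is left out here because it does not change the shape of the result; long_df is a name chosen for this example. Each of the 25 pairs matches 2 measure rows for Drug1 and 2 for Drug2, giving 25 x 2 x 2 = 100 rows:

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'Drug_Name': ['Drug A', 'Drug B', 'Drug C', 'Drug D', 'Drug E'],
    'CurrentYear': [201901, 201902, 201903, 202001, 202002],
    'PMPM': [25, 25, 50, 75, 100],
    'Cost': [10, 20, 30, 25, 100],
})
allCombo = pd.DataFrame(list(itertools.product(df['Drug_Name'], repeat=2)),
                        columns=['Drug1', 'Drug2'])
long_df = df.melt(['Drug_Name', 'CurrentYear'],
                  var_name='Measure_Name', value_name='Measure_Value')

# First merge attaches Drug1's rows, second merge attaches Drug2's rows.
# Overlapping column names get the default suffixes: _x (Drug1 side), _y (Drug2 side).
merged = (allCombo
          .merge(long_df, left_on='Drug1', right_on='Drug_Name')
          .merge(long_df, left_on='Drug2', right_on='Drug_Name'))

print(len(merged))  # 100
print(sorted(c for c in merged.columns if c.endswith('_x')))
# ['CurrentYear_x', 'Drug_Name_x', 'Measure_Name_x', 'Measure_Value_x']
```

Half of these rows pair a PMPM value with a Cost value, which is why the filter in the next step also requires Measure_Name_x == Measure_Name_y.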
- Step 2: filter down to the rows we want -
Next, we filter the merged df so that only the rows you need remain. This is where the ordered categorical helps.
same_measure = merged['Measure_Name_x'] == merged['Measure_Name_y']
earlier = merged['Drug_Name_x'] < merged['Drug_Name_y']  # comparison drug (_x) listed before current drug (_y)
a_vs_a = (merged['Drug_Name_x'] == 'Drug A') & (merged['Drug_Name_y'] == 'Drug A')  # the one allowed self-pair
filtered = merged[same_measure & (earlier | a_vs_a)].copy()
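The second branch of the OR (|) in this filter is there specifically because you want Drug A compared with itself, but no other drug compared with itself. With that, we have the rows you're looking for.
- Step 3: clean up the df (column names, index, etc.) -
The last step is simply to drop the columns we no longer need and rename the rest to match what you want.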
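The ordered categorical encodes chronology only because the rows of df happen to be listed by CurrentYear. As an alternative (not the answer's method), here is a sketch that compares CurrentYear_x and CurrentYear_y directly; for this data it keeps the same 22 rows as the expected output (10 earlier/later pairs x 2 measures, plus Drug A vs Drug A x 2 measures) and does not depend on row order:

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'Drug_Name': ['Drug A', 'Drug B', 'Drug C', 'Drug D', 'Drug E'],
    'CurrentYear': [201901, 201902, 201903, 202001, 202002],
    'PMPM': [25, 25, 50, 75, 100],
    'Cost': [10, 20, 30, 25, 100],
})
allCombo = pd.DataFrame(list(itertools.product(df['Drug_Name'], repeat=2)),
                        columns=['Drug1', 'Drug2'])
long_df = df.melt(['Drug_Name', 'CurrentYear'],
                  var_name='Measure_Name', value_name='Measure_Value')
merged = (allCombo
          .merge(long_df, left_on='Drug1', right_on='Drug_Name')
          .merge(long_df, left_on='Drug2', right_on='Drug_Name'))

same_measure = merged['Measure_Name_x'] == merged['Measure_Name_y']
# Comparison drug (_x) must come from a strictly earlier year than the current drug (_y)
earlier_year = merged['CurrentYear_x'] < merged['CurrentYear_y']
# The one self-comparison the question asks for
a_vs_a = (merged['Drug_Name_x'] == 'Drug A') & (merged['Drug_Name_y'] == 'Drug A')

filtered = merged[same_measure & (earlier_year | a_vs_a)].copy()
print(len(filtered))  # 22
```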
# The _y columns describe the current drug (Drug2), the _x columns the comparison drug (Drug1)
filtered.drop(columns=['Drug_Name_x', 'Measure_Name_x'], inplace=True)
filtered.rename(columns = {'Drug_Name_y':'Drug_Name',
'CurrentYear_y':'Current Year',
'CurrentYear_x':'ComparisionYear',
'Measure_Name_y':'Measure_Name',
'Measure_Value_y':'Measure_Value_Current',
'Measure_Value_x':'Measure_Value_Comparision'
}, inplace = True)
final = filtered[['Drug_Name', 'Current Year', 'ComparisionYear', 'Measure_Name', 'Measure_Value_Current', 'Measure_Value_Comparision']].reset_index(drop = True)
Now you have your final df!
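For comparison, here is a shorter route to the same table, sketched under the assumption that the rule is exactly the one visible in the expected output (each drug paired with every drug from an earlier CurrentYear, plus Drug A with itself). Instead of allCombo and two merges, it self-joins the melted frame on Measure_Name; this is a swapped-in technique, not the answer's. The _cur/_cmp suffixes and the name long_df are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Drug_Name': ['Drug A', 'Drug B', 'Drug C', 'Drug D', 'Drug E'],
    'CurrentYear': [201901, 201902, 201903, 202001, 202002],
    'PMPM': [25, 25, 50, 75, 100],
    'Cost': [10, 20, 30, 25, 100],
})
long_df = df.melt(['Drug_Name', 'CurrentYear'],
                  var_name='Measure_Name', value_name='Measure_Value')

# Self-join on Measure_Name: every (current, comparison) pair sharing a measure
pairs = long_df.merge(long_df, on='Measure_Name', suffixes=('_cur', '_cmp'))

keep = (pairs['CurrentYear_cmp'] < pairs['CurrentYear_cur']) | (
    (pairs['Drug_Name_cur'] == 'Drug A') & (pairs['Drug_Name_cmp'] == 'Drug A'))

final = (pairs[keep]
         .rename(columns={'Drug_Name_cur': 'Drug_Name',
                          'CurrentYear_cur': 'Current Year',
                          'CurrentYear_cmp': 'ComparisionYear',
                          'Measure_Value_cur': 'Measure_Value_Current',
                          'Measure_Value_cmp': 'Measure_Value_Comparision'})
         [['Drug_Name', 'Current Year', 'ComparisionYear', 'Measure_Name',
           'Measure_Value_Current', 'Measure_Value_Comparision']]
         # Match the question's row order: PMPM before Cost within each pair
         .sort_values(['Drug_Name', 'ComparisionYear', 'Measure_Name'],
                      ascending=[True, True, False])
         .reset_index(drop=True))
print(final)
```

Passing suffixes= to merge names each side by its role, which avoids having to remember which of _x and _y is the current drug.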