I have a data set that looks like this:
ID Covid_pos Asymptomatic Fever Cough Rash
1 1 0 1 0 1
2 0 0 0 1 0
3 1 1 0 1 1
4 1 0 1 0 1
5 0 1 1 0 0
From this data, my goal is to create output that looks like this:
Symptom All Tested(5308, 100%) SARS-COV-2 PCR positive (N,%)
Asymptomatic 2528(47.63%) 163(6.45%)
Fever 958(23.85%) 43(3.53%)
Cough 159(3.95%) 22(9.72%)
Rash 19(23.05%) 88(18.40%)
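I wrote code that produces the desired output for one of my variables. However, I would like to create a macro or function so that it can be applied to all of my symptom variables. Is there an option you would recommend exploring, other than copying and pasting the code 8+ times and replacing "Asymptomatic" with the next symptom each time? I am fairly new to Python, so all strategies are welcome!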
AsyOdds_Percent = pd.crosstab(df_merged2["Asymptomatic"],df_merged2.Covid_pos)
AsyOdds_Percent = pd.DataFrame(AsyOdds_Percent.to_records()).rename(columns={'Asymptomatic':'Asymptomatic','0':'Neg_%','1':'Pos_%'}).fillna(0)
AsyOdds_Percent["Total_%"] = AsyOdds_Percent.sum(axis=1)
AsyOdds_Count=pd.crosstab(df_merged2["Asymptomatic"],df_merged2.Covid_pos)
AsyOdds_Count1 = pd.DataFrame(AsyOdds_Count.to_records()).rename(columns={'Asymptomatic':'Asymptomatic','0':'Neg_N','1':'Pos_N'}).fillna(0)
AsyOdds_Count1["Total_N"] = AsyOdds_Count1.sum(axis=1)
cols = AsyOdds_Percent.columns[1:4]
AsyOdds_Percent[cols] = AsyOdds_Percent[cols]/AsyOdds_Percent[cols].sum()*100
Merged = pd.merge(AsyOdds_Count1,AsyOdds_Percent, on='Asymptomatic', how='left')
Merged['%_Pos'] = (Merged['Pos_N']/Merged['Total_N'])*100
Merged['%_Pos'] = round(Merged['%_Pos'], 2)
Merged['Total_%'] = round(Merged['Total_%'], 2)
Merged = Merged[['Asymptomatic','Pos_N','Pos_%','Neg_N','Neg_%','Total_N','Total_%','%_Pos']]
Merged = Merged.loc[Merged['Asymptomatic'] == 1]
Merged = Merged[['Asymptomatic','Total_N','Total_%','Pos_N','%_Pos']]
Merged = Merged.rename(columns = {"Asymptomatic": "Symptoms"})
a1 = (Merged["Symptoms"] == 1)
conditions = [a1]
Merged['Symptoms'] = np.select([a1], ['Asymptomatic'])
Merged['All Tested (5308, 100%)'] = Merged['Total_N'].map(str) + '(' + Merged['Total_%'].map(str) + '%)'
Merged['SARS-COV-2 PCR positive (N,%)'] = Merged['Pos_N'].map(str) + '(' + Merged['%_Pos'].map(str) + '%)'
Merged=Merged[['Symptoms','All Tested (5308, 100%)','SARS-COV-2 PCR positive (N,%)']]
print(Merged)
Output:
Symptoms All Tested (5308, 100%) SARS-COV-2 PCR positive (N,%)
1 Asymptomatic 2528(47.63%) 163(6.45%)
Answer 0 (score: 1)
I used the following data sample (df):
Covid_pos Asymptomatic Fever Cough
0 1 0 1 0
1 0 0 0 1
2 1 1 0 1
3 1 0 1 0
4 0 1 1 0
5 1 0 1 0
6 0 1 1 0
7 1 0 0 1
8 0 0 0 0
9 0 0 0 0
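Start by defining 3 functions:
def colSums(col):
    # col is indexed by Covid_pos, so col.loc[1] selects the positive cases
    return pd.Series([col.sum(), col.loc[1].sum()], index=['All', 'Pos'])

def withPct(x):
    # relies on the global variable total, which must be set before the call
    return f'{x}({x / total * 100}%)'

def colTitle(head, n1):
    return f'{head}({n1}, {n1/total*100}%)'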
Then compute the required totals:
total = df.index.size
totalPos = df.Covid_pos.sum()
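The whole processing (for all source columns) boils down to 2 instructions:
# applymap is deprecated since pandas 2.1; use .map(withPct) there instead
res = df.set_index('Covid_pos').apply(colSums).T.applymap(withPct)
res.columns = [colTitle('All Tested', total),
               colTitle('SARS-COV-2 PCR positive', totalPos)]
The result is: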
All Tested(10, 100.0%) SARS-COV-2 PCR positive(5, 50.0%)
Asymptomatic 3(30.0%) 1(10.0%)
Fever 5(50.0%) 3(30.0%)
Cough 3(30.0%) 2(20.0%)
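To compute the percentages in the "Pos" column relative to the number of positive cases (rather than to everyone tested), proceed as follows:
Compute the result as absolute numbers: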
res = df.set_index('Covid_pos').apply(colSums).T
Compute the percentages by dividing each column by its respective divisor:
wrk = res / [total, totalPos] * 100; wrk
Overwrite each column in res with the concatenation of the "raw" value and the percentage in parentheses:
res.All = res.All.astype(str) + '(' + wrk.All.astype(str) + '%)'
res.Pos = res.Pos.astype(str) + '(' + wrk.Pos.astype(str) + '%)'
With the column titles assigned as before, the result is now:
All Tested(10, 100.0%) SARS-COV-2 PCR positive(5, 50.0%)
Asymptomatic 3(30.0%) 1(20.0%)
Fever 5(50.0%) 3(60.0%)
Cough 3(30.0%) 2(40.0%)
The withPct function is no longer needed.
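Since the question asks for a function that can be reused across all symptom columns, the steps above can be wrapped into one. The following is only a minimal sketch, not the answerer's code: the name `symptom_table` and its parameters are hypothetical, and it assumes that the outcome column and every symptom column are coded 0/1. Percentages are rounded to two decimals so that real data does not produce long floating-point tails.

```python
import pandas as pd

def symptom_table(df, outcome, symptoms):
    """Summarise 0/1 symptom columns as 'N(%)' strings (hypothetical helper)."""
    total = len(df)                      # everyone tested
    total_pos = int(df[outcome].sum())   # everyone with a positive test
    n_all = df[symptoms].sum()           # people reporting each symptom
    n_pos = df.loc[df[outcome] == 1, symptoms].sum()  # same, positives only
    pct_all = (n_all / total * 100).round(2)      # relative to all tested
    pct_pos = (n_pos / total_pos * 100).round(2)  # relative to all positives
    pos_share = round(total_pos / total * 100, 2)
    return pd.DataFrame({
        f'All Tested({total}, 100%)':
            n_all.astype(str) + '(' + pct_all.astype(str) + '%)',
        f'SARS-COV-2 PCR positive({total_pos}, {pos_share}%)':
            n_pos.astype(str) + '(' + pct_pos.astype(str) + '%)',
    })

# Same sample data as used in this answer
df = pd.DataFrame({
    'Covid_pos':    [1, 0, 1, 1, 0, 1, 0, 1, 0, 0],
    'Asymptomatic': [0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
    'Fever':        [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    'Cough':        [0, 1, 1, 0, 0, 0, 0, 1, 0, 0],
})
table = symptom_table(df, 'Covid_pos', ['Asymptomatic', 'Fever', 'Cough'])
print(table)
```

Note that the code in the question divides Pos_N by the symptom's own Total_N instead. To reproduce that denominator, `n_pos` would be divided by `n_all` rather than by `total_pos`.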
Answer 1 (score: 1)
Maybe this will work for you:
df = pd.DataFrame({'Covid_pos':[1,0,1,1,0], 'Asymptomatic':[0,0,1,0,1], 'Fever':[1,0,0,1,1], 'Cough':[0,1,1,0,0],'Rash':[1,0,1,1,0]})
df = df.rename(columns = {'Covid_pos':'SARS-COV-2 PCR positive'})
df['All Tested'] = 1 #Adding a dummy column with all values as 1 for ALL TESTED
symptoms = ['Asymptomatic','Fever','Cough', 'Rash']
targets = ['SARS-COV-2 PCR positive', 'All Tested']
df2 = df.set_index(targets).stack().reset_index().set_axis(targets+['symptoms','flg'], axis=1)
df3 = df2.groupby(['symptoms','flg'])[targets].sum().reset_index()
df4 = df3[df3['flg']==1].drop('flg', axis=1)
df4.columns = ['symptoms']+targets
df4[[i+' %' for i in targets]] = df4[targets].apply(lambda x : round(x/x.sum()*100,ndigits=2))
df4
symptoms SARS-COV-2 PCR positive All Tested \
1 Asymptomatic 1 2
3 Cough 1 2
5 Fever 2 3
7 Rash 3 3
SARS-COV-2 PCR positive % All Tested %
1 14.29 20.0
3 14.29 20.0
5 28.57 30.0
7 42.86 30.0
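In the output above, each percentage is a count divided by the column sum across symptoms (7 and 10 for this sample), which is what `x/x.sum()` computes. If the intended denominators are instead the number of people tested and the number of positive cases, one possible variant is sketched below. This is an assumption about what the question wants, not part of the answer, and the names `long_df`, `present` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Covid_pos': [1, 0, 1, 1, 0],
                   'Asymptomatic': [0, 0, 1, 0, 1],
                   'Fever': [1, 0, 0, 1, 1],
                   'Cough': [0, 1, 1, 0, 0],
                   'Rash': [1, 0, 1, 1, 0]})
symptoms = ['Asymptomatic', 'Fever', 'Cough', 'Rash']

# Long format: one row per (person, symptom) pair
long_df = df.melt(id_vars='Covid_pos', value_vars=symptoms,
                  var_name='symptoms', value_name='flg')
present = long_df[long_df['flg'] == 1]  # keep rows where the symptom is present

# size = people with the symptom, sum = positives among them
out = present.groupby('symptoms')['Covid_pos'].agg(['size', 'sum'])
out.columns = ['All Tested', 'SARS-COV-2 PCR positive']

# Denominators are head-counts, not sums across symptoms
out['All Tested %'] = (out['All Tested'] / len(df) * 100).round(2)
out['SARS-COV-2 PCR positive %'] = (
    out['SARS-COV-2 PCR positive'] / df['Covid_pos'].sum() * 100).round(2)
print(out)
```

A symptom that nobody reports disappears from `out`, because the filter on `flg` removes all of its rows; reindexing on `symptoms` afterwards would restore it if that matters.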