Add a new column to a DataFrame (df) that stores each person's (total present) / (total present + total absent)

Date: 2017-04-13 04:54:41

Tags: python pandas numpy


This is a pandas DataFrame named df. How can I add a new column that stores, for each individual, total present / (total present + total absent)?

2 Answers:

Answer 0 (score: 1)

Consider the DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.choice([None, 'Absent', 'Present'], (10, 10))  # 10x10 grid of random attendance values
)


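You can apply value_counts to each row with normalize=True and join the 'Present' share back onto df. (The original form, df.apply(pd.value_counts, 1, normalize=True), relies on the top-level pd.value_counts, which is deprecated in recent pandas; the Series method does the same thing.)

# value_counts drops None by default, so each share is count / (present + absent)
df.join(df.apply(lambda row: row.value_counts(normalize=True), axis=1)['Present'])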
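To see what the normalized counts do, here is a minimal sketch on a small, hand-made frame (hypothetical data standing in for the random df). It assumes missing cells are None, and it adds a fillna(0) step that is not part of the answer: a person with no 'Present' entries has no 'Present' key in their counts, so the join would otherwise leave NaN for them.

```python
import pandas as pd

# Small, deterministic stand-in for the random df above (hypothetical data).
df = pd.DataFrame([
    ['Present', 'Absent', None],
    ['Absent', 'Absent', 'Absent'],
    ['Present', 'Present', None],
])

# Row-wise share of each value; value_counts drops None by default (dropna=True),
# so the denominator is present + absent only.
shares = df.apply(lambda row: row.value_counts(normalize=True), axis=1)

# Rows without any 'Present' get NaN in shares['Present']; treat that as a ratio of 0.
result = df.join(shares['Present'].fillna(0))
print(result)
```

Here row 0 gives 1 / (1 + 1) = 0.5, row 1 gives 0.0 and row 2 gives 2 / 2 = 1.0; the None cells are ignored in both numerator and denominator.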


Answer 1 (score: 1)

A dummy DataFrame; for simplicity, I used a, b, c, d as column names:

import pandas as pd

df = pd.DataFrame({'a': ['jon', 'sam', 'dean', 'bob'],
                   'b': ['present', 'present', 'absent', 'present'],
                   'c': ['absent', 'present', 'present', 'absent'],
                   'd': ['absent', 'absent', 'present', 'present']})

# Encode each attendance column as 1 (present) / 0 (absent)
df['b1'] = df['b'].map({'present': 1, 'absent': 0})
df['c1'] = df['c'].map({'present': 1, 'absent': 0})
df['d1'] = df['d'].map({'present': 1, 'absent': 0})

# Count presences and absences per row (per person)
df['sum_1'] = (df[['b1', 'c1', 'd1']] == 1).sum(axis=1)
df['sum_0'] = (df[['b1', 'c1', 'd1']] == 0).sum(axis=1)

# Ratio: total present / (total present + total absent); "/" is true division in Python 3
df['present'] = df['sum_1'] / (df['sum_1'] + df['sum_0'])

# Show the original columns plus the result
df[['a', 'b', 'c', 'd', 'present']]
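The helper columns (b1, c1, d1, sum_1, sum_0) are not strictly needed: comparing the attendance columns against the strings directly gives the same counts. A minimal sketch, assuming the attendance columns are exactly b, c and d and hold only 'present'/'absent' (the name `cols` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': ['jon', 'sam', 'dean', 'bob'],
                   'b': ['present', 'present', 'absent', 'present'],
                   'c': ['absent', 'present', 'present', 'absent'],
                   'd': ['absent', 'absent', 'present', 'present']})

cols = ['b', 'c', 'd']  # attendance columns (everything except the name column)

# Boolean comparison per cell, summed across each row
present = (df[cols] == 'present').sum(axis=1)
absent = (df[cols] == 'absent').sum(axis=1)

df['present'] = present / (present + absent)
print(df)
```

For this data jon gets 1/3 and the other three get 2/3. A row with neither value recorded would be 0/0, which pandas returns as NaN rather than raising.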

Or you can simply use the solution suggested by @piRSquared:

import pandas as pd

df = pd.DataFrame({'a': ['jon', 'sam', 'dean', 'bob'],
                   'b': ['present', 'present', 'absent', 'present'],
                   'c': ['absent', 'present', 'present', 'absent'],
                   'd': ['absent', 'absent', 'present', 'present']})

# assign returns a new DataFrame with the 'present' column added; df itself is unchanged
df.assign(present=df.stack().map(dict(present=1, absent=0)).unstack().mean(axis=1))
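The one-liner can be unpacked step by step. This sketch uses the same dummy data and shows why the name column 'a' does not disturb the result: its values are not keys of the mapping, so map turns them into NaN, and mean skips NaN by default. The intermediate names `long`, `codes` and `ratio` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': ['jon', 'sam', 'dean', 'bob'],
                   'b': ['present', 'present', 'absent', 'present'],
                   'c': ['absent', 'present', 'present', 'absent'],
                   'd': ['absent', 'absent', 'present', 'present']})

# 1) stack: one long Series indexed by (row, column)
long = df.stack()

# 2) map: 'present' -> 1, 'absent' -> 0; the names from column 'a' become NaN
codes = long.map({'present': 1, 'absent': 0})

# 3) unstack back to one row per person, then 4) average across columns (NaN skipped)
ratio = codes.unstack().mean(axis=1)

# Rebind df to keep the new column, since assign does not modify in place
df = df.assign(present=ratio)
print(df)
```

Because the mean of 1/0 codes equals present / (present + absent), this gives the same values as the explicit counting approach.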
