I have the following code:
import pandas as pd

input = pd.DataFrame({'Police District Name': ['WHEATON', 'SILVER SPRING', 'BETHESDA','GERMANTOWN','WHEATON','MONTGOMERY VILLAGE'],
'cn1': ['Crime Against Person', 'Crime Against Person', 'Crime Against Person','other','other','other'],
'cn2': ['Aggravated Assault', 'bla', 'bla','blaa','bla','one more bla'],
'cn3': ['Aggravated Assault', 'bla', 'bla','blaa','bla','one more bla'],
})
input
Desired output:
output= pd.DataFrame({'Police District Name': ['WHEATON', 'SILVER SPRING', 'BETHESDA','GERMANTOWN','WHEATON','MONTGOMERY VILLAGE'],
'total crime number':[6,3,3,3,6,3],
})
output
How can I get this? Thanks!
Answer (score: 2)
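If every cell in cn1, cn2, ... holds a crime (no empty cells), you can simply use the number of columns. The idea is to build a Series of row counts per district with value_counts, multiply it by the number of cnx columns, and then map the result back onto your DataFrame.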
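To make the arithmetic concrete, here is a small sketch of the intermediate values, rebuilt from the question's data: WHEATON appears in 2 rows and there are 3 crime columns, so it gets 2 × 3 = 6. The names rows_per_district and n_crime_cols are illustrative only.

```python
import pandas as pd

# The question's data, named df as in the answer
df = pd.DataFrame({
    'Police District Name': ['WHEATON', 'SILVER SPRING', 'BETHESDA',
                             'GERMANTOWN', 'WHEATON', 'MONTGOMERY VILLAGE'],
    'cn1': ['Crime Against Person', 'Crime Against Person', 'Crime Against Person',
            'other', 'other', 'other'],
    'cn2': ['Aggravated Assault', 'bla', 'bla', 'blaa', 'bla', 'one more bla'],
    'cn3': ['Aggravated Assault', 'bla', 'bla', 'blaa', 'bla', 'one more bla'],
})

# Step 1: number of rows per district (WHEATON -> 2, the others -> 1)
rows_per_district = df['Police District Name'].value_counts()

# Step 2: every column except the district name is a crime column (3 here)
n_crime_cols = len(df.columns) - 1

# Step 3: rows per district times crime columns per row
counts = rows_per_district * n_crime_cols
print(rows_per_district['WHEATON'], n_crime_cols, counts['WHEATON'])  # 2 3 6
```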
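# df is the DataFrame from the question (called `input` there);
# len(df.columns) - 1 assumes every column except the district name is a crime column
counts = df['Police District Name'].value_counts() * (len(df.columns) - 1)
df['total crime number'] = df['Police District Name'].map(counts)
print(df[['Police District Name', 'total crime number']])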
Police District Name total crime number
0 WHEATON 6
1 SILVER SPRING 3
2 BETHESDA 3
3 GERMANTOWN 3
4 WHEATON 6
5 MONTGOMERY VILLAGE 3
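If some cn cells can be empty, multiplying by the column count will overcount. A possible variant (a sketch, assuming empty cells are stored as None/NaN and that the crime columns are exactly those whose names start with "cn") counts the non-empty cells per row and sums them per district with groupby/transform. The missing BETHESDA value below is hypothetical, added only to show the difference; on the question's original data this gives the desired output unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'Police District Name': ['WHEATON', 'SILVER SPRING', 'BETHESDA',
                             'GERMANTOWN', 'WHEATON', 'MONTGOMERY VILLAGE'],
    'cn1': ['Crime Against Person', 'Crime Against Person', 'Crime Against Person',
            'other', 'other', 'other'],
    'cn2': ['Aggravated Assault', 'bla', 'bla', 'blaa', 'bla', 'one more bla'],
    # Hypothetical gap: BETHESDA has no third crime recorded
    'cn3': ['Aggravated Assault', 'bla', None, 'blaa', 'bla', 'one more bla'],
})

# Assumption: crime columns are the ones whose names start with "cn"
crime_cols = df.filter(regex=r'^cn').columns

# Number of non-empty crime cells in each row
per_row = df[crime_cols].notna().sum(axis=1)

# Sum per district, broadcast back to every row of that district
df['total crime number'] = per_row.groupby(df['Police District Name']).transform('sum')
print(df[['Police District Name', 'total crime number']])
```

Here BETHESDA gets 2 instead of 3, while WHEATON still gets 3 + 3 = 6.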