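I have this data set, and I want to display the columns ("Police District Name", "Number of Crimes") for all crimes with more than 3 victims. However, the "Number of Crimes" column does not exist and has to be created; it should hold the total number of crimes committed in that district. Note: each row represents 1 crime.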
Dataset sample:
   Incident ID  Victims  Police District Name  Beat
0    201087096        1  GERMANTOWN            5N1
1    201087097        1  WHEATON               4K2
2    201087097        1  WHEATON               4K2
3    201087097        1  WHEATON               4K2
4    201087100        1  GERMANTOWN            5M1
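To experiment without the full Crime.csv, the sample rows above can be rebuilt in memory. This is only a sketch: `sample_df` is a hypothetical name, and the values are copied from the sample. Rows 1 to 3 share Incident ID 201087097, so if one incident can span several rows, counting distinct IDs with `nunique()` may be safer than counting rows (an assumption; the question states that each row is one crime). Also note that no sample row has more than 3 victims, so filtering this sample returns an empty frame.

```python
import pandas as pd

# The five sample rows from the question, built in memory (hypothetical name
# sample_df) so the grouping logic can be tried without the full Crime.csv.
sample_df = pd.DataFrame({
    'Incident ID': [201087096, 201087097, 201087097, 201087097, 201087100],
    'Victims': [1, 1, 1, 1, 1],
    'Police District Name': ['GERMANTOWN', 'WHEATON', 'WHEATON', 'WHEATON', 'GERMANTOWN'],
    'Beat': ['5N1', '4K2', '4K2', '4K2', '5M1'],
})

print(len(sample_df))                       # number of rows in the sample
print(sample_df['Incident ID'].nunique())   # number of distinct incident IDs
```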
Here is my code:
import pandas as pd
crimes_df = pd.read_csv('data/Crime.csv', low_memory=False, dtype={'Incident ID': int, 'Beat':object})
more_than_three_victims = crimes_df[(crimes_df['Victims'] > 3)]
more_than_three_victims.groupby(['Police District Name']).sum()
I don't know where to go from here. Any help would be appreciated.
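A minimal sketch of one way to get exactly the two requested columns, assuming (as the question says) that each row is one crime. The rows below are hypothetical, not taken from Crime.csv. `groupby(...).size()` counts rows per district, and `reset_index(name=...)` turns the district index back into a column so the result has the columns "Police District Name" and "Number of Crimes".

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data comes from Crime.csv.
crimes_df = pd.DataFrame({
    'Victims': [5, 1, 4, 7, 2, 4],
    'Police District Name': ['WHEATON', 'WHEATON', 'BETHESDA',
                             'WHEATON', 'BETHESDA', 'ROCKVILLE'],
})

# Keep only crimes (rows) with more than 3 victims.
more_than_three_victims = crimes_df[crimes_df['Victims'] > 3]

# Count rows per district, then move the district index back into a column.
crimes_per_district = (
    more_than_three_victims
    .groupby('Police District Name')
    .size()
    .reset_index(name='Number of Crimes')
)
print(crimes_per_district)
```

With these hypothetical rows, the result lists BETHESDA 1, ROCKVILLE 1 and WHEATON 2.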
Answer (score: 1)
First, when reading in the data, you don't have to load every column into the DataFrame:
crimes_df = pd.read_csv('./Desktop/Crime.csv', usecols=['Police District Name', 'Victims'])
# The above will only read in the columns listed
more_than_three_victims = crimes_df[crimes_df['Victims'] > 3]  # keep only crimes with more than 3 victims
groupby_victims = more_than_three_victims.groupby('Police District Name')['Victims'].agg(['sum']).rename(columns = {'sum': 'Number of Victims'})
print(groupby_victims)
Output:
Number of Victims
Police District Name
BETHESDA 52
GERMANTOWN 106
MONTGOMERY VILLAGE 104
ROCKVILLE 73
SILVER SPRING 107
TAKOMA PARK 4
WHEATON 78
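This groups the rows by "Police District Name", sums the victims in each district, and then renames the resulting "sum" column to "Number of Victims". I believe this is what you're after.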
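A sketch on hypothetical, already-filtered rows comparing the `agg(['sum'])` plus `rename` approach above with a shorter alternative: sum the grouped Series and name the single column with `to_frame`. Both should give the same one-column DataFrame indexed by district.

```python
import pandas as pd

# Hypothetical filtered rows: every row already has more than 3 victims.
more_than_three_victims = pd.DataFrame({
    'Victims': [5, 4, 7, 4],
    'Police District Name': ['WHEATON', 'BETHESDA', 'WHEATON', 'ROCKVILLE'],
})

# agg(['sum']) produces a column called 'sum', which is then renamed.
via_agg = (more_than_three_victims.groupby('Police District Name')['Victims']
           .agg(['sum'])
           .rename(columns={'sum': 'Number of Victims'}))

# Sum the Series directly and give the single column its name.
via_to_frame = (more_than_three_victims.groupby('Police District Name')['Victims']
                .sum()
                .to_frame('Number of Victims'))

print(via_to_frame)
```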
If you instead want the number of crimes with more than 3 victims in each district:
groupby_victims = more_than_three_victims.groupby('Police District Name')['Victims'].agg(['count']).rename(columns ={'count': 'Number of Crimes'})
# you just change 'sum' to 'count'
Output:
Number of Crimes
Police District Name
BETHESDA 9
GERMANTOWN 23
MONTGOMERY VILLAGE 21
ROCKVILLE 15
SILVER SPRING 21
TAKOMA PARK 1
WHEATON 18
Again, this gives the number of crimes, not the total number of victims.
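To see both numbers per district in one table, named aggregation can compute the crime count and the victim total together. This is a sketch on hypothetical rows, assuming pandas 0.25 or later (where named aggregation was introduced); the dictionary is unpacked with `**` because the output column names contain spaces. `count` counts non-missing `Victims` values, which equals the number of rows here because the filtered rows have no missing values.

```python
import pandas as pd

# Hypothetical rows for illustration only.
crimes_df = pd.DataFrame({
    'Victims': [5, 1, 4, 7, 2, 4],
    'Police District Name': ['WHEATON', 'WHEATON', 'BETHESDA',
                             'WHEATON', 'BETHESDA', 'ROCKVILLE'],
})
more_than_three_victims = crimes_df[crimes_df['Victims'] > 3]

# Named aggregation: output name -> (input column, aggregation function).
summary = more_than_three_victims.groupby('Police District Name').agg(
    **{
        'Number of Crimes': ('Victims', 'count'),  # one row per crime
        'Number of Victims': ('Victims', 'sum'),   # victims across those crimes
    }
)
print(summary)
```

With this hypothetical data, WHEATON has 2 crimes but 12 victims, which is the same distinction as between the two outputs above.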