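I need to use the NYC OpenData 311 Service Requests dataset to compute noise complaints per 100 residents for each community board. Below is the head of the data: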
Created Date Complaint Type Descriptor Location Type Incident Zip Street Name Cross Street 1 Cross Street 2 City Status Community Board Borough
0 2010-01-01 00:00:00 HEATING HEAT RESIDENTIAL BUILDING 10468 RESERVOIR AVENUE WEST 195 STREET GOULDEN AVENUE BRONX Open 08 BRONX BRONX
1 2010-01-01 00:00:00 GENERAL CONSTRUCTION DOORS RESIDENTIAL BUILDING 10468 RESERVOIR AVENUE WEST 195 STREET GOULDEN AVENUE BRONX Open 08 BRONX BRONX
2 2010-01-01 00:00:00 GENERAL CONSTRUCTION MOLD RESIDENTIAL BUILDING 10468 RESERVOIR AVENUE WEST 195 STREET GOULDEN AVENUE BRONX Open 08 BRONX BRONX
3 2010-01-01 00:03:00 Noise - Residential Loud Television Residential Building/House 11230 EAST 19 STREET AVENUE O AVENUE P BROOKLYN Closed 14 BROOKLYN BROOKLYN
4 2010-01-01 00:04:00 Building/Use SRO - Illegal Work/No Permit/Change In Occupan... NaN 10466 EAST 224 STREET BEND SCHIEFFELIN AVENUE BRONX Closed 12 BRONX BRONX
The other dataset I am using is New York City Community Boards (Districts):
Borough CD Number CD Name 1970 Population 1980 Population 1990 Population 2000 Population 2010 Population Community Board Pop change %
0 Bronx 1 Melrose, Mott Haven, Port Morris 138557 78441 77214 82159 91497 01 Bronx -33.964361
1 Bronx 2 Hunts Point, Longwood 99493 34399 39443 46824 52246 02 Bronx -47.487763
2 Bronx 3 Morrisania, Crotona Park East 150636 53635 57162 68574 79762 03 Bronx -47.049842
3 Bronx 4 Highbridge, Concourse Village 144207 114312 119962 139563 146441 04 Bronx 1.549162
4 Bronx 5 University Hts., Fordham, Mt. Hope 121807 107995 118435 128313 128200 05 Bronx 5.248467
The code I currently have is:
#Assigning CD Number to test, converting it to a string and zero-padding it to two digits
test = comm_district_df['CD Number'].astype(str).str.zfill(2)
#Assigning Community Board from complaints df to cdf1, sorting the values and making them unique
cdf1 = pd.Series(complaints_df['Community Board'].sort_values().unique())
#Looking for anything that has Noise in the Complaint Type
noise = complaints_df['Complaint Type'].str.contains('Noise')
combo = complaints_df[noise]
#test = complaints_df['Community Board'].value_counts()
#returns the counts for each
count = pd.Series(combo['Community Board'].value_counts().unique())
#test = [int(i) for i in count.split() if i.isdigit()]
#getting the sum of all noise complaints
#ct = count[:].sum()
#Creating the dataframe from cdf1
df1 = pd.DataFrame({'Community Board':cdf1})
#creating the column for the inner join between the two data sets
comm_district_df['Community Board'] = test +' '+ comm_district_df['Borough']
#Building the second dataframe with the columns from comm_district_df
cd1 = pd.Series(comm_district_df['CD Name'])
#The created Community Board column that is converted to upper to allow for the merging
cd4 = pd.Series(comm_district_df['Community Board']).str.upper()
#the math to get the complaints per 100 people using the 2010 pop
cd2 = count / ((comm_district_df['2010 Population'])/100)
# the data frame itself, built from the last three items
df2 = pd.DataFrame({'Community Board':cd4,'CD Name':cd1,'Complaints / 100 Residents':cd2})
# the merge itself of the two data frames
df_most = pd.merge(df1,df2).sort_values(by='Complaints / 100 Residents', ascending=False).round(2).head(10)
df_least = pd.merge(df1,df2).sort_values(by='Complaints / 100 Residents', ascending=True).round(2).head(10)
#Honestly didn't think this would work but it did, creating a tuple from the two dataframes
tuple1 = (df_most,df_least)
The problem I'm having is with this part: count = pd.Series(combo['Community Board'].value_counts())
This returns
12 MANHATTAN 8778
03 MANHATTAN 6499
07 MANHATTAN 5923
10 MANHATTAN 5631
01 BROOKLYN 5561
02 MANHATTAN 4794
09 BRONX 4692
04 BRONX 4652
If I use it, then when I return tuple1 I get NaN for "Complaints / 100 Residents".
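A possible explanation for the NaN, sketched with made-up numbers: pandas arithmetic between two Series aligns on index labels, and here the counts are labelled by board name while the population column carries the default 0, 1, 2, ... index, so no labels match. The names `counts` and `population` below are illustrative stand-ins, not the real data.

```python
import pandas as pd

# Toy stand-ins (values are made up for illustration only)
counts = pd.Series([8778, 6499], index=['12 MANHATTAN', '03 MANHATTAN'])
population = pd.Series([100000, 200000])  # default integer index 0, 1

# Series arithmetic aligns on index labels; none match, so every result is NaN
ratio = counts / (population / 100)
all_nan = bool(ratio.isna().all())
print(ratio)
```

The result has one row per label from either side, all NaN, which matches the symptom described above.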
If I use
count = pd.Series(combo['Community Board'].value_counts().unique())
This returns
0 8778
1 6499
2 5923
3 5631
4 5561
5 4794
6 4692
7 4652
This is close, but not the answer I'm looking for. When I return tuple1, I get
df_most
Community Board CD Name Complaints / 100 Residents
5 02 BRONX Hunts Point, Longwood 12.44
The answer I want is
df_most
Community Board CD Name Complaints / 100 Residents
34 05 MANHATTAN Midtown Business District 5.59
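One way this might be approached, as a sketch rather than a confirmed fix: `.unique()` throws away the board labels, so the counts end up paired with populations by row position instead of by board. Keeping the labels and looking each district's count up by its "NN BOROUGH" key avoids that. The two small frames below are made-up stand-ins that reuse the column names from the question; the variable names `counts`, `key`, and `result` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two frames (values are made up for illustration only)
complaints_df = pd.DataFrame({
    'Complaint Type': ['Noise - Residential', 'HEATING',
                       'Noise - Street', 'Noise - Residential'],
    'Community Board': ['01 BRONX', '01 BRONX', '02 BRONX', '01 BRONX'],
})
comm_district_df = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx'],
    'CD Number': [1, 2],
    'CD Name': ['Melrose, Mott Haven, Port Morris', 'Hunts Point, Longwood'],
    '2010 Population': [1000, 200],
})

# Count noise complaints per board, keeping the board labels as the index
noise = complaints_df['Complaint Type'].str.contains('Noise', na=False)
counts = complaints_df.loc[noise, 'Community Board'].value_counts()

# Build the same key format ("01 BRONX") on the population side
key = (comm_district_df['CD Number'].astype(str).str.zfill(2)
       + ' ' + comm_district_df['Borough'].str.upper())

# Look up each district's count by key, then divide by (population / 100)
result = pd.DataFrame({
    'Community Board': key,
    'CD Name': comm_district_df['CD Name'],
    'Complaints / 100 Residents':
        key.map(counts) / (comm_district_df['2010 Population'] / 100),
}).round(2)

df_most = result.sort_values(by='Complaints / 100 Residents', ascending=False).head(10)
df_least = result.sort_values(by='Complaints / 100 Residents', ascending=True).head(10)
```

Districts with no noise complaints would come out as NaN from `map`; whether to fill those with 0 depends on how the real data should be interpreted.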