Question

I need to use the NYC OpenData 311 Service Requests dataset to compute complaints per 100 residents. Below is the head of the data:

Created Date    Complaint Type  Descriptor  Location Type   Incident Zip    Street Name     Cross Street 1  Cross Street 2  City    Status  Community Board     Borough
0   2010-01-01 00:00:00     HEATING     HEAT    RESIDENTIAL BUILDING    10468   RESERVOIR AVENUE    WEST 195 STREET     GOULDEN AVENUE  BRONX   Open    08 BRONX    BRONX
1   2010-01-01 00:00:00     GENERAL CONSTRUCTION    DOORS   RESIDENTIAL BUILDING    10468   RESERVOIR AVENUE    WEST 195 STREET     GOULDEN AVENUE  BRONX   Open    08 BRONX    BRONX
2   2010-01-01 00:00:00     GENERAL CONSTRUCTION    MOLD    RESIDENTIAL BUILDING    10468   RESERVOIR AVENUE    WEST 195 STREET     GOULDEN AVENUE  BRONX   Open    08 BRONX    BRONX
3   2010-01-01 00:03:00     Noise - Residential     Loud Television     Residential Building/House  11230   EAST 19 STREET  AVENUE O    AVENUE P    BROOKLYN    Closed  14 BROOKLYN     BROOKLYN
4   2010-01-01 00:04:00     Building/Use    SRO - Illegal Work/No Permit/Change In Occupan...   NaN     10466   EAST 224 STREET     BEND    SCHIEFFELIN AVENUE  BRONX   Closed  12 BRONX    BRONX

The other dataset I am using is the New York City community board (district) population data:

Borough     CD Number   CD Name     1970 Population     1980 Population     1990 Population     2000 Population     2010 Population     Community Board     Pop change %
0   Bronx   1   Melrose, Mott Haven, Port Morris    138557  78441   77214   82159   91497   01 Bronx    -33.964361
1   Bronx   2   Hunts Point, Longwood   99493   34399   39443   46824   52246   02 Bronx    -47.487763
2   Bronx   3   Morrisania, Crotona Park East   150636  53635   57162   68574   79762   03 Bronx    -47.049842
3   Bronx   4   Highbridge, Concourse Village   144207  114312  119962  139563  146441  04 Bronx    1.549162
4   Bronx   5   University Hts., Fordham, Mt. Hope  121807  107995  118435  128313  128200  05 Bronx    5.248467
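The 311 data labels boards as a zero-padded number followed by the upper-case borough (for example "08 BRONX"), so the district table needs a key in exactly that format before the two can be joined. A minimal sketch on a few toy rows copied from the table above (the small frame here is a stand-in, not the full dataset):

```python
import pandas as pd

# Toy copy of the first rows of the community district table shown above
comm_district_df = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Bronx"],
    "CD Number": [1, 2, 3],
    "2010 Population": [91497, 52246, 79762],
})

# zfill(2) left-pads the string with zeros ("1" -> "01"); it does not add decimal places
cd = comm_district_df["CD Number"].astype(str).str.zfill(2)

# Upper-case the combined key so it matches the 311 labels, e.g. "01 BRONX"
comm_district_df["Community Board"] = (cd + " " + comm_district_df["Borough"]).str.upper()

print(comm_district_df["Community Board"].tolist())
```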

The code I currently have is:

import pandas as pd

# Convert CD Number to a string and left-pad it with zeros to two characters (1 -> "01")
test = comm_district_df['CD Number'].astype(str).str.zfill(2)

# Sorted, unique Community Board labels from the complaints data
cdf1 = pd.Series(complaints_df['Community Board'].sort_values().unique())
# Boolean mask: rows whose Complaint Type contains "Noise"
noise = complaints_df['Complaint Type'].str.contains('Noise')
combo = complaints_df[noise]

#test = complaints_df['Community Board'].value_counts()
# Count the noise complaints for each community board
count = pd.Series(combo['Community Board'].value_counts().unique())
#test = [int(i) for i in count.split() if i.isdigit()] 

#getting the sum of all noise complaints 
#ct = count[:].sum()
#Creating the dataframe from cdf1
df1 = pd.DataFrame({'Community Board':cdf1})
#creating the column for the inner join between the two data sets 
comm_district_df['Community Board'] = test +' '+ comm_district_df['Borough']
# Building the second DataFrame with the columns from comm_district_df
cd1 = pd.Series(comm_district_df['CD Name'])
# Upper-case the new Community Board column so it matches the 311 labels for the merge
cd4 = pd.Series(comm_district_df['Community Board']).str.upper()
# Complaints per 100 residents, based on the 2010 population
cd2 = count / ((comm_district_df['2010 Population'])/100)
# Build the second DataFrame from the three Series above
df2 = pd.DataFrame({'Community Board':cd4,'CD Name':cd1,'Complaints / 100 Residents':cd2})
# Merge the two DataFrames on Community Board
df_most = pd.merge(df1,df2).sort_values(by='Complaints / 100 Residents', ascending=False).round(2).head(10)
df_least = pd.merge(df1,df2).sort_values(by='Complaints / 100 Residents', ascending=True).round(2).head(10)
#Honestly didn't think this would work but it did, creating a tuple from the two dataframes 
tuple1 = (df_most,df_least)

The problem I am having is with this part: count = pd.Series(combo['Community Board'].value_counts())

This returns:

12 MANHATTAN        8778
03 MANHATTAN        6499
07 MANHATTAN        5923
10 MANHATTAN        5631
01 BROOKLYN         5561
02 MANHATTAN        4794
09 BRONX            4692
04 BRONX            4652

If I use that, I get NaN for "Complaints / 100 Residents" when I return tuple1.
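The NaN values follow from how pandas does arithmetic between two Series: it aligns them on index labels, not on position. value_counts() returns counts indexed by the board label, while comm_district_df['2010 Population'] keeps the default 0, 1, 2, ... index, so no label appears in both. A toy reproduction (counts taken from the output above, populations from the district table):

```python
import pandas as pd

# value_counts() returns counts indexed by the board label
count = pd.Series({"12 MANHATTAN": 8778, "01 BROOKLYN": 5561})

# The population column keeps the district table's default integer index 0, 1, 2
population = pd.Series([91497, 52246, 79762])

# Division aligns on index labels; no label is shared, so the result is the
# union of both indexes and every value is NaN
rate = count / (population / 100)
print(rate)
```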

If I instead use
count = pd.Series(combo['Community Board'].value_counts().unique())

this returns something close, but not the answer I am looking for. When I return tuple1, I get:

df_most
Community Board     CD Name     Complaints / 100 Residents
5   02 BRONX    Hunts Point, Longwood   12.44
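The .unique() variant avoids the NaN only because it throws the labels away: it returns a plain NumPy array, and wrapping that in pd.Series gives it a fresh 0, 1, 2, ... index. The division then pairs counts with populations by position, so each count (sorted largest first by value_counts()) is divided by whatever district happens to sit in that row. A toy illustration with hypothetical counts:

```python
import pandas as pd

districts = pd.DataFrame({
    "Community Board": ["01 BRONX", "02 BRONX", "03 BRONX"],
    "2010 Population": [91497, 52246, 79762],
})

# Hypothetical noise-complaint counts; value_counts() orders them largest first
count_by_board = pd.Series({"03 BRONX": 500, "01 BRONX": 400, "02 BRONX": 100})

# .unique() returns a bare array: the board labels are lost and pd.Series
# assigns a new 0, 1, 2 index. It also drops duplicate counts, so two boards
# with the same total would collapse into a single entry.
positional = pd.Series(count_by_board.unique())

# Values are now paired by position: the 03 BRONX count (500) is divided by
# the 01 BRONX population, and so on down the rows
wrong = positional / (districts["2010 Population"] / 100)
print(wrong.round(2).tolist())
```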

The answer I want:

df_most
Community Board     CD Name     Complaints / 100 Residents
34  05 MANHATTAN    Midtown Business District   5.59
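One way to keep the labels lined up is to leave value_counts() indexed by board and index the population by the same key before dividing. A minimal sketch, assuming both frames already share a "Community Board" column in the same upper-case format; the rows below are hypothetical toy data, and reindexing onto the district list is one possible design choice (it drops 311 boards that have no population row and gives boards with no noise complaints a count of 0), not a confirmed answer:

```python
import pandas as pd

# Hypothetical toy stand-ins for the two frames
complaints_df = pd.DataFrame({
    "Complaint Type": ["Noise - Residential", "HEATING", "Noise - Street/Sidewalk",
                       "Noise - Residential", "Noise - Residential"],
    "Community Board": ["01 BRONX", "01 BRONX", "02 BRONX", "02 BRONX", "02 BRONX"],
})
comm_district_df = pd.DataFrame({
    "Community Board": ["01 BRONX", "02 BRONX"],
    "CD Name": ["Melrose, Mott Haven, Port Morris", "Hunts Point, Longwood"],
    "2010 Population": [91497, 52246],
})

# Noise complaints per board; value_counts() keeps the board label as the index
noise = complaints_df["Complaint Type"].str.contains("Noise", na=False)
count = complaints_df.loc[noise, "Community Board"].value_counts()

# Population indexed by the same key, so division lines up board by board
population = comm_district_df.set_index("Community Board")["2010 Population"]
count = count.reindex(population.index, fill_value=0)
rate = (count / (population / 100)).rename("Complaints / 100 Residents")

result = (
    comm_district_df.set_index("Community Board")[["CD Name"]]
    .join(rate)
    .reset_index()
    .sort_values("Complaints / 100 Residents", ascending=False)
    .round(2)
)
print(result)
```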

Keeping the pandas Series index
