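I'm having trouble counting counties using the well-known census.csv dataset.
Task: count the number of counties in each state.
I think the problem is in the comparison; please read below.
I tried this: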
df = pd.read_csv('census.csv')
dfd = df[:]['STNAME'].unique()  # array of state names
serr = pd.Series(dfd)  # convert the array to a Series
After this, I tried two approaches:
1:
df[df['STNAME'] == serr]  # ERROR: series length must match
2:
i = 0
for name in serr:  # this raises an error on 'Alabama'
    df['STNAME'] == name
    for i in serr:
        serr[i] == serr[name]
        print(serr[name].count)
        i += 1
Please guide me; I've been stuck on this for three days.
Answer 0 (score: 6)
Use groupby and aggregate COUNTY with nunique:
In [1]: import pandas as pd
In [2]: df = pd.read_csv('census.csv')
In [3]: unique_counties = df.groupby('STNAME')['COUNTY'].nunique()
Now the result:
In [4]: unique_counties
Out[4]:
STNAME
Alabama 68
Alaska 30
Arizona 16
Arkansas 76
California 59
Colorado 65
Connecticut 9
Delaware 4
District of Columbia 2
Florida 68
Georgia 160
Hawaii 6
Idaho 45
Illinois 103
Indiana 93
Iowa 100
Kansas 106
Kentucky 121
Louisiana 65
Maine 17
Maryland 25
Massachusetts 15
Michigan 84
Minnesota 88
Mississippi 83
Missouri 116
Montana 57
Nebraska 94
Nevada 18
New Hampshire 11
New Jersey 22
New Mexico 34
New York 63
North Carolina 101
North Dakota 54
Ohio 89
Oklahoma 78
Oregon 37
Pennsylvania 68
Rhode Island 6
South Carolina 47
South Dakota 67
Tennessee 96
Texas 255
Utah 30
Vermont 15
Virginia 134
Washington 40
West Virginia 56
Wisconsin 73
Wyoming 24
Name: COUNTY, dtype: int64
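To make the groupby/nunique step easier to check by hand, here is a minimal sketch on a tiny made-up DataFrame. The name `toy` and its values are hypothetical stand-ins, not rows from census.csv; only the STNAME and COUNTY column names come from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for census.csv (values are made up).
toy = pd.DataFrame({
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [1, 3, 3, 1, 5],
})

# nunique() counts the distinct COUNTY codes inside each STNAME group,
# so the repeated code 3 in state "A" is counted only once.
per_state = toy.groupby("STNAME")["COUNTY"].nunique()
print(per_state)
```

The result is a Series indexed by state name, so a single state's count can be read with `per_state["A"]`.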
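Answer 1 (score: 3)
juanpa.arrivillaga has a good solution. However, the code needs a small modification.
The rows with 'SUMLEV' == 40 (equivalently, 'COUNTY' == 0) should be filtered out, because they are state-level summary rows rather than counties. Otherwise, every state's count is one too large.
So the correct answer should be: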
unique_counties = census_df[census_df['SUMLEV'] == 50].groupby('STNAME')['COUNTY'].nunique()
which gives the following result:
STNAME
Alabama 67
Alaska 29
Arizona 15
Arkansas 75
California 58
Colorado 64
Connecticut 8
Delaware 3
District of Columbia 1
Florida 67
Georgia 159
Hawaii 5
Idaho 44
Illinois 102
Indiana 92
Iowa 99
Kansas 105
Kentucky 120
Louisiana 64
Maine 16
Maryland 24
Massachusetts 14
Michigan 83
Minnesota 87
Mississippi 82
Missouri 115
Montana 56
Nebraska 93
Nevada 17
New Hampshire 10
New Jersey 21
New Mexico 33
New York 62
North Carolina 100
North Dakota 53
Ohio 88
Oklahoma 77
Oregon 36
Pennsylvania 67
Rhode Island 5
South Carolina 46
South Dakota 66
Tennessee 95
Texas 254
Utah 29
Vermont 14
Virginia 133
Washington 39
West Virginia 55
Wisconsin 72
Wyoming 23
Name: COUNTY, dtype: int64
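The off-by-one effect of the state summary row can be reproduced on a small made-up frame. This is only a sketch: `toy`, `unfiltered`, and `filtered` are hypothetical names, and the values are invented to mimic one SUMLEV 40 row per state alongside its SUMLEV 50 county rows.

```python
import pandas as pd

# Hypothetical toy data: each state has one summary row (SUMLEV 40, COUNTY 0)
# followed by its county rows (SUMLEV 50).
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["A", "A", "A", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1],
})

# Without the filter, COUNTY code 0 from the summary row is counted as a county.
unfiltered = toy.groupby("STNAME")["COUNTY"].nunique()

# Keeping only county-level rows removes that extra entry.
filtered = toy[toy["SUMLEV"] == 50].groupby("STNAME")["COUNTY"].nunique()

print(unfiltered - filtered)  # difference per state
```

On this toy data the difference is 1 for every state, which matches the gap between the two result listings above.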
Answer 2 (score: 1)
@Bakhtawar - here is a very simple way:
df.groupby(df['STNAME']).count().COUNTY
Answer 3 (score: 0)
Layman's logic, without using 'groupby':
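import pandas as pd

census_df = pd.read_csv('census.csv')
cdf = census_df.copy()
cdf = cdf[cdf['SUMLEV'] == 50]  # keep county-level rows only
ind = cdf['STNAME'].unique()  # array of state names
m = 0
for i in ind:
    c = len(cdf[cdf['STNAME'] == i])  # number of county rows in state i
    if c > m:  # remember the largest count seen so far
        m = c
        state = i
print(state)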
Result: 'Texas'
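The loop above finds the state with the most county rows. As a possible alternative that also avoids groupby, the same idea can be sketched with value_counts and idxmax. The frame `toy` and the names `counties`, `rows_per_state`, and `largest` are hypothetical, and the values are made up.

```python
import pandas as pd

# Hypothetical toy data: county-level rows plus one summary row per state.
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["A", "A", "A", "B", "B", "B", "B"],
    "COUNTY": [0, 1, 3, 0, 1, 3, 5],
})

# Keep only county-level rows, as in the loop version.
counties = toy[toy["SUMLEV"] == 50]

# value_counts() tallies rows per state; idxmax() returns the label
# (state name) that has the largest tally.
rows_per_state = counties["STNAME"].value_counts()
largest = rows_per_state.idxmax()
print(largest)
```

Like the loop, this counts rows rather than distinct COUNTY codes, so it assumes each county appears on exactly one row.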