Data.csv file (sample data)
Taluka Crop Village Area
T1 C1 V1 11
T1 C1 V2 15
T1 C1 V3 3
T1 C1 V4 1
T1 C1 V5 2
T1 C2 V1 12
T1 C2 V2 16
T1 C2 V3 4
T1 C2 V4 100
T1 C2 V5 52
T1 C3 V1 47
T1 C3 V2 15
T1 C3 V3 21
T1 C3 V4 5
T1 C3 V5 7
T1 C4 V1 20
T1 C4 V2 14
T1 C4 V3 18
T1 C4 V4 5
T1 C4 V5 24
T2 C1 V1 21
T2 C1 V2 20
T2 C1 V3 14
T2 C1 V4 7
T2 C1 V5 8
T2 C2 V1 18
T2 C2 V2 3
T2 C2 V3 12
T2 C2 V4 78
T2 C2 V5 56
T2 C3 V1 16
T2 C3 V2 11
T2 C3 V3 15
T2 C3 V2 45
T2 C3 V3 2
T2 C4 V1 3
T2 C4 V2 12
T2 C4 V3 12
T2 C4 V4 44
T2 C4 V5 10
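I want to know which villages are high-risk, medium-risk, and low-risk areas for a particular crop in a particular taluka.
In total I have 500 talukas; each taluka has 10 to 14 crops and 100 to 200 villages.
So, for example, for Crop-1 (i.e. Paddy) in Taluka-1 (i.e. Thane), which villages are at high, medium, and low risk? I want to use the percentile method.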
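To make the percentile method concrete: splitting one taluka/crop group into three risk levels means cutting its areas at the 33.3rd and 66.7th percentiles, with the largest areas labelled High Risk (the convention used throughout this thread). The sketch below is only an illustration using the T1/C1 areas from the sample data; it shows that pd.qcut with 3 bins uses the same cut points as Series.quantile with its default linear interpolation.

```python
import pandas as pd

# Area values for crop C1 in taluka T1, taken from the sample data above
area = pd.Series([11, 15, 3, 1, 2], index=["V1", "V2", "V3", "V4", "V5"], name="Area")

# The 33.3rd and 66.7th percentiles are the cut points between the three risk levels
cutoffs = area.quantile([1 / 3, 2 / 3])
print(cutoffs.round(3).tolist())  # [2.333, 8.333]

# pd.qcut applies the same percentile cut points and attaches one label per bin
labels = ["Low Risk", "Medium Risk", "High Risk"]
levels = pd.qcut(area, 3, labels=labels)
print(levels)
```

Villages V1 and V2 fall above 8.333 (High Risk), V3 falls between the cut points (Medium Risk), and V4 and V5 fall at or below 2.333 (Low Risk).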
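I have done part of the work, but my code is not dynamic: I have to type out every taluka-crop combination by hand, and there are a great many of them. I need to do this dynamically with loops (e.g. for loops, if statements), but I am stuck on that part.
Here is my code: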
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv("/home/desktop/Data.csv")
df.head()
## Part 1: partition by taluka
T1= df[df['Taluka'] == 'T1']
T2= df[df['Taluka'] == 'T2']
## Part 2: partition by crop within each taluka
T1_C1= T1[T1['Crop'] == 'C1']
T1_C2= T1[T1['Crop'] == 'C2']
T1_C3= T1[T1['Crop'] == 'C3']
T1_C4= T1[T1['Crop'] == 'C4']
T2_C1= T2[T2['Crop'] == 'C1']
T2_C2= T2[T2['Crop'] == 'C2']
T2_C3= T2[T2['Crop'] == 'C3']
T2_C4= T2[T2['Crop'] == 'C4']
## Part 3: sort by area in descending order (older pandas had DataFrame.sort; current versions only have sort_values)
T1_C1 = T1_C1.sort_values('Area', ascending=False)
T1_C2 = T1_C2.sort_values('Area', ascending=False)
T1_C3 = T1_C3.sort_values('Area', ascending=False)
T1_C4 = T1_C4.sort_values('Area', ascending=False)
T2_C1 = T2_C1.sort_values('Area', ascending=False)
T2_C2 = T2_C2.sort_values('Area', ascending=False)
T2_C3 = T2_C3.sort_values('Area', ascending=False)
T2_C4 = T2_C4.sort_values('Area', ascending=False)
## Part 4: add risk levels for each crop in each taluka (tertiles of Area)
T1_C1['Level'] = pd.qcut(T1_C1['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C2['Level'] = pd.qcut(T1_C2['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C3['Level'] = pd.qcut(T1_C3['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T1_C4['Level'] = pd.qcut(T1_C4['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C1['Level'] = pd.qcut(T2_C1['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C2['Level'] = pd.qcut(T2_C2['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C3['Level'] = pd.qcut(T2_C3['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
T2_C4['Level'] = pd.qcut(T2_C4['Area'], 3, ['Low Risk','Medium Risk','High Risk'])
print(T1_C1)
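So here I get, for crop C1 in taluka T1, which villages are in the high-risk zone, the low-risk zone, and so on.
How can I do this with a loop, so that the code is shorter and works for all 500 talukas?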
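The explicit for loop asked about here can be written without listing any taluka or crop by hand, because iterating over df.groupby(['Taluka','Crop']) yields each (taluka, crop) pair that occurs in the data together with its rows. A minimal sketch, assuming a tiny frame built from only the first three villages of three groups in the sample (so its levels differ from the five-village results elsewhere in this thread); the variable names are illustrative.

```python
import pandas as pd

# Small frame in the Data.csv layout: first three villages of T1/C1, T1/C2 and T2/C1
df = pd.DataFrame({
    "Taluka": ["T1"] * 6 + ["T2"] * 3,
    "Crop": ["C1"] * 3 + ["C2"] * 3 + ["C1"] * 3,
    "Village": ["V1", "V2", "V3"] * 3,
    "Area": [11, 15, 3, 12, 16, 4, 21, 20, 14],
})

labels = ["Low Risk", "Medium Risk", "High Risk"]
df["Level"] = None  # filled in group by group below

# groupby enumerates every (Taluka, Crop) pair present in the data,
# so the same loop works for 2 talukas or 500
for (taluka, crop), group in df.groupby(["Taluka", "Crop"]):
    # qcut returns a Series aligned on the group's original row labels
    df.loc[group.index, "Level"] = pd.qcut(group["Area"], 3, labels=labels).astype(str)

# Largest area first within each taluka/crop pair
df = df.sort_values(["Taluka", "Crop", "Area"], ascending=[True, True, False])
print(df)
```

No if statements are needed: groupby already does the splitting that the hand-written T1_C1 ... T2_C4 variables did.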
Answer (score: 2)
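def f(x):
    labels = ['Low Risk','Medium Risk','High Risk']
    x['Level'] = pd.qcut(x['Area'].sort_values(ascending=False), 3, labels = labels)
    return x
# group_keys=False keeps the original row index (pandas 2.x otherwise prepends Taluka/Crop)
df1 = df.groupby(['Taluka','Crop'], group_keys=False).apply(f)
print (df1)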
Taluka Crop Village Area Level
0 T1 C1 V1 11 High Risk
1 T1 C1 V2 15 High Risk
2 T1 C1 V3 3 Medium Risk
3 T1 C1 V4 1 Low Risk
4 T1 C1 V5 2 Low Risk
5 T1 C2 V1 12 Low Risk
6 T1 C2 V2 16 Medium Risk
7 T1 C2 V3 4 Low Risk
8 T1 C2 V4 100 High Risk
9 T1 C2 V5 52 High Risk
10 T1 C3 V1 47 High Risk
11 T1 C3 V2 15 Medium Risk
12 T1 C3 V3 21 High Risk
13 T1 C3 V4 5 Low Risk
14 T1 C3 V5 7 Low Risk
15 T1 C4 V1 20 High Risk
16 T1 C4 V2 14 Low Risk
17 T1 C4 V3 18 Medium Risk
18 T1 C4 V4 5 Low Risk
19 T1 C4 V5 24 High Risk
20 T2 C1 V1 21 High Risk
21 T2 C1 V2 20 High Risk
22 T2 C1 V3 14 Medium Risk
23 T2 C1 V4 7 Low Risk
24 T2 C1 V5 8 Low Risk
25 T2 C2 V1 18 Medium Risk
26 T2 C2 V2 3 Low Risk
27 T2 C2 V3 12 Low Risk
28 T2 C2 V4 78 High Risk
29 T2 C2 V5 56 High Risk
30 T2 C3 V1 16 High Risk
31 T2 C3 V2 11 Low Risk
32 T2 C3 V3 15 Medium Risk
33 T2 C3 V2 45 High Risk
34 T2 C3 V3 2 Low Risk
35 T2 C4 V1 3 Low Risk
36 T2 C4 V2 12 Medium Risk
37 T2 C4 V3 12 Medium Risk
38 T2 C4 V4 44 High Risk
39 T2 C4 V5 10 Low Risk
Edit: you can also add sort_values at the end:
df1 = df1.sort_values(['Taluka','Crop', 'Area'], ascending=[True, True, False])
print (df1)
Taluka Crop Village Area Level
1 T1 C1 V2 15 High Risk
0 T1 C1 V1 11 High Risk
2 T1 C1 V3 3 Medium Risk
4 T1 C1 V5 2 Low Risk
3 T1 C1 V4 1 Low Risk
8 T1 C2 V4 100 High Risk
9 T1 C2 V5 52 High Risk
6 T1 C2 V2 16 Medium Risk
5 T1 C2 V1 12 Low Risk
7 T1 C2 V3 4 Low Risk
10 T1 C3 V1 47 High Risk
12 T1 C3 V3 21 High Risk
11 T1 C3 V2 15 Medium Risk
14 T1 C3 V5 7 Low Risk
13 T1 C3 V4 5 Low Risk
19 T1 C4 V5 24 High Risk
15 T1 C4 V1 20 High Risk
17 T1 C4 V3 18 Medium Risk
16 T1 C4 V2 14 Low Risk
18 T1 C4 V4 5 Low Risk
20 T2 C1 V1 21 High Risk
21 T2 C1 V2 20 High Risk
22 T2 C1 V3 14 Medium Risk
24 T2 C1 V5 8 Low Risk
23 T2 C1 V4 7 Low Risk
28 T2 C2 V4 78 High Risk
29 T2 C2 V5 56 High Risk
25 T2 C2 V1 18 Medium Risk
27 T2 C2 V3 12 Low Risk
26 T2 C2 V2 3 Low Risk
33 T2 C3 V2 45 High Risk
30 T2 C3 V1 16 High Risk
32 T2 C3 V3 15 Medium Risk
31 T2 C3 V2 11 Low Risk
34 T2 C3 V3 2 Low Risk
38 T2 C4 V4 44 High Risk
36 T2 C4 V2 12 Medium Risk
37 T2 C4 V3 12 Medium Risk
39 T2 C4 V5 10 Low Risk
35 T2 C4 V1 3 Low Risk
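If groupby.apply is a concern (recent pandas versions warn about, or change the index of, some apply results), the same labels can be computed with groupby(...).transform instead. A sketch under that assumption, using the complete T1/C1 and T1/C2 groups from the sample so the levels match the output above; the last lines answer the original question of which villages are high risk for one taluka and crop.

```python
import pandas as pd

# Complete T1/C1 and T1/C2 groups from the sample data
df = pd.DataFrame({
    "Taluka": ["T1"] * 10,
    "Crop": ["C1"] * 5 + ["C2"] * 5,
    "Village": ["V1", "V2", "V3", "V4", "V5"] * 2,
    "Area": [11, 15, 3, 1, 2, 12, 16, 4, 100, 52],
})

labels = ["Low Risk", "Medium Risk", "High Risk"]

# labels=False makes qcut return the bin number (0, 1, 2) for each row;
# transform returns those numbers in the original row order
codes = df.groupby(["Taluka", "Crop"])["Area"].transform(lambda s: pd.qcut(s, 3, labels=False))
df["Level"] = codes.map(dict(enumerate(labels)))

# High-risk villages for one taluka/crop pair
mask = (df["Taluka"] == "T1") & (df["Crop"] == "C1") & (df["Level"] == "High Risk")
print(df.loc[mask, "Village"].tolist())  # ['V1', 'V2']
```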
Or (slower) sort inside the function, i.e. once per group:
def f(x):
    labels = ['Low Risk','Medium Risk','High Risk']
    x = x.sort_values('Area', ascending=False)
    x['Level'] = pd.qcut(x['Area'], 3, labels = labels)
    return x
df1 = df.groupby(['Taluka','Crop']).apply(f).reset_index(drop=True)
print (df1)
Taluka Crop Village Area Level
0 T1 C1 V2 15 High Risk
1 T1 C1 V1 11 High Risk
2 T1 C1 V3 3 Medium Risk
3 T1 C1 V5 2 Low Risk
4 T1 C1 V4 1 Low Risk
5 T1 C2 V4 100 High Risk
6 T1 C2 V5 52 High Risk
7 T1 C2 V2 16 Medium Risk
8 T1 C2 V1 12 Low Risk
9 T1 C2 V3 4 Low Risk
10 T1 C3 V1 47 High Risk
11 T1 C3 V3 21 High Risk
12 T1 C3 V2 15 Medium Risk
13 T1 C3 V5 7 Low Risk
14 T1 C3 V4 5 Low Risk
15 T1 C4 V5 24 High Risk
16 T1 C4 V1 20 High Risk
17 T1 C4 V3 18 Medium Risk
18 T1 C4 V2 14 Low Risk
19 T1 C4 V4 5 Low Risk
20 T2 C1 V1 21 High Risk
21 T2 C1 V2 20 High Risk
22 T2 C1 V3 14 Medium Risk
23 T2 C1 V5 8 Low Risk
24 T2 C1 V4 7 Low Risk
25 T2 C2 V4 78 High Risk
26 T2 C2 V5 56 High Risk
27 T2 C2 V1 18 Medium Risk
28 T2 C2 V3 12 Low Risk
29 T2 C2 V2 3 Low Risk
30 T2 C3 V2 45 High Risk
31 T2 C3 V1 16 High Risk
32 T2 C3 V3 15 Medium Risk
33 T2 C3 V2 11 Low Risk
34 T2 C3 V3 2 Low Risk
35 T2 C4 V4 44 High Risk
36 T2 C4 V2 12 Medium Risk
37 T2 C4 V3 12 Medium Risk
38 T2 C4 V5 10 Low Risk
39 T2 C4 V1 3 Low Risk
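On the full data set (100 to 200 villages per taluka and crop), pd.qcut can raise ValueError when many villages share the same area, because two percentile cut points then coincide. One possible workaround, sketched here with hypothetical tied values, is to rank the areas first with rank(method='first'). The trade-off is that tied villages are split between levels by row order rather than by area, so check whether that is acceptable for a risk classification.

```python
import pandas as pd

labels = ["Low Risk", "Medium Risk", "High Risk"]

# Hypothetical group in which several villages report the same area (0)
area = pd.Series([0, 0, 0, 0, 5, 8], index=["V1", "V2", "V3", "V4", "V5", "V6"])

try:
    pd.qcut(area, 3, labels=labels)
except ValueError as err:
    # The minimum and the 33.3rd percentile are both 0, so two bin edges are equal
    print("qcut failed:", err)

# Ranking gives every village a distinct value (ties broken by row order),
# so the bin edges are always unique
levels = pd.qcut(area.rank(method="first"), 3, labels=labels)
print(levels.tolist())
```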