这是我原来的数据:
fi_diameter ever_percent
0 -1.000000 0.00
1 -0.694212 0.00
2 -0.499782 0.00
3 -0.249749 0.00
4 -0.000000 0.00
5 0.249822 0.00
6 0.500218 0.00
7 0.749038 0.00
8 0.985645 0.00
9 1.251539 0.00
10 1.498179 0.00
11 2.122177 0.78
12 2.000000 3.70
13 2.251539 6.23
14 2.498179 8.66
15 2.746616 10.79
16 3.000000 12.13
17 3.251539 11.93
18 3.506353 10.78
19 3.756331 8.55
20 3.988504 5.90
21 4.237864 4.34
22 4.506353 2.89
23 5.011588 2.84
24 5.506353 1.88
25 5.965784 1.65
26 6.965784 2.49
27 7.965784 1.33
28 7.965784 3.13
我想将我的数据更改为此类型:
1-2 4.48
2-3 37.81
3-4 37.16
4-5 7.23
5-6 6.37
6-7 2.49
7-8 4.46
我尝试使用" groupby"的pandas ,但它不会将数据转换为我想要的数据。
如何获得此输出?
答案 0 :(得分:1)
pd.cut
和bin fi_diameter
ever_percent
df = df.query('fi_diameter >= 1')
grouper = pd.cut(
df['fi_diameter'],
bins=[1, 2, 3, 4, 5, 6, 7, 8],
labels=['1-2', '2-3', '3-4', '4-5', '5-6', '6-7', '7-8']
)
df.groupby(grouper).ever_percent.sum().reset_index()
fi_diameter ever_percent
0 1-2 3.70
1 2-3 38.59
2 3-4 37.16
3 4-5 7.23
4 5-6 6.37
5 6-7 2.49
6 7-8 4.46
以下是对任何上限进行概括的方法。
mx = int(np.ceil(df['fi_diameter'].max()))
rn = np.arange(1, mx + 1)
df = df.query('fi_diameter >= 1')
grouper = pd.cut(
df['fi_diameter'],
bins=rn,
labels=[
str(i) + '-' + str(j) for i, j in zip(rn, rn[1:])
]
)
df.groupby(grouper).ever_percent.sum().reset_index()
fi_diameter ever_percent
0 1-2 3.70
1 2-3 38.59
2 3-4 37.16
3 4-5 7.23
4 5-6 6.37
5 6-7 2.49
6 7-8 4.46
答案 1 :(得分:0)
这会有帮助吗?
bins = [1,2,3,4,5,6,7,8]
data['bins'] = pd.cut(data.fi_diameter, bins=bins, labels=['1-2', '2-3', '3-4', '4-5', '5-6', '6-7', '7-8'])
df = data.groupby(['bins']).sum()
df = df.drop('fi_diameter', axis=1)
df
ever_percent
bins
1-2 3.70
2-3 38.59
3-4 37.16
4-5 7.23
5-6 6.37
6-7 2.49
7-8 4.46