Make the area under each distribution equal to 1

Time: 2018-07-22 01:39:44

Tags: python-3.x dataframe math distribution

I have several datasets in a DataFrame. Each dataset follows some distribution.

0   4   6
-0.276  4403    4403
-0.138  4640    4640
0   0   0
0.138   12  0
0.276   0   0
0.414   0   0
0.552   0   0
0.69    0   0
0.828   0   12
0.966   0   0
1.104   0   12
1.242   0   0
1.38    0   0
1.518   0   0
1.656   0   0
1.794   0   0
1.932   0   0
2.07    0   0
2.208   0   0
2.346   0   0
2.484   0   0
2.622   0   12
2.76    0   0
2.898   0   0
3.036   0   0
3.174   0   0
3.312   0   0
3.45    0   0
3.588   0   0
3.726   0   0
3.864   12  0
4.002   0   0
4.14    0   0
4.278   12  0
4.416   0   0
4.554   0   12
4.692   0   0
4.83    0   0
4.968   0   0
5.106   0   0
5.244   0   0
5.382   12  0
5.52    0   0
5.658   0   0
5.796   127 60
5.934   357 275
6.072   1882    2144
6.21    6726    6609
6.348   9398    11180
6.486   12784   18389
6.624   15863   20111
6.762   6739    10202
6.9 1684    1921
7.038   249 376
7.176   47  103
7.314   0   26
7.452   17  0
7.59    0   0
7.728   0   0
7.866   0   0
8.004   0   0
8.142   0   0
8.28    0   0
8.418   0   0
8.556   0   0
8.694   0   0
8.832   0   0
8.97    0   0
9.108   0   0
9.246   0   0
9.384   0   0
9.522   0   0
9.66    0   0
9.798   0   0
9.936   0   0
10.074  0   0
10.212  0   0
10.35   0   12
10.488  0   0
10.626  0   0
10.764  0   0
10.902  0   0
11.04   0   0
11.178  0   0
11.316  0   0
11.454  0   0
11.592  0   0
11.73   0   0
11.868  0   0
12.006  0   0
12.144  0   0
12.282  0   0
12.42   0   0
12.558  0   0
12.696  12  0
12.834  0   0
12.972  0   0
13.11   0   0
13.248  0   0
13.386  12  0
13.524  0   0
13.662  0   12
13.8    0   0
13.938  0   0
14.076  0   0
14.214  0   0
14.352  0   0
14.49   0   0
14.628  12  0
14.766  0   0
14.904  12  0
15.042  0   0
15.18   0   0
15.318  0   0
15.456  0   0
15.594  0   0
15.732  0   0
15.87   0   0
16.008  0   0
16.146  0   0
16.284  0   0
16.422  0   0
16.56   12  0
16.698  0   0
16.836  0   0
16.974  0   0
17.112  0   0
17.25   0   0
17.388  0   0
17.526  0   0
17.664  0   12
17.802  0   0
17.94   0   0
18.078  0   0
18.216  0   0
18.354  0   0
18.492  0   0
18.63   12  0
18.768  0   0
18.906  0   0
19.044  0   0
19.182  0   0
19.32   0   0
19.458  0   0
19.596  0   0
19.734  0   0
19.872  0   0
20.01   0   0
20.148  0   12
20.286  12  0
20.424  0   12
20.562  0   0
20.7    0   0
20.838  0   0
20.976  0   0
21.114  0   0
21.252  0   0
21.39   0   12
21.528  0   0
21.666  0   0
21.804  12  0
21.942  0   0
22.08   0   0
22.218  0   0
22.356  0   0
22.494  0   0
22.632  0   0
22.77   0   0
22.908  0   0
23.046  0   0
23.184  0   0
23.322  0   0
23.46   12  0
23.598  0   12
23.736  0   0
23.874  0   0
24.012  0   0
24.15   0   0
24.288  0   0
24.426  0   0
24.564  0   0
24.702  0   0
24.84   0   0
24.978  0   0
25.116  0   0
25.254  0   0
25.392  0   0
25.53   0   0
25.668  0   0
25.806  12  0
25.944  12  0
26.082  0   0
26.22   0   0
26.358  0   12
26.496  0   0
26.634  0   0
26.772  0   0
26.91   0   0
27.048  13  0
27.186  0   0
27.324  0   0
27.462  0   0

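Column '0' holds my x values. Columns '4' and '6' hold the y values for datasets '4' and '6'.

If I plot column '4' against column '0', I get a unimodal distribution. However, I want to rescale the y values so that the area under the curve is 1.

I have two strategies (first: the trapezoidal method; second: dividing all y values by the sum of y).

Which one is correct? Or how can I check which one is correct?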

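import numpy as np
import pandas as pd

# Skip the first two data rows; column 0 holds the x values
df = pd.read_csv('my_data.csv', header=0).iloc[2:, :]
x = df.iloc[:, 0].values

# Strategy 1: divide each y column by its trapezoidal integral over x
# (np.trapz is named np.trapezoid in NumPy >= 2.0)
df.iloc[:, 1:] = df.iloc[:, 1:] / df.iloc[:, 1:].apply(lambda y: np.trapz(y, x=x), axis=0)

# Strategy 2 (alternative to strategy 1, not a follow-up step): divide each y column by its sum
df.iloc[:, 1:] = df.iloc[:, 1:] / np.sum(df.iloc[:, 1:])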
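
One way to check the two strategies is to apply each to a copy of the data and then integrate the result again. The following is a minimal sketch, not a definitive answer: the helper names (trapezoid_area, normalize_by_area, normalize_by_sum) and the small sample table are made up for illustration, and the trapezoidal rule is written out by hand so the code does not depend on which NumPy version provides np.trapz or np.trapezoid.

```python
import numpy as np
import pandas as pd


def trapezoid_area(y, x):
    """Area under the piecewise-linear curve through the points (x, y)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def normalize_by_area(df):
    """Return a copy in which every y column integrates to 1 over column 0."""
    x = df.iloc[:, 0].to_numpy(dtype=float)
    out = df.astype(float)  # astype returns a new frame, the input is untouched
    for col in out.columns[1:]:
        out[col] = out[col] / trapezoid_area(out[col], x)
    return out


def normalize_by_sum(df):
    """Return a copy in which the values of every y column sum to 1."""
    out = df.astype(float)
    for col in out.columns[1:]:
        out[col] = out[col] / out[col].sum()
    return out


# Hypothetical sample on the same 0.138 grid spacing as the question
# (made-up counts, not the real data).
x = np.arange(10) * 0.138
demo = pd.DataFrame({
    "0": x,
    "4": [0, 0, 1, 4, 9, 12, 6, 2, 0, 0],
    "6": [0, 1, 3, 8, 14, 10, 5, 1, 0, 0],
})

by_area = normalize_by_area(demo)
by_sum = normalize_by_sum(demo)

# Re-integrate both results to see what each strategy actually produces.
area_after_area = {c: trapezoid_area(by_area[c], x) for c in ["4", "6"]}
area_after_sum = {c: trapezoid_area(by_sum[c], x) for c in ["4", "6"]}

print("area after dividing by the integral:", area_after_area)
print("area after dividing by the sum:     ", area_after_sum)
print("sum  after dividing by the sum:     ", by_sum[["4", "6"]].sum().to_dict())
```

In this sketch, dividing by the trapezoidal integral gives an area of 1 by construction. Dividing by the sum makes the values add up to 1, but on a uniform grid whose end values are zero the area under the curve then equals the grid spacing (0.138 here), not 1. Under these assumptions, the first strategy matches the stated goal of unit area, while the second is the natural choice if the y values are meant to be read as per-bin probabilities.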

0 answers:

No answers