Question

Suppose I create the following dataframe:

df = pd.DataFrame({'A':np.random.random(20), 'B':np.random.random(20)})
df
Out[162]: 
           A         B
0   0.888651  0.380360
1   0.513343  0.605991
2   0.560978  0.076174
3   0.209426  0.498564
4   0.121748  0.771653
5   0.843299  0.279264
6   0.644060  0.725061
7   0.200187  0.349093
8   0.807808  0.657373
9   0.212760  0.384311
10  0.000725  0.023815
11  0.614540  0.534569
12  0.083690  0.228761
13  0.202334  0.266114
14  0.104520  0.757514
15  0.039944  0.014512
16  0.465300  0.164657
17  0.247370  0.894628
18  0.980589  0.833938
19  0.734673  0.745574

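Then, I want to:

1. Find how often the values in column 'B' fall into each of the bins np.arange(0, 1.05, 0.05).
2. Add that information as a column 'freq'. For example, row[0] 'B' is 0.38, which lies in [0.35, 0.40), and that bin occurs 2 times in the dataframe, so df['freq'][0] = 2.
3. Then add a new column called 'weights' that holds, for each row, max(freq) / freq.

I can solve 1 with something like df.groupby(pd.cut(df['B'], np.arange(0, 1.05, 0.05))).count(), although there may be a more elegant way to do it.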
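To make step 1 concrete, here is a minimal sketch of the pd.cut idea applied to the 'B' values printed above. The names df_q, edges, binned and per_bin are only illustrative, and right=False is an assumption made so that the intervals are left-closed like the [0.35, 0.40) example (the default for pd.cut is right-closed).

```python
import numpy as np
import pandas as pd

# Column B copied from the dataframe printed in the question.
b = [0.380360, 0.605991, 0.076174, 0.498564, 0.771653,
     0.279264, 0.725061, 0.349093, 0.657373, 0.384311,
     0.023815, 0.534569, 0.228761, 0.266114, 0.757514,
     0.014512, 0.164657, 0.894628, 0.833938, 0.745574]
df_q = pd.DataFrame({"B": b})

edges = np.arange(0, 1.05, 0.05)

# right=False gives left-closed intervals such as [0.35, 0.40).
binned = pd.cut(df_q["B"], edges, right=False)

# One count per bin; observed=False keeps the empty bins in the result.
per_bin = df_q.groupby(binned, observed=False)["B"].count()

# Show only the bins that actually contain values.
print(per_bin[per_bin > 0])
```

This gives one row per bin rather than one value per dataframe row, which is why it covers step 1 but not step 2.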

I have not managed to solve 2.

And 3 is straightforward.

In the end, I only need the 'weights' column produced by steps 1, 2 and 3.

Answer 1

You can use, for example, np.digitize for 1 and transform() for 2.

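import pandas as pd
import numpy as np

df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})

# Bin edges: 0.00, 0.05, ..., 1.00
bins = np.arange(0, 1.05, 0.05)
# Index of the bin each value of B falls into; bin i covers [bins[i-1], bins[i])
df["bins"] = np.digitize(df["B"], bins)
# Number of rows sharing each bin, broadcast back onto every row
df["count"] = df.groupby("bins")["bins"].transform("count")
# Weight = largest bin count divided by this row's bin count
df["weight"] = df["count"].max() / df["count"]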

df
Out[32]: 
           A         B  bins  count  weight
0   0.032735  0.948836    19      1     3.0
1   0.728310  0.671117    14      2     1.5
2   0.307804  0.328636     7      1     3.0
3   0.794719  0.257233     6      3     1.0
4   0.137138  0.480473    10      1     3.0
5   0.145847  0.754164    16      2     1.5
6   0.929552  0.187502     4      1     3.0
7   0.700309  0.655163    14      2     1.5
8   0.590829  0.561370    12      1     3.0
9   0.236366  0.814549    17      2     1.5
10  0.409573  0.444851     9      1     3.0
11  0.611366  0.842374    17      2     1.5
12  0.184661  0.725729    15      1     3.0
13  0.643751  0.299513     6      3     1.0
14  0.421400  0.294158     6      3     1.0
15  0.293585  0.112387     3      1     3.0
16  0.790870  0.609906    13      1     3.0
17  0.980155  0.757171    16      2     1.5
18  0.733151  0.393027     8      2     1.5
19  0.512966  0.398919     8      2     1.5
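As a cross-check, here is a sketch of the same idea applied to the 'B' values printed in the question, this time staying with pd.cut and using the column names the question asks for ('freq' and 'weights'). The helper column 'bin' and the name df_q are only illustrative, and right=False is an assumption made to match the left-closed [0.35, 0.40) interval in the question.

```python
import numpy as np
import pandas as pd

# Column B copied from the dataframe printed in the question.
b = [0.380360, 0.605991, 0.076174, 0.498564, 0.771653,
     0.279264, 0.725061, 0.349093, 0.657373, 0.384311,
     0.023815, 0.534569, 0.228761, 0.266114, 0.757514,
     0.014512, 0.164657, 0.894628, 0.833938, 0.745574]
df_q = pd.DataFrame({"B": b})

edges = np.arange(0, 1.05, 0.05)

# labels=False returns the 0-based integer index of each value's bin.
df_q["bin"] = pd.cut(df_q["B"], edges, right=False, labels=False)

# Per-row frequency: how many rows share this row's bin.
df_q["freq"] = df_q.groupby("bin")["B"].transform("count")

# Per-row weight: largest frequency divided by this row's frequency.
df_q["weights"] = df_q["freq"].max() / df_q["freq"]

print(df_q[["B", "freq", "weights"]])
```

With these values, row 0 (B = 0.380360) shares its bin with row 9 (B = 0.384311), so it gets freq 2, matching the example in the question. No bin holds more than two values here, so the weights are either 1.0 or 2.0.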
