Question

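My df is indexed by date and has a column named score. I want to keep df as it is, but add a column that gives the 0.7 quantile of score for that day. The quantile interpolation method needs to be midpoint, and it is also fine to round the result to the nearest integer.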
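A minimal sketch of the requested operation, assuming the dates form the DataFrame's index and the column is named `score`. The sample data and the `q70` column name are hypothetical. `groupby(level=0)` groups on the index, and `transform` returns a result aligned with the original rows, so `df` keeps its original shape.

```python
import pandas as pd

# Hypothetical example data: several scores per day, with the dates as the index
idx = pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02",
                      "2021-01-02", "2021-01-02"])
df = pd.DataFrame({"score": [10, 20, 5, 7, 30]}, index=idx)
df.index.name = "date"

# groupby(level=0) groups on the date index; transform broadcasts each day's
# 0.7 quantile (midpoint interpolation) back onto every row of that day
df["q70"] = df.groupby(level=0)["score"].transform(
    lambda s: s.quantile(0.7, interpolation="midpoint")
)
print(df)
```

For the first day the 0.7 position falls between 10 and 20, so midpoint gives 15.0. For the second day it falls between 7 and 30, giving 18.5.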

Answer 1

I've outlined one approach you could take below.

Note that to round values to the nearest integer, you should use Python's built-in round() function. For details, see round() in the Python documentation.
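One detail worth knowing before rounding: in Python 3, the built-in round() rounds exact halves to the nearest even integer, so a midpoint quantile of 18.5 becomes 18, not 19. The sketch below uses hypothetical values. It applies round() element-wise and compares it with pandas' vectorized Series.round(), which (via NumPy) also rounds halves to even.

```python
import pandas as pd

# Python 3's built-in round() rounds exact halves to the nearest even integer
halves = [round(x) for x in [0.5, 1.5, 2.5, 18.4, 18.6]]
print(halves)

# Applied element-wise to a (hypothetical) quantile column
q = pd.Series([15.0, 18.5, 18.6])
rounded = q.apply(round)

# Vectorized alternative; NumPy-based rounding also sends halves to even
rounded_vec = q.round().astype(int)
print(rounded.tolist(), rounded_vec.tolist())
```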

import pandas as pd
import numpy as np
# set random seed for reproducibility
np.random.seed(748)

# initialize base example dataframe
df = pd.DataFrame({"date":np.arange(10), 
                   "score":np.random.uniform(size=10)})

# not used below; this draw is kept so the random numbers match the output shown
duplicate_dates = np.random.choice(df.index, 5)

df_dup = pd.DataFrame({"date":np.random.choice(df.index, 5), 
                       "score":np.random.uniform(size=5)})

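# add the rows with repeated dates to finish building the example data
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df, df_dup], ignore_index=True)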

# calculate the 0.7 quantile of score per date, using midpoint interpolation
result = df.groupby("date").quantile(q=0.7, interpolation='midpoint')

# print resulting dataframe
# contains one unique 0.7 quantile value per date
print(result)

"""
0.7      score
date          
0     0.585087
1     0.476404
2     0.426252
3     0.363376
4     0.165013
5     0.927199
6     0.575510
7     0.576636
8     0.831572
9     0.932183
"""

# to add the per-date quantiles as a new column in the original
# dataframe `df`, map each value in the "date" column through
# a dictionary of {date: quantile}

# create dictionary
mapping = result.to_dict()["score"]

# map each row's date to its quantile to produce the new column
df["quantile_0.7"] = df["date"].map(mapping)

print(df)

"""
    date     score  quantile_0.7
0      0  0.920895      0.585087
1      1  0.476404      0.476404
2      2  0.380771      0.426252
3      3  0.363376      0.363376
4      4  0.165013      0.165013
5      5  0.927199      0.927199
6      6  0.340008      0.575510
7      7  0.695818      0.576636
8      8  0.831572      0.831572
9      9  0.932183      0.932183
10     7  0.457455      0.576636
11     6  0.650666      0.575510
12     6  0.500353      0.575510
13     0  0.249280      0.585087
14     2  0.471733      0.426252
"""
