I have a DataFrame with three columns:
| A | B | C |
I computed the quantiles:
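df.quantile(.25)
df.quantile(.75)
I would like to add a new column, Q, that classifies each value according to a simple rule: if the value is below the first quartile it is small; if it is above the third quartile it is large; everything in between is medium.
I tried using qcut, but it only accepts one-dimensional input.
Thanks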
Answer 0 (score: 3)
pd.qcut is your friend.
pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])
Minimal working example (MWE):
print(s)
0 1
1 1
2 2
3 3
4 4
5 2
6 4
7 6
8 4
9 6
10 5
11 4
12 6
13 7
14 3
15 2
16 1
17 1
18 2
dtype: int64
print(pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large']))
0 small
1 small
2 small
3 medium
4 medium
5 small
6 medium
7 large
8 medium
9 large
10 large
11 medium
12 large
13 large
14 medium
15 small
16 small
17 small
18 small
dtype: category
Categories (3, object): [small < medium < large]
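To see where the cut points actually fall, qcut can also return the bin edges it computed. The following is a minimal sketch that rebuilds the same values as s above; retbins is a documented qcut parameter, while the variable names labels and edges are my own. Note that qcut's bins are closed on the right, so a value exactly equal to the 25% quantile is labelled small, which differs slightly from the strict "less than" rule in the question.

```python
import pandas as pd

# Same values as the series s printed above.
s = pd.Series([1, 1, 2, 3, 4, 2, 4, 6, 4, 6, 5, 4, 6, 7, 3, 2, 1, 1, 2])

# retbins=True returns the labels together with the bin edges.
labels, edges = pd.qcut(
    s, q=[0, .25, .75, 1],
    labels=['small', 'medium', 'large'],
    retbins=True,
)

print(edges)                  # minimum, 25% quantile, 75% quantile, maximum
print(labels.value_counts())  # how many values landed in each class
```

For this series the edges come out as 1, 2, 4.5 and 7, so every 1 and 2 is small, every 3 and 4 is medium, and 5 and above is large, which matches the output above.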
For a DataFrame, repeat this for each column with apply:
df.apply(pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'], axis=0)
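As a sketch of how the resulting labels might be attached back to the original frame as new columns, here is a self-contained version. The data and the _Q suffix are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data: three numeric columns, as in the question.
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [10, 20, 30, 40, 50, 60, 70, 80],
    'C': [8, 7, 6, 5, 4, 3, 2, 1],
})

# Extra keyword arguments given to apply are forwarded to pd.qcut,
# which is then called once per column.
classes = df.apply(
    pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large']
)

# Keep the original values and add one class column per input column.
result = df.join(classes.add_suffix('_Q'))
print(result)
```

One caveat: if a column contains so many repeated values that two of the requested quantiles coincide, qcut raises a ValueError about non-unique bin edges, so this approach suits columns with reasonably spread-out values.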
Answer 1 (score: 1)
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 3)),
    columns=list('ABC')
)
pandas.DataFrame.mask
pandas only, and intuitive
is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
is_medium = ~(is_small | is_large)
df.mask(is_small, 'small').mask(is_large, 'large').mask(is_medium, 'medium')
        A       B       C
0   small   small  medium
1  medium   large  medium
2   small   large   large
3  medium   small   small
4   small  medium   large
5   large  medium   small
6  medium  medium  medium
7  medium   large  medium
8  medium  medium  medium
9   large  medium   large
numpy.where
is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
pd.DataFrame(
    np.where(is_small, 'small', np.where(is_large, 'large', 'medium')),
    df.index, df.columns
)
        A       B       C
0   small   small  medium
1  medium   large  medium
2   small   large   large
3  medium   small   small
4   small  medium   large
5   large  medium   small
6  medium  medium  medium
7  medium   large  medium
8  medium  medium  medium
9   large  medium   large
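The comparison-based approaches in this answer and qcut from the other answer can disagree on values that sit exactly on a quartile. The sketch below wraps the numpy.where version in a helper (classify_strict is a hypothetical name of mine) and runs both on a small made-up column whose quartiles are actual data values.

```python
import numpy as np
import pandas as pd

def classify_strict(df):
    """Label each value by the strict rule from the question, column by column."""
    is_small = df < df.quantile(.25)   # strictly below the first quartile
    is_large = df > df.quantile(.75)   # strictly above the third quartile
    return pd.DataFrame(
        np.where(is_small, 'small', np.where(is_large, 'large', 'medium')),
        index=df.index, columns=df.columns,
    )

# Hypothetical one-column frame; its quartiles are 2.0 and 4.0.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

strict = classify_strict(df)
binned = df.apply(pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])

print(strict['A'].tolist())  # ['small', 'medium', 'medium', 'medium', 'large']
print(binned['A'].tolist())  # ['small', 'small', 'medium', 'medium', 'large']
```

The value 2 equals the first quartile, so the strict comparison calls it medium while qcut, whose bins include their right edge, calls it small. Which behaviour is wanted depends on how the rule in the question is meant to treat ties.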