Assigning values from pandas.quantile

Time: 2018-02-01 13:07:42

Tags: python pandas dataframe

I'm simply trying to assign the quantiles of a dataframe column to a new column, like this:

dataframe['pc'] = dataframe['row'].quantile([.1,.5,.7])

The result is:

0      NaN
...
5758   NaN
Name: pc, Length: 5759, dtype: float64

Any idea why? dataframe['row'] has plenty of values.

1 Answer:

Answer 0 (score: 0)

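This is expected: the indices are different. The Series created by quantile is indexed by the requested quantiles, so its labels don't match the original DataFrame's index, and assignment (which aligns on index labels) gives NaNs: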

import pandas as pd

#indices 0,1,2...6
dataframe = pd.DataFrame({'row':[2,0,8,1,7,4,5]})
print (dataframe)
   row
0    2
1    0
2    8
3    1
4    7
5    4
6    5

#indices 0.1, 0.5, 0.7
print (dataframe['row'].quantile([.1,.5,.7]))
0.1    0.6
0.5    4.0
0.7    5.4
Name: row, dtype: float64

#indices do not align, so every value is NaN
dataframe['pc'] = dataframe['row'].quantile([.1,.5,.7])
print (dataframe)
   row  pc
0    2 NaN
1    0 NaN
2    8 NaN
3    1 NaN
4    7 NaN
5    4 NaN
6    5 NaN
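To see the mismatch directly, here is a small check on the same example data. It only compares the two indexes; the variable name `q` is mine, not from the question.

```python
import pandas as pd

dataframe = pd.DataFrame({'row': [2, 0, 8, 1, 7, 4, 5]})
q = dataframe['row'].quantile([.1, .5, .7])

# The quantile Series is labelled by the requested probabilities,
# while the DataFrame is labelled by the integers 0..6.
print(q.index.tolist())          # [0.1, 0.5, 0.7]
print(dataframe.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]

# No quantile label exists in the DataFrame's index, so assignment
# finds no matching rows and fills the whole column with NaN.
print(q.index.isin(dataframe.index).any())  # False
dataframe['pc'] = q
print(dataframe['pc'].isna().all())  # True
```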

If you want a DataFrame of the quantiles instead, add rename_axis + reset_index:

df = dataframe['row'].quantile([.1,.5,.7]).rename_axis('a').reset_index(name='b')
print (df)
     a    b
0  0.1  0.6
1  0.5  4.0
2  0.7  5.4
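If what you actually want is one quantile repeated on every row (an assumption about your goal, since the question doesn't say), pass a single float instead of a list. A scalar has no index, so pandas broadcasts it to the whole column. The column names `median`, `q0.1`, `q0.5` and `q0.7` are illustrative only.

```python
import pandas as pd

dataframe = pd.DataFrame({'row': [2, 0, 8, 1, 7, 4, 5]})

# A scalar q returns a single number rather than a Series, so there is
# no index to align and the value is broadcast to every row.
median = dataframe['row'].quantile(.5)
print(median)  # 4.0
dataframe['median'] = median

# Several quantiles can be assigned the same way, one column per q.
for q in (.1, .5, .7):
    dataframe[f'q{q}'] = dataframe['row'].quantile(q)
print(dataframe)
```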

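But if some of the indices are the same (I don't think this is what you want; it's only to explain the behaviour better):

Add reset_index(drop=True) to get the default index 0, 1, 2: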

print (dataframe['row'].quantile([.1,.5,.7]).reset_index(drop=True))
0    0.6
1    4.0
2    5.4
Name: row, dtype: float64

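Only the first three rows get values, because the indices 0, 1, 2 of the Series also exist in the DataFrame; rows 3 to 6 have no matching label: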

dataframe['pc'] = dataframe['row'].quantile([.1,.5,.7]).reset_index(drop=True)
print (dataframe)
   row   pc
0    2  0.6
1    0  4.0
2    8  5.4
3    1  NaN
4    7  NaN
5    4  NaN
6    5  NaN
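The trailing NaNs come from the same alignment rule: assigning a Series to a column behaves like reindexing it onto the DataFrame's index first. A minimal check on the example data (`s` and `aligned` are illustrative names):

```python
import pandas as pd

dataframe = pd.DataFrame({'row': [2, 0, 8, 1, 7, 4, 5]})
s = dataframe['row'].quantile([.1, .5, .7]).reset_index(drop=True)

# Reindex s onto the DataFrame's labels 0..6: labels 0..2 keep their
# values, labels 3..6 are missing from s and become NaN.
aligned = s.reindex(dataframe.index)

# Plain column assignment performs the same alignment internally.
dataframe['pc'] = s

# Raises AssertionError if the two columns differ (names are ignored).
pd.testing.assert_series_equal(dataframe['pc'], aligned, check_names=False)
print(aligned.isna().tolist())
```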

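EDIT: For multiple columns you need DataFrame.quantile, which can also exclude the non-numeric columns (since pandas 2.0 this only happens when numeric_only=True is passed; otherwise the string columns make it fail):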

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

#numeric_only=True skips the string columns A and F
df1 = df.quantile([.1,.2,.3,.4], numeric_only=True)
print (df1)
       B    C    D    E
0.1  4.0  2.5  0.5  2.5
0.2  4.0  3.0  1.0  3.0
0.3  4.0  3.5  1.0  3.5
0.4  4.0  4.0  1.0  4.0
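With a single q, DataFrame.quantile returns a Series labelled by column name. Because those labels match the DataFrame's columns rather than its rows, the Series aligns column-wise in arithmetic, for example when subtracting each column's median. A sketch reusing the sample df; `med` and `centered` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aaabbb')})

# A single q gives a Series indexed by the numeric column names B..E.
med = df.quantile(.5, numeric_only=True)
print(med)

# DataFrame minus Series aligns the Series index with the columns,
# so every value has its own column's median subtracted.
centered = df[med.index] - med
print(centered)
```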