I have a data.frame, and I want to bin its values into quartiles to make the data simpler to read:
> head(Quartile)
GSM1321374 GSM1321375 GSM1321376 GSM1321377 GSM1321378 GSM1321379
1415670_at 11.203302 11.374616 10.876187 11.23639 11.02051 10.926481
1415671_at 11.196427 11.492769 11.493717 11.01683 11.15016 11.576188
1415672_at 11.550974 11.267559 11.800991 11.57551 10.93359 11.222779
1415673_at 11.293390 10.978280 11.367316 10.45135 10.35822 10.234964
1415674_a_at 9.254073 10.572670 9.361991 11.26998 10.21125 10.245857
1415675_at 9.922985 9.228195 9.798156 10.02844 10.19928 9.749947
I applied the following function, and it does the job.
quantfun <- function(x) as.integer(cut(x, quantile(x, probs=0:4/4), include.lowest=TRUE))
a <- apply(Quartile,1,quantfun)
b <- t(a)
colnames(b) <- colnames(Quartile)
The output is:
> head(b)
GSM1321374 GSM1321375 GSM1321376 GSM1321377 GSM1321378 GSM1321379
1415670_at 3 4 1 4 2 1
1415671_at 2 3 4 1 1 4
1415672_at 3 2 4 4 1 1
1415673_at 4 3 4 2 1 1
1415674_a_at 1 4 1 4 2 3
1415675_at 3 1 2 4 4 1
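But the problem is that the quantiles are computed separately for each slice of the data (apply with MARGIN = 1 works row by row, and each column also has its own quantiles, as shown below). I want a single, unified set of quantiles for the whole data.frame.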
> duration = Quartile$GSM1321374
> quantile(duration)
0% 25% 50% 75% 100%
9.254073 9.922985 11.120381 11.203302 11.550974
> duration = Quartile$GSM1321375
> quantile(duration)
0% 25% 50% 75% 100%
9.228195 10.572670 10.946407 11.267559 11.492769
Answer 0 (score: 3)
First find the quartiles of the whole data frame to get your bins:
quantile(unlist(Quartile))
0% 25% 50% 75% 100%
9.228195 10.229036 10.997555 11.275832 11.800991
We now have the range for each group (for example, the first bin is 9.228 to 10.229). Then create the quartile data frame:
Quartile[] <- matrix(quantfun(unlist(Quartile)), nrow(Quartile))
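This relies on the fact that unlist(Quartile) treats the data frame as a single vector, so the breaks come from all values at once. Assigning into Quartile[] keeps the row and column names. If you want to keep the original data frame and work on a copy: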
Quartile2 <- Quartile
Quartile2[] <- matrix(quantfun(unlist(Quartile2)), nrow(Quartile2))
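For anyone who wants to try this end to end, here is a minimal self-contained sketch in base R. It assumes the data frame consists only of the six rows printed by head(Quartile) in the question (the real data is presumably larger), and the name breaks is introduced here purely for illustration.

```r
# Sketch: global quartile bins for a whole data frame (base R only).
# The six rows are copied from head(Quartile) in the question; this is an
# assumption for the demo, since the full data set is not shown.
Quartile <- data.frame(
  GSM1321374 = c(11.203302, 11.196427, 11.550974, 11.293390, 9.254073, 9.922985),
  GSM1321375 = c(11.374616, 11.492769, 11.267559, 10.978280, 10.572670, 9.228195),
  GSM1321376 = c(10.876187, 11.493717, 11.800991, 11.367316, 9.361991, 9.798156),
  GSM1321377 = c(11.23639, 11.01683, 11.57551, 10.45135, 11.26998, 10.02844),
  GSM1321378 = c(11.02051, 11.15016, 10.93359, 10.35822, 10.21125, 10.19928),
  GSM1321379 = c(10.926481, 11.576188, 11.222779, 10.234964, 10.245857, 9.749947),
  row.names = c("1415670_at", "1415671_at", "1415672_at",
                "1415673_at", "1415674_a_at", "1415675_at")
)

# Same helper as in the question: bin x by its own quartiles (codes 1 to 4).
quantfun <- function(x) {
  as.integer(cut(x, quantile(x, probs = 0:4 / 4), include.lowest = TRUE))
}

# Breaks computed once from all values (hypothetical helper variable,
# only used here to inspect the bin edges).
breaks <- quantile(unlist(Quartile), probs = 0:4 / 4)

# Work on a copy; unlist() flattens column by column and matrix() refills
# column by column, so every code lands back in its original cell.
Quartile2 <- Quartile
Quartile2[] <- matrix(quantfun(unlist(Quartile2)), nrow(Quartile2))

print(breaks)
print(Quartile2)
```

Because the breaks now come from all values together, the same number always maps to the same bin code regardless of which row or column it sits in.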