I am trying to calculate the percentiles of each column of a dataframe and store them as rows in a new dataframe. I will then plot this new df as a line graph, faceted (wrapped) by different subgroups in my data.
But my current attempts result in a df that is empty or not updated.
I am able to do the following on a single specified column:
dataframe:
col1
1 15
2 24
3 23
4 25
5 25
sequence <- seq(from=0, to=1, by=0.01)
quantiles_df <- as.data.frame(quantile(df$col1, sequence))
I am also able to draw multiple histograms, one for each column of my dataframe, using this code:
for (i in 1:length(df)){
print (i)
hist(df[[i]], main="histogram", breaks=20)
}
However, merging this for loop with my quantile function returns either an error or a dataframe with only 1 column.
This returns quantile.df with 1 column:
for (i in 1:length(df)){
print(i)
quantile.df <- as.data.frame(quantile(df[[i]], sequence, na.rm=TRUE))
}
This returns an error when I try to use column names instead of column numbers:
for (i in colnames(df)){
print(i)
quantile.df <- as.data.frame(quantile(genes2$[i], sequence, na.rm=TRUE))
}
Expected results:
A dataframe of 120 columns by 101 rows, containing the result for each percentile from 0 to 100
Actual results:
when using length() --> 1 column x 101 row dataframe
when using colnames() -->
Error: unexpected '[' in:
"print(i)
quantile.df <- as.data.frame(quantile(df$['
Answer 0 (score: 0)
Your main problem is that you don't change what you're assigning to: each time through the loop you overwrite quantile.df instead of adding the new result to what is already there.
However, there's a nicer way with sapply. By default, sapply will loop over the columns of a data frame, apply a function to each, and simplify the result.
Here's a simple example with a few quantiles on the built-in mtcars data:
quants = c(0.25, 0.5, 0.75)
sapply(mtcars, quantile, probs = quants)
# mpg cyl disp hp drat wt qsec vs am gear carb
# 25% 15.425 4 120.825 96.5 3.080 2.58125 16.8925 0 0 3 2
# 50% 19.200 6 196.300 123.0 3.695 3.32500 17.7100 0 0 4 2
# 75% 22.800 8 326.000 180.0 3.920 3.61000 18.9000 1 1 4 4
(Note that this is a matrix; you might want to use as.data.frame() on it.)
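The same call scales to the question's full set of percentiles. The following is a minimal sketch, not a drop-in solution: it uses mtcars as a stand-in for the asker's data, and the names `probs` and `quantile_df` are made up for illustration. Extra arguments such as `na.rm = TRUE` are passed through sapply to quantile.

```r
# Assumption: mtcars stands in for the asker's data frame
df <- mtcars

# Percentiles from 0 to 1 in steps of 0.01 (101 values)
probs <- seq(from = 0, to = 1, by = 0.01)

# One column per input column, one row per percentile;
# na.rm = TRUE is forwarded to quantile() by sapply()
quantile_df <- as.data.frame(sapply(df, quantile, probs = probs, na.rm = TRUE))

dim(quantile_df)
```

The result has 101 rows and as many columns as `df`, which matches the shape described under "Expected results" in the question.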
Similarly, you can get a histogram for each column with sapply(mtcars, hist).
To do this well with a loop, you should pre-allocate the result data frame (so it's the right size), then fill it in column by column. I can add an example if you'd like.
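A pre-allocated loop could look roughly like this. It is only a sketch under assumptions: mtcars again stands in for the asker's data, and `probs` and `quantile_df` are illustrative names. It indexes columns with `df[[col]]`, which works with a column name held in a variable (unlike `df$[col]`, which is a syntax error).

```r
# Assumption: mtcars stands in for the asker's data frame
df <- mtcars
probs <- seq(from = 0, to = 1, by = 0.01)

# Pre-allocate the result: one row per percentile, one column per input column
quantile_df <- as.data.frame(
  matrix(NA_real_, nrow = length(probs), ncol = ncol(df),
         dimnames = list(NULL, names(df)))
)

# Fill the result in column by column instead of overwriting it each time
for (col in names(df)) {
  quantile_df[[col]] <- quantile(df[[col]], probs, na.rm = TRUE, names = FALSE)
}

dim(quantile_df)
```

Because each iteration writes to a different column of `quantile_df`, earlier results are kept rather than replaced.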
Answer 1 (score: 0)
Reproducible Data
df <- as.data.frame(matrix(rnorm(400), 100, 4))
Histogram and Quantile
The code below produces the histograms and the quantiles at the same time. I use mapply() instead of sapply() because I want each histogram's title to include its column name. If you don't need that, you can simplify it.
par(mfrow = c(1, 4))
quant <- mapply(function(value, name){
hist(value, main = paste0("Histogram of ", name), breaks = 20)
quantile(value, seq(0, 1, by = 0.1))
}, df, names(df), SIMPLIFY = TRUE)
quant
# V1 V2 V3 V4
# 0% -2.44712416 -2.63463290 -3.08872658 -2.8410463
# 10% -0.88944226 -1.16264448 -1.24097984 -1.1701429
# 20% -0.71782990 -0.91843217 -0.75868358 -0.8962623
# 30% -0.51587838 -0.66932521 -0.52816811 -0.8046574
# ...
Notice that the output of mapply() is a matrix. If you want it to be a data frame, try:
as.data.frame(quant)
If you want each column's quantiles as a row instead, try:
as.data.frame(t(quant))