How to loop through a dataframe's columns in R and output quantile() for each column as a row in a new dataframe

Date: 2019-01-18 18:19:54

Tags: r dataframe

I am trying to calculate the percentiles of each column of a dataframe and store them as rows in a new dataframe. I will then go on to plot this new df as a line graph, faceted by the different subgroups in my data.

But my current attempts leave me with a df that is empty or never updated.

I am able to do the following on a single specified column:

dataframe:
    col1
1    15
2    24
3    23
4    25
5    25
sequence <- seq(from=0, to=1, by=0.01)
quantiles_df <- as.data.frame(quantile(df$col1, sequence))

I am also able to draw multiple histograms, 1 for each column of my dataframe, using this code:

for (i in 1:length(df)){
print (i)
hist(df[[i]], main="histogram", breaks=20)
}

However, merging this for loop with my quantile call returns either an error or a dataframe with only 1 column.

Returns quantile.df with 1 column

for (i in 1:length(df)){
print(i)
quantile.df <- as.data.frame(quantile(df[[i]], sequence, na.rm=TRUE))
}

Returns error when trying to use colnames, not col numbers

for (i in colnames(df)){
print(i)
quantile.df <- as.data.frame(quantile(genes2$[i], sequence, na.rm=TRUE))
}

Expected results:

A dataframe of 120 columns by 101 rows, containing the result for each percentile from 0 to 100

Actual results:

when using length() --> 1 column x 101 row dataframe

when using colnames() -->

Error: unexpected '[' in:
"print(i)
quantile.df <- as.data.frame(quantile(df$['



2 answers:

Answer 0 (score: 0)

Your main problem is that you never change what you're assigning to. Each time through the loop you overwrite quantile.df with the latest column's result, instead of telling R to put that result in a new row or column, so only the last column survives.

However, there's a nicer way with sapply. By default, sapply will loop over the columns of a data frame, apply a function to each, and simplify the result.

Here's a simple example with a few quantiles on the built-in mtcars data:

quants = c(0.25, 0.5, 0.75)
sapply(mtcars, quantile, probs = quants)
#        mpg cyl    disp    hp  drat      wt    qsec vs am gear carb
# 25% 15.425   4 120.825  96.5 3.080 2.58125 16.8925  0  0    3    2
# 50% 19.200   6 196.300 123.0 3.695 3.32500 17.7100  0  0    4    2
# 75% 22.800   8 326.000 180.0 3.920 3.61000 18.9000  1  1    4    4

(Note that this is a matrix, you might want to use as.data.frame() on it.)
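For the full set of percentiles in the question, the same call can be sketched as follows. This is only an illustration on mtcars: the names quantile_mat and quantile_df are made up for the example, and na.rm = TRUE is carried over from the question's own attempts rather than needed for mtcars.

```r
# Percentiles 0%, 1%, ..., 100%, as in the question
sequence <- seq(from = 0, to = 1, by = 0.01)

# One column per mtcars variable, one row per percentile (a numeric matrix)
quantile_mat <- sapply(mtcars, quantile, probs = sequence, na.rm = TRUE)

# Convert to a data frame; the row names keep the percentile labels
quantile_df <- as.data.frame(quantile_mat)
dim(quantile_df)
```

With 11 columns in mtcars this gives 101 rows by 11 columns; a 120-column data frame would give 101 rows by 120 columns in the same way.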

Similarly, you can get a histogram for each column with sapply(mtcars, hist).

To do this well with a loop, you should pre-allocate the result data frame (so it's the right size), then fill it in column by column. I can add an example if you'd like.
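A minimal sketch of that pre-allocated loop, again on mtcars, might look like this. The names quantile_df and col are illustrative only. It loops over column names and indexes with [[ ]], which accepts a name held in a variable; the `$` operator does not, which is why the `$[i]` attempt in the question is a syntax error.

```r
sequence <- seq(from = 0, to = 1, by = 0.01)

# Pre-allocate: one row per percentile, one column per input column
quantile_df <- as.data.frame(
  matrix(NA_real_, nrow = length(sequence), ncol = ncol(mtcars),
         dimnames = list(NULL, names(mtcars)))
)

# Fill in one column per iteration instead of overwriting the whole object
for (col in names(mtcars)) {
  quantile_df[[col]] <- quantile(mtcars[[col]], probs = sequence, na.rm = TRUE)
}

# Label the rows with the percentile names that quantile() generates
rownames(quantile_df) <- names(quantile(mtcars[[1]], probs = sequence))
```

The result should match what sapply produces; the loop is just more explicit about where each column's percentiles are stored.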

Answer 1 (score: 0)

Reproducible Data

df <- as.data.frame(matrix(rnorm(400), 100, 4))

Histogram and Quantile

The code below produces the histograms and the quantiles at the same time. I use mapply() instead of sapply() because I want each histogram's title to include its column name. If you don't need that, you can simplify it.

par(mfrow = c(1, 4))
quant <- mapply(function(value, name){
  hist(value, main = paste0("Histogram of ", name), breaks = 20)
  quantile(value, seq(0, 1, by = 0.1))
}, df, names(df), SIMPLIFY = TRUE)

quant

#               V1          V2          V3         V4
# 0%   -2.44712416 -2.63463290 -3.08872658 -2.8410463
# 10%  -0.88944226 -1.16264448 -1.24097984 -1.1701429
# 20%  -0.71782990 -0.91843217 -0.75868358 -0.8962623
# 30%  -0.51587838 -0.66932521 -0.52816811 -0.8046574
# ...


Notice that the output of mapply() is a matrix. If you want it to be a data frame, try:

as.data.frame(quant)

If you want each column's quantiles as a row, try:

as.data.frame(t(quant))