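I have a large data frame, DF, which I have split into 6 quantiles, assigning each quantile to its own data frame. All six data frames have the same column headers.
I want to apply the same functions to all 6 data frames and build a single result data frame that holds the results for each of them.
For example: compute means, count the values in each column, get percentages for each variable (the percentage of values in each column), and so on.
These operations are the same for every data frame.
So far I have been doing this by hand, as shown below: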
res_df = data.frame("col_headers" = c("names"),
"df1_out" = c(sum(df1$C1)/nrow(df1),
sum(df1$C2)/nrow(df1),...
mean(df1$C1)),
"df2_out" = c(sum(df2$C1)/nrow(df2),
sum(df2$C2)/nrow(df2),...
mean(df2$C1)),
.
.
.
"df6_out" = c(sum(df6$C1)/nrow(df6),
sum(df6$C2)/nrow(df6),...
mean(df6$C1)))
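and so on: I manually create one column per data frame, with one entry for each variable. This becomes a problem as the number of columns grows.
I would like to know whether there is a way to automate the whole process: DF -> split into quantiles -> list of quantile data frames -> means, percentages (each row's contribution), etc. -> new DF -> comparison plots of the results.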
Answer 0 (score: 1)
You can use the quantiles to create a factor variable and then use it to split() the data frame, for example (using the iris dataset):
> data("iris")
>
> iris$quantiles <- cut(iris$Sepal.Width, quantile(iris$Sepal.Width, probs = seq(0, 1, 1/6)),
+ include.lowest = TRUE)
> lista <- split(iris, iris$quantiles)
This gives you a list with one data frame per quantile group. You can then use lapply / sapply to run an operation on every data frame, like this:
> mediaCol <- sapply(lista, function(x) {
+ colMeans(x[!colnames(x) %in% c("Species", "quantiles")])
+ })
>
> mediaCol
              [2,2.7] (2.7,2.9]  (2.9,3]  (3,3.2] (3.2,3.42] (3.42,4.4]
Sepal.Length 5.757576  6.220833 6.015385 5.954167   5.550000      5.520
Sepal.Width  2.493939  2.841667 3.000000 3.154167   3.366667      3.752
Petal.Length 4.330303  4.754167 4.234615 3.770833   3.044444      2.052
Petal.Width  1.378788  1.545833 1.403846 1.254167   1.000000      0.508
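If you want several statistics at once in one result data frame (closer to the res_df in the question), one possible sketch is below. The helper name summarise_group, the column names n, mean and pct_of_total, and the choice of "percentage" as each group's share of the overall column total are all assumptions made for illustration; swap in whatever statistics you actually need.

```r
data("iris")

# Same six-way split on Sepal.Width as above
iris$quantiles <- cut(iris$Sepal.Width,
                      quantile(iris$Sepal.Width, probs = seq(0, 1, 1/6)),
                      include.lowest = TRUE)
lista <- split(iris, iris$quantiles)

# Numeric columns only (the factors Species and quantiles are skipped)
num_cols <- names(iris)[sapply(iris, is.numeric)]
totals   <- colSums(iris[num_cols])

# Hypothetical helper: a few statistics per numeric column for one group
summarise_group <- function(x) {
  data.frame(
    variable     = num_cols,
    n            = nrow(x),                                   # rows in this group
    mean         = colMeans(x[num_cols]),                     # column means
    pct_of_total = 100 * colSums(x[num_cols]) / totals,       # share of overall column sum
    row.names    = NULL
  )
}

# Apply the helper to every group and stack the results in long format
res_list <- lapply(lista, summarise_group)
res_df   <- do.call(rbind, res_list)
res_df$quantile <- rep(names(res_list), each = length(num_cols))
rownames(res_df) <- NULL

head(res_df)
```

Because the work is done per group inside lapply, adding a column to the original data or changing the number of quantiles does not require touching the summary code.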
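The percentage contribution of each row to its column (within each quantile group) could be computed like this: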
> percCont <- lapply(lista, function(x) {
+ x[!colnames(x) %in% c("Species", "quantiles")] <-
+ apply(x[!colnames(x) %in% c("Species", "quantiles")], 2, function(y) {y / sum(y)})
+ return(x)
+ })
>
> percCont[1]
$`[2,2.7]`
Sepal.Length Sepal.Width Petal.Length Petal.Width Species quantiles
42 0.02368421 0.02794654 0.009097271 0.006593407 setosa [2,2.7]
54 0.02894737 0.02794654 0.027991603 0.028571429 versicolor [2,2.7]
58 0.02578947 0.02916160 0.023093072 0.021978022 versicolor [2,2.7]
60 0.02736842 0.03280680 0.027291812 0.030769231 versicolor [2,2.7]
61 0.02631579 0.02430134 0.024492652 0.021978022 versicolor [2,2.7]
63 0.03157895 0.02673147 0.027991603 0.021978022 versicolor [2,2.7]
68 0.03052632 0.03280680 0.028691393 0.021978022 versicolor [2,2.7]
69 0.03263158 0.02673147 0.031490553 0.032967033 versicolor [2,2.7]
70 0.02947368 0.03037667 0.027291812 0.024175824 versicolor [2,2.7]
73 0.03315789 0.03037667 0.034289713 0.032967033 versicolor [2,2.7]
80 0.03000000 0.03159174 0.024492652 0.021978022 versicolor [2,2.7]
81 0.02894737 0.02916160 0.026592022 0.024175824 versicolor [2,2.7]
82 0.02894737 0.02916160 0.025892232 0.021978022 versicolor [2,2.7]
83 0.03052632 0.03280680 0.027291812 0.026373626 versicolor [2,2.7]
84 0.03157895 0.03280680 0.035689293 0.035164835 versicolor [2,2.7]
88 0.03315789 0.02794654 0.030790763 0.028571429 versicolor [2,2.7]
90 0.02894737 0.03037667 0.027991603 0.028571429 versicolor [2,2.7]
91 0.02894737 0.03159174 0.030790763 0.026373626 versicolor [2,2.7]
93 0.03052632 0.03159174 0.027991603 0.026373626 versicolor [2,2.7]
94 0.02631579 0.02794654 0.023093072 0.021978022 versicolor [2,2.7]
95 0.02947368 0.03280680 0.029391183 0.028571429 versicolor [2,2.7]
99 0.02684211 0.03037667 0.020993702 0.024175824 versicolor [2,2.7]
102 0.03052632 0.03280680 0.035689293 0.041758242 virginica [2,2.7]
107 0.02578947 0.03037667 0.031490553 0.037362637 virginica [2,2.7]
109 0.03526316 0.03037667 0.040587824 0.039560440 virginica [2,2.7]
112 0.03368421 0.03280680 0.037088873 0.041758242 virginica [2,2.7]
114 0.03000000 0.03037667 0.034989503 0.043956044 virginica [2,2.7]
119 0.04052632 0.03159174 0.048285514 0.050549451 virginica [2,2.7]
120 0.03157895 0.02673147 0.034989503 0.032967033 virginica [2,2.7]
124 0.03315789 0.03280680 0.034289713 0.039560440 virginica [2,2.7]
135 0.03210526 0.03159174 0.039188244 0.030769231 virginica [2,2.7]
143 0.03052632 0.03280680 0.035689293 0.041758242 virginica [2,2.7]
147 0.03315789 0.03037667 0.034989503 0.041758242 virginica [2,2.7]
You can use unsplit() to recombine the data frames into one, in the original row order:
> iris_percCont <- unsplit(percCont, iris$quantiles)
>
> head(iris_percCont)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species quantiles
1 0.03695652 0.03731343 0.02729045 0.015748031 setosa (3.42,4.4]
2 0.03132992 0.03846154 0.01271571 0.005479452 setosa (2.9,3]
3 0.03289013 0.04227213 0.01436464 0.006644518 setosa (3,3.2]
4 0.03219034 0.04095112 0.01657459 0.006644518 setosa (3,3.2]
5 0.03623188 0.03837953 0.02729045 0.015748031 setosa (3.42,4.4]
6 0.03913043 0.04157783 0.03313840 0.031496063 setosa (3.42,4.4]
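As a possible shortcut, base R's ave() can compute the same within-group proportions without an explicit split() / unsplit() round trip. The sketch below is one way to do it; the names num_cols, iris_ave and check are made up for this example. It also runs a quick sanity check that every numeric column sums to 1 inside each quantile group.

```r
data("iris")

# Same quantile factor as above
iris$quantiles <- cut(iris$Sepal.Width,
                      quantile(iris$Sepal.Width, probs = seq(0, 1, 1/6)),
                      include.lowest = TRUE)

num_cols <- setdiff(colnames(iris), c("Species", "quantiles"))

# ave() applies FUN within each level of the grouping factor and
# returns the results in the original row order
iris_ave <- iris
iris_ave[num_cols] <- lapply(iris[num_cols], function(y) {
  ave(y, iris$quantiles, FUN = function(v) v / sum(v))
})

# Sanity check: group-wise sums of the proportions should all be 1
check <- aggregate(iris_ave[num_cols],
                   by = list(quantiles = iris_ave$quantiles),
                   FUN = sum)
print(check)
```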
Beyond that, there are many ways to plot the results, and lapply can help you there as well.
Hope this helps.
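For the comparison plot, one simple option (just a sketch using base graphics; the axis labels and title are placeholders) is a grouped bar chart of the mediaCol matrix, with one cluster of bars per quantile group:

```r
data("iris")

# Rebuild the split and the matrix of column means from above
iris$quantiles <- cut(iris$Sepal.Width,
                      quantile(iris$Sepal.Width, probs = seq(0, 1, 1/6)),
                      include.lowest = TRUE)
lista <- split(iris, iris$quantiles)

mediaCol <- sapply(lista, function(x) {
  colMeans(x[!colnames(x) %in% c("Species", "quantiles")])
})

# beside = TRUE draws the four variables side by side within each group
bar_pos <- barplot(mediaCol, beside = TRUE,
                   legend.text = rownames(mediaCol),
                   args.legend = list(x = "topright", cex = 0.8),
                   xlab = "Sepal.Width quantile group",
                   ylab = "Column mean",
                   main = "Column means per quantile group")
```

barplot() returns the bar midpoints (stored here in bar_pos), which is handy if you later want to add text labels or error bars on top of the bars.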