我正在尝试计算样本数,平均值,标准差,变异系数,95%置信下限和上限以及每列中此数据集的四分位数,并将其放入新数据框。
以下数字不一定都是正确的&我没有全部填写,只是提供了一个例子。这些值将用于创建箱形图,因此需要四分位数。行和列最后将成为标题。见下面的例子。
这是结构:
B1 <- c(8, 6, 13, 6, 27, 104, 18, 3)
B2 <- c(2, 13, 1, 64, 127, 24, 4, 3)
B3 <- c(8, 16, 113, 680, 227, 310, 138, 30)
B4 <- c(238, 46, 613, 69, 7, 14, 4, 8)
x <- data.frame(B1, B2, B3, B4)
> head(x)
B1 B2 B3 B4
1 8 2 8 238
2 6 13 16 46
3 13 1 113 613
4 6 64 680 69
5 27 127 227 7
6 104 24 310 14
期望的输出:
> y
B1 B2 B3 B4
n 8 8 8 8
mean 23 30 190 125
Stand dev 5 2 34 2
CoeffofVariation 0.3 0.4 0.7 1.3
LowerConfInterval 2 20 35 45
UpperConfInterval 50 120 122 120
LowerQuartile
Median
Upper Quantile
Inter Quartile Range
Minimum
Maximum
Regression equation
答案 0 :(得分:2)
您可以使用以下内容:
B1 <- c(8, 6, 13, 6, 27, 104, 18, 3)
B2 <- c(2, 13, 1, 64, 127, 24, 4, 3)
B3 <- c(8, 16, 113, 680, 227, 310, 138, 30)
B4 <- c(238, 46, 613, 69, 7, 14, 4, 8)
combDF <- data.frame(cbind(B1,B2,B3,B4))
data_long <- gather(combDF, factor_key=TRUE)
data_long%>% group_by(key)%>%
summarise(mean= mean(value), sd= sd(value), max = max(value),min = min(value))
,输出为:
# A tibble: 4 x 5
key mean sd max min
<fctr> <dbl> <dbl> <dbl> <dbl>
1 B1 23.125 33.60458 104 3
2 B2 29.750 44.59260 127 1
3 B3 190.250 224.72253 680 8
4 B4 124.875 212.08653 613 4
您尚未指定所查看的置信度,但我发布的代码可以根据您的问题进行调整。
答案 1 :(得分:1)
正如lmo所提到的,你可以使用sapply
,如下所示:
sapply(x, function(x) c( "Stand dev" = sd(x),
"Mean"= mean(x,na.rm=TRUE),
"n" = length(x),
"Median" = median(x),
"CoeffofVariation" = sd(x)/mean(x,na.rm=TRUE),
"Minimum" = min(x),
"Maximun" = max(x),
"Upper Quantile" = quantile(x,1),
"LowerQuartile" = quantile(x,0)
)
)
输出:
B1 B2 B3 B4
Stand dev 33.604581 44.592600 224.722527 212.086531
Mean 23.125000 29.750000 190.250000 124.875000
n 8.000000 8.000000 8.000000 8.000000
Median 10.500000 8.500000 125.500000 30.000000
CoeffofVariation 1.453171 1.498911 1.181196 1.698391
Minimum 3.000000 1.000000 8.000000 4.000000
Maximun 104.000000 127.000000 680.000000 613.000000
Upper Quantile.100% 104.000000 127.000000 680.000000 613.000000
LowerQuartile.0% 3.000000 1.000000 8.000000 4.000000
答案 2 :(得分:0)
在基地R
:
> summary(x)
B1 B2 B3 B4
Min. : 3.00 Min. : 1.00 Min. : 8.0 Min. : 4.00
1st Qu.: 6.00 1st Qu.: 2.75 1st Qu.: 26.5 1st Qu.: 7.75
Median : 10.50 Median : 8.50 Median :125.5 Median : 30.00
Mean : 23.12 Mean : 29.75 Mean :190.2 Mean :124.88
3rd Qu.: 20.25 3rd Qu.: 34.00 3rd Qu.:247.8 3rd Qu.:111.25
Max. :104.00 Max. :127.00 Max. :680.0 Max. :613.00
这更符合您的要求:
> library(skimr)
> skim(x)
-- Data Summary ------------------------
Values
Name x
Number of rows 8
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 B1 0 1 23.1 33.6 3 6 10.5 20.2 104 ▇▁▁▁▁
2 B2 0 1 29.8 44.6 1 2.75 8.5 34 127 ▇▁▁▁▁
3 B3 0 1 190. 225. 8 26.5 126. 248. 680 ▇▂▂▁▂
4 B4 0 1 125. 212. 4 7.75 30 111. 613 ▇▁▁▁▁