I have a data frame like this:
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step,Length,Breadth,Height,Width)
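I am trying to calculate the max, min, mean, median, and standard deviation of each measurement grouped by Step, and then pivot the result so that each measurement becomes a row and each statistic/Step combination becomes a column.
The desired output is: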
Measurement max_1 min_1 mean_1 median_1 sd_1 max_2 min_2 mean_2 median_2 sd_2 max_3 min_3 mean_3 median_3 sd_3 max_4 min_4 mean_4 median_4 sd_4
Length 0.50 0.10 0.3050 0.31 0.17058722 0.28 0.10 0.1900 0.190 0.07393691 0.80 0.15 0.334 0.27 0.2693139 0.90 0.61 0.7525 0.750 0.12526638
Breadth 0.47 0.13 0.3500 0.40 0.15577760 0.80 0.28 0.5775 0.615 0.23012680 0.70 0.11 0.354 0.38 0.2383904 0.90 0.11 0.6700 0.835 0.37567720
Height 0.35 0.27 0.3200 0.33 0.03829708 0.51 0.22 0.3575 0.350 0.12120919 0.53 0.15 0.310 0.32 0.1570032 0.90 0.11 0.4650 0.425 0.32888701
Width 0.70 0.21 0.4275 0.40 0.23669601 0.80 0.20 0.3675 0.235 0.28952547 0.80 0.30 0.524 0.50 0.2040343 0.31 0.27 0.2875 0.285 0.01707825
I tried computing the summary statistics this way, but it is not an efficient approach.
library(dplyr)
df1 <- df %>%
  group_by(Step) %>%
  summarise(Length_Mean = mean(Length),
            Breadth_Mean = mean(Breadth),
            Height_Mean = mean(Height),
            Width_Mean = mean(Width))
How can I produce the desired output efficiently with as little code as possible? Can someone point me in the right direction?
Answer 0 (score: 4)
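You can use the scoped variants of summarize (see ?scoped) to calculate the same summary statistics for multiple columns at once. From ?scoped:
The variants suffixed with _if, _at or _all apply an expression (sometimes several) to all variables within a specified subset. This subset can contain all variables (_all variants), a vars() selection (_at variants), or variables selected with a predicate (_if variants).
Here, summarize_all computes all five statistics for every measurement column:
library(tidyverse)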
# Calculate the summary statistics
sums <- df %>%
  group_by(Step) %>%
  summarize_all(funs(max, min, mean, median, sd))
sums
#> # A tibble: 4 x 21
#> Step Length_max Breadth_max Height_max Width_max Length_min Breadth_min
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.5 0.47 0.35 0.7 0.1 0.13
#> 2 2 0.28 0.8 0.51 0.8 0.1 0.28
#> 3 3 0.8 0.7 0.53 0.8 0.15 0.11
#> 4 4 0.9 0.9 0.9 0.31 0.61 0.11
#> # ... with 14 more variables: Height_min <dbl>, Width_min <dbl>,
#> # Length_mean <dbl>, Breadth_mean <dbl>, Height_mean <dbl>,
#> # Width_mean <dbl>, Length_median <dbl>, Breadth_median <dbl>,
#> # Height_median <dbl>, Width_median <dbl>, Length_sd <dbl>,
#> # Breadth_sd <dbl>, Height_sd <dbl>, Width_sd <dbl>
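summarize_all is a good fit here because it selects every column except the grouping columns, and it accepts several summary functions to compute for each selected variable.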
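As a side note that is not part of the original answer: funs() was deprecated in dplyr 0.8.0, and the scoped verbs were superseded by across() in dplyr 1.0.0. The sketch below shows what the same summary step could look like with across(), assuming dplyr 1.0 or later is installed. The names stats and sums_across are illustrative only. The columns come out grouped by measurement (Length_max, Length_min, ...) rather than by statistic.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  Step    = c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2"),
  Length  = c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18),
  Breadth = c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28),
  Height  = c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38),
  Width   = c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
)

# Named list of summary functions; the names become the column suffixes
stats <- list(max = max, min = min, mean = mean, median = median, sd = sd)

# across(everything(), ...) skips the grouping column inside a grouped summarise
sums_across <- df %>%
  group_by(Step) %>%
  summarise(across(everything(), stats, .names = "{.col}_{.fn}"))

sums_across
```

The result should have one row per Step and the same 20 statistic columns as the summarize_all version, only in a different column order.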
Now that we have the summary statistics, all that remains is to reshape the data into the desired output, using gather, separate, unite, and spread from tidyr:
sums %>%
  # Reshape to long format
  gather(col, val, -Step) %>%
  # Separate the measurement and the summary statistic
  separate(col, into = c("Measurement", "stat")) %>%
  arrange(Step) %>%
  # Create the desired column headings
  unite(col, stat, Step) %>%
  # Need to use factors to preserve order
  mutate_at(vars(col, Measurement), fct_inorder) %>%
  # Reshape back to wide format
  spread(col, val)
#> # A tibble: 4 x 21
#> Measurement max_1 min_1 mean_1 median_1 sd_1 max_2 min_2 mean_2
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Length 0.5 0.1 0.305 0.31 0.171 0.28 0.1 0.19
#> 2 Breadth 0.47 0.13 0.35 0.4 0.156 0.8 0.28 0.578
#> 3 Height 0.35 0.27 0.32 0.330 0.0383 0.51 0.22 0.358
#> 4 Width 0.7 0.21 0.428 0.4 0.237 0.8 0.2 0.368
#> # ... with 12 more variables: median_2 <dbl>, sd_2 <dbl>, max_3 <dbl>,
#> # min_3 <dbl>, mean_3 <dbl>, median_3 <dbl>, sd_3 <dbl>, max_4 <dbl>,
#> # min_4 <dbl>, mean_4 <dbl>, median_4 <dbl>, sd_4 <dbl>
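For readers on newer package versions: gather() and spread() were superseded by pivot_longer() and pivot_wider() in tidyr 1.0.0. The following is a minimal sketch of the same reshape with those functions, assuming dplyr 1.0 or later and tidyr 1.0 or later. It is an alternative to the code above, not the original answer's method, and the names stats and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  Step    = c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2"),
  Length  = c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18),
  Breadth = c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28),
  Height  = c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38),
  Width   = c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
)

stats <- list(max = max, min = min, mean = mean, median = median, sd = sd)

wide <- df %>%
  group_by(Step) %>%
  summarise(across(everything(), stats, .names = "{.col}_{.fn}")) %>%
  # Long format: split names such as "Length_max" into Measurement and stat
  pivot_longer(-Step, names_to = c("Measurement", "stat"), names_sep = "_") %>%
  # Wide format: one column per stat/Step pair, named like "max_1"
  pivot_wider(names_from = c(stat, Step), values_from = value)

wide
```

Because pivot_wider() builds the new column names by joining stat and Step with an underscore, the separate unite() and factor-ordering steps are not needed in this variant.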
Created on 2018-05-24 by the reprex package (v0.2.0).