I have a data frame with the following structure:
df <- read.table(text="
site date v1 v2 v3 v4
a 2019-08-01 0 17 94 150
b 2019-08-01 5 25 83 148
c 2019-08-01 6 39 43 148
d 2019-08-01 10 39 144 165
a 2019-03-31 4 15 106 154
b 2019-03-31 4 21 70 151
c 2019-03-31 8 30 44 148
d 2019-03-31 9 41 144 160
a 2019-01-04 3 10 104 153
b 2019-01-04 2 16 90 150
c 2019-01-04 8 40 62 151
d 2019-01-04 9 43 142 162
a 2019-07-07 3 14 93 152
b 2019-07-07 2 23 74 147
c 2019-07-07 9 31 58 147
d 2019-07-07 9 36 123 170
a 2019-06-17 0 12 91 153
b 2019-06-17 3 25 73 147
c 2019-06-17 7 35 45 146
d 2019-06-17 8 40 134 168
a 2019-01-11 4 14 104 153
b 2019-01-11 5 18 73 151
c 2019-01-11 7 35 65 147
d 2019-01-11 11 44 134 168
a 2019-11-11 4 20 103 152
b 2019-11-11 6 22 79 152
c 2019-11-11 5 38 52 147
d 2019-11-11 10 38 144 163
a 2019-09-06 3 13 102 155
b 2019-09-06 6 17 74 149
c 2019-09-06 9 32 45 146
d 2019-09-06 11 42 138 165
", header=TRUE, stringsAsFactors=FALSE)
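Now I want to calculate summary statistics (minimum, maximum, mean, median, standard deviation) of the variables v1-v4 for each site, both for the whole year and separately for summer and winter.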
First, I split the data into summer and winter subsets with the following code:
library(openair)  # selectByDate() is provided by the openair package
df_summer <- selectByDate(df, month = 4:9)
df_winter <- selectByDate(df, month = c(1, 2, 3, 10, 11, 12))
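Then I tried to build a loop, first over the seasons and then over the variables. For this I created a list of the data sets and a vector of the column names: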
df_list <- list(df, df_summer, df_winter)
col_names <- c("v1", "v2", "v3", "v4")
Then I tried to put it all into a loop:
for (i in seq_along(df_list)){
for (j in col_names[,i]){
[j]_[i] <- describeBy([i]$[,j], [i]$site)
[j]_[i] <- data.frame(matrix(unlist([j]_[i]), nrow=length([j]_[i]), byrow=T))
[j]_[i]$site <- c("Frau2", "MW", "Sys1", "Sys4")
[j]_[i]$season <- c([i], [i], [i], [i])
[j]_[i]$type <- c([j], [j], [j], [j])
}
}
But this did not work; I got these messages:
Error: unexpected '[' in:
"for (j in col_names[,i]){
["
Error: unexpected '[' in " ["
Error: unexpected '}' in " }"
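I have already produced the data I need with this loop "workflow", but only by copying and pasting, to get the results quick and dirty. Now I want to tidy up the code.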
Do you know how I can make this work, or what I am doing wrong?
Thanks!
Matthias
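A note on the errors above: the parser stops at `[j]_[i]` because R cannot assemble an object name from bracketed loop indices, and `col_names[,i]` would fail at run time anyway, since `col_names` is a plain character vector, not a matrix. A common way to get the intended loop is to store each result as a list element named after the variable and season instead of creating separate objects. This is only an illustrative sketch: it runs on a few rows of the example data, splits the seasons by month number instead of using `selectByDate()`, and swaps `psych::describeBy()` for base R `tapply()`; the names `stat_funs`, `results` and `all_stats` are hypothetical.

```r
# A few rows of the question's example data (enough to run the sketch)
df <- read.table(text = "
site date v1 v2 v3 v4
a 2019-08-01 0 17 94 150
b 2019-08-01 5 25 83 148
a 2019-01-04 3 10 104 153
b 2019-01-04 2 16 90 150
a 2019-07-07 3 14 93 152
b 2019-07-07 2 23 74 147
a 2019-11-11 4 20 103 152
b 2019-11-11 6 22 79 152
", header = TRUE, stringsAsFactors = FALSE)

# Months 4-9 count as summer, the rest as winter (same split as the question)
summer    <- as.integer(format(as.Date(df$date), "%m")) %in% 4:9
df_list   <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
col_names <- c("v1", "v2", "v3", "v4")

# Hypothetical helper list: the statistics to compute per site
stat_funs <- list(min = min, max = max, mean = mean, median = median, sd = sd)

results <- list()  # one data frame per variable/season combination
for (season in names(df_list)) {
  d <- df_list[[season]]
  for (var in col_names) {
    # Matrix with one row per site and one column per statistic
    stats <- sapply(stat_funs, function(f) tapply(d[[var]], d$site, f))
    results[[paste(var, season, sep = "_")]] <- data.frame(
      site = rownames(stats), season = season, type = var, stats,
      row.names = NULL
    )
  }
}

# Stack all pieces into one long table
all_stats <- do.call(rbind, results)
rownames(all_stats) <- NULL
```

Keeping the results in a list (and then one long table) avoids having to generate object names such as `v1_summer` at all.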
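Update: I tried ekoam's suggestion (thank you!) and ran into the following problem.
Contrary to the comment I wrote under ekoam's answer, the error occurs with both data sets (the example given here and the real one I am working with; I am not sure whether I am allowed to post that one). Here is the code I used and the error I got: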
df <- read_excel("C:/###/###/###/Example_data.xlsx")
df <- subset(data_watersamples, site %in% c("a","b","c", "d"))
my_summary <-
. %>%
group_by(site) %>%
summarise_at(vars(
c(v1, v2, v3, v4),
list(min = min, max = max, mean = mean, median = median, sd = sd)
)) %>%
pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
pivot_wider(names_from = "stat")
summer <- as.integer(format.Date(df$date, "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
lapply(df_list, my_summary)
and got this error:
Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type `list`.
i It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
Error in `*tmp*`[[id - n]] :
attempt to select more than one element in integerOneIndex
Thanks for your help!
Matthias
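About the error in the update: in that code the closing parenthesis of `vars()` comes after the function list, so `list(min = min, ...)` is handed to `vars()` as if it were a column selection, which is a plausible source of the "Subscript has the wrong type `list`" message. In `summarise_at()` the columns go inside `vars()` and the named function list is the separate second argument. A minimal sketch of the corrected helper, assuming a dplyr version that still provides `summarise_at()` and tidyr ≥ 1.0.0 (for `pivot_longer()`/`pivot_wider()`), run on a few rows taken from the example data:

```r
library(dplyr)
library(tidyr)

# A few rows from the example data
df <- data.frame(
  site = c("a", "b", "a", "b"),
  date = c("2019-08-01", "2019-08-01", "2019-01-04", "2019-01-04"),
  v1 = c(0, 5, 3, 2), v2 = c(17, 25, 10, 16),
  v3 = c(94, 83, 104, 90), v4 = c(150, 148, 153, 150),
  stringsAsFactors = FALSE
)

my_summary <- . %>%
  group_by(site) %>%
  # vars() holds only the columns; the named function list is a separate argument
  summarise_at(vars(v1, v2, v3, v4),
               list(min = min, max = max, mean = mean, median = median, sd = sd)) %>%
  # column names such as v1_min are split into type = "v1" and stat = "min"
  pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
  pivot_wider(names_from = "stat")

res <- my_summary(df)
```

The season split and the `lapply(df_list, my_summary)` call from the update can stay as they are.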
Answer (score: 0)
If you want things tidy, how about this tidyverse approach to the problem?
library(dplyr)
library(tidyr)
my_summary <-
. %>%
group_by(site) %>%
summarise(across(
c(v1, v2, v3, v4),
list(min = min, max = max, mean = mean, median = median, sd = sd)
)) %>%
pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
pivot_wider(names_from = "stat")
summer <- as.integer(format.Date(df$date, "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
lapply(df_list, my_summary)
Output
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
$full_year
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 0 4 2.62 3 1.69
2 a v2 10 20 14.4 14 3.07
3 a v3 91 106 99.6 102. 5.93
4 a v4 150 155 153. 153 1.49
5 b v1 2 6 4.12 4.5 1.64
6 b v2 16 25 20.9 21.5 3.52
7 b v3 70 90 77 74 6.63
8 b v4 147 152 149. 150. 1.92
9 c v1 5 9 7.38 7.5 1.41
10 c v2 30 40 35 35 3.78
11 c v3 43 65 51.8 48.5 8.84
12 c v4 146 151 148. 147 1.60
13 d v1 8 11 9.62 9.5 1.06
14 d v2 36 44 40.4 40.5 2.67
15 d v3 123 144 138. 140 7.38
16 d v4 160 170 165. 165 3.40
$summer
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 0 3 1.5 1.5 1.73
2 a v2 12 17 14 13.5 2.16
3 a v3 91 102 95 93.5 4.83
4 a v4 150 155 152. 152. 2.08
5 b v1 2 6 4 4 1.83
6 b v2 17 25 22.5 24 3.79
7 b v3 73 83 76 74 4.69
8 b v4 147 149 148. 148. 0.957
9 c v1 6 9 7.75 8 1.5
10 c v2 31 39 34.2 33.5 3.59
11 c v3 43 58 47.8 45 6.90
12 c v4 146 148 147. 146. 0.957
13 d v1 8 11 9.5 9.5 1.29
14 d v2 36 42 39.2 39.5 2.5
15 d v3 123 144 135. 136 8.85
16 d v4 165 170 167 166. 2.45
$winter
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 3 4 3.75 4 0.5
2 a v2 10 20 14.8 14.5 4.11
3 a v3 103 106 104. 104 1.26
4 a v4 152 154 153 153 0.816
5 b v1 2 6 4.25 4.5 1.71
6 b v2 16 22 19.2 19.5 2.75
7 b v3 70 90 78 76 8.83
8 b v4 150 152 151 151 0.816
9 c v1 5 8 7 7.5 1.41
10 c v2 30 40 35.8 36.5 4.35
11 c v3 44 65 55.8 57 9.60
12 c v4 147 151 148. 148. 1.89
13 d v1 9 11 9.75 9.5 0.957
14 d v2 38 44 41.5 42 2.65
15 d v3 134 144 141 143 4.76
16 d v4 160 168 163. 162. 3.40
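The three `summarise()` messages at the top of the output are informational and do not change the results. Assuming dplyr ≥ 1.0.0 (the release that introduced both `across()` and the `.groups` argument), they can be suppressed by setting `.groups = "drop"` explicitly. The sketch below does that and also converts `date` with `as.Date()` before extracting the month, so the same split should also work when `date` is a `Date` or `POSIXct` column (as `read_excel()` may return) rather than character. It runs on a few rows of the example data.

```r
library(dplyr)
library(tidyr)

# A few rows from the example data
df <- data.frame(
  site = c("a", "a", "b", "b"),
  date = c("2019-08-01", "2019-01-04", "2019-08-01", "2019-01-04"),
  v1 = c(0, 3, 5, 2), v2 = c(17, 10, 25, 16),
  v3 = c(94, 104, 83, 90), v4 = c(150, 153, 148, 150),
  stringsAsFactors = FALSE
)

my_summary <- . %>%
  group_by(site) %>%
  summarise(across(c(v1, v2, v3, v4),
                   list(min = min, max = max, mean = mean, median = median, sd = sd)),
            .groups = "drop") %>%  # drop the grouping explicitly: no message
  pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
  pivot_wider(names_from = "stat")

# Month-based season split; as.Date() accepts character, Date and POSIXct input
summer  <- as.integer(format(as.Date(df$date), "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
res <- lapply(df_list, my_summary)
```

With only one row per site in a season, `sd` is `NA`, as expected for a single observation.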