My current data frame looks like this:
# Create sample data
my_df <- data.frame(seq(1, 100), rep(c("ind_1", "", "", ""), times = 25), rep(c("", "ind_2", "", ""), times = 25), rep(c("", "", "ind_3", ""), times = 25), rep(c("", "", "", "ind_4"), times = 25))
# Rename columns
names(my_df)[names(my_df)=="seq.1..100."] <- "value"
names(my_df)[names(my_df)=="rep.c..ind_1................times...25."] <- "ind_1"
names(my_df)[names(my_df)=="rep.c......ind_2............times...25."] <- "ind_2"
names(my_df)[names(my_df)=="rep.c..........ind_3........times...25."] <- "ind_3"
names(my_df)[names(my_df)=="rep.c..............ind_4....times...25."] <- "ind_4"
# Replace empty elements with NA
my_df[my_df==''] = NA
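What I would like to script is a fairly simple for loop that calculates the sum of the value column for each of the four ind_* columns and prints the result.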
My meager attempt so far has been:
# Create a vector with all individuals
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
# Calculate aggregates for each individual
for (i in individuals){
  ind <- 1
  sum_i <- aggregate(value~ind_1, data = my_df, sum)
  print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
  ind <- ind + 1
}
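As you can see, I am struggling to find the right command for computing the sums one column after another: the current output only ever reports the result for ind_1, because the formula in aggregate is hard-coded to that column. What needs to change in the aggregate command to get the desired result? (I am a beginner, but I thought about using an index to move from one column to the next.)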
Answer 0: (score: 4)
Assuming you want to sum value wherever an ind column matches the corresponding entry of the individuals vector:
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
for (i in 1:(ncol(my_df)-1)){
  print(sum(my_df$value[which(my_df[,individuals[i]] == individuals[i])]))
}
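If you would rather keep aggregate, as in the question, one option is to build the formula from the column name on each iteration. The sketch below is only an illustration: it rebuilds the sample data in a more compact way (with NA written directly instead of empty strings), and it uses reformulate() from the stats package to construct value ~ ind_1, value ~ ind_2, and so on.

```r
# Compact rebuild of the sample data from the question (NA instead of "")
my_df <- data.frame(
  value = seq(1, 100),
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

for (i in individuals) {
  # reformulate() builds the formula value ~ <column named in i>;
  # the formula method of aggregate() drops rows where that column is NA
  sum_i <- aggregate(reformulate(i, response = "value"), data = my_df, FUN = sum)
  print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
}
```

Each ind_* column holds a single non-missing label, so sum_i has exactly one row per iteration.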
Why use print() instead of storing the results in a separate vector?
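As a minimal sketch of that suggestion, the same matching logic can be wrapped in sapply(), which returns a named vector instead of printing. The name sums is chosen here purely for illustration, and the sample data is rebuilt compactly with NA in place of empty strings.

```r
# Compact rebuild of the sample data from the question (NA instead of "")
my_df <- data.frame(
  value = seq(1, 100),
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

# One sum per individual; which() discards the NA comparisons
sums <- sapply(individuals, function(ind) {
  sum(my_df$value[which(my_df[[ind]] == ind)])
})

print(sums)
```

Because sapply() keeps the names of a character input, a single result can then be looked up with, for example, sums["ind_3"].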
Answer 1: (score: 2)
You can also try the tidyverse:
library(tidyverse)
my_df %>%
  gather(key, Inds, -value) %>%
  filter(!is.na(Inds)) %>%
  group_by(key) %>%
  summarise(Sum=sum(value))
# A tibble: 4 x 2
  key     Sum
  <chr> <int>
1 ind_1  1225
2 ind_2  1250
3 ind_3  1275
4 ind_4  1300
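The idea is to use gather to make the data long, filter out the NA rows, then group by key (the former column name, which is identical to Inds once the NAs are gone) and summarise the values.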
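In current versions of tidyr, gather is superseded by pivot_longer. The following is a hedged sketch of the same pipeline, assuming tidyr 1.0.0 or later and dplyr are installed. The names result, key and Inds are kept or chosen for illustration, and the sample data is rebuilt compactly with NA in place of empty strings.

```r
library(dplyr)
library(tidyr)

# Compact rebuild of the sample data from the question (NA instead of "")
my_df <- data.frame(
  value = seq(1, 100),
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)

result <- my_df %>%
  # Reshape every column except value; values_drop_na replaces the filter step
  pivot_longer(-value, names_to = "key", values_to = "Inds",
               values_drop_na = TRUE) %>%
  group_by(key) %>%
  summarise(Sum = sum(value))

print(result)
```

Setting values_drop_na = TRUE removes the NA rows during reshaping, so no separate filter call is needed.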
A solution closer to base R (it still needs the reshape2 package for melt) would be:
library(reshape2)
my_df_long <- melt(my_df, id.vars = "value", value.name = "ID")
aggregate(value ~ ID, my_df_long, sum, na.rm = TRUE)
     ID value
1 ind_1  1225
2 ind_2  1250
3 ind_3  1275
4 ind_4  1300
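If you want to avoid extra packages altogether, the long format can also be assembled by hand before calling aggregate. This is a sketch only: the names long and result are made up for illustration, and the sample data is rebuilt compactly with NA in place of empty strings.

```r
# Compact rebuild of the sample data from the question (NA instead of "")
my_df <- data.frame(
  value = seq(1, 100),
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

# Stack the ind_* columns on top of each other and repeat value to match
long <- data.frame(
  value = rep(my_df$value, times = length(individuals)),
  ID    = unlist(my_df[individuals], use.names = FALSE)
)

# The formula method of aggregate() omits rows with NA in ID by default
result <- aggregate(value ~ ID, data = long, FUN = sum)
print(result)
```

The stacking works because unlist() concatenates the columns in order, which lines up with value repeated once per column.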