I have this data frame:
split.test.input <- data.frame(matrix(ncol=7,nrow=10,
c(rep("a",4),rep("b",4),rep("c",2),1910:1913,1902:1905,1925:1926,
rep("year",4),rep("month",3),rep("year",3),
rep("ITA",4),rep("HVR",2),rep("ITA",2),rep("ESP",2),
rep("GSA 17",5),rep("GSA 1",2),rep("GSA 12",3),
rep("gear 1",4),rep("gear 2",6),75,45,230,89,45,78,96,100,125,200)))
colnames(split.test.input) <- c("species", "year", "Time.unit","country","GSA","Gear","Quantity")
I split it by several variables:
split.res <- dlply(split.test.input,.(species),
dlply,.(Time.unit),
dlply,.(country),
dlply,.(GSA),
dlply,.(Gear))
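Now I want to compute a statistic (the sum, in this case) of the Quantity for each element of the list. For example, I extract the first list (a list of lists of lists, and so on):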
df.fromSplit <- data.frame(split.res[["a"]][["year"]][["ITA"]][["GSA 17"]][["gear 1"]][["Quantity"]])
colnames(df.fromSplit) <- "a,year,ITA,GSA 17,gear.1" #the name of my variables for the first list
df.fromSplit
a,year,ITA,GSA 17,gear.1
1 75
2 45
3 230
4 89
I want to compute the sum of this column:
sum(as.numeric(levels(df.fromSplit[,1])[df.fromSplit[,1]] ))
439
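But this is not elegant...
IMPORTANT
I want to compute the sum of the quantity dynamically for every element of my list. The result could be (more or less) a data frame, or several data frames (one per list), like this: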
combination sum
a,year,ITA,GSA 17,gear.1 439
b,month,HVR,GSA.1,gear.2 78
[...]
and so on for each combination of list
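I think a for loop could extract each element of the list and compute the sum of the quantities for each one, but I don't know how to extract each list based on the variables inside a for loop (I have very little experience with lists).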
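Answer 0: (score: 1)
It is honestly hard to imagine a purpose that requires an object as complex as split.res. What you are asking for can be done much more simply.
First, let's convert Quantity to a numeric type (it is currently a factor).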
split.test.input$Quantity <- as.numeric(as.character(split.test.input$Quantity))
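A brief illustration of why the intermediate as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes, not the printed values. The vector q below is made up for the demonstration.

```r
# Hypothetical factor whose labels are numbers (as in the Quantity column)
q <- factor(c("75", "45", "230", "89"))

# Direct conversion returns the integer level codes, not the values
codes <- as.numeric(q)

# Converting to character first recovers the actual numbers
values <- as.numeric(as.character(q))

sum(values)  # 439
```

The levels are sorted as strings ("230", "45", "75", "89"), which is why the codes bear no relation to the numeric values.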
Then simply
tapply(split.test.input$Quantity, apply(split.test.input[c(1, 3:6)], 1, paste0, collapse = ", "), sum)
# a, year, ITA, GSA 17, gear 1 b, month, HVR, GSA 1, gear 2
# 439 78
# b, month, HVR, GSA 17, gear 2 b, month, ITA, GSA 1, gear 2
# 45 96
# b, year, ITA, GSA 12, gear 2 c, year, ESP, GSA 12, gear 2
# 100 325
Or
(groups <- apply(split.test.input[c(1, 3:6)], 1, paste0, collapse = ", "))
# [1] "a, year, ITA, GSA 17, gear 1" "a, year, ITA, GSA 17, gear 1"
# [3] "a, year, ITA, GSA 17, gear 1" "a, year, ITA, GSA 17, gear 1"
# [5] "b, month, HVR, GSA 17, gear 2" "b, month, HVR, GSA 1, gear 2"
# [7] "b, month, ITA, GSA 1, gear 2" "b, year, ITA, GSA 12, gear 2"
# [9] "c, year, ESP, GSA 12, gear 2" "c, year, ESP, GSA 12, gear 2"
tapply(split.test.input$Quantity, groups, sum)
Also, since you are already using dlply, you might be interested in something like
ddply(split.test.input, .(species, Time.unit, country, GSA, Gear), summarise, Sum = sum(Quantity))
# species Time.unit country GSA Gear Sum
# 1 a year ITA GSA 17 gear 1 439
# 2 b month HVR GSA 1 gear 2 78
# 3 b month HVR GSA 17 gear 2 45
# 4 b month ITA GSA 1 gear 2 96
# 5 b year ITA GSA 12 gear 2 100
# 6 c year ESP GSA 12 gear 2 325
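If the nested list has already been built and has to be kept, one option is a small recursive helper that walks it and sums Quantity at each leaf. The sketch below uses base R only and made-up data. nest_by and sum_leaves are hypothetical helper names, and nest_by only mimics the nested dlply call by using split.

```r
# Small made-up data set with column names borrowed from the question
df <- data.frame(
  species   = c("a", "a", "b", "b"),
  Time.unit = c("year", "year", "month", "year"),
  Quantity  = c(75, 45, 78, 100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split recursively by each variable in turn
nest_by <- function(d, vars) {
  if (length(vars) == 0) return(d)
  lapply(split(d, d[[vars[1]]]), nest_by, vars = vars[-1])
}

# Hypothetical helper: walk the nested list, summing Quantity at the leaves
# and recording the names visited on the way down as the combination label
sum_leaves <- function(x, path = character(0)) {
  if (is.data.frame(x)) {
    return(data.frame(combination = paste(path, collapse = ","),
                      sum = sum(x$Quantity),
                      stringsAsFactors = FALSE))
  }
  parts <- lapply(names(x), function(nm) sum_leaves(x[[nm]], c(path, nm)))
  do.call(rbind, parts)
}

nested <- nest_by(df, c("species", "Time.unit"))
res <- sum_leaves(nested)
res
```

The walker assumes every leaf is a data frame with a numeric Quantity column and that every level of the list is named. It has not been checked against split.res itself.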
Answer 1: (score: 1)
Consider aggregating over multiple columns:
split.test.input$Quantity <- as.numeric(as.character(split.test.input$Quantity))
agg_df <- aggregate(Quantity ~ species + Time.unit + country + GSA + Gear,
data=split.test.input, FUN=sum)
agg_df
# species Time.unit country GSA Gear Quantity
# 1 a year ITA GSA 17 gear 1 439
# 2 b month HVR GSA 1 gear 2 78
# 3 b month ITA GSA 1 gear 2 96
# 4 c year ESP GSA 12 gear 2 325
# 5 b year ITA GSA 12 gear 2 100
# 6 b month HVR GSA 17 gear 2 45
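To get closer to the two-column combination/sum layout shown in the question, the grouping columns of the aggregated result can be pasted together row by row. This is a minimal sketch on a made-up subset of the data. The names d, key_cols and out are illustrative choices.

```r
# Made-up subset using some of the question's column names
d <- data.frame(
  species  = c("a", "a", "b", "c", "c"),
  country  = c("ITA", "ITA", "HVR", "ESP", "ESP"),
  Gear     = c("gear 1", "gear 1", "gear 2", "gear 2", "gear 2"),
  Quantity = c(75, 45, 78, 125, 200),
  stringsAsFactors = FALSE
)

key_cols <- c("species", "country", "Gear")

# Sum Quantity within each combination of the grouping columns
agg <- aggregate(Quantity ~ species + country + Gear, data = d, FUN = sum)

# Collapse the grouping columns row-wise into a single label
out <- data.frame(
  combination = do.call(paste, c(agg[key_cols], sep = ",")),
  sum = agg$Quantity,
  stringsAsFactors = FALSE
)
out
```

do.call(paste, ...) pastes across columns for each row, which avoids an explicit apply over rows.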
If a list is needed, run by (the object-oriented wrapper to tapply), using paste(..., collapse=" ") to build the combination column:
df_list <- by(split.test.input, split.test.input[c("species", "Time.unit", "country", "GSA", "Gear")],
function(sub) unique(transform(sub,
combination = paste(unique(sub[c("species", "Time.unit", "country", "GSA", "Gear")]), collapse=" "),
sum = sum(sub$Quantity))[c("combination", "sum")])
)
df_list <- Filter(NROW, df_list)
df_list
# [[1]]
# combination sum
# 1 a year ITA GSA 17 gear 1 439
# [[2]]
# combination sum
# 6 b month HVR GSA 1 gear 2 78
# [[3]]
# combination sum
# 7 b month ITA GSA 1 gear 2 96
# [[4]]
# combination sum
# 9 c year ESP GSA 12 gear 2 325
# [[5]]
# combination sum
# 8 b year ITA GSA 12 gear 2 100
# [[6]]
# combination sum
# 5 b month HVR GSA 17 gear 2 45
Answer 2: (score: 0)
We can use the tidyverse:
library(tidyverse)
split.test.input %>%
group_by_at(vars(names(.)[c(1, 3:6)])) %>%
summarise(Quantity = sum(parse_number(Quantity)))
# A tibble: 6 x 6
# Groups: species, Time.unit, country, GSA [?]
# species Time.unit country GSA Gear Quantity
# <fct> <fct> <fct> <fct> <fct> <dbl>
#1 a year ITA GSA 17 gear 1 439
#2 b month HVR GSA 1 gear 2 78
#3 b month HVR GSA 17 gear 2 45
#4 b month ITA GSA 1 gear 2 96
#5 b year ITA GSA 12 gear 2 100
#6 c year ESP GSA 12 gear 2 325
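In dplyr 1.0.0 and later, group_by_at() is superseded by across(), and parse_number() is documented as taking a character vector rather than a factor. Below is a sketch of the same summary written that way, on made-up data with Quantity already numeric, assuming dplyr >= 1.0.0 is installed. The names d and group_cols are illustrative.

```r
library(dplyr)

# Made-up subset using some of the question's column names
d <- data.frame(
  species  = c("a", "a", "b", "c", "c"),
  country  = c("ITA", "ITA", "HVR", "ESP", "ESP"),
  Gear     = c("gear 1", "gear 1", "gear 2", "gear 2", "gear 2"),
  Quantity = c(75, 45, 78, 125, 200),
  stringsAsFactors = FALSE
)

group_cols <- c("species", "country", "Gear")

# Group by the named columns, then sum Quantity and drop the grouping
res <- d %>%
  group_by(across(all_of(group_cols))) %>%
  summarise(Quantity = sum(Quantity), .groups = "drop")
res
```

Passing the column names as a character vector keeps the grouping dynamic without relying on column positions.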