How to dynamically calculate the sum of each element of a nested list in R?

Time: 2018-11-22 16:11:12

Tags: r list dataframe for-loop split

I have this data frame:

split.test.input <- data.frame(matrix(ncol=7,nrow=10,
                        c(rep("a",4),rep("b",4),rep("c",2),1910:1913,1902:1905,1925:1926,
                          rep("year",4),rep("month",3),rep("year",3),
                        rep("ITA",4),rep("HVR",2),rep("ITA",2),rep("ESP",2),
                      rep("GSA 17",5),rep("GSA 1",2),rep("GSA 12",3),
                      rep("gear 1",4),rep("gear 2",6),75,45,230,89,45,78,96,100,125,200)))

colnames(split.test.input) <-  c("species", "year", "Time.unit","country","GSA","Gear","Quantity")

I split it by several variables:

library(plyr)  # provides dlply() and the .() quoting helper
split.res <- dlply(split.test.input,.(species),
      dlply,.(Time.unit),
      dlply,.(country),
      dlply,.(GSA),
      dlply,.(Gear))

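Now I would like to compute some statistics (in this case the sum) on the Quantity of each element of the list. For example, I extract the first list (a list of lists of lists, and so on):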

df.fromSplit <- data.frame(split.res[["a"]][["year"]][["ITA"]][["GSA 17"]][["gear 1"]][["Quantity"]])     


colnames(df.fromSplit) <-  "a,year,ITA,GSA 17,gear.1" #the name of my variables for the first list
     df.fromSplit
           a,year,ITA,GSA 17,gear.1
        1                    75
        2                    45
        3                    230
        4                    89

I want to compute the sum of this column:

sum(as.numeric(levels(df.fromSplit[,1])[df.fromSplit[,1]] ))     
   439

But this is not elegant...

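IMPORTANT

I want to dynamically compute the sum of Quantity for each element of my list. The result could be (more or less) a data frame, or several data frames (one per list), like this: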

    combination             sum
a,year,ITA,GSA 17,gear.1    439
b,month,HVR,GSA.1,gear.2    78
[...]
and so on for each combination of list

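I think a for loop could extract each element of the list and compute the sum of the quantities for each one, but I don't know how to extract each list based on the variables inside a for loop (I have very little experience with lists).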

3 Answers:

Answer 0: (score: 1)

It is actually hard to imagine a purpose that requires an object as complex as split.res. What you ask for can be done much more simply.

First, let's convert Quantity to a numeric type (it is currently a factor).

split.test.input$Quantity <- as.numeric(as.character(split.test.input$Quantity))
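The detour through as.character() matters because calling as.numeric() directly on a factor returns the internal level codes rather than the numbers spelled out by the labels. A minimal sketch on a toy factor (the object names f, codes, values and values2 are made up for illustration; the four values are the first four quantities from the question):

```r
# Toy factor whose labels are numbers stored as text
f <- factor(c("75", "45", "230", "89"))

codes   <- as.numeric(f)                # internal level codes, NOT the values
values  <- as.numeric(as.character(f))  # the numbers the labels spell out
values2 <- as.numeric(levels(f))[f]     # equivalent idiom, converts each level once

sum(values)
```

Both idioms give the same numbers. The levels(f)[f] form is the one used in the question, just written more compactly.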

Then simply

tapply(split.test.input$Quantity, apply(split.test.input[c(1, 3:6)], 1, paste0, collapse = ", "), sum)
#  a, year, ITA, GSA 17, gear 1  b, month, HVR, GSA 1, gear 2 
#                           439                            78 
# b, month, HVR, GSA 17, gear 2  b, month, ITA, GSA 1, gear 2 
#                            45                            96 
#  b, year, ITA, GSA 12, gear 2  c, year, ESP, GSA 12, gear 2 
#                           100                           325 

(groups <- apply(split.test.input[c(1, 3:6)], 1, paste0, collapse = ", "))
#  [1] "a, year, ITA, GSA 17, gear 1"  "a, year, ITA, GSA 17, gear 1" 
#  [3] "a, year, ITA, GSA 17, gear 1"  "a, year, ITA, GSA 17, gear 1" 
#  [5] "b, month, HVR, GSA 17, gear 2" "b, month, HVR, GSA 1, gear 2" 
#  [7] "b, month, ITA, GSA 1, gear 2"  "b, year, ITA, GSA 12, gear 2" 
#  [9] "c, year, ESP, GSA 12, gear 2"  "c, year, ESP, GSA 12, gear 2" 
tapply(split.test.input$Quantity, groups, sum)

Also, since you are already using dlply, you might be interested in something like
ddply(split.test.input, .(species, Time.unit, country, GSA, Gear), summarise, Sum = sum(Quantity))
#   species Time.unit country    GSA   Gear Sum
# 1       a      year     ITA GSA 17 gear 1 439
# 2       b     month     HVR  GSA 1 gear 2  78
# 3       b     month     HVR GSA 17 gear 2  45
# 4       b     month     ITA  GSA 1 gear 2  96
# 5       b      year     ITA GSA 12 gear 2 100
# 6       c      year     ESP GSA 12 gear 2 325
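If the nested list already exists and has to be processed as it is, a recursive walk is one option. The following is only a sketch under assumptions: sum_leaves is a hypothetical helper (not part of plyr or base R), every leaf of the list is assumed to be a data frame with a Quantity column, and the toy nested list is built with base split() rather than dlply.

```r
# Hypothetical helper: walk a nested list whose leaves are data frames and
# return one sum per path through the list, named by that path.
sum_leaves <- function(x, column = "Quantity", path = character(0)) {
  if (is.data.frame(x)) {  # a leaf: sum its column
    total <- sum(as.numeric(as.character(x[[column]])))
    return(setNames(total, paste(path, collapse = ",")))
  }
  # an inner list: recurse into each named element, extending the path
  out <- lapply(names(x), function(nm) sum_leaves(x[[nm]], column, c(path, nm)))
  unlist(out)
}

# Toy data (not the question's full data set), nested two levels deep
df <- data.frame(species  = c("a", "a", "b"),
                 Gear     = c("gear 1", "gear 1", "gear 2"),
                 Quantity = c("75", "45", "78"),
                 stringsAsFactors = FALSE)
nested <- lapply(split(df, df$species), function(d) split(d, d$Gear))

sums   <- sum_leaves(nested)
result <- data.frame(combination = names(sums), sum = unname(sums),
                     row.names = NULL)
result
```

The same function should in principle work on split.res, since dlply also returns named lists with data frames at the bottom, but that has not been checked here.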

Answer 1: (score: 1)

Consider aggregate with multiple grouping columns:

split.test.input$Quantity <- as.numeric(as.character(split.test.input$Quantity))

agg_df <- aggregate(Quantity ~ species + Time.unit + country + GSA + Gear,
                    data=split.test.input, FUN=sum)

agg_df
#   species Time.unit country    GSA   Gear Quantity
# 1       a      year     ITA GSA 17 gear 1      439
# 2       b     month     HVR  GSA 1 gear 2       78
# 3       b     month     ITA  GSA 1 gear 2       96
# 4       c      year     ESP GSA 12 gear 2      325
# 5       b      year     ITA GSA 12 gear 2      100
# 6       b     month     HVR GSA 17 gear 2       45
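To get the two-column combination / sum layout asked for in the question, the grouping columns of the aggregated result can be pasted together row by row. A minimal sketch on toy data with only two grouping columns (df, group_cols, agg and result are illustrative names, not objects from the question):

```r
# Toy data with two grouping columns and a numeric Quantity
df <- data.frame(
  species  = c("a", "a", "b", "b"),
  country  = c("ITA", "ITA", "HVR", "ITA"),
  Quantity = c(75, 45, 78, 96)
)
group_cols <- c("species", "country")

# One row per observed combination of the grouping columns
agg <- aggregate(Quantity ~ species + country, data = df, FUN = sum)

# Paste the grouping columns row-wise into a single label
result <- data.frame(
  combination = do.call(paste, c(agg[group_cols], sep = ",")),
  sum         = agg$Quantity
)
result
```

do.call(paste, ...) passes each grouping column as a separate argument to paste, so the labels are built in one vectorised call instead of a row-by-row apply.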

Should you need a list, run by (an object-oriented wrapper to tapply), using paste(..., collapse=" ") to build the combination column:

df_list <- by(split.test.input, split.test.input[c("species", "Time.unit", "country", "GSA", "Gear")],
              function(sub) unique(transform(sub,
                                             combination = paste(unique(sub[c("species", "Time.unit", "country", "GSA", "Gear")]), collapse=" "),
                                             sum = sum(sub$Quantity))[c("combination", "sum")])
)
df_list <- Filter(NROW, df_list)
df_list

# [[1]]
#                combination sum
# 1 a year ITA GSA 17 gear 1 439

# [[2]]
#                combination sum
# 6 b month HVR GSA 1 gear 2  78

# [[3]]
#                combination sum
# 7 b month ITA GSA 1 gear 2  96

# [[4]]
#                combination sum
# 9 c year ESP GSA 12 gear 2 325

# [[5]]
#                combination sum
# 8 b year ITA GSA 12 gear 2 100

# [[6]]
#                 combination sum
# 5 b month HVR GSA 17 gear 2  45
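If a single data frame is wanted after all, a list of one-row data frames like the one above can be stacked with do.call(rbind, ...). A small sketch, using a hand-built list (pieces is an illustrative name) that holds two of the rows shown above:

```r
# Hand-built stand-in for a list of one-row data frames
pieces <- list(
  data.frame(combination = "a year ITA GSA 17 gear 1", sum = 439),
  data.frame(combination = "b month HVR GSA 1 gear 2", sum = 78)
)

# Stack all list elements into one data frame
combined <- do.call(rbind, pieces)
combined
```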

Answer 2: (score: 0)

We can use the tidyverse:

library(tidyverse)
split.test.input %>%
    group_by_at(vars(names(.)[c(1, 3:6)])) %>% 
    summarise(Quantity = sum(parse_number(Quantity)))
# A tibble: 6 x 6
# Groups:   species, Time.unit, country, GSA [?]
#  species Time.unit country GSA    Gear   Quantity
#  <fct>   <fct>     <fct>   <fct>  <fct>     <dbl>
#1 a       year      ITA     GSA 17 gear 1      439
#2 b       month     HVR     GSA 1  gear 2       78
#3 b       month     HVR     GSA 17 gear 2       45
#4 b       month     ITA     GSA 1  gear 2       96
#5 b       year      ITA     GSA 12 gear 2      100
#6 c       year      ESP     GSA 12 gear 2      325