Question

I have the following example dataset:

>print(data)

ID   Plant   Infloresence   displaySize   visitationRate
 1     1           1             4             0.25
 2     1           2             4             0.25
 3     1           3             4             0.25
 4     1           4             4             0.25
 5     2           1             2             1.00
 6     2           2             2             1.00
 7     3           1             1             2.00
 8     4           1             5             0.80
 9     4           2             5             0.80
10     4           3             5             0.80
11     4           4             5             0.80
12     4           5             5             0.80
13     5           1             3             0.33
14     5           2             3             0.33
15     5           3             3             0.33

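I am leaving out a lot of unneeded information, but basically this dataset contains a given plant (Plant = 1, 2, 3, 4, 5), the flowers on each plant (Infloresence = 1-4, 1-2, 1, 1-5, 1-3), and the visitation rate (calculated by dividing the number of insect visits (not shown) by the number of flowers (displaySize)). In a separate sheet I also have the average amount of pollen the insects carry, and we combine these averages with the data sheet shown here to calculate the share of pollen (out of 100%) that each insect visitor moves between plants.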

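When we do this, we are usually concerned with the species as a whole and use all 300-700 rows of data at once. What I want to do, however, is run the calculation per plant (so IDs 1-4, 5-6, 7, 8-12, and 13-15). I have code that does the calculation, but I don't know how to loop it so that it runs for plant 1, plant 2, plant 3, plant 4, and so on.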

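I don't know if this is enough information; I can try to be clearer if needed. Below is my code. It has been tested many times, requires no manual calculation, and works well.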

visitData = read.csv("caVisitation.csv", header = TRUE)     # Type the name of your formatted visitation data file between the ""
loadData = read.csv("caLoad.csv", header = TRUE)     # Type the name of your formatted pollen load data file between the ""

# Count visits per pollinator across every column whose name contains 'Visitor'
pollinatorNumbers = table(unlist(visitData[, grep('Visitor', names(visitData))]))
# Count the rows with a visitation rate of zero
zeros = sum(apply(visitData[9], 2, function(x) length(which(x == 0.00000000))))
counts = as.matrix(c(pollinatorNumbers, zeros))
totalCounts = sum(counts[,1])
average = mean(visitData[,9]) # The 9 indicates that the Visits/Infl/20min column is the ninth column from the left of the page.
percentVisits = (counts/totalCounts)
# Drop the last entry (the zero-visit count) before scaling by the average rate
rate = percentVisits[-length(percentVisits),]/average

# Mean pollen load per pollinator
ploadData = as.data.frame(loadData, stringsAsFactors=FALSE)
ploadData$load = as.numeric(ploadData$load)
pollenLoads = (aggregate(load~pollinator, ploadData, FUN=mean, na.action=NULL))

pollenFlow = as.matrix((rate*pollenLoads[,2]))
loadTotal = sum(pollenFlow[,1], na.rm = TRUE)
percentpollenFlow = ((pollenFlow/loadTotal)*100)
colnames(percentpollenFlow) <- c("Percent Pollen Flow")
percentpollenFlow # Returns the result in % out of 100

I should mention that I have tried to get this working, but my application of what I know about loops is garbage.

Answer 1

You can use the data.table package.

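 library(data.table)

 cols <- c("col_1", "col_2")
 # (cols) := assigns to the columns named in cols; by = plant runs the code once per plant
 some_data[, (cols) := lapply(.SD, function(x) {
            # your function code
       }), .SDcols = c("col_3", "col_4"), by = plant]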

It would help if you posted your data along with the expected output, though. The by clause will do the looping.
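
As a rough illustration of how `by` stands in for an explicit loop, the sketch below rebuilds the sample table from the question and runs a function once per plant. The function `summarisePlant` is a hypothetical placeholder, not the pollen-flow calculation from the question; the real per-plant code would go in its place.

```r
library(data.table)

# Sample data copied from the print(data) output in the question.
data <- data.table(
  ID             = 1:15,
  Plant          = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5),
  Infloresence   = c(1:4, 1:2, 1, 1:5, 1:3),
  displaySize    = c(4, 4, 4, 4, 2, 2, 1, 5, 5, 5, 5, 5, 3, 3, 3),
  visitationRate = c(rep(0.25, 4), rep(1.00, 2), 2.00, rep(0.80, 5), rep(0.33, 3))
)

# Hypothetical stand-in for the per-plant calculation: it receives the
# visitationRate values of ONE plant and returns a named list, which
# data.table turns into one row of output columns.
summarisePlant <- function(rate) {
  list(nRows = length(rate), meanRate = mean(rate))
}

# by = Plant splits the table into one group per plant and evaluates the
# expression once for each group, so no for loop is needed.
perPlant <- data[, summarisePlant(visitationRate), by = Plant]
print(perPlant)
```

The result has one row per plant. Anything that can be computed from the rows of a single plant can be wrapped in a function like this and handed to `by` in the same way.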

As a side note, I would use <- rather than = when assigning objects to variable names.
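
One commonly cited reason for this preference, shown here as a small sketch with made-up variable names, is that `=` means something different inside a function call, where it names an argument instead of assigning:

```r
# At the top level both forms assign.
x <- 5
y = 5

# Inside a call, `=` matches the value to mean()'s argument named x;
# the variable x defined above is left untouched.
m1 <- mean(x = c(1, 2, 3))

# `<-` inside a call still assigns: z is created in the calling
# environment as a side effect and then passed on to mean().
m2 <- mean(z <- c(1, 2, 3))
```

Using `<-` for assignment everywhere keeps `=` reserved for naming arguments, which makes the two cases easy to tell apart when reading code.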
