Add a column based on the values of another column

Time: 2018-06-07 14:50:36

Tags: r loops

I have a data frame like the one shown below.

df <- data.frame(mnth = c("jan", "feb", "feb", "mar", "mar",
                          "mar", "apr", "apr", "apr", "apr", 
                          "may", "may", "may", "may", "may"),
                 n = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5),
                 value = c(5, 1, 3, 2, 8, 0, 6, 0, 2, 7, 2, 1, 4, 2, 6))

For each value of the n column, I want to add up the corresponding numbers in the value column.

In this case, the answer should be:
16, 12, 6, 9, 6

16 = 5 + 1 + 2 + 6 + 2  # all rows where 'n' = 1
12 = 3 + 8 + 0 + 1      # all rows where 'n' = 2
6  = 0 + 2 + 4          # all rows where 'n' = 3
9  = 7 + 2              # all rows where 'n' = 4
6                       # all rows where 'n' = 5

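How do I write a for loop in R to add up these numbers?

3 Answers:

Answer 0: (score: 0)

I wouldn't use a for loop for this, since it can be done with a one-liner (perhaps a slightly cryptic one, but still):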

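df$N <- c(16, 12, 6, 9, 6)[df$mnth]

Before running it, you need to set the levels of the mnth factor in calendar order, because the vector is indexed by the factor's integer codes (which otherwise follow alphabetical order):

df$mnth <- factor(df$mnth, levels=c("jan", "feb", "mar", "apr", "may"))

Result:

> df
   mnth n value  N
1   jan 1     5 16
2   feb 1     1 12
3   feb 2     3 12
4   mar 1     2  6
5   mar 2     8  6
6   mar 3     0  6
7   apr 1     6  9
8   apr 2     0  9
9   apr 3     2  9
10  apr 4     7  9
11  may 1     2  6
12  may 2     1  6
13  may 3     4  6
14  may 4     2  6
15  may 5     6  6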

The equivalent for loop could be:

for (i in seq_len(nrow(df))) {
  df$N[i] <- switch(as.character(df$mnth[i]), 
                    "apr" = 9,
                    "feb" = 12,
                    "jan" = 16,
                    "mar" = 6,
                    "may" = 6)
}
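The switch-based loop above is essentially a lookup table. A minimal sketch of the same mapping with a named vector follows, assuming the data frame from the question; the name `month_lookup` is made up for this illustration, and the per-month numbers are simply the ones used in this answer.

```r
# Data frame from the question
df <- data.frame(mnth = c("jan", "feb", "feb", "mar", "mar",
                          "mar", "apr", "apr", "apr", "apr",
                          "may", "may", "may", "may", "may"),
                 n = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5),
                 value = c(5, 1, 3, 2, 8, 0, 6, 0, 2, 7, 2, 1, 4, 2, 6))

# Hypothetical lookup table: one number per month, as in the answer above
month_lookup <- c(jan = 16, feb = 12, mar = 6, apr = 9, may = 6)

# as.character() makes the name lookup work whether mnth is stored
# as a factor or as a character vector; unname() drops the month labels
df$N <- unname(month_lookup[as.character(df$mnth)])
```

Because the lookup is by name rather than by position, this version does not depend on the order of the factor levels.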

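Answer 1: (score: 0)

I agree with Sab about using data.table. I think your expected output may contain a typo, so I have included a few different options in the example below: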

library(data.table)

df <- data.frame(mnth = c("jan", "feb", "feb", "mar", "mar",
                      "mar", "apr", "apr", "apr", "apr", 
                      "may", "may", "may", "may", "may"),
             n = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5),
             value = c(5, 1, 3, 2, 8, 0, 6, 0, 2, 7, 2, 1, 4, 2, 6))

setDT(df)  # converts data.frame to data.table

df[,.(sum_n = sum(n),  # adds up the 'n' column
  sum_value = sum(value),  # adds up the 'value' column
  count_row = .N), by=mnth]  # counts the number of rows for each value of 'mnth'

This gives you the following result:

   mnth sum_n sum_value count_row
1:  jan     1         5         1
2:  feb     3         4         2
3:  mar     6        10         3
4:  apr    10        15         4
5:  may    15        15         5

Edit

After the poster's clarification, here is the working code:

df[,.(sum_value = sum(value)), by = .(n)]

This gives the following result:

> df[,.(sum_value = sum(value)), by = .(n)]
   n sum_value
1: 1        16
2: 2        12
3: 3         6
4: 4         9
5: 5         6
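Since the question asked specifically for a for loop, here is a rough base R sketch of the same grouped sum without data.table, assuming the data frame from the question. The names `n_values`, `totals` and `totals_tapply` are made up for this illustration; the `tapply` call is shown only as a vectorised alternative.

```r
# Data frame from the question
df <- data.frame(mnth = c("jan", "feb", "feb", "mar", "mar",
                          "mar", "apr", "apr", "apr", "apr",
                          "may", "may", "may", "may", "may"),
                 n = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5),
                 value = c(5, 1, 3, 2, 8, 0, 6, 0, 2, 7, 2, 1, 4, 2, 6))

# for-loop version: keep one running total per distinct value of n
n_values <- sort(unique(df$n))
totals <- numeric(length(n_values))   # starts as all zeros
names(totals) <- n_values             # names become "1", "2", ...
for (i in seq_len(nrow(df))) {
  key <- as.character(df$n[i])
  totals[key] <- totals[key] + df$value[i]
}

# Vectorised alternative: sum value within each group of n
totals_tapply <- tapply(df$value, df$n, sum)

print(totals)
```

Both `totals` and `totals_tapply` should hold the sums 16, 12, 6, 9, 6 for n = 1 to 5, matching the data.table result.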

Answer 2: (score: 0)

Here is a solution using merge from data.table, which is quite simple:

library(data.table)
dt <- as.data.table(df)

# lookup table: one N per month
dt2 <- data.table(mnth = c('jan', 'feb', 'mar', 'apr', 'may'), 
                  N = c(16, 12, 6, 9, 6))

> merge(dt, dt2, by = 'mnth', all = TRUE)
    mnth n value  N
 1:  apr 1     6  9
 2:  apr 2     0  9
 3:  apr 3     2  9
 4:  apr 4     7  9
 5:  feb 1     1 12
 6:  feb 2     3 12
 7:  jan 1     5 16
 8:  mar 1     2  6
 9:  mar 2     8  6
10:  mar 3     0  6
11:  may 1     2  6
12:  may 2     1  6
13:  may 3     4  6
14:  may 4     2  6
15:  may 5     6  6

If you only want the observation counts and column sums, you can use the by argument of data.table:

> dt[, .(nsum = sum(n), valsum = sum(value), obs = .N), by = mnth]
   mnth nsum valsum obs
1:  jan    1      5   1
2:  feb    3      4   2
3:  mar    6     10   3
4:  apr   10     15   4
5:  may   15     15   5
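If the goal is to attach the per-n sum to every row as a new column (as the title suggests) rather than to get a summary table, data.table's `:=` operator with `by` is one way to do it. This is only a sketch, assuming the data frame from the question; the column name `value_sum_by_n` is made up for this illustration.

```r
library(data.table)

# Data from the question, built directly as a data.table
dt <- data.table(mnth = c("jan", "feb", "feb", "mar", "mar",
                          "mar", "apr", "apr", "apr", "apr",
                          "may", "may", "may", "may", "may"),
                 n = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5),
                 value = c(5, 1, 3, 2, 8, 0, 6, 0, 2, 7, 2, 1, 4, 2, 6))

# := adds the column by reference; with by = n, each row receives
# the sum of value over all rows that share its n
dt[, value_sum_by_n := sum(value), by = n]

print(dt)
```

Unlike the merge approach, this needs no hand-written lookup table, since the sums are computed from the data.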