Question

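I have a data frame with many variables, two of which are "year" and "country". I want to sum certain columns row-wise, but only under some conditions: the total should be computed for rows that match specific country/year combinations and be NA for rows that do not. For example: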

A B C  year  country total
1 1 1  2000   IT      3
2 2 2  2001   IT      6
3 3 3  2001   DE      9
4 4 4  2002   DK      NA
5 5 5  2000   FR      NA
6 6 6  2001   DE      18

In Stata, this would look like:

egen variable = rowtotal (A B C) if ///
country_year=="36_04" | country_year=="37_04" | country_year=="96_04" | ///
country_year=="97_04" | country_year=="83_04" | country_year=="83_09" | ///
country_year=="87_09" | country_year=="87_04"

Answer 1

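Using dplyr, with conditions that reproduce the output in the question, and assuming the data frame is named df1 and has no existing column named total: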

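library(dplyr)
df1 %>%
  # keep only the rows that should receive a total
  filter(year < 2002, country %in% c("IT", "DE")) %>%
  # sum A, B and C within each row
  rowwise() %>%
  mutate(total = sum(A, B, C)) %>%
  ungroup() %>%
  # join back to the full data; rows that were filtered out get total = NA
  right_join(df1)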

Result:

     A     B     C  year country total
 <int> <int> <int> <int>   <chr> <int>
     1     1     1  2000      IT     3
     2     2     2  2001      IT     6
     3     3     3  2001      DE     9
     4     4     4  2002      DK    NA
     5     5     5  2000      FR    NA
     6     6     6  2001      DE    18

Answer 2

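Here is an option using data.table. With the logical condition specified in 'i', we sum the corresponding elements of the columns named in .SDcols with `+` (via Reduce) and assign (:=) the output to 'total'. Rows that do not satisfy the condition are left as NA.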

library(data.table)
setDT(df1)[year < 2002 & country %chin% c("IT", "DE"),
        total := Reduce(`+`, .SD),  .SDcols = A:C]
df1
#   A B C year country total
#1: 1 1 1 2000      IT     3
#2: 2 2 2 2001      IT     6
#3: 3 3 3 2001      DE     9
#4: 4 4 4 2002      DK    NA
#5: 5 5 5 2000      FR    NA
#6: 6 6 6 2001      DE    18
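
Both answers use a range condition that happens to reproduce the example, whereas the Stata code lists explicit country_year values. A minimal base R sketch of that explicit-list approach follows. The data frame is rebuilt from the table in the question, and the `wanted` vector and its "country_year" key format are assumptions made for illustration, not values from the original post:

```r
# Example data from the question, without the total column
df1 <- data.frame(
  A = 1:6, B = 1:6, C = 1:6,
  year = c(2000L, 2001L, 2001L, 2002L, 2000L, 2001L),
  country = c("IT", "IT", "DE", "DK", "FR", "DE"),
  stringsAsFactors = FALSE
)

# Hypothetical list of country-year pairs that should receive a total,
# analogous to the country_year values listed in the Stata code
wanted <- c("IT_2000", "IT_2001", "DE_2001")

# Build the same kind of key for every row and flag the matching rows
key <- paste(df1$country, df1$year, sep = "_")
keep <- key %in% wanted

# Start with NA everywhere, then fill in row sums only for the flagged rows
df1$total <- NA_integer_
df1$total[keep] <- as.integer(rowSums(df1[keep, c("A", "B", "C")]))

df1
```

Stata's rowtotal treats missing values as zero; if that behavior is wanted here, passing na.rm = TRUE to rowSums should give a similar result.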

Row sums under multiple conditions in R
