Question

My dataset has the following structure:

id amount zipcode cat1 cat1_times cat2 cat2_times
1  1000   1001      0       0        1      7
2  2000   1001      0       0        1      7
3  2300   1002      1       6        1      5
4  1500   1002      1       6        1      5
5  2700   1003      1       3        1      5
6  3400   1003      1       3        1      5

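cat1 is a binary variable that equals 1 if the zipcode contains category 1. cat1_times is the number of category 1 buildings in that zipcode. I want to compute the total number of buildings for each row (cat1_times + cat2_times):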

id amount zipcode cat1 cat1_times cat2 cat2_times total_times
1  1000   1001      0       0        1      7          7
2  2000   1001      0       0        1      7          7
3  2300   1002      1       6        1      5          11          
4  1500   1002      1       6        1      5          11
5  2700   1003      1       3        1      5          8
6  3400   1003      1       3        1      5          8

I tried using sum(cat1_times, cat2_times), but I get the same result in every row.
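A likely reason for the identical values: sum() collapses all of its arguments into a single number, and that one number is then recycled into every row. The sketch below rebuilds the example data (the name df1 is an assumption, borrowed from the answers) and contrasts sum() with the vectorised `+` operator.

```r
# Example data from the question; df1 is an assumed name.
df1 <- data.frame(
  id         = 1:6,
  amount     = c(1000, 2000, 2300, 1500, 2700, 3400),
  zipcode    = c(1001, 1001, 1002, 1002, 1003, 1003),
  cat1       = c(0, 0, 1, 1, 1, 1),
  cat1_times = c(0, 0, 6, 6, 3, 3),
  cat2       = c(1, 1, 1, 1, 1, 1),
  cat2_times = c(7, 7, 5, 5, 5, 5)
)

# sum() adds every element of both columns into one scalar.
grand_total <- sum(df1$cat1_times, df1$cat2_times)

# `+` works element by element, so each row gets its own total.
df1$total_times <- df1$cat1_times + df1$cat2_times
```

With only two columns, `+` is enough; rowSums() becomes more convenient once the number of columns grows.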

Answer 1

Use rowSums together with str_detect from stringr.
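This answer names the approach without showing code. A minimal sketch of what it might look like, assuming the data frame is called df1 and that the columns to add all end in "_times" (both are assumptions, not taken from the answer):

```r
library(stringr)

# Example data from the question; df1 is an assumed name.
df1 <- data.frame(
  id         = 1:6,
  amount     = c(1000, 2000, 2300, 1500, 2700, 3400),
  zipcode    = c(1001, 1001, 1002, 1002, 1003, 1003),
  cat1       = c(0, 0, 1, 1, 1, 1),
  cat1_times = c(0, 0, 6, 6, 3, 3),
  cat2       = c(1, 1, 1, 1, 1, 1),
  cat2_times = c(7, 7, 5, 5, 5, 5)
)

# Logical vector marking the columns whose names end in "_times".
times_cols <- str_detect(names(df1), "_times$")

# rowSums() adds the selected columns across each row.
df1$total_times <- rowSums(df1[, times_cols])
```

Anchoring the pattern with `$` keeps it from matching a column that merely contains "times" somewhere in its name.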

Answer 2

Alternatively:

library(dplyr)

df1 %>% select(matches("times")) %>% transmute(total_times=rowSums(.)) %>% bind_cols(df1,.)

#  id amount zipcode cat1 cat1_times cat2 cat2_times total_times
#1  1   1000    1001    0          0    1          7           7
#2  2   2000    1001    0          0    1          7           7
#3  3   2300    1002    1          6    1          5          11
#4  4   1500    1002    1          6    1          5          11
#5  5   2700    1003    1          3    1          5           8
#6  6   3400    1003    1          3    1          5           8

Answer 3

Or, if you have many columns:

numberOfCategories=2
rowSums(df[,paste0('cat',1:numberOfCategories,'_times')])

Answer 4

Using base R:

df1$total_times <- Reduce(`+`, df1[grep('cat\\d+_times', names(df1))])
df1$total_times
#[1]  7  7 11 11  8  8

Sum multiple variables
