R: sum variables by partially matching other variable names

Date: 2017-03-28 23:27:36

Tags: r string

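I have several variables that I need to combine into a single variable, based on a text string their names have in common. My sample data is: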

structure(list(And = c(10L, NA, 10L), and = c(20L, 10L, 10L), 
andbc = c(1L, NA, NA), baNdc = c(4L, NA, 5L), ban = c(1L, 
NA, 1L)), .Names = c("And", "and", "andbc", "baNdc", "ban"), class = "data.frame", row.names = c(NA, -3L))

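I want to create a new variable x whose value is the row sum of all the other variables whose names contain the common text string "and", ignoring the case of every letter in that string.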
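As a quick sketch of the matching rule only (the vector name nms is hypothetical, copied by hand from the sample data), grepl() with ignore.case = TRUE tests every name for "and" in any letter case, so no case permutations need to be spelled out:

```r
# Column names from the sample data (nms is a hypothetical helper name)
nms <- c("And", "and", "andbc", "baNdc", "ban")

# TRUE where the name contains "and" in any mix of upper/lower case
matches <- grepl("and", nms, ignore.case = TRUE)

# Keeps And, and, andbc and baNdc; drops ban
nms[matches]
```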

I tried creating the variable by spelling out the case permutations, which is what I would like to avoid:

names1[, 1:5][is.na(names1[, 1:5])] <- 0
names1$x <- sum(names1[which(grepl("And|and|aNd", names(names1)))])

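The result I get for x is the grand total of all values in the variables that meet the text-string criterion, repeated on every row: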

structure(list(And = c(10, 0, 10), and = c(20L, 10L, 10L), andbc = c(1, 0, 0),
    baNdc = c(4, 0, 5), ban = c(1, 0, 1), x = c(70, 70, 70)),
    .Names = c("And", "and", "andbc", "baNdc", "ban", "x"),
    row.names = c(NA, -3L), class = "data.frame")
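The 70 comes from sum(), which collapses every selected cell into one number; assigning that length-1 value to a column recycles it to all rows. rowSums() returns one total per row instead. A minimal comparison, assuming the zero-filled data shown above (df0 and cols are hypothetical names):

```r
# Zero-filled data, equivalent to the asker's table after NA -> 0
df0 <- data.frame(And = c(10, 0, 10), and = c(20, 10, 10),
                  andbc = c(1, 0, 0), baNdc = c(4, 0, 5), ban = c(1, 0, 1))

# Columns whose names contain "and", case-insensitively
cols <- grepl("and", names(df0), ignore.case = TRUE)

sum(df0[cols])      # one grand total over all matching cells: 70
rowSums(df0[cols])  # one total per row: 35, 10, 25
```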

How can I get row sums based on the text-string condition without having to specify upper- and lower-case permutations?

2 Answers:

Answer 0 (score: 2)

The following does the trick:

df <- structure(list(And = c(10L, NA, 10L), and = c(20L, 10L, 10L),
                     andbc = c(1L, NA, NA), baNdc = c(4L, NA, 5L),
                     ban = c(1L, NA, 1L)),
                .Names = c("And", "and", "andbc", "baNdc", "ban"),
                class = "data.frame", row.names = c(NA, -3L))

x <- rowSums(df[, grep("and", tolower(colnames(df)))], na.rm = TRUE)
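A variant sketch of the same idea, offered as an alternative rather than the answerer's code: grepl(..., ignore.case = TRUE) replaces the tolower() call, and drop = FALSE keeps the selection a data frame. Without it, if only one column happened to match, df[, idx] would drop to a plain vector and rowSums() would stop with an error. The name hits is hypothetical.

```r
df <- data.frame(And = c(10L, NA, 10L), and = c(20L, 10L, 10L),
                 andbc = c(1L, NA, NA), baNdc = c(4L, NA, 5L),
                 ban = c(1L, NA, 1L))

# Columns whose names contain "and" in any letter case
hits <- grepl("and", colnames(df), ignore.case = TRUE)

# drop = FALSE keeps a data frame even when only one column matches;
# na.rm = TRUE treats missing cells as absent from the sum
df$x <- rowSums(df[, hits, drop = FALSE], na.rm = TRUE)
df$x  # 35, 10, 25
```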

Answer 1 (score: 1)

colnames(names1) <- tolower(colnames(names1))

Lowercasing the column names first removes the need for the case permutations:

names1$x <- rowSums(names1[which(grepl('and', colnames(names1)))], na.rm = TRUE)
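One side effect worth checking with this approach: tolower() renames the columns permanently, so "And" and "and" both become "and", and the data frame ends up with duplicate column names. A sketch that shows the collision and then matches case-insensitively while leaving the original names intact (lowered and idx are hypothetical names):

```r
names1 <- data.frame(And = c(10L, NA, 10L), and = c(20L, 10L, 10L),
                     andbc = c(1L, NA, NA), baNdc = c(4L, NA, 5L),
                     ban = c(1L, NA, 1L))

# Lowercasing would make columns 1 and 2 share the name "and"
lowered <- tolower(colnames(names1))
anyDuplicated(lowered)  # 2: the second "and" duplicates the first

# Case-insensitive match without renaming any column
idx <- grepl("and", colnames(names1), ignore.case = TRUE)
names1$x <- rowSums(names1[idx], na.rm = TRUE)
```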