Question

我想在data.frame中增加一个新列，以指示对于每行，数字“ 2”是否出现在某些其他列中。这是适用于小型data.frame的简单版本：

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$mycols.contain.two <- df$mycol.1 ==2 | df$mycol.2 ==2
df

  mycol.1 mycol.2 other.col mycols.contain.two
1       1       5        -2              FALSE
2       2       4        -1               TRUE
3       3       3         0              FALSE
4       4       2         1               TRUE
5       5       1         2              FALSE

现在假设data.frame有50列，我希望新列指示每行中是否以“ mycol”开头的任何列都包含“ 2”，而不必使用“ |”符号49次。我假设使用starts_with()有一个优雅的dplyr答案，但我不知道语法。

Answer 1

您可以这样做：

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$TYPE <- ifelse(rowSums(ifelse(sapply(df, function (x){x == 2}), 1, 0)) > 0 , "TRUE", "FALSE")

# > df
# mycol.1 mycol.2 other.col  TYPE
# 1       1       5        -2 FALSE
# 2       2       4        -1  TRUE
# 3       3       3         0 FALSE
# 4       4       2         1  TRUE
# 5       5       1         2  TRUE

Answer 2

您可以通过建立索引来实现。让我们获取mtcars数据。

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

之后，我们可以索引任何列。假设我们在第8至11栏中很有趣

 mtcars$new <- rowSums(mtcars[,8:11]==2)>0

给予

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb   new
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 FALSE
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 FALSE
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 FALSE
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 FALSE
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 TRUE
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 FALSE
>

Answer 3

您可以使用一个简单的apply()循环：

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$mycols.contain.two <- apply(df, 1, function(x){any(x == 2)})

或者如果您只想检查前三列：

df <- data.frame(mycol.1 = 1:5,  mycol.2= 5:1, other.col = -2:2)
df$mycols.contain.two <- apply(df, 1, function(x){any(x[1:3] == 2)})

如何创建一个新列，指示某些其他列是否包含给定值？

3 个答案: