Question

I want to change the values of several variables based on a condition on another variable.

Something like:

df <- iris
df$index <- row.names(df)

if(df$index == 10){
  df$Species <- "test";
  df$Sepal.Length <- 100
}

So if the index value is 10, I want to change Species to "test" and Sepal.Length to 100.

Instead, I get a warning:

Warning message:
In if (df$index == 10) { :
  the condition has length > 1 and only the first element will be used

The variables stay unchanged.

Answer 1

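Currently, each of your expressions has operands of different lengths on the two sides of the equality operator == or the assignment operator <-. Specifically: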

Here, if(df$index == 10) compares every value of the vector df$index with the single value 10, which is TRUE only for the tenth element: [FALSE, FALSE, FALSE, ..., TRUE, FALSE, FALSE, FALSE ...]. You can check this with print(df$index == 10).

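Hence the warning: only the first value, FALSE, is used. Because the if condition is FALSE, none of the values are updated.
Here, df$Species <- "test" overwrites all values of df$Species (that is, every row) with the single value "test". But this is skipped because the if condition is FALSE.
Here, df$Sepal.Length <- 100 overwrites all values of df$Sepal.Length (that is, every row) with the single value 100. But this is also skipped because the if condition is FALSE.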
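To see the length mismatch concretely, the base R sketch below rebuilds the same df and inspects the comparison that the if statement receives. Note that df$index is a character column (row.names() returns strings), so 10 is coerced to "10" before comparing. Also, in R 4.2.0 and later, an if condition of length greater than one is an error rather than the warning shown in the question.

```r
df <- iris
df$index <- row.names(df)

cond <- df$index == 10   # one TRUE/FALSE per row, not a single value
length(cond)             # 150: one element per row of iris
sum(cond)                # 1: only row "10" matches
which(cond)              # 10
cond[1]                  # FALSE: the only element the old if() looked at
```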

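Also, it seems you mean to update the values of a single row by index. You don't need any if logic, or a new index column built from row.names, for that; simply index the vectors and re-assign the single values accordingly: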

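df$Species <- as.character(df$Species)  # Species is a factor and "test" is not one of its levels, so convert first
df$Species[10] <- "test"
df$Sepal.Length[10] <- 100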
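The answer above hard-codes position 10. If the rows should instead be chosen by a condition on another variable, as the question asks, the same indexed assignment works with a logical vector in place of the number. The following is a base R sketch; iris$Species is a factor, so it is converted to character first (assigning "test", which is not an existing level, would otherwise produce NA with a warning). The helper update_where() is hypothetical, written here for illustration, and not part of any package.

```r
df <- iris
df$index <- row.names(df)

# Species is a factor; convert so that a new value such as "test" is accepted
df$Species <- as.character(df$Species)

rows <- df$index == 10          # logical vector, TRUE only for the matching row
df$Species[rows] <- "test"
df$Sepal.Length[rows] <- 100

# Hypothetical helper: set several columns at once for the rows where cond is TRUE
update_where <- function(data, cond, values) {
  for (col in names(values)) {
    # convert factors so values outside the existing levels are not turned into NA
    if (is.factor(data[[col]])) data[[col]] <- as.character(data[[col]])
    data[[col]][cond] <- values[[col]]
  }
  data
}

# Example: any condition on another column works, not just a row position
df2 <- update_where(iris, iris$Sepal.Width > 4,
                    list(Species = "test", Sepal.Length = 100))
```

Passing the new values as a named list keeps the "change several variables under one condition" logic in a single call, at the cost of a small loop over columns.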

Answer 2

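The if statement you are using looks like something that would work inside a for loop, one row at a time. df$index == 10 returns a vector, so the warning means the if statement only uses the first element of that vector. The solution below should work. subset holds the rows for which the filter is true, and that data frame is then modified. Those rows are then removed from df and the modified subset is appended to the bottom of the data frame. This ensures that all observations remain in the dataset after the change, but it does not keep the observations in their original order.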

library(tidyverse)
df <- iris
df$index <- row.names(df)


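# rows that meet the condition, modified as a separate data frame
subset <- df[df$index == 10, ]
subset$Species <- "test"
subset$Sepal.Length <- 100

# drop those rows from df, then append the modified copies at the bottom
df <- df[df$index != 10, ] %>%
  rbind(subset)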
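As the answer notes, the modified row ends up at the bottom. If the original order matters, one option (an addition, not part of the answer) is to sort by the numeric value of index afterwards. This base R sketch uses rbind() directly instead of the pipe so it runs without the tidyverse, and converts Species to character up front so both pieces being bound have the same column type.

```r
df <- iris
df$index <- row.names(df)
df$Species <- as.character(df$Species)   # keep column types consistent for rbind()

subset_rows <- df[df$index == 10, ]
subset_rows$Species <- "test"
subset_rows$Sepal.Length <- 100

df <- rbind(df[df$index != 10, ], subset_rows)   # row 10 is now the last row
df <- df[order(as.numeric(df$index)), ]          # restore the original row order
```

Sorting on as.numeric(df$index) rather than on the character column avoids the lexical order "1", "10", "100", "101", ...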

Answer 3

I think this answer may be more flexible for you. It uses the tidyverse, which you can learn more about here: https://r4ds.had.co.nz/introduction.html

library(tidyverse)
# specify the condition once if you want to use it multiple times
y <- df$index == 10

df <- df %>% # this is a pipe: it passes df into the next function, mutate()
  # mutate() modifies (or creates) columns in df
  mutate(
    # case_when() can handle many conditions, though we only have one here
    Species = case_when(
      y ~ "test",
      # TRUE is the fallback: rows that matched no earlier condition keep their original value
      TRUE ~ as.character(Species)
    ),
    # each variable is treated separately
    Sepal.Length = case_when(
      y ~ 100,
      TRUE ~ as.double(Sepal.Length)
    )
  )
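For comparison, the same two-branch logic as the case_when() calls above can be written in base R with ifelse(), which needs no packages. This sketch recomputes the condition y the same way; as in the tidyverse version, Species ends up as a character column.

```r
df <- iris
df$index <- row.names(df)

y <- df$index == 10   # condition, reused for every column that should change

# ifelse(test, yes, no): take the new value where y is TRUE, else keep the old one
df$Species <- ifelse(y, "test", as.character(df$Species))
df$Sepal.Length <- ifelse(y, 100, df$Sepal.Length)
```

With more than two outcomes per column, case_when() stays more readable than nested ifelse() calls, which is the flexibility the answer refers to.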

Performing multiple operations in an IF statement
