Question

This should be really simple, but I'm struggling to get it to work. I currently have a vector of column names:

columns <- c('product1', 'product2', 'product3', 'support4')

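I now want to use dplyr in a for loop to mutate some of those columns, but I'm struggling to get it to recognize that col holds a column name rather than being a variable of its own.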

for (col in columns) {
  cross.sell.val <- cross.sell.val %>%
    dplyr::mutate(col = ifelse(col == 6, 6, col)) %>%
    dplyr::mutate(col = ifelse(col == 5, 6, col))
}

Can I use %>% in this situation? Thanks.

Answer 1

You should be able to do this without using a for loop at all.

Because you didn't provide any data, I'm going to use the built-in iris dataset. The top of it looks like:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

First, I save the columns to analyze:

columns <- names(iris)[1:4]

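Then, use mutate_at on those columns, with one call per rule. In each rule, . stands for the vector of the column being processed. Your example implies the same rules for every column; if that is not the case, you may need more flexibility.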

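library(dplyr)

# funs() is deprecated since dplyr 0.8.0; a ~ formula does the same job,
# with . still standing for the current column.
mod_iris <-
  iris %>%
  mutate_at(columns, ~ ifelse(. > 5, 6, .)) %>%
  mutate_at(columns, ~ ifelse(. < 1, 1, .))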

The top of mod_iris is now:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          6.0         3.5          1.4           1  setosa
2          4.9         3.0          1.4           1  setosa
3          4.7         3.2          1.3           1  setosa
4          4.6         3.1          1.5           1  setosa
5          5.0         3.6          1.4           1  setosa
6          6.0         3.9          1.7           1  setosa
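
The same pattern can be applied to the rule from the question (turn 5 into 6, leave everything else alone). The sketch below assumes dplyr 1.0 or later and uses a made-up cross.sell.val data frame, since the question did not include one. across() with all_of() is the newer spelling of mutate_at(); the column named other is a hypothetical extra column added only to show that unlisted columns are left alone.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data; the real values are unknown.
cross.sell.val <- data.frame(
  product1 = c(1, 5, 6),
  product2 = c(5, 5, 2),
  product3 = c(6, 3, 4),
  support4 = c(0, 5, 1),
  other    = c(5, 5, 5)   # not listed in `columns`, so it stays untouched
)

columns <- c('product1', 'product2', 'product3', 'support4')

# all_of() selects the columns named in the character vector;
# the lambda is applied to each selected column in turn.
recoded <- cross.sell.val %>%
  mutate(across(all_of(columns), ~ ifelse(.x == 5, 6, .x)))

recoded
```

Only the four listed columns are recoded, and no loop is needed.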

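If you want, you can write a function that makes all of the changes to a column. This also lets you set the cutoffs differently for each column. For example, you may want to set the bottom and top portions of the data equal to a threshold (to rein in outliers for some reason), or perhaps you know that each variable uses a dummy value as a placeholder (a value that differs by column, but is always the most common value). You can easily add any arbitrary rule of interest this way, and it gives you more flexibility than chaining separate rules together (e.g., if a rule uses the mean, the mean itself changes once you have changed some of the values).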
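
The last point, that a statistic such as the mean shifts between chained rules, can be seen on a small vector. This is only an illustrative sketch: the vector and the helper name capToMean are made up for the example, and the rule itself (pull everything to the mean) is deliberately extreme.

```r
x <- c(1, 2, 3, 4, 100)

# Chained rules: the mean is recomputed after the first rule has changed x.
chained <- ifelse(x > mean(x), mean(x), x)                          # mean is 22
chained <- ifelse(chained < mean(chained), mean(chained), chained)  # mean is now 6.4

# Single function: the mean is computed once, before anything is changed.
capToMean <- function(v) {
  m <- mean(v)
  v <- ifelse(v > m, m, v)
  v <- ifelse(v < m, m, v)
  v
}
atOnce <- capToMean(x)

chained  # 6.4 6.4 6.4 6.4 22.0
atOnce   # 22 22 22 22 22
```

The two approaches disagree because the second chained step sees a mean of 6.4 rather than the original 22.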

An example function:

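modColumns <- function(x){
  # Thresholds come from the untouched data, before any value is replaced
  botThresh <- quantile(x, 0.25)
  topThresh <- quantile(x, 0.75)

  # Note: sort() on a table is ascending, so [1] picks the least frequent
  # value; use sort(table(x), decreasing = TRUE) for the most common one
  dummyVal <- as.numeric(names(sort(table(x)))[1])
  dummyReplace <- NA

  # Clamp to the quartiles, then blank out the placeholder value
  x <- ifelse(x < botThresh, botThresh, x)
  x <- ifelse(x > topThresh, topThresh, x)
  x <- ifelse(x == dummyVal, dummyReplace, x)

  return(x)
}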

And in use:

iris %>%
  mutate_at(columns, modColumns) %>%
  head

Returns:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.3          1.6         0.3  setosa
2          5.1         3.0          1.6         0.3  setosa
3          5.1         3.2          1.6         0.3  setosa
4          5.1         3.1          1.6         0.3  setosa
5          5.1         3.3          1.6         0.3  setosa
6          5.4         3.3          1.7         0.4  setosa
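
If a for loop is still preferred, the loop from the question can be made to work as well. Inside mutate(), writing col = ... creates a column literally named col, and a bare col on the right-hand side resolves to the string held in the loop variable rather than to that column's values. A minimal sketch, assuming dplyr 0.7 or later (tidy evaluation) and a made-up data frame, since the question's data is not available:

```r
library(dplyr)

# Hypothetical data; the question did not include any.
cross.sell.val <- data.frame(
  product1 = c(1, 5, 6),
  support4 = c(5, 0, 5)
)
columns <- c('product1', 'support4')

for (col in columns) {
  cross.sell.val <- cross.sell.val %>%
    # !!col := sets the output column name from the string in `col`;
    # .data[[col]] reads the existing column with that name.
    mutate(!!col := ifelse(.data[[col]] == 5, 6, .data[[col]]))
}

cross.sell.val
```

The pipe works fine inside the loop; the only change needed is how the column is referred to.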

Using column names from a vector in a for loop in dplyr
