Question

This is my first question post. Please forgive any formatting issues.

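What I'm trying to do is conditionally replace factor levels in the columns of a data frame. The reason is the Unicode difference between the right single quotation mark (U+2019) and the apostrophe (U+0027).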
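To see the difference concretely, this small sketch (with made-up strings) compares the two characters by code point using base R's `utf8ToInt()`:

```r
# The right single quotation mark (U+2019) and the ASCII apostrophe (U+0027)
# look alike but are different characters, so string comparison fails.
curly    <- "I definitely wouldn\u2019t"
straight <- "I definitely wouldn't"

curly == straight                        # FALSE: the strings differ in one character
utf8ToInt("\u2019")                      # 8217, i.e. U+2019
utf8ToInt("'")                           # 39, i.e. U+0027
sprintf("U+%04X", utf8ToInt("\u2019"))   # "U+2019"
```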

All the columns that need replacing start with "INN8", so I'm using

grep("INN8", colnames(demoDf)) -> apostropheFixIndices
for(i in apostropheFixIndices) {
    levels(demoDfFinal[[i]]) <- c(levels(demoDf[[i]]), "I definitely wouldn't")
    # (insert code here)
}

to get the indices of the columns on which to perform the conditional replacement.
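Note that `grep("INN8", ...)` matches names that contain "INN8" anywhere, not only at the start. A small sketch with invented column names shows the difference between the plain pattern, an anchored `^INN8` pattern, and base R's `startsWith()`:

```r
# Hypothetical data frame; the column names are invented for illustration.
demoDf <- data.frame(
  ID    = 1:3,
  INN8a = c("a", "b", "c"),
  XINN8 = c("d", "e", "f"),
  INN8b = c("g", "h", "i")
)

# grep() returns the positions of every name that *contains* "INN8".
grep("INN8", colnames(demoDf))                # 2 3 4
# Anchoring the pattern with ^ keeps only names that *start with* "INN8".
grep("^INN8", colnames(demoDf))               # 2 4
# startsWith() performs the same prefix test and returns a logical vector.
which(startsWith(colnames(demoDf), "INN8"))   # 2 4
```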

I've looked at countless questions about naming variables dynamically: naming variables on the fly

as well as how to assign values to dynamic variables

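and I've explored the R-FAQ on turning a string into a variable, along with Ari Friedman's suggestion that named elements in a list are preferable. However, I'm not sure how to put that best-practice advice into practice, or how much it matters here.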
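In practice, preferring named elements here mostly means indexing the data frame (which is itself a named list of columns) with a string, rather than constructing variable names. A minimal sketch with hypothetical columns `INN8a` and `INN8b`:

```r
# A data frame is a named list of columns, so a column name held in a string
# can be used directly with [[ ]]; no assign()/get() or eval(parse()) needed.
demoDf <- data.frame(INN8a = c("x", "y"), INN8b = c("p", "q"),
                     stringsAsFactors = FALSE)

colName <- "INN8b"          # e.g. a name chosen at run time
demoDf[[colName]]           # "p" "q"

# The same idea works for assignment inside a loop over matching names.
for (nm in grep("^INN8", names(demoDf), value = TRUE)) {
  demoDf[[nm]] <- toupper(demoDf[[nm]])
}
demoDf$INN8a                # "X" "Y"
```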

I know I need to do something along the lines of

demoDf$INN8xx[demoDf$INN8xx=="I definitely wouldn’t"] <- "I definitely wouldn't"

but none of the variations I've tried so far have worked.
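If the `INN8` columns are factors, one plausible reason such attempts fail is that assigning a string that is not already a level produces `NA` rather than the new text. This sketch, using a made-up factor, shows the failure and the add-the-level-first fix that the loop in the question is heading towards:

```r
# Made-up factor column containing the curly apostrophe (U+2019).
curly    <- "I definitely wouldn\u2019t"
straight <- "I definitely wouldn't"
f <- factor(c(curly, "Maybe", curly))

# 1) Assigning a string that is not an existing level gives NA plus a
#    warning ("invalid factor level, NA generated").
g <- f
suppressWarnings(g[g == curly] <- straight)
is.na(g)                 # TRUE FALSE TRUE

# 2) Adding the new level first makes the same assignment work;
#    droplevels() then removes the now-unused curly level.
h <- f
levels(h) <- c(levels(h), straight)
h[h == curly] <- straight
h <- droplevels(h)
as.character(h)          # "I definitely wouldn't" "Maybe" "I definitely wouldn't"
```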

Thanks for your time!

Answer 1

If I understand correctly, you don't want to rename the columns. In that case, this might work:

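# Toy data: two columns containing the curly apostrophe (U+2019)
demoDf <- data.frame(A = rep("I definitely wouldn’t", 10), B = rep("I definitely wouldn’t", 10))
# apply() returns a character matrix, so convert the result back to a data frame
newDf <- as.data.frame(apply(demoDf, 2, function(col) {
  gsub(pattern = "’", replacement = "'", x = col)
}))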

It simply checks every column for the wrong character and replaces it with a plain apostrophe.

Alternatively, if you have a vector containing the indices of the columns to check, you can use

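# Let's say you identified columns 2, 5 and 8
cols <- c(2, 5, 8)
# Replace the curly apostrophe in just those columns; lapply() returns a list
# that is assigned back into the selected columns of demoDf
demoDf[cols] <- lapply(demoDf[cols], function(col) {
  gsub(pattern = "’", replacement = "'", x = col)
})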
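If the columns are factors and should stay factors, an alternative to running `gsub()` on the values is to run it on the level labels. A minimal sketch, assuming hypothetical factor columns `INN8a` and `INN8b` plus one unrelated column; `fixApostrophe()` is a helper name made up for this example:

```r
# Hypothetical data: two factor columns whose names start with "INN8",
# plus one column that should be left alone.
curly    <- "I definitely wouldn\u2019t"
straight <- "I definitely wouldn't"
demoDf <- data.frame(
  INN8a = factor(c(curly, "Maybe")),
  INN8b = factor(c("Maybe", curly)),
  other = factor(c(curly, curly))
)

# Rewrite the level labels of one factor: every row carrying the curly
# level is fixed at once and the column stays a factor.
fixApostrophe <- function(f) {
  levels(f) <- gsub("\u2019", "'", levels(f), fixed = TRUE)
  f
}

# Apply it only to the columns whose names start with "INN8".
cols <- grep("^INN8", names(demoDf))
demoDf[cols] <- lapply(demoDf[cols], fixApostrophe)
```

Because only the labels change, the integer codes underneath are untouched and the `other` column keeps its original text.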

Dynamically calling data frame columns & conditional replacement in R
