Question

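I have a data frame (data) in R with thousands of rows and 10 columns. Nine of the columns contain factors with multiple levels.

Here is a small part of the data frame.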

a gr1
10 303.90
11 304.1
12 303.6
13 303.90 obs
14 303.90k

For example, one level of a factor is "303.90" and another level is "303.90 obs". I want to change "303.90 obs" to "303.90". I used the following command to edit the level names.

data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.90 obs","303.90", fixed = T, x)}))

But this did not change the level "303.90 obs" to "303.90"; it stayed the same. The command does work for other strings, e.g. "303.9" becomes "303.90" when I use:

data[] = as.data.frame(lapply(data, function(x) {x = gsub("303.9 obs","303.90", fixed = T, x)}))

Any suggestions as to why this happens?

Answer 1

I am not familiar with lapply, so my solution simply loops over the columns of the data frame. This works fine.

col1 <- 1:10
col2 <- 21:30
col3 <- c("503.90", "303.90 obs", "803.90sfsdf sf", "203.90 obs", "303.90", "103.90 obs", "303.90", "403.90 obs", "803.90sfsdf sf", "303.90 obs")
col4 <- c("303.90", "303.90 obs", "303.90", "203.90 obs", "303.90", "107.40fghfg", "303.90", "303.90 obs", "303.90", "303.90 obs")

data <- data.frame(col1, col2, col3, col4)

data$col3 <- as.factor(data$col3)
data$col4 <- as.factor(data$col4)

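for(i in 3:4) {
  # Locate the first decimal number (digits, a dot, digits) in each value
  matchedExpression = regexpr(pattern = "\\d+\\.\\d+", text = data[,i])
  # Keep only the matched number, dropping suffixes such as " obs"
  data[,i] = regmatches(x = data[,i], m = matchedExpression)
  # Convert back to a factor so the levels are rebuilt
  data[,i] <- as.factor(data[,i])
}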
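
The loop above relies on every value containing a decimal number: regmatches() drops elements with no match, so the assignment would fail with a length mismatch if a column held a value without one. As a minimal sketch of an alternative (not part of the original answer), the factor levels can be rewritten directly. The helper name strip_suffix, the anchored pattern, and the "n/a" sample value are assumptions made for illustration.

```r
# Hypothetical helper (not from the original answer): keep only the leading
# decimal number of each level; levels without one are left untouched.
strip_suffix <- function(f) {
  f <- as.factor(f)
  # Assigning duplicated level names merges those levels into one.
  levels(f) <- sub("^(\\d+\\.\\d+).*$", "\\1", levels(f))
  f
}

# Small illustrative data frame; "n/a" stands in for a value with no number.
data <- data.frame(
  col1 = 1:5,
  col3 = c("503.90", "303.90 obs", "803.90sfsdf sf", "303.90", "n/a"),
  stringsAsFactors = TRUE
)

data$col3 <- strip_suffix(data$col3)
levels(data$col3)
```

Working on levels() touches each distinct label once rather than every row, which may matter for a data frame with thousands of rows.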

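Edit

The OP changed the description. To change all the factor levels to 303.90, a regular expression is the better solution. However, more information from the OP is needed to provide a general solution, for example: should only 303.90 be changed?

Edit 2

Updated the script, since the OP provided more information, e.g. columns can contain factor levels other than 303.90.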
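
The thread does not establish why the original gsub call left "303.90 obs" unchanged. One thing that may be worth checking, purely as an assumption, is whether the level contains a character that only looks like an ordinary space. The sketch below uses made-up data with a non-breaking space to show how such a character can be revealed and normalised.

```r
# Illustrative data (an assumption, not the OP's): the second value contains
# a non-breaking space (U+00A0), which prints like an ordinary space.
x <- factor(c("303.90", "303.90\u00a0obs", "303.90 obs"))

# A fixed-string replacement only matches the level with an ordinary space.
gsub("303.90 obs", "303.90", levels(x), fixed = TRUE)

# Print the code points of each level to reveal hidden characters
# (32 is an ordinary space, 160 is a non-breaking space).
lapply(levels(x), utf8ToInt)

# Replace non-breaking spaces with plain spaces, then apply the original fix.
clean <- gsub("\u00a0", " ", levels(x), fixed = TRUE)
levels(x) <- gsub("303.90 obs", "303.90", clean, fixed = TRUE)
levels(x)
```

If the code points show nothing unusual, the cause lies elsewhere and this check can be discarded.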

Using gsub on columns in R
