R - Replace strings in a specific column

Date: 2018-09-01 13:13:17

Tags: r

I have a dataset with about 2 million rows and 45 columns. I want to replace a list of values in one specific column of this dataset.

I have already tried gsub, but it takes far too long. I need to perform 16 replacements.

Here is an example of what I have done:

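setwd("C:/RStudio")
# Read the three yearly files, keeping strings as character rather than factor
dat2 <- read.csv("2016 new.csv", stringsAsFactors=FALSE)
dat3 <- read.csv("2017 new.csv", stringsAsFactors=FALSE)
dat4 <- read.csv("2018 new.csv", stringsAsFactors=FALSE)
# Stack the files row-wise
myfulldata <- rbind(dat2, dat3)
myfulldata <- rbind(myfulldata, dat4)
# Drop the columns that are not needed
myfulldata <- myfulldata[, -c(1,5,10,11,12,13,15,20,21,22,41,42,43,44,48,50,51,52,59,61,62,64,65,66,67,68,69,70,71,72)]
gc()
# Replace every NA with an empty string
myfulldata[is.na(myfulldata)] <- ""
gc()
# Note: gsub() on the whole data frame coerces it with as.character(), producing
# one deparsed string per column, so the data frame structure is lost
myfulldata <- gsub("Text Being Replaced","CS1",myfulldata, fixed=TRUE)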

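I bind several files together and then drop the columns I don't need. The key part is where I start replacing strings. I only want to replace the cases in one specific column. With that in mind, is there something other than gsub, or whatever works best, that I can use so that only the cases in column 36, named Waypoint, are replaced?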

Many thanks, Eugen
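A minimal sketch of restricting the change to a single column, assuming the cells to change contain exactly "Text Being Replaced" as their whole value (not as a substring). Under that assumption no pattern matching is needed: an equality test picks the rows, and only those cells of the one column are overwritten. The small data frame df is hypothetical toy data, with Waypoint placed at position 2 instead of 36.

```r
# Hypothetical toy data: 'Waypoint' sits at column position 2
df <- data.frame(
  id = 1:4,
  Waypoint = c("A", "Text Being Replaced", "B", "Text Being Replaced"),
  stringsAsFactors = FALSE
)

# Selecting the column by position or by name gives the same vector
stopifnot(identical(df[[2]], df$Waypoint))

# which() returns the row numbers of exact matches (and silently drops NA)
rows <- which(df$Waypoint == "Text Being Replaced")

# Overwrite only those cells; no other column is touched
df$Waypoint[rows] <- "CS1"
print(df$Waypoint)
```

On the real data the column would be myfulldata$Waypoint (equivalently myfulldata[[36]]). If the target text can appear inside longer strings, an exact comparison will not find it, and gsub with fixed = TRUE on that one column remains the appropriate tool.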

1 answer:

Answer 0 (score: 0)

Fishing for an answer: apply gsub to the Waypoint column only, rather than to the whole data frame.

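set.seed(123)

# Data simulation: n rows x m columns of unique strings
n <- 10  # the real data has about 2e6 rows
m <- 45
myfulldata <- as.data.frame(matrix(paste0("Text", 1:(n * m)), ncol = m), stringsAsFactors = FALSE)
names(myfulldata)[36] <- "Waypoint"
# Put the target string into 5 randomly chosen rows of Waypoint
myfulldata$Waypoint[sample(seq.int(nrow(myfulldata)), 5)] <- "Text Being Replaced"

# Data replacement: run gsub on the Waypoint column only, not the whole data frame
myfulldata$Waypoint <- gsub("Text Being Replaced", "CS1", myfulldata$Waypoint, fixed = TRUE)
myfulldata$Waypoint
# [1] "Text351" "Text352" "CS1"     "CS1"     "Text355" "CS1"     "CS1"     "CS1"
# "Text359" "Text360"
# The rows picked by sample() depend on the RNG: the rows shown here correspond to
# R < 3.6.0; R 3.6.0 changed the default sample.kind from "Rounding" to "Rejection"
myfulldata[, 33:38]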

Output:

       V33     V34     V35 Waypoint     V37     V38
1  Text321 Text331 Text341  Text351 Text361 Text371
2  Text322 Text332 Text342  Text352 Text362 Text372
3  Text323 Text333 Text343      CS1 Text363 Text373
4  Text324 Text334 Text344      CS1 Text364 Text374
5  Text325 Text335 Text345  Text355 Text365 Text375
6  Text326 Text336 Text346      CS1 Text366 Text376
7  Text327 Text337 Text347      CS1 Text367 Text377
8  Text328 Text338 Text348      CS1 Text368 Text378
9  Text329 Text339 Text349  Text359 Text369 Text379
10 Text330 Text340 Text350  Text360 Text370 Text380
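The question mentions 16 different replacements. Rather than calling gsub 16 times over 2 million rows, one alternative (not part of the original answer) is a single lookup table applied with match(), which resolves every replacement in one vectorized pass. This sketch assumes each of the 16 targets is a whole cell value, not a substring; the lookup entries and the replace_exact() helper are hypothetical placeholders.

```r
# Hypothetical lookup table: names are the old values, elements the new ones.
# In practice this would hold all 16 pairs.
lookup <- c("Text Being Replaced" = "CS1",
            "Other Text"          = "CS2",
            "Third Text"          = "CS3")

# Hypothetical helper: replace whole values of x found in names(lookup)
replace_exact <- function(x, lookup) {
  # Position of each element of x within names(lookup), NA when absent
  idx <- match(x, names(lookup))
  hit <- !is.na(idx)
  # Overwrite only the matched cells; unmatched values (and NA) stay as they are
  x[hit] <- unname(lookup[idx[hit]])
  x
}

waypoint <- c("Text351", "Text Being Replaced", "Other Text", NA, "Third Text")
print(replace_exact(waypoint, lookup))
```

Applied to the data, this would be myfulldata$Waypoint <- replace_exact(myfulldata$Waypoint, lookup). Because match() hashes the lookup names once, the cost grows with the number of rows rather than with rows times the number of replacements, as repeated gsub calls would.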