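I have a dataset with about 2 million rows and 45 columns. I want to replace a list of values in one specific column of this dataset.
I have already tried gsub, but it turned out to take far too long. I need to perform 16 replacements.
Here is an example of what I am doing: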
setwd("C:/RStudio")
dat2 <- read.csv("2016 new.csv", stringsAsFactors=FALSE)
dat3 <- read.csv("2017 new.csv", stringsAsFactors=FALSE)
dat4 <- read.csv("2018 new.csv", stringsAsFactors=FALSE)
myfulldata <- rbind(dat2, dat3)
myfulldata <- rbind(myfulldata, dat4)
myfulldata <- myfulldata[, -c(1,5,10,11,12,13,15,20,21,22,41,42,43,44,48,50,51,52,59,61,62,64,65,66,67,68,69,70,71,72)]
gc()
myfulldata[is.na(myfulldata)] <- ""
gc()
myfulldata <- gsub("Text Being Replaced","CS1",myfulldata, fixed=TRUE)
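I bind several files together and then drop the columns I do not need. The important part is where I start replacing strings. I only want to replace the occurrences in one specific column. With that in mind, is there something other than gsub, or whatever works best, that I can use so that only the occurrences in column 36, named Waypoint, are replaced?
Many thanks, Eugen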
Answer 0 (score: 0)
A shot-in-the-dark answer:
set.seed(123)
# data simulation
n = 10 #2e6
m = 45 #45
myfulldata <- as.data.frame(matrix(paste0("Text", 1:(n * m)), ncol = m), stringsAsFactors = FALSE)
names(myfulldata)[36] <- "Waypoint"
myfulldata$Waypoint[sample(seq.int(nrow(myfulldata)), 5)] <- "Text Being Replaced"
# data replacement, applied to the Waypoint column only
myfulldata$Waypoint <- gsub("Text Being Replaced", "CS1", myfulldata$Waypoint, fixed = TRUE)
myfulldata$Waypoint
# [1] "Text351" "Text352" "CS1" "CS1" "Text355" "CS1" "CS1" "CS1" "Text359" "Text360"
# (which rows hold "CS1" depends on the rows that sample() picked)
# show only the columns around Waypoint
myfulldata[, 33:38]
#        V33     V34     V35 Waypoint     V37     V38
# 1  Text321 Text331 Text341  Text351 Text361 Text371
# 2  Text322 Text332 Text342  Text352 Text362 Text372
# 3  Text323 Text333 Text343      CS1 Text363 Text373
# 4  Text324 Text334 Text344      CS1 Text364 Text374
# 5  Text325 Text335 Text345  Text355 Text365 Text375
# 6  Text326 Text336 Text346      CS1 Text366 Text376
# 7  Text327 Text337 Text347      CS1 Text367 Text377
# 8  Text328 Text338 Text348      CS1 Text368 Text378
# 9  Text329 Text339 Text349  Text359 Text369 Text379
# 10 Text330 Text340 Text350  Text360 Text370 Text380
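Since the question mentions 16 replacements, here is a rough sketch of how the single-column approach might be extended to a whole list of values. Everything except the pair "Text Being Replaced" / "CS1" is made up for illustration: the other lookup entries, the small data frame df, the notes vector, and the helper replace_substrings are all hypothetical. It covers two cases, depending on whether the old text fills the whole cell or is only part of a longer string.

```r
# Hypothetical lookup table: names are the old values, elements the new ones.
# Only the first pair comes from the question; the rest are invented.
lookup <- c("Text Being Replaced" = "CS1",
            "Another Old Value"   = "CS2",
            "Yet Another Value"   = "CS3")

# Small stand-in for the real data (assumed column name: Waypoint)
df <- data.frame(
  id = 1:5,
  Waypoint = c("Text Being Replaced", "keep me", "Another Old Value",
               "Yet Another Value", "Text Being Replaced"),
  stringsAsFactors = FALSE
)

# Case 1: the whole cell equals the old value.
# match() looks up every row against the table in one vectorised pass.
idx <- match(df$Waypoint, names(lookup))
hit <- !is.na(idx)                      # rows that have a replacement
df$Waypoint[hit] <- unname(lookup[idx[hit]])

# Case 2: the old text is only part of a longer string.
# Loop over the pairs, but pass only the one column (a character vector).
replace_substrings <- function(x, lookup) {
  for (old in names(lookup)) {
    x <- gsub(old, lookup[[old]], x, fixed = TRUE)
  }
  x
}

notes <- c("at Text Being Replaced today", "nothing here")
notes_new <- replace_substrings(notes, lookup)
```

If the values in Waypoint are always complete matches, the match() variant avoids scanning the column once per replacement; whether it is fast enough on 2 million rows would need to be timed on the real data.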