Question

I have a data set with 2 columns, and I want to clean it using gsub, for example:

Data_edited_txt2 <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", Data_edited_txt2$text)
Data_edited_txt2 <- gsub("@\\w+", " ", Data_edited_txt2$text)
Data_edited_txt2 <- gsub("[[:punct:]]", "", Data_edited_txt2$text)

On the second gsub call I get the error "$ operator is invalid for atomic vectors", and I noticed that the second column disappears after the first gsub runs.

Please advise how to run all the gsub calls while keeping the second column.

structure(list(text = structure(c(1L, 3L, 7L, 4L, 2L, 5L, 6L), .Label = c("@airasia im searching job", 
"@AirAsia no flight warning for cebu outbound?", "@shazzr1 @AirAsia never mind.. now everyone can fly.", 
"@TigerAir confirmed as having far nastier policies and uncaring customer service than @airasia who I will now fly every time in preference.", 
"@Wingmates Since your taxes is HIGHER than other airlines but your service is really BAD because always change and cancel the flight.", 
"hai MASwings @Wingmates . Bilakah tempoh promosi anda? Saya ingin terbang ke Palawan dengan bajet yang agak rendah :3", 
"One thing I \"like\" about @AirAsia is, DELAY."), class = "factor"), 
created = structure(c(3L, 2L, 1L, 7L, 6L, 4L, 5L), .Label = c("2/2/2014 11:30", 
"2/2/2014 11:32", "2/2/2014 12:18", "24/2/2014 4:03", "29/3/2014 8:21", 
"30/1/2014 16:02", "31/1/2014 8:13"), class = "factor")), .Names = c("text", 
"created"), class = "data.frame", row.names = c(NA, -7L))

Answer 1

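You are overwriting the entire data frame rather than just one column. gsub returns a character vector, so after the first assignment Data_edited_txt2 is no longer a data frame, which is why the second call fails on $text. Assign the result back to the text column instead: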

Data_edited_txt2$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", Data_edited_txt2$text)
Data_edited_txt2$text <- gsub("@\\w+", " ", Data_edited_txt2$text)
Data_edited_txt2$text <- gsub("[[:punct:]]", "", Data_edited_txt2$text)
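To see why the column-wise assignment matters, here is a minimal sketch on a small stand-in data frame. The name `df` and its two rows are made up for illustration (the first row has an "RT" prefix added so the first pattern has something to match); the patterns are the ones from the question.

```r
# Hypothetical two-column data frame standing in for Data_edited_txt2
df <- data.frame(
  text = c("RT @shazzr1 @AirAsia never mind.. now everyone can fly.",
           "@airasia im searching job"),
  created = c("2/2/2014 11:32", "2/2/2014 12:18"),
  stringsAsFactors = FALSE
)

# gsub() returns a plain character vector, not a data frame,
# so assigning it to the whole object would drop the `created` column
cleaned <- gsub("@\\w+", " ", df$text)
is.data.frame(cleaned)  # FALSE
is.character(cleaned)   # TRUE

# Assigning back to the column keeps the data frame and its other columns
df$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", df$text)
df$text <- gsub("@\\w+", " ", df$text)
df$text <- gsub("[[:punct:]]", "", df$text)

names(df)  # "text" "created"
```

Note that in the posted data the text column is a factor. gsub converts its input with as.character, so after the first assignment the column becomes a character vector, which is usually what you want for text cleaning.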

Performing gsub in a data frame with 2 columns
