Question

My data.frame has 2 columns and thousands of rows of random strings:

Column1                      Column2
"this is done in 1 hour"     "in 1 hour"

I would like to get a new data.frame column like this:

Column3
"this is done"

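So basically, match the string from Column2 and get the remainder of Column1. How can I do this?

Edit:

This would not solve the problem, because the strings have different lengths, so I can't do this: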

substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}

substrRight(x, 3)

So I need something that matches the way grepl does.

Answer 1

You can do this with a regular expression:

data <- data.frame(Column1 = "this is done in 1 hour", Column2 = "in 1 hour")
data$Column3 <- gsub(data$Column2, '', data$Column1) # Replace matches of the first argument with the second argument in the third.

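Edit: for more than 1 row, you can use mapply:

data <- data.frame(Column1 = c("this is done in 1 hour", "this is a test"), Column2 = c("in 1 hour", "a test"), stringsAsFactors = FALSE)
# Apply gsub row by row; USE.NAMES = FALSE keeps the result an unnamed character vector.
data$Column3 <- mapply(gsub, data$Column2, '', data$Column1, USE.NAMES = FALSE)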
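
Note that the gsub calls above treat Column2 as a regular expression and leave the space before the match in place. If Column2 is always expected at the end of Column1, as in the question's example, a vectorised alternative could look like the sketch below. The helper name remove_suffix is hypothetical (it is not part of the original answers), and the sketch assumes R 3.3.0 or later for endsWith.

```r
# Hypothetical helper: strip `suffix` from the end of `x` when it is present,
# comparing literally (no regular expressions involved).
remove_suffix <- function(x, suffix) {
  has_suffix <- endsWith(x, suffix)                   # vectorised, literal comparison
  stripped <- substr(x, 1, nchar(x) - nchar(suffix))  # drop the last nchar(suffix) characters
  # Keep rows without the suffix as they are, then trim trailing whitespace.
  trimws(ifelse(has_suffix, stripped, x), which = "right")
}

data <- data.frame(
  Column1 = c("this is done in 1 hour", "this is a test", "no match here"),
  Column2 = c("in 1 hour", "a test", "xyz"),
  stringsAsFactors = FALSE
)
data$Column3 <- remove_suffix(data$Column1, data$Column2)
data$Column3
# "this is done" "this is" "no match here"
```

Because endsWith, substr and ifelse are all vectorised, no mapply or sapply loop is needed. One side effect to be aware of: trimws also removes trailing whitespace from rows where nothing matched.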

Answer 2

Here is an example of how you can do this:

# example data frame
testdata <- data.frame(colA=c("this is","a test"),colB=c("is","a"),stringsAsFactors=FALSE)

# adding the new column
newcol <- sapply(seq_len(nrow(testdata)),function(x) gsub(testdata[x,"colB"],"",testdata[x,"colA"],fixed=TRUE))
new.testdata <- transform(testdata,colC=newcol)

# result
new.testdata
#      colA | colB  | colC
# --------------------------
# 1 this is |   is  | th 
# 2  a test |    a  |   test

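Edit: gsub(str1,'',str2,fixed=TRUE) removes all occurrences of str1 within str2, whereas sub removes only the first occurrence. Because str1 is normally interpreted as a regular expression, setting fixed=TRUE is important; otherwise things go wrong if str1 contains characters such as .\+?*{}[]. To address the comment, the following replaces only the last occurrence of str1 in str2, which yields the desired output: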

# reverse every string of a character vector
strReverse <- function(x) sapply(lapply(strsplit(x, NULL), rev), paste, collapse='')

# in the reversed strings, the last occurrence becomes the first one
revColA <- strReverse(testdata[["colA"]])
revColB <- strReverse(testdata[["colB"]])

# remove the first literal match in each reversed string, then reverse back
revNewCol <- sapply(seq_len(nrow(testdata)),function(x) sub(revColB[x],"",revColA[x],fixed=TRUE))
newcol <- strReverse(revNewCol)

new.testdata <- transform(testdata,colC=newcol)

### output ###
#        colA   colB   colC
# ------------------------
# 1  |this is |   is | this 
# 2  | a test |   a  |   test
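
The difference between regex and fixed matching, and between sub and gsub, can be checked with a small sketch. The example string and the variable names are made up for illustration.

```r
x <- "1.5 + 1.5"

# As a regex, "." matches any character, so everything is removed.
regex_all <- gsub(".", "", x)

# With fixed = TRUE, "." is a literal dot; gsub removes every occurrence.
fixed_all <- gsub(".", "", x, fixed = TRUE)

# sub removes only the first occurrence.
fixed_first <- sub(".", "", x, fixed = TRUE)

regex_all    # ""
fixed_all    # "15 + 15"
fixed_first  # "15 + 1.5"
```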
