I'm setting up a new project in R and want to extract specific symbols from text:
X <- c("amazing tiny phone ^_^","so cute!!! <3")
I want to extract "^_^" and "<3" from X in R.
Thanks!
Answer 0 (score: 0)
A more direct approach:
X = c("amazing tiny phone ^_^", "so cute!!! <3", "^_^ and :) are my fav symbols")
patt = c("=d", "<3", ":o", ":(", ":)", "(y)", ":*", "^_^", ":d", ";)", ":'(")
variable = sapply(X, function(x){
  # positions in patt of the symbols that occur as space-separated tokens of x
  i = which(patt %in% strsplit(x, " ")[[1]])
  if (length(i) > 0){
    paste(patt[i], collapse = " ")
  } else {
    NA
  }
})
names(variable) = NULL
variable
# [1] "^_^"    "<3"     ":) ^_^"
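The strsplit approach above only finds a symbol when it is surrounded by spaces, so a string like "so cute!!!<3" would come back as NA. As a minimal sketch, assuming literal substring matching is acceptable for your data, the same patt list can be checked with grepl(..., fixed = TRUE) instead. find_symbols is a hypothetical helper name, and the trade-off is possible false positives, e.g. "=d" inside ordinary text such as "a=d".

```r
# Input strings; the second has no space before the symbol.
X <- c("amazing tiny phone ^_^", "so cute!!!<3", "no symbols here")
patt <- c("=d", "<3", ":o", ":(", ":)", "(y)", ":*", "^_^", ":d", ";)", ":'(")

# Hypothetical helper: report every entry of `patterns` that occurs anywhere
# in `x` as a literal substring, in the order the entries appear in `patterns`.
find_symbols <- function(x, patterns) {
  present <- vapply(patterns, function(p) grepl(p, x, fixed = TRUE), logical(1))
  hits <- patterns[present]
  if (length(hits) > 0) paste(hits, collapse = " ") else NA_character_
}

variable <- vapply(X, find_symbols, character(1), patterns = patt,
                   USE.NAMES = FALSE)
variable
# [1] "^_^" "<3"  NA
```

Using vapply rather than sapply also guarantees a character vector result, even when every string returns NA.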
Answer 1 (score: 0)
@GraemeForst This can be generalized using a capture group and a lookahead:
group <- "[\\^\\_\\<\\>3\\:\\(\\)\\;]"
pat <- sprintf(".*[\\s\\b](%s+)(?!\\1)", group)
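The group variable defines a character class containing every character of the symbols we want to extract. The pat variable defines the match pattern: [\\s\\b] requires a whitespace character immediately before the candidate match (inside a character class PCRE treats \b as a backspace rather than a word boundary, so in practice this means whitespace), and the negative lookahead (?!\\1) asserts that the captured symbol is not immediately followed by a repeat of itself. Here is a demo: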
X <- c("amazing tiny phone ^_^","so cute!!! <3", "I like pizza :)", "hello beautiful ;)")
gsub(pat, "\\1", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
# [1] "^_^" "<3" ":)" ";)"
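The gsub demo returns one result per string and assumes the symbol comes last, because any text after the matched symbol is left in place. If you want every symbol in each string, a sketch, assuming a symbol is a whole whitespace-separated token made only of group characters, is to combine lookarounds with gregexpr and regmatches. pat_all is a hypothetical name; note that because 3 is in group, a standalone "3" token would also be returned.

```r
# Character class from the answer: every character that may appear in a symbol.
group <- "[\\^\\_\\<\\>3\\:\\(\\)\\;]"

# A run of group characters with no non-space character directly before or
# after it, i.e. a complete whitespace-delimited token.
pat_all <- sprintf("(?<!\\S)%s+(?!\\S)", group)

X <- c("amazing tiny phone ^_^", "so cute!!! <3",
       "^_^ and :) are my fav symbols", "hello beautiful ;)")

# One character vector of matches per input string.
symbols <- regmatches(X, gregexpr(pat_all, X, perl = TRUE))
symbols
```

The result is a list, so strings containing several symbols (the third one here) keep all of them.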
This could be refined and generalized further. One very simple step would be to extend group with more characters.
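One way to extend group without hand-escaping every character is to derive it from a list of symbols. This is a sketch under assumptions: the symbols vector is a hypothetical example list, punctuation is backslash-escaped (PCRE treats an escaped non-alphanumeric character as a literal), and letters and digits are left unescaped because sequences like \d have special meaning. The derived class is then dropped into the answer's pat unchanged.

```r
# Hypothetical list of symbols to support; extend it as needed.
symbols <- c("^_^", "<3", ":)", ";)", ":(", ":*", ":'(", "=]")

# Every distinct character used by those symbols.
chars <- unique(unlist(strsplit(symbols, "")))

# Escape punctuation so it is literal inside a PCRE character class;
# leave letters and digits alone, since e.g. \d would become a digit class.
escaped <- ifelse(grepl("[[:alnum:]]", chars), chars, paste0("\\", chars))
group <- paste0("[", paste(escaped, collapse = ""), "]")

# Same pattern as in the answer, now built from the derived class.
pat <- sprintf(".*[\\s\\b](%s+)(?!\\1)", group)

X <- c("so cute!!! <3", "lol =]")
result <- gsub(pat, "\\1", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
result
# [1] "<3" "=]"
```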
Old answer
You can use a regular expression for this:
# create the pattern to be extracted
pat = ".*(\\^\\_\\^).*|.*(\\<3).*" # escape special characters with "\\" and ".*" to specify there may be text before/after
# extract
gsub(pat, "\\1\\2", grep(pat, X, value = TRUE, perl = TRUE), perl = TRUE)
# [1] "^_^" "<3"
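The alternation .*(\\^\\_\\^).*|.*(\\<3).* above needs one capture group (and one backreference in the replacement) per symbol. A sketch of a more general variant, assuming you want every occurrence rather than one per string, builds the alternation from a vector of symbols, wraps each one in PCRE's \Q...\E so it is matched literally, and collects all matches with gregexpr and regmatches. Longer symbols are tried first so a shorter symbol cannot shadow a longer one that starts the same way; a symbol containing \E itself would need separate escaping.

```r
X <- c("amazing tiny phone ^_^", "so cute!!!<3", "^_^ and :) are my fav symbols")

# Hypothetical list of symbols to extract; add more as needed.
symbols <- c("^_^", "<3", ":)", ";)", ":(", ":'(")

# Try longer symbols first, and wrap each in \Q...\E so PCRE matches it literally.
alts <- symbols[order(nchar(symbols), decreasing = TRUE)]
pat <- paste0("\\Q", alts, "\\E", collapse = "|")

# All matches per string, in the order they appear in the text.
found <- regmatches(X, gregexpr(pat, X, perl = TRUE))
found
```

Unlike the character-class approach, this only ever returns the exact symbols in the list, so arbitrary runs of punctuation are not picked up.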