I am trying to remove hashtags from the end of a string in R. For example:
x<- "I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
I want to remove the #lateNightThoughts and #movie hashtags at the end of the string. Expected result:
- "I didn't know it could be #boring. guess I need some fun"
I tried:
stringi::stri_replace_last_regex(x,'#\\S+',"")
But it only removes the last hashtag:
- "I didn't know it could be #boring. guess I need some fun #movie "
Do you know how to get the expected result?
Edit:
How can I also remove hashtags from the beginning of the text? For example:
x<- "#Thomas20 I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
Answer 0: (score: 2)
You can use
> x<- "I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
> sub("\\s*\\B#\\w+(?:\\s*#\\w+)*\\s*$", "", x)
[1] "I didn't know it could be #boring. guess I need some fun"
Or, if you do not care about the context of the first # where the match starts, you can even use
sub("(?:\\s*#\\w+)+\\s*$", "", x)
See the regex demo.
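To make the difference between the two patterns concrete, here is a minimal sketch. The input `y` is a made-up string (not from the question) in which a `#` sits directly after a word character. `perl = TRUE` is an assumption added here so that PCRE handles `\B` and `(?:...)`; the calls above rely on R's default engine.

```r
# Hypothetical input: "foo#bar" has a '#' glued to a word, "#baz" is a real trailing hashtag
y <- "see foo#bar #baz"

# With \B the match may only start at a '#' that is NOT preceded by a word char,
# so "#bar" survives and only the trailing " #baz" is removed
with_boundary <- sub("\\s*\\B#\\w+(?:\\s*#\\w+)*\\s*$", "", y, perl = TRUE)

# Without \B the run of hashtags is allowed to start inside "foo#bar"
without_boundary <- sub("(?:\\s*#\\w+)+\\s*$", "", y, perl = TRUE)

print(with_boundary)     # "see foo#bar"
print(without_boundary)  # "see foo"
```

Which behavior is preferable depends on whether something like `foo#bar` should count as a hashtag in your data.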
Details
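\s* - zero or more whitespace characters
\B - a non-word boundary: right before the current position there must be either the start of the string or a non-word char (commonly used to make sure # is not matched inside a "word"; remove this non-word boundary if you do not need that check)
# - a # char
\w+ - one or more word chars (letters, digits or _)
(?:\s*#\w+)* - zero or more occurrences of:
  \s* - zero or more whitespace characters
  # - a # char
  \w+ - one or more word chars
\s* - zero or more whitespace characters
$ - the end of the string.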
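The edit in the question also asks about hashtags at the beginning of the text, which the patterns above do not handle. One possible approach, offered as a sketch rather than as part of the original answer, is to mirror the pattern with a `^` anchor and apply both substitutions in turn. The helper name `strip_edge_hashtags` is hypothetical, and `perl = TRUE` is again an assumption.

```r
# Hypothetical helper: drop runs of hashtags at the start and at the end of a string,
# leaving hashtags in the middle (such as #boring) untouched
strip_edge_hashtags <- function(s) {
  # Leading run: optional whitespace, then one or more "#word" tokens with trailing whitespace
  s <- sub("^\\s*(?:#\\w+\\s*)+", "", s, perl = TRUE)
  # Trailing run: same idea as the second pattern in the answer
  sub("(?:\\s*#\\w+)+\\s*$", "", s, perl = TRUE)
}

x <- "#Thomas20 I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
result <- strip_edge_hashtags(x)
print(result)
# "I didn't know it could be #boring. guess I need some fun"
```

Because both calls use `sub` with an anchored pattern, each one removes at most one contiguous run of hashtags, at its own end of the string.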