Say I have some text like this:
text<-c("[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It's harder to save. It's harder to retire. [McCain]: The biggest problem with American healthcare system is that it costs too much. [Obama]: We will have a healthcare system, not a disease-care system. We have the chance to solve problems that we've been talking about... [Text on screen]: Senators McCain and Obama are talking about your healthcare and financial security. We need more than talk. [Obama]: ...year after year after year after year. [Announcer]: Call and make sure their talk turns into real solutions. AARP is responsible for the content of this advertising.")
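I want to remove (edit: delete) all the text between [ and ], including the brackets themselves. What's the best way to do this? Here is my feeble attempt using regex and the stringr package: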
str_extract(text, "\\[[a-z]*\\]")
Thanks for your help!
Answer 0 (score: 21)
Use this:
gsub("\\[[^\\]]*\\]", "", text, perl = TRUE)
What the regex means:
\[       # '['
[^\]]*   # any character except ']', 0 or more times (as many as possible)
\]       # ']'
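A minimal sketch of this pattern in action, assuming a short hypothetical excerpt s (made up for illustration, not the question's full text). Because [^\]]* can never run past a ], each match stops at the first closing bracket without needing a lazy quantifier.

```r
# Hypothetical sample string with two bracketed speaker tags
s <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# Remove every "[...]" run; perl = TRUE lets the PCRE engine read the
# escaped "\\]" inside the negated class as a literal ']'
cleaned <- gsub("\\[[^\\]]*\\]", "", s, perl = TRUE)
print(cleaned)
# [1] ": We need tax policies. : It's harder to save."
```

Note that the colons after the tags survive; Answer 4 below shows one way to drop them too.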
Answer 1 (score: 9)
The following should do the trick. The ? makes the match lazy, so . matches as few characters as possible before the next ].
gsub('\\[.*?\\]', '', text)
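To see why the ? matters, here is a sketch contrasting the greedy and lazy forms on a hypothetical two-tag string s (invented for illustration). Both calls use R's default regex engine, as the answer does.

```r
s <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# Greedy: .* runs to the LAST ']' in the string, so everything from the
# first '[' through "[Obama]" is deleted in a single match
greedy <- gsub("\\[.*\\]", "", s)

# Lazy: .*? stops at the FIRST ']' after each '[', so only the tags go
lazy <- gsub("\\[.*?\\]", "", s)

print(greedy)
# [1] ": It's harder to save."
print(lazy)
# [1] ": We need tax policies. : It's harder to save."
```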
Answer 2 (score: 3)
Here is another approach:
library(qdap)
bracketX(text, "square")
Answer 3 (score: 3)
There is no need to use a PCRE regex with a negated character class / bracket expression; a "classic" TRE regex will work, too:
subject <- "Some [string] here and [there]"
gsub("\\[[^][]*]", "", subject)
## => [1] "Some  here and "
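Details:
- \\[ - a literal [ (it must be escaped, or placed inside a bracket expression like [[], to be parsed as a literal [)
- [^][]* - a negated bracket expression that matches 0+ characters other than [ and ] (note that a ] at the start of a bracket expression is treated as a literal ])
- ] - a literal ] (this character is not special in PCRE or TRE regexps and does not have to be escaped).

If you only want to replace the square brackets with other delimiters, use a capturing group in the pattern and a backreference in the replacement: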
gsub("\\[([^][]*)\\]", "{\\1}", subject)
## => [1] "Some {string} here and {there}"
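The (...) parentheses form a capturing group whose contents can be accessed with the backreference \1 (the group is the first one in the pattern, so its ID is 1).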
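As a further sketch of the same capturing group, replacing each match with just \\1 keeps the bracketed text and drops only the brackets. The second call, wrapping the capture in parentheses, is an illustrative variation rather than part of the original answer.

```r
subject <- "Some [string] here and [there]"

# \\1 refers back to capture group 1, i.e. the text between the brackets,
# so replacing the whole "[...]" match with \\1 removes only the brackets
unbracketed <- gsub("\\[([^][]*)\\]", "\\1", subject)
print(unbracketed)
# [1] "Some string here and there"

# The captured text can be re-wrapped in any other delimiters
parens <- gsub("\\[([^][]*)\\]", "(\\1)", subject)
print(parens)
# [1] "Some (string) here and (there)"
```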
Answer 4 (score: 2)
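I think this technically answers what you asked, but you may want to add "\\: " (a colon followed by a space) to the end of the regex to get prettier text, since it also removes the colon and space after each tag.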
library(stringr)
str_replace_all(text, "\\[.+?\\]", "")
#> [1] ": We need tax policies that respect the wage earners..."
vs ...
str_replace_all(text, "\\[.+?\\]\\: ", "")
#> [1] "We need tax policies that respect the wage earners..."
Created on 2018-08-16 by the reprex package (v0.2.0).
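For readers not using stringr, a base-R sketch of the same idea, assuming gsub() with perl = TRUE as a stand-in for str_replace_all(); the excerpt below is a shortened, hypothetical subset of the question's transcript.

```r
# Shortened excerpt of the ad transcript (hypothetical subset)
excerpt <- "[McCain]: We need tax policies. [Obama]: It's harder to save. [Text on screen]: We need more than talk."

# Drop each "[Speaker]" tag together with the ": " that follows it;
# the lazy .+? stops at the first "]: " after each '['
cleaned_excerpt <- gsub("\\[.+?\\]: ", "", excerpt, perl = TRUE)
print(cleaned_excerpt)
# [1] "We need tax policies. It's harder to save. We need more than talk."
```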