Question

Suppose I have some text like this:

text<-c("[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It's harder to save. It's harder to retire. [McCain]: The biggest problem with American healthcare system is that it costs too much. [Obama]: We will have a healthcare system, not a disease-care system. We have the chance to solve problems that we've been talking about... [Text on screen]: Senators McCain and Obama are talking about your healthcare and financial security. We need more than talk. [Obama]: ...year after year after year after year. [Announcer]: Call and make sure their talk turns into real solutions. AARP is responsible for the content of this advertising.")

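I would like to remove (edit: delete) all text between [ and ], including the brackets themselves. What is the best way to do this? Here is my feeble attempt using regex and the stringr package: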

str_extract(text, "\\[[a-z]*\\]")

Thanks for your help!
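For reference, the attempt above fails for two reasons: `[a-z]*` only allows lowercase letters between the brackets, so labels such as `[McCain]` (capitals) or `[Text on screen]` (spaces) never match, and `str_extract()` returns the first match instead of removing anything. A minimal base-R sketch of the difference, assuming a short hypothetical sample string in the same shape as `text`:

```r
# Hypothetical short sample in the same shape as `text`
sample_text <- "[McCain]: Cut taxes. [Text on screen]: Call now."

# The pattern from the question: only lowercase letters allowed inside
old_pattern <- "\\[[a-z]*\\]"
# A negated bracket expression: anything except a closing bracket
new_pattern <- "\\[[^]]*\\]"

grepl(old_pattern, sample_text)  # FALSE: "McCain" contains capitals
grepl(new_pattern, sample_text)  # TRUE

# gsub() replaces every match rather than extracting the first one
cleaned <- gsub(new_pattern, "", sample_text)
print(cleaned)
```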

Answer 1

Use this:

gsub("\\[[^\\]]*\\]", "", text, perl=TRUE)

What the regex means:

  \[                       # '['
  [^\]]*                   # any character except ']' (0 or more
                           # times, matching as many as possible)
  \]                       # ']'
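One practical consequence of using the negated class `[^\]]` rather than `.`: with `perl=TRUE`, `.` does not match a newline by default, while `[^\]]` matches any character except `]`, newlines included. A small sketch, assuming a hypothetical label that happens to span a line break:

```r
# Hypothetical label that happens to contain a line break
s <- "[Text\non screen]: We need more than talk."

# Negated class from this answer: [^\]] also matches "\n"
negated <- gsub("\\[[^\\]]*\\]", "", s, perl = TRUE)

# Lazy dot under PCRE: "." does not match "\n" by default,
# so this label is left untouched
lazy_dot <- gsub("\\[.*?\\]", "", s, perl = TRUE)

print(negated)  # ": We need more than talk."
print(identical(lazy_dot, s))  # TRUE
```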

Answer 2

The following should do the trick. The ? forces a lazy match, so . matches as few characters as possible before the following ].

gsub('\\[.*?\\]', '', text)
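To see why the ? matters, here is a sketch comparing lazy and greedy repetition on a hypothetical one-line sample with two labels. Without the ?, .* runs to the last ] in the string and swallows the text between the labels:

```r
s <- "[A]: keep this [B]: and this"

lazy   <- gsub("\\[.*?\\]", "", s)  # stops at the first ']' after each '['
greedy <- gsub("\\[.*\\]", "", s)   # runs to the last ']' in the string

print(lazy)    # ": keep this : and this"
print(greedy)  # ": and this"
```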

Answer 3

Here is another approach:

library(qdap)
bracketX(text, "square")

Answer 4

There is no need to use a PCRE regex with a negated character class / bracket expression; a "classic" TRE regex will work, too:

subject <- "Some [string] here and [there]"
gsub("\\[[^][]*]", "", subject)
## => [1] "Some  here and "

See the online R demo.

Details:

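\\[ - a literal [ (it must be escaped, or used inside a bracket expression like [[], to be parsed as a literal [)
[^][]* - a negated bracket expression that matches 0+ characters other than [ and ] (note that a ] at the start of a bracket expression is treated as a literal ])
] - a literal ] (this character is not special in PCRE or TRE regexps outside a bracket expression and does not have to be escaped).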
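A quick check of the points above, as a sketch: the same pattern gives the same result under the default TRE engine and under PCRE (`perl = TRUE`), and because [ is excluded from the class as well as ], only the innermost pair is removed when brackets are nested (the nested string is a hypothetical example):

```r
subject <- "Some [string] here and [there]"

# Default TRE engine: "]" right after "[^" is literal inside the bracket
# expression, and the final bare "]" needs no escape
tre  <- gsub("\\[[^][]*]", "", subject)

# The same pattern compiled by PCRE
pcre <- gsub("\\[[^][]*]", "", subject, perl = TRUE)

print(identical(tre, pcre))  # TRUE

# Excluding "[" too means only the innermost brackets match
nested <- gsub("\\[[^][]*]", "", "[a [b] c]")
print(nested)  # "[a  c]"
```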

If you only want to replace the square brackets with other delimiters, use a capturing group with a backreference in the replacement pattern:

gsub("\\[([^][]*)\\]", "{\\1}", subject)
## => [1] "Some {string} here and {there}"

See another demo.

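The (...) construct forms a capturing group whose contents can be accessed with the backreference \1 (since the group is the first one in the pattern, its ID is 1).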
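The same backreference also lets you drop the brackets while keeping the speaker names, by using \\1 on its own as the replacement. A sketch on a hypothetical shortened version of the question's text:

```r
sample_text <- "[McCain]: Cut taxes. [Obama]: Save more."

# Group 1 captures the label; "\\1" re-inserts it without the brackets
unbracketed <- gsub("\\[([^][]*)\\]", "\\1", sample_text)

print(unbracketed)  # "McCain: Cut taxes. Obama: Save more."
```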

Answer 5

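I think this technically answers what you asked, but you may want to add \\: plus a space to the end of the regex for prettier text (it removes the colon and the space after each label).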

library(stringr)
str_replace_all(text, "\\[.+?\\]", "")

#> [1] ": We need tax policies that respect the wage earners..."

vs ...

str_replace_all(text, "\\[.+?\\]\\: ", "")
#> [1] "We need tax policies that respect the wage earners..."

Created on 2018-08-16 by the reprex package (v0.2.0).
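A base-R variant of the same idea, swapping in gsub() for str_replace_all(): making the colon optional (:?) and allowing any amount of trailing whitespace also handles labels that are not followed by ": ", and trimws() cleans up whatever is left at the ends. This is a sketch on a hypothetical shortened sample, not a drop-in for every transcript format:

```r
sample_text <- "[McCain]: Cut taxes. [Obama]: ...year after year."

# Remove each label, an optional colon, and any following whitespace,
# then trim leftover leading/trailing whitespace
cleaned <- trimws(gsub("\\[[^]]*\\]:?[[:space:]]*", "", sample_text))

print(cleaned)  # "Cut taxes. ...year after year."
```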

Remove all text between two brackets