Question

I am trying to use a regular expression to remove certain data from a string. Say I have the string: "Name (birthyear) [data]". The result I want: "Name birthyear".

What I have now:

data = data.replaceAll("((?s)(<|\\[).*?(>|\\]))","");

This gives the result: "Name (birthyear)"

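What do I need to add to this regex to also remove the '(' and ')'?

I would like to use only one regex for this, because the method will be used to replace a large amount of data (±20m lines).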

Answer 1

There is no need to use a regular expression:

// Assuming ( and ) are present in the string, in that order.
int openingBracket = data.indexOf('(');
int closingBracket = data.indexOf(')', openingBracket);
data = new StringBuilder(closingBracket - 1)
  // The bit up to (but not including) the (
  .append(data, 0, openingBracket)
  // The bit after the (, up to the ).
  .append(data, openingBracket + 1, closingBracket)
  .toString();

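Using basic string operations like this will almost always be faster than using regular expressions: internally, the regular expression engine has to use these same operations to manipulate the strings. As such, a regex-based implementation can only be "as complex" as the above, or more so.

(An informal benchmark shows my approach to be roughly 10 times faster than Kent's answer.)

The power of regular expressions comes from the conciseness with which you can express the pattern you are searching for, not from their speed.

But that conciseness can be a curse as well as a blessing: it is very easy to construct a regular expression that leaves you with no idea how it works. Using more verbose code, as above, can help because it is easier to debug: you can stop on each line and see how the subexpressions evaluate.

Ultimately, it is a balance: sometimes regular expressions are the right tool, sometimes they are not. You should be aware of the alternatives and weigh their relative merits for your particular application.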
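The snippet in this answer assumes both parentheses are present. With around 20m lines that assumption may not always hold, so here is a minimal sketch of the same indexOf approach wrapped in a method that returns the input unchanged when either parenthesis is missing. The class and method names (ParenStripper, stripParens) are hypothetical and only for illustration.

```java
public final class ParenStripper {
    private ParenStripper() {}

    // Hypothetical helper: keeps the text before '(' and the text between
    // '(' and ')'; everything after ')' is dropped, as in the answer's snippet.
    public static String stripParens(String data) {
        int open = data.indexOf('(');
        if (open < 0) {
            return data; // no '(' found: leave the line as it is
        }
        int close = data.indexOf(')', open + 1);
        if (close < 0) {
            return data; // no matching ')' found: leave the line as it is
        }
        return new StringBuilder(close - 1)
            .append(data, 0, open)         // up to (but not including) the '('
            .append(data, open + 1, close) // between '(' and ')'
            .toString();
    }
}
```

Calling ParenStripper.stripParens("Name (birthyear) [data]") gives "Name birthyear". Whether unmatched lines should be passed through or reported is a design choice that depends on the data.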

Answer 2

String data = "Name (birthyear) [data] ";
System.out.println(data.replaceAll("([^(]+)[(]([^)]+)[)].*","$1$2"));

Prints:

Name birthyear

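Update

We take two groups from the input string: $1$2
group1: from the start up to (but not including) the first open-bracket char, i.e. Name+space
group2: after the open bracket, we take everything from the first character up to the last character that is not a closing bracket, i.e. birthyear; all other characters are skipped.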
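Since the question mentions roughly 20m lines, it may be worth noting that String.replaceAll compiles its pattern on every call. A minimal sketch of the same regex compiled once and reused follows; the class and method names (PrecompiledReplace, clean) are assumptions made for this example.

```java
import java.util.regex.Pattern;

public final class PrecompiledReplace {
    private PrecompiledReplace() {}

    // Same pattern as in this answer, compiled once instead of on every call.
    private static final Pattern NAME_YEAR =
        Pattern.compile("([^(]+)[(]([^)]+)[)].*");

    // Hypothetical helper: keeps group 1 (text before '(') and
    // group 2 (text between the parentheses), dropping the rest.
    public static String clean(String data) {
        return NAME_YEAR.matcher(data).replaceAll("$1$2");
    }
}
```

PrecompiledReplace.clean("Name (birthyear) [data] ") returns "Name birthyear", the same result as the one-liner above. A line without parentheses does not match and is returned unchanged.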

Answer 3

Try using:

Find: \(([^)]+)\)\s*[<\\[].+?[>\\]]
Replace: $1

Explanation

\( : an open parenthesis
([^)]+) : group 1, one or more characters that are not a close parenthesis
\) : a close parenthesis
\s* : 0 or more spaces
[<\\[] : < or [
.+? : 1 or more characters, non-greedy
[>\\]] : > or ]
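As a rough illustration of how the pieces listed in this answer might be put together in Java, here is a sketch. The full pattern is assembled from the explanation and written with Java string escaping, so treat it as an assumption rather than the answer's exact code. The class and method names (BracketCleaner, clean) are hypothetical.

```java
import java.util.regex.Pattern;

public final class BracketCleaner {
    private BracketCleaner() {}

    // Assumed pattern: '(' + group 1 + ')' + optional spaces + a <...> or [...] block.
    private static final Pattern PAREN_THEN_BLOCK =
        Pattern.compile("\\(([^)]+)\\)\\s*[<\\[].+?[>\\]]");

    // Replaces the whole match with group 1, i.e. the text inside the parentheses.
    public static String clean(String data) {
        return PAREN_THEN_BLOCK.matcher(data).replaceAll("$1");
    }
}
```

With this sketch, BracketCleaner.clean("Name (birthyear) [data]") yields "Name birthyear". Note that it only matches when a bracketed block follows the parentheses; a line such as "Name (birthyear)" on its own is left untouched.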


Regex to remove characters from a string
