Question

I have a file that looks like this:

16262|John, Doe|John|Doe|JD|etc...

I need to find cases like this one:

16262|John, Doe, Dae|John|Doe Dae|JD|etc...

and replace them with:

16262|John, Doe Dae|John|Doe Dae|JD|etc...

In short, in the second field I want to remove every comma after the first one (there may be more than one of them).

Any suggestions?

Answer 1

Using GNU sed:

BRE syntax:

sed 's/\(\(^\||\)[^|,]*,\) \?\|, \?/\1 /g;'

ERE syntax:

sed -r 's/((^|\|)[^|,]*,) ?|, ?/\1 /g;'

Details:

(          # group 1: the whole beginning of an item up to the first comma
    (      # group 2:
        ^  # start of the line
      |    # OR
        \| # delimiter
    )
    [^|,]* # start of the item until | or ,
    ,      # the first comma
)          # close capture group 1
[ ]?       # optional space
|          # OR
,          # another comma
[ ]?       # optional space

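When the first branch succeeds, the first comma is captured in group 1 together with the whole beginning of the item. Because the replacement string contains a reference to capture group 1 (\1), the first comma is left unchanged.

When the second branch succeeds, group 1 is not defined, and the reference \1 in the replacement string expands to an empty string. The matched comma and its optional space are therefore replaced by a single space, which is why the other commas disappear.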
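The same two-branch idea can be tried outside sed. The following is only a sketch in JavaScript, not part of the original answer: the function name keepFirstCommaPerField is made up for illustration, and group 2 is written as a non-capturing group. Like the sed command, it acts on every field of the line, not only the second one.

```javascript
// Sketch: the two-branch pattern from the sed command, in JavaScript.
// keepFirstCommaPerField is a hypothetical name used only for this example.
function keepFirstCommaPerField(line) {
  // Branch 1: start of line or "|", then the field text up to and including
  //           its first comma (captured as $1), then an optional space.
  // Branch 2: any other comma, followed by an optional space.
  const pattern = /((?:^|\|)[^|,]*,) ?|, ?/g;
  // "$1" is empty when branch 2 matched, so those commas become one space.
  return line.replace(pattern, "$1 ");
}

console.log(keepFirstCommaPerField("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
```

As with the sed version, a comma that had no space after it ends up followed by exactly one space.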

Answer 2

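This depends a lot on the language. If you have lookbehind (and the engine allows a variable-length pattern inside it), you can do this with the regex (?<=,.*), directly. If you do not, for example in a JavaScript engine without lookbehind support, you can still use a lookahead, provided you can reverse the string: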

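// Helper: return the string with its characters in reverse order.
String.prototype.reverse = function () {
    return this.split("").reverse().join("");
};
// In the reversed string the original first comma is the last one, so
// remove every comma that still has another comma somewhere after it.
"a, b, c, d".reverse().replace(/,(?=.*,)/g, '').reverse()
// yields "a, b c d"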
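For comparison, here is a sketch of the lookbehind form. It assumes a JavaScript engine that implements ES2018 lookbehind assertions. The helper names are made up for illustration, and the second pattern, which swaps .* for [^|]* so the lookbehind stays inside one pipe-delimited field, is an adaptation to the question's data, not something stated in this answer.

```javascript
// Sketch: requires an engine with ES2018 lookbehind support.
// Both function names are hypothetical, chosen only for this example.

// Remove every comma that has an earlier comma somewhere before it.
function removeLaterCommas(text) {
  return text.replace(/(?<=,.*),/g, "");
}

// Variant for "|"-delimited records: the earlier comma must be in the same
// field, because [^|]* cannot cross a field delimiter.
function removeLaterCommasInField(line) {
  return line.replace(/(?<=,[^|]*),/g, "");
}

console.log(removeLaterCommas("a, b, c, d"));
// prints "a, b c d"
console.log(removeLaterCommasInField("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
```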

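I don't think regular expressions have any other feature similar enough to lookarounds to emulate them easily. Alternatively, in a more powerful language you could record the index of the first comma, remove all the commas, and then reinsert the first one.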
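A minimal sketch of that index-based approach, with no lookarounds at all. The function names keepFirstComma and fixSecondField are hypothetical, and applying it only to the second "|"-separated field is an assumption taken from the question.

```javascript
// Sketch: remember where the first comma is, strip all commas, put it back.
// keepFirstComma and fixSecondField are hypothetical names for this example.
function keepFirstComma(text) {
  const first = text.indexOf(",");
  if (first === -1) return text; // no comma at all, nothing to do
  const stripped = text.replace(/,/g, "");
  // Nothing before the first comma was removed, so its index is still valid.
  return stripped.slice(0, first) + "," + stripped.slice(first);
}

// Apply the fix to the second "|"-separated field only.
function fixSecondField(line) {
  const fields = line.split("|");
  if (fields.length > 1) fields[1] = keepFirstComma(fields[1]);
  return fields.join("|");
}

console.log(fixSecondField("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
```

Splitting on the delimiter first keeps the change confined to one field, which the purely regex-based versions above do not do on their own.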

Regex to remove commas after the first one