Question

I have a text in which digits appear in all sorts of positions. For example,

text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"

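I want to replace every digit with '#', except for digits inside a special pattern that starts with *, followed by a word, an underscore, a single lowercase letter and digits, e.g. \*\w+_[a-z]\d+ (i.e., *specially_x00123).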
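To make the exception concrete, here is a minimal check of which substring the special pattern picks out of the sample text. The variable name `special` is only illustrative, and the leading `*` is escaped as `\*` because a bare `*` is a quantifier in a regex.

```python
import re

text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"

# Special pattern: a literal "*", word characters, "_", one lowercase letter, then digits
special = r"\*\w+_[a-z]\d+"

matches = re.findall(special, text)
print(matches)  # ['*specially_x00123']
```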

I tried lookaround syntax and non-capturing groups, but could not find a way to turn it into exactly this:

text_cleaned = "hello## the#e are ## #.##apples *specially_x00123 named ##st"

I could use a pattern like this:

p1 = r'\d(?<!\*\w+_\w+)'

But then it complains: "look-behind requires fixed-width pattern"
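The error comes from Python's built-in `re` module, which only accepts lookbehinds of a fixed width; `\w+` can match any number of characters, so `p1` is rejected when it is compiled. A minimal reproduction:

```python
import re

p1 = r'\d(?<!\*\w+_\w+)'

message = None
try:
    re.compile(p1)
except re.error as exc:
    # \w+ has no fixed length, so the lookbehind has no fixed width
    # and the pattern is rejected at compile time
    message = str(exc)

print(message)  # look-behind requires fixed-width pattern
```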

I tried a non-capturing group:

p2 = r'(?:\*[a-z]+_\w+)\b|\d'

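It matches the special token (*specially_x00123) as well as all the digits. I think this is something I could include in the solution, but I can't figure out how. Any ideas?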
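Running `p2` through `re.sub` with `'#'` as the replacement shows the behaviour described above: the non-capturing group still consumes the token, and since every match gets the same replacement, the whole token is masked along with the digits.

```python
import re

text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"
p2 = r'(?:\*[a-z]+_\w+)\b|\d'

# Both alternatives are replaced by "#", so the special token collapses to a single "#"
result = re.sub(p2, "#", text)
print(result)  # hello## the#e are ## #.##apples # named ##st
```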

Answer 1

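What you could do is capture a digit in a capturing group (\d) and, in the replacement, use a callback that checks for the first capturing group.

If group 1 matched, replace it with #; otherwise return the match unchanged.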

Because \w+ also matches an underscore, you could use the negated character class [^\W_\n]+ to match a word character other than an underscore.
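A small comparison of the two classes on the special token shows how the negated class stops at the underscore. The `\n` from the answer's class is left out here, because `\W` already matches a newline and the token contains none.

```python
import re

token = "specially_x00123"

# \w includes "_", so \w+ runs straight through the underscore
word_runs = re.findall(r"\w+", token)
print(word_runs)  # ['specially_x00123']

# [^\W_] means "not a non-word char and not _", i.e. a word char other than _
no_underscore = re.findall(r"[^\W_]+", token)
print(no_underscore)  # ['specially', 'x00123']
```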

\*[^\W_\n]+_[a-z]\d+\b|(\d)


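import re
text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"
# Alternative 1 matches a whole special token; alternative 2 captures a single digit in group 1
pattern = r"\*[^\W_\n]+_[a-z]\d+\b|(\d)"
# Replace captured digits with "#" and return special tokens unchanged
print(re.sub(pattern, lambda x: "#" if x.group(1) else x.group(), text))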

Output

hello## the#e are ## #.##apples *specially_x00123 named ##st
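If the replacement is needed in more than one place, the same pattern can be compiled once and wrapped in a function. `mask_digits` and `SPECIAL_OR_DIGIT` are hypothetical names; the `is not None` check only makes the intent explicit, since a matched digit is never an empty string and plain truthiness behaves the same here.

```python
import re

# Hypothetical name; the pattern is the one from this answer, compiled once
SPECIAL_OR_DIGIT = re.compile(r"\*[^\W_\n]+_[a-z]\d+\b|(\d)")

def mask_digits(s: str) -> str:
    # Group 1 is set only when a lone digit matched; for a special token it is None
    return SPECIAL_OR_DIGIT.sub(lambda m: "#" if m.group(1) is not None else m.group(), s)

print(mask_digits("hello23 the2e are 13 5.12apples *specially_x00123 named 31st"))
# hello## the#e are ## #.##apples *specially_x00123 named ##st
print(mask_digits("a1 *foo_b2 *bar_c34 x5"))  # a# *foo_b2 *bar_c34 x#
```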

Answer 2

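One option is to split the string at the star. The expression (\d) captures each digit before the star, which we replace with #, while (\*.*) captures the star and everything after it on the same line, which we keep as it is: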

(\d)|(\*.*)

Test

import re

regex = r"(\d)|(\*.*)"

test_str = ("hello23 the2e are 13 5.12apples *specially_x00123 named\n\n"
    "hello## the#e are ## #.##apples *specially_x00123 named")

# A digit before the star (group 1) becomes "#"; the star and the rest of the
# line (group 2) are returned unchanged. A template such as "#\\2" would also
# put "#" in front of the starred part, so a function is used instead.
def subst(match):
    return "#" if match.group(1) else match.group(2)

# "." does not match "\n", so group 2 stops at the end of each line
result = re.sub(regex, subst, test_str)

print(result)
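Because (\*.*) consumes the rest of the line, digits that come after the special token are never examined. Running the same alternation on the question's original string, which ends in 31st, shows the difference from the expected text_cleaned; this sketch uses a callback for the replacement rather than a template string.

```python
import re

text = "hello23 the2e are 13 5.12apples *specially_x00123 named 31st"

# Digits before the star (group 1) become "#"; group 2 keeps everything from the star to the end of the line
result = re.sub(r"(\d)|(\*.*)", lambda m: "#" if m.group(1) else m.group(2), text)

print(result)  # hello## the#e are ## #.##apples *specially_x00123 named 31st
```

Compared with text_cleaned, only the trailing 31st differs: it stays unmasked because it sits inside group 2.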

regex101.com

const regex = /(\d)|(\*.*)/gm;
const str = `hello23 the2e are 13 5.12apples *specially_x00123 named`;

// A digit (group 1) becomes "#"; the tail from the star onward (group 2) is
// returned unchanged. A plain template such as `#$2` would also put "#" in
// front of the tail.
const result = str.replace(regex, (match, digit, tail) => (digit ? "#" : tail));

console.log('Substitution result: ', result);

Regex to capture and replace all digits in a string except for a special pattern
