Question

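In Python, I have a string of comma-separated names, and I am trying to add double square brackets around each name.

This is the format of the original string: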
1. John Smith1, John Smith2, John Smith3, etc. 

What I want to end up with is:
1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]] 

I tried this regular expression:
(.+?)(?:, |( )$)

with this replacement:
[[\1]], \2

But it gives the following result:
[[1. John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.,]] 

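How do I move the leading "\d.\s" outside of the first name capture?
How do I prevent the trailing comma after the last name (in this case, "etc." rather than "etc.,")?

Any suggestions would be greatly appreciated.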

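Update
Sorry I wasn't more specific. When I said I wanted to match, I should have said match "only" that pattern.

When I use this regex: (?<=\.\s|,\s)([^,\r\n]+)\s*(?= |,) with this replacement: [[\1]], it does two unexpected things.
1. Although it works on regex101.com, when I view the output in Notepad++ every name has been changed to SOH, and in Notepad the names become non-printing characters.
2. It is too aggressive: it changes every run of comma-separated items, not just the name list. So text like this: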
1. John Smith1, John Smith2, John Smith3, John Smith4 
This is the reason why John Smith1, John Smith2, John Smith3, and John Smith4 did what they did. 

looks like this in Notepad++:

1. [[SOH]], [[SOH]], [[SOH]], [[SOH]] 
This is the reason why John Smith1, [[SOH]], [[SOH]], and John Smith4 did what they did. 

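I will try the other suggestions and see whether any of them work.

Thanks again.

Latest update

I solved the non-printing character problem. I forgot to prefix the replacement string with "r" to make it a raw string, so "\1" was being read as a character escape instead of a group reference. Now, if I can get the regex to stop at the first <br>, I should have what I need. Still searching...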
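For anyone who runs into the same SOH output, here is a minimal sketch of the difference. The pattern and input are simplified, hypothetical stand-ins rather than the regex above; the only point is how Python reads "\1" with and without the r prefix.

```python
import re

# Simplified, hypothetical pattern and input, just to isolate the escaping issue.
pattern = r"(John Smith\d)"
text = "1. John Smith1, John Smith2"

# Without the r prefix, Python reads "\1" as an octal escape: chr(1), the SOH control character.
broken = re.sub(pattern, "[[\1]]", text)

# With the r prefix, the backslash survives and re.sub treats "\1" as a reference to group 1.
fixed = re.sub(pattern, r"[[\1]]", text)

print(repr(broken))  # '1. [[\x01]], [[\x01]]'
print(fixed)         # 1. [[John Smith1]], [[John Smith2]]
```

Writing the replacement as "[[\\1]]" (a doubled backslash in a normal string) should behave the same as the raw-string form.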

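One more thing: the string will contain more numbered lines, each with comma-separated names, followed by descriptions, with line breaks in between. For example: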

1. FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3<br>  
Description with FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3<br>

2. FirstName3 LastName3, FirstName4 LastName4<br>  
Description with FirstName3 and FirstName4 LastName4.<br>

3. FirstName3 LastName3, FirstName6 LastName6<br>  
Description with FirstName3 and FirstName6.<br>

I still only want to change the lines that start with a digit/period/space and end with a line break:

1. [[FirstName1 LastName1]], [[FirstName2 LastName2]], [[FirstName3 LastName3]]<br>  
Description with FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3<br>  

2. [[FirstName3 LastName3]], [[FirstName4 LastName4]]<br>  
Description with FirstName3 and FirstName4 LastName4.<br>  

3. [[FirstName3 LastName3]], [[FirstName6 LastName6]]<br>  
Description with FirstName3 and FirstName6.<br>

I am not matching on the word "Description"; it is only there as an example.

Answer 1

Maybe an expression similar to

(?<=\.\s|,\s)([^,\r\n]+)\s*(?=<br>|,)

with a replacement of

[[\1]]

might be an option.

Test

import re

regex = r"(?<=\.\s|,\s)([^,\r\n]+)\s*(?=<br>|,)"
test_str = ("1. John Smith1, John Smith2, John Smith3, etc.<br>\n"
    "12. John Smith1, John Smith2, John Smith3, etc.<br>")
subst = "[[\\1]]"

print(re.sub(regex, subst, test_str))

Output

1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>
12. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>

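If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.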
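The question's update asks that only the numbered lines change, while description lines that also contain commas stay untouched. One possible sketch of that, which is not part of the expression above: match whole lines of the form "number, period, space, names, <br>" and rebuild the name list in a replacement function. The helper name bracket_names and the exact line pattern are assumptions for illustration.

```python
import re

def bracket_names(text):
    """Wrap names in [[...]] only on lines shaped like '<number>. names<br>'."""
    # Assumed line shape: digits, a period, one whitespace character,
    # the comma-separated names, then <br> and optional trailing spaces.
    line_re = re.compile(r"^(\d+\.\s)(.+?)(<br>[ \t]*)$", re.MULTILINE)

    def wrap(match):
        number, names, end = match.groups()
        # Split on commas and strip so the brackets hug each name.
        wrapped = ", ".join(f"[[{name.strip()}]]" for name in names.split(","))
        return number + wrapped + end

    return line_re.sub(wrap, text)

sample = (
    "1. FirstName1 LastName1, FirstName2 LastName2<br>  \n"
    "Description with FirstName1 LastName1, FirstName2 LastName2<br>\n"
)
print(bracket_names(sample))
```

Because the pattern is anchored with ^ under re.MULTILINE, a description line that does not begin with a number is never matched, so its commas are left alone.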

Answer 2

You can do it like this:

import re

st = "1. John Smith1, John Smith2, John Smith3, etc.<br>"

re.findall(r"(?:\d\. )?(.*?)(?:, |<br>)", st)
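Note that findall only extracts the names; it does not build the bracketed string. A minimal sketch of one way to finish the job, assuming it is acceptable to reassemble the list with join (the leading "1. " and trailing "<br>" are not captured, so they would have to be added back separately):

```python
import re

st = "1. John Smith1, John Smith2, John Smith3, etc.<br>"

# With a single capturing group, findall returns the list of that group's matches.
names = re.findall(r"(?:\d\. )?(.*?)(?:, |<br>)", st)
print(names)  # ['John Smith1', 'John Smith2', 'John Smith3', 'etc.']

# Wrap each extracted name and glue the list back together.
bracketed = ", ".join(f"[[{name}]]" for name in names)
print(bracketed)  # [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]
```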

Answer 3

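As always, there are several ways to do this, but a regex replacement alone may not be enough. Here are my two options:

Regex + string manipulation

Extending your original regex, you can use it to capture and skip the leading digit/period/space group: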

import re
st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'
re1 = r"(\d\.\s)*(.+?)(?:, |(<br>)$)"
new_st = re.sub(re1, r"\1[[\2]], \3", st)
print(new_st)

which gives us:

new_st = '1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]], <br>'

Note the stray comma at the end, just before <br>. We can remove it with:

new_st = ''.join(new_st.rsplit(", ", 1))

This gives us

'1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>'

Altogether, you get:

import re
st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'
re1 = r"(\d\.\s)*(.+?)(?:, |(<br>)$)"
new_st = re.sub(re1, r"\1[[\2]], \3", st)  # notice I do capture the first group
new_st = ''.join(new_st.rsplit(", ", 1))

Extract the core, then use split/join

This also uses a regex, but only to extract the core of the string. It then combines split and join to get the desired result:

import re
st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'
re2 = r"(\d+\.\s+)(.+)(<br>)$"
sections = re.findall(re2, st)

# just to make it clearer i'll split the sections
the_number, the_core, the_end = sections[0]

# rework the core
the_core = ']], [['.join(the_core.split(','))

# glue all the pieces together adding what's missing
new_st = the_number + '[[' + the_core + ']]' + the_end

The result is:

'1. [[John Smith1]], [[ John Smith2]], [[ John Smith3]], [[ etc.]]<br>'
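The result above keeps the space that follows each comma inside the brackets ("[[ John Smith2]]"). A possible variant, sketched here under the assumption that names never contain commas themselves, strips each piece before wrapping it:

```python
import re

st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'

# Same three-part split as above: number prefix, core list, trailing <br>.
match = re.match(r"(\d+\.\s+)(.+)(<br>)$", st)
the_number, the_core, the_end = match.groups()

# Strip each name so no leading space ends up inside the brackets.
names = [name.strip() for name in the_core.split(',')]
new_st = the_number + ', '.join('[[' + name + ']]' for name in names) + the_end

print(new_st)  # 1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>
```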

Answer 4

You can try something like this:

(^\d\.\s*)?(\s*)(?:([^,]+)(?=, |<br>$))

and replace with

\1\2[[\3]]


If the space after the comma is not always present, use (?=,\s*|<br>$) as the positive lookahead instead.
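For reference, a minimal sketch of how the expression and replacement above could be applied from Python with re.sub. The sample string is borrowed from the other answers, and the behavior for the unmatched optional group relies on Python 3.5 or later.

```python
import re

pattern = r"(^\d\.\s*)?(\s*)(?:([^,]+)(?=, |<br>$))"
st = "1. John Smith1, John Smith2, John Smith3, etc.<br>"

# Group 1 only matches at the start of the string; for the later names it is
# unmatched, and Python 3.5+ substitutes an empty string for it.
result = re.sub(pattern, r"\1\2[[\3]]", st)

print(result)  # 1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>
```

As written, the pattern has no re.MULTILINE flag, so ^ and $ refer to the whole string; multi-line input would need that flag or line-by-line processing.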
