Question

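I was wondering if there is a better way to do this? I would like to convert each object to a string as it is found, rather than finding the whole list and then converting every item in it: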

aList = regexObj.findall(s.text) if regexObj.findall(s.text) else None

self._menuUrls = map( lambda x: str( 'https://....' + x + '?otherparams=...' ), aList )

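Is there a ready-made method I can use to do this in a single pass, or do I need to write a separate method/lambda? Can I handle this more efficiently?

Edit: I did my own research on several methods using a file containing 500k matching instances, and found that a list comprehension over re.findall() is 40-50% faster than a list comprehension that converts the match objects from re.finditer() when you are searching for a single item.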

import re
import time

# `lines` (the contents of the test file) is not defined in this snippet.
menuUrls = []

start = time.time()

regex = re.compile(r"javascript:iframeLink\('([^']+)'\);")

# The number after each label is the recorded elapsed time in seconds.

# My Original Solution = 0.78200006485
menuUrls = map(lambda x: str('http://...' + x + '?param=...'), regex.findall(str(lines)))

# My Revised Solution = 0.619000196457
menuUrls = [str('http://...' + x + '?param=...') for x in regex.findall(str(lines))]

# Friend's Proposal = 0.802000045776
for m in regex.finditer(str(lines)):
    menuUrls.append(str('http://...' + m.group(1) + '?param=...'))

# Stack Proposal = 0.912000179291
# Note: group(0) is the whole match; the other variants use the captured group 1.
menuUrls = [str('http://...' + x.group(0) + '?param=...') for x in regex.finditer(str(lines))]

# Deduplicate (the result is not stored here).
set(menuUrls)

print(time.time() - start)
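
For anyone wanting to repeat this comparison, the following is a minimal sketch using the standard timeit module rather than a single time.time() difference. The sample text, the function names, and the repeat counts are assumptions made for illustration; the original test used a file with about 500k matches, so absolute numbers will differ.

```python
import re
import timeit

# Hypothetical sample input; the original benchmark used a file with ~500k matches.
SAMPLE = "javascript:iframeLink('page%d');\n"
text = "".join(SAMPLE % i for i in range(10000))

regex = re.compile(r"javascript:iframeLink\('([^']+)'\);")


def with_map(s):
    # list() forces evaluation, because map() is lazy in Python 3.
    return list(map(lambda x: 'http://...' + x + '?param=...', regex.findall(s)))


def with_findall(s):
    # List comprehension over the captured strings returned by findall().
    return ['http://...' + x + '?param=...' for x in regex.findall(s)]


def with_finditer(s):
    # List comprehension over match objects, using the captured group 1.
    return ['http://...' + m.group(1) + '?param=...' for m in regex.finditer(s)]


candidates = (with_map, with_findall, with_finditer)

# Check that all variants build the same list before comparing their speed.
results = [f(text) for f in candidates]
assert results[0] == results[1] == results[2]

for f in candidates:
    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: f(text), number=20, repeat=3))
    print("%-14s %.4f s" % (f.__name__, best))
```

Checking that every variant produces the same list first keeps the timing comparison like for like.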

Answer 1

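You are looking for finditer. Something like:

regex_iter = regexObj.finditer(s.text)
self._menuUrls = ['https://....' + x.group(0) + '?otherparams=...' for x in regex_iter]

This is marginal, but a list comprehension will generally be faster than map with a lambda (in fact, than map with any other non-builtin function).

Demo: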

>>> import re
>>> text = "1 234 6 889 33 5 777 dff hd ae 2  ggre 777 fdf"
>>> pattern = re.compile(r"\d+")
>>> nums = ['<'+ m.group(0) + '>' for m in pattern.finditer(text)]
>>> nums
['<1>', '<234>', '<6>', '<889>', '<33>', '<5>', '<777>', '<2>', '<777>']
>>>
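
One detail worth checking when switching between the two calls: when the pattern has a single capturing group, findall() returns the captured strings, while group(0) on a finditer() match is the whole match. A small sketch, using the pattern from the question and a made-up sample string (the a.html and b.html names are assumptions for illustration):

```python
import re

pattern = re.compile(r"javascript:iframeLink\('([^']+)'\);")
# Hypothetical input text, invented for this example.
text = "javascript:iframeLink('a.html'); javascript:iframeLink('b.html');"

# With exactly one capturing group, findall() returns the captured strings.
captured = pattern.findall(text)

# finditer() yields match objects: group(0) is the full match, group(1) the capture.
whole = [m.group(0) for m in pattern.finditer(text)]
inner = [m.group(1) for m in pattern.finditer(text)]

print(captured)  # ['a.html', 'b.html']
print(whole)
print(inner)
```

So to get the same strings as findall() with this pattern, the finditer() version would use group(1) rather than group(0).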

Answer 2

Of the proposed solutions, the list comprehension over regex.findall() tested as the fastest way to search and convert.
