Question

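I've been using Python for web scraping. Everything ran like a well-oiled machine until I used it to grab a product description, which is actually a very large description.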

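So it simply doesn't work, as if my regex were incorrect. Sadly, I can't tell you which website I'm scraping to show you the real example, but I'm fairly sure the regex itself is fine. It looks like this: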

descriptionRegex = 'id="this_id">(.*)</div>\s*<div\ id="another_id"'

for found in re.findall(descriptionRegex, response) :
   print found

The catch is that the (.*) part has to match something like 25,000+ characters.

Is there a limit on the number of characters re.findall() will search? Is there any way to make this work?

Answer 1

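You need to specify re.DOTALL when calling re.findall(). By default, "." matches any character except a newline, so (.*) stops at the first line break in a multi-line description. The length of the text is not the problem.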

If you run this program, it behaves the way you want:

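import re

# Sample input with newlines inside the description, as on a real page
response = '''id="this_id">
blah
</div> <div id="another_id"'''

descriptionRegex = r'id="this_id">(.*)</div>\s*<div\ id="another_id"'

# re.DOTALL lets "." match newline characters too
for found in re.findall(descriptionRegex, response, re.DOTALL):
    print(found)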
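
To check that the newlines matter rather than the size, here is a minimal sketch that runs the same pattern over two large descriptions of about 30,000 characters each, one on a single line and one spread over many lines. The build_page helper and the sample strings are made up for illustration, since the real page is not shown in the question.

```python
import re

# Same pattern as in the question.
PATTERN = r'id="this_id">(.*)</div>\s*<div\ id="another_id"'


def build_page(description):
    # Hypothetical helper: wraps a description in the markup the pattern expects.
    return 'id="this_id">' + description + '</div> <div id="another_id"'


# Made-up test data: both are far longer than the 25,000 characters mentioned.
long_one_line = "x" * 30000                      # large, no newlines
long_multi_line = "\n".join(["x" * 100] * 300)   # similar size, with newlines

one_line_default = re.findall(PATTERN, build_page(long_one_line))
multi_line_default = re.findall(PATTERN, build_page(long_multi_line))
multi_line_dotall = re.findall(PATTERN, build_page(long_multi_line), re.DOTALL)

print(len(one_line_default))    # matches found without re.DOTALL, single line
print(len(multi_line_default))  # matches found without re.DOTALL, multi-line
print(len(multi_line_dotall))   # matches found with re.DOTALL, multi-line
```

The single-line description matches even without the flag, while the multi-line one only matches once re.DOTALL is passed. If more than one such block can appear on a page, a non-greedy (.*?) is probably safer than (.*), which would otherwise run to the last closing </div> that fits.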
