Question

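I wrote the following code to extract all the numbers from a web page and add them up. However, I want to write it without using regular expressions, so please guide me on how to do that. Link: http://python-data.dr-chuck.net/comments_361585.html

My code: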

import urllib
import re
from BeautifulSoup import *

html = urllib.urlopen('http://python-data.dr-chuck.net/comments_361585.html').read()

soup = BeautifulSoup(html)

# Retrieve all of the td tags
tags = soup('td')

total = 0
for tag in tags:
    # Look at the parts of a tag
    line = str(tag)
    x = re.findall('[0-9]+', line)
    if len(x) > 0:
        for item in x:
            total += int(item)

print(total)

Without regular expressions, I tried this:

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)


tags = soup.find_all('span', text=True)

for tag in tags:
        number=tag.get('class', None)
        total = sum( int(tag.text) for tag in tags )

print ('total')

But it fails with an error: 'NoneType' object is not callable.

Please guide me on how to fix it.

Answer 1

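You don't need regular expressions at all; this is easy to do with bs4 alone. Instead of getting all the 'td' tags and filtering their values with a regular expression, you can search for the 'span' tags: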

tags = soup.find_all('span', text=True)

Then you can sum the results:

total = sum( int(tag.text) for tag in tags )
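
Putting the two lines together, a minimal sketch might look like the following. It assumes Python 3 with the bs4 package (BeautifulSoup 4) installed. The likely cause of the 'NoneType' object is not callable error is the old `from BeautifulSoup import *` import: BeautifulSoup 3 has no `find_all` method (it uses `findAll`), so `soup.find_all` is looked up as a child tag named `find_all` and evaluates to None. The sample markup and the helper name `sum_span_numbers` are made up for illustration and are not taken from the linked page.

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4, not the old BeautifulSoup 3 module

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td>Alice</td><td><span class="comments">97</span></td></tr>
  <tr><td>Bob</td><td><span class="comments">3</span></td></tr>
  <tr><td>Carol</td><td><span class="comments">42</span></td></tr>
</table>
"""


def sum_span_numbers(html):
    """Add up the integer text of every <span> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # int() tolerates surrounding whitespace but raises ValueError on
    # non-numeric text, so this assumes every span holds a number.
    return sum(int(tag.get_text()) for tag in soup.find_all("span"))


if __name__ == "__main__":
    print(sum_span_numbers(SAMPLE_HTML))
```

To run this against the real page, the HTML string could be obtained with `urllib.request.urlopen(url).read()` in Python 3 and passed to the same function.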
