Question

I can't add up the numbers found in the linked HTML file.

I'm currently getting this error:

Line 26: b = sum(y)  TypeError: unsupported operand type(s) for +: 'int' and 'str'

Here is my code:

import urllib
from BeautifulSoup import *
import re

counter = 0
added = 0


url = "http://python-data.dr-chuck.net/comments_42.html"
html = urllib.urlopen(url).read()

soup = BeautifulSoup(html)

# Retrieve all of the span tags
spans = soup('span')

for comments in spans:
    print comments
    counter +=1
    #y = re.findall('(\d+)', comments)  -- didnt work 
    #print y
    #added += y
y = re.findall('(\d+)', str(soup))
print y
b = sum(y)
print b

print "Count", counter
print "Sum", added

The output I want looks something like this:

Count: 50
Sum: 2482

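As you can see from the part of my code that I commented out, I originally tried to add the numbers up inside the loop. I don't know why that doesn't work.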

    #y = re.findall('(\d+)', comments)  -- didnt work
    #print y
    #added += y

I'm also not sure why the numbers that are found end up in a list:

y = re.findall('(\d+)', str(soup))

Answer 1

You are trying to sum strings. Convert the strings to integers before summing, as Pynchia said, and then print b as the Sum.

...
b = sum(map(int, y))
...
print "Count", counter
print "Sum", b

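If you want to fix the commented-out section instead, use:

...
# inside the for loop: str(comments) gives re.findall a string to search
y = re.findall('(\d+)', str(comments))
print y
added += sum(map(int, y))  # accumulate across spans instead of overwriting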
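
For reference, here is a minimal self-contained sketch of the per-span approach. It is written in Python 3 syntax (the question uses Python 2), and both the `sample_html` string and the `span_pattern` regex are assumptions that stand in for the downloaded page and for `soup('span')`, so it runs without a network connection or BeautifulSoup.

```python
import re

# Hypothetical stand-in for the downloaded page; the real page has more rows.
sample_html = """
<tr><td>Alice</td><td><span class="comments">90</span></td></tr>
<tr><td>Bob</td><td><span class="comments">88</span></td></tr>
<tr><td>Carol</td><td><span class="comments">87</span></td></tr>
"""

# Assumption: a simple regex plays the role of soup('span') here.
span_pattern = re.compile(r"<span[^>]*>.*?</span>")

counter = 0
added = 0
for span_text in span_pattern.findall(sample_html):
    counter += 1
    # findall returns strings such as ['90'], so convert before adding
    digits = re.findall(r"(\d+)", span_text)
    added += sum(map(int, digits))

print("Count", counter)  # Count 3
print("Sum", added)      # Sum 265
```

The key point is the `+=` on the running total: each span contributes its own number, and the strings are converted to integers before they are added.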

Answer 2

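Quoting the Python documentation:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.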
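
To make the quoted behaviour concrete, here is a small sketch (Python 3 syntax; the `text` string and variable names are made up for illustration) showing what `re.findall` returns with no group, one group, and two groups. In every case the matched pieces are strings, never integers.

```python
import re

text = "width=31 height=18"  # hypothetical input

# No group: the whole match is returned for each hit.
no_group = re.findall(r"\d+", text)
# One group: only the group's text is returned, still as strings.
one_group = re.findall(r"=(\d+)", text)
# Two groups: each hit becomes a tuple of strings.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_group)    # ['31', '18']
print(one_group)   # ['31', '18']
print(two_groups)  # [('width', '31'), ('height', '18')]
```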

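This expression:

y = re.findall('(\d+)', str(soup)) returns a list of all the strings that match your pattern (\d+), that is, strings of digits. So what you have is a list of strings.

Then,

b = sum(y) starts from the integer 0 and tries to add the first string to it, which is why you get the int and str error message.

Try this instead:

b = sum(map(int, y)), which converts each numeric string in y to an integer and then adds them all up.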

Sample:

>>> import re
>>> s = 'Today is 31st, December, Temperature is 18 degC'
>>> y = re.findall('(\d+)', s)
>>> y
['31', '18']
>>> b = sum(map(int, y))
>>> b
49
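
As a rough illustration of where the error comes from, the sketch below (Python 3 syntax; the variable names are made up for this example) reproduces the failure on a list of digit strings and then applies the conversion. `sum` begins with the integer 0, so its first step is effectively `0 + '31'`.

```python
values = ['31', '18']  # what re.findall hands back: strings, not ints

try:
    sum(values)  # first step is 0 + '31', which raises TypeError
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Converting each string first makes the addition well defined.
total = sum(int(v) for v in values)
print(total)  # 49
```

The generator expression `int(v) for v in values` does the same job as `map(int, values)`; which one to use is a matter of taste.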
