Question

I'm using BeautifulSoup4 (with the lxml parser) to parse XML that looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<data>
<metadata id="8735180"  name="Dauphin Island" lat="30.2500" lon="-88.0750"/>
<observations>
<wl t="2013-12-14 00:00"  v="0.725" s="0.059" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:06"  v="0.771" s="0.066" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:12"  v="0.764" s="0.085" f="0,0,0,0" q="v" />

... and so on.

The Python code looks like this:

obs_soup = BeautifulSoup(urllib2.urlopen('http://tidesandcurrents.noaa.gov/api/datagetter?product=water_level&application=NOS.COOPS.TAC.WL&begin_date=20131214&end_date=20131216&datum=MSL&station=8735180&time_zone=GMT&units=english&interval=&format=xml'),'lxml')

for l in obs_soup.findall('wl'):
    obs.append(l['v'])

I keep getting this error:

for l in obs_soup.findall('wl'):
TypeError: 'NoneType' object is not callable

I tried the solution here (except that instead of looking for 'html', I looked for 'data'), but that didn't work. Any suggestions?

Answer 1

There are two problems here.

First, BeautifulSoup has no method named findall. Change it to:

for l in obs_soup.find_all('wl'):
    obs.append(l['v'])

...and it will work.

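So why do you get TypeError: 'NoneType' object is not callable instead of the more usual AttributeError? Because of BeautifulSoup's magic lookup: the same mechanism that lets obs_soup.wl work as a shortcut for finding a <wl> node also makes obs_soup.findall a shortcut for finding a <findall> node. Since there are no <findall> nodes, it returns None. You then try to call that None object as a function, which of course makes no sense.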
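To make the mechanism concrete, here is a minimal sketch of that kind of attribute lookup. MagicTag is a hypothetical, simplified stand-in written for this illustration; it is not BeautifulSoup's actual implementation, and it only searches direct children.

```python
class MagicTag(object):
    """Hypothetical stand-in for a tag with attribute-style child lookup."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with the given tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so real attributes and methods such as find() are unaffected.
        if name.startswith('__'):
            raise AttributeError(name)
        return self.find(name)


root = MagicTag('data', [MagicTag('wl'), MagicTag('wl')])

found = root.wl          # shortcut for root.find('wl')
missing = root.findall   # no <findall> child exists, so this is None

try:
    root.findall('wl')   # equivalent to calling None('wl')
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Because the misspelled name is treated as a tag lookup instead of a missing attribute, the failure only shows up when the result is called.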

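Also, if you had actually copied and pasted the code from here, as you claim, you wouldn't have this problem. That code uses findAll, with a capital "A", which is a deprecated synonym for find_all. (Of course, you shouldn't use the deprecated synonym.)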

Second, you're explicitly asking for lxml's HTML parser instead of its XML parser. Don't do that. See the docs:

BeautifulSoup(markup, ["lxml", "xml"])

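Since you haven't provided a complete XML document, I can't tell whether this will affect you or whether you'll happen to get lucky. But you shouldn't rely on getting lucky when you can simply do it the right way.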
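Putting both fixes together, a sketch might look like the following. It assumes bs4 and lxml are installed, and it uses a shortened inline copy of the sample from the question in place of the live NOAA response; the closing </observations> and </data> tags are my assumption, since the question cuts the document off.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the NOAA response; the closing tags are assumed.
xml_bytes = b'''<?xml version="1.0" encoding="UTF-8" ?>
<data>
<metadata id="8735180"  name="Dauphin Island" lat="30.2500" lon="-88.0750"/>
<observations>
<wl t="2013-12-14 00:00"  v="0.725" s="0.059" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:06"  v="0.771" s="0.066" f="0,0,0,0" q="v" />
<wl t="2013-12-14 00:12"  v="0.764" s="0.085" f="0,0,0,0" q="v" />
</observations>
</data>'''

# 'xml' selects lxml's XML parser rather than its HTML parser.
obs_soup = BeautifulSoup(xml_bytes, 'xml')

# find_all (not findall) returns every <wl> element; collect each v attribute.
obs = [wl['v'] for wl in obs_soup.find_all('wl')]
```

The values come back as strings, so convert them with float() if you need numbers.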

BeautifulSoup4 does not recognize XML tags
