Question

I have the following code (Python 2.7, bs4), which works:

import re
import urllib2
from bs4 import BeautifulSoup
html = urllib2.urlopen("https://www.zidisha.org/microfinance/loan/youmpi/1434.html").read()
soup = BeautifulSoup(html)
tag = soup.find(text=re.compile("On-Time Repayments:")).find_parent("td").find_next_sibling("td")

print type(tag)
for child in tag.children:
    print repr(child)

#Output:
<class 'bs4.element.Tag'>
u'\n'
u'modified by Julia to add number of months repayments were due 15-10-2013'
u'\n\n80% (10)\n\n'

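I'd like to get the third element in the tag, "80% (10)" (stripping it and converting it from unicode are no problem), but when I try myVar = tag.children[2], I get the following error: 'listiterator' object has no attribute '__getitem__'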

I worked around it with myVar = tag.next_element.next_element.next_element.strip(), but I feel like my IDE is judging me.

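I figured that since I can iterate over the children with a list comprehension, I could also use an index to get a specific element, but apparently not. What is the best way to get the third (or, theoretically, the 20th) element of a tag without chaining .next_element calls?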

Answer 1

If you want to access the children as a list, use .contents instead of .children.

See .contents and .children in the Beautiful Soup documentation.
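As a rough illustration of the difference, here is a minimal sketch. The HTML snippet is a made-up stand-in for the real page (assumed to have a similar shape: a label cell followed by a cell holding a newline, a comment, and the text), and the "html.parser" backend is an assumption as well. It also shows itertools.islice as one possible way to pull the n-th item out of .children without building a list. It is written so that it should run on Python 3 too.

```python
from itertools import islice

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: the second <td>
# contains a newline, an HTML comment, and then the text we want.
html = (
    "<table><tr><td>On-Time Repayments:</td>"
    "<td>\n<!-- note -->\n\n80% (10)\n\n</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("td").find_next_sibling("td")

# .contents is a plain list of the tag's direct children, so indexing works.
third = tag.contents[2].strip()

# .children is an iterator; islice skips the first two items and
# next() takes the third, without materializing the whole list.
third_lazy = next(islice(tag.children, 2, None)).strip()

print(third)
```

For a small tag, indexing .contents is probably the simplest choice; the islice variant only matters if you specifically want to stay with the iterator.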

bs4 tag.children[2] gives 'listiterator' object has no attribute '__getitem__'
