I'm getting this error:
Traceback (most recent call last):
  File "scr.py", line 18, in <module>
    for title in link.find('a'):
TypeError: 'NoneType' object is not iterable
These are the lines involved:
for each in soup.find_all(attrs={'class' : 'table table-bordered table-custom'}):
    for link in each.find_all('td'):
        for title in link.find('a'):
            print "\033[1;37m%s" % title.text
My code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import sys
if len(sys.argv) < 2:
    print "Falta um paramêtro."
else:
    data = requests.get("http://theanonybay.org/search?q=%s" % sys.argv[1].replace(" ", "%20")).text
    soup = BeautifulSoup(data)
    for each in soup.find_all(attrs={'class' : 'table table-bordered table-custom'}):
        for link in each.find_all('td'):
            for title in link.find('a'):
                print "\033[1;37m%s" % title.text
            for little in link.find_all(attrs={'class' : 'btn btn-flat btn-xs btn-warning'}):
                print "Magnet Link:\033[0;37m", little.get('href'),"\n\n"
Answer 0 (score: 3)
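You are using link.find(), which always returns either a single element or None if no matching element was found.
This means the current td cell contains no a link. You probably shouldn't loop over the link object in any case, even when it is found, because then you are looping over the contents of that element.
You have to explicitly test whether anything was found first: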
a_element = link.find('a')
if a_element:
    # not None, so we can proceed
    ...
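As a minimal sketch of the behaviour described above, the snippet below runs the same check against a small inline table. The markup is made up for illustration, the parser name "html.parser" is my choice, and the code uses Python 3 syntax rather than the Python 2 used in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first cell contains a link, the second does not.
html = '<table><tr><td><a href="/t/1">Some title</a></td><td>1.2 GB</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

titles = []
for cell in soup.find_all("td"):
    a_element = cell.find("a")  # a Tag, or None when the cell has no <a>
    if a_element is None:
        continue  # skip cells without a link instead of iterating over None
    titles.append(a_element.get_text(strip=True))

# Iterating over a found Tag walks its children, not a list of links.
children = list(soup.find("a"))

# Iterating over the None returned for the second cell reproduces the error.
try:
    for _ in soup.find_all("td")[1].find("a"):
        pass
except TypeError as exc:
    error = str(exc)

print(titles)
print(children)
print(error)
```

Only the first cell contributes a title. The second cell is skipped by the None check, which is the step the loop in the question is missing.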
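If you want to find all the link text in a given table, it is usually easier to use CSS selectors; start with each row and drill down from there to get the links: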
for row in soup.select('.table-custom tr'):
    link = row.find('a', text=True)
    if link:
        print "\033[1;37m%s" % link.get_text(strip=True)
    for magnet in row.select('a[href^="magnet:"]'):
        print "Magnet Link:\033[0;37m", magnet['href']
    print
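To try the row-based approach without hitting the site, here is a hedged, self-contained sketch in Python 3. The table markup is an assumption modelled on the class names in the question, not the real page, and it uses string=True, the newer spelling of the text=True argument.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search results page.
html = """
<table class="table table-bordered table-custom">
  <tr>
    <td><a href="/torrent/1">First title</a></td>
    <td><a class="btn btn-flat btn-xs btn-warning" href="magnet:?xt=urn:btih:aaa">Magnet</a></td>
  </tr>
  <tr>
    <td>No links in this row</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

results = []
for row in soup.select(".table-custom tr"):
    # First <a> in the row that directly contains text, or None.
    link = row.find("a", string=True)
    if link is None:
        continue
    # Attribute-prefix selector; the value is quoted because it contains a colon.
    magnets = [a["href"] for a in row.select('a[href^="magnet:"]')]
    results.append((link.get_text(strip=True), magnets))

for title, magnets in results:
    print(title, magnets)
```

Collecting the results into a list before printing keeps the parsing separate from the terminal colour codes, which makes the selectors easier to check on their own.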
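Note that instead of escaping the search query manually, you can leave the escaping to requests by passing the query in the params argument. You should also use the response.content attribute and leave decoding to BeautifulSoup; servers often omit the character set from the Content-Type header, in which case the mandated default for text responses is Latin-1, which is often wrong: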
params = {'q': sys.argv[1]}
response = requests.get("http://theanonybay.org/search", params=params)
soup = BeautifulSoup(response.content)
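The two points above can be illustrated with the standard library alone. This is a sketch in Python 3, not what requests does line for line: urlencode stands in for the form-style encoding applied to params (an assumption about the equivalent behaviour), and the sample string is borrowed from the question's script.

```python
from urllib.parse import urlencode

# Form-style encoding of query parameters: a space becomes "+",
# so no manual .replace(" ", "%20") is needed.
query = urlencode({"q": "some search terms"})
print(query)

# UTF-8 bytes decoded as Latin-1 produce mojibake rather than the original
# text, which is the risk when the charset is guessed instead of detected.
raw = "paramêtro".encode("utf-8")
wrong = raw.decode("latin-1")
right = raw.decode("utf-8")
print(wrong)
print(right)
```

Passing the raw bytes (response.content) to BeautifulSoup lets it look for a charset declaration inside the document itself before settling on an encoding.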