My code:
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "https://realpython.com/practice/profiles.html"
html_page = urlopen(url)
html_text = html_page.read()
soup = BeautifulSoup(html_text)
links = soup.find_all('a', href = True)
files = []
def page_names():
    for a in links:
        files.append(a['href'])
    return files
page_names()
print files[:]
base = "https://realpython.com/practice/"
print base + files[:]
I'm trying to parse out the three page file names, append them to the `files` list, and then somehow append or add them to the end of the base URL for a simple print.
I've tried making `base` a single-item list so I could append to it, but I'm very new to Python and believe I've messed up my statements.
Currently I get:
print files[:]
TypeError: 'type' object has no attribute '__getitem__'
Answer 0 (score: 2)
At the end you wrote `list[:]`, which is wrong: `list` is the built-in type used to create actual lists, not a list instance you can slice, so slicing the bare name raises the `TypeError` you see.
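Beyond the `list[:]` slip, the question's last line, `print base + files[:]`, raises its own error: a string and a list cannot be joined with `+`. A minimal sketch in Python 3 syntax (variable names taken from the question):

```python
# `files[:]` is a copy of the list, so `base + files[:]` is str + list.
base = "https://realpython.com/practice/"
files = ["aphrodite.html", "poseidon.html"]

try:
    base + files[:]
except TypeError as e:
    print("TypeError:", e)  # str and list cannot be concatenated

# Join each filename to the base individually instead:
print([base + f for f in files])
```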
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "https://realpython.com/practice/profiles.html"
html_page = urlopen(url)
html_text = html_page.read()
soup = BeautifulSoup(html_text)
links = soup.find_all('a', href = True)
files = []
def page_names():
    for a in links:
        files.append(a['href'])
page_names()
base = "https://realpython.com/practice/"
for i in files:
    print base + i
Output:
https://realpython.com/practice/aphrodite.html
https://realpython.com/practice/poseidon.html
https://realpython.com/practice/dionysus.html
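As an aside, instead of concatenating strings by hand you can use `urljoin`, which resolves a relative link against a base URL and handles slashes correctly. (Python 3 shown; on Python 2 the import is `from urlparse import urljoin`.)

```python
from urllib.parse import urljoin

base = "https://realpython.com/practice/"
# Resolves the relative filename against the base URL:
print(urljoin(base, "aphrodite.html"))
# → https://realpython.com/practice/aphrodite.html
```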
You don't need an intermediate function or list to store the links or files; just use a list comprehension.
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = "https://realpython.com/practice/profiles.html"
html_page = urlopen(url)
html_text = html_page.read()
soup = BeautifulSoup(html_text)
files = [i['href'] for i in soup.find_all('a', href = True)]
base = "https://realpython.com/practice/"
for i in files:
    print base + i
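If you're on Python 3, or BeautifulSoup isn't installed, the standard library's `html.parser` can extract `href` attributes too. A self-contained sketch (the sample HTML string here is made up for illustration; in the real script you would feed it the downloaded page text):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as the parser streams the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

html_text = '<a href="aphrodite.html">A</a> <a href="poseidon.html">P</a>'
collector = LinkCollector()
collector.feed(html_text)

base = "https://realpython.com/practice/"
files = [base + f for f in collector.links]
print(files)
```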