My code is below, but why does brand end up holding External_links instead of the list of items I extracted?
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html,"html.parser")
headline = page_soup.findAll("span",{"class":"mw-headline"})
for item in headline:
    brand = item["id"]  # Outputs "External_links"
Answer 0 (score: 1)
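In your for loop you iterate over every headline on the page and assign each headline's id to the variable brand. Every assignment replaces the previous one, so once the loop ends, brand holds only the last headline ("External_links").
If you modify the code to print the value for each headline, you will see that you are in fact getting the values you want.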
>>> for item in headline:
...     print(item["id"])
...
Plot
Early_years
Voldemort_returns
Supplementary_works
Harry_Potter_and_the_Cursed_Child
In-universe_books
Pottermore_website
Structure_and_genre
Themes
Origins
Publishing_history
Translations
Completion_of_the_series
Cover_art
Achievements
Cultural_impact
Commercial_success
Awards,_honours,_and_recognition
Reception
Literary_criticism
Social_impact
Controversies
Adaptations
Films
Spin-off_prequels
Games
Audiobooks
Stage_production
Attractions
The_Wizarding_World_of_Harry_Potter
The_Making_of_Harry_Potter
References
Further_reading
External_links
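The overwrite described above can be reproduced without any network access. The following is a minimal sketch: `headline_ids` is a hypothetical, shortened stand-in for the ids scraped from the page, and `brands` is an illustrative name that does not appear in the question.

```python
# Hypothetical stand-in for the ids scraped from the page (shortened sample).
headline_ids = ["Plot", "Early_years", "References", "External_links"]

# Rebinding inside the loop: each pass discards the previous value,
# so only the last id survives once the loop finishes.
for item in headline_ids:
    brand = item

print(brand)  # External_links

# Appending inside the loop: every value is kept.
brands = []
for item in headline_ids:
    brands.append(item)

print(brands)
```

The first loop mirrors the code in the question; the second keeps every id because it grows a list instead of rebinding a single name.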
Answer 1 (score: 0)
Your brand variable must be a list. For example, the code could look like this, which prints the collected ids:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
from pprint import pprint
my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
with uReq(my_url) as uClient:
    page_html = uClient.read()
# Parse as HTML; the "xml" parser is meant for XML documents, not web pages.
page_soup = soup(page_html, "html.parser")
# Collect every headline id instead of overwriting a single variable.
brand = []
for item in page_soup.find_all('span', {'class': 'mw-headline'}):
    brand.append(item["id"])
pprint(brand)
Answer 2 (score: 0)
The same result can be achieved with a list comprehension:
import requests
from bs4 import BeautifulSoup
from pprint import pprint
url = 'https://en.wikipedia.org/wiki/Harry_Potter'
soup = BeautifulSoup(requests.get(url).text, "lxml")
items = [item.get('id') for item in soup.find_all('span', class_='mw-headline')]
pprint(items)
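One difference between the answers may be worth noting: `item["id"]` raises `KeyError` when a tag has no `id` attribute, whereas `item.get('id')` returns `None`. The sketch below illustrates this on an inline HTML fragment that is made up for the example (it is not taken from the Wikipedia page), so it runs offline; it assumes `beautifulsoup4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the second span deliberately has no id attribute.
html = (
    '<h2><span class="mw-headline" id="Plot">Plot</span></h2>'
    '<h2><span class="mw-headline">Untitled</span></h2>'
)
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", class_="mw-headline")

# .get() tolerates a missing attribute and yields None for it.
ids = [span.get("id") for span in spans]
print(ids)  # ['Plot', None]

# Indexing would raise KeyError on the second span, so filter first.
ids_present = [span["id"] for span in spans if span.has_attr("id")]
print(ids_present)  # ['Plot']
```

Which form is preferable depends on whether a missing id should be skipped silently or treated as an error.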