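I'm trying to scrape the English version of a Japanese website. The problem is that the Japanese and English versions share the same URL. Is there a way to tell BeautifulSoup to scrape the English version instead of the Japanese one?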
The link I want to scrape:
Answer 0 (score: 2)
To demonstrate that adding the lang=en URL query parameter does work:
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975"
>>> english_url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975&lang=en"
>>>
>>> print(BeautifulSoup(requests.get(url).content, "html.parser").find(class_="team-name").get_text(strip=True))
サガン鳥栖
>>> print(BeautifulSoup(requests.get(english_url).content, "html.parser").find(class_="team-name").get_text(strip=True))
Sagan Tosu
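If the URLs are generated rather than typed by hand, the lang parameter can be appended with the standard library instead of string concatenation. The following is only a sketch: with_lang is a hypothetical helper name, not something from the site or from requests, and it assumes the site keeps honoring lang=en as shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_lang(url, lang="en"):
    """Return url with its lang query parameter set to the given value.

    Hypothetical helper for illustration; any existing lang value is replaced.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous lang value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "lang"
    ]
    query.append(("lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_lang("https://data.j-league.or.jp/SFMS02/?match_card_id=17975"))
# prints https://data.j-league.or.jp/SFMS02/?match_card_id=17975&lang=en
```

Passing params={"lang": "en"} to requests.get should have the same effect for a single request, since requests merges params into the existing query string.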
Note that you can also send the SFCM01LANG cookie with the value en:
>>> url = "https://data.j-league.or.jp/SFMS02/?match_card_id=17975"
>>> response = requests.get(url, cookies={'SFCM01LANG': 'en'})
>>> soup = BeautifulSoup(response.content, "html.parser")
>>> print(soup.find(class_="team-name").get_text(strip=True))
Sagan Tosu
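When scraping many pages, it may be more convenient to set the cookie once on a requests.Session so that every request carries it. This is a minimal sketch rather than part of the original answer: make_english_session and team_name are hypothetical names, and it assumes the SFCM01LANG cookie keeps selecting the language as shown above.

```python
import requests
from bs4 import BeautifulSoup


def make_english_session():
    """Create a session that sends SFCM01LANG=en with every request."""
    session = requests.Session()
    # Cookie name and value are taken from the answer above.
    session.cookies.set("SFCM01LANG", "en")
    return session


def team_name(session, url):
    """Fetch url and return the text of the first element with class team-name."""
    response = session.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    return soup.find(class_="team-name").get_text(strip=True)
```

Usage would look like team_name(make_english_session(), url), reusing the same session for every match page.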