Scraping part of a website in Python

Time: 2018-01-29 17:35:59

Tags: python web-scraping beautifulsoup telegram

I want to scrape part of this website, http://warframe.wikia.com/wiki/, specifically the description. The command should reply with:

"Equipped with a diverse selection of arrows, deft hands, shroud of stealth and an exalted bow, the crafty Ivara infiltrates hostile territory with deception and diversion, and eliminates threats with a shot from beyond. Ivara emerged in Update 18.0."

and nothing else. Maybe there is a way to select just the <p> I want to print. Right now I have this, but it does not return what I want.

import requests
from bs4 import BeautifulSoup

req = requests.get('http://warframe.wikia.com/wiki/Ivara')
soup = BeautifulSoup(req.text, "lxml")
for sub_heading in soup.find_all('p'):
    print(sub_heading.text)

2 answers:

Answer 0: (score: 2)

You can use the index of the target paragraph and get the required text as

print(soup.select('p')[4].text.strip())

or use the text "Release Date:" as an anchor to reach the neighbouring paragraph:

print(soup.findAll('b', text="Release Date:")[0].parent.next_sibling.text.strip())
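Both one-liners raise an exception when the page layout differs from what they expect (for example an IndexError when nothing matches). Below is a minimal sketch of a more defensive variant of the second approach. It runs against a small inline HTML sample that is only an assumption standing in for the real page, and the helper name text_after_label is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the wiki page; the real markup may differ.
SAMPLE_HTML = """
<div>
<p><b>Release Date:</b> some date</p>
<p>Equipped with a diverse selection of arrows.</p>
</div>
"""


def text_after_label(html, label):
    """Return the text of the <p> that follows the <p> holding <b>label</b>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    bold = soup.find("b", string=label)
    if bold is None or bold.parent is None:
        return None
    # find_next_sibling("p") skips whitespace text nodes, unlike .next_sibling
    paragraph = bold.parent.find_next_sibling("p")
    if paragraph is None:
        return None
    return paragraph.get_text(strip=True)


print(text_after_label(SAMPLE_HTML, "Release Date:"))
print(text_after_label(SAMPLE_HTML, "No Such Label:"))
```

Returning None instead of raising lets the caller (for instance a bot command) decide what to reply when a page has no such label.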

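Answer 1: (score: 1)

Building on the solution from @Andersson (which does not work for every hero, because not all of them have a Release Date) and the comment from @SIM, here is a solution that generalizes to any hero/champion (or whatever they are called in that game).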

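import requests
from bs4 import BeautifulSoup

name = 'Ivara'
url = 'http://warframe.wikia.com/wiki/' + name
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
# The description lives inside the first <div class="tabbertab">
main_text = soup.find('div', class_='tabbertab')
# Find the <b> holding the hero name and print its enclosing paragraph
print(main_text.find('b', text=name).parent.text.strip())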

Output:

Equipped with a diverse selection of arrows, deft hands, shroud of stealth and an exalted bow, the crafty Ivara infiltrates hostile territory with deception and diversion, and eliminates threats with a shot from beyond. Ivara emerged in Update 18.0.

For other heroes, just change the name variable.

Another example, using name = 'Volt', outputs:

Volt has the power to wield and bend electricity. He is highly versatile, armed with powerful abilities that can damage enemies, provide cover and complement the ranged and melee combat of his cell. The electrical nature of his abilities make him highly effective against the Corpus, with their robots in particular. He is one of three starter options for new players.

Explanation

If you inspect the page, you can see that there is only one <b>hero-name</b> element inside the <div class="tabbertab" ... > tag. So you can use that <b>...</b> element to find the text you need.
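As a rough sketch of the lookup described above, the same steps can be wrapped in a function that returns None when the expected elements are missing instead of raising AttributeError. The function name hero_description and the inline sample HTML are assumptions made for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the wiki markup.
SAMPLE_HTML = """
<div class="tabbertab" title="Main">
  <p>Equipped with an exalted bow, the crafty <b>Ivara</b> infiltrates hostile territory.</p>
</div>
"""


def hero_description(html, name):
    """Return the paragraph text that contains <b>name</b> inside the tabbertab div."""
    soup = BeautifulSoup(html, "html.parser")
    tab = soup.find("div", class_="tabbertab")
    if tab is None:
        return None  # page has no tabbertab container
    bold = tab.find("b", string=name)
    if bold is None:
        return None  # hero name is not bolded inside the container
    return bold.parent.get_text().strip()


print(hero_description(SAMPLE_HTML, "Ivara"))
print(hero_description(SAMPLE_HTML, "Volt"))
```

With a live page, the html argument would be the r.text value from the requests.get call in the answer.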