Using Python

Time: 2017-05-31 09:25:37

Tags: python web-scraping beautifulsoup

I wrote a Python script to scrape the phone names from this URL: https://www.jumia.com.ng/mobile-phones/

Here is my script:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.jumia.com.ng/mobile-phones/'
uClient = uReq(my_url)  # open the connection and grab the page
page_html = uClient.read()  # load the content into a variable
uClient.close()  # close the connection
page_soup = soup(page_html, "html.parser")  # parse the HTML
phone_name = page_soup.findAll("span", {"class": "name"})  # grab each phone name
print(phone_name)

My expected result should look like this:

Marathon M5 Mini 5.0-Inch IPS (2GB, 16GB ROM) Android 5.1 Lollipop, 13MP + 8MP Smartphone - Grey

But what I get is:

<span class="name" dir="ltr">Marathon M5 Mini 5.0-Inch IPS (2GB, 16GB ROM) Android 5.1 Lollipop, 13MP + 8MP Smartphone - Grey</span>.

How do I extract the text from this <span class="name" dir="ltr">Marathon M5 Mini 5.0-Inch IPS (2GB, 16GB ROM) Android 5.1 Lollipop, 13MP + 8MP Smartphone - Grey</span>?

1 Answer:

Answer 0: (score: 0)

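To extract the name, use .text on each tag. findAll returns tag objects, and printing a tag prints its full markup, while .text returns only the text inside it.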

>>> for phone_name in page_soup.findAll("span",{"class":"name"}):
        print(phone_name.text)

Boom J8 5.5 Inch (2GB, 16GB ROM) Android Lollipop 5.1 13MP + 5MP Smartphone - White (MWFS)
Marathon M5 Mini 5.0-Inch IPS (2GB, 16GB ROM) Android 5.1 Lollipop, 13MP + 8MP Smartphone - Grey

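The difference between printing a tag and printing its text can be checked offline. This is a minimal sketch, assuming the markup copied from the question is representative of the page; `sample_html` is a name made up for illustration and nothing is fetched from the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a single span copied from the question's output.
sample_html = (
    '<span class="name" dir="ltr">Marathon M5 Mini 5.0-Inch IPS (2GB, 16GB ROM) '
    'Android 5.1 Lollipop, 13MP + 8MP Smartphone - Grey</span>'
)

page_soup = BeautifulSoup(sample_html, "html.parser")
tag = page_soup.find("span", {"class": "name"})

tag_markup = str(tag)  # the whole element, including <span ...> and </span>
tag_text = tag.text    # only the text inside the element

print(tag_markup)
print(tag_text)
```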
So your script should be:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.jumia.com.ng/mobile-phones/'
uClient = uReq(my_url)  # open the connection and grab the page
page_html = uClient.read()  # load the content into a variable
uClient.close()  # close the connection
page_soup = soup(page_html, "html.parser")  # parse the HTML
for phone_name in page_soup.findAll("span", {"class": "name"}):
    print(phone_name.text)  # print only the text inside each span
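
If the names are needed as data rather than printed output, the parsing step can be separated from the download. This is a rough sketch and not the answer's original code: `fetch_html` and `extract_names` are hypothetical helper names, `find_all` is the current Beautiful Soup 4 spelling of `findAll`, and the sample markup is made up on the assumption that names sit in `span` elements with class `name`.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:  # the connection is closed automatically
        return response.read()


def extract_names(html):
    """Return the text of every <span class="name"> in the given HTML."""
    page_soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True)
            for span in page_soup.find_all("span", {"class": "name"})]


# Offline check with made-up markup, so no network access is needed here.
sample_html = """
<span class="name" dir="ltr"> Phone A </span>
<span class="price">100</span>
<span class="name" dir="ltr">Phone B</span>
"""
names = extract_names(sample_html)
print(names)
```

For the live page, `extract_names(fetch_html(my_url))` would apply the same parsing to the downloaded HTML. Whether the site still serves this markup to a plain `urlopen` request is not something the question confirms.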