Scraping mobile phone specifications from a web page

Time: 2017-11-30 09:53:53

Tags: python xpath web-scraping

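I am trying to get the details of all the mobile phones listed on a web page, such as name, price and specifications. I get the names and prices successfully, but the specifications come out wrong. There are 24 phone listings, and when I fetch the specifications they all end up in a single list. I can't find a proper way to separate them according to the phone they belong to. Any help would be appreciated. Here is the function definition: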

def get_link(self, link):
    page = requests.get(link)
    tree = html.fromstring(page.content)
    # One text node per phone listing.
    name = tree.xpath("//div[@class='_3wU53n']/text()")
    print(name)
    time.sleep(5)
    # [1::2] keeps every second text node of the price elements.
    price = tree.xpath("//div[@class='_1vC4OE _2rQ-NK']/text()")[1::2]
    print(price)
    time.sleep(5)
    # This query runs over the whole page, so the specifications
    # of all phones come back as one flat list.
    highlights = tree.xpath("//ul[@class='vFw0gD']/li/text()")
    print(highlights)

    '''
    dictionary={}
    for i in range(len(name)):
        dictionary[name[i]]=price[i]
    print dictionary


    return
    '''

The link passed in is: https://www.flipkart.com/mobiles-accessories/mobiles/pr?count=40&otracker=categorytree&p%5B%5D=sort%3Dpopularity&sid=tyy%2F4io

The output so far is:

['Mi A1 (Black, 64 GB)', 'Redmi Note 4 (Gold, 32 GB)', 'Mi A1 (Rose Gold, 64 GB)', 'Redmi Note 4 (Gold, 64 GB)', 'Redmi Note 4 (Black, 32 GB)', 'Honor 9i (Graphite Black, 64 GB)', 'Redmi Note 4 (Black, 64 GB)', 'Moto E4 Plus (Fine Gold, 32 GB)', 'Moto E4 Plus (Iron Gray, 32 GB)', 'Intex Aqua 5.5 VR (Champagne, White, 8 GB)', 'Lenovo K8 Plus (Venom Black, 32 GB)', 'Redmi Note 4 (Dark Grey, 64 GB)', 'Panasonic Eluga Ray (Gold, 16 GB)', 'Moto C Plus (Pearl White, 16 GB)', 'Moto C Plus (Starry Black, 16 GB)', 'Moto C Plus (Fine Gold, 16 GB)', 'Lenovo K8 Plus (Fine Gold, 32 GB)', 'Panasonic Eluga Ray 700 (Champagne Gold, 32 GB)', 'Panasonic Eluga I5 (Gold, 16 GB)', 'OPPO F5 (Black, 64 GB)', 'Lenovo K8 Plus (Fine Gold, 32 GB)', 'Moto X4 (Super Black, 64 GB)', 'Swipe ELITE Sense- 4G with VoLTE', 'Swipe ELITE Sense- 4G with VoLTE']


['14,999', '9,999', '14,999', '11,999', '9,999', '17,999', '11,999', '9,999', '9,999', '4,499', '9,999', '11,999', '6,999', '6,999', '6,999', '6,999', '9,999', '9,999', '6,499', '24,990', '10,999', '22,999', '5,555', '5,555']


['4 GB RAM | 64 GB ROM | Expandable Upto 128 GB', '5.5 inch Full HD Display', '12MP + 12MP Dual Rear Camera | 5MP Front Camera', '3080 mAh Li-polymer Battery', 'Qualcomm Snapdragon 625 64 bit Octa Core 2GHz Processor', 'Android Nougat 7.1.2 | Stock Android Version', 'Android One Smartphone - with confirmed upgrades to Android Oreo and Android P', 'Brand Warranty of 1 Year Available for Mobile and 6 Months for Accessories', .....]

1 Answer:

Answer 0 (score: 0)

Give this a try. I think it produces the output you expect, printing the name, price and specification of each listing:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.flipkart.com/mobiles/pr?count=40&otracker=categorytree&p=sort%3Dpopularity&sid=tyy%2C4io')
soup = BeautifulSoup(res.text, "lxml")
for items in soup.select("._1UoZlX"):
    name = items.select("._3wU53n")[0].text
    price = items.select("._1vC4OE._2rQ-NK")[0].text
    specifics = ' '.join([item.text for item in items.select(".tVe95H")])
    print("Name: {}\nPrice: {}\nSpecification: {}\n".format(name,price,specifics))
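
Since the question uses XPath, the same per-listing idea can also be sketched with relative paths: select each listing container first, then query the name, price and specifications inside that container only. The sketch below is an illustration under assumptions, not a tested scraper for the live site. It runs on a small hypothetical sample with the standard library's xml.etree.ElementTree so that it needs no network access; the class names are the ones quoted in this post and may no longer match the page, and SAMPLE, group_by_phone and the placeholder specification are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real page is much larger and is not
# guaranteed to be well-formed XML; the class names are taken from the post.
SAMPLE = """
<div>
  <div class="_1UoZlX">
    <div class="_3wU53n">Mi A1 (Black, 64 GB)</div>
    <div class="_1vC4OE _2rQ-NK">14,999</div>
    <ul class="vFw0gD">
      <li class="tVe95H">4 GB RAM | 64 GB ROM | Expandable Upto 128 GB</li>
      <li class="tVe95H">5.5 inch Full HD Display</li>
    </ul>
  </div>
  <div class="_1UoZlX">
    <div class="_3wU53n">Redmi Note 4 (Gold, 32 GB)</div>
    <div class="_1vC4OE _2rQ-NK">9,999</div>
    <ul class="vFw0gD">
      <li class="tVe95H">(placeholder specification)</li>
    </ul>
  </div>
</div>
""".strip()


def group_by_phone(markup):
    """Return one dict per listing, each with its own list of specs."""
    root = ET.fromstring(markup)
    phones = []
    # Step 1: find each listing container.
    for card in root.findall(".//div[@class='_1UoZlX']"):
        # Step 2: search relative to the container (leading "."), so only
        # this phone's elements are matched.
        name = card.findtext(".//div[@class='_3wU53n']")
        price = card.findtext(".//div[@class='_1vC4OE _2rQ-NK']")
        specs = [li.text for li in card.findall(".//ul[@class='vFw0gD']/li")]
        phones.append({"name": name, "price": price, "specs": specs})
    return phones


phones = group_by_phone(SAMPLE)
for phone in phones:
    print("Name: {}\nPrice: {}\nSpecification: {}\n".format(
        phone["name"], phone["price"], " ".join(phone["specs"])))
```

With lxml, as in the question, the corresponding change would be to loop over the container elements and call card.xpath(".//ul[@class='vFw0gD']/li/text()") on each one, instead of querying tree from the document root with "//".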