Running Python for multiple URLs gives no output

Date: 2018-01-25 18:28:33

Tags: python python-3.x web-scraping

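I am trying to run a scraper against this website. The code works when I use a single URL, but as soon as I add several URLs it produces no output. I need it to go through the different URLs and scrape the information from each one.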

import requests
import csv
from bs4 import BeautifulSoup
from html.parser import HTMLParser
from time import sleep
from random import randint
import urllib.request

r=requests.get('https://www.qiagen.com/us/products/a-z-list/#&&s=Ascending&pg=55&q=&l=')
c=r.content
s=BeautifulSoup(c,"html.parser")


product_urls = ['https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-precursor-assays/#orderinginformation', 
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-primer-assay-plate/#orderinginformation', 
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-primer-assays/#orderinginformation', 
'https://www.qiagen.com/us/shop/genes-and-pathways/technology-portals/browse-qpcr/mirna-gene-expression/mirna-isolation/miscript-single-cell-qpcr-kit/#orderinginformation']

for url in product_urls:
    page = urllib.request.urlopen(url)
    s = BeautifulSoup(page,"html.parser")

getall = s.find_all("div",{"class":"gene_globe_segment_0_OrderingInfoPane"})
getall

for i in getall:
    product_name = (i.find('div',{'class':'title'}).text.strip())
    product_discription = (i.find('div',{'class': 'copy'}).text.strip())
    product_number = (i.find('td',{'class': 'textLeft paddingTopLess'}).text.strip())
    cat_number = (i.find('td',{'class': 'textRight paddingTopLess'}).text.strip())
    product_price = (i.find('td',{'class': 'textRight paddingTopLess priceSingle'}).text.strip())

for i in getall:
    print(i.find('div',{'class':'title'}).text.strip()) #product name
    print(i.find('div',{'class': 'copy'}).text.strip()) #product description
    print(i.find('td',{'class': 'textLeft paddingTopLess'}).text.strip()) #product number
    print(i.find('td',{'class': 'textRight paddingTopLess'}).text.strip()) #cat number
    print(i.find('td',{'class': 'textRight paddingTopLess priceSingle'}).text.strip()) #product price

    print(' ')

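1 answer:

Answer 0: (score: 0)

Your script has several problems. You used the wrong class name for the main container. The script is also indented incorrectly: the parsing code sits outside the `for url in product_urls:` loop, so it runs once, after the loop has finished, and only ever sees the last page that was fetched. Finally, you need to adjust the selectors so that they work across the different pages. I have reduced the printed fields to two to keep the demonstration clear. I tried to clean up the script, and the modified version pasted below works.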
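
The indentation problem can be reproduced without touching the network. The following is only a minimal sketch: `FAKE_PAGES`, `fetch`, `parse_outside_loop` and `parse_inside_loop` are hypothetical names invented for this illustration, and upper-casing the text merely stands in for the real parsing step.

```python
# Hypothetical stand-in for downloading: maps each URL to fake page text.
FAKE_PAGES = {"url-a": "page A", "url-b": "page B", "url-c": "page C"}


def fetch(url):
    """Pretend to download a page (assumption: no real HTTP request is made)."""
    return FAKE_PAGES[url]


def parse_outside_loop(urls):
    """Mirrors the question: the loop only fetches, processing happens after it."""
    for url in urls:
        page = fetch(url)
    # This line runs once, after the loop, so it only sees the last page.
    return [page.upper()]


def parse_inside_loop(urls):
    """Mirrors the fix: each page is processed inside the loop body."""
    results = []
    for url in urls:
        page = fetch(url)
        results.append(page.upper())
    return results


urls = list(FAKE_PAGES)
print(parse_outside_loop(urls))  # ['PAGE C']
print(parse_inside_loop(urls))   # ['PAGE A', 'PAGE B', 'PAGE C']
```

With the processing outside the loop, every page except the last is downloaded and then discarded, which is consistent with a script that works for one URL but not for several.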

Here you go:

import requests
from bs4 import BeautifulSoup

product_urls = [
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-precursor-assays/#orderinginformation', 
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-primer-assay-plate/#orderinginformation', 
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-primer-assays/#orderinginformation', 
]

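for URL in product_urls:
    # Download each product page in turn.
    page = requests.get(URL)
    soup = BeautifulSoup(page.text,"lxml")  # the "lxml" parser requires the lxml package

    # Parse inside the loop so that every page is processed, not only the last one.
    for item in soup.select(".content"):
        product_name = item.select_one('.title').text.strip()
        product_discription = item.select_one('.copy').text.strip()
        print("Name: {}\n\nDescription: {}\n\n".format(product_name,product_discription))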
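
One possible refinement, sketched here under assumptions: `select_one` returns `None` when nothing matches, so a `.content` block without a `.title` or `.copy` child would raise `AttributeError` in the loop above. The helper `extract_products` and the `SAMPLE_HTML` markup are hypothetical and only imitate the class names used in the script; the real pages may be structured differently. The built-in `html.parser` is used here so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the .content / .title / .copy class names;
# this is an assumption made for illustration, not the site's real HTML.
SAMPLE_HTML = """
<div class="content">
  <div class="title"> Kit A </div>
  <div class="copy"> For 10 reactions </div>
</div>
<div class="content">
  <div class="title">Kit B</div>
</div>
"""


def extract_products(html):
    """Return (name, description) pairs, skipping incomplete containers."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select(".content"):
        title = item.select_one(".title")
        copy = item.select_one(".copy")
        if title is None or copy is None:
            # select_one returns None when nothing matches, so calling
            # .text on the result would raise AttributeError.
            continue
        products.append((title.text.strip(), copy.text.strip()))
    return products


for name, description in extract_products(SAMPLE_HTML):
    print("Name: {}\nDescription: {}".format(name, description))
```

In the script above, the same helper could be called as `extract_products(page.text)` inside the `for URL in product_urls:` loop.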