Here is my code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url='https://www.chembid.com/results/?q=124-07-2&sort=price'
# open the connection and grab the page
uClient=uReq(my_url)
page_html=uClient.read()
uClient.close()
#html parser
page_soup=soup(page_html,"html.parser")
for Container in Containers:
    name=Container.div.div.span
    title_container=Container.findAll("a",{"class":"supplier"})
    supplier=title_container[0].text
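What I want to do now is use bs4's findAll
>>> cas_no=Container.findAll("span",{"class":"regular-small-regular-small-font block"})
on content like this:
Factory supply high quality 99%min caprylic acid/octanoic acid CAS 124-07-2, used for manufacture of dyes, drugs, perfumes Verifizierter Anbieter-> -> 山东宝维能源科技有限公司 China CAS No.: 124-07-2 Quality/Grade: Agriculture Grade, Electron Grade, Food Grade, Industrial Grade, Medicine Grade, Reagent Grade www.alibaba.com $ 0.25-3.68 per kg, FOB Show offer
What I am looking for is the name, supplier, CAS no., quality and price.
Thanks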
Answer 0 (score: 0)
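The first thing I notice is that you try to iterate over Containers, but that name is never assigned anything. You need to store the result of a search in it first, and only then iterate over it.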
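As a small illustration of that point, here is a sketch that runs offline. The HTML snippet is made up for the example; only the class names are taken from this thread, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the class names
# mirror the ones used in this thread, the content is invented.
sample_html = """
<div class="result-horizontal-wrapper"><a class="supplier">Supplier A</a></div>
<div class="result-horizontal-wrapper"><a class="supplier">Supplier B</a></div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# Store the search result in a variable first...
containers = page_soup.find_all("div", {"class": "result-horizontal-wrapper"})

# ...and only then iterate over it.
suppliers = [c.find("a", {"class": "supplier"}).text for c in containers]
print(suppliers)  # ['Supplier A', 'Supplier B']
```

Without the find_all line, the loop would raise a NameError because containers would not exist yet.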
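Hopefully someone will post a more robust solution, but for the fields you listed, this pulls them from that particular page. Some parts are not present on every result, so I had to account for that and fall back to a placeholder ('n/a' or None) when they are missing. Still, this should get you going: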
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import pandas as pd

results = pd.DataFrame()
my_url = 'https://www.chembid.com/results/?q=124-07-2&sort=price'

# open the connection and grab the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# parse the HTML
page_soup = soup(page_html, "html.parser")

# each search result sits in its own wrapper div
containers = page_soup.find_all('div', {'class': "result-horizontal-wrapper"})
for container in containers:
    name = container.div.div.span.text

    # the supplier link is not present on every result
    supplier_tag = container.find('a', {'class': 'supplier'})
    supplier = supplier_tag.text if supplier_tag else 'n/a'

    # CAS no. and quality use the same span class, so tell them apart by their label text
    span_cas_quality = container.find_all('span', {'class': 'regular-small-font block'})
    cas_no = [x.text for x in span_cas_quality if 'CAS' in x.text]
    quality = [x.text for x in span_cas_quality if 'Quality/Grade' in x.text]
    cas_no = cas_no[0] if cas_no else None
    quality = quality[0] if quality else None

    span_price = container.select('span.black-bold-font-big')[0].text
    span_rate = container.select('span.block.regular-small-font.price')[0].text

    temp_df = pd.DataFrame([[name, supplier, cas_no, quality, span_price, span_rate]],
                           columns=['name', 'supplier', 'cas_no', 'quality', 'price', 'rate'])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    results = pd.concat([results, temp_df], ignore_index=True)
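The "build a list, then check whether it is empty" step used for cas_no and quality can be folded into one small helper. This is only a sketch: first_containing is a hypothetical name, not part of bs4 or pandas, and the sample strings are made up to resemble the span texts on the page.

```python
def first_containing(texts, keyword):
    """Return the first string in texts that contains keyword, or None."""
    return next((t for t in texts if keyword in t), None)


# Made-up span texts resembling what the result spans hold.
span_texts = ["CAS No.: 124-07-2", "Quality/Grade: Food Grade"]

cas_no = first_containing(span_texts, "CAS")
quality = first_containing(span_texts, "Quality/Grade")
missing = first_containing(span_texts, "Purity")  # label that is not present

print(cas_no, quality, missing, sep=" | ")
```

In the scraper, the list passed in would be something like [x.text for x in span_cas_quality], so a missing field comes back as None without a separate if/else for each one.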