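I am trying to scrape a website by collecting the links to all of its different pages in a list (unspsc_list), but I can't get the scraping to work; it only prints the links to the console.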
Answer 0 (score: 0)
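I guess you want to run the loop body for every link in the list unspsc_link, i.e. retrieve the HTML of each page and extract the table from it. If that is the case, you can start with the following small modifications to your code: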
import pandas as pd
import requests
from io import StringIO
from bs4 import BeautifulSoup

df_Conversion = pd.DataFrame(columns = ['Ab','Unit of Measure', 'Conversion', 'Net/Gross Weight (lbs)', 'Volume (cubic ft)', 'Shipping Dimensions (inch) L x W x H', 'GTIN'])
myList = ['ZOL890080401','ZOL89004004','ZOL89000180','ZOL3502111001','ZOL21110201CC','ZIM750000300','ZIM607500007','ZIM515047501','ZIM387400300','ZIM197800100','ZIM160201000','ZIM160200500','ZIG2516','ZIG1934S','ZIG1933S','ZIG1925S','XOM8229506','WTLSS1','WTLSB2','WTLRHS001','WTLOS1','WTLERSHLFM','WTLDIVAJ6','WSD909165','WOL7507040','WOL7507030','WOL7505730','WOL66084LW40','WOL66083LW40','WOL66083LW30']
# the next line replaces your for loop
# with a list comprehension (the two are equivalent, so you can
# choose whichever you prefer)
unspsc_link = [f"https://www.medline.com/sku/item/MDP{i}" for i in myList]
# the line link = requests.get(unspsc_link).text doesn't work,
# because requests.get expects a single URL, not a list,
# so request the URLs one at a time in a loop
for url in unspsc_link:
    link = requests.get(url).text
    soup = BeautifulSoup(link, 'lxml')
    # the rest is essentially your code;
    # please check whether, with the above modifications,
    # it does what you want it to do. If not, maybe you can
    # add some more information about what doesn't work
    # and what it should do
    div1 = soup.find('div', {'class': 'medSKUPriceData'})
    div = soup.find('div', {'class': 'medSKUFltRt'})
    right_table3 = div.find('table', {'class': 'medSKUTableDetails table-striped uomTable'}) if div is not None else None
    if div1 is None or right_table3 is None:
        # the expected elements are missing on this page: skip it
        continue
    SKUDATA = [div1.text.strip()]
    # newer pandas versions expect a file-like object instead of a literal HTML string
    df3 = pd.read_html(StringIO(str(right_table3)))[0]
    # DataFrame.append was removed in pandas 2.0, so the frames are built directly
    df_ProductId = pd.DataFrame(SKUDATA, columns=['Ab'])
    df_Unit_of_Measurment = df3.copy()
    df_Unit_of_Measurment.columns = ['Unit of Measure', 'Conversion', 'Net/Gross Weight (lbs)', 'Volume (cubic ft)', 'Shipping Dimensions (inch) L x W x H', 'GTIN']
    # stack the product number on top of the unit-of-measure rows, forward-fill
    # 'Ab' into every row below it, then drop the first row (it holds only 'Ab')
    df_y = pd.concat([df_ProductId, df_Unit_of_Measurment], ignore_index=True)
    df_y = df_y.ffill()
    df_y = df_y[1:]
    # accumulate the rows of this product; don't slice df_Conversion here,
    # because inside the loop df_Conversion[1:] would drop a row on every iteration
    df_Conversion = pd.concat([df_Conversion, df_y], ignore_index=True)
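The concat/ffill/slice steps in the loop only serve to attach the product number to every unit-of-measure row. A minimal sketch of a simpler alternative, assuming each page yields one product number and one table with the same six columns as `df3`: insert the number as an `Ab` column directly and concatenate all per-product frames once after the loop. The helper name `combine_tables` and the sample tables and product numbers are hypothetical stand-ins for what `div1.text.strip()` and `pd.read_html` would return.

```python
import pandas as pd

COLUMNS = ['Unit of Measure', 'Conversion', 'Net/Gross Weight (lbs)',
           'Volume (cubic ft)', 'Shipping Dimensions (inch) L x W x H', 'GTIN']


def combine_tables(results):
    """Build one DataFrame from (product number, unit-of-measure table) pairs.

    `results` is assumed to be a list of (sku, table) tuples, where each
    table has exactly six columns (like df3 in the loop above).
    """
    frames = []
    for sku, table in results:
        table = table.copy()
        table.columns = COLUMNS          # same renaming as in the loop
        table.insert(0, 'Ab', sku)       # put the product number on every row
        frames.append(table)
    if not frames:                       # pd.concat raises on an empty list
        return pd.DataFrame(columns=['Ab'] + COLUMNS)
    # one concat at the end instead of growing a DataFrame on every iteration
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample tables standing in for pd.read_html(...)[0]
t1 = pd.DataFrame([['EA', '1', '0.5', '0.1', '1 x 1 x 1', '111'],
                   ['CS', '12', '6.0', '1.2', '10 x 8 x 6', '222']])
t2 = pd.DataFrame([['EA', '1', '0.2', '0.05', '2 x 2 x 1', '333']])

df_Conversion = combine_tables([('SKU-A', t1), ('SKU-B', t2)])
print(df_Conversion[['Ab', 'Unit of Measure', 'GTIN']])
```

In the scraping loop you would then append `(div1.text.strip(), df3)` to a list instead of building `df_y`, and call `combine_tables` once after the loop has finished.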
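Because `requests.get` is called once per URL, a single failing page (timeout, connection error, HTTP 404/500) raises an exception and stops the whole loop. A minimal sketch, assuming you would rather skip such pages than abort: reuse one `requests.Session`, pass a timeout, and call `raise_for_status()`. The function name `fetch_pages` and the 10-second default timeout are illustrative choices, not something from the question.

```python
import requests


def fetch_pages(urls, session=None, timeout=10):
    """Return {url: html} for every URL that could be downloaded.

    Failed requests (connection errors, timeouts, 4xx/5xx responses)
    are reported and skipped instead of aborting the loop.
    """
    session = session or requests.Session()
    pages = {}
    for url in urls:
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx status codes into an exception
        except requests.RequestException as exc:
            print(f"skipping {url}: {exc}")
            continue
        pages[url] = response.text
    return pages
```

With this helper, `for url, link in fetch_pages(unspsc_link).items():` can replace the loop header and the `requests.get(url).text` line, and the rest of the loop body stays the same.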