如何通过传递列表来抓取网站以请求抓取每个请求的链接

时间:2019-08-31 06:28:16

标签: python beautifulsoup

我试图通过在包含(unspsc_list)列表的列表中包含不同网站的网站的所有链接的列表来抓取网站,但是我无法抓取,而只是在控制台上打印链接

ulElement.insertBefore(fragment, ulElement.firstChild);

1 个答案:

答案 0 :(得分:0)

我想,您想对列表unspsc_link中的所有链接执行循环主体,为它们检索html并过滤掉其中的表。如果确实如此,那么您可以对代码进行以下一些小的修改来开始:

df_Conversion = pd.DataFrame(columns = ['Ab','Unit of Measure', 'Conversion', 'Net/Gross Weight (lbs)', 'Volume (cubic ft)', 'Shipping Dimensions (inch) L x W x H', 'GTIN']) 
myList = ['ZOL890080401','ZOL89004004','ZOL89000180','ZOL3502111001','ZOL21110201CC','ZIM750000300','ZIM607500007','ZIM515047501','ZIM387400300','ZIM197800100','ZIM160201000','ZIM160200500','ZIG2516','ZIG1934S','ZIG1933S','ZIG1925S','XOM8229506','WTLSS1','WTLSB2','WTLRHS001','WTLOS1','WTLERSHLFM','WTLDIVAJ6','WSD909165','WOL7507040','WOL7507030','WOL7505730','WOL66084LW40','WOL66083LW40','WOL66083LW30']

# the next line replaces your for loop
# by a list comprehension (it's just equivalent, so you can
# chooose, whatever you like more)
unspsc_link = [f"https://www.medline.com/sku/item/MDP{i}" for i in myList]
# the line link = requests.get(unspsc_link).text doesn't work I guess
# because get expects a single URL I guess

for url in unspsc_link:
    link = requests.get(url).text
    soup = BeautifulSoup(link, 'lxml')

    # the rest is actually your code, 
    # please check if it does, what you
    # want it to do with the above modifications
    # if not, maybe you can add some more infos
    # about what doesn't work and what it should
    # do
    SKUDATA = []
    div1 = soup.find('div', {'class': 'medSKUPriceData'})
    SKUDATA.append(div1.text.strip())

    div = soup.find('div', {'class': 'medSKUFltRt'})
    right_table3 = div.find('table', {'class': 'medSKUTableDetails table-striped uomTable'})

    df3 =  pd.read_html(str(right_table3))[0]

    df2 = pd.DataFrame(SKUDATA)
    df_ProductId = pd.DataFrame()
    df_ProductId = df_ProductId.append(df2, ignore_index = True)
    df_ProductId.columns = ['Ab']

    df_Unit_of_Measurment = pd.DataFrame(columns=['Unit of Measure', 'Conversion', 'Net/Gross Weight (lbs)', 'Volume (cubic ft)', 'Shipping Dimensions (inch) L x W x H', 'GTIN'])
    df_Unit_of_Measurment = df_Unit_of_Measurment.append(df3, ignore_index = True)
    df_Unit_of_Measurment.columns = ['Unit of Measure', 'Conversion', 'Net/Gross Weight (lbs)', 'Volume (cubic ft)', 'Shipping Dimensions (inch) L x W x H', 'GTIN']

    df_y = pd.DataFrame()
    df_y = pd.concat([df_ProductId,df_Unit_of_Measurment], ignore_index=True)
    df_y = df_y.fillna(method='ffill')
    df_y = df_y[1:]

    df_Conversion = df_Conversion.append(df_y, ignore_index = True)

    df_Conversion = df_Conversion[1:]