I have a list of tuples that I need to get into a CSV file using pandas, but I don't know how. I tried putting them into a DataFrame, but that didn't work. Here is the list I want to write to the CSV.
tables = soup.find_all("div", {"class": "pane"})[0].find_all("table")

if len(tables) > 4:
    product_list = [
        (
            t[0].findAll("div", {"class": "headline"})[0].text.strip(),  # title
            t[0].findAll("div", {"class": "copy"})[0].text.strip(),      # description
            t[1].text.strip(),  # product number
            t[2].text.strip(),  # category number
            t[3].text.strip()   # price
        )
        for t in (t.find_all('td') for t in tables[4].find_all('tr'))
        if t
    ]
elif len(tables) == 1:
    product_list = [
        (
            t[0].findAll("div", {"class": "catNo"})[0].text.strip(),     # catNo
            t[0].findAll("div", {"class": "headline"})[0].text.strip(),  # headline
            t[0].findAll("div", {"class": "price"})[0].text.strip(),     # price
            t[0].findAll("div", {"class": "copy"})[0].text.strip()       # description
        )
        for t in (t.find_all('td') for t in tables[0].find_all('tr'))
        if t
    ]
else:
    print("could not parse main product\n\n")
    time.sleep(timeDelay)

print(product_list)
time.sleep(timeDelay)

if len(tables) > 5:
    add_product_list = [
        (
            t[0].findAll("div", {"class": "title"})[0].text.strip(),  # title
            t[0].findAll("div", {"class": "copy"})[0].text.strip(),   # description
            t[1].text.strip(),  # product number
            t[2].text.strip(),  # category number
            t[3].text.strip()   # price
        )
        for t in (t.find_all('td') for t in tables[5].find_all('tr'))
        if t
    ]
    print(add_product_list)
    time.sleep(timeDelay)
I have pandas imported, but I don't know what to put into the DataFrame, since the items aren't each named for a specific field; they're all lumped together in the tuples. Any help would be great, as this is one of the first scrapes I've ever done. Thanks!
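Something along these lines is roughly what I was picturing, but I don't know if it's the right approach (the column names here are just my own guesses for the five fields in each tuple):

import pandas as pd

# column names below are only placeholders for the five fields in each tuple
df = pd.DataFrame(product_list, columns=['title', 'description', 'product_number', 'category_number', 'price'])
df.to_csv('Qiagen_Scrape_final.csv', index=False)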
Here is also the first part of the script, with the HTML/URLs I'm scraping.
from bs4 import BeautifulSoup
import requests
import time
import random
import csv
import pandas as pd
f = pd.DataFrame
filename = "Qiagen_Scrape_final.csv"
f = open(filename, "w")
headers = "product_name, product discription, Cat No, product number, price\n"
f.write('headers')
product_urls =[
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-precursor-assays/#orderinginformation',
'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-primer-assay-plate/#orderinginformation',
]
Answer 0: (score: 0)
Here is how you can populate a DataFrame with this data using dictionaries:
from bs4 import BeautifulSoup
import requests
import time
import random
import pandas as pd

product_urls = [
    'https://www.qiagen.com/us/shop/pcr/primer-sets/miscript-primer-assay-plate/#orderinginformation'
]

html = requests.get(product_urls[0]).text
soup = BeautifulSoup(html, 'lxml')
container = soup.find('table')
tables = container.find_all('tr')

dic_list = []
for t in tables[13:]:  # the product rows start after the header/spec rows
    data = t.find_all('td')
    dic = {}
    try:
        dic['title'] = data[0].find('div').text
        dic['description'] = data[0].find("div", {"class": "copy"}).text
        dic['prod_number'] = data[1].text
        dic['cat_number'] = data[2].text
        dic['price'] = data[3]['price']  # price is read from the td's 'price' attribute
    except:
        pass  # rows without the expected cells just end up with missing fields
    dic_list.append(dic)

df = pd.DataFrame(dic_list)
print(df.sample(3))
    cat_number description  price prod_number                              title
25  MS00064316   (ZmU65-2)   93.1      218300   Zm_U65-2_1 miScript Primer Assay
13  MS00064232   (ZmU49-1)   93.1      218300   Zm_U49-1_1 miScript Primer Assay
26  MS00064323  (OssnoR28)   93.1      218300  Os_snoR28_1 miScript Primer Assay
Finally, save your csv file like this:
df.to_csv('sample_csv.csv', index=False)
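If you want one CSV covering every URL in your product_urls list, you could wrap the parsing in a function and concatenate the results. A rough sketch, reusing the imports and product_urls from the snippet above and assuming every ordering-information page uses the same table layout and row offset as the plate page (I have only checked that one):

def parse_products(url):
    soup = BeautifulSoup(requests.get(url).text, 'lxml')
    rows = soup.find('table').find_all('tr')
    dic_list = []
    for t in rows[13:]:
        data = t.find_all('td')
        dic = {}
        try:
            dic['title'] = data[0].find('div').text
            dic['description'] = data[0].find("div", {"class": "copy"}).text
            dic['prod_number'] = data[1].text
            dic['cat_number'] = data[2].text
            dic['price'] = data[3]['price']
        except:
            pass
        dic_list.append(dic)
    return dic_list

all_rows = []
for url in product_urls:
    all_rows.extend(parse_products(url))
    time.sleep(random.uniform(1, 3))  # small pause between requests

pd.DataFrame(all_rows).to_csv('Qiagen_Scrape_final.csv', index=False)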