I am using Beautiful Soup to scrape a table from a website and extract only specific columns to a CSV file.
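import requests
import urllib.request
from bs4 import BeautifulSoup

# `browser` is not defined in this snippet; it is presumably a browser
# driver object (e.g. Selenium) created earlier.
product_table = browser.page_source
soup = BeautifulSoup(product_table, 'html.parser')
table = soup.find_all('table')[4]  # the fifth <table> on the page
table_rows = table.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]  # text of every cell in this row
    print(row)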
Output of print(row):
['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group', 'Gummi Sour Bears 12 Flavor', '12', '7 oz', '17.14', 'CS', '53328', '', 'ACG53328', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group', 'Gummi Bears 12 Flavor', '12', '7.5 oz', '17.14', 'CS', '53348', '', 'ACG53348', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group', 'Gummi Mini Worms 12 Flavor', '12', '7.5 oz', '17.14', 'CS', '53350', '', 'ACG53350', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Bears 12 Flavor', '6', '9 oz', '11.59', 'CS', '53380', '', 'ACG53380', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Mini Worms 12 Flavor', '6', '9 oz', '11.59', 'CS', '53381', '', 'ACG53381', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Peach Rings', '6', '8 oz', '11.59', 'CS', '53383', '', 'ACG53383', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Worms Mini Sour Neon', '6', '8 oz', '11.59', 'CS', '53384', '', 'ACG53384', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Bears 12 Flavor', '12', '3.5 oz', '8.23', 'CS', '53450', '', 'ACG53450', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'ALBANESE CONFEC', "Albanese World's Best", 'Gummi Sherbet Bears 12 Flavor', '12', '3.5 oz', '8.23', 'CS', '53456', '', 'ACG53456', '', '\xa0\xa0\xa0\xa0']
['', 'CANDY', 'AMERICAN LICORI', 'Red Vines', 'Red Vines Orig Red Twists Bag', '12', '8 oz', '19.20', 'CS', '00232', '', 'AML00232', '', '\xa0\xa0\xa0\xa0']
So my question is: how do I extract only cells [7] and [11] from each row and write them side by side to a CSV file? For example, for row 1 I want to write ACG53328 (cell A) and 17.14 (cell B) to the CSV file, and continue down from there. In case it makes a difference, there are about 4,000 more rows that I have not pasted here.
Answer 0 (score: 0)
Something like the following should work:
import csv
import requests
import urllib.request
from bs4 import BeautifulSoup

# `browser` is assumed to exist already, as in the question.
product_table = browser.page_source
soup = BeautifulSoup(product_table, 'html.parser')
table = soup.find_all('table')[4]

with open('output.csv', 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['SKU', 'TD_7'])

    for tr in table.find_all('tr'):
        tds = tr.find_all('td')
        # Indices are zero-based and match the printed rows in the question:
        # [11] is the SKU (e.g. ACG53328) and [7] is the price (e.g. 17.14).
        try:
            td_11 = tds[11].get_text(strip=True)
        except IndexError:
            td_11 = ""  # row has fewer cells, e.g. a header row
        try:
            td_07 = tds[7].get_text(strip=True)
        except IndexError:
            td_07 = ""

        writer.writerow([td_11, td_07])
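To check the column selection without opening the page, here is a minimal sketch that runs the same idea on two of the rows printed in the question, using only the standard library. The helper `pick_columns` is a hypothetical name introduced for this illustration, and `io.StringIO` stands in for the real `output.csv` file. Unlike the loop above, this sketch skips rows that produce no values at all; that is a design choice, not something the question requires.

```python
import csv
import io


def pick_columns(row, indices, default=""):
    """Return the cells at `indices`, using `default` where the row is too short.

    Hypothetical helper for this sketch; not part of bs4 or csv.
    """
    return [row[i].strip() if i < len(row) else default for i in indices]


# Two rows copied from the question's output, plus an empty row such as a
# header <tr> that contains no <td> cells would produce.
rows = [
    ['', 'CANDY', 'ALBANESE CONFEC', 'Albanese Confectionery Group',
     'Gummi Sour Bears 12 Flavor', '12', '7 oz', '17.14', 'CS', '53328',
     '', 'ACG53328', '', '\xa0\xa0\xa0\xa0'],
    ['', 'CANDY', 'AMERICAN LICORI', 'Red Vines',
     'Red Vines Orig Red Twists Bag', '12', '8 oz', '19.20', 'CS', '00232',
     '', 'AML00232', '', '\xa0\xa0\xa0\xa0'],
    [],
]

buffer = io.StringIO()  # stand-in for open('output.csv', 'w', newline="")
writer = csv.writer(buffer)
writer.writerow(['SKU', 'TD_7'])
for row in rows:
    cells = pick_columns(row, (11, 7))  # SKU first, then price
    if any(cells):  # skip rows where both cells are empty
        writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```

With real data, each `row` would be the list built from `tr.find_all('td')`, and `buffer` would be replaced by the opened CSV file.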