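I'm trying to scrape a table that appears to contain nested sub-tables. My first attempt was a loop that descends further into the sub-tables, but I'm having trouble parsing it. The site is https://www.brilliantearth.com/design-your-own-engagement-ring/?nav=1, and I'm using BeautifulSoup in Python to extract the table's elements so I can store them in a Pandas DataFrame. Each row in the table has a tag called "inner item".
There seems to be one table for the column headers and a separate table that lists all the prices. How do I parse the cell elements so I can put them into lists?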
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.brilliantearth.com/design-your-own-engagement-ring/?nav=1'
# Download the page before parsing it
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# Find tables
all_tables=soup.find_all('table')
all_tables
# Select table with content we want
right_table = all_tables[-1]
right_table
# Generate lists
COMPARE=[]
SHAPE=[]
CARAT=[]
COLOR=[]
CLARITY=[]
CUT=[]
ORIGIN=[]
PRICE=[]
# Add elements from cells to corresponding list
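for row in right_table.find_all("tr"):
    cells = row.find_all("td")
    # Skip header or spacer rows that do not have all eight columns
    if len(cells) < 8:
        continue
    # Assumes the cells appear in the same order as the lists above
    COMPARE.append(cells[0].get_text(strip=True))
    SHAPE.append(cells[1].get_text(strip=True))
    CARAT.append(cells[2].get_text(strip=True))
    COLOR.append(cells[3].get_text(strip=True))
    CLARITY.append(cells[4].get_text(strip=True))
    CUT.append(cells[5].get_text(strip=True))
    ORIGIN.append(cells[6].get_text(strip=True))
    PRICE.append(cells[7].get_text(strip=True))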
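
For reference, here is a minimal sketch of the end result I'm after: going from the row cells to a Pandas DataFrame. It runs on a small made-up HTML string rather than the live page, so `SAMPLE_HTML`, its values, the `COLUMNS` names, and the `table_to_frame` helper are all assumptions for illustration. The real page's markup and column order may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (an assumption): one table
# holds the column headers, a second table holds the data rows.
SAMPLE_HTML = """
<table><tr><th>Compare</th><th>Shape</th><th>Carat</th><th>Color</th>
<th>Clarity</th><th>Cut</th><th>Origin</th><th>Price</th></tr></table>
<table>
  <tr><td></td><td>Round</td><td>0.30</td><td>J</td><td>SI2</td>
      <td>Ideal</td><td>Canada</td><td>$500</td></tr>
  <tr><td></td><td>Oval</td><td>0.50</td><td>H</td><td>VS1</td>
      <td>Very Good</td><td>Botswana</td><td>$1,200</td></tr>
  <tr><td colspan="8">spacer row</td></tr>
</table>
"""

# Hypothetical column names, in the order the cells are assumed to appear.
COLUMNS = ["compare", "shape", "carat", "color",
           "clarity", "cut", "origin", "price"]


def table_to_frame(html):
    """Parse the last <table> in `html` into a DataFrame, one row per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    data_table = soup.find_all("table")[-1]
    records = []
    for row in data_table.find_all("tr"):
        cells = row.find_all("td")
        # Ignore rows that do not have exactly one cell per column.
        if len(cells) != len(COLUMNS):
            continue
        records.append([cell.get_text(strip=True) for cell in cells])
    return pd.DataFrame(records, columns=COLUMNS)


df = table_to_frame(SAMPLE_HTML)
print(df)
```

Collecting one list per row and passing them all to `pd.DataFrame` avoids keeping eight separate lists in sync, and the length check keeps header or spacer rows from raising an `IndexError`.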