Using BeautifulSoup

Time: 2018-04-15 14:54:36

Tags: python pandas beautifulsoup screen-scraping

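I'm trying to scrape a table that appears to contain nested sub-tables. I first tried writing a loop that descends into the sub-tables, but I'm having trouble parsing them. The site is https://www.brilliantearth.com/design-your-own-engagement-ring/?nav=1, and I'm using BeautifulSoup in Python to extract the table's elements and store them in a Pandas DataFrame. Each row in the table has a tag named "inner item".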
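One common source of trouble with nested tables is that `find_all('tr')` searches all descendants, so the rows of an outer table include the rows of every sub-table inside it. The sketch below uses a small hypothetical HTML snippet in place of the real page (whose markup isn't shown here) and assumes the data rows carry the class attribute `inner item` mentioned above. Passing `recursive=False` limits the search to direct children, and a CSS selector picks out only the rows that have both classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table whose cell holds a nested sub-table.
# The class names "inner item" are assumed from the question's description.
HTML = """
<table id="outer">
  <tr><td>
    <table class="sub">
      <tr class="inner item"><td>Round</td><td>0.30</td></tr>
      <tr class="inner item"><td>Oval</td><td>0.41</td></tr>
    </table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
outer = soup.find("table", id="outer")

# The default find_all() is recursive: it returns the outer row AND the nested rows.
all_rows = outer.find_all("tr")

# recursive=False only inspects direct children of the outer <table>
# (html.parser does not insert a <tbody>, so the <tr> is a direct child here).
top_rows = outer.find_all("tr", recursive=False)

# CSS selector matching <tr> elements that have both the "inner" and "item" classes.
item_rows = soup.select("tr.inner.item")
data = [[td.get_text(strip=True) for td in row.find_all("td")] for row in item_rows]

print(len(all_rows), len(top_rows))  # 3 1
print(data)  # [['Round', '0.30'], ['Oval', '0.41']]
```

On the real page the tag names, classes and nesting depth may differ, so the selector would need to be adjusted after inspecting the HTML that `requests` actually receives.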

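It seems the column headers are in a separate table, and another table lists all the prices. How can I parse the cell elements so they can be put into lists?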
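If the headers really live in one table and the data rows in another, one approach is to read the header texts from the first table, read each row's cell texts from the second, and let pandas line them up by position. This is a sketch on hypothetical markup: the ids `header-table` and `body-table` and the sample values are invented for illustration, and it assumes both tables have the same columns in the same order.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: headers in one table, data rows in another.
HTML = """
<table id="header-table">
  <tr><th>Shape</th><th>Carat</th><th>Price</th></tr>
</table>
<table id="body-table">
  <tr><td>Round</td><td>0.30</td><td>$540</td></tr>
  <tr><td>Oval</td><td>0.41</td><td>$720</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Column names come from the <th> cells of the header table.
headers = [th.get_text(strip=True)
           for th in soup.find(id="header-table").find_all("th")]

# Each <tr> of the body table becomes one list of cell strings.
rows = []
for tr in soup.find(id="body-table").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == len(headers):  # skip rows that don't line up with the headers
        rows.append(cells)

df = pd.DataFrame(rows, columns=headers)
print(df)
```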

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.brilliantearth.com/design-your-own-engagement-ring/?nav=1'

# Download the page and parse its HTML
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Find all tables on the page
all_tables = soup.find_all('table')

# Select the table with the content we want (assumed to be the last one)
right_table = all_tables[-1]

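# Generate one list per column
COMPARE = []
SHAPE = []
CARAT = []
COLOR = []
CLARITY = []
CUT = []
ORIGIN = []
PRICE = []

# Add the text of each cell to the corresponding list
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    # Skip header rows (which use <th>) and rows without a full set of cells
    if len(cells) < 8:
        continue

    COMPARE.append(cells[0].get_text(strip=True))
    SHAPE.append(cells[1].get_text(strip=True))
    CARAT.append(cells[2].get_text(strip=True))
    COLOR.append(cells[3].get_text(strip=True))
    CLARITY.append(cells[4].get_text(strip=True))
    CUT.append(cells[5].get_text(strip=True))
    ORIGIN.append(cells[6].get_text(strip=True))
    PRICE.append(cells[7].get_text(strip=True))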
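Instead of keeping eight parallel lists, each row can be collected as one list and passed to `pd.DataFrame` along with the column names, which keeps the columns aligned even when a row is skipped. This is a sketch on a hypothetical table standing in for `right_table`: the column names are taken from the lists above, the sample values are invented, and the assumption that the real cells appear in this order has not been checked against the site. The helper `table_to_frame` is a hypothetical name, not part of BeautifulSoup or pandas.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["COMPARE", "SHAPE", "CARAT", "COLOR", "CLARITY", "CUT", "ORIGIN", "PRICE"]

# Hypothetical markup standing in for right_table: one header row, two data rows.
HTML = """
<table>
  <tr><th>Compare</th><th>Shape</th><th>Carat</th><th>Color</th>
      <th>Clarity</th><th>Cut</th><th>Origin</th><th>Price</th></tr>
  <tr><td></td><td>Round</td><td>0.30</td><td>G</td><td>VS2</td>
      <td>Ideal</td><td>Botswana</td><td>$1,000</td></tr>
  <tr><td></td><td>Oval</td><td>0.41</td><td>H</td><td>SI1</td>
      <td>Very Good</td><td>Canada</td><td>$1,250</td></tr>
</table>
"""

def table_to_frame(table, columns):
    """Hypothetical helper: one DataFrame row per <tr> whose <td> count matches."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == len(columns):  # the header row has no <td>, so it is skipped
            rows.append(cells)
    return pd.DataFrame(rows, columns=columns)

soup = BeautifulSoup(HTML, "html.parser")
df = table_to_frame(soup.find("table"), COLUMNS)

# Strip "$" and "," so prices become numbers that can be sorted or filtered.
df["PRICE"] = pd.to_numeric(df["PRICE"].str.replace(r"[$,]", "", regex=True))
print(df[["SHAPE", "CARAT", "PRICE"]])
```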
