BeautifulSoup - how to iterate over an entire HTML page, for each

Time: 2017-11-03 23:04:50

Tags: python beautifulsoup

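I am using BeautifulSoup to modify table elements. More specifically, I am adding a class to the tbody and td elements. This works, but only for the first matching element. I can't figure out how to iterate over the other matching elements on the page.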

soup = BeautifulSoup(combine_html, "html.parser")
soup.find('tbody')['class'] = 'list'
soup.find('td')['class'] = 'fuzzy'
soup

This produces the following changes:

<tbody> changes to <tbody class="list"> 
The first <td> changes to <td class="fuzzy">

~~~UPDATE~~~

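I haven't received any responses, so maybe I didn't post my question with the right tags, or the answer is so simple that nobody bothered to post it.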

I was able to get this working, but it's really ugly. See the code below:

import csv
import pandas as pd
# import numpy as np
from bs4 import BeautifulSoup, Tag, NavigableString

# Select columns from csv file
csv_columns = ['Email', 'Recipient Name', 'Department', 'Clicked Link?']

# Set input csv file to read from and specify columns using csv_columns variable
df = pd.read_csv('camp1_beneficiary_fullcsv.csv', skipinitialspace=True, usecols=csv_columns)

# Set the HTML header
# Set Bootstrap CSS
# Set CSS location for list.min.js Javascript - mainly the list class
# Set div id for list.min.js
html_header="""
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.0/css/bootstrap.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.0/css/bootstrap-theme.min.css">
<link rel="stylesheet" href="def.css">
<div id="users">
  <input class="search" placeholder="Search" />
  <button class="sort" data-sort="em">
    Sort by name
  </button>
"""
# Set HTML 'footer'
# Specify list.min.js external javascript file and code

html_footer ="""
<script src="list.min.js"></script>
<script>
var options = {
  valueNames: [ 'fuzzy' ]
};
var userList = new List('users', options);
</script>

"""

# Generate HTML body using df.to_html from Pandas
html_body = df.to_html(classes=["table-bordered", "table-striped", "table-hover"])

# Combine html header, body, and footer into variable
combine_html = (html_header + html_body + html_footer)

# Find elements in HTML and add classes to support javascript classes for filtering

soup = BeautifulSoup(combine_html, "html.parser")
soup.find('tbody')['class'] = 'list'
soup

f = open('test.html','w')
f.write(str(soup))
f.close()

f = open('test.html', 'r')
filedata = f.read()
f.close()

newdata = filedata.replace("<td>", "<td class='fuzzy'>")

f = open('final.html', 'w')
f.write(newdata)
f.close()

1 Answer:

Answer 0 (score: 0)

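Use the find_all function, which returns every matching element rather than only the first one. Here is the documentation.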

for td in soup.find_all('td'):
    td['class'] = "fuzzy"
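
For completeness, here is a minimal sketch of how the same loop could cover both the tbody and td elements from the question, which would make the write/read/replace round trip through test.html unnecessary. The helper name add_table_classes and the sample_html string are hypothetical and only there for illustration; the class names 'list' and 'fuzzy' are taken from the question.

```python
from bs4 import BeautifulSoup


def add_table_classes(html, tbody_class="list", td_class="fuzzy"):
    """Return html with a class set on every <tbody> and every <td>.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")

    # find() returns only the first match; find_all() returns all of them.
    for tbody in soup.find_all("tbody"):
        tbody["class"] = tbody_class

    for td in soup.find_all("td"):
        td["class"] = td_class

    return str(soup)


# Small stand-in for the combined HTML built in the question (assumed data).
sample_html = (
    "<table><tbody>"
    "<tr><td>a@example.com</td><td>Alice</td></tr>"
    "<tr><td>b@example.com</td><td>Bob</td></tr>"
    "</tbody></table>"
)

result = add_table_classes(sample_html)
print(result)
```

Note that assigning to tag['class'] overwrites any classes the element already has. In the question's script, the returned string could be written straight to final.html in a single step, since the td elements no longer need a separate string replace.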