Question

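I'm looking for a way to remove all of the repeated header rows, which carry the HTML class "thead", from the rows I extract from the table. Here is my code up to the point where I run into the problem: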

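from urllib.request import urlopen
from bs4 import BeautifulSoup

for yr in years:
    try:
        url = 'https://www.pro-football-reference.com/years/' + yr + '/passing.htm'
        html = urlopen(url)

        soup = BeautifulSoup(html, "lxml")
        # Column names come from the first <tr> on the page
        column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
        # Every row of the #passing table except the first; the repeated "thead" rows are still included here
        table_rows = soup.select("#passing tr")[1:]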

Answer 1

Since the rows you want have no class at all, while the rows you don't want look like this:

<tr class="thead">

you can simply use this to get all the rows you want:

table_rows = soup.find('table', id='passing').find_all('tr', class_=None)[1:]

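Passing class_=None matches only tags that have no class attribute, so every row with a class name, including the "thead" rows, is skipped.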
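
If some data rows on the page turn out to carry a class of their own, class_=None would drop them as well. A more explicit variant is to filter out only the rows whose class list contains "thead". The sketch below is a minimal illustration of that idea, not the answer's method: SAMPLE_HTML is a made-up miniature of the table, data_rows is a hypothetical helper name, and "html.parser" is used only so the example runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the stats table; the real page has many more columns.
SAMPLE_HTML = """
<table id="passing">
  <tr><th>Rk</th><th>Player</th></tr>
  <tr><td>1</td><td>Player A</td></tr>
  <tr class="thead"><th>Rk</th><th>Player</th></tr>
  <tr><td>2</td><td>Player B</td></tr>
</table>
"""


def data_rows(soup, table_id="passing"):
    """Return the table's rows, minus the first header row and any 'thead' rows."""
    table = soup.find("table", id=table_id)
    rows = table.find_all("tr")[1:]  # skip the first header row, as in the question
    # "class" is a multi-valued attribute, so bs4 returns it as a list of names
    return [row for row in rows if "thead" not in row.get("class", [])]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
players = [[cell.get_text() for cell in row.find_all("td")] for row in data_rows(soup)]
print(players)
```

In the question's loop, this would amount to replacing the table_rows line with table_rows = data_rows(soup).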
