How to properly scrape a website and get all td text from it

Time: 2020-11-03 07:48:51

Tags: python web web-scraping beautifulsoup python-requests

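I am new to Python. Does anyone know what the [1:] is for in sum(int(td.text) for td in soup.select('td:last-child')[1:]), or what [0] or [1] would do in its place? I have seen it in many scraping examples that use a for loop. While practicing I wrote the code below, but I cannot get all of the data into the CSV file. Thanks in advance, and sorry for asking two questions at once.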

import requests
from bs4 import BeautifulSoup
import csv

url = "https://iplt20.com/stats/2020/most-runs"

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html5lib')

lst = []

table = soup.find('div', attrs={'class': 'js-table'})

# for row in table.findAll('div', attrs={'class': 'top-players__player-name'}):
#     score = {}
#     score['Player'] = row.a.text.strip()
#     lst.append(score)

for row in table.findAll(class_='top-players__m top-players__padded '):
    score = {}
    score['Matches'] = int(row.td.text)
    lst.append(score)

filename = 'iplStat.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['Player', 'Matches'])
    w.writeheader()
    for score in lst:
        w.writerow(score)

print(lst)

1 answer:

Answer 0 (score: 0)

You don't even need all of that. Just use pandas:

import requests
import pandas as pd

url = "https://iplt20.com/stats/2020/most-runs"

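r = requests.get(url)

# read_html returns a list of DataFrames, one per table found; [0] takes the first
df = pd.read_html(r.content)[0]

# index=False keeps the DataFrame's row index out of the CSV
df.to_csv("iplStats.csv", index=False)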
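As for the first part of the question: `[0]`, `[1]` and `[1:]` are ordinary Python list indexing and slicing. `pd.read_html` returns a list of DataFrames, and `soup.select(...)` returns a list-like result, so the same notation works on both. The sketch below uses a plain list of strings as a stand-in for the cell texts. The values are made up for illustration, and the first entry is assumed to be a header cell, which is the usual reason for skipping it with `[1:]` before converting to `int`.

```python
# Stand-in for the text of each matched <td>; the first entry plays the
# role of a header cell. These values are made up for illustration.
cells = ["Runs", "670", "618", "548"]

first = cells[0]    # the first item
second = cells[1]   # the second item
rest = cells[1:]    # a new list with everything after the first item

# Same shape as the expression in the question: skip the header, then add up.
total = sum(int(text) for text in rest)

print(first, second, rest, total)
```

Without the `[1:]`, `int("Runs")` would raise a `ValueError`, because the header text is not a number.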

[Screenshot of the CSV file written by the pandas code]
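If you would rather keep the BeautifulSoup loop, one likely reason the CSV in the question comes out incomplete is that each dict appended to `lst` holds only the 'Matches' key, so `csv.DictWriter` leaves the 'Player' column empty. Running the commented-out loop as well would not fix this, since names and match counts would land in separate rows. A minimal sketch of one way around it, assuming the two loops yield names and counts in the same order: collect them in two lists and pair them with `zip`. The `players` and `matches` values here are hypothetical placeholders, not data from the site.

```python
import csv
import io

# Hypothetical values standing in for what the two scraping loops would
# collect; they are not real figures from the site.
players = ["Player A", "Player B", "Player C"]
matches = ["14", "17", "16"]

# Build one dict per player so every row carries both columns.
rows = [
    {"Player": name, "Matches": int(count)}
    for name, count in zip(players, matches)
]

# Write to an in-memory buffer here; swap in open(filename, "w", newline="")
# to produce a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Player", "Matches"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Note that `zip` stops at the shorter list, so a mismatch in the number of names and counts would silently drop rows. Comparing the two lengths first is a cheap sanity check.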