I want to crawl https://www.esportsearnings.com/tournaments under certain conditions and then export the result to CSV. The conditions are: the script should follow the text of the pagination links (the <a href links) so that multiple pages are handled automatically (for example, after scraping the first page it should automatically scrape pages 2, 3, 4, and so on).
import bs4 as bs
import urllib.request
import pandas as pd

source = urllib.request.urlopen('https://www.esportsearnings.com/tournaments').read()
soup = bs.BeautifulSoup(source, 'lxml')
table = soup.find('table')
table_rows = table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
I am new to Python and could not get all of the conditions working. The code above only scrapes the data from the first page. I want it to handle multiple pages automatically and export the result to CSV. Can anyone help?
Answer 0 (score: 1)
import requests
import xlsxwriter
from bs4 import BeautifulSoup
from IPython.core.interactiveshell import InteractiveShell

# Show the output of every expression when running in a Jupyter notebook
InteractiveShell.ast_node_interactivity = "all"

# Excel workbook that the scraped links will be written to
workbook = xlsxwriter.Workbook('C:/Users/Desktop/data.xlsx')
worksheet = workbook.add_worksheet()

row = 0
column = 0
linkrow = 0

urls = ["https://www.esportsearnings.com/tournaments"]  # add more URLs here

for i in urls:
    page = requests.get(i)
    soup = BeautifulSoup(page.content, 'html.parser')
    # write the href of every link on the page into column 0, one per row
    for link in soup.find_all('a'):
        a = link.get('href')
        worksheet.write(linkrow, 0, a)
        print(link.get('href'))
        linkrow += 1

workbook.close()

# for link in soup.find_all('td'):
#     print(link.get_text())
Try this code.
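Note that the code above writes the href of every link on the page to an Excel file; it does not follow the pagination links or export to CSV, which is what the question asks for. Below is a minimal, untested sketch of one way to do both with requests, BeautifulSoup, and the csv module. The helper names (scrape_table, find_next_link), the output file name tournaments.csv, the MAX_PAGES limit, and the 'Next'/'>' link texts are illustrative assumptions; check what the site's next-page link actually looks like and adjust the match accordingly.

import csv
import urllib.parse

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.esportsearnings.com/tournaments"
MAX_PAGES = 5                     # safety limit so the crawl cannot loop forever
OUTPUT_FILE = "tournaments.csv"   # hypothetical output path


def scrape_table(soup):
    """Return the rows of the first table on the page as lists of cell text."""
    table = soup.find('table')
    if table is None:
        return []
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:                 # skip header rows that only contain <th>
            rows.append(cells)
    return rows


def find_next_link(soup):
    """Return the href of the link whose text looks like a 'next page' link.

    The link texts matched here are an assumption; inspect the real page
    and change them to whatever the site actually uses.
    """
    for a in soup.find_all('a', href=True):
        if a.get_text(strip=True) in ('Next', 'Next Page', '>'):
            return a['href']
    return None


all_rows = []
url = BASE_URL
for _ in range(MAX_PAGES):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    all_rows.extend(scrape_table(soup))

    next_href = find_next_link(soup)
    if next_href is None:
        break                                   # no further pages found
    url = urllib.parse.urljoin(url, next_href)  # handle relative hrefs

# write everything that was collected to a single CSV file
with open(OUTPUT_FILE, 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(all_rows)

print('Wrote', len(all_rows), 'rows to', OUTPUT_FILE)

If you prefer pandas (already imported in the question), pandas.read_html can parse the table on each page and pandas.concat(...).to_csv(...) can replace the manual parsing and CSV writing.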