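I want to store in a data frame all the teams entered in the NHL $30K Finnish Flash contest on January 10, 2019. So far I can only store the teams shown on the first page. Also, if a user entered two different teams, his highest-ranked team gets stored twice... Here is my code: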
#Packages:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
import time
# Driver
chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)
# DataFrame that will be used later
results = pd.DataFrame()
calendar=[]
calendar.append("2019-01-10")
for d in calendar:
    driver.get("https://rotogrinders.com/resultsdb/date/"+d+"/sport/4/")
    time.sleep(10)
    contest= driver.find_element_by_xpath("//*[@id='root']/div/main/main/div[2]/div[3]/div/div/div[1]/div/div/div/div/div[3]")
    contest.click()
    # Collect every link on the page
    list_links = driver.find_elements_by_tag_name('a')
    hlink=[]
    for ii in list_links:
        hlink.append(ii.get_attribute("href"))
    # Keep only the links that point to a contest
    sub="https://rotogrinders.com/resultsdb"
    con= "contest"
    contest_list=[]
    for text in hlink:
        if sub in text:
            if con in text:
                contest_list.append(text)
    c=contest_list[2]
    driver.get(c)
    WebDriverWait(driver, 60).until(ec.presence_of_element_located((By.XPATH, './/tbody//tr//td//span//a[text() != ""]')))
    # Get tables to get the user names
    tables = pd.read_html(driver.page_source)
    users_df = tables[0][['Rank','User']]
    users_df['User'] = users_df['User'].str.replace(' Member', '')
    # Iterate through users and fill the results dataframe
    for i, row in users_df.iterrows():
        rank = row['Rank']
        user = row['User']
        # Find the user name and click on the name
        user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" %(user))[0]
        user_link.click()
        # Get the lineup table after clicking on the user name
        tables = pd.read_html(driver.page_source)
        lineup = tables[1]
        # Restructure to put into results dataframe
        lineup.loc[9, 'Name'] = lineup.iloc[9]['Salary']
        lineup.loc[10, 'Name'] = lineup.iloc[9]['Pts']
        temp_df = pd.DataFrame(lineup['Name'].values.reshape(-1, 11),
                               columns=lineup['Pos'].iloc[:9].tolist() + ['Total_$', 'Total_Pts'] )
        temp_df.insert(loc=0, column = 'User', value = user)
        temp_df.insert(loc=0, column = 'Rank', value = rank)
        temp_df["Date"]=d
        results = results.append(temp_df)
    results = results.reset_index(drop=True)
driver.close()
So, I would like to:
1) Iterate through all the pages:
I did find the next-page button with:
next_button = driver.find_elements_by_xpath("//button[@type='button']")
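However, I could not work that step into my for loop.
2) Access a different user_link when a user has entered the contest more than once. I thought this could perhaps be done with a for loop over each user's frequency, like this: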
users_df.groupby("User").count()
for i in range(users_df[user,"Number"]):
    user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" %(user))[i]
    user_link.click()
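However, whenever I add these steps I get an error message. Or, if it runs without error, it just skips the part that stores all the teams row by row and quickly closes the driver...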
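Answer 0 (score: 1)
My suggestion:
In your case it is enough to use requests, or any equivalent module, to get the data straight from the server, because the service you are scraping has an API server; for example, check the link. The example uses the first endpoint: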
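A minimal sketch of that idea, assuming the endpoint returns a JSON list and hands out the entries page by page. The URL and the query parameter names (API_URL, contest, page, size) are placeholders I made up, not the real API; look up the actual ones in your browser's network tab. It uses urllib from the standard library as the "equivalent module"; requests.get(url).json() would do the same job as fetch_json.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL endpoint: replace it with the real one from the network tab.
API_URL = "https://example.com/api/entries"


def fetch_json(url):
    """Download one URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def collect_entries(contest_id, fetch=fetch_json, page_size=50):
    """Request page after page until the server returns an empty list.

    Assumes each response is a JSON list of entries and that the
    parameter names below match the API (they are placeholders here).
    """
    entries = []
    page = 0
    while True:
        query = urlencode({"contest": contest_id, "page": page, "size": page_size})
        batch = fetch(API_URL + "?" + query)
        if not batch:
            break
        entries.extend(batch)
        page += 1
    return entries
```

If the entries come back as a list of dicts, pd.DataFrame(collect_entries(contest_id)) should give you one row per entry, with no clicking and no next-page button involved. Since each entry would be its own record, a user with two teams should then show up as two distinct rows.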
I hope this makes your task easier.
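If you would rather stay with Selenium, the repeated-user problem might be handled by working out, for each row of the table, which occurrence of that user name it is, and using that number as the index into the list of matching links instead of always taking [0]. A sketch only; occurrence_indices is a helper name I made up, and it assumes the links appear on the page in the same order as the table rows:

```python
from collections import defaultdict


def occurrence_indices(users):
    """For each name, return how many times it has already appeared."""
    seen = defaultdict(int)
    indices = []
    for user in users:
        indices.append(seen[user])
        seen[user] += 1
    return indices


# Example: "bob" entered twice, so his second row gets index 1.
print(occurrence_indices(["bob", "amy", "bob"]))  # [0, 0, 1]
```

In your loop you could store the result with users_df["Occurrence"] = occurrence_indices(users_df["User"]) and then pick driver.find_elements(By.XPATH, "//a[text()='%s']" % user)[row["Occurrence"]]. In pandas, users_df.groupby("User").cumcount() computes the same numbers.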