Web scraping - iterating through table rows with active/inactive states

Time: 2019-01-14 22:05:19

Tags: python selenium web-scraping

I want to scrape the information on the following page: https://rotogrinders.com/resultsdb/date/2019-01-13/sport/4/slate/5c3c66edb1699a43c0d7bba7/contest/5c3c66f2b1699a43c0d7bd0d

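There is a main table with a User column. When you click a user, another table appears next to it showing the team that user entered in the contest. I want to extract the teams of all users, so I need to be able to iterate through every user: click the user, then extract the information from the second table. Here is my code for extracting the first user's team: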

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Path to the local chromedriver binary (Selenium 4 passes it via a Service object)
chromedriver = "C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(service=Service(chromedriver))
DFSteam = []

driver.get("https://rotogrinders.com/resultsdb/date/2019-01-13/sport/4/slate/5c3c66edb1699a43c0d7bba7/contest/5c3c66f2b1699a43c0d7bd0d")
# Wait until the lineup table has been rendered before reading it
Team1 = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "table.ant-table-fixed")))
print(Team1.text)
driver.close()  # close only after the element's text has been read

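However, I can't iterate through the different users. I noticed that when I click a user, the class of that row's tr element switches from inactive to active in the page source, but I don't know how to make use of that. I would also like to store the extracted teams in a dataframe, and I'm not sure whether it is better to do that while scraping or afterwards. The dataframe would look like this: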

  

Rank (Team) / C / C / W / W / W / D / D / G / UTIL / TOTAL ($) / Total Points
1 / Mark Scheifele / Mikael Backlund / Artemi Panarin / Nick Foligno / Michael Frolik / Mark Giordano / Zach Werenski / Connor Hellebuyck / Brandon Tanev / 50 000 / 54.60
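One way to use the active/inactive switch described above is to treat it as a wait condition: after clicking a user, wait until that row's tr reports itself as active, then read the second table. This is a sketch under assumptions: it assumes the class tokens are literally "active" and "inactive" (the real class names on the page may differ), and it relies only on WebDriverWait.until accepting any callable that takes the driver and returns a truthy value. Because "inactive" contains "active" as a substring, the class attribute is split into tokens instead of being searched with a plain substring test. The helper names are hypothetical.

```python
def is_active_class(class_attr):
    """True if a class attribute string marks the row as active.

    Assumption: the page uses literal 'active' / 'inactive' class tokens.
    Splitting into tokens matters because 'inactive' contains 'active'.
    """
    tokens = set((class_attr or "").split())
    return "active" in tokens and "inactive" not in tokens


def row_becomes_active(row):
    """Condition for WebDriverWait(driver, timeout).until(...).

    WebDriverWait calls the condition with the driver; this one ignores it and
    re-reads the class attribute of the given row (any object with
    get_attribute, such as a Selenium WebElement).
    """
    def condition(_driver):
        return row if is_active_class(row.get_attribute("class")) else False
    return condition


# Sketch of use with a real browser (not run here):
#   row.find_element(By.TAG_NAME, "a").click()
#   WebDriverWait(driver, 10).until(row_becomes_active(row))
#   lineup = pd.read_html(driver.page_source)[1]
```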

1 Answer:

Answer 0 (score: 1)

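You have the right idea. Just find the user name element and click it to get the lineup table, then reformat each lineup so they can all be combined into one results dataframe.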


The user name text is inside an <a> tag, so just find the <a> tag whose text matches the user name.
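Building the XPath as "//a[text()='%s']" % user breaks if a user name ever contains an apostrophe. XPath 1.0, which browsers evaluate, has no escape sequence inside string literals, so the usual workaround is to switch quote style or splice the pieces together with concat(). A small sketch of that, with the hypothetical helper names xpath_literal and user_link_xpath:

```python
def xpath_literal(s):
    """Return an XPath 1.0 string literal that evaluates to exactly s."""
    if "'" not in s:
        return "'%s'" % s
    if '"' not in s:
        return '"%s"' % s
    # Both quote types present: split on apostrophes and rejoin with concat(),
    # inserting each apostrophe as the double-quoted literal "'".
    parts = s.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if i:
            pieces.append('"\'"')
        if part:
            pieces.append("'%s'" % part)
    return "concat(%s)" % ", ".join(pieces)


def user_link_xpath(user):
    """XPath for <a> elements whose text is exactly the user name."""
    return "//a[text()=%s]" % xpath_literal(user)
```

In the loop below, driver.find_elements(By.XPATH, user_link_xpath(user)) would then replace the string formatting.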

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd


url = 'https://rotogrinders.com/resultsdb/date/2019-01-13/sport/4/slate/5c3c66edb1699a43c0d7bba7/contest/5c3c66f2b1699a43c0d7bd0d'

# Open the browser and go to the site (Selenium 4 takes the driver path via Service)
driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
driver.get(url)


# Wait until the tables are loaded and contain text; time out after 60 seconds
WebDriverWait(driver, 60).until(ec.presence_of_element_located((By.XPATH, './/tbody//tr//td//span//a[text() != ""]')))

# Parse the tables to get the user names
tables = pd.read_html(driver.page_source)
users_df = tables[0][['Rank', 'User']].copy()  # copy avoids SettingWithCopyWarning
users_df['User'] = users_df['User'].str.replace(' Member', '', regex=False)

# Collect one single-row DataFrame per user, then concatenate once at the end
frames = []
for i, row in users_df.iterrows():

    rank = row['Rank']
    user = row['User']

    # Find the user name link and click it.
    # Note: [0] always takes the FIRST link with this name, so for a user with
    # several entries every click lands on their first entry.
    user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" % user)[0]
    user_link.click()

    # Get the lineup table after clicking on the user name
    tables = pd.read_html(driver.page_source)
    lineup = tables[1]

    # Row 9 holds the totals: copy total salary and points into 'Name'
    # (assigning to row 10 with .loc appends a new row)
    lineup.loc[9, 'Name'] = lineup.iloc[9]['Salary']
    lineup.loc[10, 'Name'] = lineup.iloc[9]['Pts']

    # Reshape the 11 'Name' values (9 players + 2 totals) into one wide row
    temp_df = pd.DataFrame(lineup['Name'].values.reshape(-1, 11),
                           columns=lineup['Pos'].iloc[:9].tolist() + ['Total_$', 'Total_Pts'])

    temp_df.insert(loc=0, column='User', value=user)
    temp_df.insert(loc=0, column='Rank', value=rank)

    frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
results = pd.concat(frames, ignore_index=True)

driver.close()
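The restructuring step in the loop above can be pictured without a browser: each lineup table is nine player rows plus a totals row, and it has to become one wide row of the results table. A minimal plain-Python sketch, assuming each lineup row is a (pos, name, salary, pts) tuple with the totals row last (the tuple layout and the names flatten_lineup and build_results are hypothetical). Collecting plain rows in a list and building the DataFrame once at the end is usually simpler and faster than growing a DataFrame inside the loop, which also bears on the "while scraping or afterwards" question.

```python
def flatten_lineup(rank, user, lineup_rows):
    """Turn one lineup table into a single results row.

    lineup_rows: list of (pos, name, salary, pts) tuples; the last tuple is
    assumed to be the totals row, whose salary and pts hold the lineup totals.
    Returns (header, row) so the column names travel with the values.
    """
    *players, totals = lineup_rows
    header = ["Rank", "User"] + [pos for pos, _, _, _ in players] + ["Total_$", "Total_Pts"]
    row = [rank, user] + [name for _, name, _, _ in players] + [totals[2], totals[3]]
    return header, row


def build_results(lineups):
    """lineups: iterable of (rank, user, lineup_rows). Returns (header, rows)."""
    header, rows = None, []
    for rank, user, lineup_rows in lineups:
        header, row = flatten_lineup(rank, user, lineup_rows)
        rows.append(row)
    return header, rows
```

With pandas, pd.DataFrame(rows, columns=header) then builds the results table in one call; duplicate column labels such as the two C columns are allowed.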

Output:

print (results)
    Rank            User    ...        Total_$ Total_Pts
0      1    Canadaman101    ...     $50,000.00      54.6
1      2  MayhemLikeMe27    ...     $50,000.00      53.9
2      2    gunslinger58    ...     $50,000.00      53.9
3      4        oilkings    ...     $48,600.00      53.6
4      5           TTB19    ...     $50,000.00      53.4
5      6      Adamjloder    ...     $49,800.00      53.1
6      7     DollarBillW    ...     $49,900.00      52.6
7      8     Biglarry696    ...     $49,900.00      52.4
8      8       tical1994    ...     $49,900.00      52.4
9      8        rollem02    ...     $49,900.00      52.4
10     8         kchoban    ...     $50,000.00      52.4
11     8       TBirdSCIL    ...     $49,900.00      52.4
12    13        manny716    ...     $49,900.00      52.1
13    14        JayKooks    ...     $50,000.00      51.9
14    15        Cambie19    ...     $49,900.00      51.4
15    16         mjh6588    ...     $50,000.00      51.1
16    16    shanefriesen    ...     $50,000.00      51.1
17    16        mnfish42    ...     $50,000.00      51.1
18    19        Pugsly55    ...     $49,900.00      50.9
19    19         volpez7    ...     $50,000.00      50.9
20    19        Scherr47    ...     $49,900.00      50.9
21    19    Testosterown    ...     $50,000.00      50.9
22    23         markm22    ...     $49,700.00      50.6
23    23  foreveryoung12    ...     $49,800.00      50.6
24    23       STP_Picks    ...     $49,900.00      50.6
25    26    jibbinghippo    ...     $49,800.00      50.4
26    26     loumister35    ...     $49,900.00      50.4
27    26         creels3    ...     $50,000.00      50.4
28    26        JayKooks    ...     $50,000.00      51.9
29    26   mmeiselman731    ...     $49,900.00      50.4
30    26         volpez7    ...     $50,000.00      50.9
31    26   tommienation1    ...     $49,900.00      50.4
32    26    jibbinghippo    ...     $49,800.00      50.4
33    26    Testosterown    ...     $50,000.00      50.9
34    35           nut07    ...     $50,000.00      49.9
35    35         volpez7    ...     $50,000.00      50.9
36    35        durfdurf    ...     $50,000.00      49.9
37    35    chupacabra21    ...     $50,000.00      49.9
38    39       Mbermes01    ...     $50,000.00      49.6
39    40        suerte41    ...     $50,000.00      49.4
40    40   spliksskins77    ...     $50,000.00      49.4
41    42     Andrewskoff    ...     $49,600.00      49.1
42    42          Alky14    ...     $49,800.00      49.1
43    42         bretned    ...     $50,000.00      49.1
44    42         bretned    ...     $50,000.00      49.1
45    42        gehrig38    ...     $49,700.00      49.1
46    42      d-train_91    ...     $49,500.00      49.1
47    42   DiamondDallas    ...     $50,000.00      49.1
48    49           jdmre    ...     $50,000.00      48.9
49    49         Devosty    ...     $50,000.00      48.9

[50 rows x 13 columns]
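The output also hints at a limitation of find_elements(...)[0]: volpez7 appears at ranks 19, 26 and 35 with 50.9 points every time, and JayKooks at rank 26 shows the same 51.9 as the rank-14 entry, which looks like the first entry's lineup being read again for later entries of the same user. One possible fix, assuming the matching links appear in the DOM once per row and in the same order as users_df (not verified against the page), is to click the k-th match for the k-th occurrence of a name. The helper name occurrence_indices is hypothetical:

```python
def occurrence_indices(names):
    """For each name, return how many times it has already appeared before it.

    ['a', 'b', 'a'] -> [0, 0, 1]: the second 'a' should use the second link.
    """
    seen = {}
    result = []
    for name in names:
        k = seen.get(name, 0)
        result.append(k)
        seen[name] = k + 1
    return result


# In the answer's loop this would replace [0] (sketch, not run here):
#   occ = occurrence_indices(users_df['User'].tolist())
#   for (i, row), k in zip(users_df.iterrows(), occ):
#       user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" % row['User'])[k]
```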