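I have been working on this for two weeks and I think I am close, but I could use some help. I have been trying to scrape the odds from oddsportal to get the games and their money lines.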
import re
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/american-football/usa/nfl-2017-2018/results/")
games = browser.find_element_by_class_name('table-main').text
This returns a single string in which the table rows are separated by '\n10' and the entries within a row are separated by '\n':
"American Football\n»\n USA\n»\nNFL 2018/2019\n03 Feb 2019 - Play Offs 1 2 B's\n22:30 Los Angeles Rams - New England Patriots 3:13\n+110\n-127\n10\n27 Jan 2019 - All Stars 1 2 B's\n19:00 NFC - AFC 7:26\n-106\n-118\n10\n20 Jan 2019 - Play Offs 1 2 B's\n22:40 Kansas City Chiefs - New England Patriots 31:37 OT\n-172\n+148\n10\n19:05 New Orleans Saints - Los Angeles Rams 23:26 OT\n-164\n+140\n10\n13 Jan 2019 - Play Offs 1 2 B's\n20:40 New Orleans Saints - Philadelphia Eagles 20:14\n-385\n+312\n10\n17:05 New England Patriots - Los Angeles Chargers 41:28\n-196\n+166\n10\n00:15 Los Angeles Rams - Dallas Cowboys 30:22\n-345\n+281\n10\n12 Jan 2019 - Play Offs 1 2 B's\n20:35 Kansas City Chiefs - Indianapolis Colts 31:13\n-208\n+175\n10\n06 Jan 2019 - Play Offs 1 2 B's\n20:40 Chicago Bears - Philadelphia Eagles 15:16\n-286\n+231\n10\n17:05 Baltimore Ravens - Los Angeles Chargers 17:23\n-149\n+129\n10\n00:15 Dallas Cowboys - Seattle Seahawks 24:22\n-149\n+129\n10\n05 Jan 2019 - Play Offs 1 2 B's\n20:35 Houston Texans - Indianapolis Colts 7:21\n-127\n+108\n10\n31 Dec 2018 1 2 B's\n00:20 Tennessee Titans - Indianapolis Colts 17:33\n+194\n-233\n10
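If I do the following I get closer, but I still can't figure out how to reach my end goal of a 4-column dataframe: game date, teams, money line 1, money line 2
game_list1 = re.split('\n10', games)
Returns: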
["American Football\n»\n USA\n»\nNFL 2017/2018\n04 Feb 2018 - Play Offs 1 2 B's\n22:30 New England Patriots - Philadelphia Eagles 33:41\n-196\n+173",
"\n28 Jan 2018 - All Stars 1 2 B's\n19:00 AFC - NFC 24:23\n+124\n-147",
"\n21 Jan 2018 - Play Offs 1 2 B's\n22:40 Philadelphia Eagles - Minnesota Vikings 38:7\n+129\n-147",
'\n19:05 New England Patriots - Jacksonville Jaguars 24:20\n-333\n+279',
"\n14 Jan 2018 - Play Offs 1 2 B's\n20:40 Minnesota Vikings - New Orleans Saints 29:24\n-233\n+197",
'\n17:05 Pittsburgh Steelers - Jacksonville Jaguars 42:45\n-303\n+254',
'\n00:15 New England Patriots - Tennessee Titans 35:14\n-909\n+608',
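So I think I am getting closer, but because the pattern changes with the varying number of games on each date, I don't know where to go from here.
The dataframe would look like this, but without the scores: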
          date               game money_line1 money_line2
0  04 Feb 2018  Patriots - Eagles        -196        +173
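Before this, I tried to iterate through the rows. Running the following returns only 1 row, even though it looks like every element I am after has the class name odd.deactivate: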
browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/american-football/usa/nfl-2018-2019/results/")
time.sleep(2)
tab_main = browser.find_element_by_class_name('odd.deactivate').text
tab_main
'22:30 Los Angeles Rams - New England Patriots 3:13\n+110\n-127\n10'
But trying to iterate through it with find_elements and XPath has not worked. Here is my current attempt:
browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/american-football/usa/nfl-2018-2019/results/")
time.sleep(2)
tab_main = browser.find_elements_by_class_name('odd.deactivate')
game_list = []
for line in tab_main:
    game = line.find_element_by_xpath('/tbody/tr[4]/td[2]')
    ml1 = line.find_element_by_xpath('/tbody/tr[4]/td[4]')
    ml2 = line.find_element_by_xpath('/tbody/tr[4]/td[6]')
    game_row = (game, ml1, ml2)
    game_list.append(game_row)
This produces the following error:
---------------------------------------------------------------------------
NoSuchElementException Traceback (most recent call last)
<ipython-input-646-e1f07f8ecd68> in <module>
5 game_list = []
6 for line in tab_main:
----> 7 game = line.find_element_by_xpath('/tbody/tr[4]/td[2]')
8 ml1 = line.find_element_by_xpath('/tbody/tr[4]/td[4]')
9 ml2 = line.find_element_by_xpath('/tbody/tr[4]/td[6]')
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in find_element_by_xpath(self, xpath)
349 element = element.find_element_by_xpath('//div/td[1]')
350 """
--> 351 return self.find_element(by=By.XPATH, value=xpath)
352
353 def find_elements_by_xpath(self, xpath):
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in find_element(self, by, value)
657
658 return self._execute(Command.FIND_CHILD_ELEMENT,
--> 659 {"using": by, "value": value})['value']
660
661 def find_elements(self, by=By.ID, value=None):
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
631 params = {}
632 params['id'] = self._id
--> 633 return self._parent.execute(command, params)
634
635 def find_element(self, by=By.ID, value=None):
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))
~/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/tbody/tr[4]/td[2]"}
(Session info: chrome=81.0.4044.138)
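A note on the selector in the error message: an XPath that starts with `/` is absolute, so it is resolved from the document root even when the call is made on a row element, and `/tbody` can never match there. A path relative to the current row starts with `./` (or `.//`). The sketch below illustrates the relative form using the standard library's `xml.etree.ElementTree` as a stand-in for Selenium, on a hypothetical, simplified fragment; the markup and the column positions are assumptions, not the real oddsportal layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (assumption): the real table has more
# columns and attributes, so the td indexes here are illustrative only.
SAMPLE = """
<table class="table-main">
  <tbody>
    <tr class="odd deactivate">
      <td>22:30</td><td>Los Angeles Rams - New England Patriots</td><td>3:13</td><td>+110</td><td>-127</td>
    </tr>
    <tr class="odd deactivate">
      <td>19:00</td><td>NFC - AFC</td><td>7:26</td><td>-106</td><td>-118</td>
    </tr>
  </tbody>
</table>
"""


def extract_rows(markup):
    """Return (game, money_line1, money_line2) for every table row."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.iter("tr"):
        # "./td[n]" is evaluated relative to the current <tr>,
        # not from the top of the document.
        game = tr.find("./td[2]").text
        ml1 = tr.find("./td[4]").text
        ml2 = tr.find("./td[5]").text
        rows.append((game, ml1, ml2))
    return rows


game_rows = extract_rows(SAMPLE)
```

With Selenium the analogous call on each row would look like `line.find_element_by_xpath('./td[2]').text`, with the cell index adjusted to the actual page.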
Answer 0 (score: 1)
Since the data sits in a <table> tag, I would use pandas' .read_html() function, as it specifically parses table tags. The tricky part is that the table contains multiple header rows, each followed by data rows, so you just need to work out how to iterate through them:
from selenium import webdriver
import pandas as pd

browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/american-football/usa/nfl-2017-2018/results/")

# Parse the first <table> in the rendered page into a DataFrame
df = pd.read_html(browser.page_source, header=0)[0]

dateList = []
gameList = []
money_line1List = []
money_line2List = []

for row in df.itertuples():
    # Skip rows whose first cell is not a string (e.g. NaN spacer rows)
    if not isinstance(row[1], str):
        continue
    # A first cell without a kick-off time is a date header row
    elif ':' not in row[1]:
        date = row[1].split('-')[0].strip()
        continue
    # Otherwise it is a game row that belongs to the most recent date
    time = row[1]
    dateList.append(date)
    gameList.append(row[2])
    money_line1List.append(row[5])
    money_line2List.append(row[6])

result = pd.DataFrame({'date': dateList,
                       'game': gameList,
                       'money_line1': money_line1List,
                       'money_line2': money_line2List})
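If you would rather keep working from the `.text` string shown in the question, the same "remember the last date header" idea can be applied line by line. The following is only a minimal sketch under stated assumptions: `parse_results`, the three regular expressions, and the abbreviated `SAMPLE_TEXT` are illustrative names introduced here, and the patterns assume date headers of the form `04 Feb 2018`, game lines of the form `HH:MM Team A - Team B score`, and American-style odds such as `-196`.

```python
import re

# Abbreviated sample modelled on the string in the question (assumption:
# the live page keeps this layout).
SAMPLE_TEXT = (
    "American Football\nNFL 2017/2018\n"
    "04 Feb 2018 - Play Offs 1 2 B's\n"
    "22:30 New England Patriots - Philadelphia Eagles 33:41\n-196\n+173\n10\n"
    "21 Jan 2018 - Play Offs 1 2 B's\n"
    "22:40 Philadelphia Eagles - Minnesota Vikings 38:7\n+129\n-147\n10\n"
    "19:05 New England Patriots - Jacksonville Jaguars 24:20\n-333\n+279\n10"
)

DATE_RE = re.compile(r"^(\d{2} \w{3} \d{4})\b")             # "04 Feb 2018 ..."
GAME_RE = re.compile(r"^\d{2}:\d{2} (.+?) \d+:\d+(?: OT)?$")  # time, teams, score
ODDS_RE = re.compile(r"^[+-]\d+$")                           # "-196" or "+173"


def parse_results(text):
    """Return a list of (date, game, money_line1, money_line2) tuples."""
    rows = []
    date = None
    lines = [line.strip() for line in text.split("\n")]
    for i, line in enumerate(lines):
        date_match = DATE_RE.match(line)
        if date_match:
            # A date header applies to every game line until the next header.
            date = date_match.group(1)
            continue
        game_match = GAME_RE.match(line)
        if game_match and i + 2 < len(lines):
            ml1, ml2 = lines[i + 1], lines[i + 2]
            # Keep the game only if the next two lines both look like odds.
            if ODDS_RE.match(ml1) and ODDS_RE.match(ml2):
                rows.append((date, game_match.group(1), ml1, ml2))
    return rows


rows = parse_results(SAMPLE_TEXT)
```

The resulting list can be handed to `pd.DataFrame(rows, columns=['date', 'game', 'money_line1', 'money_line2'])`. Because the score is matched but not captured, it is left out of the `game` column, which is what the question asks for.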