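I'm trying to pull ticket prices and other details for some baseball games, but I get an error every time I try to fetch the data. Any idea what could be causing these problems with the prices, locations, and details? I also tried XPath without success.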
games = ['https://seatgeek.com/dodgers-at-cubs-tickets/5-3-2021-chicago-illinois-wrigley-field/mlb/5316872',
         'https://seatgeek.com/dodgers-at-cubs-tickets/5-5-2021-chicago-illinois-wrigley-field/mlb/5316885']
# gather ticket data
urls = []
location = []
prices = []
details = []
for g in games:
    try:
        driver.get(g)
        price = [i.text for i in WebDriverWait(driver, 100).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.Button__ButtonContents')))]
        print(price)
        loc = [i.text for i in WebDriverWait(driver, 100).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.ListingTicket__Section')))]
        print(loc)
        detail = [i.text for i in WebDriverWait(driver, 100).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.ListingTicket__Availability')))]
        print(detail)
        url = [str(g)] * len(price)
        urls.extend(url)
        prices.extend(price)
        location.extend(loc)
        details.extend(detail)
        print(str(g) + ": " + len(price) + " ")
    except:
        print('Failed: ' + str(g))
        pass
import requests
import pandas as pd
driver.get('https://seatgeek.com/chicago-cubs-tickets')
gameIds = [i.get_attribute('href') for i in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.EventItem__ItemLink-sc-14845pu-6')))]
gameIds = [x[-7:] for x in gameIds]
url = 'https://seatgeek.com/rescraper/v2/listings'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}
writer = pd.ExcelWriter(final, engine='xlsxwriter')
tables = []
for gameId in gameIds:
    payload = {
        '_include_seats': '1',
        'client_id': 'MTY2MnwxMzgzMzIwMTU4',
        'id': '%s' % gameId,
        'sixpack_client_id': '93d1ab10-07dc-4482-bb89-b87c2b144e33'}
    jsonData = requests.get(url, headers=headers, params=payload).json()
    df = pd.json_normalize(jsonData['listings'])
    df.to_excel(writer, sheet_name=gameId)
    tables.append(df)
    print(gameId)
table = pd.concat(tables)
writer = pd.ExcelWriter(final, engine='xlsxwriter')
table.to_excel(writer, sheet_name='Tickets')
writer.save()
print('Done')
New error:
HTTPSConnectionPool(host='seatgeek.com', port=443): Max retries exceeded with url: /rescraper/v2/listings?
_include_seats=1&client_id=MTY2MnwxMzgzMzIwMTU4&id=5316872&sixpack_client_id=93d1ab10-07dc-4482-bb89-b87c2b144e33
(Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
Answer 0 (score: 1)
You can use these locators for those elements:
price = [i.text for i in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@data-test='event-listing']//a/span")))]
price = [x.replace('\n', '') for x in price] #added to get rid of newline character in each list element
print(price)
loc = [i.text for i in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@data-test='event-listing']//div[@data-test='section']")))]
print(loc)
detail = [i.text for i in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@data-test='event-listing']//span[@data-test='quantity']")))]
print(detail)
['$26/ea', '$112/ea', '$27/ea', '$122/ea', '$101/ea', '$88/ea', '$35/ea', '$38/ea']
['424 Right · Row 6', 'Section 113 · Row 1', '420 Right · Row 9', 'Section 114 · Row 1', 'Section 109 · Row 3', 'Section 110 · Row 13', '421 Right · Row 7', '421 Right · Row 6']
['2 tickets', '4 tickets', '2 tickets', '4 tickets', '4 tickets', '4 tickets', '2 tickets', '2 tickets']
...
I added another list comprehension for price to strip the newline character that appears in each string.
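The newline clean-up can be tried on plain strings, without a browser. The sketch below is only illustrative: the raw values (including where the newline sits inside each string) are assumed, and `price_to_int` is a hypothetical helper, not part of the answer. It also shows one way the three parallel lists could be zipped into records once they are clean.

```python
# Hypothetical raw values; the exact position of the newline is an assumption.
raw_prices = ['$26\n/ea', '$112\n/ea', '$27\n/ea']
sections = ['424 Right · Row 6', 'Section 113 · Row 1', '420 Right · Row 9']
quantities = ['2 tickets', '4 tickets', '2 tickets']

# Same clean-up as in the answer: drop the embedded newline.
prices = [p.replace('\n', '') for p in raw_prices]


def price_to_int(text):
    """Turn a string such as '$26/ea' into the integer 26 (hypothetical helper)."""
    return int(text.lstrip('$').split('/')[0].replace(',', ''))


# Zip the parallel lists into one record per listing.
rows = [
    {'price': price_to_int(p), 'section': s, 'quantity': int(q.split()[0])}
    for p, s, q in zip(prices, sections, quantities)
]

print(prices)
print(rows[0])
```

Note that `zip` stops at the shortest list, so if one locator matches fewer elements than the others, the extra entries are dropped silently.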
You also need one more fix. Change this:
print(str(g) + ": " + len(price) + " ")
to this:
print(str(g) + ": " + str(len(price)) + " ")
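As a small standalone illustration of why that change is needed: Python will not concatenate a `str` with the `int` returned by `len()`. Because the original `print` sits inside a `try` with a bare `except:`, the resulting `TypeError` would be swallowed and reported as 'Failed' even when the scrape itself worked. The values below are sample data taken from the question, not live results.

```python
url = 'https://seatgeek.com/dodgers-at-cubs-tickets/5-3-2021-chicago-illinois-wrigley-field/mlb/5316872'
price = ['$26/ea', '$112/ea']

error = None
try:
    # str + int is not allowed, so this line raises TypeError.
    message = url + ": " + len(price)
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

# Either convert explicitly, as in the answer...
fixed = url + ": " + str(len(price))
# ...or let an f-string do the conversion.
formatted = f"{url}: {len(price)}"

print(fixed)
```

Catching a specific exception (or at least printing the caught one) instead of using a bare `except:` would have surfaced this error directly.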
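Answer 1 (score: 1)
Just get that data from the API, as long as you have the ID number. You may need to work out what the columns mean, but that looks easy enough. You may also want to add the date of the game; otherwise all the data is there: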
import requests
import pandas as pd
gameIds = [5316872, 5316885]
url = 'https://seatgeek.com/rescraper/v2/listings'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}
tables = []
for gameId in gameIds:
    payload = {
        '_include_seats': '1',
        'client_id': 'MTY2MnwxMzgzMzIwMTU4',
        'id': '%s' % gameId,
        'sixpack_client_id': '93d1ab10-07dc-4482-bb89-b87c2b144e33'}
    jsonData = requests.get(url, headers=headers, params=payload).json()
    df = pd.json_normalize(jsonData['listings'])
    tables.append(df)
Output:
This is the first table (only the first 10 rows are shown); the full first table has 265 rows and the other has 455.
print(tables[0].head(10).to_string())
dm ep et f gk gr id ihd dl h lv vp mk m pu p pf q rp r rf rr ss sdq sgp sgf sif s sf sr sh sco sp spt st wc sro dq.b dq.dq dq.ddq dq.ev d fi sg sd
0 electronic True 1 2.00 budweiser bleachers 515_19 85202 y5EMUx5j6Y 0 0 0 0 s:budweiser-bleachers-515 r:19 exchange 0 82.57 84.57 2 4 19 Row 19 19 None [] 64 20.57 False budweiser bleachers 515 Budweiser Bleachers 515 515 0 False [2] pdf 0 0 1 74.62 7.8 146.55 NaN NaN NaN NaN
1 electronic True 1 209.00 121_5_111:112 895002 kYetLw0ZN64 2021-05-02 0 0 0 0 s:121 r:5 exchange 0 686.00 895.00 2 4 5 Row 5 5 [111, 112] [5, 5] 686 209.00 False 121 Section 121 121 0 False [2] mobile 0 0 5 15.73 2.1 433.27 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS 9645a1de-66df-49b5-b637-5fa5c4736c41 NaN NaN
2 electronic True 1 156.45 budweiser bleachers 502_11_111:112 663002 lxVsqxleK85 2021-05-02 0 0 0 0 s:budweiser-bleachers-502 r:11 exchange 0 506.00 662.45 2 4 11 Row 11 11 [111, 112] [6, 6] 506 156.45 False budweiser bleachers 502 Budweiser Bleachers 502 502 0 False [2] mobile 0 0 6 2.84 0.5 117.89 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS NaN NaN NaN
3 electronic True 1 148.75 129_13_111:112 631002 kYetLw0ZN2A 2021-05-02 0 0 0 0 s:129 r:13 exchange 0 482.00 630.75 2 4 13 Row 13 13 [111, 112] [6, 6] 482 148.75 False 129 Section 129 129 0 False [2] mobile 0 0 6 4.63 0.7 166.99 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS f2d511b1-7b7f-4d84-b628-966fee6e8109 NaN NaN
4 electronic True 1 164.16 218_10_111:112 695002 w3JsqE3VkKz 2021-05-02 0 0 0 0 s:218 r:10 exchange 0 530.00 694.16 2 4 10 Row 10 10 [111, 112] [6, 6] 530 164.16 False 218 Section 218 218 0 False [2] mobile 0 0 6 3.56 0.6 166.48 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS a4904c72-fcc2-4342-b214-3283268cbbab NaN NaN
5 electronic True 1 156.45 218_15_111:112 663002 NrqUJbEl0YM 2021-05-02 0 0 0 0 s:218 r:15 exchange 0 506.00 662.45 2 4 15 Row 15 15 [111, 112] [6, 6] 506 156.45 False 218 Section 218 218 0 False [2] mobile 0 0 6 3.70 0.6 155.66 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS a4904c72-fcc2-4342-b214-3283268cbbab NaN NaN
6 electronic True 1 147.17 229_9_111:112 621002 qVjH7eqn6jB 2021-05-02 0 0 0 0 s:229 r:9 exchange 0 473.00 620.17 2 4 9 Row 9 9 [111, 112] [6, 6] 473 147.17 False 229 Section 229 229 0 False [2] mobile 0 0 6 2.73 0.4 77.54 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS 4481eab0-396d-4696-bf67-950e33b45c5d NaN NaN
7 electronic True 1 139.45 229_13_111:112 589002 rVOH8wD9EP2 2021-05-02 0 0 0 0 s:229 r:13 exchange 0 449.00 588.45 2 4 13 Row 13 13 [111, 112] [6, 6] 449 139.45 False 229 Section 229 229 0 False [2] mobile 0 0 6 3.01 0.5 74.55 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS 4481eab0-396d-4696-bf67-950e33b45c5d NaN NaN
8 electronic True 1 132.75 229_17_111:112 557002 jDvsErZMO59 2021-05-02 0 0 0 0 s:229 r:17 exchange 0 424.00 556.75 2 4 17 Row 17 17 [111, 112] [6, 6] 424 132.75 False 229 Section 229 229 0 False [2] mobile 0 0 6 3.33 0.5 71.82 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS 4481eab0-396d-4696-bf67-950e33b45c5d NaN NaN
9 electronic True 1 148.75 218_20_111:112 631002 3q7fvGgbAwB 2021-05-02 0 0 0 0 s:218 r:20 exchange 0 482.00 630.75 2 4 20 Row 20 20 [111, 112] [6, 6] 482 148.75 False 218 Section 218 218 0 False [2] mobile 0 0 6 3.90 0.6 145.81 TMX XFER MOBILE ENTRY. Scan your tickets from your mobile phone for this event. MOBILE ENTRY NO SPLITS a4904c72-fcc2-4342-b214-3283268cbbab NaN NaN
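On adding the date of the game: one possible approach is to read it from the event URLs in the question, which appear to follow the pattern `.../<month>-<day>-<year>-<venue>/<league>/<event id>`. That pattern, and the month-day-year order in particular, is an assumption based only on the two URLs shown here; `parse_event_url` is a hypothetical helper, so treat this as a sketch rather than a documented format.

```python
import re
from datetime import date

games = [
    'https://seatgeek.com/dodgers-at-cubs-tickets/5-3-2021-chicago-illinois-wrigley-field/mlb/5316872',
    'https://seatgeek.com/dodgers-at-cubs-tickets/5-5-2021-chicago-illinois-wrigley-field/mlb/5316885',
]

# Assumed URL shape: .../<month>-<day>-<year>-<venue slug>/<league>/<event id>
EVENT_URL = re.compile(r'/(\d{1,2})-(\d{1,2})-(\d{4})-[^/]*/[^/]+/(\d+)/?$')


def parse_event_url(url):
    """Return (event_id, game_date) for a URL of the assumed shape."""
    match = EVENT_URL.search(url)
    if match is None:
        raise ValueError(f'unrecognised event URL: {url}')
    month, day, year, event_id = (int(part) for part in match.groups())
    return event_id, date(year, month, day)


# Map each event ID to its game date.
game_dates = dict(parse_event_url(u) for u in games)
print(game_dates)
```

Inside the loop from the answer, the mapping could then be attached to each table before it is appended, for example with `df['game_id'] = gameId` and `df['game_date'] = game_dates[gameId]`, so the rows stay identifiable after `pd.concat(tables)`.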