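I am trying to extract timetable data from this site. The content is contained in a div with the class .departures-table. I want to skip the first two rows and store the data in an array, but it is not working. I have obviously made a mistake somewhere but cannot find it. Thanks.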
snav_live_departures_url = "https://www.snav.it/"
headers = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.3'}
request = urllib.request.Request(snav_live_departures_url,headers=headers)
html = urllib.request.urlopen(request).read()
soup = BeautifulSoup(html,'html.parser')
snav_live_departures = []
snav_live_departures_table = list(soup.select('.departures-table div'))[2:]
print(snav_live_departures_table)
for div in snav_live_departures_table:
    div = div.select('departures-row')
    snav_live_departures.append({
        'TIME':div[4].text,
        'DEPARTURE HARBOUR':div[0].text,
        'ARRIVAL HARBOUR':div[1].text,
        'STATUS':td[3].select('span.tt-text')[0].text,
        'PURCHASE LINK':div[6].select('a')[0].attrs['href']
    })
Answer 0 (score: 2)
There are a few different things going on here:
First, you are actually "lucky" that there is no data on the page. With no matching rows the loop body never runs; otherwise this code would blow up with a NameError, because td is not in scope:
'DEPARTURE HARBOUR':td[0].text,
Second, the selector asks for <div> elements, but they are all <td>s.
Finally, I think you would be happiest imitating the API call, stripping the JS callback text from the response, and using the structured data.
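The point about being "lucky" can be seen in isolation with a minimal sketch. The function name collect and the sample row are hypothetical, used only for illustration. An undefined name such as td raises NameError only when the line that uses it actually executes, so a loop over an empty result list hides the bug.

```python
def collect(rows):
    """Build one dict per row; `td` is deliberately left undefined here."""
    results = []
    for row in rows:
        # NameError is raised on this line, but only if the loop body runs
        results.append({"STATUS": td[3]})  # noqa: F821
    return results


# No rows: the loop body never executes, so the undefined name goes unnoticed
empty_result = collect([])

# One row: the loop body executes and the undefined name is looked up
try:
    collect(["a row"])
    raised = False
except NameError:
    raised = True
```

This is why the script appears to run cleanly while the page has no departures, and would start failing as soon as the selector matched something.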
Answer 1 (score: 0)
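As mentioned above, when dealing with a JavaScript-driven page like this one, you may want to watch the Network tab in your browser's Dev Tools to see how the data is loaded.
This code will produce a nice dictionary that you can parse as needed: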
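import json
import requests

# JSONP endpoint found by watching the Network tab
URL = 'https://booking.snav.it/api/v1/dashboard/nextDepartures?callback=jQuery12345&_=12345'
r = requests.get(URL)
s = r.content.decode('utf-8')
# Drop the 16-character callback prefix and the 2-character suffix, leaving plain JSON
data = json.loads(s[16:-2])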
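The fixed slice above only works while the text wrapped around the JSON keeps exactly the same length, which changes if the callback name changes. As a hedged alternative, here is a minimal sketch that locates the outer parentheses instead. The helper name unwrap_jsonp and the sample payload are made up for illustration and are not taken from the real API response.

```python
import json
import re

# Skip everything up to the first "(", then capture up to the final ")"
# (optionally followed by ";" and whitespace) at the end of the text.
_JSONP_RE = re.compile(r"^[^(]*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text):
    """Return the JSON payload of a JSONP response as a Python object."""
    match = _JSONP_RE.match(text)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))


# Hypothetical response text; the real payload structure is not shown here
sample = '/**/jQuery12345({"departures": []});'
parsed = unwrap_jsonp(sample)
```

With requests, this would be called as unwrap_jsonp(r.text) in place of the slice, so the same code keeps working for any callback name.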