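I am using Selenium with the Chrome web driver to try to scrape data from NHL stats. I can get the data, but I would like to lay it out in Excel so that it mirrors the table on the website. At the moment everything I scrape ends up in a single column.
Here is my code: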
# Load libraries
import csv
from selenium import webdriver

# Load the browser and the player stats page
driver = webdriver.Chrome(executable_path=r"ENTER PATH")
driver.get("http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists")
PlayerStats = driver.find_elements_by_class_name("rt-tr-group")
for post in PlayerStats:
    print(post.text)
driver.close()
Output:
1
Connor McDavid
2016-17
EDM
C
82
30
70
100
27
26
1.22
3
27
1
2
6
1
251
11.9
21:07
24.37
43.2
2
Sidney Crosby
2016-17
PIT
C
75
44
45
89
17
24
1.19
14
25
0
0
5
1
255
17.3
19:52
24.69
48.2
3
Answer 0 (score: 0)
The scraped data contains newline characters. You can replace them with something else, such as tabs:
for post in PlayerStats:
    print(post.text.replace('\n', '\t'))
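The tab replacement can be tried without a browser. The following is a minimal sketch, assuming each row's text arrives as one newline-separated string, as the question's output suggests. The `row_texts` sample (shortened to the first nine columns) and the helper name `to_tsv_lines` are made up for illustration; in the real script the strings would come from `post.text`.

```python
# Sample row texts based on the question's output, shortened to nine columns.
# In the real script each string would be post.text for one table row.
row_texts = [
    "1\nConnor McDavid\n2016-17\nEDM\nC\n82\n30\n70\n100",
    "2\nSidney Crosby\n2016-17\nPIT\nC\n75\n44\n45\n89",
]


def to_tsv_lines(texts):
    """Turn each newline-separated row text into one tab-separated line."""
    return [text.replace("\n", "\t") for text in texts]


tsv_lines = to_tsv_lines(row_texts)
for line in tsv_lines:
    print(line)
```

Tab-separated lines like these can usually be pasted into a spreadsheet so that each value lands in its own cell, which is why a tab is a convenient replacement character here.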
Answer 1 (score: 0)
You just need to split on the newline character.
print(post.text.split('\n')) # this only prints. How to split and save the rows as a list of lists, I will leave as an exercise for you.
Output:
['1', 'Connor McDavid', '2016-17', 'EDM', 'C', '82', '30', '70', '100', '27', '26', '1.22', '3', '27', '1', '2', '6', '1', '251', '11.9', '21:07', '24.37', '43.2']
['2', 'Sidney Crosby', '2016-17', 'PIT', 'C', '75', '44', '45', '89', '17', '24', '1.19', '14', '25', '0', '0', '5', '1', '255', '17.3', '19:52', '24.69', '48.2']
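One possible way to do the "split and save" step uses only the standard `csv` module, which the question already imports. This is a sketch under assumptions: `row_texts` stands in for the `post.text` values (shortened to nine columns), and the output file name `player_stats.csv` in the temporary directory is hypothetical.

```python
import csv
import os
import tempfile

# Stand-ins for post.text; the real values would come from the Selenium elements.
row_texts = [
    "1\nConnor McDavid\n2016-17\nEDM\nC\n82\n30\n70\n100",
    "2\nSidney Crosby\n2016-17\nPIT\nC\n75\n44\n45\n89",
]

# Split each row's text on newlines to build a list of lists (one list per player).
rows = [text.split("\n") for text in row_texts]

# Hypothetical output location for this sketch; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "player_stats.csv")

# newline="" lets the csv module control line endings, as its documentation recommends.
with open(out_path, "w", newline="", encoding="utf-8") as handle:
    csv.writer(handle).writerows(rows)
```

The resulting CSV file can be opened in Excel directly, with one player per row and one statistic per column.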
To export the list of lists to Excel, you can use the pandas library.
import pandas

df = pandas.DataFrame(PlayerStats)  # PlayerStats must already be saved as a list of lists
df = df.T  # Transpose: rows become columns.
df = df.T  # Transpose again: columns go back to rows.
# I know the above is like a hack. Would appreciate it if someone came up with
# a neater solution.
# To add column names (one name per column, 23 here):
df.columns = ['Heading1', 'Heading2']  # -> placeholders; get the full list of headings from the site
# To save as Excel:
df.to_excel("filename.xlsx")  # -> has arguments, please check the pandas documentation
A neater one-line conversion:
df = pandas.DataFrame(PlayerStats).T.T
Output:
   0               1        2    3  4   5   6   7    8   9  ...  13  14  15  16  \
0  1  Connor McDavid  2016-17  EDM  C  82  30  70  100  27  ...  27   1   2   6
1  2   Sidney Crosby  2016-17  PIT  C  75  44  45   89  17  ...  25   0   0   5

   17   18    19     20     21    22
0   1  251  11.9  21:07  24.37  43.2
1   1  255  17.3  19:52  24.69  48.2

[2 rows x 23 columns]
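Answer 2 (score: 0)
You cannot do this directly. You should put each player's data into an array, so that in the end you have something like
[Player1 Data, Player2 Data, ...]
where Player1 Data is the list of values for that player. After that, you may need a matrix transpose (see "Matrix Transpose in Python").
How to put the player data into an array. Example: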
players_data = []
for post in PlayerStats:
    # Each row's text holds one value per line (23 columns), so split it into a list
    player_data = post.text.split('\n')
    players_data.append(player_data)
# Transpose so that each tuple holds one column (see "Python split csv column into rows")
players_data = list(zip(*players_data))
print(players_data[1])  # prints the player names (index 0 is the rank column)
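The transpose step can be checked on a small sample. This is a minimal sketch, assuming the rows have already been split into lists; the two sample rows are taken from the question's output and shortened to five columns, and the names `rows`, `columns`, and `restored` are only illustrative.

```python
# Two player rows, shortened to the first five columns for illustration.
rows = [
    ["1", "Connor McDavid", "2016-17", "EDM", "C"],
    ["2", "Sidney Crosby", "2016-17", "PIT", "C"],
]

# zip(*rows) pairs up the i-th value of every row, which transposes the table:
# each resulting tuple is one column.
columns = list(zip(*rows))
print(columns[1])  # the second column holds the player names

# Transposing again returns the original row layout (as lists instead of tuples).
restored = [list(row) for row in zip(*columns)]
```

Note that `zip` stops at the shortest row, so rows with missing values would silently lose trailing columns; `itertools.zip_longest` is an alternative if row lengths can differ.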