Python split csv column into rows

Date: 2017-08-29 01:36:43

Tags: python selenium selenium-chromedriver

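I'm using Python 3.6 with the Selenium Chrome web driver to pull data from NHL stats. I can get the data, but I want to format it in Excel so that it mirrors the table on the website. Right now the scraped data all comes out as a single column.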

Here is my code:

# Load libraries
import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Launch the browser and load the player stats page
driver = webdriver.Chrome(service=Service(r"ENTER PATH"))
driver.get("http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists")
# Each "rt-tr-group" element is one row of the stats table
PlayerStats = driver.find_elements(By.CLASS_NAME, "rt-tr-group")
for post in PlayerStats:
    print(post.text)

driver.close()

Output:
1
Connor McDavid
2016-17
EDM
C
82
30
70
100
27
26
1.22
3
27
1
2
6
1
251
11.9
21:07
24.37
43.2
2
Sidney Crosby
2016-17
PIT
C
75
44
45
89
17
24
1.19
14
25
0
0
5
1
255
17.3
19:52
24.69
48.2
3

3 Answers:

Answer 0 (score: 0)

The scraped data contains newline characters. You can replace them with something else, such as tabs:

for post in PlayerStats:
    print(post.text.replace('\n', '\t'))
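To see what that replacement does without a browser, here is a minimal sketch. The string `sample_text` is a hypothetical stand-in for one `post.text` value (truncated to six cells), and `to_tab_row` and the file name `players.tsv` are illustrative names, not part of the answer. Each row becomes one tab-separated line, which spreadsheet programs such as Excel can usually split into cells when the file is opened as tab-separated text.

```python
# Hypothetical stand-in for one post.text value (one cell per line)
sample_text = "1\nConnor McDavid\n2016-17\nEDM\nC\n82"

def to_tab_row(text):
    """Turn a newline-separated row into a single tab-separated line."""
    return text.replace('\n', '\t')

line = to_tab_row(sample_text)
print(line)

# Write one line per player row; the file can be opened as tab-separated text
with open("players.tsv", "w", encoding="utf-8") as f:
    f.write(line + "\n")
```

In the real script you would call `to_tab_row(post.text)` inside the loop and write one line per element of `PlayerStats`.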

Answer 1 (score: 0)

You just need to split on the newline characters.

print(post.text.split('\n'))  # this only prints. Building and saving a list of lists is left as an exercise for you.

Output:

['1', 'Connor McDavid', '2016-17', 'EDM', 'C', '82', '30', '70', '100', '27', '26', '1.22', '3', '27', '1', '2', '6', '1', '251', '11.9', '21:07', '24.37', '43.2']
['2', 'Sidney Crosby', '2016-17', 'PIT', 'C', '75', '44', '45', '89', '17', '24', '1.19', '14', '25', '0', '0', '5', '1', '255', '17.3', '19:52', '24.69', '48.2']
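The exercise left above (collecting the split rows and saving them) can be sketched with the `csv` module that the question already imports. This is a minimal sketch under assumptions: `texts` is a hypothetical list standing in for the `post.text` values (truncated to six cells each), and `nhl_stats.csv` is an illustrative file name.

```python
import csv

# Hypothetical stand-ins for [post.text for post in PlayerStats]
texts = [
    "1\nConnor McDavid\n2016-17\nEDM\nC\n82",
    "2\nSidney Crosby\n2016-17\nPIT\nC\n75",
]

# One inner list per player: split each row's text on newlines
rows = [text.split('\n') for text in texts]

# newline="" lets the csv module control line endings, as its docs recommend
with open("nhl_stats.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read the file back to check that each player landed on its own row
with open("nhl_stats.csv", newline="", encoding="utf-8") as f:
    loaded = list(csv.reader(f))
print(loaded[1])
```

In the Selenium script, `texts` would be `[post.text for post in PlayerStats]`, collected before `driver.close()`, because the elements can no longer be read once the browser is closed.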

To convert the list of lists to Excel, you can use the pandas library.

import pandas as pd

df = pd.DataFrame(rows)  # rows: the saved list of lists, one inner list per player
df = df.T  # Transpose: rows become columns.
df = df.T  # Transpose again: columns go back to rows.
# The two transposes cancel out, so pd.DataFrame(rows) on its own
# already gives one row per player.
# To add column names (one name per column, 23 here):
df.columns = headings  # headings: a list of names taken from the site's table header

# To save as Excel
df.to_excel("filename.xlsx", index=False)  # needs an Excel writer such as openpyxl; see the pandas docs for other arguments

A neater one-line conversion, since the double transpose is not needed:

df = pd.DataFrame(rows)

Output:

0               1        2    3  4   5   6   7    8   9   ...   13 14 15 16  \
0  1  Connor McDavid  2016-17  EDM  C  82  30  70  100  27  ...   27  1  2  6   
1  2   Sidney Crosby  2016-17  PIT  C  75  44  45   89  17  ...   25  0  0  5   

  17   18    19     20     21    22  
0  1  251  11.9  21:07  24.37  43.2  
1  1  255  17.3  19:52  24.69  48.2  

[2 rows x 23 columns]
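A self-contained sketch of the pandas route, assuming pandas is installed. The two rows are a hypothetical sample truncated to six cells, and the column names are illustrative, not the site's real headings. It also checks that transposing twice returns an identical frame, so `pd.DataFrame(rows)` already has one row per player.

```python
import pandas as pd

# Hypothetical scraped rows (truncated to six cells each)
rows = [
    ['1', 'Connor McDavid', '2016-17', 'EDM', 'C', '82'],
    ['2', 'Sidney Crosby', '2016-17', 'PIT', 'C', '75'],
]
# Illustrative headings; the real ones would come from the site's table header
columns = ['Rank', 'Player', 'Season', 'Team', 'Pos', 'GP']

# Each inner list becomes one row; columns= names the columns directly
df = pd.DataFrame(rows, columns=columns)
print(df)

# Transposing twice gives back the same frame, so .T.T is not needed
print(df.T.T.equals(df))
```

To write the workbook, `df.to_excel("nhl_stats.xlsx", index=False)` needs an Excel writer engine such as openpyxl installed; `index=False` leaves out the 0, 1, ... index column.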

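Answer 2 (score: 0)

You can't do this directly. You should first put each player's data into a list, so that you end up with something like

[Player1 Data, Player2 Data, ...]

where Player1 Data is a list of that player's values. After that, you may need Matrix Transpose in Python.

Here is how to put the player data into lists, for example: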

players_data = []
for post in PlayerStats:
    # post.text holds one player's row, one cell per line (23 cells here)
    player_data = post.text.split('\n')
    players_data.append(player_data)
# Transpose: each tuple now holds one column of the table ("Python split csv column into rows")
players_data = list(zip(*players_data))

print(players_data[1])  # column 1 holds the player names (column 0 is the rank)
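The transpose step can be checked on plain lists without Selenium. The sample rows below are hypothetical and truncated to four cells. `zip(*rows)` passes each row as a separate argument and groups the i-th cell of every row, so each resulting tuple is one column.

```python
# Hypothetical per-player rows, as produced by post.text.split('\n')
players_data = [
    ['1', 'Connor McDavid', '2016-17', 'EDM'],
    ['2', 'Sidney Crosby', '2016-17', 'PIT'],
]

# zip(*...) unpacks the rows as arguments and groups cells by position
columns = list(zip(*players_data))
print(columns[1])  # ('Connor McDavid', 'Sidney Crosby')

# Transposing again restores the original rows (converted back to lists)
rows_again = [list(row) for row in zip(*columns)]
```

For writing a CSV or Excel file that mirrors the website, you usually want the untransposed rows (one per player); the transpose is only needed if you want one player per column.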