Making a pandas DataFrame from strings

Time: 2017-03-13 13:06:50

Tags: python pandas dataframe beautifulsoup

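Hi everyone,

I'm trying to create a DataFrame column called "Date" and fill its rows with datetimes. The datetime appears as every 5th item in a list of strings.

I think something like range(start, stop, step) would work well, but how would I do that in practice?

Here is my code: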

import requests, re, pandas
from bs4 import BeautifulSoup

r=requests.get("http://www.hltv.org/?pageid=188&statsfilter=2816&offset=0")
c=r.content

soup=BeautifulSoup(c,"html.parser")


for string in soup.find_all("div",{"class":"covSmallHeadline"})[6:]:
    print(string.text.replace("(","").replace(")",""))

Here is the output (the actual list is much longer):

5/3 17
 Astralis 16
 FaZe 13
inferno
IEM Katowice 2017
5/3 17
 Astralis 16
 FaZe 12
nuke
IEM Katowice 2017
5/3 17
 Astralis 16
 FaZe 12
overpass
IEM Katowice 2017
5/3 17
 FaZe 16
 Astralis 9
cache
IEM Katowice 2017
4/3 17
 Astralis 16
 Heroic 12
nuke
IEM Katowice 2017
4/3 17
 Astralis 16
 Heroic 12
train
IEM Katowice 2017
4/3 17
 Immortals 10
 FaZe 16
mirage
IEM Katowice 2017
4/3 17
 FaZe 16
 Immortals 9
inferno
IEM Katowice 2017
3/3 17
 Natus Vincere 2
 Astralis 16
nuke
IEM Katowice 2017
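
The range(start, stop, step) idea can be sketched as follows. This is only a minimal sketch: it assumes every match contributes exactly five strings in the order shown above, and that "5/3 17" means day/month and a two-digit year. The names `items`, `STEP`, and the column labels "Team1", "Team2", "Map", and "Event" are made up for illustration. A few strings from the output are hard-coded so the example runs without the scraping step.

```python
import pandas as pd

# Sample strings copied from the output above; in the real script they would
# be collected in a list instead of being printed.
items = [
    "5/3 17", " Astralis 16", " FaZe 13", "inferno", "IEM Katowice 2017",
    "5/3 17", " Astralis 16", " FaZe 12", "nuke", "IEM Katowice 2017",
    "4/3 17", " Astralis 16", " Heroic 12", "nuke", "IEM Katowice 2017",
]

STEP = 5  # assumption: each match occupies exactly five consecutive strings

# Only the dates: take indices 0, 5, 10, ...
dates = [items[i].strip() for i in range(0, len(items), STEP)]

# All fields: cut the flat list into chunks of five, one chunk per match.
rows = [
    [s.strip() for s in items[i:i + STEP]]
    for i in range(0, len(items), STEP)
]

# Column names are hypothetical labels, not taken from the page.
df = pd.DataFrame(rows, columns=["Date", "Team1", "Team2", "Map", "Event"])

# Assumed format: day/month, then a two-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m %y")

print(df)
```

The same dates could also be taken with the slice items[0::5], which is equivalent to the range-based loop here.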

1 Answer:

Answer 0: (score: 1)

First convert the data to CSV:

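import re

# `table` is the BeautifulSoup element that holds the match rows
# (it is not defined in this excerpt).
for row in table.find_all('div', style=re.compile(r'width:606px;height:22px;background-color')):
    # Join the text of each child element with commas, one match per line
    print(row.get_text(strip=True, separator=','))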


5/3 17,Astralis (16),FaZe (13),inferno,IEM Katowice 2017
5/3 17,Astralis (16),FaZe (12),nuke,IEM Katowice 2017
5/3 17,Astralis (16),FaZe (12),overpass,IEM Katowice 2017
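
From there, the comma-separated lines could be loaded into a DataFrame roughly like this. It is a sketch under a few assumptions: the three lines above are hard-coded as `csv_text` rather than collected from the loop, the column names are hypothetical labels, and the date format is taken to be day/month followed by a two-digit year.

```python
import io

import pandas as pd

# The three CSV lines printed above, hard-coded so the example is self-contained.
csv_text = """5/3 17,Astralis (16),FaZe (13),inferno,IEM Katowice 2017
5/3 17,Astralis (16),FaZe (12),nuke,IEM Katowice 2017
5/3 17,Astralis (16),FaZe (12),overpass,IEM Katowice 2017
"""

# Hypothetical column labels; the scraped text has no header row.
columns = ["Date", "Team1", "Team2", "Map", "Event"]

# io.StringIO lets read_csv treat the string as if it were a file.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=columns)

# Assumed format: day/month, then a two-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m %y")

print(df)
```

In the real script, the lines printed by the loop would be appended to a list and joined with newlines before being passed to read_csv.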