Remove a line of text from BeautifulSoup output

Time: 2017-03-13 00:01:00

Tags: python beautifulsoup jupyter-notebook bs4

I am trying to scrape CS:GO results from this website: http://www.hltv.org/?pageid=188&statsfilter=2816&offset=0

Here is my code:

import requests
from bs4 import BeautifulSoup

# Download the results page.
r = requests.get("http://www.hltv.org/?pageid=188&statsfilter=2816&offset=0")
c = r.content

soup = BeautifulSoup(c, "html.parser")

# Print the text of every matching cell, with parentheses removed.
for string in soup.find_all("div", {"class": "covSmallHeadline"}):
    print(string.text.replace("(", "").replace(")", ""))

The output is:

CS:GO - ScreaM's Team History Shown In A Different Way
Date
Team1
Team2
Map
Event
5/3 17
 Astralis 16
 FaZe 13
inferno
IEM Katowice 2017
5/3 17
 Astralis 16
 FaZe 12
nuke
IEM Katowice 2017
5/3 17
 Astralis 16
 FaZe 12
overpass
IEM Katowice 2017
5/3 17
 FaZe 16
 Astralis 9
cache
IEM Katowice 2017
4/3 17
 Astralis 16
 Heroic 12
nuke
IEM Katowice 2017
4/3 17
 Astralis 16
 Heroic 12
train
IEM Katowice 2017
4/3 17
 Immortals 10
 FaZe 16
mirage
IEM Katowice 2017
4/3 17
 FaZe 16
 Immortals 9
inferno
IEM Katowice 2017
3/3 17
 Natus Vincere 2
 Astralis 16
nuke
IEM Katowice 2017
3/3 17
 Natus Vincere 11
 Astralis 16
mirage
IEM Katowice 2017
3/3 17
 Immortals 16
 North 6
cbble
IEM Katowice 2017
3/3 17
 North 19
 Immortals 15
overpass
IEM Katowice 2017
3/3 17
 Immortals 16
 North 14
cache
IEM Katowice 2017
2/3 17
 Virtus.pro 14
 Heroic 16
nuke
IEM Katowice 2017
2/3 17
 Cloud9 6
 Natus Vincere 16
mirage
IEM Katowice 2017
2/3 17
 SK 16
 North 8
cbble
IEM Katowice 2017
2/3 17
 Cloud9 12
 North 16
cbble
IEM Katowice 2017
2/3 17
 Natus Vincere 12
 Heroic 16
overpass
IEM Katowice 2017
2/3 17
 Virtus.pro 16
 SK 14
inferno
IEM Katowice 2017
2/3 17
 North 16
 Natus Vincere 12
cbble
IEM Katowice 2017
2/3 17
 Virtus.pro 16
 Cloud9 4
mirage
IEM Katowice 2017
2/3 17
 SK 16
 Heroic 5
mirage
IEM Katowice 2017
2/3 17
 North 16
 Virtus.pro 13
cbble
IEM Katowice 2017
2/3 17
 Cloud9 16
 Heroic 7
cbble
IEM Katowice 2017
2/3 17
 North 17
 Heroic 19
nuke
IEM Katowice 2017
2/3 17
 Natus Vincere 16
 SK 12
overpass
IEM Katowice 2017
2/3 17
 Cloud9 16
 SK 9
nuke
IEM Katowice 2017
2/3 17
 Virtus.pro 9
 Natus Vincere 16
train
IEM Katowice 2017
1/3 17
 Astralis 13
 Immortals 11
train
IEM Katowice 2017
1/3 17
 Astralis 2
 FaZe 4
train
IEM Katowice 2017
1/3 17
 FaZe 10
 Immortals 7
train
IEM Katowice 2017
1/3 17
 fnatic 14
 Immortals 16
mirage
IEM Katowice 2017
1/3 17
 NiP 16
 OpTic 13
inferno
IEM Katowice 2017
1/3 17
 Astralis 16
 FaZe 8
train
IEM Katowice 2017
1/3 17
 OpTic 12
 FaZe 16
train
IEM Katowice 2017
1/3 17
 Immortals 16
 NiP 14
cbble
IEM Katowice 2017
1/3 17
 fnatic 17
 Astralis 19
inferno
IEM Katowice 2017
1/3 17
 NiP 6
 FaZe 16
cache
IEM Katowice 2017
1/3 17
 Immortals 16
 Astralis 12
cache
IEM Katowice 2017
1/3 17
 fnatic 16
 OpTic 10
train
IEM Katowice 2017
1/3 17
 OpTic 8
 Immortals 16
inferno
IEM Katowice 2017
1/3 17
 fnatic 11
 FaZe 16
train
IEM Katowice 2017
1/3 17
 NiP 24
 Astralis 28
overpass
IEM Katowice 2017
1/3 17
 Immortals 6
 FaZe 16
overpass
IEM Katowice 2017
1/3 17
 OpTic 12
 Astralis 16
train
IEM Katowice 2017
1/3 17
 NiP 14
 fnatic 16
cache
IEM Katowice 2017
20/2 17
 SK 13
 Virtus.pro 16
mirage
DreamHack Masters Las Vegas 2017
20/2 17
 Virtus.pro 16
 SK 11
train
DreamHack Masters Las Vegas 2017
20/2 17
 Virtus.pro 8
 SK 16
cbble
DreamHack Masters Las Vegas 2017
19/2 17
 SK 16
 North 9
mirage
DreamHack Masters Las Vegas 2017

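How can I get rid of "CS:GO - ScreaM's Team History Shown In A Different Way", the text that appears on the first line of the output? My goal is to load the results into a pandas DataFrame, and that line of text gets in the way.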

1 Answer:

Answer 0: (score: 0)

for string in soup.find_all("div",{"class":"covSmallHeadline"})[1:]:
    print(string.text.replace("(","").replace(")",""))

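Use a slice to get rid of the first line: find_all returns a list-like result, so [1:] skips its first element (the unwanted headline) and keeps everything else.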
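Since the stated goal is a pandas DataFrame, the same slice can be followed by a grouping step. The sketch below is only an illustration on a few cell texts copied from the output in the question. It assumes that every match occupies exactly five cells (Date, Team1, Team2, Map, Event), as the printed header suggests, and `to_rows` is a hypothetical helper name, not part of BeautifulSoup.

```python
# Sample of the cell texts, copied from the output shown in the question.
cells = [
    "CS:GO - ScreaM's Team History Shown In A Different Way",
    "Date", "Team1", "Team2", "Map", "Event",
    "5/3 17", " Astralis 16", " FaZe 13", "inferno", "IEM Katowice 2017",
    "5/3 17", " Astralis 16", " FaZe 12", "nuke", "IEM Katowice 2017",
]


def to_rows(texts, width=5):
    """Drop the first entry, then group the rest into rows of `width` cells.

    Hypothetical helper; `width=5` is an assumption based on the header row.
    """
    cleaned = [t.strip() for t in texts[1:]]  # [1:] skips the headline
    return [cleaned[i:i + width] for i in range(0, len(cleaned), width)]


# The first group holds the column names, the remaining groups are matches.
header, *rows = to_rows(cells)
print(header)
print(rows[0])
```

If pandas is installed, the result could then be passed on as `pandas.DataFrame(rows, columns=header)`. With the real page, `cells` would be the list of `string.text` values collected in the loop above.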