Removing names/characters: replace is not allowed on my data

Time: 2015-07-25 17:33:58

Tags: python csv beautifulsoup

I tried doing string.replace("'", "H"), but this raises an error:

AttributeError: 'list' object has no attribute 'replace'

I could also use re.sub, but that produces a similar error.
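For context, the error arises because replace is a method of strings, while the scraped goal data is a list of strings. A minimal sketch of applying the replacement to each element instead, using sample data copied from the output shown in this question (the variable names are only illustrative):

```python
# Sample data copied from the output shown in the question.
home_goal_scorers = [u" Donaldson 30' ", u" McKenna 77' "]

# str.replace exists on each string, not on the list that holds them,
# so apply it element by element with a list comprehension.
replaced = [entry.replace("'", "H") for entry in home_goal_scorers]
```

The same idea applies to re.sub, which also expects a single string rather than a list.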

I may have found a way to approach the problem. This is what I get so far:

    25 July 2015
    Scottish Football
    East Stirling 2 - Stenhousemuir 3
    [u" Donaldson 30' ", u" McKenna 77' "]
    [u" Stirling 35', 45' ", u" McMenamin 59' "]

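My output is shown above. How do I remove the surrounding [u" "] and then replace ' with H for the top line and with A for the second line?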

I am trying to produce the bottom 2 lines as shown below:

    25 July 2015
    Scottish Football
    East Stirling 2 - Stenhousemuir 3
    30H, 77H,
    35A, 45A, 59A,

and then remove all the names from the text.
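On the [u" "] wrapper: it appears because print is given the list itself, so Python 2 shows the list's representation, including the brackets and the u prefix of each unicode string. A small sketch, again using the sample data from the output above, of joining the elements into one plain string before printing:

```python
# Sample data copied from the output shown in the question.
home_goals = [u" Donaldson 30' ", u" McKenna 77' "]

# Joining the elements yields a single string, so printing it shows
# no brackets, quotes, or u prefixes. strip() drops the padding spaces.
line = ", ".join(entry.strip() for entry in home_goals)
print(line)
```

This only changes how the data is displayed; the names and apostrophes are still present in the text.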

import requests
from bs4 import BeautifulSoup
import csv
import re
from collections import OrderedDict

def parse_page(data):
        subsoup = BeautifulSoup(data)
        rs = requests.get("http://www.bbc.co.uk/sport/0/football/33578498")
        ssubsoup = BeautifulSoup(rs.content)
        matchoverview = subsoup.find('div', attrs={'id':'match-overview'})
        print '--------------'
        date = ssubsoup.find('div', attrs={'id':'article-sidebar'}).findNext('span').text
        league = ssubsoup.find('a', attrs={'class':'secondary-nav__link'}).findNext('span').findNext('span').text
        #HomeTeam info printing
        homeTeam = matchoverview.find('div', attrs={'class':'team-match-details'}).findNext('span').findNext('a').text
        homeScore = matchoverview.find('div', attrs={'class':'team-match-details'}).findNext('span').findNext('span').text
        homeGoalScorers = []

        for goals in matchoverview.find('div', attrs={'class':'team-match-details'}).findNext('p').find_all('span'):
            homeGoalScorers.append(goals.text.replace(u'\u2032', "'"))
        homeGoals = homeGoalScorers

        #AwayTeam info printing
        awayTeam = matchoverview.find('div', attrs={'id': 'away-team'}).find('div', attrs={'class':'team-match-details'}).findNext('span').findNext('a').text
        awayScore = matchoverview.find('div', attrs={'id': 'away-team'}).find('div', attrs={'class':'team-match-details'}).findNext('span').findNext('span').text
        awayGoalScorers = []
        for goals in matchoverview.find('div', attrs={'id': 'away-team'}).find('div', attrs={'class':'team-match-details'}).findNext('p').find_all('span'):
            awayGoalScorers.append(goals.text.replace(u'\u2032', "'"))
        awayGoals = awayGoalScorers

        #Printouts
        print date
        print league
        print '{0} {1} - {2} {3}'.format(homeTeam, homeScore, awayTeam, awayScore)
        print homeGoals
        print awayGoals
        if len(homeTeam) >1:
                with open('score.txt', 'a') as f:
                        writer = csv.writer(f)
                        writer.writerow([league,date,homeTeam,awayTeam])

def all_league_results():
    r = requests.get("http://www.bbc.co.uk/sport/football/league-one/results")
    soup = BeautifulSoup(r.content)

    # Save Teams
    for link in soup.find_all("a", attrs={'class': 'report'}):
        fullLink = 'http://www.bbc.com' + link['href']
        subr = requests.get(fullLink)
        parse_page(subr.text)

def specific_game_results(url):
    subr = requests.get(url)
    parse_page(subr.text)

#get specific games results
specific_game_results('http://www.bbc.co.uk/sport/0/football/33578498')

1 Answer:

Answer 0 (score: 0)

I believe you can change the code here:

for goals in matchoverview.find('div', attrs={'class':'team-match-details'}).findNext('p').find_all('span'):
    homeGoalScorers.append(goals.text.replace(u'\u2032', "'") + 'H')
homeGoals = ",".join(homeGoalScorers)

and remove the line homeGoals = "H".join(homeGoalScorers).
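Note that appending 'H' this way keeps the player names and the apostrophes in each string. If the goal is the exact output requested in the question (minutes only, with an H or A suffix), one possible sketch is to pull the minutes out with a regular expression. The helper name format_goal_minutes and the stoppage-time pattern such as 90+2 are assumptions, not something taken from the BBC page or the original code:

```python
import re

# Matches a minute count directly before an apostrophe, e.g. 30 in "30'".
# The optional "+N" part (stoppage time) is an assumption about the data.
MINUTE_RE = re.compile(r"(\d+(?:\+\d+)?)'")

def format_goal_minutes(scorers, suffix):
    """Hypothetical helper: return e.g. '30H, 77H' from the scorer strings."""
    minutes = []
    for entry in scorers:
        # Normalise the prime character to an apostrophe, as the
        # question's code does, then collect every minute in the entry.
        text = entry.replace(u"\u2032", "'")
        minutes.extend(MINUTE_RE.findall(text))
    return ", ".join(minute + suffix for minute in minutes)

# Sample data copied from the output shown in the question.
home = [u" Donaldson 30' ", u" McKenna 77' "]
away = [u" Stirling 35', 45' ", u" McMenamin 59' "]

home_line = format_goal_minutes(home, "H")
away_line = format_goal_minutes(away, "A")
print(home_line)
print(away_line)
```

In parse_page this could be called as format_goal_minutes(homeGoalScorers, "H") and format_goal_minutes(awayGoalScorers, "A"). Unlike the desired output in the question, this version does not emit a trailing comma.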