一遍又一遍地反复反复

时间:2020-04-16 08:04:25

标签: python for-loop beautifulsoup

我一直在重复这段代码。我很想从该网站抓取过去的所有结果数据,但是我一直在一个接一个地循环浏览?

例如race_number打印为1,1,2,1,2,3等

最终目标是用数据填充所有列表,然后将其列出来查看结果和趋势。

import requests
import csv
import os
import numpy
import pandas

from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    webpage_response = s.get('http://www.harness.org.au/racing/fields/race-fields/?mc=SW010420')
    soup = bs(webpage_response.content, "html.parser")
    #soup1 = soup.select('.content')
    results = soup.find_all('div', {'class':'forPrint'})
    race_number = []
    race_name = []
    race_title = []
    race_distance = []
    place = []
    horse_name = []
    Prizemoney = []
    Row = []
    horse_number = []
    Trainer = []
    Driver = []
    Margin = []
    Starting_odds = []
    Stewards_comments = []
    Scratching = []
    Track_Rating = []
    Gross_Time = []
    Mile_Rate = []
    Lead_Time = []
    First_Quarter = []
    Second_Quarter = []
    Third_Quarter = []
    Fourth_Quarter = []

    for race in results:
        race_number1 = race.find(class_='raceNumber').get_text()
        race_number.append(race_number1)
        race_name1 = race.find(class_='raceTitle').get_text()
        race_name.append(race_name1)
        race_title1 = race.find(class_='raceInformation').get_text(strip=True)
        race_title.append(race_title1)
        race_distance1 = race.find(class_='distance').get_text()
        race_distance.append(race_distance1)

需要一遍又一遍地帮助解决迭代问题,查看表数据而不是上面的标头的下一个最佳方法是什么? 干杯

1 个答案:

答案 0 :(得分:1)

这是您期望的输出吗?

import requests
import csv
import os
import numpy
import pandas as pd
import html

from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    webpage_response = s.get('http://www.harness.org.au/racing/fields/race-fields/?mc=SW010420')
    soup = bs(webpage_response.content, "html.parser")
    #soup1 = soup.select('.content')
    data = {}
    data["raceNumber"] = [ i['rowspan'] for i in soup.find_all("td", {"class": "raceNumber", "rowspan": True})]
    data["raceTitle"] = [ i.get_text(strip=True) for i in soup.find_all("td", {"class": "raceTitle"})]
    data["raceInformation"] = [ i.get_text(strip=True) for i in soup.find_all("td", {"class": "raceInformation"})]
    data["distance"] = [ i.get_text(strip=True) for i in soup.find_all("td", {"class": "distance"})]
    print(data)

    data_frame = pd.DataFrame(data)
    print(data_frame)
##    Output
##      raceNumber                             raceTitle                                    raceInformation distance
##0          3                      PREMIX KING PACE  $4,500\n\t\t\t\t\t4YO and older.\n\t\t\t\t\tNR...    1785M
##1          3                 GATEWAY SECURITY PACE  $7,000\n\t\t\t\t\t4YO and older.\n\t\t\t\t\tNR...    2180M
##2          3                 PERRY'S FOOTWEAR TROT  $7,000\n\t\t\t\t\t\n\t\t\t\t\tNR 46 to 55.\n\t...    2180M
##3          3           DELAHUNTY PLUMBING 3YO TROT  $7,000\n\t\t\t\t\t3YO.\n\t\t\t\t\tNR 46 to 52....    2180M
##4          3  RAYNER'S FRUIT & VEGETABLES 3YO PACE  $7,000\n\t\t\t\t\t3YO.\n\t\t\t\t\tNR 48 to 56....    2180M
##5          3                 KAYE MATTHEWS TRIBUTE  $9,000\n\t\t\t\t\t4YO and older.\n\t\t\t\t\tNR...    2180M
##6          3                   TALQUIST TREES PACE  $7,000\n\t\t\t\t\t\n\t\t\t\t\tNR 62 to 73.\n\t...    2180M
##7          3            WEEKLY ADVERTISER 3WM PACE  $7,000\n\t\t\t\t\t\n\t\t\t\t\tNR 56 to 61.\n\t...    1785M
相关问题