Inserting table data scraped from a website into a table on my own site with Python and Beautiful Soup

Time: 2016-07-28 18:34:55

Tags: python html web-scraping beautifulsoup html-parsing

I wrote some code that scrapes the numbers I need from this website, but I don't know what to do next.

It grabs the numbers from the bottom table: calving ease, birth weight, weaning weight, yearling weight, milk, and total maternal.

#!/usr/bin/env python3
from urllib.request import urlopen

from bs4 import BeautifulSoup
import pyperclip


def getPageData(url):
    '''Return the <td> cells of the TablesEBVBox table on an ABRI page, or None.'''
    if 'abri.une.edu.au' not in url:
        return None
    webpage = urlopen(url).read()
    soup = BeautifulSoup(webpage, "html.parser")

    # Find the table with class TablesEBVBox, which holds the figures
    pedTreeTable = soup.find('table', {'class': 'TablesEBVBox'})
    if pedTreeTable is None:
        return None

    # Put every <td> cell of that table into a list
    pageData = pedTreeTable.find_all('td')
    # Remove the cell at index 7; the indices in createPedigree assume it is gone
    pageData.pop(7)
    return pageData


def createPedigree(animalPageData):
    '''Pull the text out of the table cells and put the six values in a dict.'''
    animals = [cell.get_text() for cell in animalPageData]

    # Cells 18-23 hold the values from the bottom table on the page
    prettyPedigree = {
        'calving_ease': animals[18],
        'birth_weight': animals[19],
        'wean_weight': animals[20],
        'year_weight': animals[21],
        'milk': animals[22],
        'total_mat': animals[23],
    }

    for animalKey in prettyPedigree:
        if animalKey != 'year_weight':
            prettyPedigree[animalKey] = stripRegNumber(prettyPedigree[animalKey])
    return prettyPedigree


def stripRegNumber(animal):
    '''Return the text with any all-digit words (registration numbers) removed.'''
    return ' '.join(word for word in animal.split() if not word.isdigit())


def prettify(pedigree):
    '''Return the six values, one per line, ready to paste.'''
    keys = ['calving_ease', 'birth_weight', 'wean_weight',
            'year_weight', 'milk', 'total_mat']
    return ''.join(pedigree[key] + '\n' for key in keys)


if __name__ == '__main__':
    while True:
        url = input('Input a url you want to use to make life easier:\n')
        pageData = getPageData(url)
        if pageData is None:
            print('No EBV table found; check that the URL is an abri.une.edu.au page')
            continue
        pyperclip.copy(prettify(createPedigree(pageData)))
        print('the easy string has been copied to your clipboard')
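If you'd rather not install bs4 on a web server, the cell-collecting part of getPageData can also be done with the standard library's html.parser. This is a minimal sketch, assuming (as the code above does) that the values sit in <td> cells of the first table with class TablesEBVBox and that the cells contain no nested tables; EBVCellParser is a hypothetical name and the sample HTML is made up:

```python
from html.parser import HTMLParser


class EBVCellParser(HTMLParser):
    """Collect the text of each <td> in the first <table class="TablesEBVBox">.

    Hypothetical helper; assumes the cells contain no nested <td> elements.
    """

    def __init__(self, table_class='TablesEBVBox'):
        super().__init__()
        self.table_class = table_class
        self.depth = 0        # > 0 while inside the target table (nested tables counted)
        self.done = False     # True once the first matching table has closed
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == 'table':
            classes = (dict(attrs).get('class') or '').split()
            if self.depth or self.table_class in classes:
                self.depth += 1
        elif tag == 'td' and self.depth:
            self.in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == 'table' and self.depth:
            self.depth -= 1
            self.done = self.depth == 0
        elif tag in ('td', 'tr'):
            self.in_cell = False

    def handle_data(self, data):
        # Character references such as &amp; are already converted by default
        if self.in_cell:
            self.cells[-1] += data


if __name__ == '__main__':
    sample = ('<table class="TablesEBVBox"><tr><td>+1.2</td><td>-3</td></tr></table>'
              '<table class="TablesEBVBox"><tr><td>ignored</td></tr></table>')
    parser = EBVCellParser()
    parser.feed(sample)
    parser.close()
    print(parser.cells)  # ['+1.2', '-3']
```

Feeding it the downloaded page (for example urlopen(url).read().decode('utf-8'), assuming the page is UTF-8) gives the cell texts as plain strings in the same order as find_all('td'), so the pop(7) and index logic still applies, but the values are used directly instead of calling get_text() on them.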

Right now I just use this code for easy copying and pasting: I enter the URL, and the numbers are copied to my clipboard.

Now I want to use this code on my website. I'd like to be able to put a URL in my HTML and have these numbers shown in a table on my page.

My questions are:

  1. How do I run Python code on a website?
  2. How do I insert the collected data into an HTML table?
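For question 2, once the values are in the dict that createPedigree returns, turning them into an HTML table is mostly string building; the one step not to skip is escaping the scraped text with html.escape. A minimal sketch, assuming the dict keys used by createPedigree; pedigree_to_html_table and the display labels are hypothetical names:

```python
from html import escape

# Display labels for the keys produced by createPedigree (label wording is illustrative)
LABELS = [
    ('calving_ease', 'Calving ease'),
    ('birth_weight', 'Birth weight'),
    ('wean_weight', 'Weaning weight'),
    ('year_weight', 'Yearling weight'),
    ('milk', 'Milk'),
    ('total_mat', 'Total maternal'),
]


def pedigree_to_html_table(pedigree):
    """Return an HTML <table> with a header row of labels and a row of values."""
    header = ''.join('<th>{}</th>'.format(escape(label)) for _, label in LABELS)
    # Missing keys become empty cells; every scraped value is HTML-escaped
    values = ''.join('<td>{}</td>'.format(escape(pedigree.get(key, '')))
                     for key, _ in LABELS)
    return '<table>\n<tr>{}</tr>\n<tr>{}</tr>\n</table>'.format(header, values)


if __name__ == '__main__':
    print(pedigree_to_html_table({'calving_ease': '+1.2', 'milk': '+10'}))
```

In a Django project the same markup would normally live in a template that loops over the values, and the template's autoescaping takes care of the html.escape part.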

1 Answer:

Answer 0 (score: 0)

It sounds like you'll want to use something like Django. The learning curve is a bit steep, but it's worth it, and it (of course) supports Python.
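The key point for question 1 is that the Python code has to run on the server, behind a URL, rather than inside the HTML page itself; Django does this with a view plus a template. As a much smaller illustration of the same flow (receive a request, scrape, respond with an HTML table), here is a sketch that uses the standard library's WSGI support instead of Django. fetch_values is a hypothetical placeholder for getPageData plus createPedigree, and the url query parameter and port 8000 are assumptions:

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def fetch_values(url):
    """Hypothetical placeholder for getPageData() + createPedigree().

    Swap in the real scraping code; it should return label -> value strings.
    """
    return {'Calving ease': '+1.2', 'Birth weight': '+3.4'}


def application(environ, start_response):
    """WSGI app: GET /?url=<ABRI page> responds with the values as an HTML table."""
    query = parse_qs(environ.get('QUERY_STRING', ''))
    url = query.get('url', [''])[0]
    if 'abri.une.edu.au' not in url:
        status = '400 Bad Request'
        body = '<p>Add ?url=... pointing at an abri.une.edu.au page.</p>'
    else:
        # One row per value; scraped text is escaped before it goes into the HTML
        rows = ''.join(
            '<tr><th>{}</th><td>{}</td></tr>'.format(escape(label), escape(value))
            for label, value in fetch_values(url).items())
        status = '200 OK'
        body = '<table>{}</table>'.format(rows)
    data = body.encode('utf-8')
    start_response(status, [('Content-Type', 'text/html; charset=utf-8'),
                            ('Content-Length', str(len(data)))])
    return [data]


def serve(port=8000):
    """Run a local development server (wsgiref is not meant for production)."""
    with make_server('', port, application) as httpd:
        httpd.serve_forever()
```

Calling serve() and opening http://localhost:8000/?url=... (with the ABRI URL percent-encoded if it contains its own query string) should return the table. On a real site you would host the same logic in a framework such as Django or behind a production WSGI server rather than wsgiref.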