Python: No traceback when scraping data into an Excel spreadsheet

Time: 2014-11-18 09:25:32

Tags: python excel urllib xlsxwriter

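I am an inexperienced programmer working in Python. I wrote a script to automate a process in which certain information is scraped from a web page and then pasted into a new Excel spreadsheet. I wrote and ran the code, but the Excel spreadsheet I designated to receive the data is completely empty. Worst of all, there is no traceback. Can you help me find the problem in my code? And how do you generally troubleshoot your own code when no traceback is provided?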

import xlsxwriter, urllib.request, string


def main():

    #gets the URL for the expert page
    open_sesame = urllib.request.urlopen('https://aries.case.com.pl/main_odczyt.php?strona=eksperci')
    #reads the expert page
    readpage = open_sesame.read()
    #opens up a new file in excel
    workbook = xlsxwriter.Workbook('expert_book.xlsx')
    #adds worksheet to file
    worksheet = workbook.add_worksheet()

    #initializing the variable used to move names and dates
    #in the excel spreadsheet
    boxcoA = ""
    boxcoB = ""
    #initializing expert attribute variables and lists
    expert_name = ""
    url_ticker = 0
    name_ticker = 0
    raw_list = []
    url_list = []
    name_list= []
    date_list= []
    #this loop goes through and finds all the lines
    #that contain the expert URL and name and saves them to raw_list::
    #raw_list loop
    for i in readpage:
        i = str(i)
        if i.startswith('<tr><td align=left><a href='):
            raw_list += i

    #this loop goes through the lines in raw list and extracts
    #the name of the expert, saving it to a list::
    #name_list loop
    for n in raw_list:
        name_snip = n.split('target=_blank>','</a></td><')[1]
        name_list += name_snip
    #this loop fills a list with the dates the profiles were last updated::
    #date_list
    for p in raw_list:
        url_snipoff = p[28:]
        url_snip = url_snipoff.split('"')[0]
        url_list += url_snip
        expert_url = 'https://aries.case.com.pl/'+url_list[url_ticker]
        open_expert = urllib2.openurl(expert_url)
        read_expert = open_expert.read()
        for i in read_expert:
            if i.startswith('<p align=left><small>Last update:'):
                update = i.split('Last update:','</small>')[1]
        open_expert.close()
        date_list += update

    #now that we have a list of expert names and a list of profile update dates
    #we can work on populating the excel spreadsheet


    #this operation will iterate just as long as the list is long
    #meaning that it will populate the excel spreadsheet
    #with all of our names and dates that we wanted
    for z in raw_list:
        boxcoA = string('A',z)
        boxcoB = string('B',z)
        worksheet.write(boxcoA, name_list[z])
        worksheet.write(boxcoB, date_list[z])
    workbook.close()
    print('Operation Complete')


main()

2 answers:

Answer 0: (score: 1)

The absence of a traceback only means that your code did not raise an exception. It does not mean that your code is logically correct.

I would look for the logic error by adding print statements or by using a debugger such as pdb or pudb.
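As a rough illustration of the print-statement approach, here is a minimal sketch. The helper name `collect_matches` and the sample strings are made up for this example and are not part of the asker's script; the idea is only that a diagnostic print makes a silently empty result visible.

```python
def collect_matches(items, prefix):
    """Return the items that start with prefix and print a short diagnostic."""
    matches = [item for item in items if item.startswith(prefix)]
    # Without this print, an empty result would pass unnoticed.
    print(f"checked {len(items)} items, matched {len(matches)}")
    return matches


# Hypothetical sample input standing in for lines of the scraped page.
sample = ['<tr><td align=left><a href="x">', '<p>something else</p>']
matches = collect_matches(sample, '<tr><td align=left><a href=')

# To inspect variables interactively instead, pause in the standard debugger:
# import pdb; pdb.set_trace()
```

If the printed count is zero at a step where you expected data, the problem is at or before that step.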

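One problem I noticed in your code is that the first loop seems to assume that i is a line, when it is actually a single character. You will probably find splitlines() more useful.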
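To make that concrete, here is a small sketch of the difference. The `page` bytes below are an invented stand-in for what `urlopen(...).read()` returns, and decoding as UTF-8 is an assumption about the real page's encoding.

```python
# Made-up stand-in for the bytes returned by urlopen(...).read().
page = (
    b'<html>\n'
    b'<tr><td align=left><a href="expert.php?id=1" target=_blank>Jane Doe</a></td><td>x</td></tr>\n'
    b'</html>'
)

# Iterating over bytes yields integers, so str(i) is a digit string such as '60'
# (60 is the byte value of '<') and can never start with '<tr>'.
first = next(iter(page))
print(type(first), str(first))

# Decode to text (UTF-8 assumed here) and split into lines before testing each one.
raw_list = []
for line in page.decode('utf-8').splitlines():
    if line.startswith('<tr><td align=left><a href='):
        # append keeps the line whole; `raw_list += line` would add it one character at a time
        raw_list.append(line)

print(len(raw_list))
```

With the original loop, the `if` condition is never true, so `raw_list` stays empty and every later loop runs zero times, which would explain why no exception is ever raised.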

Answer 1: (score: 0)

If there is no traceback, then no error was raised.

Most likely something is going wrong in your scraping/parsing code, and raw_list or one of the other lists is not being populated.

Try printing the data that should be written to the worksheet in the final loop, to see whether there is anything to write at all.
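A minimal sketch of that check, assuming `name_list` and `date_list` hold one string per expert. The values below are invented, and the worksheet calls are left as comments so the snippet runs without xlsxwriter installed.

```python
# Hypothetical data standing in for the lists the script is supposed to build.
name_list = ['Jane Doe', 'John Roe']
date_list = ['2014-11-01', '2014-10-15']

if not name_list or not date_list:
    print('Nothing to write: name_list or date_list is empty')

rows = []
# enumerate(..., start=1) gives the spreadsheet row number for each pair.
for row, (name, date) in enumerate(zip(name_list, date_list), start=1):
    cell_a = f'A{row}'
    cell_b = f'B{row}'
    # Print exactly what would be written, and where.
    print(cell_a, name, cell_b, date)
    rows.append((cell_a, name, cell_b, date))
    # In the real script the writes would go here, for example:
    # worksheet.write(cell_a, name)
    # worksheet.write(cell_b, date)
```

If this loop prints nothing in the real script, the lists were never filled and the problem lies in the scraping step, not in the Excel step.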

If you never write any data to the worksheet, it will be empty.