A script that used to work no longer generates a CSV file. Why?

Time: 2017-01-18 01:19:12

Tags: python json csv web-scraping web-crawler

The title may be misleading: the Python script runs, but it no longer generates the CSV file, which it used to do without any problems.

Source:

import requests
import unicodecsv as csv
import json

api_url = 'http://api.indeed.com/ads/apisearch?publisher=8710117352111766&v=2&limit=100000&format=json'
number= 0
SearchTerm = 'McKinsey'
countries = set(['us','ar','au','at','bh','be','br','ca','cl','cn','co','cz','dk','fi','fr','de','gr','hk','hu','in','id','ie','il','it','jp','kr','kw','lu','my','mx','nl','nz','no','om','pk','pe','ph','pl','pt','qa','ro','ru','sa','sg','za','es','se','ch','tw','tr','ae','gb','ve'])


with open( SearchTerm + '.csv' , 'a' ) as csvfile:
    fieldnames = ['city','company','country','date','expired','formattedLocation','formattedLocationFull','formattedRelativeTime','indeedApply','jobkey','jobtitle','latitude','longitude','onmousedown','snippet','source','sponsored','state','url']
    writer = csv.DictWriter(csvfile, fieldnames = fieldnames, lineterminator = '\n')
    writer.writeheader()

    for SCountry in countries:

        Country = SCountry #this is the variable assigned to the country

        urlfirst = api_url + '&co=' + Country + '&q=' + SearchTerm

        grabforNum = requests.get(urlfirst)
        json_content = json.loads(grabforNum.content)
        print(json_content["totalResults"])

        numresults = (json_content["totalResults"])
        # must match the actual number of job results to the lower of the 25 increment or the last page will repeat over and over 

        for number in range(0, numresults, 25): 
            url = api_url + '&co=' + Country + '&q=' + SearchTerm + '&latlong=1' + '&start=' + str(number)
            response = requests.get(url)
            grabforclean = json.loads(response.content)
            clean_json = (grabforclean['results'])
            print 'Complete '+ url

            for job in clean_json:
                writer.writerow(job)

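I am the original owner of this script. I was using it as recently as 3 days ago, until I had to reinstall my operating system. Now, for some reason, it fails to store what it collects in the CSV file. The API key works and there are no error messages. requests and unicodecsv are both installed.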

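Things like this really get to me. How do you diagnose something that used to work? I have several versions of the script that search for different keywords, so I know my modifications are not to blame, but perhaps something outside the script is broken.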

1 answer:

Answer 0: (score: 0)

The site may have recently started returning a new field, so you have two options:

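  1. Add stations to your fieldnames list.
  2. Add extrasaction='ignore' to your csv.DictWriter arguments, which keeps all existing fields and ignores any new fields that get added.

Either solution should get your script running again.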
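
A minimal sketch of the second option follows. It uses Python 3's standard csv module and an in-memory buffer rather than the question's unicodecsv and output file. The shortened field list and the sample row are hypothetical, and stations is only the extra field this answer guesses at. unicodecsv is meant to mirror the standard DictWriter interface, but that is worth verifying against your installed version.

```python
import csv
import io

# Hypothetical, trimmed-down field list and API row, for illustration only.
fieldnames = ['city', 'company', 'jobtitle']
job = {'city': 'Boston', 'company': 'McKinsey', 'jobtitle': 'Analyst',
       'stations': ''}  # extra key that is not listed in fieldnames


def write_rows(rows, **writer_kwargs):
    """Write rows to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            lineterminator='\n', **writer_kwargs)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Default behaviour (extrasaction='raise'): an unexpected key is an error.
try:
    write_rows([job])
    strict_error = None
except ValueError as exc:
    strict_error = exc

# With extrasaction='ignore', the unexpected key is dropped silently.
lenient_csv = write_rows([job], extrasaction='ignore')
print(lenient_csv)
```

With the default setting, the standard library's DictWriter raises a ValueError for a key missing from fieldnames, which would normally show up as a traceback. If the script really does finish with no error at all, it may be worth also checking which directory the file is being written to.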