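How can I improve this script to make it more Pythonic?

Asked: 2016-02-29 00:41:46

Tags: python python-2.7 csv hubspot

I'm fairly new to Python programming, and so far I've either been reverse-engineering code written by previous developers or cobbling together functions of my own.

The script itself works; in a nutshell, it is designed to parse a CSV and (a) create and/or update the contacts found in the CSV, and (b) correctly assign each contact to their associated company. All of this uses the HubSpot API. To achieve this, I've also imported Requests and CSVMapper.

I have the following questions:

  1. How can I improve this script to make it more Pythonic?
  2. What is the best way to make this script run on a remote server, keeping in mind that Requests and CSVMapper probably aren't installed on that server and that I most likely won't have permission to install them? What is the best way to "package" this script, or to upload Requests and CSVMapper to the server?
  3. Any advice is much appreciated.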

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    from __future__ import print_function
    import sys, os.path, requests, json, csv, csvmapper, glob, shutil
    from time import sleep
    major, minor, micro, release_level, serial =  sys.version_info
    
    # Client Portal ID
    portal = "XXXXXX"
    
    # Client API Key
    
    hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    
    # This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
    # Server Version
    # findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
    
    # Local Testing Version
    findCSV = glob.glob('contact*.CSV')
    
    for i in findCSV:
    
        theCSV = i
    
        csvfileexists = os.path.isfile(theCSV)
    
        # Prints a confirmation if file exists, prints instructions if it doesn't.
    
        if csvfileexists:
            print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
        else:
            print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
            sys.exit()
    
        # Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
    
        mapper = csvmapper.DictMapper([
          [
            {'name':'account'}, #"Org. Code"
            {'name':'id'}, #"Hubspot Ref"
            {'name':'company'}, #"Company Name"
            {'name':'firstname'}, #"Contact First Name"
            {'name':'lastname'}, #"Contact Last Name"
            {'name':'job_title'}, #"Job Title"
            {'name':'address'}, #"Address"
            {'name':'city'}, #"City"
            {'name':'phone'}, #"Phone"
            {'name':'email'}, #"Email"
            {'name':'date_added'} #"Last Update"
          ]
        ])
    
        # Parse the CSV using the mapper
        parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
    
        # Build the parsed object
        obj = parser.buildObject()
    
        def contactCompanyUpdate():
    
            # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
            with open(os.path.basename(theCSV),"r") as f:
                reader = csv.reader(f, delimiter = ",", quotechar="\"")
                data = list(reader)
    
                # For every row in the CSV ...
                for row in range(0, len(data)):
                    # Set up the JSON payload ...
    
                    payload = {
                                "properties": [
                                    {
                                        "name": "account",
                                        "value": obj[row].account
                                    },
                                    {
                                        "name": "id",
                                        "value": obj[row].id
                                    },
                                    {
                                        "name": "company",
                                        "value": obj[row].company
                                    },
                                    {
                                        "property": "firstname",
                                        "value": obj[row].firstname
                                    },
                                    {
                                        "property": "lastname",
                                        "value": obj[row].lastname
                                    },
                                    {
                                        "property": "job_title",
                                        "value": obj[row].job_title
                                    },
                                    {
                                        "property": "address",
                                        "value": obj[row].address
                                    },
                                    {
                                        "property": "city",
                                        "value": obj[row].city
                                    },
                                    {
                                        "property": "phone",
                                        "value": obj[row].phone
                                    },
                                    {
                                        "property": "email",
                                        "value": obj[row].email
                                    },
                                    {
                                        "property": "date_added",
                                        "value": obj[row].date_added
                                    }
                                ]
                            }
    
                    nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
    
                    # Get a list of all contacts for a certain company.
                    contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
    
                    # Convert the payload to JSON and assign it to a variable called "data"
                    data = json.dumps(payload)
    
                    # Defined the headers content-type as 'application/json'
                    headers = {'content-type': 'application/json'}
    
                    contactExistCheck = requests.get(contactCheck, headers=headers)
    
                    for i in contactExistCheck.json()[u'contacts']:
    
                        # ... Get the canonical VIDs
                        canonicalVid = i[u'canonical-vid']
    
                        if canonicalVid:
                            print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
                            print ("Attempting to update their company...")
                            contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
                            doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
                            if doTheUpdate.status_code == 200:
                                print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
                                break
                            else:
                                print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))
    
        def createOrUpdateClient():
    
            # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
            with open(os.path.basename(theCSV),"r") as f:
                reader = csv.reader(f, delimiter = ",", quotechar="\"")
                data = list(reader)
    
                # For every row in the CSV ...
                for row in range(0, len(data)):
                    # Set up the JSON payload ...
    
                    payloadTest = {
                                "properties": [
                                    {
                                        "property": "email",
                                        "value": obj[row].email
                                    },
                                    {
                                        "property": "firstname",
                                        "value": obj[row].firstname
                                    },
                                    {
                                        "property": "lastname",
                                        "value": obj[row].lastname
                                    },
                                    {
                                        "property": "website",
                                        "value": None
                                    },
                                    {
                                        "property": "company",
                                        "value": obj[row].company
                                    },
                                    {
                                        "property": "phone",
                                        "value": obj[row].phone
                                    },
                                    {
                                        "property": "address",
                                        "value": obj[row].address
                                    },
                                    {
                                        "property": "city",
                                        "value": obj[row].city
                                    },
                                    {
                                        "property": "state",
                                        "value": None
                                    },
                                    {
                                        "property": "zip",
                                        "value": None
                                    }
                                ]
                            }
    
                    # Convert the payload to JSON and assign it to a variable called "data"
                    dataTest = json.dumps(payloadTest)
    
                    # Defined the headers content-type as 'application/json'
                    headers = {'content-type': 'application/json'}
    
                    #print ("{theContact} does not exist!".format(theContact=obj[row].firstname))
                    print ("Attempting to add {theContact} as a contact...".format(theContact=obj[row].firstname))
                    createOrUpdateURL = 'http://api.hubapi.com/contacts/v1/contact/createOrUpdate/email/{email}/?hapikey={hapikey}'.format(email=obj[row].email,hapikey=hapikey)
    
                    r = requests.post(createOrUpdateURL, data=dataTest, headers=headers)
    
                    if r.status_code == 409:
                        print ("This contact already exists.\n")
                    elif (r.status_code == 200) or (r.status_code == 202):
                        print ("Success! {firstName} {lastName} has been added.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
                    elif r.status_code == 204:
                        print ("Success! {firstName} {lastName} has been updated.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
                    elif r.status_code == 400:
                        print ("Bad request. You might get this response if you pass an invalid email address, if a property in your request doesn't exist, or if you pass an invalid property value.\n")
                    else:
                        print ("Contact Marko for assistance.\n")
    
        if __name__ == "__main__":
            # Run the Create or Update function
            createOrUpdateClient()
    
            # Give the previous function 5 seconds to take effect.
            sleep(5.0)
    
            # Run the Company Update function
            contactCompanyUpdate()
            print("Sync complete.")
    
            print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
    
            # Cron version
            #shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
    
            # Local version
            movePath = "archive/{thefile}".format(thefile=theCSV)
            shutil.move( i, movePath )
    
            print("Move successful! Exiting...\n")
    
    sys.exit()
    

1 Answer:

Answer 0 (score: 4)

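I'm just going to go from top to bottom. The first rule is: do what PEP 8 says. It isn't the ultimate style guide, but it is the reference baseline for Python coders, and that matters even more when you are starting out. The second rule is: make it maintainable. A few years from now, when some other new kid comes along, it should be easy for her to figure out what you were doing. Sometimes that means doing things the long way, to reduce errors. Sometimes it means doing things the short way, to reduce errors. :-)

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

Two things. First, per PEP 8, you got the encoding right. Second, PEP 8 also notes that conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.

You have a program that does something, but you don't document what.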
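As a rough sketch of what that documentation might look like, here is a module docstring plus one documented function. The wording is a guess at the script's purpose based on the question, and archive_path is a hypothetical helper invented for illustration, not something the original script contains.

```python
"""Sync contacts from CSV exports to HubSpot.

For each contact CSV file found: create or update the contacts it lists,
attach every contact to its company, then move the file to the archive.
(This summary is an assumption based on the question's description.)
"""


def archive_path(csv_name, archive_dir="archive"):
    """Return the path csv_name will have once moved into archive_dir."""
    # Mirrors the "archive/{thefile}" string the question builds by hand.
    return "{0}/{1}".format(archive_dir, csv_name)


print(archive_path("contacts.CSV"))
```

A docstring like this is also what help() and most documentation tools display, so it pays off for whoever inherits the script.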

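    from __future__ import print_function
    import sys, os.path, requests, json, csv, csvmapper, glob, shutil
    from time import sleep
    major, minor, micro, release_level, serial =  sys.version_info

Per PEP 8: put one import module statement per line.

Per Austin: give your paragraphs separate subjects. You have some imports sitting right next to some version-info stuff. Insert a blank line. Also, DO SOMETHING with the data! Or you didn't really need it right here, did you?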
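One way the top of the file might look after that cleanup is sketched below. The third-party imports are left as comments so the snippet runs without them installed, the `from __future__` line from the question would stay as the very first statement, and MINIMUM_PYTHON is a made-up requirement that only shows the version data being put to use.

```python
# Standard-library imports, one per line.
import csv
import glob
import json
import os.path
import shutil
import sys
from time import sleep

# Third-party imports form their own group after the standard library:
# import csvmapper
# import requests

MINIMUM_PYTHON = (2, 7)  # hypothetical requirement, for illustration only

# Either use the version information, or delete the line that fetches it.
if sys.version_info < MINIMUM_PYTHON:
    sys.exit("This script needs Python 2.7 or newer.")
```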

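    # Client Portal ID
    portal = "XXXXXX"

    # Client API Key

    hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

You've obscured these in more than one way. WTF is a hapikey? I think you mean Hubspot_API_key. And what does portal do?

One piece of advice: the more "global" a thing is, the more "formal" it should be. If you have a for loop, it's fine to call one of its variables i. If you have a piece of data that is used throughout a function, call it obj or portal. But if you have a piece of data that is used globally, or is a class variable, make it put on a tie and a jacket so that everyone can recognize it: make it Hubspot_api_key instead of client_api_key. Maybe even Hubspot_client_api_key, if there is more than one API. Do the same with portal.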

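    # This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
    # Server Version
    # findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')

It doesn't take long for comments to become lies. (This one says the files start with "note", but the pattern matches "contact".) If they aren't true, delete them.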

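    # Local Testing Version
    findCSV = glob.glob('contact*.CSV')

This is the kind of thing you should create a function for. Just write a simple function called "get_csv_files" or whatever, and have it return a list of file names. That decouples you from glob, which means you can make your test code data-driven (pass a list of file names to a function, or pass a single file to a function, instead of having it go search for them). Also, those glob patterns are exactly the kind of thing that belongs in a config file or a global variable, or that gets passed in as a command-line argument.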
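A minimal sketch of that idea follows. The function name get_csv_files comes from the suggestion above; CONTACT_FILE_PATTERN and sync_all are hypothetical names, and sync_all only stands in for the real sync logic.

```python
import glob

CONTACT_FILE_PATTERN = "contact*.CSV"  # hypothetical constant name


def get_csv_files(pattern=CONTACT_FILE_PATTERN):
    """Return the names of the files matching pattern, sorted."""
    return sorted(glob.glob(pattern))


def sync_all(csv_files):
    """Stand-in for the real sync: it takes names and never searches."""
    return ["would sync {0}".format(name) for name in csv_files]


# Production code searches the directory ...
print(sync_all(get_csv_files()))
# ... while a test can simply hand in the names it cares about.
print(sync_all(["contact_test.CSV"]))
```

Because sync_all never calls glob itself, it can be exercised with any list of names, which is the data-driven testing mentioned above.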

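    for i in findCSV:

I'll bet typing CSV in upper case all the time is a pain. And what does findCSV mean? Read that line, and figure out what the variable should be called. Maybe csv_files? Or new_contact_files? Something that shows there is a collection of things.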

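        theCSV = i

        csvfileexists = os.path.isfile(theCSV)

Now what does i do? You have a tiny little variable name, in a BiiiiiiG loop. That's a mistake, because if you can't see the entire scope of a variable on one page, it probably needs a longer name. But then you created an alias for it. Both i and theCSV refer to the same thing. And ... I don't see you using i again. So maybe your loop variable should be theCSV. Or maybe it should be the_csv, to make it easier to type. Or just csvname.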

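        # Prints a confirmation if file exists, prints instructions if it doesn't.

This seems a little needless. If you are using glob to get the file names, they are pretty much going to exist. (If they don't, it's because they were deleted between the time you called glob and the time you tried to open them. That's possible, but rare. Just continue or raise an exception, depending.)

        if csvfileexists:
            print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))

In this code you use the value of csvfileexists. But that's the only place you use it. In that case, you can move the call to os.path.isfile() into the if statement and get rid of the variable.

        else:
            print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
            sys.exit()

Notice that in this case, when there is an actual problem, you don't print the file name? How helpful is that?

Also, remember the part about running on a remote server? You should consider using the Python logging module to record these messages in a useful way.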
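The logging suggestion could be sketched roughly as follows. The logger name "hubspot_sync" and the check_csv helper are assumptions made for illustration; only the logging and os.path calls are standard library.

```python
import logging
import os.path

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("hubspot_sync")  # hypothetical logger name


def check_csv(csv_name):
    """Log whether csv_name is usable and return True if it is."""
    if not os.path.isfile(csv_name):
        # Unlike the original message, this one names the missing file.
        log.error("File not found: %s", csv_name)
        return False
    log.info("Found %s (%d bytes); proceeding with sync",
             csv_name, os.path.getsize(csv_name))
    return True
```

For an unattended run on a server, passing filename= to logging.basicConfig sends the same messages to a log file instead of the console.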

        # Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.

        mapper = csvmapper.DictMapper([
          [
            {'name':'account'}, #"Org. Code"
            {'name':'id'}, #"Hubspot Ref"
            {'name':'company'}, #"Company Name"
            {'name':'firstname'}, #"Contact First Name"
            {'name':'lastname'}, #"Contact Last Name"
            {'name':'job_title'}, #"Job Title"
            {'name':'address'}, #"Address"
            {'name':'city'}, #"City"
            {'name':'phone'}, #"Phone"
            {'name':'email'}, #"Email"
            {'name':'date_added'} #"Last Update"
          ]
        ])

You're creating an object with a bunch of data in it. This would be a good place for a function. Define a make_csvmapper() function to do all of this for you, and move it out of line.

Also note that the standard csv module has most of the functionality you are using. I don't think you actually need csvmapper.

        # Parse the CSV using the mapper
        parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)

        # Build the parsed object
        obj = parser.buildObject()

Here's another opportunity for a function. Maybe, instead of a function that makes the csv mapper, write one that just returns obj.
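To illustrate the remark about the standard csv module, here is a sketch of a loader built on csv.DictReader, whose fieldnames argument plays the role of the "virtual header row". The names CONTACT_FIELDS and load_contacts are hypothetical, and the rows come back as dicts rather than the attribute-style objects csvmapper produces, so the calling code would need adjusting.

```python
import csv

# Column names in file order, taken from the mapper in the question.
CONTACT_FIELDS = [
    "account", "id", "company", "firstname", "lastname", "job_title",
    "address", "city", "phone", "email", "date_added",
]


def load_contacts(csv_path):
    """Return one dict per CSV row, keyed by the names in CONTACT_FIELDS."""
    # Python 3 form; under Python 2.7 open the file with mode "rb" instead.
    with open(csv_path, newline="") as csv_file:
        reader = csv.DictReader(csv_file, fieldnames=CONTACT_FIELDS)
        return [dict(row) for row in reader]
```

With this in place a row's email is row["email"], and the file is opened exactly once.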

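        def contactCompanyUpdate():

At this point, things get suspicious. You have these function definitions indented, but I don't think you need them to be. Is this a stackoverflow problem, or does your code really look like this?

            # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
            with open(os.path.basename(theCSV),"r") as f:

No, apparently it really does look like this, because you are using theCSV inside this function when you don't need to. Please consider using formal function parameters instead of just grabbing outer-scope objects. Also, why are you using basename on the csv file? If you got it from glob, doesn't it already have the path you want?

                reader = csv.reader(f, delimiter = ",", quotechar="\"")
                data = list(reader)

                # For every row in the CSV ...
                for row in range(0, len(data)):

Here you force data to be a list of the rows obtained from reader, and then start iterating over them. Just iterate over reader directly, like this:

    for row in reader:

But wait! You are actually iterating over the already-opened CSV file in your obj variable. Just pick one, and iterate over it. You don't need to open the file twice for this.

                    # Set up the JSON payload ...

                    payload = {
                                "properties": [
                                    {
                                        "name": "account",
                                        "value": obj[row].account
                                    },
                                    {
                                        "name": "id",
                                        "value": obj[row].id
                                    },
                                    {
                                        "name": "company",
                                        "value": obj[row].company
                                    },
                                    {
                                        "property": "firstname",
                                        "value": obj[row].firstname
                                    },
                                    {
                                        "property": "lastname",
                                        "value": obj[row].lastname
                                    },
                                    {
                                        "property": "job_title",
                                        "value": obj[row].job_title
                                    },
                                    {
                                        "property": "address",
                                        "value": obj[row].address
                                    },
                                    {
                                        "property": "city",
                                        "value": obj[row].city
                                    },
                                    {
                                        "property": "phone",
                                        "value": obj[row].phone
                                    },
                                    {
                                        "property": "email",
                                        "value": obj[row].email
                                    },
                                    {
                                        "property": "date_added",
                                        "value": obj[row].date_added
                                    }
                                ]
                            }

Okay, that was a LOOOONG stretch of code that doesn't do very much. At the very least, tighten each of those inner dicts up to one line. But better still, write a function to create the dictionary in the format you want. You can use getattr to pull the data from obj by name.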
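A sketch of such a payload builder, using getattr and explicit function parameters, might look like this. The names make_payload, make_payloads and PAYLOAD_FIELDS are invented for illustration, the namedtuple merely stands in for the objects csvmapper returns, and the "property"/"value" shape simply mirrors the second payload in the question; it has not been checked against the HubSpot documentation.

```python
from collections import namedtuple

PAYLOAD_FIELDS = ["email", "firstname", "lastname", "company",
                  "phone", "address", "city"]


def make_payload(contact, field_names=PAYLOAD_FIELDS):
    """Build the {"properties": [...]} dict for one contact object."""
    return {
        "properties": [
            {"property": name, "value": getattr(contact, name)}
            for name in field_names
        ]
    }


def make_payloads(contacts):
    """Iterate over the contacts directly; no index bookkeeping needed."""
    return [make_payload(contact) for contact in contacts]


# Stand-in for a parsed row with attribute access by field name.
Contact = namedtuple("Contact", PAYLOAD_FIELDS)
ann = Contact("ann@example.com", "Ann", "Lee", "Acme",
              "555-0100", "1 Main St", "Springfield")
print(make_payload(ann, ["email", "firstname"]))
```

Both functions receive their data as arguments, so neither depends on theCSV or obj existing in an outer scope.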

                    nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)

                    # Get a list of all contacts for a certain company.
                    contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)

                    # Convert the payload to JSON and assign it to a variable called "data"
                    data = json.dumps(payload)

                    # Defined the headers content-type as 'application/json'
                    headers = {'content-type': 'application/json'}

                    contactExistCheck = requests.get(contactCheck, headers=headers)

Here you encode the details of the API into your code. Consider pulling them out into functions. (That way, you can come back later and build a module to reuse in your next program.) Also, beware of comments that don't actually tell you anything. And feel free to group those statements together as a paragraph, since they are all in service of the same key thing: making the API call.

                    for i in contactExistCheck.json()[u'contacts']:

                        # ... Get the canonical VIDs
                        canonicalVid = i[u'canonical-vid']

                        if canonicalVid:
                            print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
                            print ("Attempting to update their company...")
                            contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
                            doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
                            if doTheUpdate.status_code == 200:
                                print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
                                break
                            else:
                                print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))

I'm not sure whether this last bit should be an exception or not. Is "Attempt Failed" normal behavior, or does it mean things are broken?

Regardless, look into the API you are using. I'll bet there is more information available for minor failures. (Major failures would be the internet being broken or the server being offline.) They might provide an "error" or "errors" field in the returned JSON, for example. Those should be logged or printed along with your failure message.

        def createOrUpdateClient():

This function is mostly the same as the previous one.

                    else:
                        print ("Contact Marko for assistance.\n")

Except for this. Never put your name in a place like this, or you will still be getting calls about this code 10 years from now. Put in your department name ("IT Operations") or a support number. The people who need to know already know. The people who don't need to know can notify the people who already know.

        if __name__ == "__main__":
            # Run the Create or Update function
            createOrUpdateClient()

            # Give the previous function 5 seconds to take effect.
            sleep(5.0)

            # Run the Company Update function
            contactCompanyUpdate()
            print("Sync complete.")

            print("Moving \"{something}\" to the archive folder...".format(something=theCSV))

            # Cron version
            #shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))

            # Local version
            movePath = "archive/{thefile}".format(thefile=theCSV)
            shutil.move( i, movePath )

            print("Move successful! Exiting...\n")

This is awkward. You might consider taking some command-line arguments and using them to determine your behavior.

    sys.exit()

Don't do this. Never put an exit() at module scope, because it means nobody can import this code. Maybe somebody wants to import it to parse the docstrings. Or maybe they want to borrow those API functions you wrote. Too bad! Your sys.exit() means always having to say "Oh, sorry, I'll have to do that for you." Put it at the bottom of your actual __main__ code. Or, since you aren't actually passing a value, remove it entirely.
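Putting the last two points together, the bottom of the script might be shaped roughly like the sketch below. The argument names (csv_files, --archive-dir) are hypothetical, and the loop body only prints what the real create/update, company-update and archive steps would do.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse command-line arguments (the option names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Sync contacts from CSV files to HubSpot.")
    parser.add_argument("csv_files", nargs="*",
                        help="CSV files to sync")
    parser.add_argument("--archive-dir", default="archive",
                        help="folder that synced files are moved to")
    return parser.parse_args(argv)


def main(argv=None):
    """Run the sync and return an exit status (0 means success)."""
    args = parse_args(argv)
    for csv_name in args.csv_files:
        # The real sync and archive steps would be called from here.
        print("Would sync {0}, then move it to {1}".format(
            csv_name, args.archive_dir))
    return 0


if __name__ == "__main__":
    status = main()
    if status:  # only exit explicitly when there is a failure to report
        sys.exit(status)
```

Nothing runs at import time, so another module can import this file and reuse main or parse_args, and the only sys.exit call sits at the very bottom of the __main__ block.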