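I'm relatively new to coding and am trying to write a script that looks for certain file names in a specific S3 bucket. If a file is available, it is read in with pandas, lightly edited, and then written out locally under a new file name.
I tried to turn the file name options into a function, which I think is what ultimately causes the error below, but I'm not sure. Any ideas about what in my code is causing this error and how to fix it?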
  File "tester.py", line 23, in <module>
    s3.Bucket(bucket_name).download_file(Key,filename)
  File "/Users/b/Library/Python/3.5/lib/python/site-packages/boto3/s3/inject.py", line 168, in bucket_download_file
    ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
  File "/Users/b/Library/Python/3.5/lib/python/site-packages/boto3/s3/inject.py", line 130, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/Users/b/Library/Python/3.5/lib/python/site-packages/boto3/s3/transfer.py", line 301, in download_file
    raise ValueError('Filename must be a string')
ValueError: Filename must be a string
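For what it's worth, the last frame of the traceback looks like a plain type check on the local file name argument, and the first frame shows the name `filename` being passed, which in the script is a function rather than a string. Below is a minimal sketch of that situation. It does not use boto3 and is not boto3's actual source: `check_filename` is a hypothetical stand-in for the check, and the `filename` function here is a dummy playing the role of the one in the script.

```python
def filename(file):
    """Dummy stand-in for the function named `filename` in the script."""
    return file


def check_filename(value):
    # Hypothetical approximation of the check the traceback ends in:
    # anything that is not a str is rejected.
    if not isinstance(value, str):
        raise ValueError('Filename must be a string')
    return value


try:
    # Passing the function object itself (no parentheses, no argument)
    # reproduces the same kind of failure.
    check_filename(filename)
except ValueError as exc:
    message = str(exc)

# An actual string passes the check.
ok = check_filename("accountteam_table.csv")
```

If this reading is right, the second argument to `download_file` would need to be a local path string such as `"accountteam_table.csv"`, not the function.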
My full code is below:
import pandas as pd
import fnmatch
import os
import time
import boto3
import botocore

bucket_name = 'data' # replace with your bucket name
Key = 'Setting/incoming/' # replace with your object Key
date = time.strftime("%Y%m%d")
s3 = boto3.resource('s3')

def filename(file):
    Accountfile = "accountteam_table.csv"
    Subscript = "subscription.csv"
    Opportunity = "opportunity.csv"
    Subtable = "subscription.csv"
    Opttable = "opportunity.csv"
    Accounttable = "account.csv"
    try:
        s3.Bucket(bucket_name).download_file(Key,file)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
    else:
        path = s3.Bucket(bucket_name,Key)
        for file in os.listdir(path):
            if fnmatch.fnmatch(file, Accountfile):
                df = pd.read_csv(file, delimiter=',',encoding = "ISO-8859-1")
                df.drop_duplicates(subset= 'Account Team Member Id', keep='first', inplace=True)
                df['date_added'] = ""
                df.to_csv('accountteam_table_'+ date + '.csv', sep=',')
            #subscription Product and Charge pre-process file
            elif fnmatch.fnmatch(file, Subscript):
                df = pd.read_csv(file, delimiter=',',encoding = "ISO-8859-1")
                df.drop_duplicates(subset= 'Subscription Product & Charge ID', keep='first', inplace=True)
                df.drop('Opportunity Name','Account Name','Subscription Name', axis=1)
                df['date_added'] = ""
                df.to_csv('subscriptionproduct_charge_table_'+ date + '.csv', sep=',')
            elif fnmatch.fnmatch(file, Opportunity):
                df = pd.read_csv(file, delimiter=',',encoding = "ISO-8859-1")
                df.drop_duplicates(subset= 'Opportunity Product Family: ID', keep='first', inplace=True)
                df.drop('Opportunity Name','Non-Recurring TCV (OPF) Currency', axis=1)
                df['date_added'] = ""
                df.to_csv('opportunityproductfamily_table_' + date + '.csv', sep=',')
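One direction that might help, sketched under assumptions rather than offered as a definitive fix: treat `'Setting/incoming/'` as a prefix, build the full object key by appending the file name, and pass a plain string as the local path. `Bucket.download_file(Key, Filename)` is the real boto3 call, but `fetch`, `main`, `PREFIX`, and `dest_dir` are hypothetical names introduced here, and the assumption that the objects live at prefix plus file name is a guess from the code above.

```python
import os

PREFIX = 'Setting/incoming/'  # assumed to be a key prefix, not a full object key


def fetch(bucket, prefix, name, dest_dir='.'):
    """Download the object at prefix + name and return the local path (a str)."""
    key = prefix + name                        # full object key in the bucket
    local_path = os.path.join(dest_dir, name)  # local file name, always a string
    bucket.download_file(key, local_path)
    return local_path


def main():
    # boto3 is imported here so that fetch() can be tried without AWS access.
    import boto3

    bucket = boto3.resource('s3').Bucket('data')
    for name in ('accountteam_table.csv', 'subscription.csv', 'opportunity.csv'):
        path = fetch(bucket, PREFIX, name)
        print('downloaded', path)
```

Calling `main()` would attempt the three downloads. The returned local path could then be handed to `pd.read_csv` directly, which would avoid calling `os.listdir` on an S3 object, since `os.listdir` only works on local directories.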