How do I check whether a file has finished uploading to an S3 bucket using Boto in Python?

Time: 2016-04-21 01:19:51

Tags: python amazon-web-services amazon-s3 flask

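I am trying to upload an image to an S3 bucket using boto. After the image has uploaded successfully, I want to perform an operation using the file URL of the image in the S3 bucket. The problem is that sometimes the image does not upload fast enough, and I end up with a server error when I try to perform the operation based on the image's file URL.

Here is my source code. I am using Python Flask.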

import mimetypes

import boto
from boto.s3.key import Key
from flask import jsonify, render_template, request
from werkzeug.utils import secure_filename


def search_test(consumer_id):
    consumer = session.query(Consumer).filter_by(consumer_id=consumer_id).one()
    products = session.query(Product).all()
    product_dictionary = {'Products': [p.serialize for p in products]}

    if request.method == 'POST':
        p_product_image_url = request.files['product_upload_url']
        s3 = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
        bucket = s3.get_bucket(AWS_BUCKET_NAME)
        k = Key(bucket)
        if p_product_image_url and allowed_file(p_product_image_url.filename):

            # Read the contents of the file
            file_content = p_product_image_url.read()

            # Use Boto to upload the file to S3.
            # guess_type() returns a (type, encoding) tuple; only the type is wanted here.
            k.set_metadata('Content-Type', mimetypes.guess_type(p_product_image_url.filename)[0])
            k.key = secure_filename(p_product_image_url.filename)
            k.set_contents_from_string(file_content)
            print('consumer search upload successful')

        new_upload = Uploads(picture_upload_url=k.key.replace(' ', '+'), consumer=consumer)
        session.add(new_upload)
        session.commit()

        new_result = jsonify(Result=perform_actual_search(amazon_s3_base_url + k.key.replace(' ', '+'),
                                                          product_dictionary))

        return new_result
    else:
        return render_template('upload_demo.html', consumer_id=consumer_id)

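The jsonify call needs a valid image URL to do its work. Sometimes it works and sometimes it doesn't. I suspect the image has not finished uploading by the time that line executes.

The perform_actual_search method is as follows: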

import json

import httplib2


def get_image_search_results(image_url):
    url = ('http://style.vsapi01.com/api-search/by-url/?apikey=%s&url=%s' % (just_visual_api_key, image_url))
    h = httplib2.Http()
    # h.request() returns a (response, content) pair; only the content is parsed
    response, content = h.request(url, 'GET')
    result = json.loads(content)

    result_dictionary = []

    # An error response carries 'errorMessage' instead of usable images
    if not result or result.get('errorMessage'):
        return result_dictionary

    # Keep at most the first 10 matches; the slice is safe when fewer are returned
    for images in result.get('images', [])[:10]:
        if not images:
            continue
        image_info = {
            'image_url': images['imageUrl'],
            'title': images['title'],
            # Fall back to a placeholder when the description is missing or empty
            'description': images.get('description') or "no description",
        }
        result_dictionary.append(image_info)

    return result_dictionary


import difflib
import re


def performSearch(jv_input_dictionary, imagernce_products_dict):
    print(jv_input_dictionary)
    print(imagernce_products_dict)

    # Kept global as in the original: a product without a description
    # reuses the ratio computed for the previous product
    global common_desc_ratio
    image_search_results = []
    for item in jv_input_dictionary:
        print(item)
        if 'description' not in item:
            continue
        input_description = item['description']
        s1w = re.findall(r'\w+', input_description.lower())
        print(input_description)
        for product in imagernce_products_dict.get('Products', []):
            if 'description' in product:
                search_description = product['description']
                print(search_description)
                s2w = re.findall(r'\w+', search_description.lower())
                # Similarity of the two word sequences, between 0.0 and 1.0
                common_desc_ratio = difflib.SequenceMatcher(None, s1w, s2w).ratio()
                print('Common ratio is: %.2f' % common_desc_ratio)

            if common_desc_ratio > 0.09:
                image_search_results.append(product)

    if image_search_results:
        print(image_search_results)
        return image_search_results
    else:
        return {'404': 'No retailers registered with us currently own this product.'}


def perform_actual_search(image_url, imagernce_product_dictionary):
    return performSearch(get_image_search_results(image_url), imagernce_product_dictionary)

Any help solving this problem would be greatly appreciated.

2 answers:

Answer 0: (score: 2)

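I would configure S3 to generate notifications on events such as s3:ObjectCreated:*

Notifications can be published to an SNS topic, sent to an SQS queue, or used to trigger a Lambda function directly.

More details about S3 notifications: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

You should rewrite your code to separate the upload part from the image processing part. The latter can be implemented in Python as a Lambda function. Working asynchronously is key; blocking code generally does not scale.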
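A minimal sketch of the Lambda side of that split, assuming the bucket notification invokes the function directly (payloads delivered through SNS or SQS are wrapped in an extra envelope that would need unpacking first). The names extract_uploaded_objects and process_image are hypothetical placeholders, not part of any AWS library; the event layout follows the documented S3 notification message structure.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return (bucket, key) pairs for object-creation records in an S3 event."""
    uploaded = []
    for record in event.get("Records", []):
        # eventName looks like "ObjectCreated:Put"; skip deletions and the like
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as '+'
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


def process_image(bucket, key):
    # Hypothetical placeholder for the image-processing step
    print("processing s3://%s/%s" % (bucket, key))


def lambda_handler(event, context):
    # Runs only after S3 reports the object as created
    objects = extract_uploaded_objects(event)
    for bucket, key in objects:
        process_image(bucket, key)
    return {"processed": len(objects)}
```

With this arrangement the Flask view would only upload the file and return, and the search would run once S3 reports the object as created.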

Answer 1: (score: 1)

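You can compare the number of bytes written to S3 with the file size. Suppose you write to S3 with the following method:

bytes_written = key.set_contents_from_file(file_binary, rewind=True)

In your case it is set_contents_from_string.

I would then compare bytes_written with p_product_image_url.seek(0, os.SEEK_END).

If they match, the whole file has been uploaded to S3.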
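The comparison could be sketched roughly as follows. The helper names stream_size and upload_and_verify are hypothetical, and the code assumes that key behaves like a boto.s3.key.Key whose set_contents_from_file returns the number of bytes written. The size is read with tell() after the seek, because seek() on a Python 2 file object returns None rather than the new position.

```python
import os


def stream_size(fp):
    """Return the total size of a seekable stream and restore its position."""
    current = fp.tell()
    fp.seek(0, os.SEEK_END)
    size = fp.tell()  # tell() reports the position on both file and io objects
    fp.seek(current)
    return size


def upload_and_verify(key, fp):
    """Upload fp through key and report whether every byte was written."""
    expected = stream_size(fp)
    # Assumption: key is a boto.s3.key.Key (or an object with the same method);
    # rewind=True makes it read the stream from the beginning
    bytes_written = key.set_contents_from_file(fp, rewind=True)
    return bytes_written == expected
```

In the Flask view from the question this might be called as upload_and_verify(k, p_product_image_url.stream), assuming the uploaded file exposes a seekable stream. When the content has already been read into a string, comparing the return value of set_contents_from_string with len(file_content) should serve the same purpose.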