I have a Python 3.6 application that uses scikit-learn and is deployed to IBM Cloud (Cloud Foundry). It works fine. My local development environment is Mac OS High Sierra.
Recently, I added IBM Cloud Object Storage functionality (ibm_boto3) to the application. The COS functionality itself works fine: I can upload, download, list, and delete objects without trouble using the ibm_boto3 library.
Strangely, the part of the application that uses scikit-learn now freezes.
If I comment out the ibm_boto3 import statements (and the corresponding code), the scikit-learn code works fine again.
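Even more puzzling, the problem only occurs on the local development machine running OS X. When the application is deployed to IBM Cloud, it runs fine, and scikit-learn and ibm_boto3 work side by side.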
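Our only hypothesis at this point is that the ibm_boto3 library somehow surfaces a known issue in scikit-learn (see this: the parallel version of the K-means algorithm is broken when numpy uses the Accelerate framework on OS X).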
Note that we only run into this problem once ibm_boto3 is added to the project.
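One way to narrow this down is to run the same scikit-learn snippet in a fresh interpreter, once with and once without the ibm_boto3 import, under a timeout. The following is only a minimal sketch using the standard library: the helper name `finishes_in_time` and the K-means snippet are hypothetical stand-ins for the part of the app that freezes, not code from the application.

```python
import subprocess
import sys


def finishes_in_time(snippet, timeout_s=30.0):
    """Run `snippet` in a fresh Python interpreter.

    Returns False if the child is still running after `timeout_s` seconds
    (it is killed in that case), True otherwise. Note that True only means
    "did not hang": a snippet that exits with an error also returns True.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical stand-in for the scikit-learn code that freezes in the app.
SKLEARN_SNIPPET = (
    "import numpy as np\n"
    "from sklearn.cluster import KMeans\n"
    "KMeans(n_clusters=2).fit(np.random.rand(100, 2))\n"
)
```

Comparing `finishes_in_time(SKLEARN_SNIPPET)` with `finishes_in_time("import ibm_boto3\n" + SKLEARN_SNIPPET)` on the Mac would show whether the import alone is enough to trigger the freeze.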
However, we need to be able to test on localhost before deploying to IBM Cloud. Are there any known compatibility issues between scikit-learn and ibm_boto3 on Mac OS?
Any suggestions on how we can avoid this on the development machine?
Cheers.
Answer 0 (score: 1)
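So far, there aren't any known compatibility issues. :)
At one point there were some problems with the vanilla SSL libraries that ship with OS X, but if you're able to read and write data, that isn't the issue here.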
Are you using HMAC credentials? If so, I'm curious whether the behavior continues if you use the original boto3 library instead of the IBM fork.
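To make that comparison easy to repeat, the choice of library could be isolated behind a small loader. This is just a sketch: `first_importable` is a hypothetical helper, and it assumes both packages expose the same `client('s3', ...)` entry point when HMAC credentials are used.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first module name in `candidates` that imports."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (candidates,))
```

Calling `first_importable(["boto3", "ibm_boto3"])` prefers the original library; reversing the list prefers the fork, so the rest of the code can stay unchanged between the two runs.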
Here's a simple example showing how to use pandas with the original boto3:
import boto3 # package used to connect to IBM COS using the S3 API
import io # python package used to stream data
import pandas as pd # lightweight data analysis package
access_key = '<access key>'
secret_key = '<secret key>'
pub_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
pvt_endpoint = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'
bucket = 'demo' # the bucket holding the objects being worked on.
object_key = 'demo-data' # the name of the data object being analyzed.
result_key = 'demo-data-results' # the name of the output data object.
# First, we need to open a session and create a client that can connect to IBM COS.
# This client needs to know where to connect, the credentials to use,
# and what signature protocol to use for authentication. The endpoint
# can be specified to be public or private.
cos = boto3.client('s3', endpoint_url=pub_endpoint,
                   aws_access_key_id=access_key,
                   aws_secret_access_key=secret_key,
                   region_name='us',
                   config=boto3.session.Config(signature_version='s3v4'))
# Since we've already uploaded the dataset to be worked on into cloud storage,
# now we just need to identify which object we want to use. get_object returns
# a dict holding the response metadata plus a streaming 'Body'.
obj = cos.get_object(Bucket=bucket, Key=object_key)
# Now, because this is all REST API based, the actual contents of the file are
# transported in the response body, so we need to read the data stream
# containing the actual CSV file we want to analyze.
data = obj['Body'].read()
# Now we can read that data stream into a pandas dataframe.
df = pd.read_csv(io.BytesIO(data))
# This is just a trivial example, but we'll take that dataframe and just
# create a JSON document that contains the mean values for each column.
output = df.mean(axis=0, numeric_only=True).to_json()
# Now we can write that JSON file to COS as a new object in the same bucket.
cos.put_object(Bucket=bucket, Key=result_key, Body=output)