I am trying to train a model on GCP using Keras and TensorFlow 1.15. For now, my code is similar to what I can run on Colab (see the TPU script below).
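However, my data is in a bucket while my code runs on a VM. I tried loading the data with "gs://BUCKETS", but that does not work. What should I do? Edit: I have added my code for loading the data (the load_data function further down), sorry about that.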
# TPUs
import tensorflow as tf
print(tf.__version__)
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver("tpu-name")
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
print("Number of accelerators: ", tpu_strategy.num_replicas_in_sync)
import numpy as np
np.random.seed(123) # for reproducibility
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Input
from tensorflow.keras import utils
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.models import Model
# 4. Load data into train and test sets
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) = load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
print(X_train.shape, X_test.shape)
# 5. Preprocess input data
#X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
#X_test = X_test.reshape(X_test.shape[0], 28, 28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0
print(y_train.shape, y_test.shape)
# 6. Preprocess class labels One hot encoding
Y_train = utils.to_categorical(y_train, 2)
Y_test = utils.to_categorical(y_test, 2)
print(Y_train.shape, Y_test.shape)
with tpu_strategy.scope():
    model = make_model((img_size, img_size, 3))
    # 8. Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer="sgd",
                  metrics=['accuracy'])
model.summary()
batch_size = 1250 * tpu_strategy.num_replicas_in_sync
# 9. Fit model on training data
model.fit(X_train, Y_train, steps_per_epoch=len(X_train)//batch_size,
          epochs=5, verbose=1)
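EDIT2: For anyone else in the same situation, here is a complement to @daudnadeem's answer.
My goal was to fetch images from the bucket, so that code works well and gives me a bytes object. To turn those bytes into an image, the PIL library is all you need.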
# load_data as used in the question; it reads images from a local folder.
# The imports below are inferred from the names the function uses.
from os import listdir
from numpy import asarray
from sklearn.utils import shuffle
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_data(sets="dogcats/train/", k=5000, target_size=250):
    # define location of dataset
    folder = sets
    photos, labels = list(), list()
    # determine class: dogs are labelled 0.0
    output = 0.0
    for i, dog in enumerate(listdir(folder + "dogs/")):
        if i >= k:
            break
        # load image
        photo = load_img(folder + "dogs/" + dog, target_size=(target_size, target_size))
        # convert to numpy array
        photo = img_to_array(photo)
        # store
        photos.append(photo)
        labels.append(output)
    # cats are labelled 1.0
    output = 1.0
    for i, cat in enumerate(listdir(folder + "cats/")):
        if i >= k:
            break
        # load image
        photo = load_img(folder + "cats/" + cat, target_size=(target_size, target_size))
        # convert to numpy array
        photo = img_to_array(photo)
        # store
        photos.append(photo)
        labels.append(output)
    # convert to numpy arrays
    photos = asarray(photos)
    labels = asarray(labels)
    print(photos.shape, labels.shape)
    photos, labels = shuffle(photos, labels, random_state=0)
    return photos, labels
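As a rough illustration of the PIL step mentioned in EDIT2, the sketch below decodes a bytes object (for example the result of blob.download_as_string()) into an array shaped like the ones load_data builds. The helper name bytes_to_array is hypothetical, and converting to RGB and resizing to a square are assumptions made to mirror load_data.

```python
import io

import numpy as np
from PIL import Image


def bytes_to_array(data, target_size=250):
    """Decode raw image bytes into a float32 array of shape (size, size, 3)."""
    # Wrap the bytes in a file-like object so PIL can read them.
    image = Image.open(io.BytesIO(data)).convert("RGB")
    # Resize to a square, as load_data does with target_size.
    image = image.resize((target_size, target_size))
    return np.asarray(image, dtype="float32")
```

The resulting arrays can be appended to a list and stacked with asarray, in the same way load_data handles the output of img_to_array.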
Answer 0: (score: 0)
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) = load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
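This obviously will not work, because all you are really doing is passing a string. What you need to do is download the data as a string first and then use it.
First, install the package: pip install google-cloud-storage
or pip3 install google-cloud-storage
pip -> Python
pip3 -> Python 3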
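Have a look at this; you will need a service account to interact with GCP from code. It is used for authentication.
Once you have the service account key as a JSON file, you need to do one of two things:
Set it as an environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
Or, the approach I prefer:
gcloud auth activate-service-account \
<replace-with-email-from-json-file> \
--key-file=<path/to/your/json/file> --project=<name-of-your-gcp-project>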
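Both options need details from the JSON key file. As a small sketch, the helper below reads the service-account email and project ID out of the key, which are the values the gcloud command asks for. The function name describe_service_account_key is hypothetical; it assumes the key contains the usual client_email and project_id fields.

```python
import json
import os


def describe_service_account_key(path=None):
    """Return (client_email, project_id) from a service-account JSON key file."""
    # Fall back to the environment variable used by the first option.
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("no key path given and GOOGLE_APPLICATION_CREDENTIALS is unset")
    with open(path) as handle:
        key = json.load(handle)
    return key["client_email"], key.get("project_id")
```

Calling it without arguments also confirms that the environment variable points at a readable key file.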
Now let's see how to download a file as a string with the google-cloud-storage library:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
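# Object names have no leading slash; point to a single file, not a directory.
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()  # returns the file contents as bytes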
Now that you have the data as a string, you can simply pass data into load_data like this:
(X_train, y_train) = load_data(sets=data,target_size=img_size)
It sounds complicated, but here is a quick pseudo-layout:
load_data(data)
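Since get_blob needs a single file rather than a directory, one possible way to fetch a whole folder is to list the objects under a prefix and download them one by one. This is only a sketch: split_gs_uri and download_images are hypothetical helper names, and it assumes credentials are already configured as described above.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    if not uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI, got %r" % uri)
    bucket_name, _, prefix = uri[len("gs://"):].partition("/")
    return bucket_name, prefix


def download_images(uri, limit=5000):
    """Return a list of (object_name, raw_bytes) for objects under a gs:// folder."""
    # Imported here so that split_gs_uri stays usable without the package.
    from google.cloud import storage

    bucket_name, prefix = split_gs_uri(uri)
    client = storage.Client()
    results = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            # Skip zero-byte "folder" placeholder objects.
            continue
        if len(results) >= limit:
            break
        results.append((blob.name, blob.download_as_string()))
    return results
```

For example, download_images("gs://BUCKETS/dogscats/train/dogs/", limit=5000) would play the role of the listdir loop over the dogs folder, with each bytes object still needing to be decoded into an image.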
Hope that helps!