Question

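Is it possible to use the GPU provided by Colab to run TFF training faster? Training the federated model takes more than an hour, and switching to the GPU runtime does not seem to help at all.

The "High-performance simulation" page of the TFF docs is still empty, and I couldn't find any guide on using a GPU with TFF.

Any suggestions? Thanks!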

TF and TFF versions:

2.4.0-dev20200917
0.16.1

Number of clients per round: 70

I'm working with sequences of places, similar to the text generation tutorial, and the model is very similar.

Input data element spec:

OrderedDict([('x',
          OrderedDict([('start_place',
                        TensorSpec(shape=(8, 8), dtype=tf.int32, name=None)),
                       ('start_hour_sin',
                        TensorSpec(shape=(8, 8), dtype=tf.float64, name=None)),
                       ('start_hour_cos',
                        TensorSpec(shape=(8, 8), dtype=tf.float64, name=None)),
                       ('week_day_sin',
                        TensorSpec(shape=(8, 8), dtype=tf.float64, name=None)),
                       ('week_day_cos',
                        TensorSpec(shape=(8, 8), dtype=tf.float64, name=None)),
                       ('weekend',
                        TensorSpec(shape=(8, 8), dtype=tf.int32, name=None)),
                       ('month',
                        TensorSpec(shape=(8, 8), dtype=tf.int32, name=None))])),
         ('y', TensorSpec(shape=(8, 8), dtype=tf.int32, name=None))])

Function that creates the model:

import tensorflow as tf
from tensorflow import feature_column

# N (each sequence has N-1 steps), embedding_dim and rnn_units are
# globals defined elsewhere in the notebook.

# Create a model
def create_keras_model(number_of_places, batch_size):
  # Shortcut to the layers package
  l = tf.keras.layers

  # Now we need to define an input dictionary
  # where the keys are the column names.
  # This is a model with multiple inputs, so we need to declare an input layer for each feature.
  feature_inputs = {
    'start_hour_sin': tf.keras.Input((N-1, ), batch_size=batch_size, name='start_hour_sin'),
    'start_hour_cos': tf.keras.Input((N-1, ), batch_size=batch_size, name='start_hour_cos'),
    'weekend': tf.keras.Input((N-1, ), batch_size=batch_size, name='weekend'),
    'week_day_sin': tf.keras.Input((N-1, ), batch_size=batch_size, name='week_day_sin'),
    'week_day_cos': tf.keras.Input((N-1, ), batch_size=batch_size, name='week_day_cos'),
  }

  # We cannot use an array of features as usual, because we have sequences
  # and the shapes would not match otherwise, so we do them one by one.
  start_hour_sin = feature_column.numeric_column("start_hour_sin", shape=(N-1))
  hour_sin_feature = l.DenseFeatures(start_hour_sin)(feature_inputs)

  start_hour_cos = feature_column.numeric_column("start_hour_cos", shape=(N-1))
  hour_cos_feature = l.DenseFeatures(start_hour_cos)(feature_inputs)

  weekend = feature_column.numeric_column("weekend", shape=(N-1))
  weekend_feature = l.DenseFeatures(weekend)(feature_inputs)

  week_day_sin = feature_column.numeric_column("week_day_sin", shape=(N-1))
  week_day_sin_feature = l.DenseFeatures(week_day_sin)(feature_inputs)

  week_day_cos = feature_column.numeric_column("week_day_cos", shape=(N-1))
  week_day_cos_feature = l.DenseFeatures(week_day_cos)(feature_inputs)

  # We also have to add a trailing dimension so the features can be concatenated
  hour_sin_feature = tf.expand_dims(hour_sin_feature, -1)
  hour_cos_feature = tf.expand_dims(hour_cos_feature, -1)
  weekend_feature = tf.expand_dims(weekend_feature, -1)
  week_day_sin_feature = tf.expand_dims(week_day_sin_feature, -1)
  week_day_cos_feature = tf.expand_dims(week_day_cos_feature, -1)

  # Declare the dictionary for the places sequence as before
  sequence_input = {
      # batch_size=batch_size is required because the GRU below is stateful
      'start_place': tf.keras.Input((N-1,), batch_size=batch_size, dtype=tf.dtypes.int32, name='start_place')
  }

  # Handle the categorical feature sequence using one-hot encoding
  places_one_hot = feature_column.sequence_categorical_column_with_vocabulary_list(
      'start_place', [i for i in range(number_of_places)])

  # Embed the one-hot encoding
  places_embed = feature_column.embedding_column(places_one_hot, embedding_dim)

  # With an input sequence we can't use the DenseFeatures layer; we need SequenceFeatures
  sequence_features, sequence_length = tf.keras.experimental.SequenceFeatures(places_embed)(sequence_input)

  input_sequence = l.Concatenate(axis=2)([sequence_features, hour_sin_feature, hour_cos_feature,
                                          weekend_feature, week_day_sin_feature, week_day_cos_feature])

  # RNN
  recurrent = l.GRU(rnn_units,
                    batch_size=batch_size,  # required because the GRU is stateful
                    return_sequences=True,
                    dropout=0.5,
                    stateful=True,
                    recurrent_initializer='glorot_uniform')(input_sequence)

  # Last layer with an output for each place
  dense_1 = l.Dense(number_of_places)(recurrent)

  # Softmax output layer
  output = l.Softmax()(dense_1)

  # To return the Model, we need to define its inputs and outputs.
  # In our case, we need to list all the input layers we have defined.
  inputs = list(feature_inputs.values()) + list(sequence_input.values())

  # Return the Model
  return tf.keras.Model(inputs=inputs, outputs=output)

Federated Averaging:

import tensorflow as tf
import tensorflow_federated as tff

# preprocessed_example_dataset, number_of_places and BATCH_SIZE are
# defined elsewhere in the notebook.
def create_tff_model():
  # TFF uses an `input_spec` so it knows the types and shapes
  # that your model expects.
  input_spec = preprocessed_example_dataset.element_spec
  keras_model_clone = create_keras_model(number_of_places, batch_size=BATCH_SIZE)
  return tff.learning.from_keras_model(
      keras_model_clone,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# This command builds all the TensorFlow graphs and serializes them:
fed_avg = tff.learning.build_federated_averaging_process(
    model_fn=create_tff_model,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.001),
    server_optimizer_fn=lambda: tf.keras.optimizers.Adam(learning_rate=0.06))

State initialization:

state = fed_avg.initialize()

Training loop:
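The training loop itself is missing from the post. As a sketch (not the author's code), a loop that also times each round could look like the one below. It assumes, as with TFF 0.16 iterative processes, that `fed_avg.next(state, data)` returns a `(state, metrics)` pair; `run_timed_rounds`, `federated_train_data` and `NUM_ROUNDS` are hypothetical names.

```python
import time
from typing import Any, List, Sequence, Tuple


def run_timed_rounds(process: Any, state: Any, federated_data: Sequence,
                     num_rounds: int) -> Tuple[Any, List[float]]:
    """Run `num_rounds` rounds of an iterative process, timing each one.

    Assumption: `process.next(state, data)` returns `(new_state, metrics)`,
    which is how TFF 0.16's iterative processes behave.
    """
    round_seconds = []
    for round_num in range(1, num_rounds + 1):
        start = time.perf_counter()
        state, metrics = process.next(state, federated_data)
        elapsed = time.perf_counter() - start
        round_seconds.append(elapsed)
        print(f"round {round_num:2d}: {elapsed:.1f}s, metrics={metrics}")
    return state, round_seconds


# Hypothetical usage in the notebook:
# state, round_seconds = run_timed_rounds(fed_avg, state, federated_train_data, NUM_ROUNDS)
```

Keeping `process` duck-typed lets the helper be checked with a stand-in object. The first round may include one-time setup cost, so later rounds give a better picture of steady-state speed.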

Answer 1

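Note that this model performs 70 clients × 13 SGD steps per round (close to 1,000), though an hour still seems long. 70 clients on a single machine is pushing the limits of the simulation; as the number grows, we start looking at multi-machine setups using remote executors.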
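Spelled out, the per-round workload in that estimate is the number of clients times the local SGD steps each client takes:

$$70 \ \text{clients} \times 13 \ \text{SGD steps per client} = 910 \approx 1{,}000 \ \text{SGD steps per round}$$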

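Some things to investigate:

Is the simulation I/O bound? How fast can the Python environment iterate over a single client dataset? Timing a plain `for batch in dataset:` loop in TF could be useful here.
Is the simulation compute bound? Keep an eye on CPU and GPU utilization. How long does `keras_model.fit()` take on a single client dataset? The TFF simulation does roughly 70 times that work per round (once per client).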
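Both checks can be made with plain wall-clock timing. A minimal sketch, assuming eager execution so that a `tf.data.Dataset` can be iterated with an ordinary Python `for` loop; `time_iteration`, `time_call` and `estimate_round_seconds` are hypothetical helpers, and the round estimate assumes clients are trained one after another, so it may overestimate if the executor runs clients concurrently.

```python
import time
from typing import Callable, Iterable, Tuple


def time_iteration(dataset: Iterable) -> Tuple[int, float]:
    """Iterate once over `dataset` doing no work; return (num_batches, seconds).

    Works for any iterable, including an eager tf.data.Dataset, so it
    isolates the input-pipeline (I/O) cost from the training cost.
    """
    start = time.perf_counter()
    num_batches = 0
    for _ in dataset:
        num_batches += 1
    return num_batches, time.perf_counter() - start


def time_call(fn: Callable[[], object]) -> float:
    """Run `fn` once and return the wall-clock seconds it took."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def estimate_round_seconds(seconds_per_client: float, clients_per_round: int) -> float:
    """Rough per-round time, assuming clients are simulated sequentially."""
    return seconds_per_client * clients_per_round


# Hypothetical usage (client_dataset and keras_model are placeholders):
# n, io_seconds = time_iteration(client_dataset)
# fit_seconds = time_call(lambda: keras_model.fit(client_dataset, epochs=1, verbose=0))
# print(estimate_round_seconds(fit_seconds, 70))
```

If `io_seconds` is close to `fit_seconds`, the input pipeline rather than the model is the bottleneck. In Colab, `tf.config.list_physical_devices('GPU')` shows whether TensorFlow sees the GPU at all, and running `!nvidia-smi` in a cell during training shows its utilization.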

Using Colab to run TensorFlow Federated on a GPU
