
How to minimize the optimizer in a multi-GPU distribution strategy

I am trying to adapt a deep-learning model for MNIST so that it runs on several GPUs at the same time, but I cannot find a way to make it work. I am new to DL, so I do not fully understand all the logic behind this code. I have tried many things, but none of them seem to work. Here is the code:

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
  y_ = tf.placeholder(tf.float32, [None, 10])
  # ... lines that build the model ...
  y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

  # Cross-entropy
  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y_conv))

  train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

  correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

I get this error:

RuntimeError: Use `_distributed_apply()` instead of `apply_gradients()` in a cross-replica context.

Adapting the minimize() call to use this _distributed_apply() function does not fix it. If I change it to

  #train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

  opt = tf.train.AdamOptimizer(1e-4)
  grads_and_vars = opt.compute_gradients(cross_entropy)
  train_step = opt._distributed_apply(strategy, grads_and_vars)

I get the following error:

Traceback (most recent call last):
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/distribute/cross_device_ops.py", line 108, in _make_tensor_into_per_replica
    device = input_tensor.device
AttributeError: 'NoneType' object has no attribute 'device'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "multigpu.py", line 97, in <module>
    train_step = opt._distributed_apply(strategy, grads_and_vars)
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 665, in _distributed_apply
    ds_reduce_util.ReduceOp.SUM, grads_and_vars)
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1254, in batch_reduce_to
    return self._batch_reduce_to(reduce_op, value_destination_pairs)
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 739, in _batch_reduce_to
    reduce_op, value_destination_pairs)
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/distribute/cross_device_ops.py", line 285, in batch_reduce
    value_destination_pairs)
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/distribute/cross_device_ops.py", line 133, in _normalize_value_destination_pairs
    per_replica = _make_tensor_into_per_replica(pair[0])
  File "/home/baq/.local/lib/python3.7/site-packages/tensorflow/python/distribute/cross_device_ops.py", line 110, in _make_tensor_into_per_replica
    raise ValueError("Cannot convert `input_tensor` to a `PerReplica` object "
ValueError: Cannot convert `input_tensor` to a `PerReplica` object because it doesn't have device set.

Any ideas on how to fix this? Thanks.
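
For context: this RuntimeError is raised when apply_gradients() (which minimize() calls internally) runs in the cross-replica context, i.e. directly under strategy.scope(); _distributed_apply() is a private helper that is not meant to be called by hand. The pattern the TF 1.x tf.distribute API expects is to build the loss and the minimize() op inside a function that the strategy runs once per replica. Below is a minimal sketch of that pattern (assuming TF 1.13/1.14; the one-layer toy model, the zeros dataset, and the name step_fn are stand-ins for illustration, not the original code):

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

def step_fn(x, y_):
  # Runs once per replica; variables created here (including Adam's slot
  # variables) are mirrored because we are inside strategy.scope() below.
  W = tf.get_variable('W', [784, 10])
  b = tf.get_variable('b', [10])
  y_conv = tf.matmul(x, W) + b
  loss = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y_conv))
  # minimize() is now called in a replica context, so apply_gradients()
  # can merge the per-replica gradients itself instead of raising.
  train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
  with tf.control_dependencies([train_step]):
    return tf.identity(loss)

with strategy.scope():
  # MirroredStrategy feeds the replicas from a tf.data pipeline; the
  # zeros arrays stand in for real MNIST batches.
  dataset = tf.data.Dataset.from_tensor_slices(
      (np.zeros([256, 784], np.float32),
       np.zeros([256, 10], np.float32))).batch(64)
  iterator = strategy.make_dataset_iterator(dataset)
  per_replica_loss = strategy.extended.call_for_each_replica(
      step_fn, args=iterator.get_next())
  # One loss tensor per GPU; average them for logging.
  mean_loss = tf.reduce_mean(tf.stack(strategy.unwrap(per_replica_loss)))

with tf.Session() as sess:
  sess.run(iterator.initialize())
  sess.run(tf.global_variables_initializer())
  print(sess.run(mean_loss))

Note that MirroredStrategy distributes input via tf.data pipelines rather than feed_dict placeholders, so the y_ = tf.placeholder(...) line in the original code would need to change as well.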
