How do I set Theano flags in Google Colab?
I want to run my Theano code on the Colab GPU, so I am trying to change the Theano flags accordingly. I tried
import os
os.environ['THEANO_FLAGS'] = """ device=cuda0,force_device=True,blas.ldflags="-L/usr/lib/ -lblas",floatX=float32,mode=FAST_RUN,lib.cnmem=.5,profile=True,CUDA_LAUNCH_BLOCKING=1 """
import theano
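One pitfall worth double-checking in a notebook: `THEANO_FLAGS` is read exactly once, at the first `import theano` in the process, so if Theano was already imported earlier in the session the new value is silently ignored until the runtime is restarted. A minimal sketch of setting and sanity-checking the variable before the import (the flag string is a trimmed version of the one above):

```python
import os

# THEANO_FLAGS is only consulted at the first `import theano`; setting it
# afterwards has no effect until the Python runtime is restarted.
os.environ['THEANO_FLAGS'] = 'device=cuda0,force_device=True,floatX=float32'

# Sanity check: parse the variable back and confirm the device entry.
flags = dict(item.split('=', 1)
             for item in os.environ['THEANO_FLAGS'].split(','))
print(flags['device'])  # cuda0
```

In Colab that means the flag-setting cell has to run before any cell that imports Theano, directly or indirectly.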
and
!printf "[global]\ndevice = cuda\nfloatX = float32\nforce_device = True\nmode = FAST_RUN\nlib.cnmem = .5\nprofile = True\nCUDA_LAUNCH_BLOCKING = 1\n" > ~/.theanorc
!cat ~/.theanorc
But neither of them seems to work, because (according to the profiler) all the ops are CPU-specific (Elemwise rather than GpuElemwise, no GpuFromHost, etc.).
I tried this code:
import numpy
import theano
import theano.tensor as T
input_data = numpy.matrix([[28,1],[35,2],[18,1],[56,2],[80,3]])
output_data = numpy.matrix([1600,2100,1400,2500,3200])
TS = theano.shared(input_data.astype('float32'),"training-set")
E = theano.shared(output_data.astype('float32'),"expected")
W1 = theano.shared(numpy.zeros((1,2),dtype = 'float32'))
O = T.dot(TS,W1.T)
cost = T.mean(T.sqr(E - O.T)).astype('float32')
gradient = T.grad(cost=cost,wrt=W1).astype('float32')
update = [[W1,W1 - gradient * numpy.float32(0.0001)]]
train = theano.function([],cost,updates=update,allow_input_downcast=True,profile = True)
for i in range(1000):
    train()
train.profile.summary()
and got the following output:
Function profiling
==================
Message: <ipython-input-20-49bdedf42dbb>:27
Time in 1000 calls to Function.__call__: 1.391292e-02s
Time in Function.fn.__call__: 7.742643e-03s (55.651%)
Time in thunks: 3.543854e-03s (25.472%)
Total compile time: 5.829549e-02s
Number of Apply nodes: 16
Theano Optimizer time: 4.293251e-02s
Theano validate time: 7.207394e-04s
Theano Linker time (includes C,CUDA code generation/compiling): 1.048517e-02s
Import time 0.000000e+00s
Node make_thunk time 9.668112e-03s
Node InplaceDimShuffle{x,x}(Subtensor{int64}.0) time 1.002550e-03s
Node InplaceDimShuffle{1,0}(training-set) time 9.713173e-04s
Node InplaceDimShuffle{x,x}(Subtensor{int64}.0) time 9.384155e-04s
Node Gemm{inplace}(<TensorType(float32,matrix)>,TensorConstant{-1e-04},Elemwise{Composite{((i0 * i1) / i2)}}.0,training-set,TensorConstant{1.0}) time 7.627010e-04s
Node Gemm{no_inplace}(expected,TensorConstant{-1.0},<TensorType(float32,training-set.T,TensorConstant{1.0}) time 7.226467e-04s
Time in all call to theano.grad() 2.316711e-01s
Time since theano import 1824.793s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
28.5% 28.5% 0.001s 5.05e-07s C 2000 2 theano.tensor.blas.Gemm
20.6% 49.1% 0.001s 1.46e-07s C 5000 5 theano.tensor.elemwise.Elemwise
18.5% 67.6% 0.001s 2.18e-07s C 3000 3 theano.tensor.elemwise.DimShuffle
12.8% 80.4% 0.000s 4.54e-07s C 1000 1 theano.tensor.elemwise.Sum
9.3% 89.7% 0.000s 1.65e-07s C 2000 2 theano.tensor.subtensor.Subtensor
6.1% 95.8% 0.000s 1.08e-07s C 2000 2 theano.compile.ops.Shape_i
4.2% 100.0% 0.000s 1.50e-07s C 1000 1 theano.tensor.opt.MakeVector
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
15.9% 15.9% 0.001s 5.65e-07s C 1000 1 Gemm{no_inplace}
12.8% 28.8% 0.000s 4.54e-07s C 1000 1 Sum{acc_dtype=float64}
12.6% 41.3% 0.000s 4.45e-07s C 1000 1 Gemm{inplace}
11.0% 52.3% 0.000s 1.94e-07s C 2000 2 InplaceDimShuffle{x,x}
9.3% 61.6% 0.000s 1.65e-07s C 2000 2 Subtensor{int64}
7.5% 69.1% 0.000s 2.66e-07s C 1000 1 InplaceDimShuffle{1,0}
5.8% 74.9% 0.000s 2.05e-07s C 1000 1 Elemwise{mul,no_inplace}
5.4% 80.2% 0.000s 1.91e-07s C 1000 1 Elemwise{Composite{((i0 * i1) / i2)}}
5.0% 85.3% 0.000s 1.78e-07s C 1000 1 Elemwise{Cast{float32}}
4.2% 89.5% 0.000s 1.50e-07s C 1000 1 MakeVector{dtype='int64'}
3.1% 92.6% 0.000s 1.08e-07s C 1000 1 Shape_i{0}
3.0% 95.6% 0.000s 1.08e-07s C 1000 1 Shape_i{1}
2.2% 97.8% 0.000s 7.94e-08s C 1000 1 Elemwise{Composite{((i0 / i1) / i2)}}[(0,0)]
2.2% 100.0% 0.000s 7.68e-08s C 1000 1 Elemwise{Sqr}[(0,0)]
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
15.9% 15.9% 0.001s 5.65e-07s 1000 3 Gemm{no_inplace}(expected,TensorConstant{1.0})
12.8% 28.8% 0.000s 4.54e-07s 1000 14 Sum{acc_dtype=float64}(Elemwise{Sqr}[(0,0)].0)
12.6% 41.3% 0.000s 4.45e-07s 1000 13 Gemm{inplace}(<TensorType(float32,TensorConstant{1.0})
7.5% 48.8% 0.000s 2.66e-07s 1000 0 InplaceDimShuffle{1,0}(training-set)
6.1% 54.9% 0.000s 2.15e-07s 1000 7 Subtensor{int64}(Elemwise{Cast{float32}}.0,Constant{1})
5.8% 60.6% 0.000s 2.05e-07s 1000 10 Elemwise{mul,no_inplace}(InplaceDimShuffle{x,x}.0,InplaceDimShuffle{x,x}.0)
5.6% 66.2% 0.000s 1.97e-07s 1000 8 InplaceDimShuffle{x,x}(Subtensor{int64}.0)
5.4% 71.6% 0.000s 1.92e-07s 1000 9 InplaceDimShuffle{x,x}(Subtensor{int64}.0)
5.4% 77.0% 0.000s 1.91e-07s 1000 11 Elemwise{Composite{((i0 * i1) / i2)}}(TensorConstant{(1,1) of -2.0},Gemm{no_inplace}.0,Elemwise{mul,no_inplace}.0)
5.0% 82.0% 0.000s 1.78e-07s 1000 5 Elemwise{Cast{float32}}(MakeVector{dtype='int64'}.0)
4.2% 86.3% 0.000s 1.50e-07s 1000 4 MakeVector{dtype='int64'}(Shape_i{0}.0,Shape_i{1}.0)
3.2% 89.5% 0.000s 1.15e-07s 1000 6 Subtensor{int64}(Elemwise{Cast{float32}}.0,Constant{0})
3.1% 92.6% 0.000s 1.08e-07s 1000 2 Shape_i{0}(expected)
3.0% 95.6% 0.000s 1.08e-07s 1000 1 Shape_i{1}(expected)
2.2% 97.8% 0.000s 7.94e-08s 1000 15 Elemwise{Composite{((i0 / i1) / i2)}}[(0,0)](Sum{acc_dtype=float64}.0,Subtensor{int64}.0,Subtensor{int64}.0)
2.2% 100.0% 0.000s 7.68e-08s 1000 12 Elemwise{Sqr}[(0,0)](Gemm{no_inplace}.0)
... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones,suggest them on the mailing list).
Test them first,as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
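The diagnosis from a profile like this is mechanical: Theano prefixes GPU-resident ops with `Gpu` (GpuGemm, GpuElemwise, GpuFromHost, ...), so scanning the op class names tells you where the graph ran. A small sketch over names taken from the Ops table above (on a live function the names would come from the compiled graph, e.g. `train.maker.fgraph.toposort()`):

```python
# Op class names copied from the "Ops" section of the profile above.
op_names = [
    'Gemm{no_inplace}',
    'Sum{acc_dtype=float64}',
    'Gemm{inplace}',
    'InplaceDimShuffle{x,x}',
    'Elemwise{Sqr}[(0,0)]',
]

# GPU ops carry a "Gpu" prefix; none of these do, so the whole
# graph executed on the CPU.
gpu_used = any(name.startswith('Gpu') for name in op_names)
print(gpu_used)  # False
```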
Thanks in advance for your help.