How to create a local conda environment matching Databricks Runtime ML 8.2
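I am trying to replicate Azure Databricks Runtime ML 8.2 on my local machine. That way I would have the same environment (dependencies) for testing without having to start a cluster in Azure Databricks. To do this, I first exported the dependencies from a Databricks notebook by running:
%conda env export -f /dbfs/path/to/environment_8.2_ML.yml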
Then, on my PC (Mac OS), I tried running:
conda env create --file=environment_8.2_ML.yml
but conda cannot find some of the libraries:
ResolvePackageNotFound:
- libgfortran-ng=7.3.0
- ld_impl_linux-64=2.33.1
- libstdcxx-ng=9.1.0
- libgcc-ng=9.1.0
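These four packages appear to be Linux-only builds (GCC runtime libraries and the GNU linker package), which would explain why conda cannot resolve them on macOS. Below is a minimal Python sketch that drops them from the exported file. It assumes the YAML has one "- name=version[=build]" spec per line, and drop_packages is a hypothetical helper name, not part of conda:

```python
import re
from typing import Iterable

# Packages reported as unresolvable on macOS in the error above
LINUX_ONLY = {"libgfortran-ng", "ld_impl_linux-64", "libstdcxx-ng", "libgcc-ng"}

# A dependency line looks like "- name=version[=build]", possibly indented
DEP_RE = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)=")


def drop_packages(yaml_text: str, names: Iterable[str]) -> str:
    """Return yaml_text without the dependency lines for the given package names."""
    names = set(names)
    kept = []
    for line in yaml_text.splitlines():
        match = DEP_RE.match(line)
        if match and match.group(1) in names:
            continue  # skip a package that has no macOS build
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = "dependencies:\n- libgcc-ng=9.1.0\n- numpy=1.19.2=py38h54aff64_0\n"
print(drop_packages(sample, LINUX_ONLY))
```

Working line by line on the text, rather than parsing the YAML, keeps comments and ordering intact and avoids needing PyYAML.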
I also had to remove the build string from each conda package spec.
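Build strings such as py38h54aff64_0 are platform-specific, so they rarely resolve on another OS. conda's own "conda env export --no-builds" option omits them at export time. For a file that has already been exported, here is a minimal Python sketch that turns "name=version=build" into "name=version" and leaves pip pins ("name==version") and trailing comments untouched. strip_build_strings is a hypothetical helper name:

```python
import re

# Captures "- name=version" (with its indent) and drops the following "=build".
# pip specs use "==" and therefore never match.
CONDA_SPEC = re.compile(r"^(\s*-\s*[A-Za-z0-9_.\-]+=[^=\s#]+)=[^=\s#]+")


def strip_build_strings(yaml_text: str) -> str:
    """Remove the build component from every conda spec in an environment file."""
    out = [CONDA_SPEC.sub(r"\1", line) for line in yaml_text.splitlines()]
    return "\n".join(out) + "\n"


print(strip_build_strings("- numpy=1.19.2=py38h54aff64_0\n- pip:\n    - xgboost==1.3.3\n"))
```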
If any of you has a working YAML file, or has successfully replicated Databricks Runtime ML on a local machine, please help :)
Thanks in advance!
Solution
The Release notes for Databricks Runtimes include valid YAML files listing all the libraries installed in each specific version. For example, here is the YAML file for 8.2 ML.
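OK, I found a way to do this by building a Docker image and running it in a Docker container, so that the conda environment is created inside that container. For some reason, the same steps do not work directly on my Mac OS machine (no Docker container; installing CMAKE with brew and then creating the conda environment).
The steps I followed to install all the dependencies from the Databricks Runtime ML 8.2 YAML file: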
- Get the Databricks Runtime ML 8.2 YAML file from here (as @alexott mentioned in his answer)
- Remove these conda libraries from the YAML file:
  - libgfortran-ng=7.3.0
  - ld_impl_linux-64=2.33.1
  - libstdcxx-ng=9.1.0
  - libgcc-ng=9.1.0
So the YAML file will look like this:
name: databricks-ml-8.2
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- absl-py=0.11.0=pyhd3eb1b0_1
- aiohttp=3.7.4=py38h27cfd23_1
- asn1crypto=1.4.0=py_0
- astor=0.8.1=py38h06a4308_0
- async-timeout=3.0.1=py38h06a4308_0
- attrs=20.3.0=pyhd3eb1b0_0
- backcall=0.2.0=pyhd3eb1b0_0
- bcrypt=3.2.0=py38h7b6447c_0
- blas=1.0=mkl
- blinker=1.4=py38h06a4308_0
- boto3=1.16.7=pyhd3eb1b0_0
- botocore=1.19.7=pyhd3eb1b0_0
- brotlipy=0.7.0=py38h27cfd23_1003
- bzip2=1.0.8=h7b6447c_0
- c-ares=1.17.1=h27cfd23_0
- ca-certificates=2021.4.13=h06a4308_1 # (updated from 2021.1.19 in May 18,2021 maintenance update)
- cachetools=4.2.1=pyhd3eb1b0_0
- certifi=2020.12.5=py38h06a4308_0
- cffi=1.14.3=py38h261ae71_2
- chardet=3.0.4=py38h06a4308_1003
- click=7.1.2=pyhd3eb1b0_0
- cloudpickle=1.6.0=py_0
- configparser=5.0.1=py_0
- cpuonly=1.0=0
- cryptography=3.1.1=py38h1ba5d50_0
- cycler=0.10.0=py38_0
- cython=0.29.21=py38h2531618_0
- decorator=4.4.2=pyhd3eb1b0_0
- dill=0.3.2=py_0
- docutils=0.15.2=py38h06a4308_1
- entrypoints=0.3=py38_0
- ffmpeg=4.2.2=h20bf706_0
- flask=1.1.2=pyhd3eb1b0_0
- freetype=2.10.4=h5ab3b9f_0
- future=0.18.2=py38_1
- gitdb=4.0.5=py_0
- gitpython=3.1.12=pyhd3eb1b0_1
- gmp=6.1.2=h6c8ec71_1
- gnutls=3.6.5=h71b1129_1002
- google-auth=1.22.1=py_0
- google-auth-oauthlib=0.4.2=pyhd3eb1b0_2
- google-pasta=0.2.0=py_0
- gunicorn=20.0.4=py38h06a4308_0
- h5py=2.10.0=py38h7918eee_0
- hdf5=1.10.4=hb1b8bf9_0
- icu=58.2=he6710b0_3
- idna=2.10=pyhd3eb1b0_0
- importlib-metadata=2.0.0=py_1
- intel-openmp=2019.4=243
- ipykernel=5.3.4=py38h5ca1d4c_0
- ipython=7.19.0=py38hb070fc8_1
- ipython_genutils=0.2.0=pyhd3eb1b0_1
- isodate=0.6.0=py_1
- itsdangerous=1.1.0=pyhd3eb1b0_0
- jedi=0.17.2=py38h06a4308_1
- jinja2=2.11.2=pyhd3eb1b0_0
- jmespath=0.10.0=py_0
- joblib=0.17.0=py_0
- jpeg=9b=h024ee3a_2
- jupyter_client=6.1.7=py_0
- jupyter_core=4.6.3=py38_0
- kiwisolver=1.3.0=py38h2531618_0
- krb5=1.17.1=h173b8e3_0
- lame=3.100=h7b6447c_0
- lcms2=2.11=h396b838_0
- libedit=3.1.20191231=h14c3975_1
- libffi=3.3=he6710b0_2
- libopus=1.3.1=h7b6447c_0
- libpng=1.6.37=hbc83047_0
- libpq=12.2=h20c2e04_0
- libprotobuf=3.13.0.1=hd408876_0
- libsodium=1.0.18=h7b6447c_0
- libtiff=4.1.0=h2733197_1
- libuv=1.40.0=h7b6447c_0
- libvpx=1.7.0=h439df22_0
- lightgbm=3.1.1=py38h2531618_0
- lz4-c=1.9.2=heb0550a_3
- mako=1.1.3=py_0
- markdown=3.3.3=py38h06a4308_0
- markupsafe=1.1.1=py38h7b6447c_0
- matplotlib-base=3.2.2=py38hef1b27d_0
- mkl=2019.4=243
- mkl-service=2.3.0=py38he904b0f_0
- mkl_fft=1.2.0=py38h23d657b_0
- mkl_random=1.1.0=py38h962f231_0
- more-itertools=8.6.0=pyhd3eb1b0_0
- multidict=5.1.0=py38h27cfd23_2
- ncurses=6.2=he6710b0_1
- nettle=3.4.1=hbb512f6_0
- networkx=2.5=py_0
- ninja=1.10.2=py38hff7bd54_0
- nltk=3.5=py_0
- numpy=1.19.2=py38h54aff64_0
- numpy-base=1.19.2=py38hfa32c7d_0
- oauthlib=3.1.0=py_0
- olefile=0.46=py_0
- openh264=2.1.0=hd408876_0
- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1i in May 18,2021 maintenance update)
- packaging=20.4=py_0
- pandas=1.1.3=py38he6710b0_0
- paramiko=2.7.2=py_0
- parso=0.7.0=py_0
- patsy=0.5.1=py38_0
- pexpect=4.8.0=pyhd3eb1b0_3
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pillow=8.0.1=py38he98fc37_0
- pip=20.2.4=py38h06a4308_0
- plotly=4.14.3=pyhd3eb1b0_0
- prompt-toolkit=3.0.8=py_0
- prompt_toolkit=3.0.8=0
- protobuf=3.13.0.1=py38he6710b0_1
- psutil=5.7.2=py38h7b6447c_0
- psycopg2=2.8.5=py38h3c74f83_1
- ptyprocess=0.6.0=pyhd3eb1b0_2
- pyasn1=0.4.8=py_0
- pyasn1-modules=0.2.8=py_0
- pycparser=2.20=py_2
- pygments=2.7.2=pyhd3eb1b0_0
- pyjwt=1.7.1=py38_0
- pynacl=1.4.0=py38h7b6447c_1
- pyodbc=4.0.30=py38he6710b0_0
- pyopenssl=19.1.0=pyhd3eb1b0_1
- pyparsing=2.4.7=pyhd3eb1b0_0
- pysocks=1.7.1=py38h06a4308_0
- python=3.8.8=hdb3f193_4 # (updated from 3.8.5 in May 18,2021 maintenance update)
- python-dateutil=2.8.1=pyhd3eb1b0_0
- python-editor=1.0.4=py_0
- pytorch=1.8.1=py3.8_cpu_0
- pytz=2020.5=pyhd3eb1b0_0
- pyzmq=19.0.2=py38he6710b0_1
- readline=8.0=h7b6447c_0
- regex=2020.10.15=py38h7b6447c_0
- requests=2.24.0=py_0
- requests-oauthlib=1.3.0=py_0
- retrying=1.3.3=py_2
- rsa=4.7.2=pyhd3eb1b0_1
- s3transfer=0.3.6=pyhd3eb1b0_0
- scikit-learn=0.23.2=py38h0573a6f_0
- scipy=1.5.2=py38h0b6359f_0
- setuptools=50.3.1=py38h06a4308_1
- simplejson=3.17.2=py38h27cfd23_2
- six=1.15.0=py38h06a4308_0
- smmap=3.0.5=pyhd3eb1b0_0
- sqlite=3.33.0=h62c20be_0
- sqlparse=0.4.1=py_0
- statsmodels=0.12.0=py38h7b6447c_0
- tabulate=0.8.7=py38h06a4308_0
- threadpoolctl=2.1.0=pyh5ca1d4c_0
- tk=8.6.10=hbc83047_0
- torchvision=0.9.1=py38_cpu
- tornado=6.0.4=py38h7b6447c_1
- tqdm=4.50.2=py_0
- traitlets=5.0.5=pyhd3eb1b0_0
- typing-extensions=3.7.4.3=hd3eb1b0_0
- typing_extensions=3.7.4.3=pyh06a4308_0
- unixodbc=2.3.9=h7b6447c_0
- urllib3=1.25.11=py_0
- wcwidth=0.2.5=py_0
- websocket-client=0.57.0=py38_2
- werkzeug=1.0.1=pyhd3eb1b0_0
- wheel=0.35.1=pyhd3eb1b0_0
- wrapt=1.12.1=py38h7b6447c_1
- x264=1!157.20191217=h7b6447c_0
- xz=5.2.5=h7b6447c_0
- yarl=1.6.3=py38h27cfd23_0
- zeromq=4.3.3=he6710b0_3
- zipp=3.4.0=pyhd3eb1b0_0
- zlib=1.2.11=h7b6447c_3
- zstd=1.4.5=h9ceee32_0
- pip:
    - argon2-cffi==20.1.0
    - astunparse==1.6.3
    - async-generator==1.10
    - azure-core==1.11.0
    - azure-storage-blob==12.7.1
    - bleach==3.3.0
    - confuse==1.4.0
    - databricks-cli==0.14.3
    - defusedxml==0.7.1
    - diskcache==5.2.1
    - docker==4.4.4
    - flatbuffers==1.12
    - gast==0.3.3
    - grpcio==1.32.0
    - horovod==0.21.3
    - htmlmin==0.1.12
    - imagehash==4.2.0
    - ipywidgets==7.6.3
    - joblibspark==0.3.0
    - jsonschema==3.2.0
    - jupyterlab-pygments==0.1.2
    - jupyterlab-widgets==1.0.0
    - keras-preprocessing==1.1.2
    - koalas==1.7.0
    - llvmlite==0.36.0
    - missingno==0.4.2
    - mistune==0.8.4
    - mleap==0.16.1
    - mlflow-skinny==1.15.0
    - msrest==0.6.21
    - nbclient==0.5.3
    - nbconvert==6.0.7
    - nbformat==5.1.2
    - nest-asyncio==1.5.1
    - notebook==6.3.0
    - numba==0.53.1
    - opt-einsum==3.3.0
    - pandas-profiling==2.11.0
    - pandocfilters==1.4.3
    - petastorm==0.9.8
    - phik==0.11.2
    - prometheus-client==0.9.0
    - pyarrow==1.0.1
    - pyrsistent==0.17.3
    - pywavelets==1.1.1
    - pyyaml==5.4.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - send2trash==1.5.0
    - shap==0.39.0
    - slicer==0.0.7
    - spark-tensorflow-distributor==0.1.0
    - tangled-up-in-unicode==0.0.7
    - tensorboard==2.4.1
    - tensorboard-plugin-wit==1.8.0
    - tensorflow-cpu==2.4.1
    - tensorflow-estimator==2.4.0
    - termcolor==1.1.0
    - terminado==0.9.4
    - testpath==0.4.4
    - visions==0.6.0
    - webencodings==0.5.1
    - widgetsnbextension==3.5.1
    - xgboost==1.3.3
prefix: /databricks/conda/envs/databricks-ml
- Create your Dockerfile:
FROM continuumio/miniconda3

# Environment name; must match the "name:" field in the YAML file
ARG ENV_NAME=databricks-ml-8.2

# Install build tools and CMAKE (some pip packages are compiled from source)
RUN apt-get update && \
    apt-get install -y build-essential cmake && \
    rm -rf /var/lib/apt/lists/*

COPY <your-yaml-file.yml> /tmp/environment.yml
# COPY requirements.txt /tmp/requirements.txt
RUN conda env create -f /tmp/environment.yml

# Activate the environment in interactive shells and put its bin/ first on PATH
# (ENV cannot run shell commands such as $(head ...), so the name comes from ARG)
RUN echo "source activate ${ENV_NAME}" >> ~/.bashrc
ENV PATH /opt/conda/envs/${ENV_NAME}/bin:$PATH
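The Dockerfile needs the environment name from the YAML's name: field (databricks-ml-8.2 here). Reading only the first line, as a head -1 | cut pipeline would, breaks as soon as the file starts with a comment or a blank line. Below is a minimal Python sketch that finds the top-level name: key wherever it appears, for example to double-check which environment to activate. env_name is a hypothetical helper name:

```python
import re
from typing import Optional

# A top-level (unindented) "name: <value>" line
NAME_RE = re.compile(r"^name:\s*(\S+)")


def env_name(yaml_text: str) -> Optional[str]:
    """Return the value of the top-level "name:" key, or None if it is absent."""
    for line in yaml_text.splitlines():
        match = NAME_RE.match(line)
        if match:
            return match.group(1)
    return None


print(env_name("# exported from Databricks\nname: databricks-ml-8.2\nchannels:\n- pytorch\n"))
```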
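Now run the Docker container (once you have built the image). Don't forget to activate the environment that was created inside the container.
Hope this helps anyone with the same problem :)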
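Once the container is running with the environment activated, a quick sanity check is to compare the pip pins in the YAML against what is actually installed. Below is a minimal sketch, assuming Python 3.8+ for importlib.metadata (the 8.2 ML environment ships Python 3.8.8) and the YAML copied into the image at /tmp/environment.yml. check_pins is a hypothetical helper name. Distribution names are looked up as written, so a package whose metadata spells its name differently may be reported as a false mismatch:

```python
import re
from importlib.metadata import PackageNotFoundError, version

# pip pins in the YAML look like "    - name==version"
PIN_RE = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)==(\S+)")


def check_pins(yaml_text):
    """Return {name: (expected, installed_or_None)} for every pip pin that does not match."""
    mismatches = {}
    for line in yaml_text.splitlines():
        match = PIN_RE.match(line)
        if not match:
            continue  # conda specs and other lines are ignored
        name, expected = match.groups()
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in the active environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Inside the container, something like print(check_pins(open("/tmp/environment.yml").read())) should print an empty dict when every pip pin matched.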