
Using spark-shell

I am trying to install the PySpark package GraphFrames using spark-shell:

pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12

However, the terminal shows the following error:

root@hpcc:~# pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/root/spark-3.0.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/root/spark-3.0.2-bin-hadoop3.2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-bb0fc7e9-5af7-4189-98e4-7ac76a8d97a9;1.0
    confs: [default]
:: resolution report :: resolve 2691ms :: artifacts dl 1ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: graphframes#graphframes;0.8.1-spark3.0-s_2.12

    ==== local-m2-cache: tried

      file:/root/.m2/repository/graphframes/graphframes/0.8.1-spark3.0-s_2.12/graphframes-0.8.1-spark3.0-s_2.12.pom

      -- artifact graphframes#graphframes;0.8.1-spark3.0-s_2.12!graphframes.jar:

      file:/root/.m2/repository/graphframes/graphframes/0.8.1-spark3.0-s_2.12/graphframes-0.8.1-spark3.0-s_2.12.jar

    ==== local-ivy-cache: tried

      /root/.ivy2/local/graphframes/graphframes/0.8.1-spark3.0-s_2.12/ivys/ivy.xml

      -- artifact graphframes#graphframes;0.8.1-spark3.0-s_2.12!graphframes.jar:

      /root/.ivy2/local/graphframes/graphframes/0.8.1-spark3.0-s_2.12/jars/graphframes.jar

    ==== central: tried

      https://repo1.maven.org/maven2/graphframes/graphframes/0.8.1-spark3.0-s_2.12/graphframes-0.8.1-spark3.0-s_2.12.pom

      -- artifact graphframes#graphframes;0.8.1-spark3.0-s_2.12!graphframes.jar:

      https://repo1.maven.org/maven2/graphframes/graphframes/0.8.1-spark3.0-s_2.12/graphframes-0.8.1-spark3.0-s_2.12.jar

    ==== spark-packages: tried

      https://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.8.1-spark3.0-s_2.12/graphframes-0.8.1-spark3.0-s_2.12.pom

      -- artifact graphframes#graphframes;0.8.1-spark3.0-s_2.12!graphframes.jar:

      https://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.8.1-spark3.0-s_2.12/graphframes-0.8.1-spark3.0-s_2.12.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: graphframes#graphframes;0.8.1-spark3.0-s_2.12: not found

        ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: graphframes#graphframes;0.8.1-spark3.0-s_2.12: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1389)
    at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:308)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "/root/spark-3.0.2-bin-hadoop3.2/python/pyspark/shell.py", line 38, in <module>
    SparkContext._ensure_initialized()
  File "/root/spark-3.0.2-bin-hadoop3.2/python/pyspark/context.py", line 327, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/root/spark-3.0.2-bin-hadoop3.2/python/pyspark/java_gateway.py", line 105, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
>>> quit()
root@hpcc:~# 

I am using Ubuntu 18.04.5 LTS.

The JDK version is 11.0.11.

The Scala version is 2.12.13.

The spark-shell version is 3.0.2.

What is the problem, and how can I fix it?

Solution

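The jar has to be downloaded from repos.spark-packages.org. Unfortunately, pyspark's --packages option does not check this repository: the log above shows Spark 3.0.2 trying only the local Maven cache, the local Ivy cache, Maven Central, and dl.bintray.com/spark-packages. If a working Maven installation is available on your machine, the simplest fix is to download the jar manually into your local Maven repository: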

mvn org.apache.maven.plugins:maven-dependency-plugin:2.1:get \
    -Dartifact=graphframes:graphframes:0.8.1-spark3.0-s_2.12 \
    -DrepoUrl=https://repos.spark-packages.org

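This command downloads the jar (and all required dependencies, if any) into the local Maven repository at /root/.m2/repository. That is the first location the resolver tries (local-m2-cache in the log), so pyspark can pick up the jar from there.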
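
To check that the download landed where the resolver looks, the path from the local-m2-cache entry in the log can be rebuilt from the package coordinate. The following is a minimal sketch, not part of the original answer. The helper names (local_m2_jar, graphframes_conf, build_session) are hypothetical. The second half is an alternative that is only assumed to work here: it uses Spark's spark.jars.packages and spark.jars.repositories settings to name the extra repository directly, and it only takes effect when no JVM has been started yet, for example in a standalone Python script rather than inside an already running pyspark shell.

```python
from pathlib import Path

# Coordinate and repository URL are taken from the question and the answer.
GRAPHFRAMES_COORD = "graphframes:graphframes:0.8.1-spark3.0-s_2.12"
SPARK_PACKAGES_REPO = "https://repos.spark-packages.org"


def local_m2_jar(coord, m2_root="/root/.m2/repository"):
    """Return the path where a local Maven repository stores the jar."""
    group, artifact, version = coord.split(":")
    # Maven layout: dots in the group id become directory separators.
    return Path(m2_root, *group.split("."), artifact, version,
                f"{artifact}-{version}.jar")


def graphframes_conf():
    """Spark settings naming the package and one additional repository."""
    return {
        "spark.jars.packages": GRAPHFRAMES_COORD,
        "spark.jars.repositories": SPARK_PACKAGES_REPO,
    }


def build_session(app_name="graphframes-check"):
    """Start a session with the settings above (assumes pyspark is installed)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in graphframes_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    jar = local_m2_jar(GRAPHFRAMES_COORD)
    print(jar, "exists" if jar.exists() else "missing")
```

If the script reports the jar as missing after running the mvn command, the download most likely went to a different user's ~/.m2 directory than the one pyspark runs under.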
