PySpark: how to fix the "No module named 'numpy'" error when accessing a RowMatrix object
I created a RowMatrix object from an RDD in PySpark, but when I call the numRows method on it I get a "No module named 'numpy'" error.
Code:
import pandas as pd
import numpy as np
import os

from pyspark.mllib.linalg.distributed import RowMatrix
from pyspark import SparkContext
from pyspark.sql.session import SparkSession
from pyspark import SparkConf

# Point the Python workers at the Python 3 interpreter before creating the context
conf = SparkConf().setAll([("spark.pyspark.python", "/usr/local/bin/python3")])
sc = SparkContext(conf=conf)
spark = SparkSession(sc)

# Build a RowMatrix from an RDD of rows
data = [[1, 2], [3, 4], [5, 6]]
rows = sc.parallelize(data)
print(type(rows))
rmat = RowMatrix(rows)
print(type(rmat))

# This call fails with "No module named 'numpy'"
print(rmat.numRows())

sc.stop()
spark.stop()
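For reference, spark.pyspark.python is meant to tell the Python workers which interpreter to run. A minimal sketch of the environment-variable equivalent, assuming /usr/local/bin/python3 actually exists on every worker node, would look like this (this is only an illustrative setup, not something the question above already does):

import os
from pyspark import SparkConf, SparkContext

# Set the interpreter for both driver and executors before the
# SparkContext is created; assumes the path is valid on all nodes.
os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/bin/python3"

conf = SparkConf()
sc = SparkContext(conf=conf)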
Error stack trace:
Py4JJavaError: An error occurred while calling o89.numRows.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, cgp-ebi-wkr08.robi.com.bd, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/location_of_cloudera/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 364, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/location_of_cloudera/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 69, in read_command
    command = serializer._read_with_length(file)
  File "/location_of_cloudera/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
  File "/location_of_cloudera/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 587, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/location_of_cloudera/lib/spark/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 28, in <module>
    import numpy
ModuleNotFoundError: No module named 'numpy'
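From the traceback it is the executor-side Python that fails to import numpy (the worker process, not the driver). In case it helps, a small probe like the sketch below, which uses only sys.executable and the rows RDD defined above, should show which interpreter the executors are actually running:

import sys

# Collect the interpreter path each worker process uses.
# If it differs from /usr/local/bin/python3, the conf setting did not
# reach the executors, and that interpreter presumably lacks numpy.
worker_pythons = rows.map(lambda _: sys.executable).distinct().collect()
print(worker_pythons)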
Can anyone tell me why I am getting this error?