
PySpark - UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 124: ordinal not in range(128)


I get a UnicodeEncodeError when I try to display a Spark DataFrame in the terminal with the following code:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField,StringType,IntegerType
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import NotFoundError
### Creating Spark Session
spark = SparkSession \
                .builder \
                .appName("test") \
                .config("spark.executor.heartbeatInterval","60s") \
                .getOrCreate()

spark.conf.set('spark.sql.session.timeZone','UTC')
spark.sparkContext.setLogLevel("ERROR")

es_server_ip = "elasticsearch"
es_server_port = "9200"
es_conn = Elasticsearch("http://user:password@elasticsearch:9200",use_ssl=False,verify_certs=True)


#function to read dataframe from Elastic Search index
def readFromES(esIndex,esQuery):
    esDf = spark.read.format("org.elasticsearch.spark.sql") \
            .option("es.nodes",es_server_ip ) \
            .option("es.port",es_server_port) \
            .option("es.net.http.auth.user","user") \
            .option("es.net.http.auth.pass","password") \
            .option("es.net.ssl","false") \
            .option("es.net.ssl.cert.allow.self.signed","true") \
            .option("es.read.metadata","false") \
            .option("es.mapping.date.rich","false") \
            .option("es.query",esQuery) \
            .load(esIndex)
    return esDf

#defining the elastic search query
q_ci = """{
       "query": {
        "match_all": {}
      }
    }"""

#invoking the function and saving the data to df1
df1 = readFromES("test_delete",q_ci)
df1.show(truncate=False)

Error

df1.show(truncate=False)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 382, in show
UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 124: ordinal not in range(128)
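
The traceback points at the print call inside show, so the failure appears to come from Python encoding text for standard output rather than from the Elasticsearch read. The sketch below reproduces the same exception without Spark, assuming the output stream resolved to ASCII. The helper name try_print is illustrative only and is not part of PySpark.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the UnicodeEncodeError if encoding fails, otherwise None.
    """
    # Stand-in for a terminal stream; the real one is sys.stdout.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
        return None
    except UnicodeEncodeError as exc:
        return exc


# U+FFFD is the replacement character named in the traceback.
hostname = "Apr\ufffdngli"

ascii_error = try_print(hostname, "ascii")   # fails: ASCII cannot represent U+FFFD
utf8_error = try_print(hostname, "utf-8")    # succeeds: UTF-8 can
```

If this is the cause, the value of sys.stdout.encoding in the driver process is the setting to inspect.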

I need output like the following:

+--------------------+------+-----+
|hostname            |kpi   |value|
+--------------------+------+-----+
|host4               |cpu   |95   |
|host3               |disk  |90   |
|Apr�ngli            |cpu   |78   |
|host2               |memory|85   |
+--------------------+------+-----+

You can reproduce the DataFrame with the following code:

data1 = [("Apr�ngli","cpu",78),("host2","memory",85),("host3","disk",90),("host4","cpu",95)]
schema1 = StructType([
    StructField("hostname",StringType(),True),
    StructField("kpi",StringType(),True),
    StructField("value",IntegerType(),True)
])
df1 = spark.createDataFrame(data=data1,schema=schema1)
df1.printSchema()
df1.show(truncate=False)

Steps I have taken: as suggested in other Stack Overflow answers, I set the following, but I still get the error:

export PYTHONIOENCODING=utf8
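
One way to check whether that export actually reached the Python process running the driver is to print the settings that decide how print() encodes text. The sketch below does that and also shows a fallback that wraps a binary stream in a UTF-8 text layer. Both helper names (describe_stdout, utf8_writer) are hypothetical, and whether the environment variable is visible to the driver depends on how the job is launched, which is an assumption here rather than something confirmed above.

```python
import io
import os
import sys


def describe_stdout():
    """Collect the settings that influence how print() encodes text."""
    return {
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demonstration on an in-memory buffer instead of the real terminal.
buffer = io.BytesIO()
writer = utf8_writer(buffer)
print("Apr\ufffdngli", file=writer)
writer.flush()
encoded = buffer.getvalue()  # UTF-8 bytes, no UnicodeEncodeError raised
```

In the driver script, the same wrapper could be applied as sys.stdout = utf8_writer(sys.stdout.buffer) before calling df1.show(truncate=False). This works on Python 3.6; sys.stdout.reconfigure(encoding="utf-8") is only available from Python 3.7.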

Version details:

PYTHON_VERSION=3.6.8
Spark version 2.4.5
