
PySpark from_json returns null for every JSON value in the DataFrame

I have logs like the one below, each containing plain text followed by a JSON string:

2020-09-24T08:03:01.633Z 11.21.23.1 {"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}

I created a DataFrame from the log above, as shown below:

| Date       | Source IP  | Event Type         |
| 2020-09-24 | 11.21.23.1 | {"EventTime":"202  |
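
The question does not show how the log line was split into these three columns. As a minimal sketch in plain Python, assuming the fields are separated by single spaces and the JSON payload always comes last, the split could look like this. `split_log_line` is a hypothetical helper name, not something from the original post.

```python
import json


def split_log_line(line):
    """Split one log line into (date, source_ip, json_payload)."""
    # Split on the first two spaces only, so spaces inside the JSON
    # payload (e.g. in the EventTime value) are left untouched.
    timestamp, source_ip, payload = line.split(" ", 2)
    # Keep only the date part of the ISO timestamp.
    date = timestamp.split("T", 1)[0]
    return date, source_ip, payload


line = (
    "2020-09-24T08:03:01.633Z 11.21.23.1 "
    '{"EventTime":"2020-09-24 13:33:01",'
    '"Hostname":"abc-cde.india.local","Keywords":-1234}'
)

date, source_ip, payload = split_log_line(line)
# The third field should be a complete JSON document on its own.
event = json.loads(payload)
```

Whatever splitting logic is used, the point is that the `Event Type` column must hold the complete JSON string and nothing else.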

This is the schema I use to convert the JSON string into another DataFrame:

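from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

json_schema = StructType([
        StructField("EventTime", StringType()),
        StructField("Hostname", StringType()),
        StructField("Keywords", IntegerType())
    ])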

# show() returns None, so keep the DataFrame and display it in a separate step
json_converted_df = df.select(F.from_json(F.col('Event Type'), json_schema).alias("data")).select("data.*")
json_converted_df.show()

But every column of the new DataFrame built from the JSON schema comes back as null:

+---------+--------+--------+
|EventTime|Hostname|Keywords|
+---------+--------+--------+
|     null|    null|    null|
+---------+--------+--------+

How can I resolve this?

Solution

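It works fine for me. With the same schema and the same query on a small test dataset, the JSON is parsed correctly, which suggests the nulls come from the contents of your Event Type column rather than from the schema or the from_json call.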

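# Preparation of test dataset
from pyspark.sql import SparkSession

# Reuses the active session if one already exists (e.g. in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

a = [
    (
        "2020-09-24T08:03:01.633Z",
        "11.21.23.1",
        '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}',
    ),
]

b = ["Date", "Source IP", "Event Type"]

df = spark.createDataFrame(a, b)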

df.show()
#+--------------------+----------+--------------------+
#|                Date| Source IP|          Event Type|
#+--------------------+----------+--------------------+
#|2020-09-24T08:03:...|11.21.23.1|{"EventTime":"202...|
#+--------------------+----------+--------------------+

df.printSchema()
#root
# |-- Date: string (nullable = true)
# |-- Source IP: string (nullable = true)
# |-- Event Type: string (nullable = true)
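
# Your code, executed with explicit imports
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

json_schema = StructType(
    [
        StructField("EventTime", StringType()),
        StructField("Hostname", StringType()),
        StructField("Keywords", IntegerType()),
    ]
)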

json_converted_df = df.select(
    F.from_json(F.col("Event Type"),json_schema).alias("data")
).select("data.*")

json_converted_df.show()
#+-------------------+-------------------+--------+
#|          EventTime|           Hostname|Keywords|
#+-------------------+-------------------+--------+
#|2020-09-24 13:33:01|abc-cde.india.local|   -1234|
#+-------------------+-------------------+--------+

json_converted_df.printSchema()
#root
# |-- EventTime: string (nullable = true)
# |-- Hostname: string (nullable = true)
# |-- Keywords: integer (nullable = true)
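
Since `from_json` returns null for any string it cannot parse, one way to narrow the problem down is to check whether the values in your `Event Type` column are complete JSON documents. The sketch below does that in plain Python on a few sample strings; `is_valid_json` is a hypothetical helper, and Python's `json` module is only an approximation of Spark's JSON parser, so treat the result as a hint rather than a guarantee.

```python
import json


def is_valid_json(text):
    """Return True if text parses as a single complete JSON document."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers None and other non-string inputs.
        return False


good = (
    '{"EventTime":"2020-09-24 13:33:01",'
    '"Hostname":"abc-cde.india.local","Keywords":-1234}'
)
# A value cut off part-way through, like the cell shown in the question's table
truncated = '{"EventTime":"202'
# A value that still carries leading log text in front of the JSON
prefixed = "11.21.23.1 " + good

results = {
    "good": is_valid_json(good),
    "truncated": is_valid_json(truncated),
    "prefixed": is_valid_json(prefixed),
}
```

If a sample of your real column values fails a check like this, the fix belongs in the step that builds the DataFrame from the raw log, not in the schema.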
