
Cannot convert a string to a date in spark.sql, error thrown

How to fix "cannot convert a string to a date in spark.sql" errors

I can't convert a string to a date in spark.sql. When I pass the raw string literal, the cast succeeds, but when I store the value in a variable and pass that in instead, I get a type-mismatch error. I've tried several different approaches and keep hitting the same error. Can someone help me fix this?

>>> s
'2020-10-23'
>>> type(s)
<type 'str'>
>>> spark.sql("""select cast('2020-10-23' as date)""").show()
+------------------------+                                                      
|CAST(2020-10-23 AS DATE)|
+------------------------+
|              2020-10-23|
+------------------------+

>>> spark.sql("""select cast("""+s+""" as date)""").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CAST(((2020 - 10) - 23) AS DATE)' due to data type mismatch: cannot cast int to date; line 1 pos 7;\n'Project [unresolvedalias(cast(((2020 - 10) - 23) as date),None)]\n+- OneRowRelation\n"


>>> spark.sql("""select cast("""+str(s)+""" as date)""").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CAST(((2020 - 10) - 23) AS DATE)' due to data type mismatch: cannot cast int to date; line 1 pos 7;\n'Project [unresolvedalias(cast(((2020 - 10) - 23) as date),None)]\n+- OneRowRelation\n"
>>> 
>>> s
'2020-10-23'
>>> type(s)
<type 'str'>
>>> 
>>> spark.sql("""select cast(date_format("""+s+""",'yyyy-MM-dd') as date)""").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
pyspark.sql.utils.AnalysisException: u"cannot resolve 'date_format(((2020 - 10) - 23),'yyyy-MM-dd')' due to data type mismatch: argument 1 requires timestamp type, however, '((2020 - 10) - 23)' is of int type.; line 1 pos 12;\n'Project [unresolvedalias(cast(date_format(((2020 - 10) - 23),yyyy-MM-dd,Some(America/New_York)) as date),None)]\n+- OneRowRelation\n"
>>> 

Solution

You are missing the single quotes around s:

spark.sql("""select cast('"""+s+"""' as date)""")

To convince yourself that the single quotes are needed, print your query:

print("""select cast("""+s+""" as date)""")

and you will see

select cast(2020-10-23 as date)

with no single quotes around the date. Spark's SQL parser then reads the bare 2020-10-23 as the integer expression 2020 - 10 - 23, which is exactly why the error message complains about casting an int to a date.
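The difference can be checked in plain Python without touching Spark at all, by building both query strings and comparing what Spark would actually receive (a minimal sketch using the same s from the question):

```python
s = '2020-10-23'

# Without quotes, s lands in the SQL text bare, so Spark's parser sees
# an arithmetic expression, not a string literal.
unquoted = "select cast(" + s + " as date)"

# With single quotes, s becomes a SQL string literal that casts cleanly.
quoted = "select cast('" + s + "' as date)"

print(unquoted)  # select cast(2020-10-23 as date)
print(quoted)    # select cast('2020-10-23' as date)

# What the parser effectively evaluates in the unquoted version:
print(2020 - 10 - 23)  # 1987
```

This is why the error mentions '((2020 - 10) - 23)': Spark resolved the bare token to integer subtraction before attempting the cast.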


Alternatively, you can define s as a string that includes the single quotes:

s = "'2020-10-23'"
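If s might ever come from user input, quoting by hand gets fragile. Below is a hedged sketch of a tiny helper (the name quote_sql_string is my own, not a Spark API) that wraps the value in single quotes and doubles any embedded quotes, which is the standard SQL escape:

```python
def quote_sql_string(value):
    # Double any embedded single quotes (standard SQL escaping),
    # then wrap the whole value in single quotes.
    return "'" + value.replace("'", "''") + "'"

s = '2020-10-23'
query = "select cast(" + quote_sql_string(s) + " as date)"
print(query)  # select cast('2020-10-23' as date)
```

Another option that avoids string interpolation entirely is to build the value on the DataFrame side, e.g. with pyspark.sql.functions.to_date(lit(s)), and skip the SQL text altogether.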
