DynamicFrame in AWS Glue throws SchemaBuilderException: Record is incomplete?

I created a DataFrame with the following code:

Input 1: test.csv

    id,score,type,date
    41,0.4,2,2020-12-19
    42,0.41,2020-12-19

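# Assumes a Glue job or notebook session in which `spark`, `glueContext`,
# `schema` and `source_path` are already defined.
from awsglue.dynamicframe import DynamicFrame

t_tbl = "test"
t_db = "test"
s_df = spark.read.option("delimiter", ",").option("header", "true").schema(schema).csv(source_path + '//test.csv')
tdf = DynamicFrame.fromDF(s_df, glueContext, "s_df")
target_path = "s3://test//test"
# Sink that writes to S3 and updates the Data Catalog table.
sink = glueContext.getSink(connection_type="s3", path=target_path, enableUpdateCatalog=True,
                           updateBehavior="UPDATE_IN_DATABASE", partitionKeys='')
sink.setFormat("glueparquet")
sink.setCatalogInfo(catalogDatabase=t_db, catalogTableName=t_tbl)
sink.writeFrame(tdf)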

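I can write the DynamicFrame without any errors, and I can query the data in Athena.

Next, I create a new DataFrame from test1.csv by running the code below (the same code as above).

Input 2: test1.csv

    id,date
    43,2020-12-19
    44,2020-12-19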


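t_tbl = "test"
t_db = "test"
s_df = spark.read.option("delimiter", ",").option("header", "true").schema(schema).csv(source_path + '//test1.csv')
tdf = DynamicFrame.fromDF(s_df, glueContext, "s_df")
target_path = "s3://test//test"
sink = glueContext.getSink(connection_type="s3", path=target_path, enableUpdateCatalog=True,
                           updateBehavior="UPDATE_IN_DATABASE", partitionKeys='')
sink.setFormat("glueparquet")
sink.setCatalogInfo(catalogDatabase=t_db, catalogTableName=t_tbl)
sink.writeFrame(tdf)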

Now I get the following error:

An error was encountered:

An error occurred while calling o386.pyWriteDynamicFrame.
: com.amazonaws.services.glue.schema.builders.SchemaBuilderException: Record is incomplete.
    at com.amazonaws.services.glue.schema.builders.SchemaBuilder.build(SchemaBuilder.java:301)
    at com.amazonaws.services.glue.schema.Schema.withoutField(Schema.java:604)
    at com.amazonaws.services.glue.schema.Schema.withoutFields(Schema.java:611)
    at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:176)
    at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:148)
    at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:66)
    at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:66)
    at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:58)
    at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:66)
    at com.amazonaws.services.glue.sinks.HadoopDataSink.writeDynamicFrame(HadoopDataSink.scala:147)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "<stdin>", line 20, in write_to_sink
  File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/Pyglue.zip/awsglue/data_sink.py", line 31, in writeFrame
    return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")
  File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
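
A note on the two inputs: their headers differ, and the second data row of test.csv has only three fields. Whether either difference is what triggers the exception is not established here. The sketch below only makes those differences explicit, using Python's standard csv module on the sample rows copied from the question (leading whitespace trimmed). The helper names header_of and short_rows are hypothetical and are not part of Glue or Spark.

```python
import csv
import io

# Sample inputs copied from the question, with leading whitespace trimmed.
TEST_CSV = "id,score,type,date\n41,0.4,2,2020-12-19\n42,0.41,2020-12-19\n"
TEST1_CSV = "id,date\n43,2020-12-19\n44,2020-12-19\n"


def header_of(text):
    """Return the header row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(text)))


def short_rows(text):
    """Return the data rows that have fewer fields than the header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return [row for row in data if len(row) < len(header)]


first_header = header_of(TEST_CSV)
second_header = header_of(TEST1_CSV)

# Columns present in test.csv but absent from test1.csv.
missing_in_second = [c for c in first_header if c not in second_header]

print("columns missing from test1.csv:", missing_in_second)
print("short rows in test.csv:", short_rows(TEST_CSV))
```

Running the same kind of check against the real files before writing to the sink would show which columns of the existing catalog table the new data does not supply.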
