How do I resolve AWS Glue's DynamicFrame giving SchemaBuilderException: Record is incomplete?
I created a data frame using the following code:
input 1: test.csv
id,score,type,date
41,0.4,2,2020-12-19
42,0.41,2020-12-19
t_tbl="test"
t_db="test"
source_df=spark.read.option("delimiter",",").option("header","true").schema(schema).csv(source_path+'//test.csv')
tdf = DynamicFrame.fromDF(s_df,glueContext,"s_df")
target_path="s3://test//test"
sink = glueContext.getSink(connection_type="s3",path=target_path,enableUpdateCatalog=True,\
updateBehavior="UPDATE_IN_DATABASE",partitionKeys=[])
sink.setFormat("glueparquet")
sink.setCatalogInfo(catalogDatabase=t_db,catalogTableName=t_tbl)
sink.writeFrame(tdf)
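Note: `schema` and `source_path` are defined earlier in the job and are not shown here. For reference, one plausible definition of `schema` for the four columns of test.csv (an assumption for illustration, not the exact code used) would be:

from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType, StringType

# Hypothetical read schema matching test.csv's header (id,score,type,date);
# the actual definition used in the job may differ.
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("score", DoubleType(), True),
    StructField("type", IntegerType(), True),
    StructField("date", StringType(), True),
])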
I can write this dynamic frame without any errors and can query the data in Athena.
Next I create a new data frame from test1.csv by running the code below (the same as the code above):
input 2: test1.csv
id,date
43,2020-12-19
44,2020-12-19
t_tbl="test"
t_db="test"
source_df=spark.read.option("delimiter","true").schema(schema).csv(source_path+'//test1.csv')
tdf = DynamicFrame.fromDF(s_df,catalogTableName=t_tbl)
sink.writeFrame(tdf)
Now I get the following error:
An error was encountered:
An error occurred while calling o386.pyWriteDynamicFrame.
: com.amazonaws.services.glue.schema.builders.SchemaBuilderException: Record is incomplete.
at com.amazonaws.services.glue.schema.builders.SchemaBuilder.build(SchemaBuilder.java:301)
at com.amazonaws.services.glue.schema.Schema.withoutField(Schema.java:604)
at com.amazonaws.services.glue.schema.Schema.withoutFields(Schema.java:611)
at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:176)
at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:148)
at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:66)
at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:66)
at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:58)
at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:66)
at com.amazonaws.services.glue.sinks.HadoopDataSink.writeDynamicFrame(HadoopDataSink.scala:147)
at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
File "<stdin>",line 20,in write_to_sink
File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/Pyglue.zip/awsglue/data_sink.py",line 31,in writeFrame
return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf,callsite(),info),dynamic_frame.glue_ctx,dynamic_frame.name + "_errors")
File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py",line 1257,in __call__
answer,self.gateway_client,self.target_id,self.name)
File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/pyspark.zip/pyspark/sql/utils.py",line 63,in deco
return f(*a,**kw)
File "/mnt/yarn/usercache/livy/appcache/application_1609223793940_0004/container_1609223793940_0004_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py",line 328,in get_return_value
format(target_id,".",name),value)