How to fix "cannot resolve 'column_name' given input columns" in Spark SQL
I have this simple piece of code:
query_campaigns = """
select camp.campaign_id, camp.external_id, camp.start_date, camp.program_type, camp.advertiser_id from ads.dim_campaigns camp
"""
df_campaigns = spark.sql(query_campaigns)
I get this error message:
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "cannot resolve '`camp.campaign_id`' given input columns: [camp.ecs_snapshot, camp.ecs_version, camp.ecs_bundle_type]; line 2 pos 11;
'Project ['camp.campaign_id, 'camp.external_id, 'camp.start_date, 'camp.program_type, 'camp.advertiser_id]
+- SubqueryAlias `camp`
   +- SubqueryAlias `ads`.`dim_campaigns`
      +- HiveTableRelation `ads`.`dim_campaigns`, amazon.conexio.hive.serde.edx.GenericEDXSerDe, [ecs_snapshot#192L, ecs_version#193L, ecs_bundle_type#194], Statistics(sizeInBytes=8.0 EB, hints=none)"
I have tried everything from the solutions I could find. Interestingly, I have another query against a different table that works fine. Any help is appreciated; thanks in advance.
Here is the table's schema:
dim_campaigns (
    marketplace_id numeric(38,0) NOT NULL encode raw,
    campaign_id numeric(38,0),
    campaign_name varchar(765) NULL encode zstd,
    campaign_status varchar(765) NULL encode zstd,
    program_type varchar(765) NULL encode zstd,
    entity_id varchar(765) NULL encode zstd,
    external_id varchar(765) NULL encode zstd,
    advertiser_id numeric(38,0) NULL encode zstd,
    internal_status varchar(765) NULL encode zstd,
    start_date timestamp without time zone NULL encode zstd,
    bid_adjustment_percentage numeric(38,0) NULL encode az64,
    PRIMARY KEY (marketplace_id, campaign_id)
)
DISTKEY(campaign_id)
SORTKEY(marketplace_id);
Solution
The column campaign_id does not exist in the table ads.dim_campaigns as Spark sees it: the HiveTableRelation in your error only exposes ecs_snapshot, ecs_version, and ecs_bundle_type, so the schema Spark reads from the metastore does not match the schema you posted.
This query works:
>>> l = [[1],[2],[3]]
>>> df = spark.createDataFrame(l,['col_1'])
>>> df.createOrReplaceTempView('table')
>>> query = """SELECT table_alias.col_1 FROM table table_alias"""
>>> spark.sql(query).show()
+-----+
|col_1|
+-----+
| 1|
| 2|
| 3|
+-----+
This query produces the same error as yours (note col_x instead of col_1):
>>> l = [[1],[2],[3]]
>>> df = spark.createDataFrame(l,['col_1'])
>>> df.createOrReplaceTempView('table')
>>> query = """SELECT table_alias.col_x FROM table table_alias"""
>>> spark.sql(query).show()
/.../
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "cannot resolve '`table_alias.col_x`' given input columns: [table_alias.col_1];
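The same mismatch can be caught with a plain-Python check before calling spark.sql. This is a sketch: `actual_columns` is a hypothetical stand-in for what `spark.table("ads.dim_campaigns").columns` returned in the error above.

```python
# Columns the relation actually exposed (from the AnalysisException above).
actual_columns = ["ecs_snapshot", "ecs_version", "ecs_bundle_type"]
# Columns the query tries to select.
wanted = ["campaign_id", "external_id", "start_date", "program_type", "advertiser_id"]

# Any name in `wanted` that Spark cannot see will fail to resolve.
missing = [c for c in wanted if c not in set(actual_columns)]
print(missing)  # here every requested column is absent
```

If `missing` is non-empty, the select is guaranteed to raise "cannot resolve", so you can fail early with a clearer message.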
Please try running this code and post the results:
import spark.implicits._

val df1 = spark.table("ads.dim_campaigns")
df1.printSchema()
// please, show this result

val df2 = df1.select(
  'campaign_id, 'external_id, 'start_date, 'program_type, 'advertiser_id
)
df2.printSchema()
// please, show this result