
AWS Glue 2.0 connection timeout

How to resolve an AWS Glue 2.0 connection timeout

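I am trying to use AWS Glue to transfer data from EMR Hive to Redshift. When creating the EMR cluster, I specified that the Glue Data Catalog should be used for table metadata. I also created and tested the Redshift Glue connection and the crawler.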

To test my script, I created a development endpoint and opened a Jupyter notebook. The code is below:

import sys
from datetime import date, timedelta, datetime
from dateutil.relativedelta import relativedelta
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.window import *
from pyspark.sql.types import DateType, TimestampType
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

# Build the Glue context on top of the Spark context and get a Spark session from it
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.sparkSession

# Read the Hive table through the Glue Data Catalog and keep only recent rows
customer = spark.table('demo.customers').where(col("insert_timestamp") >= '2020-04-01')
# Convert the DataFrame to a DynamicFrame and write it to the Redshift catalog table
cust_dyn = DynamicFrame.fromDF(customer, glueContext, 'cust_dyn')
glueContext.write_dynamic_frame.from_catalog(frame = cust_dyn, database = "classic_models", catalog_connection = 'Redshift', table_name = "dev_classic_models_customers", redshift_tmp_dir = 's3://temp/')

Now, when I run this as a Glue 2.0 job with the Redshift connection attached, I get the timeout error shown below:

Traceback (most recent call last):
  File "/tmp/Demo",line 16,in <module>
    customer = spark.table('demo.customers').where(col("insert_timestamp") >= '2020-04-01')
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/session.py",line 780,in table
    return DataFrame(self._jsparkSession.table(tableName),self._wrapped)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",line 1257,in __call__
    answer,self.gateway_client,self.target_id,self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py",line 69,in deco
    raise AnalysisException(s.split(': ',1)[1],stackTrace)
pyspark.sql.utils.AnalysisException: 'java.lang.RuntimeException: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to glue.ap-south-1.amazonaws.com:443 [glue.ap-south-1.amazonaws.com/3.7.225.56,glue.ap-south-1.amazonaws.com/13.234.179.141,glue.ap-south-1.amazonaws.com/13.127.192.167,glue.ap-south-1.amazonaws.com/13.235.123.170] Failed: connect timed out;'

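One thing I noticed: when I do not attach the Redshift connection and only run the select statement, the job runs fine. But without the Redshift connection I cannot transfer the data to the Redshift database. Any idea why it fails and what can be done? Thanks.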
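
Since the job only fails when the connection is attached, one way to narrow this down is to check from inside the job whether a TCP connection to the Glue endpoint named in the traceback can be opened at all. The sketch below is a diagnostic under stated assumptions, not a fix: the helper name `can_connect` is hypothetical, and the host is simply the one that appears in the error message. A job with a connection attached generally runs inside that connection's VPC subnet, so a `False` result here would suggest the subnet has no route to the Glue API (for example, no NAT gateway and no Glue VPC endpoint), though that remains to be confirmed for this setup.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests network reachability,
    not credentials or permissions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers timeouts, refused connections, and DNS lookup failures.
        return False
```

Calling `can_connect("glue.ap-south-1.amazonaws.com")` at the top of the job script, once with the Redshift connection attached and once without, should show whether the difference between the two runs is purely network reachability.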
