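How to fix an AWS Glue crawler that keeps tables from the old path after its S3 target path changes
I have an AWS Glue crawler, and I am switching its S3 target path in order to switch the underlying table source. The problem is that tables end up being created from both targets:
Configuration: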
aws glue get-crawler --name sand-main
{
"Crawler": {
"Name": "sand-main","Role": "Crawler-sand","Targets": {
"S3Targets": [
{
"Path": "s3://sand-main-green/main","Exclusions": [
"checkpoints/**","IsActive.txt","isactive.txt"
]
}
],"JdbcTargets": [],"MongoDBTargets": [],"DynamoDBTargets": [],"CatalogTargets": []
},"DatabaseName": "sand_main","Description": "","Classifiers": [],"RecrawlPolicy": {
"RecrawlBehavior": "CRAWL_EVERYTHING"
},"SchemaChangePolicy": {
"UpdateBehavior": "UPDATE_IN_DATABASE","DeleteBehavior": "DELETE_FROM_DATABASE"
},"LineageConfiguration": {
"CrawlerLineageSettings": "DISABLE"
},"State": "READY","CrawlElapsedTime": 0,"CreationTime": "2020-09-30T14:07:25-06:00","LastUpdated": "2021-01-28T11:32:15-07:00","LastCrawl": {
"Status": "SUCCEEDED","LogGroup": "/aws-glue/crawlers","LogStream": "sand-main","MessagePrefix": "5bb1907d-2847-46ef-8712-3a50deb2b7a0","StartTime": "2021-01-28T11:32:35-07:00"
},"Version": 24,"Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
}
}
I have a Lambda that switches the path from:
"Path": "s3://sand-main-green/main"
to:
"Path": "s3://sand-main-blue/main"
But I end up with these tables:
Name -> Location
test -> s3://sand-main-blue/main/test
test_2398l50df -> s3://sand-main-green/main/test
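I have DeleteBehavior set to DELETE_FROM_DATABASE, so I expected the table for the old S3 path to be deleted. It feels as if the crawler keeps a history of its S3 targets. I don't want this behavior.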
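For reference, here is a minimal Python sketch of the path switch described above, plus a check for tables that still point at the old location. The Lambda's actual code is not shown in the question, so everything here is an assumption: the function names are hypothetical, `glue` is assumed to be a boto3 Glue client, and the sketch assumes the `Targets` structure returned by `get_crawler` can be passed back to `update_crawler` unchanged. It does not explain why the crawler keeps the old table; it only makes the leftover visible.

```python
import copy


def switch_s3_target(targets, old_path, new_path):
    """Return a copy of a crawler Targets dict with old_path swapped for new_path."""
    updated = copy.deepcopy(targets)
    for target in updated.get("S3Targets", []):
        if target.get("Path") == old_path:
            target["Path"] = new_path
    return updated


def is_under(location, prefix):
    """True if an S3 location equals prefix or lies below it (trailing slashes ignored)."""
    return (location.rstrip("/") + "/").startswith(prefix.rstrip("/") + "/")


def stale_table_names(tables, active_path):
    """Names of catalog tables whose storage location lies outside active_path.

    Tables without a location (for example views) are skipped, not flagged.
    """
    stale = []
    for table in tables:
        location = table.get("StorageDescriptor", {}).get("Location", "")
        if location and not is_under(location, active_path):
            stale.append(table["Name"])
    return stale


def update_crawler_path(glue, crawler_name, old_path, new_path):
    """Read the crawler's current targets and write them back with the new path."""
    crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
    new_targets = switch_s3_target(crawler["Targets"], old_path, new_path)
    glue.update_crawler(Name=crawler_name, Targets=new_targets)
    return new_targets


def find_stale_tables(glue, database, active_path):
    """List tables in the database that still point outside active_path."""
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        tables.extend(page["TableList"])
    return stale_table_names(tables, active_path)
```

With `glue = boto3.client("glue")`, calling `update_crawler_path(glue, "sand-main", "s3://sand-main-green/main", "s3://sand-main-blue/main")` would perform the switch, and `find_stale_tables(glue, "sand_main", "s3://sand-main-blue/main")` run after the next crawl finishes should report tables such as test_2398l50df. Whether to remove them (for example with the Glue `delete_table` call) is a separate decision that this sketch deliberately leaves out.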