Running a crawler using a CloudFormation template

This CloudFormation template works as expected and creates all the resources required by this article:

Data visualization and anomaly detection using Amazon Athena and Pandas from Amazon SageMaker | AWS Machine Learning Blog

However, the WorkflowStartTrigger resource does not actually run the crawler. How can I run a crawler using a CloudFormation template?

Resources:
  MyRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        -
          PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: "*"
                Resource: "*"
 
  MyDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "dbcrawler123"
        Description: "TestDatabaseDescription"
        LocationUri: "TestLocationUri"
        Parameters:
          key1 : "value1"
          key2 : "value2"
 
  MyCrawler2:
    Type: AWS::Glue::Crawler
    Properties:
      Description: example classifier
      Name: "testcrawler123"
      Role: !GetAtt MyRole.Arn
      DatabaseName: !Ref MyDatabase
      Targets:
        S3Targets:
          - Path: 's3://nytaxi162/'
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      TablePrefix: test-
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"


  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Description: Trigger for starting the Crawler
      Name: StartTrigger
      Type: ON_DEMAND
      Actions:
        - CrawlerName: "testcrawler123"

Solution

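You should be able to do this by creating a custom resource backed by a Lambda function, so that the Lambda function is what actually starts the crawler. You should even be able to have it wait for the crawler to finish its run.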
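
As a rough illustration of the custom-resource approach, the Lambda handler might look something like the sketch below. It is not taken from the article: the `CrawlerName` resource property, the `glue` and `send` parameters (added so the handler can be exercised without AWS), and the helper names are all assumptions. The sketch calls `start_crawler` on create and always reports a result back to the pre-signed `ResponseURL`, since a custom resource that never responds leaves the stack waiting.

```python
import json
import urllib.request


def build_response(event, context, status, reason=""):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch log stream: " + context.log_stream_name,
        "PhysicalResourceId": event.get("PhysicalResourceId") or context.log_stream_name,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(event, body):
    """PUT the result to the pre-signed URL that CloudFormation put in the event."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=data,
        method="PUT",
        headers={"Content-Type": "", "Content-Length": str(len(data))},
    )
    urllib.request.urlopen(request)


def handler(event, context, glue=None, send=send_response):
    """Start the crawler on Create; Update and Delete only acknowledge."""
    try:
        if event["RequestType"] == "Create":
            if glue is None:
                import boto3  # bundled with the Lambda Python runtime

                glue = boto3.client("glue")
            # "CrawlerName" is a hypothetical property set on the custom
            # resource in the template, e.g. CrawlerName: "testcrawler123".
            glue.start_crawler(Name=event["ResourceProperties"]["CrawlerName"])
        body = build_response(event, context, "SUCCESS")
    except Exception as exc:  # report the failure so the stack does not hang
        body = build_response(event, context, "FAILED", str(exc))
    send(event, body)
    return body
```

In the template, a custom resource (for example one with a hypothetical type name such as `Custom::StartCrawler`) would set `ServiceToken` to the function's ARN and pass `CrawlerName` as a property. The function's role would need permission for `glue:StartCrawler`.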

CloudFormation does not run crawlers directly; it only creates them. However, you can define a scheduled trigger so that the crawler runs on a schedule:

ScheduledJobTrigger:
  Type: 'AWS::Glue::Trigger'
  Properties:
    Type: SCHEDULED
    StartOnCreation: true
    Description: DESCRIPTION_SCHEDULED
    Schedule: cron(5 * * * ? *)
    Actions:
      - CrawlerName: "testcrawler123"
    Name: ETLGlueTrigger

If you need the crawler to run during CloudFormation stack creation, you can use a Lambda function.
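
If stack creation should also wait until the crawl has finished, the Lambda function could poll the crawler state after starting it. The following is only a sketch under assumptions: `wait_for_crawler`, its parameters and the default intervals are made up for illustration, and `glue` is expected to be a boto3 Glue client (or any object with a compatible `get_crawler` method). Keep in mind that a Lambda invocation is limited to 15 minutes, so a long crawl would not fit in a single call.

```python
import time


def wait_for_crawler(glue, name, poll_seconds=15, timeout_seconds=600, sleep=time.sleep):
    """Poll get_crawler until the crawler is back in the READY state.

    Returns the number of seconds spent sleeping, or raises TimeoutError.
    """
    waited = 0
    while waited <= timeout_seconds:
        # State is one of READY, RUNNING or STOPPING.
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return waited
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"crawler {name} not finished after {timeout_seconds}s")
```

A handler would call this right after `start_crawler`, for example `wait_for_crawler(boto3.client("glue"), "testcrawler123")`, and report success to CloudFormation only once it returns.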
