Using Deequ on AWS Glue

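I am using Deequ on AWS Glue. Surprisingly, when I run the hasMaxLength check listed in the verification suite below, I get the following error. All the other checks compile and run. The compiler says that hasMaxLength is not a member of com.amazon.deequ.checks.Check. Can anyone help?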

Downloading: s3://stg-dev-ire-library/jars/deequ-1.0.1.jar to ./jar0.jar
SCRIPT_URL = /tmp/g-6aa13d15270ba0853894d7d6f2d26459f810d2ab-4863242765668262538/script_2021-02-04-08-26-12.scala
Compilation result:
/tmp/g-6aa13d15270ba0853894d7d6f2d26459f810d2ab-4863242765668262538/script_2021-02-04-08-26-12.scala: error: object KLLParameters is not a member of package com.amazon.deequ.analyzers
import com.amazon.deequ.analyzers.{Analyzer,Histogram,Patterns,State,KLLParameters}
^
/tmp/g-6aa13d15270ba0853894d7d6f2d26459f810d2ab-4863242765668262538/script_2021-02-04-08-26-12.scala: error: value hasMaxLength is not a member of com.amazon.deequ.checks.Check
possible cause: maybe a semicolon is missing before `value hasMaxLength'?
.hasMaxLength("* External Number",_==40)
^
two errors found
Compilation failed.

The code is as follows:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.amazon.deequ.analyzers.runners.{AnalysisRunner,AnalyzerContext}
import com.amazon.deequ.analyzers.runners.AnalyzerContext.successMetricsAsDataFrame
import com.amazon.deequ.{VerificationSuite,VerificationResult}
import com.amazon.deequ.VerificationResult.checkResultsAsDataFrame
import com.amazon.deequ.checks.{Check,CheckLevel}
import com.amazon.deequ.constraints.{ConstrainableDataTypes}
import org.apache.spark.sql.functions.{length,max}

object Deequ {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dq")
    val spark = SparkSession.builder().appName("dq").getOrCreate()

    // read the CSV template; without schema inference every column is loaded as a string
    val dataset = spark.read
      .option("header", true)
      .option("delimiter", ",")
      .csv("s3://ct-ire-fin-stg-data-dev-raw-gib/templates /Contracts_And_Coverages/FPSL-CONTRACTS=VALIDATIONS-v2 - Sheet1.csv")

    val verificationResult: VerificationResult = {
      VerificationSuite()
        // data to run the verification on
        .onData(dataset)
        // define a data quality check
        .addCheck(
          Check(CheckLevel.Error, "Template Validations")

            .hasDataType("* External Number", ConstrainableDataTypes.String)
            // this is the call the compiler rejects
            .hasMaxLength("* External Number", _ == 40)
            .isComplete("* External Number")

            .hasDataType("* Business Record Date", ConstrainableDataTypes.Integral)
            .hasMax("* Business Record Date", _ == 8)
            .isComplete("* Business Record Date")

            .hasDataType("Organizational Unit (Owner)", ConstrainableDataTypes.String)
            .hasMax("Organizational Unit (Owner)", _ == 10)
            .isComplete("Organizational Unit (Owner)")
            //failing
            .isContainedIn("Organizational Unit (Owner)", Array("50000252", "50000256", "50000257"))

            .hasDataType("Object Status", ConstrainableDataTypes.Integral)
            .hasMax("Object Status", _ == 3)
            .isComplete("Object Status")
            .isContainedIn("Object Status", Array("0", "1"))

            .hasDataType("Delivery Package", ConstrainableDataTypes.String)
            .hasMax("Delivery Package", _ == 20)
            .isComplete("Delivery Package")

            .hasDataType("Product Code", ConstrainableDataTypes.Integral)
            .hasMax("Product Code", _ == 10)
            .isComplete("Product Code")

            .hasDataType("Source System Basic Data", ConstrainableDataTypes.String)
            .hasMax("Source System Basic Data", _ == 10)
            .isComplete("Source System Basic Data")
            //LFST,CLPB,CLCB,CLHR,cclU
            //.isContainedIn("Source System Basic Data",Array("LSFT","CLPB","CLCB","CLHR","cclU"))
            .isContainedIn("Source System Basic Data", Array("LSFT"))

            .hasDataType("Production Control", ConstrainableDataTypes.String)
            .hasMax("Production Control", _ == 20)
            .isComplete("Production Control")
            .isContainedIn("Production Control", Array("Z_UL_PR1"))

            .hasDataType("Date of Start of Term", ConstrainableDataTypes.Integral)
            .hasMax("Date of Start of Term", _ == 8)
            .isComplete("Date of Start of Term")

            .hasDataType("Date of End of Term", ConstrainableDataTypes.Integral)
            .hasMax("Date of End of Term", _ == 8)
            .isComplete("Date of End of Term")

            .hasDataType("Legal Entity", ConstrainableDataTypes.Integral)
            .hasMax("Legal Entity", _ == 10)
            .isComplete("Legal Entity")
            //2001
            .isContainedIn("Legal Entity", Array("2001"))

            .hasDataType("IFRS17 Category", ConstrainableDataTypes.Integral)
            .hasMax("IFRS17 Category", _ == 80)
            .isComplete("IFRS17 Category")
            //To Change
            .isContainedIn("IFRS17 Category", Array("1", "2", "3", "4"))

            .hasDataType("IFRS17 Portfolio", ConstrainableDataTypes.String)
            .hasMax("IFRS17 Portfolio", _ == 40)
            .isComplete("IFRS17 Portfolio")
        )
        // compute metrics and verify check conditions
        .run()
    }

    //val metrics1 = successMetricsAsDataFrame(spark,analysisResult1)
    // convert the check results to a DataFrame and persist them as Parquet
    val resultDataFrame = checkResultsAsDataFrame(spark, verificationResult)
    resultDataFrame.write.mode("overwrite").parquet("s3://ct-ire-fin-stg-data-dev-raw-gib/template_validations/template_validations_lifestyle/")
  }
}

Solution

Your code looks fine, and I don't see an obvious cause for the error. I suggest you verify the following:

  • You should use the latest version of Deequ
  • You need to use a Deequ version that matches your Spark version
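
One way to check both points from inside the job is to ask the JVM which jar the Deequ classes were loaded from and whether the symbols the compiler complained about exist in it. The following is a minimal sketch, not an official Deequ utility: the object name DeequApiProbe and its helper methods are hypothetical, and it relies only on standard Java reflection. Because it never references hasMaxLength statically, it should compile even against a jar that lacks the method.

```scala
import scala.util.Try

// Hypothetical helper (not part of Deequ): probes the classpath via reflection.
object DeequApiProbe {

  /** True if `className` can be loaded and exposes a public method named `methodName`. */
  def hasMethod(className: String, methodName: String): Boolean =
    Try(Class.forName(className))
      .map(_.getMethods.exists(_.getName == methodName))
      .getOrElse(false)

  /** Location of the jar or directory a class was loaded from, if it can be determined. */
  def loadedFrom(className: String): Option[String] =
    Try(Class.forName(className)).toOption
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    val check = "com.amazon.deequ.checks.Check"
    val kll   = "com.amazon.deequ.analyzers.KLLParameters"

    println(s"Check loaded from:       ${loadedFrom(check).getOrElse("not on classpath")}")
    println(s"hasMaxLength available:  ${hasMethod(check, "hasMaxLength")}")
    println(s"KLLParameters available: ${Try(Class.forName(kll)).isSuccess}")
  }
}
```

If the reported location is the deequ-1.0.1.jar from the log and both probes print false, the jar is older than the API the script is written against, and replacing it with a newer build that matches the job's Spark and Scala versions is the thing to try first.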

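You may also want to track down the source of the other import error (error: object KLLParameters is not a member of package com.amazon.deequ.analyzers import com.amazon.deequ.analyzers.{Analyzer,Histogram,Patterns,State,KLLParameters}). Both errors report a symbol missing from the same library, and the log shows the job compiling against deequ-1.0.1.jar, so they probably share one cause. Resolving this error may therefore also resolve the hasMaxLength problem.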
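
If upgrading the jar is not possible right away, the same measurement can be taken with plain Spark, using the length and max functions the script already imports. This is only a sketch of a fallback under stated assumptions, not a replacement for the Deequ check: the object name MaxLengthFallback, the helper maxLength, and the sample rows are made up for illustration, and the local master setting is there only so the example can be tried outside Glue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, max}

// Hypothetical fallback (not part of Deequ): measures string length with plain Spark.
object MaxLengthFallback {

  /** Longest string length in `column`, or None if the column has no non-null values. */
  def maxLength(df: DataFrame, column: String): Option[Int] = {
    // Backticks let Spark resolve names with spaces or symbols, e.g. "* External Number".
    val row = df.agg(max(length(col(s"`$column`")))).head()
    if (row.isNullAt(0)) None else Some(row.getInt(0))
  }

  def main(args: Array[String]): Unit = {
    // local[*] is for trying this out locally; on Glue the master is set by the environment.
    val spark = SparkSession.builder()
      .appName("max-length-fallback")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the CSV template.
    val df = Seq("A-001", "A-0002", null).toDF("* External Number")

    val longest = maxLength(df, "* External Number")
    // An empty column trivially satisfies the limit.
    val withinLimit = longest.forall(_ <= 40)
    println(s"max length = $longest, within limit = $withinLimit")

    spark.stop()
  }
}
```

Note the comparison used here is `_ <= 40`. In Deequ the assertion passed to a check is applied to the computed metric, so `_ == 40` would pass only when the longest value is exactly 40 characters; if the intent is an upper bound, `_ <= 40` is the more likely fit.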
