Different stages do the same thing in a Hive query execution plan


Hive version: 1.1.0-cdh5.15.2. I recently started studying the Hive source code and how it works. Below is the problem I ran into.

explain insert into testv1 select * from test_textfile where val > 200;

The above is a simple query; below is its execution plan.

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1, consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_textfile
            Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (val > 200) (type: boolean)
              Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: UDFToString(val) (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: true
                  Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                      output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                      name: test.testv1

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://xlclusterns1/tmp/hive-stagingdir/staging_hive_2021-04-14_15-14-30_205_4974356220876798617-1/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
              output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
              serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
              name: test.testv1

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                  output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                  name: test.testv1

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
                  output format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
                  name: test.testv1

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://xlclusterns1/tmp/hive-stagingdir/staging_hive_2021-04-14_15-14-30_205_4974356220876798617-1/-ext-10000

My problem is that I cannot explain why Stage-3 and Stage-5 do the same thing. Does anyone know why?
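
One way to test a likely explanation, offered as a sketch rather than a confirmed answer: the Conditional Operator in Stage-7 looks like Hive's small-file merge step, which is planned for map-only jobs when hive.merge.mapfiles is enabled (it is on by default). Under that assumption, Stage-4 would be the "move only" branch, Stage-3 the "merge" branch, and Stage-5 followed by Stage-6 the "merge, then move" branch, so Stage-3 and Stage-5 print the same operator tree but would be chosen under different runtime conditions. Disabling the merge for the session and re-running EXPLAIN should show whether those stages disappear from the plan:

```sql
-- Assumption: the duplicate-looking stages come from the small-file merge feature.
-- Turn off merging of small files at the end of map-only jobs for this session.
set hive.merge.mapfiles=false;

-- Re-run the same EXPLAIN and compare the STAGE DEPENDENCIES section.
explain insert into testv1 select * from test_textfile where val > 200;

-- Restore the default afterwards.
set hive.merge.mapfiles=true;
```

If the conditional stage and the two Map Reduce stages are gone with the setting off, that would support the merge explanation; if they remain, the cause lies elsewhere.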
