
WordCount with Apache Crunch into HBase Standalone

Currently I am evaluating Apache Crunch. I followed a simple WordCount MapReduce job example,
and then tried to save the results into a standalone HBase. HBase is running (checked with jps and the HBase shell), as described here: http://hbase.apache.org/book/quickstart.html

Now I adopt the example for writing into HBase:

Pipeline pipeline = new MRPipeline(WordCount.class, getConf());
PCollection<String> lines = pipeline.readTextFile(inputPath);
// noStopWords is the PCollection derived from lines after filtering out stop words
PTable<String, Long> counts = noStopWords.count();
pipeline.write(counts, new HBaseTarget("wordCountOutTable"));
PipelineResult result = pipeline.done();

I get the exception: "java.lang.IllegalArgumentException: HBaseTarget only supports Put and Delete"

Any clue what went wrong?

Solution:

A PTable may be a PCollection, but the HBaseTarget can only handle Put or Delete objects. So you have to convert the PTable into a PCollection where each element of the collection is either a Put or a Delete. Take a look at the Crunch examples where this is done.

A conversion could look like this:

 public PCollection<Put> createPut(final PTable<String, String> counts) {
   return counts.parallelDo("Convert to puts", new DoFn<Pair<String, String>, Put>() {
     @Override
     public void process(final Pair<String, String> input, final Emitter<Put> emitter) {
       Put put;
       // input.first() is used as the row key
       put = new Put(Bytes.toBytes(input.first()));
       // the value (input.second()) is stored under the target column family and qualifier
       // (COLUMN_FAMILY_TARGET and COLUMN_QUALIFIER_TARGET_TEXT are byte[] constants)
       put.add(COLUMN_FAMILY_TARGET, COLUMN_QUALIFIER_TARGET_TEXT, Bytes.toBytes(input.second()));
       emitter.emit(put);
     }
   }, Writables.writables(Put.class));
 }
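
To wire this back into the original pipeline, the Long counts need to be turned into strings first (createPut expects a PTable<String, String>), and the resulting PCollection<Put> is what gets written to the HBaseTarget. Below is a minimal sketch of that wiring, assuming the table "wordCountOutTable" and the column family behind COLUMN_FAMILY_TARGET already exist in HBase; it uses Crunch's MapFn and Writables.strings() for the value conversion.

 // Convert the Long counts to String values so they match createPut's signature
 PTable<String, String> countsAsText = counts.mapValues(
     new MapFn<Long, String>() {
       @Override
       public String map(final Long count) {
         return count.toString();
       }
     }, Writables.strings());

 // Write the Puts instead of the raw PTable -- this is what HBaseTarget accepts
 pipeline.write(createPut(countsAsText), new HBaseTarget("wordCountOutTable"));
 PipelineResult result = pipeline.done();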

