Raster :: Precision中不准确的随机概率：使用type =“ prob”用因子变量预测Cforest计算

如何解决Raster :: Precision中不准确的随机概率：使用type =“ prob”用因子变量预测Cforest计算

我遇到了与raster :: predict函数有关的问题，该函数与生成cforest对象的在场概率预测有关。

当我生成分类输出时，我获得了正确的输出-但是，当添加type =“ prob”以获取概率时，输出是一个奇怪的带，与分类输出不对应。我在此处附上了两张照片：Correct output based on classification predict和Banded,incorrect output based on type="prob" predict，详细介绍了这些问题。

您会注意到，输出上的棋盘图案会随机放置高概率和低概率的随机区域，而分类输出会在预期的位置创建存在（1）和不存在（0）块。

下面是一段代码片段，其中包含训练数据和用于预测的光栅块的一部分。

如果有人能深入了解如何使用此函数获取有效概率，我将不胜感激。

我在predfun函数中缺少一些重要的东西吗？

提前谢谢！

以下是示例数据：

library(raster)
training = structure(list(
 Presence = structure(c(1L,1L,2L,2L),.Label = c("0","1"),class = "factor"),Elevation = c(3937.63541666689,384.003472222217,401.357638888879,226.43749999999,21.3055555555572,305.399305555546,38.3402777777742,347.302083333335,168.156250000001,700.708333333328,1034.2013888889,1033.78125,1426.99305555577,912.874999999952,665.854166666672,657.187499999983,1181.97916666667,696.062499999948,976.812500000002,1017.98263888889),Region = structure(c(3L,5L,3L,4L,4L),.Label = c("1","2","3","4","5"),Protected = structure(c(1L,Population = structure(c(17L,13L,9L,6L,7L,8L,14L,1L),"5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"),Mean_Diunal_Range = c(126,151,152,153,138,125,137,158,129,170,172,115,150,178,149,146,165),Isothermality = c(31,49,46,54,51,59,48,47,62,61,37,63,75,55,57),Mean_Temp_Wettest_Q = c(-87,338,348,135,236,193,305,322,247,253,249,210,105,313,252,238,267,250,255),Mean_Temp_Driest_Q = c(64,231,230,324,194,263,261,328,133,130,315,188,232,173,248,166,151),Precip_Wettest_M = c(119,30,18,3,16,8,22,12,11,165,71,13,201,85,110,121,91),Precip_Dryest_m = c(2,2,0),Precip_Seasonality = c(81,136,65,106,126,101,77,76,111,116,123,93),Precip_Warmest_Q = c(19,29,6,23,1,148,56,15,143,140,174,195),Precip_Coldest_Q = c(201,14,5,17,9,10,2),Agriculture_Date = c(9000,3000,7200,7600,3500,5000,1000,1700,1250,9500,4000,2700,1000),distance_to_highways = c(0.697535828668298,3.21145962385929,4.70553674201054,1.26025341029057,0.0306302671744576,1.29795751420045,0.341335382947427,2.01515485126633,1.90935593837289,1.00778111672525,0.0841242925270876,0.0135315745860043,2.39038097221274,1.49284290327056,1.21019159581485,5.71817373942967,1.64045219527117,0.121375728043842,0.535675418474612,1.08581690073317)),row.names = c(NA,-20L),class = c("data.table","data.frame"))

##Create Prediction Raster Brick

## Create Data Frame for Raster Brick
data_segment <- structure(list(
 Elevation = c(1187.9,1173.5,1158.2,1143.3,1125.6,1112.4,1232.7,1203.3,1176.7,1156.8,1138.9,1140.8,1249.9,1216.9,1193.2,1171.4,1153.5,1157.6,1261.5,1233.2,1208.2,1185.6,1175.5,1184.2,1289.1,1256.4,1240.7,1212.2,1208.1,1220.9,1304.6,1297.2,1286.6,1249.2,1231,1256.1,1341.5,1329.3,1328.5,1291.8,1250.2,1288.6,1379.2,1322.8,1305.6,1293.7,1234.9,1260.5,1354.5,1274.5,1257.7,1281.1,1230.8,1196.3,1288.5,1237.4,1221.5,1241.6,1212.7,1162.3,1199.4,1185.4,1194.5,1216.1,1207.4,1149.8,1150.6,1157,1176,1176.1,1155.6,1138,1145,1131.9,1121.1,1116.3,1109.4,1116.2),Region = c(2,Protected = c(1,1),Population = c(10,12),Mean_Diunal_Range = c(10.5,10.5,10.6,10.7,10.8,10.9,10.4,10.3,10.2,10.1,9.9,9.8,9.7),Isothermality = c(68.5,68.7,69.3,69.4,69.6,69.7,68.4,68.5,68.8,69.2,67.8,68.1,68.6,68.3,67.4,67.6,67.9,68,67.2,66.8,66.9,67.3,67.5,66.4,66.6,66.7,67,66.5,66.2,65.7,66.3,66,65.5,65.4,65.1,65.2,65.3,64.2,64.5,64.1,64.6,64.7,63.8,64,63.7,63.4,63.6,63.5,63.2,63.1),Mean_Temp_Wettest_Q = c(24.5,24.5,24.7,24.8,25,24.2,24.4,24.1,24.3,24.6,24,23.9,23.8,23.5,23.7,24.9,25),Mean_Temp_Driest_Q = c(22.4,22.4,22.5,22.7,22.8,22.9,22.1,22.3,22.2,22.6,21.9,21.8,21.7,21.5,21.6,21.2,21.3,22.5),Precip_Wettest_M = c(161,160,161,159,163,162,164,169,167,168,171,176,179,175,180,181,177,174),Precip_Dryest_m = c(6,7,7),Precip_Seasonality = c(100.8,101.7,103,105.2,105.9,107.2,99.8,101.8,103.4,105.5,106.3,106.8,101.1,105.1,106.2,100.8,102,103.1,105.3,106.6,107.6,101.4,102.5,103.8,106.9,102.2,102.7,104.9,100.1,101.3,102.6,104.1,105.7,98.8,101.9,103.6,104.8,105.8,99.2,103.5,104.5,106.4,102.4,104.2,100.7,104.7,101.2,103.9,104.6,105.6,106.7,106.7),Precip_Warmest_Q = c(97,96,94,89,86,84,100,97,90,87,99,95,91,88,92,98,104,93,109,108,88),Precip_Coldest_Q = c(23,21,19,20,27,26,28,24),Agriculture_Date = c(4000,4000),distance_to_highways = c(0.1,0.1,0.2,0.2)),class = "data.frame",-78L))

## Create RasterBrick and assign values from dataframe
segment_brick <- brick(nrow=13,ncol=6,xmn=39,xmx=39.25,ymn=3.833333,ymx=4.375,crs="+proj=longlat +datum=wgs84 +no_defs",nl=15)
values(segment_brick) <-  as.matrix(data_segment)

现在建模：

library(party)
##Fit Cforest Model
cforest_example = party::cforest(Presence ~ .,data = training,control = cforest_unbiased(ntree=10,mtry=3,trace=TRUE))


##Create factor object for raster predict
f = list(levels(training$Protected),levels(training$Region),levels(training$Population))

##Rename factor lists to match original data
names(f) <- c("Protected","Region","Population")

##create a wrapper function for Cforest predict in raster package
predfun <- function(m,d,...) predict(m,newdata=d,...)

##Predict Raster with no Probabilities - results in correct prediction
Attempt_Processed = raster::predict(segment_brick,cforest_example,OOB=TRUE,factors=f,fun=predfun,progress="text",na.rm=TRUE)
plot(Attempt_Processed)

##Predict Raster with Probabilities - results in odd banded image
Attempt_Processed_prob = raster::predict(segment_brick,na.rm=TRUE,type="prob")
plot(Attempt_Processed_prob)

Correct output based on classification predict

Banded,incorrect output based on type="prob" predict

解决方法

解决此问题的方法是查看模型的预测函数返回什么

p <- predict(cforest_example,newdata=training[1:3,],OOB=TRUE,type="prob")
p
#$`1`
#     Presence.0 Presence.1
#[1,]  0.5461538  0.4538462
#
#$`2`
#     Presence.0 Presence.1
#[1,]  0.5461538  0.4538462
#
#$`3`
#     Presence.0 Presence.1
#[1,]  0.5461538  0.4538462

它返回一个列表，该列表由unlist raster::predict

unlist(p)
#       11        12        21        22        31        32 
#0.5461538 0.4538462 0.5461538 0.4538462 0.5461538 0.4538462

这解释了条带图案

您可以使用这样的预测功能来解决此问题

pfun <- function(m,d,...) {
    p <- predict(m,newdata=d,...)
    matrix(unlist(p),ncol=2,byrow=TRUE)
}
    
pfun(cforest_example,training[1:3,type="prob")
#          [,1]      [,2]
#[1,] 0.5461538 0.4538462
#[2,] 0.5461538 0.4538462
#[3,] 0.5461538 0.4538462

您现在可以将其与RasterBrick一起使用

prob = raster::predict(segment_brick,cforest_example,factors=f,fun=pfun,index=1:2,na.rm=TRUE,type="prob")
plot(prob)
prob
#class      : RasterBrick 
#dimensions : 13,6,78,2  (nrow,ncol,ncell,nlayers)
#resolution : 0.04166667,0.04166669  (x,y)
#extent     : 39,39.25,3.833333,4.375  (xmin,xmax,ymin,ymax)
#crs        : +proj=longlat +datum=WGS84 +no_defs 
#source     : memory
#names      :   layer.1,layer.2 
#min values : 0.5461538,0.4538462 
#max values : 0.5461538,0.4538462

由于您只有2个概率，因此可以简化函数以仅返回一个类别的概率（在这种情况下为不存在）：

pfun <- function(m,byrow=TRUE)[,1]
}
prob = raster::predict(segment_brick,type="prob")