## 1. Introduction

One of the key elements of water resources management and hydrological projects is to estimate the evaporation in a given region. This is even more important in managing water resources in arid and semi-arid regions [

1]. Some researchers have applied the Budyko framework, which is a straightforward model that considers only rainfall and potential evaporation as the required input for simulating and controlling various water management plans [

2,

3]. Simply, knowing the accurate amount of the evaporation is essential for water resources management projects.

Researchers have applied different approaches for modeling pan evaporation (EP) and evapotranspiration in the literature classified as (i) physically-based combination models that take into account mass and energy conservation principles; (ii) semi-physical models that use either mass or energy conservation; and (iii) data-driven models including soft computing and statistical techniques [

4,

5,

6]. The shortage of EP data (temporally or spatially) is a major problem in some areas because it is difficult and expensive to install evaporation pans. In these cases, applying data-driven and soft computing models for estimating water evaporation is an effective and appropriate approach [

7,

8,

9]. The accuracy of modeling approaches is the most important parameter to take into account.

Several researchers have used climatic variables to estimate EP values [

10,

11]. Climate-based approaches are appropriate methods when provided with specific climatic data, which cannot always be easily obtained from a determined area. Similarly, data-driven approaches, including computational intelligence and machine learning are also suitable for estimating the EP. Recently, regular hybrid and integrative data-driven models (e.g., artificial neural networks (ANN) as well as support vector machine (SVM) and adaptive neuro-fuzzy inference systems (ANFIS)) have been used for estimating the EP [

7,

12,

13,

14,

15,

16,

17,

18,

19,

20].

Tabari et al. [

21] estimated the daily EP of a region by using different methods (ANNs and multivariate nonlinear regression (MNLR)) and concluded that ANN was more accurate than MNLR. Kişi et al. [

22] applied three soft computing models, such as M5tree, ANN, and chi-squared automatic interaction, to predict the daily EPs in Turkey. They reported that the ANN model performed better than the two others. Tezel and Buyukyildiz [

23] investigated the usability of ANNs and ε-SVR to estimate monthly EP. According to the performance criteria, the ANN algorithms and ε-SVR had the same performance. Tezel and Buyukyildiz [

23] compared the accuracy of SVM basis ε-support vector regression, radial basis function (RBFNN), and multilayer perceptron ANN (MLPNN), and showed that the latter provided the most accurate results. Keshtegar and Kisi [

24] proposed the modified response surface method (RSM) and modified RSMs have been compared with ANFIS and M5Tree. Wang et al. [

25] investigated the capabilities of ANFIS, M5Tree, and fuzzy genetic (FG) for six stations in the Yangtze River Basin. The results stated that the FG model generally produced better results. In another study, Wang et al. [

26] compared the abilities of FG, SVR, MARS, M5Tree, and multiple linear regression (MLR). The overall results indicated that the soft computing models generally performed better than the regression methods. Ghorbani et al. [

27] applied a hybrid MLPNN for daily EP prediction at two stations. The results showed that the MLPNN model provided better performance compared to the SVM model. Majhi et al. [

28] applied a deep ANN model and compared it with the traditional MLPNN for three areas of the Chhattisgarh State in India. The findings of the study showed that the deep ANN model was more accurate than the traditional MLPNN. The abilities of ANN and extreme learning machine (ELM) models have been compared in predicting EP for two stations in Algeria by Sebbar et al. [

29]. The results indicated that the ELM could be successfully used to estimate the daily EP [

29]. Al-Mukhtar [

30] investigated the applicability of quantile regression forest for EP prediction in arid areas. In comparison to conventional NNs and linear regression models, the applied quantile regression forest provided better results. Mohammadi et al. [

31] predicted monthly EP using integrative ANFIS, MLP and RBNN models for two stations in Iran. The results showed that the integrative ANFIS model acted better compared to the MLP and RBNN. Yaseen et al. [

32] applied several machine learning models, including ANN, classification and regression tree (CART), gene expression programming and SVM for predicting EP in arid and semi-arid areas. The findings of the study indicated that the SVM was superior to the other applied models.

A literature review related to the kriging approach revealed that it has never been used to predict PE. However, this method was applied for the prediction of solar radiation [

33] and the daily total dissolved gas in aquatic systems concentration by Heddam et al. [

34]. The kriging interpolation, which is a flexible regression tool for approximating any nonlinear problem, can be introduced as a potential method for providing accurate EP predictions.

Indeed, soft computing techniques have provided satisfactory results in EP prediction [

35]. The majority of EP modeling reported the superiority of soft computing models over statistical models [

21,

26,

36]. The main objective of this study is to challenge the capability of soft computing techniques versus statistical data-driven models.

The present study investigates the accuracy of six soft computing methods, including M5tree, MARS, SVR, RBFNN, Levenberg–Marquardt perceptron ANN and conjugate gradient perceptron ANN and compares them with three different statistical approaches, RSM, kriging, and improved kriging. In improved kriging, the basic functions are transferred from polynomial to exponential functions to estimate monthly EP. In kriging models, the second-order polynomial function is applied as regressed function to predict nonlinear challenges. This function may not have predicted an accurate result for complex problems such as EP. Thus, novel improved modeling is parented to enhance the regressed function applied in original kriging. It uses a nonlinear transformation as an exponential map for input variables of the EP predictions as a complex engineering problem with nonlinear effect that the accurate results in the modeling process is a cap for prediction of the statistical regression approaches such as kriging and RSM models.

To our best knowledge, similar studies have not been carried out in applying the above-mentioned methods for the estimation of EP. The subsequent parts of the rest of the paper are organized as follows: In

Section 2, the two stations are introduced, and the data sets are presented. The third section deals with nine modeling methods applied in statistical and soft computing approaches. The results of the predicted EP are presented and compared in the fourth section. The fifth section of the paper deals with the hypothesis testing and relevant discussion. Finally, the last section provides the concluding remarks of the present work.

## 2. Case Study and Dataset

The input parameters for this study are monthly climatic data such as solar radiation (SR, as Langley), sunshine hours (HS), relative humidity (RH), wind speed (WS as m/s) as well as the minimum (Tmin as °C) and maximum temperature (Tmax as °C). Two stations in the Eastern Mediterranean Region as Adana (latitude 37.22° N, longitude 35.40° E and altitude 20 m) and Antakya (latitude 36.33° N, longitude 36.30° E and altitude 100 m) were selected for the comparing modeling results. The map of the study area is illustrated in

Figure 1. The studied area has a climate with cool and rainy winters, and moderately hot and dry summers and it receives yearly rainfall amounts of between 580 and 1300 mm. Data were gathered from the Turkish State Meteorological Service (TSMS) having a modernized calibration center. The calibration center was accredited by the Turkish Accreditation Agency to ensure the measurements’ reliability and to provide the validity of the measurements’ quality around the world. Temperature, RH, and WS calibration laboratories are accredited with TS EN ISO/IEC 17025 standards and work in accordance with this standard. Global radiation and wind direction data are also in accordance with the TS EN ISO/IEC 17025 standards. The evaporimeter used for obtaining pan evaporation in Turkey is the US Weather Bureau Class A pan. The raw datasets were directly utilized in the presented study without pre-processing. The available data period covers September 1981 to March 2016 for Adana and from October 1983 to December 2010 for Antakya. There is no gap in the data.

In

Figure 2, the general characteristics of the independent variables and the target value for the (a) Adana and (b) Antakya stations are shown using the box-whisker plot and related correlation values. These plots graphically depict the variability of each parameter in terms of minimum, quartiles, and maximum values. Moreover, outliers are plotted as individual points.

The Pearson correlation coefficient was applied to analyze the effects of Tmax, RH, WS, SR, and HS on EP. It can be seen from

Figure 2 that there were high correlations between EP, Tmin, Tmax, SR, and HS for both Stations. It is worth noting that the wind speed showed high correlation with EP for the Antakya Station (correlation = 0.804), whereas the reciprocal value of the WS correlation for the Adana Station is much lower (correlation = 0.245).

For both of the Stations, the correlation between the relative humidity (RH) and the EP was weaker than the other parameters. However, in the Antakya Station, the correlation value of RH is negative, which implies that the increase in relative humidity might lead to a decrease in EP. The main factors responsible for the EP in both Stations include sunshine hours (HS), Tmin, and SR. In the present study, data was split into two sets, 70% (for training) and 30% (for testing) for executing and assessing the applied models.

As for the sensitivity analysis, the best subset regression using adjusted R-Sq and Mallows CP was applied. The results indicated that all of the input variables have a significant impact on the EP variable; hence, all of the independent parameters are used as the input vector for constructing the models. Considering a descending order (from the most influential to the least influential independent variables on PE), the following results were achieved:

Antakya Station: HS, Tmin, SR, WS, Tmax, and RH.

Adana Station: HS, Tmin, SR, Tmax, WS, and RH.

## 3. Methods

Nine different approaches in terms of two main categories of statistical (RSM, kriging and improved kriging) and machine learning models (SVR, MARS, M5Tree, MLP-LM, MLP-CG and RBNN) were implemented for estimating EP.

#### 3.1. Artificial Neural Networks: MLP-LM, MLP-CG, RBFNN

The ANNs are adaptable learning structures constructed from different interconnected layers and a number of processing elements (called artificial neurons). So far, several types of ANNs were developed and implemented for simulating and predicting hydrological problems such as evaporation [

37]. Among all the developed models, the multilayer perceptron (MLP) and radial basis function neural network (RBFNN) have been used in several applications, and their potential in capturing nonlinear features of complex phenomena were proven by the following relation [

38,

39].

where

β_{0},

β_{j},

w_{j},

w_{ij} are respectively the biases and weights of the output and the M-hidden layer and NV represents the number of input variables.

f is an activation function for hidden neurons in the MLP and RBFNN models. Sigmoid functions were considered for the MLP and radial basis function were applied for the RBFNN models.

It should be noted that MLPs [

38] and RBFNNs [

40] can be considered as the fundamental versions of feed-forward networks with a supervised learning approach. In this study, two types of MLP networks have been developed using two different approaches for the learning algorithm: (1) the Levenberg–Marquardt algorithm, and (2) the conjugate gradient (CG) algorithm. In addition to the MLP-LM and MLP-CG neural networks, the efficiency of RBFNN was also challenged for the evaporation simulation [

41].

#### 3.2. Support Vector Regression (SVR)

The rapid application of SVMs in modeling various problems in engineering urges researchers to apply different types of SVMs to different research fields. The core analogy of constructing SVMs is to map variables from input space into high-dimensional feature space by using special functions as below [

42,

43]:

where

β_{0} is bias and

$K\left(\mathit{x},{\mathit{x}}_{i}\right)$ is the Kernel function for transferring the input data from x-space to N-set feature space which is computed as below relation [

44]:

where,

$\sigma $ is the parameter of the Kernel function.

${\alpha}_{i}$ and

${\alpha}_{i}^{*}$ represent the Lagrange multipliers as unknown coefficients in the SVR model. Recently, the application of the SVR model in hydrological time series modeling has provided promising outcomes [

45]. Several researchers have already claimed that SVR is efficient in modeling evaporation processes [

23,

46].

#### 3.3. Multivariate Regression Spline (MARS)

Proposed by Friedman in 1991 [

47], multivariate adaptive regression spline is a procedure for fitting adaptive nonlinear functions using a piecewise nonparametric regression method. Unlike the black box models (e.g., ANNs), MARS models are deterministic, which means that in the final regression form the input variables are identified and the interactions between them are specified. Therefore, the MARS models are much easier to be interpreted than the other techniques [

48,

49,

50]. Considering X as the only independent variable and Y as the dependent variable (target value), it can be seen in

Figure 3 that the space of X variable is divided into three sub-regions with three different equations. These equations relate the independent variable space to the target of the system.

The endpoints of the segments of each sub-region are called knots (

Figure 3). The resulting piecewise regression lines (basis functions,

BFs) make the final regression form flexible and appropriate for capturing trends from linear functions as bellow [

51]:

where

β_{i} = 0, 1, …,

m are unknown coefficients and

m is the number of basis functions (

BF) which is determined using a piecewise linear function as follows [

33]:

where

C represents the knot which is a constant coefficient. By considering more independent variables, more equations will be added to the final regression form. In order to determine the location of knots, an adaptive regression algorithm is used. In addition, BFs are generated by a stepwise searching process. In brief, the main procedure of the MARS method is categorized into two parts of the forward and backward phases. The location of potential knots and BFs equations are specified in the forward phase. To modify and improve the modeling accuracy, unnecessary and the least effective variables are removed in the backward phase [

48]. Further details for the mathematical procedure of the MARS method can be obtained from Friedman [

47].

#### 3.4. M5 Model Tree

Quinlan (1992) introduced a piecewise linear regression model, called the M5 model tree (M5Tree) [

52,

53], which has a tree structure based on binary decisions. The linear regression functions, which develop interconnection between the input and output vectors, can be extracted at the terminals (leaves) nodes.

Constructing an M5tree model requires two distinct phases; first the initial tree is generated and is then pruned. In the first phase, data sets are split into several subsets, which create a decision tree. In other words, the M5 model tree splits the data set space into subsets (sub-spaces) and generates a linear regression model [

54,

55]. As can be observed in

Figure 4, the two-dimensional dynamic space of the input vector (X1 & X2) is split schematically into five subsets.

The splitting criterion is determined by assuming the standard deviation (sd) of the class values that reach a node. Based on the sd, the standard deviation reduction (SDR) can be calculated as the following relation [

13,

56]:

where

T stands for the set of examples that reaches the node, and

T_{i} is the subset of examples with the

ith outcome of the potential set. After the first phase (viz. constructing the initial tree), a huge tree-like structure might be generated, which may cause poor generalization. To cope with this problem, in the second phase, the overgrown tree is pruned.

#### 3.5. Response Surface Methodology

The response surface methodology (RSM) presents the advantage of multiple regression analysis via a statistical technique to simulate a response space based on quantitative data obtained from the extracted multivariate equations as presented below [

57]:

where

NV denotes the number of input variables.

β_{0},

β_{i} and

β_{ij} are unknown coefficients for polynomial terms. During the mathematical process, RSM explores the influence of multiple independent variables on the response parameter and optimizes the trending procedure by tuning the number of required experiments [

58,

59,

60,

61].

#### 3.6. Kriging Interpolation Approach

The kriging basis nonlinear model is a well-known interpolation approach to approximate geological problems [

62]. It is defined by using stochastic terms according to the following relation [

63,

64]:

where

$\widehat{\beta}={\left[{\widehat{\beta}}_{1},{\widehat{\beta}}_{2},\dots ,{\widehat{\beta}}_{m}\right]}^{T}$ are regression coefficients for n-support points with m basic functions. The unknown coefficients are computed as follows [

65]:

where

$\widehat{Y}\left(X\right)$ is the predicted value.

$R$ represents the correlation matrix which is given as:

in which

$r\left({X}_{i},{X}_{j}\right)$ is the cross-correlation function computed by the following relation:

where,

${r}_{ij}$ is the distance between points,

${X}_{i}$ and

${X}_{j}$ and

$\theta >0$ are unknown correlation parameters, which are determined as presented below [

66,

67,

68,

69]:

where

n represents the number of training points, and

${\widehat{\sigma}}^{2}$ denotes the variance of the model obtained as:

In the kriging model, the basis function

$f$ can be defined as below:

where the vector [

${f}_{1}\left({X}_{1}\right),{f}_{2}\left({X}_{1}\right),\dots ,{f}_{m}\left({X}_{1}\right)]$ includes the basic functions that are evaluated at the data input point of

${X}_{1}$, and

m is the number of the basic functions. The basis function

$f$ can be used as polynomial and exponential functions for original kriging and improved kriging, as presented in this study.

In the kriging models, the basic functions are considered as follows:

where

$Mon$ represents the periodicity (month of the year),

$Ws$ is wind speed (m/s),

${T}_{max}$ and

${T}_{min}$ are respectively the maximum and minimum temperature (°C),

$RH$ is the relative humidity (%),

$SR$ is the solar radiation (langley), and

$Hs$ represents the hours of sunshine (h). The surrogate model that uses an adaptive kriging framework can be used for (i) reducing the computational burden and increase the accuracy results of the optimization problem [

50,

70], (ii) structural reliability analysis [

65,

68], and (iii) reliability-based design optimization [

67,

71].

#### 3.7. Improved Kriging

In the fitness process of the kriging model, the basic function term, i.e.,

$\widehat{\beta}f$ is an important factor for providing a flexible prediction. The stochastic term, i.e.,

${r}^{T}\left(X\right){R}^{-1}(Y-\widehat{\beta}f)$, may produce a smaller covariance for approximating data with accurate basic function. Thus, the nonlinear form of the basic function can improve the accuracy of the EP predictions. A schematic comparative view of the exponential and linear polynomial functions is presented in

Figure 5 to illustrate the fitness of the exponential basic function. We used the exponential basic function for the regression process instead of the linear basic function, in order to enhance the ability of the standard kriging model.

In improved kriging, the linear and exponential functions are used by the following basic function:

where

${X}_{k}$ are the input variables and exp denotes the exponential operator. The prediction accuracy of the improved kriging model for the estimation of the EP is tested based on an untried data set using

$r\left(X\right)$ in Equation (11) and predicted relation in Equation (8).

#### 3.8. Methodology and Models Evaluation

The modeling process focuses on the monthly predictions of the EP based on two different scenarios, as presented below:

In the first scenario (#I), the monthly averaged of six meteorological parameters including wind speed (WS, m/s), relative humidity (RH, %), solar radiation (SR, Langley), sunshine hours (HS, h), minimum (Tmin, °C), and maximum temperatures (Tmax, °C), are considered as the input vectors of the applied models.

In the second strategy, all of the mentioned independent meteorological parameters along with the time factor formed the input vector.

Due to the fact that in the second scenario, the order of the data is important for modeling, the time series cross-validation technique was applied. Thus, in both of the scenarios, 70% of the data was used for training the models, and the other 30% was used for the testing set. In the current work, the root mean square error (

RMSE) was used as a measure of accuracy, while the mean bias error (

MBE) was used as a measure of tendency. The absolute residual of the standard deviation between the actual EP values and the modeled ones (

RSTD) was used as a measure of precision as presented below [

60,

72,

73]:

where

N is the number of data,

EPm_{i} represents the modeled EP for the

ith data and

EPo_{i} stands for the observed EP values for the

ith data. In addition to the above-mentioned measures, other statistics and criteria such as the mean absolute error (

MAE), mean absolute percentage error (

MAPE), Willmott index (

d), total pan evaporation (Tot-EP), maximum value of the relative error between the calculated and observed EP (MAX (RE)) were also used for the evaluation of the applied methods [

58].

where

EPmean is the mean of monthly EP. In this study, the Wilcoxon nonparametric statistical hypothesis test is also implemented to evaluate the performance of statistical versus soft computing models at the 95% confidence level. The maximum relative absolute error ((Max (RE)) was computed as max (RE

_{i}) and RE

_{i} = (

$|{\widehat{Y}}_{i}-{Y}_{i})|/{Y}_{i}$, where

${\widehat{Y}}_{i}$ an

${Y}_{i}$ indicate the estimated and observed pan evaporation.

## 5. Discussion

This paper aimed to challenge the performance of different statistical and soft computing models based on (i) mathematical (accuracy, precision, and tendency), and (ii) statistical (at the 0.01 and 0.05 levels of significance) perspectives. In accordance with the mathematical comparisons (

Table 1,

Table 2,

Table 4 and

Table 5, and

Figure 8), it was concluded that the improved kriging model performed better than the other applied models, which means that an improved statistical model might even be able to surpass soft computing models.

Figure 9 illustrates the Taylor diagrams for (a) Adana and (b) Antakya Stations. As shown by these figures, the kriging models provide a better prediction for agreement than the RSM but worse than the soft computing models (viz. SVR, MARS, and RBFNN). The SVR provides a superior correlation with the observed data compared to the other soft computing models. As can be seen in

Figure 9, the improved kriging enhanced the predictions of the standard kriging model.

There is no doubt that in most cases, soft computing models perform better than traditional statistical models. Similarly, the standard kriging and RSM models failed to surpass the soft computing models due to linear cross-correlation regressed function based on the statistical measures. This assessment is based on pertinent studies in the literature. For instance, in a comparative study between the capability of machine learning versus ANN, and statistical technique versus MLR, for EP prediction, it was found that the model efficiency and correlation coefficient of the ANN was higher than the MLR model for the calibration and validation phases [

20]. The same result has been noted for the superiority of ANFIS model over the MLR statistical model [

74]. However, it is worth noting that the majority of recent published studies have solely focused on the evaluation of several machine learning models [

11,

32]. Investigating the outcomes of these studies indicates that the machine learning models perform well in predicting evaporation at different climatical regions.

In this study, in addition to the mathematical evaluation of the potential performance of statistical and soft computing models, the results of the Mann–Whitney hypothesis test were also taken into account. The outcomes of the Mann–Whitney hypothesis test showed that none of the soft computing applied models has significant superiority over the statistical ones. In other words, despite their ability to model nonlinear phenomena, soft computing models should not be taken into granted as the preliminary predictive models. The improved versions of the RSM or kriging-based statistical techniques can improve the accuracy of the prediction of nonlinear problems. Thus, the improved statistical kriging technique uses the exponential transformation of input variables and can also be applied for other engineering problems with nonlinear complex relations. Furthermore, the competency of this method can be apprised by comparing it with machine learning models for complex problems with highly nonlinear relations.