Hybrid modeling approaches for agricultural commodity prices using CEEMDAN and time delay neural networks | Scientific Reports

Scientific Reports volume 14, Article number: 26639 (2024)

Improving the forecasting accuracy of agricultural commodity prices is critical for many stakeholders namely, farmers, traders, exporters, governments, and all other partners in the price channel, to evade risks and enable appropriate policy interventions. However, the traditional mono-scale smoothing techniques often fail to capture the non-stationary and non-linear features due to their multifarious structure. This study has proposed a CEEMDAN (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise)-TDNN (Time Delay Neural Network) model for forecasting non-linear, non-stationary agricultural price series. This study has evaluated its suitability in comparison with the other three major EMD (Empirical Mode Decomposition) variants (EMD, Ensemble EMD and Complementary Ensemble EMD) and the benchmark (Autoregressive Integrated Moving Average, Non-linear Support Vector Regression, Gradient Boosting Machine, Random Forest and TDNN) models using monthly wholesale prices of major oilseed crops in India. Outcomes from this investigation reflect that the CEEMDAN-TDNN hybrid models have outperformed all other forecasting models on the basis of evaluation metrics under consideration. For the proposed model, an average improvement of RMSE (Root Mean Square Error), Relative RMSE and MAPE (Mean Absolute Percentage Error) values has been observed to be 20.04%, 19.94% and 27.80%, respectively over the other EMD variant-based counterparts and 57.66%, 48.37% and 62.37%, respectively over the other benchmark stochastic and machine learning models. The CEEMD-TDNN and CEEMDAN-TDNN models have demonstrated superior performance in predicting the directional changes of monthly price series compared to other models. Additionally, the accuracy of forecasts generated by all models has been assessed using the Diebold-Mariano test, the Friedman test, and the Taylor diagram. 
The results confirm that the proposed hybrid model has outperformed the alternative models, providing a distinct advantage.

The ever-increasing populace exerts considerable pressure on the agricultural production system while taking a simultaneous toll on fixed resources such as land and water. As a result, an increasing concern has been evident, and the associated uncertainty is expected to imply upward pressure on prices, especially in developing economies like India1. Moreover, the domestic and international market forces have contributed substantially to the increased price variability, and accorded importance to reliable and timely price forecasts2,3,4. These forecasts are expected to be useful for many stakeholders namely farmers, traders, exporters, governments, and all other partners in the price channel4,5. However, due to its strong dependence on biological processes and unpredictable natural events such as droughts, floods, pest, disease outbreaks, etc., agriculture price forecasting is considered a daunting task in the domain of time series forecasting6,7,8,9,10,11,12,13,14.

Recent literature highlights the significant potential and versatility of various machine learning models in modeling complex patterns across diverse research fields2,9,15,16,17,18,19,20,21,22,23,24,25,26. Among the numerous stochastic processes implemented, ARIMA (autoregressive integrated moving average) and its component models have virtually dominated agricultural commodity price forecasting27. However, these models assume a linear correlation structure among the time series observations. Therefore, if a real-world price series exhibits non-linearity, forcing an ARIMA model onto it is likely to result in unreliable forecasts and misleading economic implications. Several non-linear time series models, such as the bilinear model28 and the threshold autoregressive (TAR) model29, were developed as early responses to overcome this impediment. However, these non-linear models also require a prespecified non-linear relationship, which may not be flexible enough to incorporate all the essential features30,31,32.

Different machine learning approaches, specifically artificial neural networks (ANNs)6,7,8, have emerged as viable alternatives to traditional predictive models32,33,34. Their non-parametric, data-driven, and self-adaptive nature has made them more appealing than other non-linear alternatives35,36. A comprehensive review of the literature on time series forecasting highlights the widespread interest among researchers in exploring the capabilities of artificial neural networks compared to traditional linear and non-linear statistical models6,7,8. In the context of agricultural price forecasting, Manogna and Mishra5 illustrated the superior forecasting capabilities of neural network models in forecasting spot prices for agricultural commodities in India. Singh and Mishra37 analyzed the groundnut oil price series in Mumbai and found that the ANN performed better than the ARIMA model in terms of mean squared error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE) values. Areef and Radha38 compared the performance of ANN and generalized autoregressive conditional heteroskedastic (GARCH) models for forecasting potato prices in the Bengaluru market, Karnataka. Their study showed that the forecasted prices were much closer to the actual prices with the ANN than with the GARCH model.

Even with the immense popularity and sheer power of the neural network models, their inability to model non-linear, non-stationary time series data has also been reported39,40. As a result, conventional mono-scale smoothing methods frequently struggle to accurately capture the intricate patterns and unpredictable fluctuations of agricultural commodity prices influenced by various factors41,42,43,44,45,46,47. It is therefore logical to consider employing decomposition techniques to model price series, as they can address the non-linearity and non-stationarity inherent in time series data48,49,50,51. Empirical mode decomposition (EMD) has demonstrated impressive competence in extracting valuable features from time series data characterized by non-linear and non-stationary attributes51. In contrast to conventional time series modeling, this self-adaptive decomposition method breaks the original time series down into multiple independent intrinsic mode functions (IMFs) with varying amplitudes and frequencies, along with a residue. Consequently, recent years have witnessed the development and utilization of different EMD variations across diverse domains52,53. Table 1 provides an extensive review of the recent literature comparing different competitive decomposition strategies based on EMD in time series forecasting.

At this juncture, it is important to note that agricultural commodity price series differ from typical time series data owing to their unique characteristics. However, few studies have been conducted on decomposition-based agricultural commodity price forecasting. An up-to-date and proper account of the state of the art regarding the theoretical foundation and practical applications of decomposition-based modeling for addressing non-stationarity and non-linearity in agricultural price forecasting is lacking. This constitutes a research gap: an in-depth understanding is needed of how to implement and integrate decomposition techniques with advanced models effectively, and of how to interpret and analyze the results correctly. Further, since agricultural price forecasting is a sub-domain of time series analysis, each step in the implementation process matters for understanding the statistical implications, an aspect often overlooked in existing studies. For example, most of the studies40,53,59,64,65 did not consider essential statistical steps, such as pretesting for non-stationarity and non-linearity, before applying EMD-based decompositions or non-linear A.I. (Artificial Intelligence) models for forecasting. Furthermore, despite the introduction of advanced versions of EMD over time, a comprehensive empirical comparison of these variants in the context of agricultural price systems is still missing. These observations underscore the need for a systematic examination of decomposition-based hybrid models for forecasting agricultural price series.

Hence, in this paper, encouraged by the successful application of CEEMDAN-based hybrid models across various fields and with the aim of filling the identified research gap, we propose a CEEMDAN-TDNN hybrid model. This model is designed to effectively capture the non-stationary and non-linear characteristics of agricultural price series data. We empirically evaluate the performance of our proposed model in forecasting the monthly wholesale prices of major oilseed crops in India, comparing it to three other prominent EMD variants (EMD, EEMD, and CEEMD). Three additional A.I. models, including Non-linear Support Vector Regression (NLSVR), Gradient Boosting Machine (GBM), and Random Forest (RF), which have gained upsurging interest in the field of artificial intelligence and predictive modeling10,12,66,67,68,69,70,71,72, have also been included in the study alongside ARIMA and TDNN as benchmark models for comparative analysis. The most effective model is determined based on performance evaluation metrics. To ensure robust validation and assess the forecasting superiority of the developed model, both parametric (Diebold-Mariano test) and non-parametric (Friedman test) tests, as well as a graphical approach (Taylor diagram), have been utilized. Moreover, accurately forecasting the price direction for the upcoming month is deemed a crucial factor in model selection. These forecasts are particularly valuable in economics for understanding the phases of the business cycle, specifically about turning points.

The oilseed price series in this study were chosen purposively, keeping in mind the nature of the data required and the economic importance of these markets73,74. The remainder of this paper proceeds as follows. ‘Materials and Methods’ describes the data used for the experimentation and elaborates on the time series forecasting methods adopted in this paper. Empirical results from real data and the relevant discussion are provided in ‘Results’ and ‘Discussion’, respectively. The last section concludes.

For the current investigation on monthly wholesale price (₹/q) of significant oilseed crops, data series from January 2008 to December 2019 are obtained from the various issues of ‘Agricultural Prices in India’ published by the Directorate of Economics & Statistics, Department of Agriculture and Farmers Welfare, Ministry of Agriculture and Farmers Welfare, Government of India. Three markets, namely Rajkot, Delhi, and Kanpur have been considered for Groundnut, rapeseed & mustard, and linseed, respectively. For all the markets, out of the 144 observations available, the first 132 observations are utilized for model building while retaining the rest for testing.

ARIMA models are an extension of ARMA models that use an appropriate order of differencing to handle a wide range of non-stationary time series75,76. ARIMA assumes that the differenced series is a linear function of past actual values and random shocks75. A process $\{y_t\}$ is said to follow an ARIMA (p, d, q) model if it can be expressed as:

$$\phi(B)(1 - B)^{d} y_t = \theta(B)\varepsilon_t$$

where p, d, and q, being non-negative integers, refer to the order of autoregression, differencing, and moving average, respectively. B is the backshift operator defined as $B y_t = y_{t-1}$. $\phi(B)$ and $\theta(B)$ are polynomials of degree p and q in B, respectively. The random error $\varepsilon_t$ is assumed to be a white noise process following $N(0, \sigma^{2})$. A detailed discussion of various aspects of this method can be found in Box et al.77.
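To make the role of the differencing order d concrete, the short Python sketch below applies the operator $(1 - B)^{d}$ to a series and then inverts it. It covers only the "integrated" part of ARIMA, not the estimation of the AR and MA polynomials. The helper names `difference` and `undifference` are hypothetical, and the toy prices are illustrative values rather than data from this study.

```python
def difference(y, d=1):
    """Apply (1 - B)^d to y and keep the leading values needed to invert it."""
    heads = []
    for _ in range(d):
        heads.append(y[0])
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y, heads


def undifference(w, heads):
    """Invert `difference` by cumulative summation, one order at a time."""
    for head in reversed(heads):
        y = [head]
        for step in w:
            y.append(y[-1] + step)
        w = y
    return w


prices = [3700.0, 3725.0, 3690.0, 3760.0, 3810.0]   # illustrative values only
w, heads = difference(prices, d=1)                  # first-differenced series
restored = undifference(w, heads)                   # back on the price scale
```

Forecasts produced on the differenced scale would be mapped back to price levels with the same cumulative summation.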

ANNs are a class of non-linear, non-parametric, self-adaptive, and data-driven computational methods30,32,78. These are specifically useful when the underlying data relationship is unknown. A general neural network architecture consists of an input layer that receives the input data, one or more hidden layers that offer non-linearity to the model, and an output layer that yields the target value79. The general expression for a multilayer feed-forward neural network is represented by:

$$y_t = \alpha_0 + \sum_{j=1}^{q} \alpha_j\, g\left(\beta_{0j} + \sum_{i=1}^{p} \beta_{ij}\, y_{t-i}\right) + \varepsilon_t$$

where $\alpha_j\ (j = 0, 1, 2, \dots, q)$ and $\beta_{ij}\ (i = 0, 1, 2, \dots, p;\ j = 1, 2, \dots, q)$ are the model parameters, often called the connection weights, and g(.) is the hidden layer activation function. p and q refer to the number of input and hidden nodes, respectively.

ANN can represent time series data by offering an implicit functional representation of time, whereby a static neural network, such as a multilayer perceptron, is assigned dynamic properties. One of the easiest ways to embed short-term memory into a neural network’s structure is to utilize time delay at the input layer. One such architecture is TDNN. The logistic function has served as the hidden layer activation function with the form:

$$g(x) = \frac{1}{1 + e^{-x}}$$

A typical TDNN structure with one hidden layer is denoted by I: Hs: O, where I, H and O are the number of nodes in the input, hidden layer, and output layer, respectively and s denotes the logistic transfer function. For p input nodes (tapped delay), q hidden nodes, one output node, and biases at both hidden and output layers, the total number of weights in a three-layer feed-forward neural network is q (p + 2) + 1.
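The tapped-delay input and the weight count q (p + 2) + 1 can be checked with a small sketch. The code below builds lagged inputs, runs one forward pass through a single logistic hidden layer, and counts the parameters. The weights are random and untrained, so the outputs carry no forecasting meaning, and all function names are hypothetical.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lagged(series, p):
    """Tapped delay line: each input row holds the p most recent past values."""
    X = [[series[t - i] for i in range(1, p + 1)] for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    return X, y


def tdnn_forward(x, alpha, beta):
    """alpha: q + 1 output weights (alpha[0] is the output bias).
    beta: q rows of p + 1 hidden weights (beta[j][0] is the hidden bias)."""
    hidden = [logistic(b[0] + sum(w * v for w, v in zip(b[1:], x))) for b in beta]
    return alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden))


def n_weights(p, q):
    return q * (p + 2) + 1


p, q = 3, 2                                    # a 3:2s:1 network
rng = random.Random(0)
beta = [[rng.uniform(-1, 1) for _ in range(p + 1)] for _ in range(q)]
alpha = [rng.uniform(-1, 1) for _ in range(q + 1)]
X, y = make_lagged([0.1, 0.3, 0.2, 0.5, 0.4, 0.6], p)
y_hat = [tdnn_forward(x, alpha, beta) for x in X]
```

For p = 3 and q = 2, the hidden layer holds q (p + 1) = 8 weights and the output layer q + 1 = 3, which matches q (p + 2) + 1 = 11.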

In the context of a given data set $\{x_i, y_i\}_{i=1}^{n}$, with $x_i \in \mathbb{R}^{n}$ representing the input vector, $y_i \in \mathbb{R}$ as the scalar output, and n denoting the size of the data set, the NLSVR estimating function80 can be expressed in the general form as follows:

$$f(x) = w^{T}\phi(x) + b$$

Here, $\phi(.)$ denotes a non-linear mapping function that transforms the original input space into a higher dimensional feature space. In this equation, w represents the weight vector, b denotes the bias term, and the superscript T signifies the transpose operation. The coefficients w and b are estimated from data by minimizing the following regularized risk function:

$$R(w, b) = \frac{1}{2}\Vert w \Vert^{2} + C\,\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(y_i, f(x_i))$$

In the equation above, the term $\frac{1}{2}\Vert w \Vert^{2}$ is referred to as the ‘regularized term’, which evaluates the smoothness of the function. The term $\frac{1}{n}\sum_{i=1}^{n} L_{\varepsilon}(y_i, f(x_i))$ is known as the ‘empirical error’, and it is estimated using the Vapnik ε-insensitive loss function. Both C and ε are hyperparameters that the user can set. The Vapnik loss function is defined as:

$$L_{\varepsilon}(y_i, f(x_i)) = \begin{cases} |y_i - f(x_i)| - \varepsilon, & \text{if } |y_i - f(x_i)| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases}$$

where $y_i$ represents the actual value and $f(x_i)$ indicates the estimated value at the $i^{th}$ period.
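As a small numerical illustration of the two terms described above, the sketch below evaluates the ε-insensitive loss and the regularized risk for given weights and predictions. It does not solve the SVR optimization problem, and the function names and toy numbers are hypothetical.

```python
def eps_insensitive(y, f, eps):
    """Vapnik loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y - f) - eps)


def regularized_risk(w, y_true, y_pred, C, eps):
    """0.5 * ||w||^2 plus C times the mean eps-insensitive loss."""
    reg = 0.5 * sum(wi * wi for wi in w)
    emp = sum(eps_insensitive(y, f, eps) for y, f in zip(y_true, y_pred)) / len(y_true)
    return reg + C * emp


# Toy values: the first error (0.5) lies inside the tube, the second (2.0) outside it.
risk = regularized_risk([3.0, 4.0], [10.0, 12.0], [10.5, 14.0], C=2.0, eps=1.0)
```

Here the regularized term is 0.5 × 25 = 12.5 and the empirical error is (0 + 1) / 2 = 0.5, so the risk with C = 2 is 13.5. A larger ε widens the tube within which errors are ignored, while a larger C penalizes errors outside it more heavily.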

Gradient Boosting66 is a powerful boosting algorithm that merges multiple weak learners to form a strong learner. Each successive model is trained to reduce the loss function, such as the mean squared error, of the preceding ensemble, which amounts to gradient descent in function space. During each iteration, the algorithm calculates the gradient of the loss function with respect to the current ensemble’s predictions. Subsequently, a new weak model is fitted to the negative of this gradient, which for the squared error loss is simply the residual. The new model’s predictions are then incorporated into the ensemble, and the process continues until a particular stopping criterion is satisfied.

Let us consider an ensemble comprising M trees. Tree-1 is trained with feature matrix X and output vector y. The predictions $\hat{y}_1$ are used to calculate residuals $r_1$. Tree-2 is subsequently trained using feature matrix X as input and the residual error $r_1$ from Tree-1 as the output. The predicted results $\hat{r}_1$ are then used to calculate residual $r_2$. This process is iterated until all M trees in the ensemble are trained. A critical parameter utilized in this technique is ‘Shrinkage’, which involves reducing the prediction of each tree in the ensemble by multiplying it with the learning rate (η) that falls within the range of 0 to 1. A trade-off exists between η and the number of estimators, as reducing the learning rate requires compensation with an increase in estimators to achieve a specific model performance level. Once all trees are trained, predictions can be generated. Each tree predicts an output vector, and the final prediction $\hat{y}$ can be calculated using the following formula:

$$\hat{y} = \hat{y}_1 + \eta\,\hat{r}_1 + \eta\,\hat{r}_2 + \dots + \eta\,\hat{r}_{M-1}$$
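The residual-fitting loop described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: each "tree" is a depth-one regression stump on a single feature, the loss is the squared error, and the names `fit_stump` and `boost` are hypothetical. It is not the GBM configuration used in this study.

```python
def fit_stump(x, r):
    """Least-squares regression stump (a depth-one tree) on a single feature."""
    best = None
    for s in sorted(set(x))[:-1]:                      # candidate split points
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    _, s, ml, mr = best
    return lambda v: ml if v <= s else mr


def boost(x, y, n_trees, eta):
    first = fit_stump(x, y)                            # Tree-1 is fitted to y itself
    pred = [first(xi) for xi in x]
    later = []
    for _ in range(n_trees - 1):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # residuals of the current ensemble
        tree = fit_stump(x, resid)                     # next tree is fitted to the residuals
        later.append(tree)
        pred = [pi + eta * tree(xi) for pi, xi in zip(pred, x)]   # shrunken update
    return lambda v: first(v) + eta * sum(t(v) for t in later)


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]                     # toy data, not from the study
model = boost(x, y, n_trees=30, eta=0.1)
```

With a small η each tree contributes only a fraction of its fitted residual, which is why more trees are needed to reach the same training error.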

Bagging or bootstrap aggregation decreases the variance of an estimated prediction function65,81. It is particularly effective for procedures with high variance and low bias, like trees. In the regression case, the same regression tree is fitted to bootstrapped training data samples, and the outcomes are averaged. Random forests represent a significant adaptation of bagging, constructing a vast array of uncorrelated trees and averaging them71. The algorithm for regression with random forests can be summarized as follows:

Generate a bootstrap sample of size N from the training dataset.

Construct a random forest tree $T_b$ (b = 1, 2, …, B) using the bootstrapped data by iteratively applying the following steps (a)-(c) to each terminal node of the tree until the minimum node size $n_{min}$ is reached:

(a) Pick m variables at random from the total p variables.

(b) Determine the best variable/split-point from the selected m variables.

(c) Split the node into two daughter nodes.

Combine the output of all trees $\{T_b\}_{b=1}^{B}$ to obtain the final prediction.

Make predictions for a new data point x using the random forest model:

$$\hat{f}_{rf}^{B}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$$
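The bootstrap-and-average logic of the algorithm can be illustrated with the following sketch. As a deliberate simplification, each tree is grown to depth one only (a single split chosen among m randomly picked variables), so it shows the structure of the procedure rather than a full random forest. All names and the toy data are hypothetical.

```python
import random


def fit_random_stump(X, y, m, rng):
    """One split: pick m of the p variables at random, then the best split among them."""
    p = len(X[0])
    best = None
    for j in rng.sample(range(p), m):
        for s in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, s, ml, mr)
    if best is None:                       # no valid split in this bootstrap sample
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, s, ml, mr = best
    return lambda row: ml if row[j] <= s else mr


def random_forest(X, y, B=25, m=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample of size N
        trees.append(fit_random_stump([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return lambda row: sum(t(row) for t in trees) / B  # average over the B trees


X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 7], [6, 2]]   # toy data, p = 2 variables
y = [1.0, 1.1, 0.9, 3.0, 3.2, 2.9]
forest = random_forest(X, y, B=25, m=1, seed=0)
preds = [forest(row) for row in X]
```

Restricting each split to m < p randomly chosen variables is what decorrelates the trees, and averaging decorrelated trees is what reduces the variance.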

The EMD decomposes the complex original series into a series of IMFs and a residue based on the local characteristics of the series such as the local maxima, local minima, and zero-crossings60. Its essence lies in transitioning from non-stationary and non-linear signals to linear and stationary ones. Since the features are obtained empirically, the process is adaptive and efficient. The EMD method can be depicted as follows82.

(i) Identify all local maxima and local minima of the original time series y(t).

(ii) Obtain an upper envelope u(t) and a lower envelope l(t) by interpolating the local maxima and the local minima, respectively.

(iii) Calculate the average of the upper and lower envelopes as $m(t) = \frac{u(t) + l(t)}{2}$.

(iv) Obtain a detailed component d(t) by subtracting the average m(t) from the original time series y(t) as $d(t) = y(t) - m(t)$.

(v) If m(t) and d(t) meet any stopping criterion, then the first IMF is $c_1(t) = d(t)$ and the first residue is $r_1(t) = y(t) - c_1(t)$. A stopping criterion is met if m(t) tends to zero, or the number of local extrema and the number of zero crossings of d(t) differ by at most one, or the user-defined maximum number of iterations is reached. Rilling et al.36 have also reported two threshold-based stopping criteria specified as:

$$\frac{c(\{t : \delta(t) < \theta_1\})}{c(\{t\})} \ge 1 - \alpha \quad \text{and} \quad \delta(t) < \theta_2 \ \text{for all } t$$

where $\delta(t) = \left|\frac{u(t) + l(t)}{u(t) - l(t)}\right|$ and α, $\theta_1$, $\theta_2$ are user-defined constants. c(.) denotes a function that counts the elements of a set.

(vi) However, if none of the stopping criteria is reached after step (iv), repeat steps (i) to (iv) with d(t) in place of y(t). Once an IMF has been extracted, apply the same procedure to the residue until all the IMFs (say, n) and the final residue are obtained.

(vii) Finally, the original time series is reconstructed as:

$$y(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
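A single pass through the first four steps (extrema, envelopes, envelope mean, detail component) can be sketched as follows. This is only an illustration: the envelopes here are piecewise linear and held flat beyond the outermost extrema, whereas EMD implementations normally use cubic spline interpolation with more careful boundary handling. The function names and the toy signal are hypothetical.

```python
import math


def local_extrema(y):
    maxima = [t for t in range(1, len(y) - 1) if y[t - 1] < y[t] > y[t + 1]]
    minima = [t for t in range(1, len(y) - 1) if y[t - 1] > y[t] < y[t + 1]]
    return maxima, minima


def envelope(knots, y):
    """Piecewise-linear envelope through the knots, held flat beyond the end knots.
    Assumes at least one knot exists."""
    env = []
    for t in range(len(y)):
        if t <= knots[0]:
            env.append(y[knots[0]])
        elif t >= knots[-1]:
            env.append(y[knots[-1]])
        else:
            k = max(i for i, kt in enumerate(knots) if kt <= t)
            t0, t1 = knots[k], knots[k + 1]
            env.append(y[t0] + (y[t1] - y[t0]) * (t - t0) / (t1 - t0))
    return env


def sift_once(y):
    maxima, minima = local_extrema(y)
    u = envelope(maxima, y)                     # upper envelope u(t)
    l = envelope(minima, y)                     # lower envelope l(t)
    m = [(a + b) / 2 for a, b in zip(u, l)]     # envelope mean m(t)
    d = [yi - mi for yi, mi in zip(y, m)]       # detail d(t) = y(t) - m(t)
    return m, d


y = [math.sin(0.9 * t) + 0.05 * t for t in range(30)]   # oscillation on a slow trend
m, d = sift_once(y)
```

In a full implementation the detail d(t) would be fed back into `sift_once` until a stopping criterion is met, and the whole procedure would then be repeated on the residue.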

The major lacuna of EMD application is the problem of mode mixing, in which a single IMF contains signals covering a wide frequency range, or several IMFs contain signals in a similar frequency band83. To tackle this problem, Wu and Huang84 developed an ensemble version of EMD called EEMD. In EEMD, multiple trials are carried out, and each trial is similar to EMD except that the input series is a mixture of the original time series and a finite-amplitude Gaussian white noise. Although the resulting decompositions are noisier, the uncorrelated white noise series cancel each other out when the mean is computed over all trials. Thus, the relevant time series can be retained, eliminating the mode mixing problem. The EEMD procedure is as follows84:

Generate several noise-added time series by adding independent Gaussian white noises.

$$y^{i}(t) = y(t) + \varepsilon^{i}(t)$$

where i = 1, 2, …, I. $y^{i}(t)$ and $\varepsilon^{i}(t)$ denote the $i^{th}$ noise-added series and the $i^{th}$ independent Gaussian white noise, respectively.

For each $y^{i}(t)$, EMD is applied to obtain the decomposed IMFs and residue as:

$$y^{i}(t) = \sum_{j=1}^{n} c_j^{i}(t) + r_n^{i}(t)$$

The original time series can be reconstructed by averaging over all trials.

where $\varepsilon_I(t) = \frac{\varepsilon(t)}{\sqrt{I}}$.

Even though EEMD is a remarkable improvement over EMD in terms of stability, it still cannot completely neutralize the added noise. Hence, Yeh et al.85 developed CEEMD, which can achieve the same decomposition effect as EEMD while reducing the reconstruction error by using paired noise with positive and negative signs. The CEEMD procedure is the same as EEMD except that $\varepsilon^{i}(t) \in \{\varepsilon_{+}^{i/2}(t),\ \varepsilon_{-}^{i/2}(t)\}$, where $\varepsilon_{+}^{i/2}(t) + \varepsilon_{-}^{i/2}(t) = 0$; i = 1, 2, …, I.
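The difference between independent and paired noise can be seen in a short simulation. The sketch below looks only at the added noise itself, before any EMD step is applied. It compares the ensemble mean of I independent noise series with that of I/2 series used once with a positive and once with a negative sign. The values of I, the series length, and the noise standard deviation are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(42)
I, n, sigma = 100, 200, 0.2      # trials, series length, noise s.d. (illustrative)

# EEMD-style: I independent white-noise realisations
indep = [[rng.gauss(0, sigma) for _ in range(n)] for _ in range(I)]

# CEEMD-style: I/2 realisations, each used with a positive and a negative sign
half = [[rng.gauss(0, sigma) for _ in range(n)] for _ in range(I // 2)]
paired = half + [[-e for e in row] for row in half]


def ensemble_mean(noises):
    """Average the noise series across trials, time point by time point."""
    return [sum(col) / len(noises) for col in zip(*noises)]


resid_eemd = ensemble_mean(indep)     # leftover noise, s.d. of order sigma / sqrt(I)
resid_ceemd = ensemble_mean(paired)   # cancels up to floating-point rounding
sd_eemd = statistics.pstdev(resid_eemd)
```

With independent noise, the leftover noise in the ensemble mean shrinks only at the rate 1/√I (about 0.02 here), whereas the paired noise cancels in the mean for any number of trials.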

Another problem associated with EEMD is the high cost of computation. To minimize the number of trials while preserving the capability of solving the mode mixing problem, Torres et al.86 proposed CEEMDAN. The CEEMDAN method proceeds as follows.

Generate several noise-added time series by adding independent Gaussian white noises with unit variance.

$$y^{i}(t) = y(t) + \omega_0\,\varepsilon^{i}(t)$$

where i = 1, 2, …, I and $\omega_0$ denotes the noise coefficient.

For each $y^{i}(t)$, apply EMD to obtain the first decomposed IMF and calculate the mean as $c_1(t) = \frac{1}{I}\sum_{i=1}^{I} c_1^{i}(t)$. Subsequently, the first residue is obtained as: $r_1(t) = y(t) - c_1(t)$.

Decompose the noise-added residue to obtain the second IMF:

$$c_2(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\left(r_1(t) + \omega_1 E_1(\varepsilon^{i}(t))\right)$$

where $E_j(.)$ refers to a function to extract the $j^{th}$ IMF decomposed by EMD and $\omega_1$ is the noise coefficient at this stage.

Repeat for the remaining IMFs until at most two extrema of the residue exist.

The proposed CEEMDAN-TDNN hybrid modeling technique is schematically represented in Fig. 1.

Fig. 1. Schematic representation of the CEEMDAN-TDNN model.
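The decompose, forecast, and aggregate logic of such a hybrid model can be sketched as follows. The sketch assumes that the sub-series (IMFs and residue) have already been obtained and that they add up to the original series. A least-squares AR(1) model stands in for the TDNN fitted to each sub-series. The names `hybrid_forecast` and `ar1_forecaster` and the toy components are hypothetical.

```python
def hybrid_forecast(components, forecaster, horizon):
    """Forecast every sub-series separately, then add the forecasts up."""
    per_component = [forecaster(c, horizon) for c in components]
    return [sum(vals) for vals in zip(*per_component)]


def ar1_forecaster(series, horizon):
    """Stand-in for a fitted TDNN: least-squares AR(1) with intercept."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    out, last = [], series[-1]
    for _ in range(horizon):                 # iterate the fitted recursion forward
        last = intercept + slope * last
        out.append(last)
    return out


# Toy sub-series standing in for a residue (trend) and one IMF (oscillation)
trend = [100.0 + 2.0 * t for t in range(12)]
cycle = [5.0 * (-1) ** t for t in range(12)]
fc = hybrid_forecast([trend, cycle], ar1_forecaster, horizon=2)
```

Because each sub-series is simpler than the original series, a separate small model per component can capture its dynamics, and the final forecast is the sum of the component forecasts.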

The forecasting performance of the models employed in this study is evaluated using three common accuracy measures, viz., RMSE, RRMSE (Relative RMSE) and MAPE. However, Niu and Xu87 argue that the forecasting performance of non-linear models should be assessed by their ability to correctly predict the direction of change rather than by error-based measures such as RMSE, RRMSE, MAPE, etc. Hence, a comprehensive evaluation has been carried out in terms of both error-based measures (RMSE, RRMSE and MAPE)88,89,90,91,92 and the directional prediction statistic ($D_{stat}$)93,94,95.

where $a_t = 1$ if $(y_t - y_{t-1})(\hat{y}_t - \hat{y}_{t-1}) \ge 0$, and $a_t = 0$ otherwise.

where $y_t$ and $\hat{y}_t$ denote the $t^{th}$ actual and predicted values in the test data set. n refers to the size of the test set.
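A minimal sketch of these measures is given below, using the textbook definitions of RMSE and MAPE and the indicator $a_t$ defined above. Two points are assumptions rather than statements about this study: the directional statistic is averaged over the n − 1 consecutive changes inside the test set, and RRMSE is left out because several normalisations are in common use. The toy numbers are illustrative only.

```python
import math


def rmse(y, yhat):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y))


def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(y, yhat)) / len(y)


def d_stat(y, yhat):
    """Share (in %) of periods whose direction of change is predicted correctly."""
    hits = [
        1 if (y[t] - y[t - 1]) * (yhat[t] - yhat[t - 1]) >= 0 else 0
        for t in range(1, len(y))
    ]
    return 100.0 * sum(hits) / len(hits)


actual = [100.0, 110.0, 105.0, 120.0]      # toy values only
forecast = [102.0, 108.0, 109.0, 118.0]
scores = (rmse(actual, forecast), mape(actual, forecast), d_stat(actual, forecast))
```

In this toy case the forecast misses the downturn in the third period, so two of the three directional changes are predicted correctly even though the errors themselves are small.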

Taylor diagrams34,96 provide a visual representation of the correspondence between patterns and observations. This comparison is based on measures of correlation, centered root-mean-square difference, and standard deviations. These diagrams are valuable for assessing the relative performance of various models. The Taylor diagram illustrates the statistical connection between two datasets: a ‘test field’ (typically a model simulation) and a ‘reference field’ (usually observational data). Each point on the diagram represents three distinct statistics simultaneously - the centered RMS difference, correlation, and standard deviation, due to their interrelatedness as defined by the formula below:

$$E'^{2} = \sigma_f^{2} + \sigma_r^{2} - 2\sigma_f \sigma_r R$$

where R is the correlation coefficient between the test and reference fields, E’ is the centred RMS difference between the fields, and $\sigma_f^{2}$ and $\sigma_r^{2}$ are the variances of the test and reference fields, respectively. The diagram is constructed based on the correlation given by the cosine of the azimuthal angle, drawing on the similarity between this equation and the Law of Cosines:

$$c^{2} = a^{2} + b^{2} - 2ab\cos\phi$$
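The relation between the three Taylor statistics can be verified numerically. The sketch below computes the two standard deviations, the correlation coefficient, and the centred RMS difference for a toy pair of series, so that the identity E'² = σ_f² + σ_r² − 2σ_fσ_rR can be checked directly. The function name `taylor_stats` and the numbers are hypothetical.

```python
import math
import statistics


def taylor_stats(f, r):
    """Return (sigma_f, sigma_r, R, E') for a test field f and a reference field r."""
    mf, mr = statistics.fmean(f), statistics.fmean(r)
    sf, sr = statistics.pstdev(f), statistics.pstdev(r)
    cov = sum((a - mf) * (b - mr) for a, b in zip(f, r)) / len(f)
    R = cov / (sf * sr)
    E = math.sqrt(sum(((a - mf) - (b - mr)) ** 2 for a, b in zip(f, r)) / len(f))
    return sf, sr, R, E


forecast = [102.0, 108.0, 109.0, 118.0]    # test field (toy values)
actual = [100.0, 110.0, 105.0, 120.0]      # reference field (toy values)
sf, sr, R, E = taylor_stats(forecast, actual)
gap = E ** 2 - (sf ** 2 + sr ** 2 - 2 * sf * sr * R)   # zero up to rounding
```

Because the identity has the same form as the Law of Cosines, a single point with radius σ_f and angle arccos(R) encodes all three statistics at once.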

To confirm the accuracy and evaluate the effectiveness of the developed forecasting model, we have conducted both the Diebold–Mariano test97 and the Friedman test98. Instructions for conducting these tests are detailed in ‘Supplementary Information S1’.

The basic features of the price series involved in this experiment are briefed in Table 2. The average price in these markets is around ₹ 3700–3900 per quintal. The CV(%) values indicate the presence of a relatively higher degree of instability in the data series. The auto-correlation and partial auto-correlation functions display no significant and regular seasonal pattern. Seasonal indices, as presented in Table 3, further confirm it. The augmented Dickey-Fuller (ADF) test99 has been utilized to decide on the non-seasonal differencing of the price series under study and is reported in Table 4. For all the markets, non-stationarity and stationarity have been observed for the level and first difference series, respectively.

As our investigation focuses on modeling techniques for non-linear, non-stationary time series data, assessing whether the given time series is non-linear before proceeding further is crucial. The Brock–Dechert–Scheinkman (BDS) test100 has been implemented in this study to test non-linearity. This test examines the spatial dependence of the observed series. The results in Table 5 reflect the strong rejection of linearity in all cases. Therefore, upon confirmation of these series’ non-linear and non-stationary nature, EMD and its improved variants can be effectively implemented for forecasting these price series.

The present study applies ARIMA, NLSVR, GBM, RF and TDNN models to the original series. For the EMD variant-based models, the IMFs and residue are obtained first and then the appropriate models are selected for each sub-series.

The ACF and PACF plots serve as a reliable guide for the possible order of the ARIMA model. Minimum Akaike information criteria (AIC), Bayesian information criteria (BIC) and minimum RMSE, RRMSE and MAPE values have been used to select the best model. The parameter estimates of the selected ARIMA models are provided in Table 6.

One crucial aspect of NLSVR modeling is the selection of hyper-parameters. The performance of NLSVR is significantly affected by the choice of input lags, kernel function, regularization parameter, kernel width, and margin of tolerance. For this study, we have utilized the popular radial basis function (RBF) as the kernel function to construct NLSVR models following the specifications provided in Table 7.

The performance of the GBM also crucially relies on the optimal selection of hyper-parameters. The hyper-parameters fine-tuned for training this model include the number of input lags, number of estimators, maximum depth, minimum samples per leaf, subsample, and learning rate. The best-performing GBM models have been developed according to the specifications outlined in Table 7.

Similarly, when tuning random forests, it is essential to consider the number of input lags, number of estimators, minimum samples per leaf, and maximum depth. Several automated techniques in the existing literature can be utilized to select hyper-parameter combinations. Among these methods, we have opted for grid search, which systematically explores all possible combinations of the candidate hyper-parameter values. The results of the optimized configurations can be found in Table 7.
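The idea of an exhaustive grid search can be sketched in a few lines. The grid values and the scoring function below are hypothetical placeholders: in practice, the score would be a validation error such as RMSE obtained by fitting the model with the given hyper-parameters.

```python
from itertools import product


def grid_search(grid, score):
    """Evaluate every combination in `grid`; return the one with the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[k] for k in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical candidate values, not the grid used in this study
grid = {"n_lags": [1, 2, 3], "n_estimators": [50, 100], "max_depth": [2, 4]}


def toy_score(p):
    """Placeholder for a validation RMSE; smallest at n_lags=2, n_estimators=100, max_depth=4."""
    return abs(p["n_lags"] - 2) + abs(p["n_estimators"] - 100) / 100 + abs(p["max_depth"] - 4)


best, best_val = grid_search(grid, toy_score)
```

The number of model fits is the product of the grid sizes (3 × 2 × 2 = 12 here), so the cost grows quickly as hyper-parameters or candidate values are added.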

This study identified the most effective time-delay neural network with a single hidden layer for both the original and stationary series. By altering the range of input and hidden nodes from 1 to 6 and from 1 to 10, respectively, we have optimized the network performance. The Levenberg-Marquardt back-propagation algorithm has been utilized for training. Detailed specifications of the selected TDNN models can be found in Table 7. The superiority of TDNN models over other machine learning models is evident in the Friedman F ranks presented in the table, indicating it as the prime choice for further decomposition-based improvements. The performance enhancement of TDNN models when utilizing the stationary (first differenced) series as input rather than the original non-stationary series also supports the theory that proper pre-processing techniques (such as differencing and decomposing) of such data can significantly elevate the efficacy of neural network models40,101.

EMD has decomposed each price series into independent IMFs and residues through a sifting process. Characteristics and the selected TDNN model specifications of these IMFs and residue are presented in Table 8, whereas the graphical representation of the decomposed series is given in ‘Supplementary Information Fig. S1-S3’. Among the decomposed sub-series, the highest average value has been observed for residue in each case. However, the highest fluctuation has been found for residue in the case of Rapeseed & mustard and Linseed, and for IMF-4 in the case of groundnut. The highest correlation with the original series is observed for IMF-4 in the case of groundnut and for residue in the case of rapeseed & mustard and linseed. Most of the IMFs have exhibited a positive correlation. However, negative correlations are observed for one of the IMFs in the case of groundnut and rapeseed & mustard.

The purpose of ensembling in EEMD is to avoid the problem of mode mixing. ‘Supplementary Information Figure S4-S6’ illustrates the IMFs and residue obtained through EEMD. Like EMD, all the IMFs are obtained from the highest to the lowest frequency. The residue varies slowly around the long-term average. The average fluctuation and correlation patterns of the obtained IMFs are similar to those of EMD and can be observed in Table 9. However, except for IMF-3 of linseed, all other IMFs and residues are positively correlated with the original series.

CEEMD can handle the noise generated by non-negligible residues in the EEMD process. Characteristics and the selected TDNN model specifications of the IMFs and residue obtained through CEEMD are presented in Table 10, whereas the decomposed series is graphically represented in ‘Supplementary Information Fig. S7-S9’. Among the decomposed sub-series, the residue component has consistently displayed the highest average value. Notably, the residue in Rapeseed & Mustard and Linseed, and IMF-1 in groundnut, have exhibited the highest fluctuations among the sub-series. All the IMFs exhibited a positive correlation except for the IMF-4 of groundnut. Among the positive ones, the highest correlation is observed for IMF-5 in the case of groundnut and for residue in the case of rapeseed & mustard, and linseed.

CEEMDAN utilizes the same ‘divide and conquer’ framework as the original EMD method. However, by adding finite adaptive white noise, CEEMDAN yields IMFs that are more stable and closer to a normal distribution than the IMFs obtained through EMD. ‘Supplementary Information Figs. S10–S12’ illustrate the IMFs and residues obtained through CEEMDAN. From Table 11, it can be observed that the residue has the highest average value in each case. However, the highest fluctuations have been observed for the residue in the case of rapeseed & mustard and linseed, as well as for IMF-5 in groundnut. Notably, all IMFs and residues are positively correlated with their original series. The pattern of the highest correlation is the same as for CEEMD, i.e., IMF-5 in the case of groundnut and the residue in the case of rapeseed & mustard and linseed.

The comparative results of the time series models under investigation concerning the post-sample RMSE, RRMSE and MAPE values are given in Table 12. The error-based metrics reveal that the values of the forecasted series are closest to the values of the actual price series when obtained using the proposed CEEMDAN-TDNN models. They also reflect an almost uniform order of accuracy, i.e., CEEMDAN-TDNN > CEEMD-TDNN > EEMD-TDNN > EMD-TDNN > Stationary-TDNN > TDNN > RF > GBM > NLSVR > ARIMA, for all three price series considered. Moreover, it can be noted that as the data series are truly non-linear, non-linear machine learning models such as NLSVR, GBM, RF and TDNN have clear advantages over a model rendering linear forecasts such as ARIMA. Among these machine learning models, TDNN has shown a comparative edge over the others and hence is considered for further hybridization to obtain more accurate forecasts.
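For readers who wish to reproduce this kind of comparison, the three error metrics can be sketched as below. This is a minimal illustration under stated assumptions: RRMSE is taken here as the RMSE expressed as a percentage of the mean actual value, which may differ from the exact definition used in the study, and the sample numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def rrmse(actual, predicted):
    """Relative RMSE in percent.
    Assumption: RMSE scaled by the mean of the actual values."""
    return 100.0 * rmse(actual, predicted) / (sum(actual) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical post-sample prices and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(rmse(actual, predicted), rrmse(actual, predicted), mape(actual, predicted))
```

Lower values of all three metrics indicate forecasts closer to the actual series, which is the sense in which the ordering of models above should be read.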

The advantage of using a stationary series as an input to a TDNN model is also substantiated by the superior performance of the stationary-TDNN model over the TDNN model. However, as the underlying non-linear and non-stationary features of commodity prices significantly impact the robustness of the neural network models102, more than mere differencing is needed to handle such multiscale complexity. Hence, the EMD variant-based TDNN models, which can simultaneously deal with non-linear and non-stationary features, have provided far better forecasts103. However, one important point that has emerged is the potential for using additional explanatory variables for each decomposed component. Each component of the decomposed series (such as trend, seasonality, and high-frequency variations) could potentially be explained by different factors beyond lagged observations. For example, a trend component might be influenced by macroeconomic variables such as inflation or GDP, while short-term fluctuations might be more closely tied to speculative trading or market sentiment. Future work should consider such factors to enhance the interpretability and accuracy of forecasts for agricultural prices. Furthermore, external factors like rainfall or weather conditions, which have a direct impact on agricultural production and price volatility, can also be explored for inclusion.
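The two input-preparation steps mentioned above, differencing to obtain a stationary series and using lagged observations as network inputs, can be sketched as follows. The helper names, the lag order and the toy prices are illustrative assumptions; the study's own lag selection is reported in its tables and is not reproduced here.

```python
def difference(series, order=1):
    """Apply first differencing `order` times to remove trend."""
    values = list(series)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def make_lagged(series, n_lags):
    """Build time-delay inputs: each row holds n_lags consecutive values
    (oldest first) and the target is the value that follows them."""
    inputs = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    targets = [series[t] for t in range(n_lags, len(series))]
    return inputs, targets


# Hypothetical monthly prices.
prices = [100, 103, 107, 106, 110, 115]
diffed = difference(prices)          # month-to-month changes
X, y = make_lagged(diffed, n_lags=2)  # two delayed inputs per target

for row, target in zip(X, y):
    print(row, "->", target)
```

The same lag construction applies unchanged when the input is an individual IMF or residue rather than the differenced price series.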

It is worth noting that even though EEMD, CEEMD, and CEEMDAN are modifications of EMD, substantial differences in the sub-series characteristics are observed. Consequently, the input series (lagged observations of that sub-series) and the target output (the sub-series) have varied from one decomposition technique to another, allowing the neural networks to capture patterns in significantly different ways83, which certainly affects the model performance. The gradual improvements in the accuracy of EEMD over EMD and CEEMD over EEMD have been evident, mainly due to the ensembling algorithm and the use of complementary white noise pairs, respectively. Zhang et al.104 have also observed such incremental performance of EMD variants in neural network forecasting of groundwater depth. Finally, the remarkable improvements in the accuracy of CEEMDAN as opposed to almost all of its counterparts and the marginal improvement over CEEMD can be attributed to the extra noise coefficient vector $\omega$, which controls the noise level at each stage of decomposition. In this technique, the signal is adaptively decomposed and noise adaptive processing is carried out on individual sub-series components, enabling better adaptation to the non-linear and non-smooth characteristics of the signal, which significantly enhances the decomposition accuracy and stability. The efficiency of the CEEMDAN-based hybrid models, due to their unique feature extraction technique, was also observed in the studies of Gao et al.105 for predicting nitrogen content in citrus leaves, Bennia et al.106 for minimizing the impact of additive noises on non-invasive biomedical signals, Gyamerah and Owusu107 for improving weather prediction amidst extreme climate change in Africa, etc. Time plots of actual vs. predicted series employing CEEMDAN-TDNN models are presented in Fig. 2.

Fig. 2. Actual and best-predicted price series (by the CEEMDAN-TDNN hybrid model) of (a) groundnut, (b) rapeseed & mustard and (c) linseed.
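The decompose, forecast and aggregate workflow behind the EMD-variant hybrids discussed above can be sketched in a few lines. This is only an illustration under two assumptions: the component forecasts are aggregated by simple summation, and a naive drift forecaster stands in for the TDNN fitted to each sub-series. Neither the function names nor the toy components come from the study.

```python
def naive_drift_forecast(values, horizon):
    """Stand-in forecaster (NOT a TDNN): extend the average historical step."""
    step = (values[-1] - values[0]) / (len(values) - 1)
    return [values[-1] + step * (h + 1) for h in range(horizon)]


def hybrid_forecast(components, horizon, forecaster=naive_drift_forecast):
    """Forecast each decomposed sub-series separately, then sum the
    component forecasts step by step to obtain the price forecast."""
    per_component = [forecaster(values, horizon) for values in components]
    return [sum(step_values) for step_values in zip(*per_component)]


# Hypothetical decomposition: one IMF and a residue (trend).
imf = [1.0, -1.0, 1.0, -1.0, 1.0]
residue = [10.0, 11.0, 12.0, 13.0, 14.0]

forecast = hybrid_forecast([imf, residue], horizon=2)
print(forecast)
```

Because each sub-series gets its own model, the forecaster argument can be swapped per decomposition technique without changing the aggregation step.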

However, as indicated earlier, several researchers have suggested that the conventional error-based metrics may not be suitable for evaluating non-linear models, since a linear model can outperform the non-linear ones even when the true data-generating process is non-linear87,108. A non-linear model can generate comparatively more variation in the forecast values than a linear model. Hence, errors with a larger magnitude are likely to be unduly penalized. Table 13 therefore provides the post-sample percentage of forecasts with the correct sign.
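A percentage of correctly signed forecasts can be computed along the following lines. The sketch assumes that a forecast counts as correct when the predicted change from the previous actual value has the same sign as the realized change; the study may define the measure slightly differently, and the data are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pct_correct_sign(actual, predicted):
    """Percentage of periods in which the forecast moves in the same
    direction (relative to the previous actual value) as the actual series."""
    hits = 0
    for t in range(1, len(actual)):
        actual_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - actual[t - 1]
        hits += sign(actual_change) == sign(predicted_change)
    return 100.0 * hits / (len(actual) - 1)


# Hypothetical post-sample prices and forecasts.
actual = [100, 105, 103, 108, 107]
predicted = [100, 104, 106, 110, 109]

print(pct_correct_sign(actual, predicted))
```

Unlike RMSE or MAPE, this measure ignores the size of the error, so a model with larger errors can still score well if it anticipates turning points.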

The linear ARIMA model has performed as well as or even better than the non-linear machine learning models. At this juncture, the impact of the inherent non-linearity and non-stationarity on the performance of the machine learning models is more distinctly realized in terms of turning point prediction. Among the EMD-variant based hybrids, comparable performance is observed in the EMD-TDNN and EEMD-TDNN models. The CEEMD-TDNN and CEEMDAN-TDNN models have also exhibited equal ability to forecast the direction of change. Thus, the comprehensive assessment of the forecasting models indicates that the relative forecasting performance also crucially relies on the evaluation metrics. The superiority of the forecasting accuracy of the CEEMDAN-TDNN hybrid over the other competing models is assessed using both the DM test and the Friedman test. The results of the DM and Friedman tests (‘Supplementary Information Tables S1 and S2’) along with the Taylor diagram confirm that the CEEMDAN-TDNN has outperformed other benchmark models in forecasting accuracy across all series (Fig. 3).

Fig. 3. Taylor diagram of the forecasting models for (a) groundnut, (b) rapeseed & mustard and (c) linseed price series.
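The idea behind the DM test used above can be sketched as follows. This is a simplified illustration, not the procedure applied in the study: it assumes one-step-ahead forecasts, squared-error loss and a normal approximation, and it omits the autocovariance terms and small-sample corrections that a full implementation would include. The forecast errors are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Simplified Diebold-Mariano test for one-step-ahead forecasts.
    Returns (statistic, two-sided p-value). A positive statistic means
    model A has the larger squared-error loss, i.e. model B is more accurate."""
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    # Lag-0 variance of the loss differential (must be non-zero).
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical post-sample forecast errors of two competing models.
errors_a = [3.0, -2.0, 4.0, -3.0, 2.0, -4.0]
errors_b = [1.0, -1.0, 2.0, -1.0, 1.0, -2.0]

dm_stat, p_value = diebold_mariano(errors_a, errors_b)
print(dm_stat, p_value)
```

A small p-value suggests that the difference in accuracy between the two models is unlikely to be due to chance alone, which is how pairwise DM results are typically read.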

This study has evaluated the suitability of the CEEMDAN-TDNN hybrid model for non-linear, non-stationary agricultural price series forecasting in comparison with the other three major EMD variants (EMD, EEMD and CEEMD) and the benchmark (ARIMA, NLSVR, GBM, RF and TDNN) models using monthly wholesale prices of major oilseed crops in India. Outcomes from this investigation reflect that the CEEMDAN-TDNN models have provided uniformly better results than all other forecasting models regarding RMSE, RRMSE and MAPE values. However, in the case of turning point prediction, the CEEMD-TDNN model has exhibited the same ability as the CEEMDAN-TDNN model.

One important avenue for future research is incorporating additional explanatory variables tailored to each decomposed component. For instance, macroeconomic factors like inflation or export trends could be used to explain long-term trends, while short-term fluctuations could be modeled based on speculative or market-related variables. Weather variables such as rainfall, which directly affect agricultural production and, consequently, price movements, can be critical for improving the forecasting accuracy of price volatility. A richer set of explanatory variables will make the models more robust and provide better insights into the factors driving price dynamics.

The proposed hybrid model can be applied to agricultural data as well as to similar time series such as stock market, weather and pollution data. Future research will explore implementing other machine learning or deep learning models to enhance efficiency further. Additionally, exploring the use of the Improved CEEMDAN (ICEEMDAN) method in place of CEEMDAN to assess potential improvements in results would be of interest.

The data supporting this study’s findings are available from the corresponding author upon reasonable request.

Anjoy, P. & Paul, R. K. Comparative performance of wavelet-based neural network approaches. Neural Comput. Appl. 31, 3443–3453 (2019).

Jin, B. & Xu, X. Pre-owned housing price index forecasts using gaussian process regressions. J. Model. Manag. https://doi.org/10.1108/JM2-12-2023-0315 (2024).

Jin, B. & Xu, X. Machine learning predictions of regional steel price indices for east China. Ironmak. Steelmak Process. Prod. Appl. https://doi.org/10.1177/03019233241254891 (2024).

Jin, B. & Xu, X. Palladium price predictions via machine learning. Mater. Circ. Econ. 6, 32 (2024).

Manogna, R. L. & Mishra, A. K. Forecasting spot prices of agricultural commodities in India: application of deep-learning models. Intell. Syst. Acc. Financ Manag. 28, 72–83 (2021).

Jin, B. & Xu, X. Wholesale price forecasts of green grams using the neural network. Asian J. Econ. Bank. https://doi.org/10.1108/AJEB-01-2024-0007 (2024).

Jin, B. & Xu, X. Price forecasting through neural networks for crude oil, heating oil, and natural gas. Meas. Energy. 1, 100001 (2024).

Xu, X. & Zhang, Y. Corn cash price forecasting with neural networks. Comput. Electron. Agric. 184, 106120 (2021).

Jin, B. & Xu, X. Forecasting wholesale prices of yellow corn through the gaussian process regression. Neural Comput. Appl. 36, 8693–8710 (2024).

Elbeltagi, A. et al. Prediction of meteorological drought and standardized precipitation index based on the random forest (RF), random tree (RT), and gaussian process regression (GPR) models. Environ. Sci. Pollut Res. 30, 43183–43202 (2023).

Masinde, M. Artificial neural networks models for predicting effective drought index: factoring effects of rainfall variability. Mitig Adapt. Strateg Glob Chang. 19, 1139–1162 (2014).

Elbeltagi, A. et al. Drought indicator analysis and forecasting using data driven models: case study in Jaisalmer, India. Stoch. Environ. Res. Risk Assess. https://doi.org/10.1007/s00477-022-02277-0 (2022).

Adikari, K. E. et al. Evaluation of artificial intelligence models for flood and drought forecasting in arid and tropical regions. Environ. Model. Softw. 144, 105136 (2021).

Achite, M. et al. Performance of Machine Learning Techniques for Meteorological Drought Forecasting in the Wadi Mina Basin, Algeria. Water. 15, 765 (2023).

Alade, I. O., Zhang, Y. & Xu, X. Modeling and prediction of lattice parameters of binary spinel compounds (AM2X4) using support vector regression with Bayesian optimization. New. J. Chem. 45, 15255–15266 (2021).

Zhang, Y. & Xu, X. Solid particle erosion rate predictions through LSBoost. Powder Technol. 388, 517–525 (2021).

Zhang, Y. & Xu, X. Solubility predictions through LSBoost for supercritical carbon dioxide in ionic liquids. New. J. Chem. 44, 20544–20567 (2020).

Adnan, R. M., Parmar, K. S., Heddam, S., Shahid, S. & Kisi, O. Suspended sediment modeling using a Heuristic regression Method hybridized with kmeans Clustering. Sustainability. 13, 4648 (2021).

Zhang, Y. & Xu, X. Disordered MgB2 superconductor critical temperature modeling through regression trees. Phys. C Supercond its Appl. 597, 1354062 (2022).

Araghi, A., Mousavi-Baygi, M., Adamowski, J., Martinez, C. & van der Ploeg, M. Forecasting soil temperature based on surface air temperature using a wavelet artificial neural network. Meteorol. Appl. 24, 603–611 (2017).

Wells, L. G., Ward, A. D., Moore, I. D. & Phillips, R. E. Comparison of four infiltration models in characterizing infiltration through Surface Mine profiles. Trans. Am. Soc. Agric. Eng. 29, 785–793 (1986).

Barrera-Animas, A. Y. et al. Rainfall prediction: a comparative analysis of modern machine learning algorithms for time-series forecasting. Mach. Learn. Appl. 7, 100204 (2022).

Aderemi, B. A., Olwal, T. O., Ndambuki, J. M. & Rwanga, S. S. Groundwater levels forecasting using machine learning models: a case study of the groundwater region 10 at Karst Belt, South Africa. Syst. Soft Comput. 5, 200049 (2023).

Hounkpè, J. et al. Potential for seasonal flood forecasting in West Africa using climate indexes. J. Flood Risk Manag. n/a, e12833 (2022).

Zhai, Y. et al. Modelling Soil Water infiltration and wetting patterns in variable Working-Head Moistube Irrigation. Agronomy. 13, 2987 (2023).

Salem, S. et al. Applying Multivariate Analysis and Machine Learning approaches to evaluating Groundwater Quality on the Kairouan Plain, Tunisia. Water. 15, 3495 (2023).

Jadhav, V., Chinnappa Reddy, B. V. & Gaddi, G. M. Application of ARIMA Model for forecasting agricultural prices. J. Agric. Sci. Technol. 19, 981–992 (2017).

Liu, T., Truong, N. D., Nikpour, A., Zhou, L. & Kavehei, O. Epileptic seizure classification with symmetric and hybrid bilinear models. IEEE J. Biomed. Heal Inf. 24, 2844–2851 (2020).

Abebe, A., Temesgen, A. & Kebede, B. Modeling inflation rate factors on present consumption price index in Ethiopia: threshold autoregressive models approach. Futur Bus. J. 9, 72 (2023).

Dalavi, P. et al. Modeling runoff in Bhima River catchment, India: a comparison of artificial neural networks and empirical models. Water Pract. Technol. https://doi.org/10.2166/wpt.2024.157 (2024).

Raza, A. et al. Use of gene expression programming to predict reference evapotranspiration in different climatic conditions. Appl. Water Sci. 14, 152 (2024).

Kushwaha, N. L. et al. Stacked hybridization to enhance the performance of artificial neural networks (ANN) for prediction of water quality index in the Bagh river basin, India. Heliyon. 10, e31085 (2024).

Gupta, S. et al. Sensitivity of daily reference evapotranspiration to weather variables in tropical savanna: a modelling framework based on neural network. Appl. Water Sci. 14, 138 (2024).

Joshi, B. et al. A comparative survey between cascade correlation neural network (CCNN) and feedforward neural network (FFNN) machine learning models for forecasting suspended sediment concentration. Sci. Rep. 14, 10638 (2024).

Babu, C. N. & Reddy, B. E. A moving-average filter based hybrid ARIMA–ANN model for forecasting time series data. Appl. Soft Comput. 23, 27–38 (2014).

Tealab, A. Time series forecasting using artificial neural networks methodologies: a systematic review. Futur Comput. Inf. J. 3, 334–340 (2018).

Singh, A. & Mishra, G. C. Application of Box-Jenkins method and Artificial neural network procedure for time series forecasting of prices. Stat. Transit. new. Ser. 16, 83–96 (2015).

Areef, M. & Radha, Y. Application of GARCH and ANN models for potato price forecasting: a case study of Bangalore market, Karnataka state. Indian J. Agric. Mark. 34, 44–52 (2020).

Dhifaoui, Z., Khalfaoui, R., Ben Jabeur, S. & Abedin, M. Z. Exploring the effect of climate risk on agricultural and food stock prices: fresh evidence from EMD-Based variable-lag transfer entropy analysis. J. Environ. Manage. 326, 116789 (2023).

Ahmad, N., Yi, X., Tayyab, M., Zafar, M. H. & Akhtar, N. Water resource management and flood mitigation: hybrid decomposition EMD-ANN model study under climate change. Sustain. Water Resour. Manag. 10, 71 (2024).

Rana, H., Farooq, M. U., Kazi, A. K., Baig, M. A. & Akhtar, M. A. Prediction of Agricultural Commodity prices using Big Data Framework. Eng. Technol. Appl. Sci. Res. 14, 12652–12658 (2024).

Fu, L. & Zhang, H. Analysis of factors influencing small-scale agricultural product prices from the perspective of the online public—a case study of China. Front. Sustain. Food Syst. 8, 1355853 (2024). https://doi.org/10.3389/fsufs.2024.1355853.

Bonato, M., Cepni, O., Gupta, R. & Pierdzioch, C. Forecasting the realized volatility of agricultural commodity prices: does sentiment matter? J. Forecast. 43, 2088–2125 (2024).

Singla, S. K., Garg, R. D. & Dubey, O. P. Ensemble Machine Learning Methods to Estimate the sugarcane yield based on remote sensing information. Rev. d’Intelligence Artif. 34, 731–743 (2020).

Shankar, S. V. et al. Comparative study on Key Time Series models for exploring the Agricultural Price volatility in Potato prices. Potato Res. https://doi.org/10.1007/s11540-024-09776-3 (2024).

Suna, R. & Ma, H. Commodity Price Fluctuation Prediction Based on Neural Network. in IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA) 1603–1607 (IEEE, 2024). https://doi.org/10.1109/EEBDA60612.2024.10485707.

Guo, Y. et al. Agricultural price prediction based on data mining and attention-based gated recurrent unit: a case study on China’s hog. J. Intell. Fuzzy Syst. 46, 9923–9943 (2024).

Cheng, M., Xu, K., Geng, G., Liu, H. & Wang, H. Carbon price prediction based on advanced decomposition and long short-term memory hybrid model. J. Clean. Prod. 451, 142101 (2024).

Zhu, Y. et al. A hybrid model for Carbon Price forecasting based on Improved feature extraction and non-linear integration. Mathematics. 12, 1428 (2024).

Ghimire, S. et al. Half-hourly electricity price prediction with a hybrid convolution neural network-random vector functional link deep learning approach. Appl. Energy. 374, 123920 (2024).

Lotfipoor, A., Patidar, S. & Jenkins, D. P. Deep neural network with empirical mode decomposition and bayesian optimisation for residential load forecasting. Expert Syst. Appl. 237, 121355 (2024).

Kontopoulou, V. I., Panagopoulos, A. D., Kakkos, I. & Matsopoulos, G. K. A review of ARIMA vs. Machine Learning approaches for Time Series forecasting in Data Driven Networks. Futur Internet. 15, 255 (2023).

Zhang, X., Ren, H., Liu, J., Zhang, Y. & Cheng, W. A monthly temperature prediction based on the CEEMDAN–BO–BiLSTM coupled model. Sci. Rep. 14, 808 (2024).

Niu, M., Wang, Y., Sun, S. & Li, Y. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting. Atmos. Environ. 134, 168–180 (2016).

Fang, K., Zhang, H., Qi, H. & Dai, Y. Comparison of EMD and EEMD in rolling bearing fault signal analysis. in IEEE International Instrumentation and Measurement Technology Conference (I2MTC) 1–5 (IEEE, 2018). https://doi.org/10.1109/I2MTC.2018.8409666.

Tayyab, M., Ahmad, I., Sun, N., Zhou, J. & Dong, X. Application of Integrated Artificial neural networks based on decomposition methods to Predict Streamflow at Upper Indus Basin, Pakistan. Atmos. (Basel). 9, 494 (2018).

Aamir, M., Shabri, A. & Ishaq, M. Crude oil price forecasting by CEEMDAN based hybrid model of ARIMA and Kalman filter. J. Teknol. 80, 67–79 (2018).

Cao, J., Li, Z. & Li, J. Financial time series forecasting model based on CEEMDAN and LSTM. Phys. Stat. Mech. its Appl. 519, 127–139 (2019).

Fang, Y., Guan, B., Wu, S. & Heravi, S. Optimal forecast combination based on ensemble empirical mode decomposition for agricultural commodity futures prices. J. Forecast. 39, 877–886 (2020).

Lin, Y., Yan, Y., Xu, J., Liao, Y. & Ma, F. Forecasting stock index price using the CEEMDAN-LSTM model. North. Am. J. Econ. Financ. 57, 101421 (2021).

Seyrek, P., Şener, B., Özbayoğlu, A. M. & Ünver, H. Ö. An evaluation study of EMD, EEMD, and VMD for Chatter Detection in Milling. Procedia Comput. Sci. 200, 160–174 (2022).

Liu, X., Zhang, Y. & Zhang, Q. Comparison of EEMD-ARIMA, EEMD-BP and EEMD-SVM algorithms for predicting the hourly urban water consumption. J. Hydroinformatics. 24, 535–558 (2022).

Liao, S. et al. Runoff Forecast Model based on an EEMD-ANN and Meteorological factors using a multicore parallel algorithm. Water Resour. Manag. 37, 1539–1555 (2023).

Shahbazi, M., Zarei, H. & Solgi, A. A new approach in using the GRACE satellite data and artificial intelligence models for modeling and predicting the groundwater level (case study: Aspas aquifer in Southern Iran). Environ. Earth Sci. 83, 240 (2024).

Heddam, S. et al. Hybrid river stage forecasting based on machine learning with empirical mode decomposition. Appl. Water Sci. 14, 46 (2024).

Effrosynidis, D., Spiliotis, E., Sylaios, G. & Arampatzis, A. Time series and regression methods for univariate environmental forecasting: an empirical evaluation. Sci. Total Environ. 875, 162580 (2023).

Pandit, P. et al. Hybrid time series models with exogenous variable for improved yield forecasting of major Rabi crops in India. Sci. Rep. 13, 22240 (2023).

Vaughan, L. et al. An exploration of challenges associated with machine learning for time series forecasting of COVID-19 community spread using wastewater-based epidemiological data. Sci. Total Environ. 858, 159748 (2023).

Mirzania, E., Vishwakarma, D. K., Bui, Q. A. T., Band, S. S. & Dehghani, R. A novel hybrid AIG-SVR model for estimating daily reference evapotranspiration. Arab. J. Geosci. 16, 301 (2023).

Abed, M., Imteaz, M. A., Ahmed, A. N. & Huang, Y. F. Application of long short-term memory neural network technique for predicting monthly pan evaporation. Sci. Rep. 11, 20742 (2021).

Kumar, D. et al. Multi-ahead electrical conductivity forecasting of surface water based on machine learning algorithms. Appl. Water Sci. 13, 192 (2023).

Satpathi, A. et al. Estimation of crop evapotranspiration using statistical and machine learning techniques with limited meteorological data: a case study in Udham Singh Nagar, India. Theor. Appl. Climatol. https://doi.org/10.1007/s00704-024-04953-3 (2024).

Venujayakanth, B., Dudhat, A. S., Swaminathan, B. & Ardeshana, N. J. Price integration analysis of Major Groundnut domestic markets in India. Econ. Aff. 62, 233 (2017).

Kumari, A. A., Subbarao, D. V. & Suseela, K. Cointegration and Market Integration: an application to the Oilseeds markets in India. Trends Biosci. 10, 4242–4252 (2017).

Singh, V. P., Singh, R., Paul, P. K., Bisht, D. S. & Gaur, S. Time Series Analysis. in Hydrological Processes Modelling and Data Analysis. Water Science and Technology Library (eds. Singh, V. P., Singh, R., Paul, P. K., Bisht, D. S. & Gaur, S.) 35–71 (Springer Nature Singapore, 2024). https://doi.org/10.1007/978-981-97-1316-5_3.

Haque, M. A. & Ahmed, A. Time Series modeling and forecasting on GDP Data of Bangladesh: an application of Arima Model. Int. J. Latest Technol. Eng. Manag Appl. Sci. XIII, 199–207 (2024).

Box, G. E. P., Jenkins, G. M., Reinsel, G. C. & Ljung, G. M. Time Series Analysis: Forecasting and Control (Wiley, 2015).

Elbeltagi, A. et al. Modelling daily reference evapotranspiration based on stacking hybridization of ANN with meta-heuristic algorithms under diverse agro-climatic conditions. Stoch. Environ. Res. Risk Assess. 36, 3311–3334 (2022).

Kang, H., He, B., Song, R. & Wang, W. ECAPA-TDNN based online discussion activity-level evaluation. Sci. Rep. 14, 14744 (2024).

Wang, Y. G., Wu, J., Hu, Z. H. & McLachlan, G. J. A new algorithm for support vector regression with automatic selection of hyperparameters. Pattern Recognit. 133, 108989 (2023).

Elbeltagi, A., Al-Mukhtar, M., Kushwaha, N. L., Al-Ansari, N. & Vishwakarma, D. K. Forecasting monthly pan evaporation using hybrid additive regression and data-driven models in a semi-arid environment. Appl. Water Sci. 13, 42 (2023).

Huang, N. E. et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London. Ser. A Math. Phys. Eng. Sci. 454, 903–995 (1998).

Ren, Y., Suganthan, P. N. & Srikanth, N. A. Comparative study of empirical Mode decomposition-based short-term wind speed forecasting methods. IEEE Trans. Sustain. Energy. 6, 236–244 (2015).

Wu, Z. & Huang, N. E. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv. Adapt. Data Anal. 01, 1–41 (2009).

Yeh, J. R., Shieh, J. S. & Huang, N. E. Complementary ensemble empirical mode decomposition: a novel noise enhanced data analysis method. Adv. Adapt. Data Anal. 02, 135–156 (2010).

Torres, M. E., Colominas, M. A., Schlotthauer, G. & Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4144–4147 (IEEE, 2011). https://doi.org/10.1109/ICASSP.2011.5947265.

Niu, H. & Xu, K. A hybrid model combining variational mode decomposition and an attention-GRU network for stock price index forecasting. Math. Biosci. Eng. 17, 7151–7166 (2020).

Prasad, R., Deo, R. C., Li, Y. & Maraseni, T. Input selection and performance optimization of ANN-based streamflow forecasts in the drought-prone Murray Darling Basin region using IIS and MODWT algorithm. Atmos. Res. 197, 42–63 (2017).

Ağbulut, Ü., Gürel, A. E. & Biçen, Y. Prediction of daily global solar radiation using different machine learning algorithms: evaluation and comparison. Renew. Sustain. Energy Rev. 135, 110114 (2021).

Xu, X. & Zhang, Y. House price forecasting with neural networks. Intell. Syst. Appl. 12, 200052 (2021).

Xu, X. & Zhang, Y. Commodity price forecasting via neural networks for coffee, corn, cotton, oats, soybeans, soybean oil, sugar, and wheat. Intell. Syst. Acc. Financ Manag. 29, 169–181 (2022).

Xu, X. & Zhang, Y. Coking coal futures price index forecasting with the neural network. Min. Econ. 36, 349–359 (2023).

Youssef, A. Online Sequence-Based Deep Learning Approach for Metallic Debossed and Embossed Turbomachinery Blade Text Recognition Application. in Day 1 Mon, February 12, 2024 D011S011R004 (IPTC, 2024). https://doi.org/10.2523/IPTC-23115-MS.

Wang, J., Zhou, Y., Zhuang, L., Shi, L. & Zhang, S. A model of maritime accidents prediction based on multi-factor time series analysis. J. Mar. Eng. Technol. 22, 153–165 (2023).

Ley, C. & Verdebout, T. Modern directional statistics. Chapman Hall/CRC. https://doi.org/10.1201/9781315119472 (2017).

Markuna, S. et al. Application of innovative machine learning techniques for long-term Rainfall Prediction. Pure Appl. Geophys. 180, 335–363 (2023).

Khan, A. M. & Osińska, M. Comparing forecasting accuracy of selected grey and time series models based on energy consumption in Brazil and India. Expert Syst. Appl. 212, 118840 (2023).

Wójcik, M. & Siatkowski, I. The effect of cranial techniques on the heart rate variability response to psychological stress test in firefighter cadets. Sci. Rep. 13, 7780 (2023).

Worden, K., Iakovidis, I. & Cross, E. J. New results for the ADF statistic in nonstationary signal analysis with a view towards structural health monitoring. Mech. Syst. Signal. Process. 146, 106979 (2021).

Inglada-Perez, L. A Comprehensive Framework for uncovering Non-linearity and Chaos in Financial markets: empirical evidence for four Major Stock Market Indices. Entropy. 22, 1435 (2020).

Maharana, K., Mondal, S. & Nemade, B. A review: data pre-processing and data augmentation techniques. Glob Transitions Proc. 3, 91–99 (2022).

Wang, J. N., Du, J., Jiang, C. & Lai, K. K. Chinese Currency Exchange Rates Forecasting with EMD-Based neural network. Complexity. 2019, 1–15 (2019).

Dong, J., Dai, W., Tang, L. & Yu, L. Why do EMD-based methods improve prediction? A multiscale complexity perspective. J. Forecast. 38, 714–731 (2019).

Zhang, X., Wang, T. & He, S. Prediction of groundwater depth based on CEEMD-BP coupling model in irrigation area. Desalin. Water Treat. 228, 444–455 (2021).

Gao, C. et al. Hyperspectral Prediction Model of Nitrogen Content in Citrus leaves based on the CEEMDAN–SR Algorithm. Remote Sens. 15, 5013 (2023).

Bennia, F., Moussaoui, S., Boutalbi, M. C. & Messaoudi, N. Comparative study between EMD, EEMD, and CEEMDAN based on De-Noising Bioelectric Signals. in 8th International Conference on Image and Signal Processing and their Applications (ISPA) 1–6 (IEEE, 2024). https://doi.org/10.1109/ISPA59904.2024.10536839.

Gyamerah, S. A. & Owusu, V. Short- and long-term weather prediction based on a hybrid of CEEMDAN, LMD, and ANN. PLoS One. 19, e0304754 (2024).

Zhang, P. & Ci, B. Deep belief network for gold price forecasting. Resour. Policy. 69, 101806 (2020).

The authors appreciate the Researchers Supporting Project number (RSP 2024R75), King Saud University, Riyadh, Saudi Arabia. The authors acknowledge the technical support from Rabindra Nath Tagore Agriculture College, Deoghar during this research.

Open Access funding enabled and organized by Projekt DEAL. This research received no external funding.

Department of Agricultural Statistics & Computer Application, Rabindra Nath Tagore Agriculture College, Birsa Agricultural University, Ranchi, 834006, India

Pramit Pandit

Department of Agricultural Engineering, Rabindra Nath Tagore Agriculture College, Birsa Agricultural University, Ranchi, 834006, India

Atish Sagar

Department of Agricultural Statistics, Bidhan Chandra Krishi Viswavidyalaya, Mohanpur, 741252, India

Bikramjeet Ghose & Moumita Paul

Department of Civil Engineering, University of Applied Sciences, 23562, Lübeck, Germany

Ozgur Kisi

Department of Civil Engineering, Ilia State University, Tbilisi, 0162, Georgia

Ozgur Kisi

School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, 02841, South Korea

Ozgur Kisi

Department of Irrigation and Drainage Engineering, G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India

Dinesh Kumar Vishwakarma

Department of Zoology, College of Science, King Saud University, Riyadh, 11472, Saudi Arabia

Lamjed Mansour

Department of Environmental Science, Parul Institute of Applied Sciences, Parul University, Vadodara, 391760, Gujarat, India

Krishna Kumar Yadav

Environmental and Atmospheric Sciences Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq

Krishna Kumar Yadav

Conceptualization, P.P. and A.S.; methodology, P.P. and B.G.; software, P.P., A.S. and B.G.; validation, B.G., M.P. and L.M.; formal analysis, P.P., B.G. and M.P.; investigation, P.P. and A.S.; data curation, A.S. and D.K.V.; writing—original draft preparation, P.P. and M.P.; writing—review and editing, A.S., O.K., D.K.V., L.M. and K.K.Y.; visualization, A.S. and D.K.V.; supervision, P.P. and A.S. All authors have read and agreed to the published version of the manuscript.

Correspondence to Ozgur Kisi, Dinesh Kumar Vishwakarma or Lamjed Mansour.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Pandit, P., Sagar, A., Ghose, B. et al. Hybrid modeling approaches for agricultural commodity prices using CEEMDAN and time delay neural networks. Sci Rep 14, 26639 (2024). https://doi.org/10.1038/s41598-024-74503-4

Received: 20 May 2024

Accepted: 26 September 2024

Published: 04 November 2024

DOI: https://doi.org/10.1038/s41598-024-74503-4
