An Enhanced Online Boosting Ensemble Classification Technique to Deal with Data Drift
Keywords: Boosting, concept drift, drift detectors, data stream mining, ensemble classification
Over the last two decades, big data analytics has become a necessity in the research industry. Stream data mining is essential in many areas because a wide variety of online applications generate data in the form of streams. Beyond the size and speed of a data stream, concept drift is a difficult issue to handle. This paper proposes an Enhanced Boosting-like Online Learning Ensemble method, based on a heuristic modification of the Boosting-like Online Learning Ensemble (BOLE). The algorithm is improved by implementing a policy under which a data instance retains its previous state. The selection and voting strategy applied to an instance during the boosting phase is also improved. Extensive experiments on a variety of real-world and synthetic datasets show that the proposed method adequately addresses the drift detection problem. It statistically outperformed several state-of-the-art boosting-based ensembles dedicated to data stream mining. Compared with other boosting-based ensembles on datasets with concept drift, the proposed method improved overall accuracy by 1.30 percent to 14.45 percent.
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.