Abstract

Dimensionality reduction techniques streamline machine learning by reducing data complexity, improving model accuracy, and cutting computational costs. They remove noise and irrelevant features, making models faster and more efficient. These techniques also enhance data visualization and interpretation by condensing data into manageable, insightful dimensions. Ultimately, dimensionality reduction leads to simpler, more interpretable models without sacrificing critical information, making it a cornerstone of efficient data analysis and machine learning applications. In theory, feature extraction creates new features by combining multiple existing ones, concentrating more information into fewer, more informative dimensions. In contrast, feature selection chooses a subset of the original features without altering their content. In this paper, a feature selection method based on Principal Component Analysis (PCA) is proposed, together with a comparative study of PCA as a feature extraction technique against the newly proposed feature selection method. The proposed method uses the variance captured by the principal components to identify the original features that contribute most to that variance and selects them as the best feature subset.
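
The abstract does not give the implementation details, but a minimal sketch of one plausible reading of such a PCA-based feature selection, assuming scikit-learn's PCA, is shown below. The function and variable names (e.g. pca_feature_ranking, load_dataset) are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_feature_ranking(X, n_components=1):
    """Rank original features by their contribution to the variance
    captured by the leading principal components."""
    pca = PCA(n_components=n_components).fit(X)
    # components_ has shape (n_components, n_features); weight each
    # component's absolute loadings by the fraction of variance it explains,
    # then sum the contributions per original feature.
    scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    return np.argsort(scores)[::-1]  # indices of highest-scoring features first


# Example: keep only the single best-scoring original feature.
# X, y = load_dataset()               # e.g. NSL-KDD or the mushroom dataset
# top = pca_feature_ranking(X, n_components=1)[:1]
# X_selected = X[:, top]
```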

Experimental results demonstrate that the proposed PCA-based feature selection achieves comparable or better performance across various classifiers, maintaining high accuracy and precision even with fewer features. For instance, when the proposed method was applied to the Network Security Laboratory - Knowledge Discovery in Databases (NSL-KDD) dataset to select only one feature, and six classifiers (Decision Tree, Naive Bayes, Logistic Regression, K-Neighbors Classifier, XGBoost, and AdaBoost) were employed to evaluate performance, accuracies of 80.88%, 81.29%, 43.07%, 44.53%, 84.94%, and 82.87% were obtained with the listed classifiers, respectively. When PCA was instead used for feature extraction, the corresponding accuracies, in the same classifier order, were 76.64%, 76.10%, 43.07%, 47.40%, 80.57%, and 82.05%, showing that the proposed method delivers higher accuracies. Similarly, for the mushroom dataset, the proposed method yielded accuracies of 51.38%, 51.38%, 48.62%, 51.06%, 87.08%, and 86.22%, compared to 50.14%, 50.30%, 50.88%, 50.59%, 73.60%, and 71.51% for PCA feature extraction.
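
A hedged sketch of how such a comparison could be run is given below: each classifier is trained on the reduced data and scored on held-out test accuracy. The train/test split, the Gaussian Naive Bayes variant, and the default hyperparameters are assumptions for illustration, not the paper's exact protocol.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier

# The six classifiers named above, with default settings
# (the paper's exact hyperparameters are not specified here).
classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Neighbors Classifier": KNeighborsClassifier(),
    "XGBoost": XGBClassifier(),
    "AdaBoost": AdaBoostClassifier(),
}

# X_reduced is either the PCA-transformed data (feature extraction) or the
# subset of original columns chosen by the proposed method (feature selection).
# X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.3)
# for name, clf in classifiers.items():
#     clf.fit(X_train, y_train)
#     print(name, accuracy_score(y_test, clf.predict(X_test)))
```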
