Interpretability

Investigating the prediction of semiconductor wafer production through classification AI models

Ahan Mathew, Rajat Dandenkar

Global Indian International School

May 22, 2026

Abstract

Semiconductor wafer manufacturing is highly susceptible to defects that reduce production yield and increase manufacturing costs. This study investigates the use of machine learning classification models to predict defective semiconductor wafers at an early stage of production, using the SECOM dataset from the University of California, Irvine (UCI) Machine Learning Repository. Several classification approaches, including Logistic Regression, Random Forest, Support Vector Machine (SVM), Gradient Boosting, and XGBoost, were evaluated alongside imbalance-handling techniques such as SMOTE oversampling and cost-sensitive learning. Model performance was assessed using precision, recall, F1-score, accuracy, and AUC. Results showed that baseline models achieved high overall accuracy because non-defective samples dominate the dataset, but performed poorly in detecting defective wafers. Hybrid approaches using SMOTE and class balancing slightly improved minority-class recall, although defect detection performance remained limited because of the severe class imbalance within the dataset. The findings highlight both the challenges and the potential of AI-based approaches for semiconductor defect prediction, and suggest that future improvements should focus on advanced resampling methods, imbalance-specific algorithms, and ensemble learning techniques to enhance predictive reliability in manufacturing environments.
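The gap between accuracy and defect detection described above can be illustrated with a small worked example. The sketch below uses made-up counts (1000 wafers, 70 of them defective) and two hypothetical predictors. None of these numbers come from the SECOM dataset or from the study's results, and the helper names `confusion_counts` and `summarize` are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as 'defective'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the defective class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 70 defective wafers (1) followed by 930 good wafers (0).
y_true = [1] * 70 + [0] * 930

# Predictor A (assumed): labels every wafer as non-defective.
always_pass = [0] * 1000

# Predictor B (assumed): catches 35 of the 70 defects and raises 93 false alarms.
rebalanced = [1] * 35 + [0] * 35 + [1] * 93 + [0] * 837

baseline = summarize(y_true, always_pass)
balanced = summarize(y_true, rebalanced)

print("always-pass:", baseline)
print("rebalanced: ", balanced)
```

In this toy setup the always-pass predictor has the higher accuracy yet detects no defective wafers, while the second predictor gives up some accuracy in exchange for non-zero recall. This is why minority-class recall, precision, and F1-score are reported alongside accuracy.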

Full Paper