A research paper · 2026 · 12 chapters

Data‑Driven Customer Review Analysis

A machine learning approach to business decision‑making: predicting review outcomes across 99,441 Brazilian e‑commerce orders and turning that signal into strategy.

Best Model: Random Forest (F1 0.7565)

Top Accuracy: 75.40% (Random Forest, 80/20 split)

Strongest Signal: Delivery (delay_days, is_late, delivery_time)

Models Compared: 3 (Logistic Regression, Random Forest, XGBoost)

02 · Abstract

An overview of the study.

This project presents a data‑driven analysis of customer review outcomes using machine learning techniques to support business decision‑making. The study investigates how operational, product, and customer‑related variables influence review ratings and evaluates the predictive performance of three models: Logistic Regression, Random Forest, and XGBoost.

A complete analytical pipeline was implemented, covering data preprocessing, exploratory data analysis, feature engineering, model training, and performance evaluation. Random Forest achieved the highest F1 score of the three models (0.7565) and the top accuracy (75.40%) using an 80/20 split. Feature importance analysis indicated that delivery‑related attributes (delay_days, is_late, delivery_time) carry the strongest signal for review outcomes, with pricing characteristics and product features also contributing.

The findings provide actionable insights for improving operational efficiency and customer experience, while highlighting opportunities for future research and model enhancement.
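The abstract reports both an F1 score and an accuracy figure. As a minimal sketch of how the two metrics differ, the snippet below computes each from a list of true and predicted labels. It assumes a binary target with the positive class encoded as 1; the source does not state how the review outcome was encoded or which F1 averaging was used, and the toy labels are illustrative, not data from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy labels (assumption: 1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={acc:.4f}  f1={f1:.4f}")
```

On these toy labels accuracy is higher than F1 because the majority class is easy to predict while half of the positive cases are missed. That gap is why F1 is a useful selection criterion when classes are imbalanced.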

04 · Introduction

Why review outcomes matter.

Customer reviews have become a central component of modern digital marketplaces, influencing consumer decisions and shaping business reputation. As competition intensifies across industries, organizations increasingly rely on data‑driven methods to understand customer sentiment, identify operational weaknesses, and guide strategic improvements.

This project applies machine learning techniques to predict customer review ratings and identify the most influential factors driving customer satisfaction. The analytical workflow includes data preprocessing, exploratory data analysis, feature engineering, model training, and performance evaluation. By comparing multiple machine learning models, the study aims to determine which approach provides the most reliable predictive performance for business applications.

Beyond prediction, the project emphasizes translating analytical findings into actionable business insights. The results support evidence‑based decision‑making, enabling organizations to optimize operations, improve customer experience, and enhance overall business performance.

Theoretical foundation

From a data science perspective, predicting customer review outcomes is a classification problem: the goal is to predict a discrete outcome from measurable attributes. The three algorithms compared here cover a range of model complexity. Logistic Regression is a linear model that serves as an interpretable baseline, with coefficients that show the direction and strength of each feature's effect. Random Forest and XGBoost are tree‑based ensembles that can capture nonlinear relationships and interactions between variables that a linear model may miss. All three are widely used in predictive analytics because they scale to large datasets and expose which features drive predictions, through coefficients for Logistic Regression and feature importance scores for the tree ensembles.
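As an illustration of what "measurable attributes" can look like, the sketch below derives the three delivery features named as the strongest signal (delivery_time, delay_days, is_late) from order timestamps. The feature definitions, the function name, the timestamp format, and the input argument names are all assumptions made for this example; the source names the features but does not say how they were computed.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed input format
SECONDS_PER_DAY = 86400


def delivery_features(purchased_at, delivered_at, estimated_at):
    """Derive delivery features from three timestamp strings.

    Assumed definitions (not confirmed by the source):
      delivery_time: days from purchase to actual delivery
      delay_days:    days between actual and estimated delivery
                     (negative when the order arrived early)
      is_late:       1 if delivered after the estimate, else 0
    """
    purchased = datetime.strptime(purchased_at, TIMESTAMP_FORMAT)
    delivered = datetime.strptime(delivered_at, TIMESTAMP_FORMAT)
    estimated = datetime.strptime(estimated_at, TIMESTAMP_FORMAT)

    delivery_time = (delivered - purchased).total_seconds() / SECONDS_PER_DAY
    delay_days = (delivered - estimated).total_seconds() / SECONDS_PER_DAY

    return {
        "delivery_time": delivery_time,
        "delay_days": delay_days,
        "is_late": int(delay_days > 0),
    }


# Illustrative orders, not rows from the study's dataset.
late_order = delivery_features(
    "2018-01-01 10:00:00", "2018-01-13 10:00:00", "2018-01-10 10:00:00"
)
early_order = delivery_features(
    "2018-01-01 10:00:00", "2018-01-08 10:00:00", "2018-01-10 10:00:00"
)
print(late_order)
print(early_order)
```

Keeping delay_days signed preserves the difference between an order that arrived early and one that arrived exactly on time, while is_late gives the models a simple binary flag for the same event.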

Table of Contents

Five chapters, one investigation.