From Pixels to Predictions: Our First Dive into Image Recognition
By Bitan Das, Baratam Nidhishri, Bhavye Garg
The LNM Institute of Information Technology
Have you ever wondered how your phone recognizes faces or how computers distinguish between different objects? It’s all thanks to the magic of image recognition! As a team of college students eager to explore computer vision, we built our very first model to identify handwritten digits — a classic challenge that taught us the foundations of deep learning.
This project marked our first real step into teaching machines how to “see” and make predictions from raw pixel data.
The “Why”: Seeing Beyond the Pixels
Unlike humans, computers don’t see shapes — they see numbers. Each image is simply a grid of pixel values. Teaching a machine to distinguish a handwritten ‘7’ from a ‘1’ is a fascinating challenge and a perfect introduction to Artificial Neural Networks (ANNs). Our goal was to transform these numerical grids into meaningful predictions.
Data at Hand: The Famous MNIST Dataset
We used the iconic MNIST dataset, containing thousands of 28×28 grayscale images of handwritten digits (0–9). Often called the “Hello World” of deep learning, MNIST is ideal for beginners to understand image classification without unnecessary complexity.
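Loading MNIST is a one-liner because it ships with Keras. A minimal sketch (the variable names here are our own choices, not from the original code):

```python
from tensorflow.keras.datasets import mnist

# Download (or load from cache) the standard MNIST split:
# 60,000 training images and 10,000 test images, each 28x28 grayscale.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,) -- one digit label (0-9) per image
```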
Preparing Our “Eyes”: Data Preprocessing
Before training, the raw image data required careful preprocessing:
- Normalization: Pixel values were scaled from 0–255 down to 0–1, making learning more stable and efficient.
- Flattening: Each 28×28 image was converted into a 1D vector of 784 values so it could be fed into our neural network.
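Both steps are a few lines of NumPy. This sketch uses a small random batch as a stand-in for the real images, just to show the transformations:

```python
import numpy as np

# Stand-in for a batch of five 28x28 grayscale images with values 0-255
images = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Normalization: scale pixel values from 0-255 down to 0-1
images = images.astype("float32") / 255.0

# Flattening: turn each 28x28 grid into a 1D vector of 784 values
flat = images.reshape(len(images), 28 * 28)

print(flat.shape)  # (5, 784)
```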
Building Our Brain: The Keras Model Architecture
We chose TensorFlow’s Keras API for its simplicity and flexibility. Our ANN architecture included:
- Input Layer: Accepting 784 pixel values per image.
- Hidden Layers: Two dense layers with 128 and 32 neurons, using ReLU activation to learn complex digit features.
- Output Layer: 10 neurons (digits 0–9) with Softmax activation to produce probability scores.
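In Keras, the architecture above fits in a single `Sequential` definition. A sketch using the layer sizes listed in the post:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),             # flattened 28x28 image
    layers.Dense(128, activation="relu"),   # first hidden layer
    layers.Dense(32, activation="relu"),    # second hidden layer
    layers.Dense(10, activation="softmax"), # one probability per digit 0-9
])

model.summary()
```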
The Learning Process: Training & Evaluation
The model was compiled with the Adam optimizer and the sparse categorical cross-entropy loss. We trained it for 20 epochs, during which its accuracy improved rapidly.
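The compile-and-fit step looks like the sketch below. To keep it self-contained it trains on a small random batch for 2 epochs; the actual project trains on the flattened MNIST images for 20 epochs:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data so the sketch runs anywhere;
# the real project uses the 60,000 flattened MNIST training images.
x_train = np.random.rand(256, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))

model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# epochs=2 here for speed; the project used epochs=20
history = model.fit(x_train, y_train, epochs=2, batch_size=32,
                    validation_split=0.1, verbose=0)
```

The `history` object records per-epoch loss and accuracy, which is what we later plotted.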
- Accuracy & Loss Graphs: Helped us track learning progress and detect overfitting.
- Confusion Matrix: Visualized correct predictions and common misclassifications between similar-looking digits.
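The confusion matrix in particular is easy to compute with scikit-learn. A tiny illustration with hypothetical labels (not our real predictions), showing how a '7' misread as a '1' shows up off the diagonal:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions for illustration
y_true = np.array([0, 1, 2, 2, 7, 1, 7])
y_pred = np.array([0, 1, 2, 2, 1, 1, 7])  # one '7' predicted as '1'

# Row i, column j counts images of digit i predicted as digit j;
# correct predictions sit on the diagonal.
cm = confusion_matrix(y_true, y_pred, labels=range(10))

print(cm[7, 1])  # 1 -- the single 7-vs-1 misclassification
```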
The final model achieved an impressive ~97.4% accuracy on the 10,000 unseen test images.
Putting It to the Test: Real-World Predictions
The most exciting moment was watching the model predict digits it had never seen before. Seeing correct predictions in real time truly brought the power of neural networks to life.
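Prediction itself is one call to `model.predict`, which returns the ten softmax probabilities; `argmax` picks the most likely digit. This sketch rebuilds the (untrained) architecture and feeds a random stand-in image purely to show the API shape; in the project the trained model and real digit images are used:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Untrained copy of the architecture, just to demonstrate the prediction API
model = keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

sample = np.random.rand(1, 784).astype("float32")  # stand-in flattened digit
probs = model.predict(sample, verbose=0)           # shape (1, 10)
digit = int(np.argmax(probs))                      # most likely digit, 0-9
```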
Our Deep Learning Debut: What We Learned
- Core concepts behind Artificial Neural Networks
- The importance of data preprocessing
- How to train, validate, and evaluate deep learning models
- The value of tools like Google Colab for collaboration
This project was more than recognizing numbers — it was about demystifying computer vision and discovering the potential of deep learning. As students, this experience laid a strong foundation for future exploration in AI.
Explore the code: Handwritten Digit Recognition on GitHub. We welcome your feedback and ideas!