3 Fine-tuning a Pre-trained Model using Local Images

A.V. Ronquillo

Fastai Model Overview

Machine learning (ML) is a branch of AI that enables computers to learn from data. ML algorithms analyze patterns and trends in large datasets, then use what they have learned to make predictions, recognize objects, and generate new ideas.[1] For designers, this means that machine learning can streamline specific aspects of their creative endeavours by automating routine tasks and generating innovative solutions.[2] With Fastai, users can harness the power of machine learning without requiring deep technical expertise. Using Fastai’s features, a machine learning model can be adapted to recognize and differentiate between animal classes.[3] This exercise demonstrates how Fastai can be used to tackle real-world problems in image classification.

Fastai was created by Jeremy Howard and is available on GitHub as an open-source Python library under the Apache 2 license. It can be installed using the conda or pip package managers.[4] Fastai is a deep learning library built on PyTorch, which allows it to leverage the strengths of PyTorch’s components while adding its own features.[5]

The library has a layered architecture that aims to make deep learning more approachable through accessible, high-level abstractions.[6] Its easy-to-use APIs are intended to simplify the training process.[7] Fastai was also designed to be modular, so users can easily substitute different components as needed to process data and perform fine-tuning tasks.[8] Moreover, Fastai has a built-in function that loads popular convolutional neural network (CNN) architectures.[9] For this example, a learner function was used to set up a CNN with a pre-trained ResNet34 model as the backbone, leveraging transfer learning.

The ResNet34 model is first pre-trained on a large dataset such as ImageNet.[10] In this case, the model is fine-tuned on a specific dataset of local images of the fox, squirrel, and deer classes. This approach allows the model to reuse the features learned from the large dataset and adapt them to the new dataset in relatively few training epochs. Fastai simplifies fine-tuning by handling learning rate adjustment, data loading, and performance evaluation, which makes deep learning more accessible.

Pre-made Dataset of Local Images

The Wild Winnipeg database consists of various animal classes that can be compiled into datasets of different sizes. In this case, the datasets cover the fox, squirrel, and deer species at 100, 500, 1000, and 5000 images for each class. Each dataset size is used to fine-tune the pre-trained ResNet34 model with Fastai. The images for each species and dataset size are gathered from local folders before the model is trained. More details about gathering the images and establishing directories for the script can be found in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook (figure 3.1).

Image of a Jupyter Notebook code snippet showing the integration of local images for the classification model. The code is displayed in a cell-based format, with various modules and functions such as importing libraries, loading datasets, and defining model architectures. The image appears to be a screenshot of a Google CoLab interface, with menus and tabs visible at the top and a code editor window below.
Figure 3.1: Example of a module and snippets of code that focus on integrating images from the local folder to prepare the files for the classification model. This image and notebook can be found in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook created by Zhenggang Li & A.V. Ronquillo.
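
The directory preparation described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: the folder layout, the helper names `sample_images` and `plan_subsets`, the fixed random seed, and the accepted file extensions are all assumptions made for the example.

```python
import random
from pathlib import Path

# Class names and sizes follow the chapter; everything else is hypothetical.
CLASSES = ("fox", "squirrel", "deer")
SIZES = (100, 500, 1000, 5000)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_images(source_dir, n, seed=0):
    """Return n image paths chosen reproducibly from one class folder."""
    images = sorted(
        p for p in Path(source_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if len(images) < n:
        raise ValueError(f"{source_dir} holds {len(images)} images, need {n}")
    # A seeded generator makes the same subset come back on every run.
    return random.Random(seed).sample(images, n)


def plan_subsets(source_root, target_root, sizes=SIZES, classes=CLASSES):
    """Map each (size, class) pair to (destination folder, chosen files)."""
    plan = {}
    for size in sizes:
        for cls in classes:
            dest = Path(target_root) / str(size) / cls
            chosen = sample_images(Path(source_root) / cls, size)
            plan[(size, cls)] = (dest, chosen)
    return plan
```

The plan only lists which files would go where. Copying them, for instance with `shutil.copy2` after creating each destination folder, is left as a separate step so the selection can be inspected first.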

Fine-tune the Model for Image Classification

The fine-tuning script includes various features that demonstrate Fastai’s user-friendliness: it reduces model training to a simplified process without giving up advanced techniques. The model is evaluated on accuracy and trained for as few as 4 epochs, adapting the network’s weights to the specific datasets of animal images. The local dataset directory is structured as a single folder in which each dataset size has its own subfolder containing the labelled animal classes (figure 3.2). The images are processed through a single CNN learner, which encapsulates complex operations in single, easy-to-use functions. In this way, users do not need to write extra lines of code for tasks like data augmentation, batching, or setting up the learning process. This abstraction allows for a quick setup that trains models without needing deep expertise in neural networks or optimization techniques. More details about the fine-tuning process, such as pre-processing, image data loaders, and file handling, can be found in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook.

Screenshot of a machine learning tutorial module showing CNN initialization code using ResNet34 architecture. The module demonstrates fine-tuning a pre-trained model on animal image datasets (fox, squirrel, deer) with different sample sizes (100, 500, 1000, 5000). Code snippets show learner setup and results visualization for image classification performance assessment.
Figure 3.2: Example of a module and snippets of code (left) that focus on initializing the CNN learner, fine-tuning the model, and specifying metrics. The directory of the datasets (right) is structured to serve this process within the script. These images and notebook can be found in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook created by Zhenggang Li & A.V. Ronquillo.
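
As a rough illustration of the learner setup described in this section, the sketch below uses fastai's `ImageDataLoaders.from_folder`, `vision_learner` (called `cnn_learner` in older fastai releases), and `fine_tune`. It is not the notebook's code: the helper names, the 20% validation split, the seed, and the 224-pixel resize are assumptions chosen for the example.

```python
from pathlib import Path

DATASET_SIZES = (100, 500, 1000, 5000)
EPOCHS = 4  # the chapter fine-tunes for four epochs


def dataset_dir(root, size):
    """Folder for one dataset size, e.g. <root>/100 (hypothetical layout)."""
    return Path(root) / str(size)


def fine_tune_on(folder, epochs=EPOCHS, valid_pct=0.2, seed=42):
    """Fine-tune a pre-trained ResNet34 on a folder of class subfolders."""
    # Imported here so the helpers above can be used without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy, resnet34, vision_learner,
    )

    # Labels come from the subfolder names (fox, squirrel, deer);
    # valid_pct holds out part of the images as a validation set.
    dls = ImageDataLoaders.from_folder(
        folder, valid_pct=valid_pct, seed=seed, item_tfms=Resize(224)
    )
    # ResNet34 backbone with pre-trained weights, reporting accuracy.
    learn = vision_learner(dls, resnet34, metrics=accuracy)
    learn.fine_tune(epochs)
    return learn
```

A loop such as `for size in DATASET_SIZES: learn = fine_tune_on(dataset_dir("wild_winnipeg", size))` would train one model per dataset size, where `"wild_winnipeg"` is a placeholder root folder. By default, `fine_tune` first trains the new head for one epoch with the backbone frozen and then trains the whole network for the requested number of epochs.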

Confusion Matrices for each Dataset Size

To display the model’s classification results and overall performance, a confusion matrix is plotted for each dataset size. A confusion matrix plots the number of images the model predicts for each class against their actual class, creating a grid of cells in which correct classifications fall on the diagonal and errors fall off it. The ground truth labels serve as the baseline for comparison with the model’s predictions, which yields counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). Confusion matrices therefore excel at providing insight into class-specific accuracy and errors. Using Fastai’s features and the pre-trained ResNet34 architecture, the objective is to create a matrix for each dataset size of 100, 500, 1000, and 5000 images. More details about how these visuals can be plotted to interpret model results are in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook (figure 3.3).

Screenshot of machine learning tutorial showing two modules: confusion matrix plotting using fastai's ClassificationInterpretation to visualize model performance with labeled datasets, and dataset processing module that accesses Google Drive files and sets parameters for processing animal image datasets (fox, squirrel, deer) with 100-sample size configuration.
Figure 3.3: Example of a module and snippets of code that focus on interpreting model results into a confusion matrix plot for a given dataset size. This image and notebook can be found in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook created by Zhenggang Li & A.V. Ronquillo.
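
To make the TP, TN, FP, and FN terminology concrete, here is a small, library-free sketch of how a confusion matrix can be counted from labels. The function names and the seven example labels are invented for illustration and are not results from the chapter.

```python
def confusion_matrix(actual, predicted, classes):
    """Count images per (actual, predicted) pair; rows are actual classes."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def one_vs_rest(matrix, classes, target):
    """Return (TP, FP, FN, TN) with `target` treated as the positive class."""
    k = classes.index(target)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                         # target predicted as target
    fn = sum(matrix[k]) - tp                  # target predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # another class predicted as target
    tn = total - tp - fn - fp                 # everything else
    return tp, fp, fn, tn


# Hypothetical labels, for illustration only.
classes = ["deer", "fox", "squirrel"]
actual = ["deer", "deer", "fox", "fox", "fox", "squirrel", "squirrel"]
predicted = ["deer", "deer", "fox", "deer", "squirrel", "squirrel", "squirrel"]

matrix = confusion_matrix(actual, predicted, classes)
print(matrix)                               # [[2, 0, 0], [1, 1, 1], [0, 0, 2]]
print(one_vs_rest(matrix, classes, "fox"))  # (1, 0, 2, 4)
```

With a trained fastai learner, the same kind of grid can be drawn directly with `ClassificationInterpretation.from_learner(learn).plot_confusion_matrix()`.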

Model training also produces metrics that represent the model’s performance in the task of image classification. These metrics are associated with terms such as accuracy, training and validation losses, and epochs. Definitions of these terms can be found in the glossary. Fastai’s visualization tools display the model’s training progress by showing validation losses and curves, as well as learning rate schedules for every epoch. Even at the small dataset size of 100 images, the model was able to predict the fox, squirrel, and deer classes with a high level of accuracy by the end of training. However, at this size the accuracy starts quite low in the first epochs; it is not until the fourth and final epoch that the model reaches 94.8% accuracy. With the larger dataset sizes of 500, 1000, and 5000 images, the accuracy stays consistently high in every epoch (figure 3.4).

This image displays four training charts comparing model performance across dataset sizes (100, 500, 1000, 5000 samples) over 4 epochs. Each tracks accuracy, training loss, and validation loss. The 100-sample dataset shows dramatic improvement but overfitting, while larger datasets demonstrate more stable performance and better generalization, with the 5000-sample dataset showing optimal stability.
Figure 3.4: Metrics of the model’s performance for each epoch, illustrating training and validation losses as well as accuracies over time. These graphs were created by A.V. Ronquillo through the use of a Python script that was written with the assistance of ChatGPT.

The average accuracy of the first epoch of these larger datasets is 89.4%. Compared with the 41.5% first-epoch accuracy on the 100-image dataset, this shows that Fastai and ResNet34 classify accurately sooner and more consistently when working with larger datasets. The final accuracies of the larger datasets after 4 epochs are 98.2% for 500 images, 96.7% for 1000 images, and 97.8% for 5000 images. The decrease of 1.5 percentage points, from 98.2% to 96.7%, between the 500 and 1000 datasets may be due to several factors. One is the increase in dataset size itself: even though all the images are of good quality and properly labelled, a larger collection may introduce more image variability, which can affect model performance. Another is that a larger dataset can be more imbalanced between classes, meaning that some species are over-represented or under-represented relative to others.[11]

These results can be further examined through the confusion matrix of each dataset. For instance, the matrix for the 100 dataset shows that the model performs very well on the deer and squirrel species, with high true positives and relatively low false negatives (figure 3.5). The fox class may achieve good precision, but it still has the lowest performance because of its low number of true positives. These observations indicate that the model’s recognition of foxes could improve by augmenting the dataset of fox images.

This image shows four confusion matrices comparing classification performance across dataset sizes (100, 500, 1000, 5000 samples) for three classes: deer, fox, and squirrel. Darker blue indicates higher values. Performance improves with larger datasets, showing better diagonal accuracy and fewer misclassifications. The 5000-sample dataset demonstrates the highest classification accuracy across all classes.

Figure 3.5: Confusion matrices for each dataset size of 100, 500, 1000, and 5000 images. These images were generated by the script in the Fastai Dataset Sizes Confusion Matrix Jupyter Notebook created by Zhenggang Li & A.V. Ronquillo.
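
The distinction drawn above between a class having good precision and still having few true positives can be checked numerically. The sketch below computes per-class precision and recall from a confusion matrix. The counts are hypothetical, chosen only to mimic the pattern described for the fox class, and are not the values shown in figure 3.5.

```python
def precision_recall(matrix, classes):
    """Per-class (precision, recall) from a confusion matrix (rows = actual)."""
    scores = {}
    for k, cls in enumerate(classes):
        tp = matrix[k][k]
        actual_total = sum(matrix[k])                    # TP + FN
        predicted_total = sum(row[k] for row in matrix)  # TP + FP
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        scores[cls] = (precision, recall)
    return scores


def overall_accuracy(matrix):
    """Share of all images that sit on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts (rows = actual, columns = predicted).
classes = ["deer", "fox", "squirrel"]
hypothetical = [
    [20, 0, 0],   # every deer recognized
    [3, 15, 2],   # five foxes mistaken for other species
    [0, 0, 20],   # every squirrel recognized
]

scores = precision_recall(hypothetical, classes)
print(scores["fox"])                   # (1.0, 0.75)
print(overall_accuracy(hypothetical))  # 55 of 60 images correct
```

In this made-up case every image predicted as a fox really is a fox (precision 1.0), yet a quarter of the actual foxes are missed (recall 0.75). That is the kind of gap that more fox images would be expected to narrow.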

Notes


  1. Abdallah Abbas, Khairil Imran, and Choo-Yee Ting, “User Experience Design Using Machine Learning: A Systematic Review,” IEEE Access 10 (January 2022): 1-1, https://doi.org/10.1109/ACCESS.2022.3173289.
  2. Ibid.
  3. Pascal Schröder, “Using Fastai for Image Classification,” Towards Data Science, June 14, 2019, https://towardsdatascience.com/using-fastai-for-image-classification-54d2b39511ce.
  4. Jeremy Howard and Sylvain Gugger, “Fastai: A Layered API for Deep Learning,” Fastai Blog, August 13, 2021, https://www.fast.ai/posts/2020-02-13-fastai-A-Layered-API-for-Deep-Learning.html.
  5. Fastai, “Fastai Documentation,” accessed July 13, 2024, https://docs.fast.ai/.
  6. Howard and Gugger, “Fastai: A Layered API for Deep Learning.”
  7. Fastai, fastai, GitHub repository, last modified July 14, 2024, accessed June 28, 2024, https://github.com/fastai/fastai.
  8. Ibid.
  9. Ibid.
  10. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep Residual Learning for Image Recognition,” arXiv, December 10, 2015, https://arxiv.org/abs/1512.03385.
  11. Sara Beery et al., “Synthetic Examples Improve Generalization for Rare Classes,” arXiv:1904.05916v2 [cs.CV], May 14, 2019, https://arxiv.org/abs/1904.05916.

License

Teaching with Images Copyright © 2025 by Mark Meagher, Kamni Gill, A.V. Ronquillo, Ryleigh Bruce, Mitchell Constable, Matthew Glowacki, Zhenggang Li, and Owen Swendrowski-Yerex is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.