Main Body
2 Making Sense of Large Image Datasets
Ryleigh Bruce; Mitchell Constable; and Zhenggang Li
This section provides a brief overview of methods for exploring and automating the visualization and organization of large image datasets.
One challenge in making sense of large collections of images is understanding which questions to ask. It can be helpful to get an overview of the dataset, visualizing its contents to understand the overall distribution of images. It can also be helpful to filter the dataset, visualizing subsets of it to understand variation and local characteristics. In the case of the Wild Winnipeg animal dataset, images can be filtered by time of day (day versus night), time of year, location, and species.
Filtering and Search
Filtering Images
If image filenames contain information about the image contents, it becomes easy to sort and filter the images using these details. Image filenames in the Wild Winnipeg dataset are organized in the format ‘xxx’ and contain information about the date the image was captured, the camera number, and the animal species present in the image. This information can be used to sort the images or filter them into subsets. This can be accomplished with Python scripts, as described in the Image Filter Jupyter Notebook, and filters can also be applied using the search function in the macOS, Linux, or Windows file browser.
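A minimal sketch of this kind of filename-based filtering is shown below. The folder paths and the species keyword are hypothetical placeholders; the actual filename format of the dataset should be substituted for them.

```python
# A minimal sketch of filename-based filtering, assuming filenames contain
# tokens such as the species name (e.g. "Camera1_2021-05-21_Deer_001.jpg").
from pathlib import Path
import shutil

source_dir = Path("wild_winnipeg/images")      # hypothetical source folder
output_dir = Path("wild_winnipeg/deer_only")   # hypothetical destination folder
output_dir.mkdir(parents=True, exist_ok=True)

# Copy every image whose filename mentions the species of interest.
for image_path in source_dir.glob("*.jpg"):
    if "deer" in image_path.stem.lower():
        shutil.copy(image_path, output_dir / image_path.name)
```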
Manual Review of Random Selections
When analyzing large image sets, it can be helpful to review a sample of images selected randomly, as a method for understanding the content and variation contained within the dataset (figure 2.1). This process can be automated using a Python script that randomly selects and moves the desired number of images and displays them. The scripts in the Random Manual Review Jupyter Notebook were used by the Wild Winnipeg team to create data subsets to test YOLO model training.

Figure 2.1: Visual output of the Random Manual Review Jupyter Notebook. A sample of the selected images is displayed along with their file names to confirm script functionality.
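A minimal sketch of a random manual review is shown below; it samples a handful of images from a folder and displays them in a grid with their file names. The folder path and sample size are placeholders, not the settings used in the Random Manual Review Jupyter Notebook.

```python
# A minimal sketch of a random manual review, assuming a folder of .jpg images.
import random
from pathlib import Path

import matplotlib.pyplot as plt
from PIL import Image

image_paths = list(Path("wild_winnipeg/images").glob("*.jpg"))  # hypothetical folder
sample = random.sample(image_paths, k=min(9, len(image_paths)))

# Display the sampled images in a 3 x 3 grid, titled with their file names.
fig, axes = plt.subplots(3, 3, figsize=(9, 9))
for ax in axes.flat:
    ax.axis("off")
for ax, path in zip(axes.flat, sample):
    ax.imshow(Image.open(path))
    ax.set_title(path.name, fontsize=6)
plt.tight_layout()
plt.show()
```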
Selecting Random Interactive Images for Review
This workflow selects random images and displays them for review directly in the coding environment, as an alternative to manually selecting images to examine in closer detail (figure 2.2). The Python script presented in the Random Manual Review Jupyter Notebook displays each randomly selected image and allows the user to interact with it by zooming in on specific regions. This feature is particularly useful when analyzing finer details of an image set or when reviewing images analyzed by a machine learning model for accuracy.

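As a rough sketch of this kind of interactive review, the snippet below opens one randomly chosen image with matplotlib; in a Jupyter notebook with an interactive backend enabled (for example the ipympl extension, activated with %matplotlib widget), the figure toolbar provides zoom and pan tools for inspecting regions of the image. The folder path is a placeholder.

```python
# A hedged sketch: open one random image for close inspection.
import random
from pathlib import Path

import matplotlib.pyplot as plt
from PIL import Image

image_paths = list(Path("wild_winnipeg/images").glob("*.jpg"))  # hypothetical folder
path = random.choice(image_paths)

plt.figure(figsize=(8, 6))
plt.imshow(Image.open(path))
plt.title(path.name)
plt.axis("off")
plt.show()  # with an interactive backend, use the toolbar to zoom into regions
```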
Visualizing Image Information
This section provides a brief overview of the process of using AI-generated Python code to extract and visualize data from large sets of images.
Creating an Image Grid
One quick method for finding patterns in an image dataset is to create an image grid, a collection of miniature images that facilitates the visual identification of patterns across the dataset. For example, in figure 2.3, there are visible patterns in the image grid related to the weather (intensity of direct sunlight), the vegetation on the site, and the difference between day and night. A grid with thousands of small images can be used to analyze general patterns that do not require a detailed view of each individual image. Alternatively, a series of grids with fewer, larger images can be used to inspect for more subtle visual patterns that require a detailed view of each image. Both types of image grid can be generated using the script in the 100 Image-grid Jupyter Notebook.

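A minimal sketch of generating an image-grid montage with the Pillow library is shown below; the folder path, thumbnail size, and grid dimensions are placeholders and differ from those used in the 100 Image-grid Jupyter Notebook.

```python
# A minimal sketch of a thumbnail montage ("image grid") built with Pillow.
from pathlib import Path
from PIL import Image

image_paths = sorted(Path("wild_winnipeg/images").glob("*.jpg"))[:100]  # first 100 images
thumb_size = (64, 64)
columns, rows = 10, 10

# Paste each resized thumbnail into its cell of a single montage image.
grid = Image.new("RGB", (columns * thumb_size[0], rows * thumb_size[1]), "white")
for index, path in enumerate(image_paths):
    thumb = Image.open(path).convert("RGB").resize(thumb_size)
    x = (index % columns) * thumb_size[0]
    y = (index // columns) * thumb_size[1]
    grid.paste(thumb, (x, y))

grid.save("image_grid.png")
```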
Generating Visualizations from Extracted Data
Image datasets are often accompanied by a file containing associated metadata, including details such as image date, camera location, and other relevant information. The Wild Winnipeg project dataset includes an Excel file that logs the information associated with each image. Categories of data captured by the camera at the time of capture include date, time, temperature, moon phase, image quality, and ambient light. Additional categories of information that were added manually for each image include the animal species present in the image, species count, the activity the animals are engaged in, and animal gender. This detailed file allows the user to generate a series of charts and graphs visualizing the relationships between these categories.
A variety of visualization techniques can be generated using Python scripts (figure 2.4). Tutorials on generating bar graphs, line graphs, scatter plots, tree maps, and other custom visualizations can be found in the Tree Map (figure 2.5), Sightings Bar Graph, and Matplotlib Visualization Jupyter Notebooks.
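As an illustration, the sketch below reads the metadata spreadsheet with pandas and plots a bar graph of sightings per species. The file name and column name are hypothetical and would need to match the actual Excel file; reading .xlsx files with pandas also requires the openpyxl package.

```python
# A minimal sketch of a metadata visualization, assuming a "Species" column
# exists in the spreadsheet (both the file name and column name are placeholders).
import pandas as pd
import matplotlib.pyplot as plt

metadata = pd.read_excel("wild_winnipeg_metadata.xlsx")

# Bar graph of total sightings per species.
sightings = metadata["Species"].value_counts()
sightings.plot(kind="bar", figsize=(8, 4), title="Sightings per species")
plt.ylabel("Number of images")
plt.tight_layout()
plt.show()
```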


Figure 2.5: An example of a more complex visualization that can be generated from image metadata: a series of tree maps visualizing the volume of sightings of each animal species by season, with the relative size of each tree map indicating the comparative volume of overall sightings in that season.
Close Encounters Data
One opportunity provided by the comprehensive metadata from the Wild Winnipeg project is the chance to examine relationships between different animals, as well as relationships between suburban wildlife and humans. The specific information analyzed by the team is close encounter data. A close encounter is defined here as two or more mammals observed by a camera at the same location on the same day within ten minutes of each other. Any observations meeting these criteria that occur within one minute of each other are classified as a ‘sequence,’ which counts as a single observation of the identified mammal. This information has been analyzed in the form of network visualization graphs (figure 2.6).
These graphs represent each class of animal as a node, connected by a series of edges whose line type and colour represent the volume and location of close encounters, respectively.
The intent of analyzing this data is to reflect on the significance of wildlife in an urban setting and on how ephemeral wildlife sightings shape urban experiences. Understanding how these encounters shape the activity of mammals living within urban woodlands is also critical to preserving these spaces and to promoting awareness of how human spaces impact other living things. Information on how to generate a network graph using Python code can be found in the Close Encounters Jupyter Notebook.
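A minimal sketch of such a network graph, built with the networkx library, is shown below. The edge list is hypothetical; in practice the species pairs and encounter counts would be derived from the metadata by pairing observations from the same camera that fall within the ten-minute window.

```python
# A minimal sketch of a close-encounter network graph with networkx.
import networkx as nx
import matplotlib.pyplot as plt

# (species A, species B, number of close encounters) -- illustrative values only.
encounters = [("Deer", "Coyote", 4), ("Deer", "Human", 12), ("Raccoon", "Human", 7)]

graph = nx.Graph()
for species_a, species_b, count in encounters:
    graph.add_edge(species_a, species_b, weight=count)

# Draw nodes for each species and scale edge width by encounter volume.
positions = nx.spring_layout(graph, seed=42)
weights = [graph[u][v]["weight"] for u, v in graph.edges]
nx.draw_networkx(graph, positions, node_color="lightgrey", width=[w / 2 for w in weights])
plt.axis("off")
plt.show()
```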

Fine-tuning the YOLOv8 Object Localization Model
This section provides a brief overview of the process of fine-tuning the YOLOv8 model on a specialized dataset to increase its performance at identifying objects within an image set via bounding box annotation.
Design Applications
The YOLOv8 model is particularly useful in the design realm due to its accessibility to designers with little to no prior coding experience. The pre-trained model has excellent general knowledge that can be used to extract information from large image sets; it is simple to train, produces powerful results, and offers a variety of image analysis techniques.
A fine-tuned YOLOv8 model has exciting potential for applications throughout the design process. It is particularly useful for extracting key information efficiently and accurately from large image sets collected during stages of design, such as site analysis. This information can be used to analyze phenomena relevant to design that are otherwise difficult to investigate, such as how urban activity influences urban wildlife or how infrastructure shapes day-to-day activity and experiences at the human scale.
Model Overview
The YOLO (You Only Look Once) object detection model was originally developed by Joseph Redmon and Ali Farhadi at the University of Washington in 2015. YOLOv8, the latest version of the model at the time of writing, provides features such as object detection, segmentation, pose estimation, object tracking, and image classification.[1] The YOLO model has been trained on vast image datasets, such as the COCO (Common Objects in Context) dataset, which comprises over 330,000 images and 80 object classes.[2]
The YOLO model is innovative due to its single-shot detection approach that performs object detection by processing an entire image simultaneously, rather than using traditional two-stage detection methods that employ region proposal networks and sliding window techniques for object localization. This unified architecture in the YOLO model drastically reduces the required inference time and computational resources for performing real-time object detection tasks. Additionally, the YOLOv8 model employs data augmentation techniques such as geometric transformations (rotation, horizontal flipping) and photometric distortions to enhance model generalization and mitigate overfitting during training.[3]
What is Fine-tuning?
When training the YOLO model to increase its performance at a specific task, one can either train it entirely from scratch or modify an existing model. The latter is done by training the existing model on a smaller, more specialized dataset. Training a model entirely from scratch, particularly a YOLO model, requires a large dataset and extensive computational resources, which is not always practical. Fine-tuning a model involves training a model with existing weights on a much smaller dataset, requiring less time and computational resources.
To fine-tune a YOLO model on a custom dataset (figure 2.7), it is critical to ensure that the image and label files are properly formatted. When fine-tuning a model on an object localization dataset, the training and validation images should not have bounding boxes drawn on them; instead, each label file (in .txt or .json format, as described below) should record the class of the identified object along with its bounding box coordinates. This information provides a ground truth for the model to learn from, rather than allowing it to simply guess where the subject is within the image.
Overall, fine-tuning the YOLO model is an excellent way to take a model trained on very general data and increase its ability to perform a task within a specific context. In our project, that context is the study of large mammals in the suburban woodlands of Winnipeg.
Downloading a Pre-made Dataset: Hugging Face, Kaggle, and the FiftyOne Data Zoo

If one does not have access to enough data to create an entirely custom dataset to fine-tune a model, several platforms host a wealth of high-quality open-source datasets. Hugging Face hosts models and datasets specifically targeted at AI training. Kaggle hosts models, datasets, competitions, courses, and discussion platforms targeted towards AI training and machine learning.
The FiftyOne Data Zoo is a particularly useful platform as it allows you to download partial datasets directly within your coding environment. Further instructions on how to do this can be found in the Data-Prep Jupyter Notebook. This notebook covers steps such as downloading an open-source dataset, splitting a dataset, and creating a YAML file. The latter two topics will be explored in greater detail later in this guide.
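As a sketch, a partial download from the FiftyOne Data Zoo might look like the following; the dataset name, classes, and sample limit are illustrative, and the options supported vary by zoo dataset.

```python
# A hedged sketch of downloading a partial dataset from the FiftyOne Data Zoo.
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    label_types=["detections"],
    classes=["dog", "bear"],   # download only samples containing these classes
    max_samples=200,           # limit the number of downloaded samples
)
print(dataset)
```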
Creating a Custom Dataset
Curating the Dataset
When fine-tuning an object localization model, it is critical to have a varied dataset to increase the model’s generalizability (figure 2.8). The YOLOv8 model has built-in data augmentation features that are especially useful for varying the dataset during training. These data augmentation techniques include rotating images, mirroring them, increasing visual noise, and modifying image hue and saturation. Incorporating these features into the fine-tuning script also aids in increasing the model’s prediction accuracy.
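As a hedged sketch, augmentation-related hyperparameters can be passed directly to the Ultralytics training call; the values below are illustrative rather than recommended settings.

```python
# A hedged sketch of enabling augmentation hyperparameters during fine-tuning.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # start from a pre-trained checkpoint
model.train(
    data="data.yaml",   # dataset configuration file (see "The YAML File" below)
    epochs=50,
    degrees=10.0,       # random rotation range, in degrees
    fliplr=0.5,         # probability of a horizontal (mirror) flip
    hsv_h=0.015,        # hue augmentation
    hsv_s=0.7,          # saturation augmentation
    hsv_v=0.4,          # value (brightness) augmentation
)
```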
In addition to having a varied dataset, it is important to have a dataset that represents each of its classes relatively equally (figure 2.9). Training a model on an unbalanced dataset increases the likelihood of producing an inherently biased model. One technique for addressing an unbalanced dataset is to combine your custom dataset with portions of a pre-existing dataset containing a higher volume of the same or similar classes. The FiftyOne Data Zoo’s ability to download specific classes and portions of open-source datasets is particularly useful for this purpose.


Creating Label Files
Each label file should follow the same naming convention as its image counterpart, with the addition of the .txt suffix. This means that an image called Camera1_2021-05-21_Raccoon_123 would have a corresponding label file named Camera1_2021-05-21_Raccoon_123.txt (or the suffix .json, depending on the file format). If this naming convention is not followed, the model will be unable to recognize the corresponding label files during the fine-tuning process.
Label files for fine-tuning the YOLO model for the object localization task are typically in .txt or .json format. Each line within the file represents a single bounding box indicating the location of an object, and multiple object predictions within a single image will result in multiple lines of text within the label file.
Each label file contains the class index of the predicted object, followed by the normalized coordinates of the corresponding bounding box (centre x, centre y, width, and height). For example, within the custom Wild Winnipeg dataset, the raccoon is assigned class index 7. Thus, a label file for a raccoon prediction may contain: 7 0.493850385 0.404802841 0.485723947 0.954762948. Note that YOLO training label files will only contain positive values.
When creating a custom dataset, there are several software options available for label file generation, all of which require manual bounding box labelling. The program used for the creation of the Wild Winnipeg custom dataset was Labelme, which can be installed through the Windows Command Prompt.
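A hedged sketch of converting a Labelme rectangle annotation (.json) into a YOLO label file (.txt) is shown below. It assumes Labelme's rectangle format, which stores two corner points and the image dimensions; the class list and file names are hypothetical.

```python
# A hedged sketch of converting a Labelme rectangle annotation into YOLO format.
import json
from pathlib import Path

# Hypothetical class list; the raccoon sits at index 7 here.
class_names = ["rabbit", "deer", "fox", "dog", "coyote", "bear", "human", "raccoon"]

def labelme_to_yolo(json_path: Path, txt_path: Path) -> None:
    data = json.loads(json_path.read_text())
    width, height = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        (x1, y1), (x2, y2) = shape["points"]
        x_min, x_max = sorted((x1, x2))
        y_min, y_max = sorted((y1, y2))
        # YOLO format: class index, then normalized centre x, centre y, width, height.
        x_c = (x_min + x_max) / 2 / width
        y_c = (y_min + y_max) / 2 / height
        w = (x_max - x_min) / width
        h = (y_max - y_min) / height
        class_index = class_names.index(shape["label"])
        lines.append(f"{class_index} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    txt_path.write_text("\n".join(lines))

labelme_to_yolo(Path("Camera1_2021-05-21_Raccoon_123.json"),
                Path("Camera1_2021-05-21_Raccoon_123.txt"))
```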
Splitting the Dataset
When fine-tuning an object localization model, the dataset must be split into training and validation folders (figure 2.10). This can be done manually or, much more efficiently, with a Python script such as the one found in the Data-Prep Jupyter Notebook. The most common training/validation split is 80/20, meaning 80% of the dataset is used for training and the remaining 20% for validation.
When splitting the dataset, a specific directory organization must be followed. The structure for object localization model training is visualized in figure 2.10: separate training and validation folders sit within a main directory, and each of these subfolders contains separate directories for the image files and their accompanying label files.
When using an automated script to split the dataset, it is critical to ensure the script is modified appropriately to handle different file types and directory structure requirements.
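A minimal sketch of an 80/20 split into the directory structure described above is shown below. The paths are placeholders, and the script assumes that every .jpg image has a matching .txt label file with the same stem.

```python
# A minimal sketch of an 80/20 train/validation split for a YOLO-style dataset.
import random
import shutil
from pathlib import Path

images = sorted(Path("dataset/all_images").glob("*.jpg"))  # hypothetical folder
random.seed(0)
random.shuffle(images)

split_index = int(len(images) * 0.8)
splits = {"train": images[:split_index], "val": images[split_index:]}

for split, split_images in splits.items():
    # Create train/images, train/labels, val/images, val/labels directories.
    for subdir in ("images", "labels"):
        Path(f"dataset/{split}/{subdir}").mkdir(parents=True, exist_ok=True)
    for image_path in split_images:
        label_path = image_path.with_suffix(".txt")
        shutil.copy(image_path, f"dataset/{split}/images/{image_path.name}")
        shutil.copy(label_path, f"dataset/{split}/labels/{label_path.name}")
```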

The YAML File
The YAML file is the legend provided to the YOLO model to understand the information within the label files (figure 2.11). It tells the model where to locate the training and validation files, the total number of classes, and the class names in string format.
When listing the classes in the YAML file, the first class is class 0 unless otherwise specified. This means that a YAML file with the classes listed in the following format [‘rabbit’, ‘deer’, ‘fox’, ‘dog’] contains four classes, with ‘rabbit’ being class 0.
However, if each class is explicitly assigned a value starting at one in the following format:
Order:
1: rabbit
2: deer
3: fox
4: dog
Then there are four classes, with ‘rabbit’ being class 1.
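As a sketch, the configuration file can be written directly from a notebook cell; the example below follows the Ultralytics convention of a names mapping from class index to class name, and its paths and class names are placeholders.

```python
# A hedged sketch of writing a dataset configuration (YAML) file from Python.
from pathlib import Path

yaml_text = """\
path: dataset            # root directory of the split dataset (placeholder)
train: train/images      # training images, relative to 'path'
val: val/images          # validation images, relative to 'path'
nc: 4                    # total number of classes
names:
  0: rabbit
  1: deer
  2: fox
  3: dog
"""
Path("data.yaml").write_text(yaml_text)
```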

Fine-tuning YOLOv8 for Object Localization Tasks
Fine-tuning the YOLOv8 model for increased performance at the object localization task requires extensive time and computational resources. Training for 100 epochs with the minimum batch size took twelve hours on a remote desktop. The script used for fine-tuning our model can be found in the YOLO Fine-tuning Jupyter Notebook.
Once the fine-tuning script has successfully run, a weights file, or .pt file, is generated. This file can then be used to implement the trained model. The script will also produce performance metrics for each class it has been trained on, allowing the user to analyze its current performance and adjust the number of epochs and the batch size accordingly.
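A minimal sketch of the fine-tuning and inference steps using the Ultralytics Python API is shown below; the epochs, batch size, and file paths are placeholders to be adjusted to the available hardware, and this is not the exact script from the YOLO Fine-tuning Jupyter Notebook.

```python
# A minimal sketch of fine-tuning YOLOv8 and reloading the resulting weights.
from ultralytics import YOLO

# Fine-tune a pre-trained checkpoint on the custom dataset described by data.yaml.
model = YOLO("yolov8n.pt")
results = model.train(data="data.yaml", epochs=100, batch=8, imgsz=640)

# After training, the best-performing weights are saved under the run directory
# (typically runs/detect/train/weights/best.pt) and can be reloaded for inference.
trained = YOLO("runs/detect/train/weights/best.pt")
predictions = trained.predict("wild_winnipeg/images/example.jpg", save=True)
```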
Notes
- Ultralytics. "Ultralytics YOLO Docs." Accessed July 24, 2024. https://docs.ultralytics.com/.
- COCO. "COCO - Common Objects in Context." Accessed July 24, 2024. https://cocodataset.org/#home.
- Torres, Jane. "YOLOv8 Architecture: A Deep Dive into its Architecture." YOLOv8. Accessed July 24, 2024. https://yolov8.org/yolov8-architecture/.
Glossary
Python: A programming language used throughout the documents for creating scripts that utilize machine learning models, automate image processing tasks, generate prompts for Stable Diffusion, and implement various data analysis and visualization functions in the context of wildlife image classification research.
Close encounter: Two or more mammals observed by a camera at the same location on the same day within ten minutes of each other. Any observations meeting these criteria but occurring within one minute of each other are classified as a ‘sequence,’ which counts as a single observation of the identified mammal.
YOLO: An object detection model that performs detection by processing an entire image simultaneously, rather than using traditional two-stage detection methods, making it faster and more efficient for real-time applications.
Object localization: A computer vision task that involves identifying and locating objects within images, typically using bounding boxes to indicate object positions.
Segmentation: A method of object identification provided by the YOLOv8 model. It involves analyzing every pixel within a given image and predicting which pixels belong to a known class. These pixels are then identified with a semi-opaque coloured mask, and the model frames the mask with a bounding box that states the predicted class and confidence score in decimal format. Segmentation mask labels follow a similar format to bounding box labels; however, they contain the coordinates of each pixel that forms the edge of the mask.
Pose estimation: An image analysis technique offered by the YOLOv8 model. It identifies human postures within an image by identifying key points with coloured nodes and connecting them to create a wireframe figure. Although the pre-trained model only offers this technique for human postures, several open-source datasets exist to fine-tune the model to approximate various animal postures.
Image classification: The simplest image analysis technique offered by YOLOv8. It involves identifying which class the objects within a given image belong to and provides the class name and confidence score. This technique does not identify the location of the predicted objects within a given image.
Overfitting: Occurs when the model is trained with too many epochs on a dataset, resulting in inaccurately high performance metrics and inhibiting the model’s ability to generalize to other image sets. This can be avoided by ceasing model training when performance metrics plateau.
Bounding boxes: A method of object localization provided by the YOLOv8 model. Bounding boxes are intended to tightly frame the identified object, and it is best practice to minimize the amount of excess space between the subject and the box. When creating bounding boxes by hand to obtain ground truth for training models, the coordinates will typically be generated in a .json file containing the coordinates of the top-left and bottom-right corners. When the bounding boxes are annotated by a trained model, confidence scores are often labelled with the box in decimal format.
YAML file: A configuration file that serves as a legend for machine learning models, telling the model where to locate training and validation files, the total number of classes, and class names in string format.
Data augmentation: A built-in feature of YOLOv8 model training that involves manipulating aspects of the training images to increase model accuracy and generalizability. This is done by modifying image saturation, hue, orientation, or noise levels, amongst other characteristics.
Model training: The process of teaching a machine learning algorithm to make accurate predictions by feeding it labelled data examples, allowing the model to learn patterns and relationships that enable it to classify or predict outcomes on new, unseen data. This involves adjusting the model’s internal parameters through iterative exposure to training data until the model achieves satisfactory performance.
Epochs: A parameter that determines the number of complete passes through the entire training dataset the model will undergo during training. The number of epochs significantly influences the computational resources and time required for training. More epochs can lead to better model performance but may also increase the risk of overfitting.
Batch size: A parameter determining the number of images processed together in each training step. It can be adjusted by the user and impacts the speed of the training process. The minimum batch size for fine-tuning the YOLOv8 model is 8.