"

Appendix

Glossary

 

 

3D Viewport: The main workspace in Blender where users can view and manipulate 3D models, often configured as a dual-view setup to efficiently adjust models and ensure they align with the camera’s view.

 

Action Editor: A component of Blender’s Dope Sheet where users can insert keyframes and specify values for properties such as position, rotation, and scale of bones at specific frames within the timeline.

 

Alpha settings: Image processing parameters that control the transparency of objects generated with Stable Diffusion, particularly useful for adjusting the transparency of the solid bounding boxes that may appear around synthetic animals.

 

Armature: A type of Blender object used to create a skeleton consisting of a hierarchical structure of ‘bones’ that provides a framework for deforming a model’s mesh and creating movements.

 

Batch: A parameter determining the number of images the model processes together in each training iteration. It can be adjusted by the user and impacts the speed of the training process. The minimum batch size for fine-tuning the YOLOv8 model is 8.

 

Blending: The process of seamlessly combining synthetic animals with real background images to create realistic composite images, often involving adjustments to transparency, brightness, vibrancy, contrast, and motion blur.

 

Bounding box: Bounding boxes are a method of object localization provided by the YOLOv8 model. They are intended to tightly frame the identified object, and it is best practice to minimize the excess space between the subject and the bounding box. When bounding boxes are drawn by hand to provide ground truth for training models, their coordinates are typically stored in a .json file as the top-left and bottom-right corners. When bounding boxes are annotated by a trained model, a confidence score is often displayed alongside each box in decimal format.
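
For illustration, a minimal Python sketch of converting corner-style annotations into the normalized centre/width/height format YOLO expects; the function and variable names are hypothetical, and real .json label files vary in structure.

```python
# Convert a hand-annotated box given as top-left (x1, y1) and bottom-right
# (x2, y2) pixel corners into YOLO's normalized (xc, yc, w, h) format.
def corners_to_yolo(x1, y1, x2, y2, img_w, img_h):
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return x_center, y_center, width, height

# Example: a 200 x 100 px box with its top-left at (50, 40) in a 640 x 480 image.
print(corners_to_yolo(50, 40, 250, 140, 640, 480))  # (0.2343..., 0.1875, 0.3125, 0.2083...)
```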

 

Close encounters: Two or more mammals observed by a camera at the same location on the same day within ten minutes of each other. Any observations meeting these criteria but that occur within one minute of each other are classified as a ‘sequence,’ which counts as a single observation of the identified mammal.

 

CNN (Convolutional Neural Network): A type of neural network architecture particularly effective for image processing tasks. CNNs use convolutional layers to detect features in images and are commonly used in models like ResNet34 and as part of the Stable Diffusion architecture.

 

Confusion matrix: A visualization tool that plots the number of images a model predicts for each class against the actual class, creating a grid of cells that summarizes the model’s classification performance. It displays True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) to provide insight into class-specific accuracies and errors.
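
As an illustration, a minimal sketch using scikit-learn to compute a confusion matrix; scikit-learn is an assumption here (the book’s own plots may be produced differently), and the class names are invented.

```python
from sklearn.metrics import confusion_matrix

y_true = ["deer", "deer", "coyote", "rabbit", "coyote"]    # ground truth classes
y_pred = ["deer", "coyote", "coyote", "rabbit", "coyote"]  # model predictions

labels = ["coyote", "deer", "rabbit"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows are actual classes, columns are predicted classes
```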

 

Corridors: Pathways or zones that facilitate movement and connectivity for both wildlife and humans in urban environments, such as multi-use paths along levees or riparian areas that allow species movement while accommodating human activity.

 

Cycles: A high-quality, photorealistic rendering engine within Blender that is GPU-supported and used for generating synthetic images with realistic lighting and materials.

 

Data augmentation: A built-in feature of YOLOv8 model training that involves manipulating aspects of the training images to increase model accuracy and generalizability. This is done by modifying image saturation, hue, orientation, or noise levels, amongst other characteristics.
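
A minimal sketch of adjusting these augmentation settings when training with the ultralytics Python package; the hyperparameter values shown are illustrative, not recommendations.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pre-trained YOLOv8 nano weights
model.train(
    data="data.yaml",   # dataset configuration (see 'YAML file')
    epochs=100,
    batch=8,
    hsv_h=0.015,        # hue augmentation range
    hsv_s=0.7,          # saturation augmentation range
    hsv_v=0.4,          # brightness (value) augmentation range
    degrees=10.0,       # random rotation range in degrees
    fliplr=0.5,         # probability of a horizontal flip
)
```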

 

Denoising autoencoder: A neural network component used in Stable Diffusion that is trained to reconstruct original images from diffused (noisy) versions, enabling the model to generate clear images from random noise through iterative refinement.

 

Diffusion: The generative process used in Stable Diffusion that begins with an unclear and noisy image and incrementally refines it to become more like the desired image result through a series of denoising steps.

 

Dope Sheet: An animation editor in Blender that provides a timeline interface for managing keyframes and animation sequences across multiple objects and properties.

 

EleutherAI: One of the collaborating organizations involved in the development of Stable Diffusion, along with LAION, RunwayML, and LMU Munich.

 

Epochs: A parameter that determines the number of complete passes through the entire training dataset the model will undergo during training. The number of epochs significantly influences the amount of computational resources and time required for training. More epochs can lead to better model performance but may also increase the risk of overfitting.

 

False Negatives (FN): Instances where the model fails to identify a positive case that actually exists in the ground truth data.

 

False Positives (FP): Instances where the model incorrectly predicts a positive case when none exists in the ground truth data.

 

Fastai: An open-source Python library built on PyTorch that focuses on making deep learning more approachable through user accessibility and abstractions. It provides easy-to-use APIs for training models and includes built-in functions for loading popular CNN architectures.
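
A minimal sketch of Fastai’s high-level API for loading a pre-trained CNN and fine-tuning it; the folder path is hypothetical, and images are assumed to be organized into one subfolder per class.

```python
from fastai.vision.all import *

# Build dataloaders from per-class image folders, holding out 20% for validation.
dls = ImageDataLoaders.from_folder("images/", valid_pct=0.2, item_tfms=Resize(224))

# Load a ResNet34 backbone pre-trained on ImageNet and fine-tune it.
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(3)  # three epochs of fine-tuning
```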

 

Fine-tuning: The process of training an existing pre-trained model on a smaller, more specialized dataset to improve its performance on specific tasks, rather than training a model entirely from scratch.

 

Forest profile: The cross-sectional view or vertical structure of a forest ecosystem that shows the different layers of vegetation from ground level to canopy, used in wildlife habitat analysis to understand site characteristics and environmental conditions.

 

GANs (Generative Adversarial Networks): A machine learning architecture consisting of two neural networks competing against each other in a game-theoretic framework. The generator network creates synthetic data (such as images) from random noise, while the discriminator network attempts to distinguish between real and generated data. Through this adversarial training process, the generator becomes increasingly skilled at producing realistic synthetic content that can fool the discriminator. GANs are widely used for generating images, videos, and other data types, though they typically do not have built-in text-to-image functions like more recent models such as Stable Diffusion.

 

Ground truth: The information contained within an image, such as object class and location, rather than the model’s predictions. It is crucial to include this information during model training to allow the model to compare predictions with information truly contained within the image, and to adjust its predictions accordingly.

 

HDR (High Dynamic Range): Image files that contain a wider range of luminosity than standard images, used in Blender to provide realistic environmental lighting and backgrounds for 3D scenes.

 

Image classification: The simplest image analysis technique offered by YOLOv8, image classification identifies which class the objects within a given image belong to and returns the class name and a confidence score. This technique does not identify the location of the predicted objects within the image.

 

ImageEnhance settings: Tools from the Pillow (PIL) Python imaging library, used alongside Stable Diffusion, for adjusting aspects of image quality such as brightness, contrast, and colour vibrancy to improve the realism of synthetic objects.
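
A minimal sketch of Pillow’s ImageEnhance module as it might be applied to a synthetic animal image; the filename and enhancement factors are illustrative.

```python
from PIL import Image, ImageEnhance

img = Image.open("animal.png").convert("RGB")
img = ImageEnhance.Color(img).enhance(1.2)       # increase colour vibrancy by 20%
img = ImageEnhance.Brightness(img).enhance(0.9)  # darken slightly
img = ImageEnhance.Contrast(img).enhance(1.1)    # raise contrast
img.save("animal_adjusted.png")
```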

 

ImageNet: A large dataset commonly used for pre-training models like ResNet34, containing millions of labeled images across thousands of categories.

 

Inference: The process of applying a pre-trained or fine-tuned model to analyze images and return predictions, whether in the form of bounding boxes, segmentation masks, or other analysis outputs.
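
A minimal sketch of running inference with a fine-tuned YOLOv8 model via the ultralytics package; the weights path and image filename are hypothetical.

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # fine-tuned weights
results = model("camera_trap_photo.jpg")           # run inference on one image

for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # predicted class, confidence, corner coordinates
```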

 

Inpainting: A technique for filling in missing parts of images or editing existing images while maintaining visual consistency. Stable Diffusion includes an InPaint model for this purpose.
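
A minimal sketch of inpainting with the Hugging Face diffusers library; the checkpoint name, prompt, and file paths are examples rather than the exact setup used in this book.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

background = Image.open("background.png")      # real background photo
mask = Image.open("animal_silhouette.png")     # white where the animal should appear
result = pipe(prompt="a red fox walking on a forest path",
              image=background, mask_image=mask).images[0]
result.save("composite.png")
```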

 

Interpolation: The automatic process by which Blender calculates intermediate frames between keyframes to generate smooth transitions and fluid motions in animations.

 

Keyframes: Markers in animation that define the start and end points of an animation sequence, specifying the values of properties like position, rotation, and scale at specific points in time.

 

LAION: One of the collaborating organizations involved in the development of Stable Diffusion, contributing to the creation of this open-source generative model.

 

LMU Munich: Ludwig Maximilian University of Munich, one of the academic institutions that collaborated in the development of Stable Diffusion.

 

Machine learning (ML): A branch of AI that enables computers to learn from data by analyzing patterns and trends in large datasets, using algorithms to make predictions, recognize objects, and generate new ideas based on learned information.

 

Majority class: In imbalanced datasets, the classes that have significantly more training examples compared to other classes, which can lead to biased model performance favoring these well-represented categories.

 

mAP50: Mean average precision calculated at an intersection over union (IoU) threshold of 0.5, indicating how often the model’s predicted bounding box or segmentation mask overlaps the ground truth by at least half.

 

mAP50-95: Mean average precision averaged across IoU thresholds from 0.5 to 0.95, a stricter metric indicating how precisely the model’s predicted bounding box or segmentation mask overlaps with the ground truth.
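
Both metrics are built on intersection over union (IoU), the ratio of the overlap between two boxes to their combined area. A minimal sketch, with boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # 0.1428... (2500 / 17500)
```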

 

Masks: High-contrast, black and white silhouettes used in the Stable Diffusion InPaint model to define areas where synthetic animals should be generated, intended to eliminate unwanted bounding boxes and improve blending.

 

Mesh: The surface geometry of a 3D model in Blender, consisting of vertices, edges, and faces that define the shape and can be deformed by armatures during animation.

 

Minority class: In machine learning datasets, classes that have fewer training examples compared to other classes, often requiring data augmentation techniques like synthetic image generation to improve model performance.

 

Model training: The process of teaching a machine learning algorithm to make accurate predictions by feeding it labeled data examples, allowing the model to learn patterns and relationships that enable it to classify or predict outcomes on new, unseen data. This involves adjusting the model’s internal parameters through iterative exposure to training data until the model achieves satisfactory performance.

 

Multi-scale architecture: The architectural approach used in Stable Diffusion that employs convolutional neural networks (CNNs) operating at multiple scales to capture complex dependencies across the different scales and components of an image.

 

Object Data Properties panel: A Blender interface panel that provides settings and controls for the specific data associated with the selected object, such as light properties (intensity, blend, bounces, radius), mesh settings, camera parameters, or other object-specific attributes depending on the type of object selected.

 

Object detection: A computer vision task that involves identifying and locating objects within images, typically using bounding boxes to indicate object positions.

 

Output Properties panel: A Blender interface panel where users configure render settings including resolution, frame range, file format, and color mode for generating final images or animations.

 

Overfitting: Occurs when a model is trained for too many epochs on a dataset, causing it to memorize the training data; this produces misleadingly high performance metrics while inhibiting the model’s ability to generalize to other image sets. Overfitting can be avoided by ceasing model training when performance metrics plateau.

 

Oversampling: A technique used to address imbalanced datasets by increasing the number of examples in minority classes, often through methods like synthetic image generation, to create more balanced representation across all classes during model training.
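
A minimal sketch of random oversampling by duplicating minority-class examples until every class matches the largest one; the (path, class) pair structure is an assumption for illustration.

```python
import random
from collections import defaultdict

def oversample(samples):
    """samples: list of (image_path, class_name) pairs."""
    by_class = defaultdict(list)
    for path, cls in samples:
        by_class[cls].append(path)
    target = max(len(paths) for paths in by_class.values())
    balanced = []
    for cls, paths in by_class.items():
        extra = random.choices(paths, k=target - len(paths))  # duplicate with replacement
        balanced += [(p, cls) for p in paths + extra]
    return balanced
```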

 

Pose estimation: Pose estimation is an image analysis technique offered by the YOLOv8 model. It identifies human postures within an image by locating key points with coloured nodes and connecting them to form a wireframe figure. Although the pre-trained model only supports human postures, several open-source datasets exist for fine-tuning the model to approximate various animal postures.

 

Precision: The percentage of true positives within all positive predictions made by the model. This metric indicates how reliable the model’s positive predictions are for a given class.

 

Python: A programming language used throughout this book for creating scripts that utilize machine learning models, automate image processing tasks, generate prompts for Stable Diffusion, and implement various data analysis and visualization functions in the context of wildlife image classification research.

 

PyTorch: The underlying framework that Fastai is built upon, providing the foundational components for deep learning while Fastai adds additional user-friendly features and abstractions.

 

Recall: The percentage of actual positive instances that are correctly identified by the model. It measures the model’s ability to detect positive instances when they are present, in contrast to precision, which measures how accurate the model’s positive predictions are.
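
A minimal sketch of computing both metrics from raw counts of true positives (TP), false positives (FP), and false negatives (FN):

```python
def precision(tp, fp):
    return tp / (tp + fp)  # how trustworthy the model's positive predictions are

def recall(tp, fn):
    return tp / (tp + fn)  # how many actual positives the model found

print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=4))     # 0.666...
```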

 

ResNet34: A 34-layer residual neural network architecture that is commonly used as a pre-trained backbone for image classification tasks, particularly effective when fine-tuned on specific datasets.

 

Rigging: The comprehensive setup process in 3D animation that includes creating armatures and other functional elements to make models animatable, incorporating control handles and constraints for intuitive manipulation.

 

Riparian forest: Forest habitat along riverbanks that provides a continuous connective framework for wildlife movement between different urban areas, particularly important along the Red River’s west bank in the study.

 

RunwayML: One of the organizations that collaborated in the development of Stable Diffusion, contributing to this open-source generative model.

 

Segmentation: Segmentation is a method of object identification provided by the YOLOv8 model. It involves analyzing every pixel within a given image and predicting which pixels belong to a known class. These pixels are then identified with a semi-opaque coloured mask, and the model frames the mask with a bounding box that states the predicted class and confidence score in decimal format. Segmentation mask labels follow a similar format to bounding box labels; however, they contain the coordinates of each pixel that forms the edge of the mask.

 

Shader Editor: A Blender interface where users can import and configure HDR images as the ‘World’ environment setting, which influences scene background and lighting.

 

StabilityAI: The company founded by Emad Mostaque that created Stable Diffusion, making it publicly available as an open-source model in August 2022.

 

Stable Diffusion: A generative model created by StabilityAI that uses diffusion processes to generate images from text prompts. It is an open-source model that can generate high-quality, diverse, and realistic images across different categories.

 

Text-to-image: A capability of Stable Diffusion that allows the model to generate images based on textual descriptions or prompts, distinguishing it from traditional GANs that typically lack this direct text-to-image functionality.

 

Transfer learning: The technique of using a model pre-trained on a large dataset (like ImageNet) and adapting it to a new, more specific task, leveraging learned features from the original training.

 

True Negatives (TN): Instances where the model correctly identifies the absence of a positive case in the ground truth data.

 

True Positives (TP): Instances where the model correctly identifies a positive case that exists in the ground truth data.

 

Understory: The layer of vegetation beneath the main canopy of a forest, an important habitat component documented at camera trap sites that varies in density from thick vegetation to open areas.

 

Undersampling: A technique used to address imbalanced datasets by reducing the number of examples in majority classes to create more balanced representation, though this approach risks losing potentially valuable training data.

 

Urban biodiversity: The variety of plant and animal life present in urban environments, which the Wild Winnipeg Project aimed to document and heighten awareness of through camera trap monitoring and public engagement.

 

Validation loss: A metric that measures how well a model performs on data it hasn’t seen during training, used to monitor model performance and detect overfitting during the training process.

 

Vibrancy: An image quality parameter that adjusts the intensity and saturation of colors in synthetic objects to better match the color characteristics of the background image.

 

Visual hybridity: The seamless integration of synthetic and real elements in a composite image, where the boundary between artificial and authentic components is not easily distinguishable.

 

Weights: Parameters within the model that are adjusted during the fine-tuning process. Fine-tuning involves modifying pre-existing weights. In contrast, training a custom model requires initializing and adjusting new weights.

 

Woodland remnants: Small patches of forest that remain in suburban areas, often isolated and block-sized or smaller, which serve as important habitat for urban wildlife despite their proximity to human development.

 

World Setting: A Blender configuration that defines the environmental background and lighting conditions of a 3D scene, typically using HDR images to create realistic environmental contexts.

 

YAML file: A configuration file that serves as a legend for machine learning models, telling the model where to locate training and validation files, the total number of classes, and class names in string format.
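
A hypothetical example of such a file for a YOLOv8 dataset; the paths and class names are illustrative.

```yaml
path: datasets/wildlife   # dataset root directory
train: images/train       # training images, relative to root
val: images/val           # validation images, relative to root
nc: 3                     # total number of classes
names: ["coyote", "deer", "rabbit"]
```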

 

YOLO (You Only Look Once): An object detection model that performs detection by processing an entire image simultaneously, rather than using traditional two-stage detection methods, making it faster and more efficient for real-time applications.

License


Teaching with Images Copyright © 2025 by Mark Meagher, Kamni Gill, A.V. Ronquillo, Ryleigh Bruce, Mitchell Constable, Matthew Glowacki, Zhenggang Li, and Owen Swendrowski-Yerex is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
