SizeExtractR: A workflow for rapid reproducible extraction of object size metrics from scaled images

Abstract
Size is a biological characteristic that drives ecological processes from microscopic to geographic spatial scales, influencing cellular energetics, species fitness, population dynamics, and ecological interactions. Methods to measure size from images (e.g., proxies of body size, leaf area, and cell area) occur along a gradient from manual approaches to fully automated technologies (e.g., machine learning). These methods differ in terms of time investment, expertise required, and data or resource availability. While manual methods can improve accuracy through human recognition, they can be labor intensive, highlighting the need for semi‐automated and user‐friendly software or workflows to increase the efficiency of manual techniques. Here, we present SizeExtractR, an open‐source workflow that enables faster extraction of size metrics from scaled images (e.g., each image includes a ruler) using semi‐automated protocols. It comprises a set of ImageJ macros to speed up size extraction and annotation, and an R‐package for the quality control of annotations, data collation, calibration, and visualization. SizeExtractR extracts seven common size dimensions, including planar area, min/max diameter, and perimeter. Users can record additional categorical variables relating to their own study, for example species ID, by simply adding alphanumeric annotations to individual objects when prompted. Using a population size structure case study for hard corals as an example, we show how SizeExtractR was used to quantify the impact of mass coral bleaching on coral population dynamics. Lastly, the time saving benefit of using SizeExtractR was quantified during a series of timed image analyses, revealing up to a 49% reduction in image analysis time compared to a fully manual approach. SizeExtractR automatically archives results, allowing re‐analysis of size extraction and promoting quality control and reproducibility. 
It has already been employed in marine and terrestrial sciences to assess population dynamics and demography, energy investment in eggs, and growth of nursery reared corals, with potential to be applied to a wide range of other research fields.


Introduction
This methodology is designed to be used when:
- Multiple objects must be measured within each image. These objects are called Regions of Interest (ROIs).
- There are a large number of images, so fully manual image annotation would be slow. SizeExtractR automates some labour-intensive parts of this workflow.
- Each image is scaled (i.e., includes a calibration object of known length).
Useful size metrics extracted from images using this protocol are:
- Area (of irregular or regular-shaped ROIs)
  o Circular equivalent diameter (derived from the area measurement)
  o Extruded spherical volume (derived from the circular equivalent diameter)
- Maximum/minimum Feret's diameter (equivalent to diameters measured with calipers)
  o Geometric mean diameter (the geometric mean of the maximum and minimum Feret's diameters)
- Perimeter length
- Additional user-defined categorical variables that relate to individual ROIs (e.g., health status)
- Size-frequency distributions by grouping variable (up to two grouping variables allowed)
Useful additional metrics that can be easily derived are:
- Total imaged area
- ROI density per unit area
- Population size per area imaged (total number of ROIs)
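The derived size metrics listed above follow from simple geometry. The sketch below is illustrative only and assumes the standard definitions: circular equivalent diameter CED = 2 * sqrt(Area / pi); a spherical volume computed as the volume of a sphere whose diameter is the CED (our assumed reading of "extruded spherical volume"); and the geometric mean diameter sqrt(MaxFeret * MinFeret). The column names and the imaged area are hypothetical, and SizeExtractR's own column names and formulas may differ.

```r
# Hypothetical calibrated ROI table (column names are illustrative only).
rois <- data.frame(
  Area     = c(12.5, 50.0, 3.2),  # planar area, cm^2
  MaxFeret = c(5.1, 9.8, 2.4),    # maximum Feret diameter, cm
  MinFeret = c(3.0, 6.9, 1.7)     # minimum Feret diameter, cm
)

# Circular equivalent diameter: diameter of a circle with the same area.
rois$CED <- 2 * sqrt(rois$Area / pi)

# Volume of a sphere whose diameter equals the CED: (4/3) * pi * (CED / 2)^3.
# (Assumed interpretation of "extruded spherical volume".)
rois$SphereVolume <- (4 / 3) * pi * (rois$CED / 2)^3

# Geometric mean of the maximum and minimum Feret diameters.
rois$GeoMeanDiameter <- sqrt(rois$MaxFeret * rois$MinFeret)

# Density: number of ROIs per unit imaged area (imaged area assumed known).
imaged_area <- 10000                     # cm^2, e.g. one 1 m x 1 m quadrat
roi_density <- nrow(rois) / imaged_area  # ROIs per cm^2

print(rois)
```

Multiplying roi_density by 10000 converts it from ROIs per cm^2 to ROIs per m^2.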

Background:
This workflow is described using an example of a coral dataset (see Lachs et al. 2021). Scaled images of the seabed were taken during annual field surveys at 4 sites, with 3 transects per site, that have been surveyed since 2010 (Transect within Site within Year). Each image contained up to 40+ individual coral colonies. We were interested in the size of individual coral colonies after a mass bleaching event occurred in 2016 (see example in Fig. 1).
Workflow overview:
1. Prepare a set of scaled images in a consistent nested directory structure (e.g., images within transect folders within site folders within year folders within one overall directory).
2. Decide on a labelling scheme for the ROI name label codes (e.g., c = coral, b = bleached).
3. Using the SizeExtractR ImageJ-macro, analyse all images by outlining the objects of interest (corals), naming them according to the labelling scheme, and measuring calibration lengths.
4. Using the SizeExtractR R-package, first perform checks to identify any human errors that occurred during labelling, amend those errors, then collate all data files into one calibrated database.
SizeExtractR provides simple functions in both ImageJ and R to perform this workflow.

Part 1 -One-off Setup of SizeExtractR
The following subsections are:
A. ImageJ download
B. SizeExtractR ImageJ-macro installation
C. R download
D. SizeExtractR R-package installation

Part 1.A -ImageJ Download
Download ImageJ either as the basic programme (https://imagej.nih.gov/ij/download.html) or as the Fiji distribution (https://imagej.net/Fiji/Downloads). Fiji does not install like a regular programme; it is a standalone app. So you just:
1) Download the zip folder.
2) Extract all contents of this folder to a folder in your "My Documents" (or wherever you prefer).
3) Find the application file "ImageJ-win64" (e.g., for Windows) within the main Fiji folder.
4) Double-click this file to launch ImageJ; no installation is needed.
SizeExtractR will also work on the basic ImageJ programme (not Fiji).

Background: What are macros, and why use them?
Macros are programmed ImageJ workflows that speed up image analysis by automating processes. Each SizeExtractR macro is triggered by its own keyboard shortcut.
Working in ImageJ can be quite slow. In particular, switching between tools and saving files can become extremely laborious. For each image you need to choose: the hand tool (to pan around the image), the freehand polygon tool (to outline the object of interest and make an ROI), add and rename the ROI (with the user-defined labelling scheme), the line tool (to measure calibration lengths), and REPEAT many times. At the end of each image you must then save all relevant measurements in a sensible way.
We don't want to have to think too hard whilst outlining/measuring ROIs. We want to be able to sit back, put on some good music, and use shortcuts that optimise the process as much as possible. Once you find a good rhythm it is easy! To make the entire process more fluid we have written various ImageJ macros. These tools roughly halve image analysis time (see main manuscript), considerably reduce human error, and simplify the workflow. These macros are saved in the Startup Macros script in ImageJ, where you can view the code if you are interested. For the advanced user, descriptions of all macro functions can be found at https://imagej.nih.gov/ij/developer/macro/functions.html
NOTE: Avoid changing the macro scripts. They are very sensitive to any change and may stop working if edited. Feel free to contact me if you want to make any changes @ liamlachs@gmail.com.

Installation Guide:
Open the Startup Macros script:

Plugins > Macros > Startup Macros
The macros contained in this ImageJ script load every time ImageJ is run. To embed the SizeExtractR shortcuts within ImageJ (which speeds things up very nicely), they must be appended to this Startup Macros script. To use the SizeExtractR R-package, load it in R using:

library(SizeExtractR)
Part 2 -Image Analysis in ImageJ
A large monitor and a mouse are recommended to make outlining objects easier. The first three subsections give background on SizeExtractR, and the fourth gives a step-by-step user guide to running the image analysis protocol. The subsections are:
A. Decide on a Labelling System
B. Directory Variables & Organising Images
C. ROI Variables (ROI Type, ROI Replicate, & ROI Label Code)
D. Step-by-step Workflow -Macros and Keyboard Shortcuts
E. Additional Useful Tools (macros and shortcuts)

Part 2.A -Decide on a Labelling System
In the final calibrated dataset output in R, each row represents a different ROI (i.e., object of interest or calibration length). The different variables (i.e., columns in the dataset) describe these ROIs. The numerical variables that are measures of size (e.g., area, max/min diameter) are produced automatically in ImageJ. However, some additional categorical variables that are user-specific can also be incorporated.
There are two types of categorical variables.

1) Directory Variables:
Variables that are the same for all ROIs within a single image (e.g., transect, site, or year). The SizeExtractR R-package derives these variables from the folder names in the directory containing the images, so it is of utmost importance that all folders are named correctly. We refer to these as Directory Variables.

2) ROI Variables:
Variables that can differ between ROIs within a single image (e.g., ROI Type: calibration length or object of interest, like a coral; or ROI Label Code: a user-defined variable giving extra information about that ROI, like whether the coral is bleached or not). The SizeExtractR R-package derives these from the ROI name label assigned during image analysis in ImageJ. We refer to these as ROI Variables, and the ROI name label has three parts:
a. ROI Type -this is either the Calibration Point ("Cali_Pts"), the Calibration Length ("M"), or the object of interest (a user-defined alphabetical code, e.g., "c" for coral). This distinguishes the calibration ROIs from the ROIs of interest (i.e., corals), and makes up the beginning of the ROI name label.
b. ROI Replicate -this is a unique number per ROI that identifies each ROI of a specific ROI Type within a single image. For instance, the name label for the 1st coral outlined in the image would be "c1". The second coral outlined in that image would be "c2" and so on. See below for more details.
c. ROI Label Code -this is any set of user-defined codes (e.g., "b" for bleached, or nothing for not bleached). This is used to record extra information about the ROIs during image analysis and makes up the end of the ROI name label.
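To make the naming convention concrete, the sketch below splits ROI name labels into ROI Type, ROI Replicate, and ROI Label Code with a regular expression. It assumes the type and label code contain letters only and the replicate contains digits only, as described above. The function parse_roi_label() is hypothetical and not part of SizeExtractR, whose own parsing may differ; labels that do not fit the pattern (such as the special name Cali_Pts) are flagged rather than split.

```r
# Hypothetical helper: split ROI name labels into their three parts.
# Pattern: leading letters (ROI Type), digits (ROI Replicate),
# optional trailing letters (ROI Label Code).
parse_roi_label <- function(labels) {
  pattern <- "^([A-Za-z]+)([0-9]+)([A-Za-z]*)$"
  ok <- grepl(pattern, labels)

  type      <- rep(NA_character_, length(labels))
  replicate <- rep(NA_integer_, length(labels))
  code      <- rep(NA_character_, length(labels))

  # Only split labels that match; others stay NA and are flagged as invalid.
  type[ok]      <- sub(pattern, "\\1", labels[ok])
  replicate[ok] <- as.integer(sub(pattern, "\\2", labels[ok]))
  code[ok]      <- sub(pattern, "\\3", labels[ok])

  data.frame(label = labels, valid = ok, type = type,
             replicate = replicate, label_code = code,
             stringsAsFactors = FALSE)
}

parse_roi_label(c("c1", "c3bpm", "Pa12bb", "M2", "Cali_Pts"))
```

Because letters and digits never overlap, a greedy match on the leading letters stops at the first digit, so multi-letter ROI Types such as "Pa" are handled without special cases.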

Part 2.B -Directory Variables & Organising Images
For full integration with the R script, images must be organised in a logical consistent directory structure. The folders holding different images must be named consistently, as the R-package derives Directory Variables from the folder names of the directory.
For instance, annual photo-quadrat surveys of coral reefs at three sites from 2010-2012 would need a directory structure and consistent naming system like the one shown in Table 1.
Table 1. Folder names in a nested computer directory, from Directory Level 0 (the overall directory) downwards.
Setting this directory in R would then use the following string, which would access all images from all the 'Site' subfolders across all years:
dir <- "C:/users/<user name>/Photos_Survey"
Do not use the "~" symbol as a replacement for the full directory path string, as this will cause an error. Note that this User Guide uses examples from Windows, but the workflow also works on other platforms (e.g., Mac).
If Directory Variables are not needed for a specific study, then all images can be pooled in a single folder.
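Because Directory Variables come from folder names, a consistent depth of nesting is what makes them recoverable. The sketch below uses a hypothetical function, directory_variables(), to split image paths (relative to the Level 0 directory) into one column per folder level, and to stop if images sit at different depths. It illustrates the idea only and is not SizeExtractR's implementation.

```r
# Hypothetical helper: derive Directory Variables from relative image paths.
# 'rel_paths' are paths relative to the Level 0 directory, e.g. "2010/Site1/img.jpg".
directory_variables <- function(rel_paths, var_names) {
  parts <- strsplit(rel_paths, "/", fixed = TRUE)
  depth <- vapply(parts, length, integer(1)) - 1L  # folder levels above each file

  if (length(unique(depth)) != 1L) {
    stop("Inconsistent directory structure: images sit at different depths")
  }
  n_levels <- depth[1]
  if (n_levels != length(var_names)) {
    stop("Expected ", n_levels, " variable names, got ", length(var_names))
  }

  # One column for the image file name, then one column per directory level.
  out <- data.frame(Image = vapply(parts, function(p) p[length(p)], character(1)),
                    stringsAsFactors = FALSE)
  for (i in seq_len(n_levels)) {
    out[[var_names[i]]] <- vapply(parts, function(p) p[i], character(1))
  }
  out
}

paths <- c("2010/Site1/img001.jpg", "2010/Site2/img002.jpg", "2011/Site1/img003.jpg")
directory_variables(paths, c("Year", "Site"))
```

On a folder of raw images, relative paths of this form can be obtained with list.files(dir, pattern = "\\.jpe?g$", recursive = TRUE, ignore.case = TRUE), which returns paths relative to dir separated by "/".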

Part 2.C -ROI Variables (ROI Type, ROI Replicate, & ROI Label Code)
A consistent ROI labelling scheme is essential, as the R-package uses it to build the calibrated size database. The labelling scheme differs between the three categories of ROIs that we use. These ROI categories are:
1. Objects of Interest (e.g., a coral)
2. Measurement Lengths (for Calibration)
3. Calibration Points (a shortcut for creating consecutive calibration lengths -see Part 2.A -ROI Variables)

1) Objects of Interest:
The ROI name label is a concatenation of three sections, entered in order directly after one another without any separator. These three sections are:
i. ROI Type: This is a categorical ROI Type Variable, which must have at least one grouping level; multiple grouping levels are allowed too. For example, if we are only interested in corals, the only code needed is "c" for coral. However, if we are interested in multiple taxa, then this could be recorded as:
  - c = coral
  - s = sponge
  - u = sea urchin
NOTE: The code "M" cannot be used, as it is reserved for calibration measurement lengths.
NOTE: Capital and lower-case letters are treated differently. A code can be a single letter if you wish. Numbers and special characters are not allowed, as numbers are reserved for the ROI Replicate.
ii. Replicate Number in that Group: The number of that ROI within its group in that specific image, i.e., the 4th coral outlined in that photo would be recorded as c4. Note that it is fine for ROIs from different images to have the same code (e.g., multiple images that each contain an ROI named c4).
iii. ROI Label Code: This is a string of additional user-defined codes. For instance, to record bleaching (b) and partial mortality (pm) for each coral:
EXAMPLES:
The 3rd coral in the photo, which is bleached and has partial mortality: c3bpm
The 4th coral in the photo, which is not bleached and has partial mortality: c4pm
NOTE: a single code cannot be used for more than one categorical variable.

3) Calibration Points:
This is an ROI that is created when making the Measurement Lengths (M1, M2, M3, M4) using macros [7] + [8]. This ROI has no other use than the creation of M1-M4, which are used later in R to calibrate the size metrics. Code Name: Cali_Pts

Part 2.D -Step-by-step Workflow -Macros and Shortcuts
As mentioned before, SizeExtractR macros are found in Plugins > Macros > Startup Macros.
NOTE 1: the order of the macro descriptions is different in this manual than in the Startup Macros script, so that they can be explained in a more logical way.
NOTE 2: the hotkey for each keyboard shortcut is given in square brackets [ ] (e.g., hand tool = press the "g" key).
For the following step-by-step guide, you can try it yourself on some sample images. Download and unpack the data repository (https://doi.org/10.25405/data.ncl.15106455). Open the folder "Data_preImageJ" and look inside. This folder contains some example images for you to annotate. The final version of the fully annotated images is in the folder "Data_postImageJ_all"; that is how your analysis should look once the ImageJ step is complete.
2) For each image in that folder, follow the protocol below. Then move to the next folder of images:
i. Name: "Setup For Benthic Plot Analysis [l]"
   Shortcut Key: l
   Use: This macro sets the size metrics to be measured and recorded, and opens the ROI Manager panel, which is useful to have open whilst working.
ii. Name: "multipoint line [7]"
   Shortcut Key: 7
   Use: For calibration. Pick the points to be used to make calibration lengths (see shortcut [8]).
iii. Name: "Calibration Lengths [8]"
   Shortcut Key: 8
   Use: For calibration. Saves the output of multipoint line [7] as separate lengths called M1, M2, M3, M4, i.e., the lengths between consecutive points. Hence the points placed using multipoint line [7] must lie along a line, in consecutive order. NOTE: You will specify the true calibration length (in your own units, e.g., centimetres) later in R. NOTE: You cannot move on to the next image until at least one calibration length has been saved as an ROI (named M1, M2, M3 or M4).
iv. Name: "freehand [q]"
   Shortcut Key: q
   Use: This tool draws a freehand polygon, used to outline the ROIs of interest (e.g., corals).
v. Name: "AddAndNameROI [u]"
   Shortcut Key: u
   Use: Adds the ROI to the ROI Manager and gives it an ROI name label in accordance with the user-defined labelling scheme. A box will appear asking for a name. Here you must input a string that the R-package has been programmed to read (see Part 2.C for a full description of the string).
   NOTE: This function is designed to disallow ROIs with the same name. Therefore, if we are working on corals and there are three coral colonies in the image, they must be named c1, c2, c3. This helps to identify individual colonies at a later stage.
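Two of the rules above can be checked per image before moving on: ROI names must be unique, and at least one calibration length (M1-M4) must exist. The sketch below, using a hypothetical function check_image_rois(), mirrors those two rules in R; it is not how the ImageJ macros enforce them, only an illustration of the same logic.

```r
# Hypothetical helper: check one image's ROI names against two rules
# described above (unique names; at least one of M1-M4 present).
check_image_rois <- function(roi_names) {
  problems <- character(0)

  # Rule 1: no two ROIs in the same image may share a name.
  dups <- unique(roi_names[duplicated(roi_names)])
  if (length(dups) > 0) {
    problems <- c(problems,
                  paste("Duplicate ROI names:", paste(dups, collapse = ", ")))
  }

  # Rule 2: at least one calibration length must have been saved.
  if (!any(roi_names %in% paste0("M", 1:4))) {
    problems <- c(problems, "No calibration length (M1-M4) found")
  }

  problems  # character(0) means the image passes both checks
}

check_image_rois(c("Cali_Pts", "M1", "M2", "c1", "c2bb"))  # passes
check_image_rois(c("c1", "c1"))                            # fails both rules
```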

Figure 2.
Image analysis screenshot with all labelled ROIs added to the ROI manager, showing calibration points (Cali_Pts), 10cm calibration lengths (M1 to M4), and coral ROIs named according to the labelling scheme (c: coral, b: moderately bleached, bb: fully bleached, pm: partial mortality, and o: out-of-frame). Once the image is completed to this level, the user needs to use the shortcut 'Save Area Next Image [n]' to save all relevant files and move to the next image in the folder.
Part 2.E -Additional Useful Tools (macros, shortcuts, and links)
1) Name: Ctrl and scroll (zoom on mouse) (Mac: + and - keys to zoom in and out)
   Use: Zoom in/out. Use a magnification that allows you to identify the objects of interest.
2) Name: "hand [g]"
   Shortcut Key: g
   Use: Selects the hand tool to pan across the image when zoomed in.
3) Name: "line [p]"
   Shortcut Key: p
   Use: For calibration. Line tool to draw a line between two points. This must be used in conjunction with AddAndNameROI [u] to name the ROI as a calibration length (M1, M2, M3, or M4). Note that in most cases the shortcuts [7] and [8] will be more useful.

4) Name: "Next image -no change [r]"
   Shortcut Key: r
   Use: This is useful for a folder full of original jpgs, outline jpgs, zip folders and txt files. If the shortcut [n] moves on to an outlined image (which has no ROIs) rather than the next original image, use this shortcut [r]. It will automatically open the next original image to be analysed, together with all its ROIs. You can then add ROIs as you wish.
5) Further size analysis of ROI files can be achieved using the new R-package: RImageJROI (Sterratt and Vihtakari, 2021).

Part 3 -Calibrated Database Formation in R
The steps in R are described in the following subsections:
A. Checking, Collating, and Building a Calibrated Dataset
B. Worked Example using the main SizeExtractR workflow function
C. Step-by-step without using the all-in-one wrapper function
D. Additional Functions (Full Workflow Function and Plotting Function)

Part 3.A -Checking, Collating, and Building a Calibrated Dataset
Within this R package there is a series of seven R functions that check for human errors made during image analysis (for instance, an incorrectly written ROI name label) and build an error-free calibrated dataset. These are designed to be run in sequence.
There is an additional function which runs these seven functions in sequence automatically; however, for the purposes of learning, it is worth first running the functions individually.
For all R functions, you can use the help file to understand the specific inputs needed, and outputs.

? <name of function>
Note that during data checks you may be shown a mistake (e.g., made during ROI labelling or folder naming). R will then stop with an error and print information to the console about what the problem was during the quality control check, and how to fix it. The user may need to fix mistakes outside of the R environment. Then, simply re-run that R function to check whether it now passes the quality control check.
Likely Errors and how they can be fixed:

Directory Variables
If you find a mistake in the directory variables, it relates either to a misspelling in a folder name or to an inconsistent directory structure (e.g., if Site folders are always within Year folders, but in one case are not, this will flag an error). To fix this, rename or reorganise the folders appropriately outside of the R environment.

ROI Type and ROI Code Variables
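If you find a mistake in ROI labelling, you will need to fix it by opening the affected image and its ROI zip file in ImageJ and editing the ROI name labels.

Part 3.B -Worked Example using the main SizeExtractR workflow function
The worked example has the following steps:
I. Prerequisites
II. Organising raw images and directory tree
III. Choosing a user-defined labelling system
IV. Image analysis using ImageJ macros to export uncalibrated size data
V. Quality control, compiling all datafiles into a single size dataset, and saving in R
VI. Plotting size-frequency distributions in R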

Part 3.B.I -Prerequisites
First, the software must be installed and prepared. ImageJ must be installed locally (see Part 1.A) and the SizeExtractR macros attached as startup macros (see Part 1.B). R must be installed locally (see Part 1.C) and the SizeExtractR R-package must be installed (see Part 1.D). There are two datasets for the worked example (https://doi.org/10.25405/data.ncl.15106455):
• Before ImageJ: The first is simply a folder of raw images, named "Data_preImageJ". This will be used to learn how to use the ImageJ macros on a set of raw images.
• After ImageJ: The second is the same folder of images after the ImageJ analysis, named "Data_postImageJ_all".
Explore these folders now to get an idea of how the output files from ImageJ look.
Part 3.B.II -Organising raw images and directory tree
First, the images must be organised manually, either within a single folder or in a directory tree. This has already been done in this worked example. Open the folder "Data_preImageJ" and explore this directory. Note that it has two levels, and that the naming of these folders must be consistent throughout the directory levels. In this case, the first directory level has only "year" folders, and the second directory level has only "site" folders, named consistently within each year folder. The images are all stored in the "site" folders.
Part 3.B.III -Choosing a user-defined labelling system
As described in the manuscript, the ROI name label annotated for each object in each image is a combination of three codes: ROI Type, ROI Replicate, and ROI Label Code. In this example we are only interested in the coral P. aliciae, so the ROI Type code is 'Pa'. Now we must define a simple labelling system for the ROI Label Code (the final alphabetical code in the annotation). In this example, we will record partial mortality and bleaching status (moderate or severe), as in the example shown in the SizeExtractR manuscript (Fig. 4). The labelling system is shown in Table 4. Note that a coral can also be labelled without any of these codes; for example, a healthy coral could be labelled Pa1, whereas a moderately bleached coral would be Pa1b. This table is also what will be used later during quality control in R.

Table 4. User-defined ROI Label Code labelling system.
ROI_Label_code   Corresponding_Variable_Name
b                Mod_Bleached
bb               Sev_Bleached
pm               Partial_Mortality

Part 3.B.IV -Image analysis using ImageJ macros to export uncalibrated size data
Now we will demonstrate the image analysis on one image from the dataset of raw images (folder name "Data_preImageJ"). This would then be repeated for each image in the dataset.
Follow the steps below:

5. To outline and annotate the first coral:
  o Press the shortcut [q] to make sure the freehand tool is selected.
  o Manually outline a coral by clicking and dragging with the mouse.
  o Press the shortcut [u] and type in the ROI name label for that coral (e.g., if it is the first coral in the image to be annotated and it is severely bleached, this will be 'Pa1bb').
6. Repeat step 5 for every other coral in the image.
7. To extract the size data, save all output files, and move to the next image in this folder, simply press the shortcut [n].
Now you have finished the analysis of the first image. On your own dataset this would be completed for all the other images in the dataset; however, for the sake of this example you can stop at this stage and continue to the next step. The second dataset folder, named "Data_postImageJ_all", contains a fully completed analysis of this image dataset, with annotations for all corals in all images.
Part 3.B.V -Quality control, compiling text files into a single size dataset, and saving in R
Now we will work from the processed image dataset, named "Data_postImageJ_all". The next step is to complete the quality control of annotations and build the single size dataset from the per-image data, which are currently stored as individual text files in the subfolders named "ImageJ_Output". The code can be copied into and run from an R script to work through the example.
Follow the steps below:
• Setup
1. Open R.
2. Install SizeExtractR (do this only if the package is not yet installed):
library(devtools)
devtools::install_github("liamlachs/SizeExtractR")
3. Load SizeExtractR:
library(SizeExtractR)
• Save Path
Save a variable with the path string to the root directory folder containing the image analysis files. Do not use the '..', '~', or '\' symbols in the path.
mypath <- "<fill your directory>/Data_postImageJ_all"
• Run Full_SizeExtractR_Workflow()
Note that the known.calibration.length parameter is entered as the value 10 (cm), as that was the length of each calibration segment on the measurement stick. Therefore, all computed size metrics will be in centimetre units (cm for lengths, cm² for areas). The include.calibrations parameter is set to FALSE to avoid measurement stick lengths (e.g., Cali_Pts and M1-M4) being included in the final dataset.
data <- Full_SizeExtractR_Workflow(mypath, known.calibration.length = 10, include.calibrations = FALSE)
• Step-by-step for Full_SizeExtractR_Workflow()
Everything this function does is described in the following steps, which culminate in a full calibrated size dataset. However, if mistakes are found during quality control, you will need to make changes manually outside R and then rerun Full_SizeExtractR_Workflow(). Note: For each following step there is a screenshot of the R console after running Full_SizeExtractR_Workflow(). Please read the red text, which guides you through the interactive quality control checks and variable setting.

Quality control -Directory Variables
Ensures the folder names are all correct. If any names are incorrect, rename the folders manually outside of R, and rerun Full_SizeExtractR_Workflow().

Set Directory Variable names
The second step is to fill in the Variable names for each directory level. Here we have entered Timepoint and Site, which will end up as two categorical variables (columns) in the final dataset.

Quality control -ROI Type codes
Ensures there are no human errors (e.g., typos) in the annotated ROI Type codes (cf. Figure 1). If errors are present, choose no; you will then be given an option to locate the specific images that contain errors. After fixing them, rerun the workflow function. If there are no errors, proceed.
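The idea behind this check can be sketched in a few lines: extract the ROI Type from each name and list any names whose type is not in the allowed set, together with the image to re-open in ImageJ. The table layout, column names, and the function find_unexpected_types() are hypothetical; SizeExtractR performs this check interactively and its internals may differ.

```r
# Hypothetical uncalibrated table: one row per ROI, with the image it came from.
rois <- data.frame(
  Image = c("img001.jpg", "img001.jpg", "img001.jpg", "img002.jpg", "img002.jpg"),
  ROI   = c("Cali_Pts", "M1", "Pa1bb", "Pa1", "pa2"),
  stringsAsFactors = FALSE
)

# Return the ROIs whose type is not allowed (ROI Type = leading letters;
# the reserved name Cali_Pts is kept whole). Matching is case-sensitive.
find_unexpected_types <- function(rois, allowed_types) {
  roi_type <- ifelse(rois$ROI == "Cali_Pts", "Cali_Pts",
                     sub("^([A-Za-z]+).*$", "\\1", rois$ROI))
  rois$ROI_Type <- roi_type
  rois[!roi_type %in% allowed_types, , drop = FALSE]
}

# Allowed types: the user-defined type plus the reserved calibration types.
bad <- find_unexpected_types(rois, c("Pa", "M", "Cali_Pts"))
print(bad)  # "pa2" in img002.jpg is flagged because "pa" differs from "Pa"
```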

Set ROI Variable names
Now we must link the user-defined ROI Label Codes to the corresponding names of categorical ROI Variables for the final dataset. This information must be added manually to the ROI_Labels.csv template file outside the R environment; this template file will have been created automatically. For this worked example, fill the codes given in Table 4 into the .csv file manually, save it, then continue in R (see example in data repository). Please do this now.

Quality control -ROI Label Codes 1
Ensures there are no human errors (e.g., typos) in the annotated ROI Label Codes (cf. Fig. 4). Check that the data entered into the .csv file are correct.

Quality control -ROI Label Codes 2
Finally, check that the translation matrix from ROI Label Codes to ROI Variables is correct.

• Congratulations
Now the database is calibrated. You have a single, quality-controlled, calibrated dataset of object sizes from the example image dataset. View the dataset using View() or head().
Notice the variable names we specified have been included in the dataset.
The dataset can be saved, and its size-frequency distributions plotted, using:
write.csv(data, "Calibrated_Dataset.csv", row.names = FALSE)
Plot_Size_Frequency(data, size.metric = "Area", log_size = TRUE, nbins = 10, group_by = c("Site", "Timepoint"), facetRow_by = "Timepoint", facetCol_by = "Site", scales_gg = "fixed")

• Database: the output of Function 2, an uncalibrated database
• As a reminder, the ROI Type is the alphabetical code before the replicate number that differentiates measurement lengths (i.e., M), calibration points (i.e., Cali_Pts), and user-defined objects of interest (e.g., s and c, for sponges and corals respectively).
• If there are issues or misspellings in these codes, the user will be told which images they come from and can then amend them in ImageJ by opening/editing the specific image and ROI zip file.
• Once issues are solved and this check is passed, you can move on to the next check.

4 CheckSet_ROILabelCodeVars()

Inputs
• Database: the output of Function 2, an uncalibrated database
• path: the directory path

Check ROI Label Codes and set Variable names (interactive)
• This function checks the ROI Label Codes for any errors and sets the variable names.
• As a reminder, the ROI Label Code is the fully user-defined alphabetical code after the ROI Replicate number. For example, the corals in the case study were given the code "b" for bleached, or nothing for not bleached.
• This function creates a "label translator matrix" that R uses to translate the ROI Label Codes into TRUE/FALSE database variables.
• The user must provide the labelling scheme in a csv file, which is auto-created in the mypath directory folder.
• Once these checks are passed you can move on.
• If no ROI Label Codes are required, the user should still run the function; it will flag that no ROI Label Codes are used and signal this to the next function.
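The translation from ROI Label Codes to TRUE/FALSE variables can be sketched as follows, using Table 4 as input. A combined code such as "bbpm" has to be split into known codes; the sketch assumes a longest-match-first rule (so "bbpm" becomes "bb" + "pm" rather than "b" + "b" + "pm"), which is our assumption and may not be the rule SizeExtractR uses. The functions split_label_code() and translate_label_codes() are hypothetical.

```r
# Table 4 as a data frame (in SizeExtractR this information comes from ROI_Labels.csv).
label_table <- data.frame(
  ROI_Label_code = c("b", "bb", "pm"),
  Corresponding_Variable_Name = c("Mod_Bleached", "Sev_Bleached", "Partial_Mortality"),
  stringsAsFactors = FALSE
)

# Split a combined label code into known codes, trying the longest code first.
split_label_code <- function(code, known) {
  known <- known[order(nchar(known), decreasing = TRUE)]
  parts <- character(0)
  while (nchar(code) > 0) {
    hit <- known[startsWith(code, known)]
    if (length(hit) == 0) stop("Unrecognised ROI Label Code: ", code)
    parts <- c(parts, hit[1])
    code <- substring(code, nchar(hit[1]) + 1)  # drop the matched prefix
  }
  parts
}

# Build one TRUE/FALSE column per variable for each distinct label code.
translate_label_codes <- function(codes, label_table) {
  u <- unique(codes)
  out <- data.frame(ROI_Label_Code = u, stringsAsFactors = FALSE)
  for (i in seq_len(nrow(label_table))) {
    var <- label_table$Corresponding_Variable_Name[i]
    out[[var]] <- vapply(u, function(cd) {
      label_table$ROI_Label_code[i] %in% split_label_code(cd, label_table$ROI_Label_code)
    }, logical(1), USE.NAMES = FALSE)
  }
  out
}

translate_label_codes(c("", "b", "bb", "bbpm", "pm"), label_table)
```

An unrecognised code (e.g., a typo such as "bx") stops with an error, which is the kind of mistake the quality control checks above are meant to catch.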

Inputs
• Database: the output of Function 2, an uncalibrated database
• label.translator: the output of Function 4, the "label translator matrix"

Add ROI Label Code Variables (non-interactive)
• This function adds the ROI Label Code Variables to the dataset via the "label translator matrix".
• The output is a new database that has all user-defined variables but is still uncalibrated.
• Once this runs you can move on.
• This function calibrates all the size metric variables for each photo independently, based on the average length in pixels of that photo's calibration lengths (M1-M4).
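The calibration step amounts to a per-photo conversion from pixels to real units. A minimal sketch, assuming that each photo's scale (pixels per unit length) is the mean pixel length of its M1-M4 calibration lengths divided by the known calibration length (10 cm in the worked example), and that lengths are divided by this scale while areas are divided by its square. The data and column names are hypothetical, and SizeExtractR's internal implementation may differ in detail.

```r
# Hypothetical uncalibrated calibration lengths (pixels) for two photos.
calib <- data.frame(
  Image     = c("img001", "img001", "img001", "img002", "img002"),
  Length_px = c(198, 202, 200, 150, 150)   # M1-M4 lengths in pixels
)
# Hypothetical uncalibrated ROI measurements.
rois <- data.frame(
  Image        = c("img001", "img002"),
  Area_px      = c(40000, 22500),          # planar area, pixels^2
  Perimeter_px = c(800, 600)               # perimeter, pixels
)
known_length <- 10  # true length of each calibration segment (cm)

# Pixels per cm for each photo, from the mean of that photo's calibration lengths.
px_per_unit <- tapply(calib$Length_px, calib$Image, mean) / known_length

# Look up each ROI's photo-specific scale, then convert.
px_scale <- as.numeric(px_per_unit[rois$Image])
rois$Area      <- rois$Area_px / px_scale^2   # cm^2: areas scale with length squared
rois$Perimeter <- rois$Perimeter_px / px_scale  # cm: lengths scale linearly
print(rois)
```

Here img001 has a mean calibration length of 200 px (20 px/cm) and img002 has 150 px (15 px/cm), so both ROIs come out at 100 cm^2 and 40 cm despite their different pixel values.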