A. LabRI Package


The LabRI tool is an R Markdown file that employs the indirect method called the LabRI Method. This method is an adaptive and multi-criteria approach for the estimation and verification of reference intervals, utilizing a combination of data cleaning algorithms, data transformation, clustering techniques, and the refineR and reflimR algorithms or the expectation-maximization (EM) algorithm, depending on the number of clusters in the truncated distribution.


    Characteristics of the LabRI Method:


  • Adaptive: Adjusts its application to the structure and characteristics of the data, using different cleaning and transformation techniques as needed. If the data distribution has more than one cluster, the reference interval is estimated with the Centroid of Winsorized Reference Limits method, using refineR and reflimR. If there is only one cluster, the expectation-maximization algorithm is used with both parametric and non-parametric approaches to obtain the best reference interval estimate.

  • Multi-criteria: Considers multiple criteria and methods for the estimation and verification of reference intervals, ensuring a robust and comprehensive analysis.


Click Here to Explore the Lab R Group Blog

Download the compressed package (Click here)



B. Tutorial


Tutorial for Installing R and RStudio, R Packages, and Using Tools from the LabRI Package

1º) Installing The R Language


On the page http://www.r-project.org, we must click “CRAN”, as shown in Fig. 1:


Fig. 1. Website www.r-project.org. The blue arrow indicates the link to access the Comprehensive R Archive Network (CRAN).


After that, we must choose the server from which we will download the R language. Take, for example, a University of São Paulo (USP) mirror (Fig. 2A). Next, we must select the operating system of the machine on which R will be installed (Fig. 2B):


  • For Windows users: click on “Download R for Windows”, then “base” and “Download R-X.X.X for Windows”. Run the downloaded file and install it as usual, with the default settings;

  • For Mac users: click on “Download R for macOS”, then on “R-X.X.X.pkg”. Install the application as usual, with the default settings;

  • For Linux users: check whether your distribution (e.g., Debian, Red Hat [Fedora], SUSE [openSUSE], Ubuntu) already comes with R installed. If not, follow the instructions at the link https://vps.fmvz.usp.br/CRAN/.


Fig. 2. Interface of the Comprehensive R Archive Network (CRAN) page, where users can choose the closest server (mirror) to download the R language. A: In the example, a server from the University of São Paulo (USP) in Brazil is highlighted (blue arrow). B: After choosing the server, the user must select the operating system of their machine to proceed with the download. Options are available for Windows (red arrow), macOS (blue arrow) and various Linux distributions (black arrow).


2º) Installing RStudio


On the https://posit.co/downloads/ page, we must click on “Download RStudio” (Fig. 3A). After that, we must scroll down the page and click on the button associated with the “Install RStudio” option (Fig. 3B).

NOTE: Remember to install the R software before installing RStudio. After completing the download, install the software keeping the default settings.

Fig. 3. Downloading and Installing RStudio. (A) RStudio IDE download page. The red arrow indicates the “DOWNLOAD RSTUDIO” button, which users should click to start the download process. (B) Installation steps for R and RStudio. The red arrow in section 2 points to the “DOWNLOAD RSTUDIO DESKTOP FOR WINDOWS” button, guiding users to download the correct version for their operating system.


3º) Installing R Packages

An R package is a collection of functions, data and documentation that extends the capabilities of the R programming environment. After installing the R and RStudio software, we must install some R packages.

A. Download the “LabRI_Package.zip” file and unzip it to the desired location on your computer (Fig. 4);

B. Open the unzipped folder and look for the file with the extension “.Rproj” called “1_PROJECT_LabRI_Tool”. This is the RStudio project file;

C. Double click on the “1_PROJECT_LabRI_Tool.Rproj” file. This will launch RStudio (if it is already installed on your machine) and automatically open the project, setting the correct directory as your working directory;

D. Once inside RStudio, you will see all R Markdown files (extension “.Rmd”) and scripts associated with the project listed in the files panel;

E. In the list of files, locate and click the file named “2_INSTALL_R_packages.Rmd”. This file contains the script necessary to install the R packages required to use the LabRI and refineR tools (Fig. 5);

F. With the file open, you will see a block of code, known as a “chunk”. To install the packages, you need to run this chunk. Position the cursor inside the chunk and look for the button in its top right corner that looks like a small triangle or arrow (sometimes referred to as “Run Current Block” or “Run Current Chunk”). Click this button to run the code inside the chunk (Fig. 6);

G. After running the chunk, RStudio will attempt to install and load all listed packages.


Fig. 4. Unzipping and Organizing the LabRI Package. The compressed file “LabRI_Package.zip” (blue arrow) must be unzipped to access its contents. Once unzipped, the folder “LabRI_Package” contains the subfolders and files necessary for the project. The directory structure within the “LabRI_Package” folder is shown, highlighting the project file “1_PROJECT_LabRI_Tool.Rproj” (red arrow) used to open the project in RStudio. The directory also includes other important files and R scripts such as “2_INSTALL_R_packages.Rmd”, “3_SCRIPT_RMask_Pseudoanonymizer.Rmd”, “4_SCRIPT_LabRI_Method.Rmd”, and “5_SCRIPT_refineR_algorithm.Rmd”.


Fig. 5. RStudio console displaying the startup information for R version 4.4.0, including details about the R environment and instructions for accessing help and citations, which indicates the successful launch of an R session. The project directory structure in RStudio is also shown, highlighting the project file “1_PROJECT_LabRI_Tool.Rproj” (blue arrow).


Fig. 6. RStudio environment displaying the R Markdown file “2_INSTALL_R_packages.Rmd” with a code block (chunk) dedicated to installing and loading the necessary R packages for the project. The comment line instructs to click “Run Current Chunk” to execute the code. The code block (chunk) is highlighted, showing the list of packages to be checked and installed if not already present. The “Run Current Chunk” button (blue arrow) is highlighted with the instruction “Click ‘Run Current Chunk’ to execute the code block” in a yellow box, emphasizing the action needed to install and load the required packages.


In the console, you will see a series of TRUE or FALSE values, one per package. A TRUE value indicates that the package was loaded successfully, while a FALSE value indicates that there was a problem. To ensure the full functioning of tools that indirectly estimate the reference interval, such as LabRI and refineR, it is crucial that all listed packages return TRUE after the attempted installation and loading (Fig. 7).
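The install-then-load pattern described above can be sketched roughly as follows. This is not the content of “2_INSTALL_R_packages.Rmd”: the helper name `install_and_load` is hypothetical, and the demonstration uses two packages that ship with R so that nothing is downloaded. The actual package list is the one given in the project file.

```r
# Hypothetical helper: install any missing packages, then try to load each one.
install_and_load <- function(pkgs) {
  # requireNamespace() returns TRUE when a package is already installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  not_installed <- pkgs[!installed]
  if (length(not_installed) > 0) {
    install.packages(not_installed)
  }
  # require() returns TRUE when loading succeeds and FALSE when it fails
  vapply(
    pkgs,
    function(p) suppressWarnings(require(p, character.only = TRUE, quietly = TRUE)),
    logical(1)
  )
}

# Demonstration with packages that are part of every R installation
status <- install_and_load(c("stats", "utils"))
print(status)
```

The result is a named logical vector, so a package that failed to load can be spotted by name, for example with `names(status)[!status]`.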


Fig. 7. Packages checked and their installation status (blue bracket), ensuring all dependencies are met for running the scripts effectively.


SETTINGS

A) SECRET KEY

  • Secret_key (salt): Key used for hashing purposes. It is crucial to keep it confidential to ensure the security of the hashing process.

B) FILE NAME AND COLUMNS

  • File_name: Name of the file that contains the dataset.
  • IdentifierColumn: Column containing unique identifiers that will be pseudo-anonymized.
  • Column_age: Optional column containing information about the patient’s age or another identifier, which can be combined with IdentifierColumn for hashing.

C) FILE FORMAT

  • The user defines the data file format by marking the corresponding option with an “X” (CSV, XLSX or XLS).
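The settings above describe salted hashing of an identifier column, optionally combined with a second column. A minimal sketch of that idea is shown below. It is not the code of the pseudo-anonymization script: the helper names `hash_string` and `pseudonymize` are hypothetical, the identifiers and ages are made up, and MD5 is used only because `tools::md5sum()` ships with base R. Packages such as digest or openssl offer stronger algorithms such as SHA-256.

```r
# Hypothetical helper: hash one string with MD5 using only base R.
# tools::md5sum() works on files, so the string is written to a temporary file first.
hash_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(x), tmp)
  unname(tools::md5sum(tmp))
}

# Combine the secret key (salt) with the identifier and, optionally, a second column.
pseudonymize <- function(ids, secret_key, extra = NULL) {
  payload <- if (is.null(extra)) as.character(ids) else paste(ids, extra, sep = "|")
  salted <- paste0(secret_key, "|", payload)
  vapply(salted, hash_string, character(1), USE.NAMES = FALSE)
}

# Made-up identifiers and ages, for illustration only
ids  <- c("P001", "P002", "P001")
ages <- c(34, 51, 34)
pseudo_ids <- pseudonymize(ids, secret_key = "replace-with-your-own-key", extra = ages)
print(pseudo_ids)
```

The same identifier always maps to the same pseudonym for a given key, which keeps repeated results from one patient linkable, while a different key yields unrelated pseudonyms.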


4º) Settings For Using The LabRI Tool


The R Markdown script “4_SCRIPT_LabRI_Method.Rmd” (Fig. 8 and 9), which uses the LabRI method, is designed to indirectly estimate the reference interval from the database of the computerized laboratory system. Locate it in the file list and click to open it. With this file open, you will see several sections where you can define parameters and information relevant to the analysis.

Follow the instructions and fill in the information requested in each section. Areas where you need to enter information or make selections are clearly indicated. Fill in all textual and numerical information and keep it inside quotation marks (“ ”).


Fig. 8. Project directory structure in RStudio highlighting the file “3_SCRIPT RMask_Anonimizer.Rmd” (blue arrow), which contains the script for refining the reference interval algorithm.


SETTINGS

A) FILE AND COLUMN NAME

  • File_Name: The name of the file that contains the data to be analyzed.

  • Column_Name: The name of the specific column within this file that will be used in the analysis.

Note 1: The default setting for the file name is “dataset”.


B) FILE FORMAT

  • CSV, XLSX and XLS: The user marks the file format they are using with an “X”.

Note 2: The default setting for the file format is “CSV”.


C) DECIMAL SEPARATOR

  • decimal_point and decimal_comma: Mark with an “X” the decimal separator used in the file.

Note 3: The default decimal separator is “.”.
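To illustrate why the decimal separator matters, the sketch below reads a CSV file with base R. The helper `read_csv_dataset` is hypothetical and is not the code used by the LabRI script. It also assumes that files written with a decimal comma separate fields with “;”, which is a common convention but should be checked for your own export. Excel files (XLSX or XLS) are not covered here and would need an additional package such as readxl.

```r
# Hypothetical helper: read a CSV file using the chosen decimal separator.
read_csv_dataset <- function(path, decimal_separator = ".") {
  # Assumption: decimal-comma files use ";" between fields, others use ","
  field_separator <- if (decimal_separator == ",") ";" else ","
  read.csv(path, sep = field_separator, dec = decimal_separator,
           stringsAsFactors = FALSE)
}

# Demonstration with a made-up file that uses a decimal comma
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID;ALP", "1;71,5", "2;80,25", "3;64,0"), tmp)
dataset <- read_csv_dataset(tmp, decimal_separator = ",")
print(dataset$ALP)
print(is.numeric(dataset$ALP))
```

If the wrong separator is selected, the column is typically read as text rather than numbers, so checking `is.numeric()` on the column of interest is a quick sanity test.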


D) ONE-SIDED OR TWO-SIDED REFERENCE INTERVALS?

  • Here, the user selects the type of reference interval that will be used:

  • Double_sided: Range with Lower and Upper Limit.

  • Right_sided: Upper Limit Only.

  • Left_side: Lower Limit Only.

Note 4: The default setting is a “Double-sided” range.


E) NUMBER OF DECIMALS FOR THE LIMITS OF THE REFERENCE INTERVAL

  • Number_of_decimal_places: Defines how many decimal places will be used for the reference limits.

Note 5: The default setting for the number of decimal places is “2”.
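The effect of the interval type and the number of decimal places can be illustrated with a simple percentile calculation on simulated data. This is not the LabRI method, which involves data cleaning, transformation, and clustering. The function `percentile_limits` is hypothetical, and the percentiles are assumptions: 2.5 and 97.5 for a double-sided interval and, as one common convention, 95 for an upper-only limit and 5 for a lower-only limit.

```r
# Hypothetical helper: plain percentile limits, rounded to the requested decimals.
percentile_limits <- function(x,
                              type = c("Double_sided", "Right_sided", "Left_side"),
                              decimals = 2) {
  type <- match.arg(type)
  # Assumed percentiles for each interval type (not taken from the LabRI script)
  probs <- switch(type,
    Double_sided = c(lower = 0.025, upper = 0.975),
    Right_sided  = c(upper = 0.95),
    Left_side    = c(lower = 0.05)
  )
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  names(limits) <- names(probs)
  round(limits, decimals)
}

set.seed(1)
values <- rnorm(2000, mean = 80, sd = 10)  # simulated values, not patient data

print(percentile_limits(values, "Double_sided", decimals = 2))
print(percentile_limits(values, "Right_sided", decimals = 1))
```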


INFORMATION ABOUT THE STUDY

A) RESPONSIBLE PERSON

  • Responsible_person: The person responsible for the study or data collection.


B) MEASUREMENT PROCEDURE

  • Measurement_procedure: The procedure used to obtain the measurements.


C) NAME OF MEASURAND

  • Is the name of the analyte the same as the name of the column in the file? Mark the appropriate answer with an “X”: Yes or No

  • Name_of_measurand: Name of the measurand. If it is different from the column name in the CSV, XLS or XLSX file, the user must enter it here.


D) UNIT OF MEASUREMENT

  • Unit_of_measurement: Unit of measurement of the analyte.


E) TYPE OF BLOOD SPECIMEN

  • Type_of_blood_specimen: Type of blood sample used.


F) AGE RANGE

  • Age_range: Age range of individuals in the study.


G) SEX

  • Sex: Sex of the individuals in the study.


H) EXCLUSION CRITERIA

  • Exclusion_criteria: Criteria used to exclude certain data or individuals.


I) DATA SOURCE

  • Data_source: Data source.


J) REFERENCE INTERVAL OF THE COMPARATIVE REFERENCE

  • Upper_Reference_Limit_of_Comparative_Reference: Upper reference limit of the comparative reference, which may be a reference interval in use, estimated by another method, published in a scientific journal, available from an external source, etc.

  • Lower_Reference_Limit_of_Comparative_Reference: Lower reference limit of the comparative reference, which may be a reference interval in use, estimated by another method, published in a scientific journal, available from an external source, etc.

  • Source_of_comparative_reference_used: Source of the comparative reference range. Note: RI = Reference Interval.

NOTE: Some of the approaches, methods, or algorithms that can be used as comparative references can be found in the links below.

SourceForge, Bellview - Available method: Bhattacharya

goCrunch - Available methods: Bhattacharya, Hoffmann and refineR

University Medicine Oldenburg (Universitätsmedizin Oldenburg, OMU) - Available methods: TMC, TML, refineR and reflimR


L) MANUALLY SELECT THE NUMBER OF GROUPS (1 to 7)

  • Number_of_clusters: Allows the user to manually select the number of groups (1 to 7).


M) SET THE MAXIMUM SAMPLE SIZE IF APPLICABLE

  • Maximum_sample_size: Defines the maximum size of the sample that will be analyzed, when applicable. For devices with 8 to 16 GB of RAM, a sample of 10,000 to 15,000 is recommended, when possible, so that the HTML report is generated in a timely manner. For highly skewed or highly contaminated data (>25%), sample sizes greater than 20,000, or large enough to stabilize the reference limit estimates, are recommended, when applicable. For devices with more than 16 GB of RAM, the script can process samples greater than 20,000 in a reasonable time.

Note: For the indirect sampling technique, the recommended minimum sample size is approximately 1,000, and a sample of approximately 15,000 to 20,000 is considered large. Data quality is as important as sample size. Therefore, it is critical to define useful and relevant exclusion criteria and filtering criteria to mitigate the magnitude of contamination with pathological results.
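Capping the sample size amounts to drawing a random subsample when the dataset is larger than the chosen maximum. The sketch below shows one way to do this in base R. The helper `cap_sample_size` and its `seed` argument are hypothetical, the data are simulated, and the LabRI script may draw its sample differently.

```r
# Hypothetical helper: keep at most `maximum_sample_size` values, chosen at random.
cap_sample_size <- function(x, maximum_sample_size, seed = 123) {
  x <- x[!is.na(x)]                 # drop missing results first
  if (length(x) <= maximum_sample_size) {
    return(x)                       # nothing to do for small datasets
  }
  set.seed(seed)                    # fixed seed so the subsample is reproducible
  sample(x, size = maximum_sample_size)
}

set.seed(42)
results <- rlnorm(50000, meanlog = 4, sdlog = 0.3)  # simulated, right-skewed data

subset_results <- cap_sample_size(results, maximum_sample_size = 15000)
print(length(subset_results))
```

Fixing the seed means the same subsample, and therefore the same report, is obtained when the analysis is rerun on unchanged data.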


Fig. 9. Generating the Reference Interval Estimation and Verification Report using the LabRI Tool. The “Knit” button (blue arrow) is used to render and generate the HTML report. The instruction “Click on ‘Knit’ to start rendering and generating the HTML report” is highlighted in a yellow box for clarity.


5º) Information About The Dataset And The Measurand Under Study


The LabRI package contains a folder called “1_Dataset” where all datasets are stored. For the LabRI and refineR R Markdown tools to correctly identify the dataset under study, the file must be placed in the “1_Dataset” folder, and the name of the file, its format, and the name of the column containing the measurand under analysis must be entered correctly in the script settings. Avoid using spaces when naming the file or column; for compound names, use underscores instead, as shown in Fig. 10.


Fig. 10. File and column name organization. (A) Directory structure highlighting the “1_Dataset” folder. The blue arrow points to the “1_Dataset” folder that contains the data to be analyzed. (B) Name of the Excel file “ALP_Male_Omuse_et.al_2020.xlsx” with red brackets highlighting the full file name. (C) Example of a data spreadsheet, where the “ALP” column is highlighted. The blue arrow points to the “ALP” header, indicating the column of interest. (D) R code for defining file and column names. The red arrow points to the file name “ALP_Male_Omuse_et.al_2020” and the blue arrow points to the column name “ALP”.
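The naming advice above can be checked with a few lines of base R before running the tool. The helpers `to_safe_name`, `dataset_path`, and `has_column` are hypothetical and are not part of the LabRI package. The file and column names are the ones shown in the example figure, and the small data frame is made up.

```r
# Replace runs of spaces with underscores, as recommended for file and column names
to_safe_name <- function(name) {
  gsub(" +", "_", trimws(name))
}

# Build the expected path of a dataset inside the "1_Dataset" folder
dataset_path <- function(file_name, extension, folder = "1_Dataset") {
  file.path(folder, paste0(to_safe_name(file_name), ".", extension))
}

# Check that the column of interest exists in a loaded dataset
has_column <- function(data, column_name) {
  column_name %in% names(data)
}

safe_name <- to_safe_name("ALP Male Omuse et.al 2020")
path <- dataset_path("ALP_Male_Omuse_et.al_2020", "xlsx")
print(safe_name)
print(path)

example_data <- data.frame(ALP = c(71.5, 80.25, 64.0))  # made-up values
print(has_column(example_data, "ALP"))
```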


6º) Export Of Figures


Figure 11A shows the folder named “3_Exported_daset_and_figures”, which contains the main figures exported after executing the LabRI tool script, as illustrated in Figure 11B.


Fig. 11. Directory Organization and Exported Figures. (A) Structure of the working directory highlighting the folder “3_Exported_daset_and_figures” (blue arrow), which contains the exported figures. (B) List of exported figures.