`dataset_annotations_info`

Extract 3D bounding boxes for all connected regions of a specific label value from annotation volumes and save them as a CSV file.

dataset_annotations_info(
    nii_folder: str,
    output_path: str,
    annotation_value: int = 1
) -> None

Overview

This function processes all annotation masks in a folder to identify and extract bounding boxes around labeled regions. It uses connected component analysis to detect separate instances of the same label value, making it useful for:

Multi-instance object detection datasets
Lesion or tumor localization in medical images
Organ or anatomical structure boundary extraction
Quality control and annotation verification

Each connected region with the specified annotation value gets its own bounding box, and all boxes are saved to dataset_annotations_info.csv.

Parameters

Name	Type	Default	Description
`nii_folder`	`str`	required	Path to the directory containing annotation volumes in `.nii.gz` format.
`output_path`	`str`	required	Directory where the CSV file will be saved. Created automatically if it doesn’t exist.
`annotation_value`	`int`	`1`	Voxel value representing the region of interest in the annotation masks.

Returns

None – The function saves results to disk.

Output File

CSV Structure

The function creates dataset_annotations_info.csv in the specified output directory with two columns:

Column	Description
`FILENAME`	Name of the annotation file
`3D_BOXES`	List of bounding boxes, each as `[xmin, ymin, zmin, xmax, ymax, zmax]`

Bounding Box Format

Each bounding box is a list of 6 integers representing the minimum and maximum coordinates:

[X_MIN, Y_MIN, Z_MIN, X_MAX, Y_MAX, Z_MAX]

Example CSV content:

FILENAME,3D_BOXES
case_001_mask.nii.gz,"[[45, 67, 23, 89, 112, 56], [120, 130, 40, 145, 155, 65]]"
case_002_mask.nii.gz,"[[50, 70, 30, 95, 115, 60]]"

Connected Component Analysis

The function uses connected component labeling to identify separate instances of the annotation value. This means:

Multiple regions: If the same label value appears in disconnected regions, each region gets its own bounding box
Single region: If all voxels with the annotation value are connected, only one bounding box is created
Empty masks: Files with no voxels matching the annotation value will have an empty list of boxes

Exceptions

Exception	Condition
`FileNotFoundError`	The `nii_folder` does not exist or contains no `.nii.gz` files

Usage Notes

Input Format: Only .nii.gz files are processed
Progress Display: Shows a progress bar with file count during processing
Error Handling: Files that fail to process are skipped with error messages
Output Directory: Automatically created if it doesn’t exist
Coordinate System: Bounding boxes use voxel coordinates (0-indexed)

Examples

Basic Usage

Extract bounding boxes for annotation value 1:

from nidataset.utility import dataset_annotations_info

dataset_annotations_info(
    nii_folder="dataset/annotations/",
    output_path="results/bboxes/",
    annotation_value=1
)
# Creates: results/bboxes/dataset_annotations_info.csv

Multiple Annotation Values

Process different anatomical structures separately:

# Extract liver annotations (value = 1)
dataset_annotations_info(
    nii_folder="dataset/masks/",
    output_path="results/liver_boxes/",
    annotation_value=1
)

# Extract kidney annotations (value = 2)
dataset_annotations_info(
    nii_folder="dataset/masks/",
    output_path="results/kidney_boxes/",
    annotation_value=2
)

Analyzing Results

Load and analyze the extracted bounding boxes:

import pandas as pd
from nidataset.utility import dataset_annotations_info

# Extract bounding boxes
dataset_annotations_info(
    nii_folder="annotations/",
    output_path="output/",
    annotation_value=1
)

# Load and analyze
df = pd.read_csv("output/dataset_annotations_info.csv")
print(f"Total files processed: {len(df)}")
print(f"Files with annotations: {df['3D_BOXES'].apply(lambda x: len(eval(x)) > 0).sum()}")

# Check a specific file
import ast
boxes = ast.literal_eval(df.loc[0, '3D_BOXES'])
print(f"Number of regions in first file: {len(boxes)}")
for i, box in enumerate(boxes):
    xmin, ymin, zmin, xmax, ymax, zmax = box
    print(f"Region {i+1}: Size = {xmax-xmin}×{ymax-ymin}×{zmax-zmin}")

Verifying Annotations

Use bounding boxes to verify annotation quality:

import nibabel as nib
import ast
import pandas as pd
from nidataset.utility import dataset_annotations_info

# Extract boxes
dataset_annotations_info(
    nii_folder="masks/",
    output_path="output/",
    annotation_value=1
)

# Check for suspicious small or large boxes
df = pd.read_csv("output/dataset_annotations_info.csv")

for idx, row in df.iterrows():
    boxes = ast.literal_eval(row['3D_BOXES'])
    
    for box in boxes:
        xmin, ymin, zmin, xmax, ymax, zmax = box
        volume = (xmax - xmin) * (ymax - ymin) * (zmax - zmin)
        
        if volume < 10:
            print(f"Warning: Very small region in {row['FILENAME']}: volume={volume}")
        elif volume > 100000:
            print(f"Warning: Very large region in {row['FILENAME']}: volume={volume}")

Complete Workflow

Extract boxes and create visualization metadata:

import pandas as pd
import ast
from nidataset.utility import dataset_annotations_info

# 1. Extract all bounding boxes
dataset_annotations_info(
    nii_folder="dataset/segmentations/",
    output_path="dataset/metadata/",
    annotation_value=1
)

# 2. Load results
df = pd.read_csv("dataset/metadata/dataset_annotations_info.csv")

# 3. Create summary statistics
summary = []
for idx, row in df.iterrows():
    boxes = ast.literal_eval(row['3D_BOXES'])
    summary.append({
        'filename': row['FILENAME'],
        'num_regions': len(boxes),
        'has_annotations': len(boxes) > 0
    })

summary_df = pd.DataFrame(summary)
print(f"\nDataset Summary:")
print(f"Total files: {len(summary_df)}")
print(f"Files with annotations: {summary_df['has_annotations'].sum()}")
print(f"Average regions per file: {summary_df['num_regions'].mean():.2f}")

Typical Workflow

from nidataset.utility import dataset_annotations_info

# 1. Prepare annotation folder
annotation_folder = "data/segmentation_masks/"
output_folder = "data/bounding_boxes/"

# 2. Extract bounding boxes for target structure
dataset_annotations_info(
    nii_folder=annotation_folder,
    output_path=output_folder,
    annotation_value=1
)

# 3. Review the output
import pandas as pd
df = pd.read_csv("data/bounding_boxes/dataset_annotations_info.csv")
print(df.head())

# 4. Use boxes for downstream tasks
# - Object detection training
# - Region cropping
# - Statistical analysis