`extract_annotations_dataset`

Batch-processes annotation volumes and extracts bounding boxes, center points, or center-plus-radius coordinates as CSV files, with configurable output organization and optional statistics tracking.

extract_annotations_dataset(
    nii_folder: str,
    output_path: str,
    view: str = "axial",
    saving_mode: str = "case",
    extraction_mode: str = "slice",
    data_mode: str = "center",
    target_size: Optional[Tuple[int, int]] = None,
    save_stats: bool = False
) -> None

Overview

This function processes all annotation masks in a dataset folder by applying the extract_annotations function to each file. It extracts annotation coordinates as CSV files with flexible control over organization and output format.

Key features:

Batch processes entire datasets using extract_annotations internally
Flexible organization: per-case folders or shared view folders
Optional statistics tracking for dataset overview
Works with extract_slices_dataset for aligned image-annotation pairs
Progress tracking with tqdm

The function provides flexible control over:

Anatomical view: Extract from axial, coronal, or sagittal slices
Organization: Group by case or by view
Granularity: Per-slice or per-volume extraction
Data format: Bounding boxes, center points, or radius format
Coordinate adjustment: Optional padding compensation for alignment with extracted images

Parameters

Name	Type	Default	Description
`nii_folder`	`str`	required	Path to the directory containing annotation volumes in `.nii.gz` format.
`output_path`	`str`	required	Root directory where extracted annotations will be saved.
`view`	`str`	`"axial"`	Anatomical view for extraction: `"axial"`, `"coronal"`, or `"sagittal"`.
`saving_mode`	`str`	`"case"`	Organization mode: `"case"` (folder per file) or `"view"` (shared folder).
`extraction_mode`	`str`	`"slice"`	Granularity: `"slice"` (CSV per slice) or `"volume"` (single CSV per case).
`data_mode`	`str`	`"center"`	Output format: `"center"` (point coordinates), `"box"` (bounding boxes), or `"radius"` (center + radius).
`target_size`	`Optional[Tuple[int, int]]`	`None`	Target dimensions (height, width) for coordinate adjustment to account for padding.
`save_stats`	`bool`	`False`	If `True`, saves annotation statistics as `<view>_annotations_stats.csv`.

Returns

None – The function saves CSV files to disk.

Output Organization

Saving Modes

Case Mode (`saving_mode="case"`)

Creates a separate folder for each annotation file (recommended for datasets):

output_path/
├── case_001/
│   └── axial/
│       ├── case_001_axial_000.csv
│       ├── case_001_axial_001.csv
│       └── ...
├── case_002/
│   └── axial/
│       └── ...

Note: extract_annotations is called once per file with that case's folder as its output location, so the CSVs end up in a view subfolder inside the case folder.

View Mode (`saving_mode="view"`)

Groups all annotations in a single view folder:

output_path/
└── axial/
    ├── case_001_axial_000.csv
    ├── case_001_axial_001.csv
    ├── case_002_axial_000.csv
    └── ...

Note: All files are extracted to a shared view folder using extract_annotations.
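
The two layouts above can be summarized as a small path-building helper. This is a minimal sketch: `expected_slice_csv` is a hypothetical helper that is not part of nidataset, and the three-digit zero-padded slice number is an assumption inferred from the example filenames.

```python
import os


def expected_slice_csv(output_path, case, view, slice_idx, saving_mode="case"):
    """Return the CSV path a slice would get under the layouts shown above.

    Hypothetical helper for illustration only; the three-digit slice
    number is an assumption taken from the example filenames.
    """
    if saving_mode not in ("case", "view"):
        raise ValueError(f"Unknown saving_mode: {saving_mode!r}")
    filename = f"{case}_{view}_{slice_idx:03d}.csv"
    if saving_mode == "case":
        # Layout: output_path/<case>/<view>/<case>_<view>_<NNN>.csv
        return os.path.join(output_path, case, view, filename)
    # Layout: output_path/<view>/<case>_<view>_<NNN>.csv
    return os.path.join(output_path, view, filename)


print(expected_slice_csv("output_path", "case_001", "axial", 1, "case"))
print(expected_slice_csv("output_path", "case_002", "axial", 0, "view"))
```

A helper like this can be handy for checking that an expected file exists after a run, without walking the output tree.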

Extraction Modes

The extraction_mode parameter is passed directly to extract_annotations as the saving_mode parameter:

Slice Mode (`extraction_mode="slice"`)

Creates one CSV per slice:

Filename pattern: <PREFIX>_<VIEW>_<SLICE_NUMBER>.csv
Example: patient_042_axial_015.csv
Maps to extract_annotations(..., saving_mode="slice")

Volume Mode (`extraction_mode="volume"`)

Creates one CSV for the entire volume:

Filename pattern: <PREFIX>.csv
Contains annotations from all slices with slice index information
Maps to extract_annotations(..., saving_mode="volume")
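
When loading slice-mode output, it can help to recover the prefix, view, and slice number from each filename. The sketch below follows the slice-mode pattern given above; `parse_slice_filename` is a hypothetical helper, and it assumes the view name is one of the three documented values.

```python
import re

# <PREFIX>_<VIEW>_<SLICE_NUMBER>.csv, with VIEW limited to the documented views.
_SLICE_CSV = re.compile(
    r"^(?P<prefix>.+)_(?P<view>axial|coronal|sagittal)_(?P<slice>\d+)\.csv$"
)


def parse_slice_filename(filename):
    """Split a slice-mode CSV name into (prefix, view, slice_number).

    Returns None when the name does not follow the slice-mode pattern,
    which is the case for volume-mode files such as "case_001.csv".
    """
    match = _SLICE_CSV.match(filename)
    if match is None:
        return None
    return match["prefix"], match["view"], int(match["slice"])


print(parse_slice_filename("patient_042_axial_015.csv"))
print(parse_slice_filename("case_001.csv"))
```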

Data Formats

The data_mode parameter is passed directly to extract_annotations. See the extract_annotations documentation for complete details on each format.

Center Mode (`data_mode="center"`)

CSV contains the center coordinates of each annotation’s bounding box.

Volume mode columns:

CENTER_X: X coordinate of bounding box center
CENTER_Y: Y coordinate of bounding box center
CENTER_Z: Z coordinate (slice index)

Slice mode columns:

CENTER_X: X coordinate of bounding box center
CENTER_Y: Y coordinate of bounding box center

Box Mode (`data_mode="box"`)

CSV contains full bounding box coordinates.

Volume mode columns:

X_MIN, Y_MIN, Z_MIN: Minimum coordinates
X_MAX, Y_MAX, Z_MAX: Maximum coordinates

Slice mode columns:

X_MIN, Y_MIN: Minimum coordinates
X_MAX, Y_MAX: Maximum coordinates

Radius Mode (`data_mode="radius"`)

CSV contains center coordinates and radii from center to bounding box edges.

Volume mode columns:

CENTER_X, CENTER_Y, CENTER_Z: Center coordinates
RADIUS_X, RADIUS_Y, RADIUS_Z: Radii in each direction

Slice mode columns:

CENTER_X, CENTER_Y: Center coordinates
RADIUS_X, RADIUS_Y: Radii in each direction
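
The three formats describe the same bounding box in different ways. The sketch below shows one plausible relationship between them for the 2D (slice mode) case. It assumes the center is the midpoint of the box and each radius is half the box extent along that axis; the exact rounding and edge conventions used by extract_annotations are not stated here, so treat these helpers as illustrative rather than as the library's implementation.

```python
def box_to_center(box):
    """Midpoint of a 2D box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def box_to_radius(box):
    """Center plus per-axis half-extent: (center_x, center_y, radius_x, radius_y)."""
    x_min, y_min, x_max, y_max = box
    center_x, center_y = box_to_center(box)
    return (center_x, center_y, (x_max - x_min) / 2, (y_max - y_min) / 2)


def radius_to_box(center_x, center_y, radius_x, radius_y):
    """Inverse of box_to_radius under the same half-extent assumption."""
    return (
        center_x - radius_x,
        center_y - radius_y,
        center_x + radius_x,
        center_y + radius_y,
    )


box = (10, 20, 30, 60)
print(box_to_center(box))
print(box_to_radius(box))
print(radius_to_box(*box_to_radius(box)))
```

Under these assumptions, box and radius formats carry the same information, while center format discards the size of the annotation.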

Anatomical Views

The view parameter determines which axis to extract along:

View	Extraction Axis	Description
`"axial"`	Z-axis	Horizontal slices (top-down)
`"coronal"`	Y-axis	Frontal slices (front-back)
`"sagittal"`	X-axis	Lateral slices (left-right)
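
To estimate how many slices (and therefore how many slice-mode CSVs) a view can produce, the table can be expressed as a lookup. This sketch assumes the volume array is indexed as (X, Y, Z), which is an assumption of the example and not something this page states; `slice_layout` is a hypothetical helper.

```python
# Assumption: the volume array is indexed as (X, Y, Z).
VIEW_TO_AXIS = {"sagittal": 0, "coronal": 1, "axial": 2}


def slice_layout(volume_shape, view):
    """Return (number_of_slices, in_plane_shape) for the given view."""
    if view not in VIEW_TO_AXIS:
        raise ValueError(f"view must be one of {sorted(VIEW_TO_AXIS)}, got {view!r}")
    axis = VIEW_TO_AXIS[view]
    # The in-plane shape is whatever remains after dropping the slicing axis.
    in_plane = tuple(n for i, n in enumerate(volume_shape) if i != axis)
    return volume_shape[axis], in_plane


for view in ("axial", "coronal", "sagittal"):
    print(view, slice_layout((512, 400, 120), view))
```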

Target Size and Coordinate Adjustment

When target_size is specified, coordinates are adjusted to account for padding applied during image extraction. This parameter is passed directly to extract_annotations for each file.

Important: Use the same target_size value for both extract_slices_dataset and extract_annotations_dataset.

Example:

# Extract images with padding to 512x512
extract_slices_dataset(..., target_size=(512, 512))

# Extract annotations with matching adjustment
extract_annotations_dataset(..., target_size=(512, 512))

See extract_annotations documentation for details on coordinate adjustment behavior.
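
As a rough illustration of what a padding adjustment involves, the sketch below shifts a point by the offset that centered padding would introduce. It rests on several assumptions that this page does not confirm: that padding is split evenly around the slice, that x maps to width and y to height, and that the slice is never larger than target_size. The helper names are hypothetical; refer to the extract_annotations documentation for the actual behavior.

```python
def padding_offsets(original_size, target_size):
    """Return (offset_y, offset_x) introduced by centered padding.

    Both sizes are (height, width). Assumes the padding is split evenly,
    with any odd leftover pixel going to the bottom/right side.
    """
    (orig_h, orig_w), (target_h, target_w) = original_size, target_size
    if target_h < orig_h or target_w < orig_w:
        raise ValueError("target_size must be at least as large as the slice")
    return (target_h - orig_h) // 2, (target_w - orig_w) // 2


def shift_point(x, y, original_size, target_size):
    """Move a point from slice coordinates into padded-image coordinates."""
    offset_y, offset_x = padding_offsets(original_size, target_size)
    return x + offset_x, y + offset_y


# A 400x480 (height x width) slice padded to 512x512.
print(padding_offsets((400, 480), (512, 512)))
print(shift_point(100, 50, (400, 480), (512, 512)))
```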

Statistics File

When save_stats=True, a CSV file is created with annotation counts per file:

Column	Description
`FILENAME`	Annotation file name
`NUM_ANNOTATIONS`	Number of annotations in the file
`TOTAL_ANNOTATIONS`	Label written in the `FILENAME` column of the last row; that row's `NUM_ANNOTATIONS` holds the sum across all files

The file is named <view>_annotations_stats.csv and saved in output_path.
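
A quick consistency check on the statistics file is to confirm that the total row matches the sum of the per-file counts. The sketch below uses only the standard library and an in-memory sample; the sample rows are made up for illustration, and the layout (a final row whose `FILENAME` is `TOTAL_ANNOTATIONS`) follows the filter used in the "Analyzing Statistics" example.

```python
import csv
import io

# Made-up sample content standing in for <view>_annotations_stats.csv.
sample = io.StringIO(
    "FILENAME,NUM_ANNOTATIONS\n"
    "case_001.nii.gz,3\n"
    "case_002.nii.gz,0\n"
    "TOTAL_ANNOTATIONS,3\n"
)


def split_stats(rows):
    """Separate per-file rows from the total row(s)."""
    per_file = [r for r in rows if r["FILENAME"] != "TOTAL_ANNOTATIONS"]
    totals = [r for r in rows if r["FILENAME"] == "TOTAL_ANNOTATIONS"]
    return per_file, totals


rows = list(csv.DictReader(sample))
per_file, totals = split_stats(rows)
computed_total = sum(int(r["NUM_ANNOTATIONS"]) for r in per_file)
reported_total = int(totals[0]["NUM_ANNOTATIONS"])

print(f"files: {len(per_file)}, computed: {computed_total}, reported: {reported_total}")
```

To run the same check on real output, replace the `io.StringIO` sample with an opened stats file.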

Exceptions

Exception	Condition
`FileNotFoundError`	The `nii_folder` does not exist or contains no `.nii.gz` files
`ValueError`	Invalid `view`, `saving_mode`, `extraction_mode`, or `data_mode`
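
The documented conditions can also be checked before starting a long batch run. The following is a minimal sketch of such a pre-flight check; `preflight_check` and `ALLOWED_VALUES` are hypothetical names, the helper does not call nidataset, and the allowed values are copied from the Parameters table.

```python
import os
import tempfile

ALLOWED_VALUES = {
    "view": ("axial", "coronal", "sagittal"),
    "saving_mode": ("case", "view"),
    "extraction_mode": ("slice", "volume"),
    "data_mode": ("center", "box", "radius"),
}


def preflight_check(nii_folder, **options):
    """Raise early if the folder or any option value looks invalid.

    Hypothetical helper: it mirrors the documented conditions but is not
    part of nidataset.
    """
    if not os.path.isdir(nii_folder):
        raise FileNotFoundError(f"Folder not found: {nii_folder}")
    if not any(name.endswith(".nii.gz") for name in os.listdir(nii_folder)):
        raise FileNotFoundError(f"No .nii.gz files in: {nii_folder}")
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


# Demo on a throwaway folder holding one empty placeholder file.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "case_001.nii.gz"), "wb").close()
    preflight_check(folder, view="axial", data_mode="box")  # passes silently
    try:
        preflight_check(folder, view="oblique")
    except ValueError as err:
        message = str(err)
        print(message)
```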

Usage Notes

Input Format: Only .nii.gz files are processed
Progress Display: Shows progress bar with tqdm for batch processing
Error Handling: Files that fail extraction are skipped with error messages
Coordinate System: Coordinates are in voxel space (0-indexed)
Underlying Function: Each file is processed using extract_annotations
Statistics: Tracked across all files and saved when save_stats=True
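
Because only .nii.gz files are processed, it can be useful to preview which files in a folder qualify and what case name each would yield. The sketch below assumes the case prefix is the filename with the `.nii.gz` suffix removed, which matches the example paths on this page but is not stated explicitly; both helpers are hypothetical.

```python
import os

SUFFIX = ".nii.gz"


def case_prefix(path):
    """Strip the directory and the .nii.gz suffix from a file path.

    Note that os.path.splitext would only remove ".gz", leaving
    "case_001.nii", so the suffix is removed explicitly here.
    """
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError(f"Not a {SUFFIX} file: {path}")
    return name[: -len(SUFFIX)]


def eligible_files(filenames):
    """Keep only names ending in .nii.gz, in sorted order."""
    return sorted(name for name in filenames if name.endswith(SUFFIX))


print(case_prefix("dataset/annotations/case_001.nii.gz"))
print(eligible_files(["case_002.nii.gz", "notes.txt", "case_001.nii.gz", "scan.nii"]))
```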

Examples

Basic Usage - Slice-Based Extraction

Extract center coordinates for each axial slice across all files:

from nidataset.slices import extract_annotations_dataset

extract_annotations_dataset(
    nii_folder="dataset/annotations/",
    output_path="extracted/annotations/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="center"
)
# For each file, calls extract_annotations with saving_mode="slice"
# Creates: extracted/annotations/case_001/axial/case_001_axial_000.csv, ...

With Statistics Tracking

Enable annotation statistics for a dataset overview:

extract_annotations_dataset(
    nii_folder="dataset/masks/",
    output_path="output/labels/",
    view="coronal",
    saving_mode="view",
    extraction_mode="slice",
    data_mode="center",
    save_stats=True
)
# Creates: output/labels/coronal_annotations_stats.csv

Full Bounding Boxes with Padding Adjustment

Extract complete bounding boxes with coordinate adjustment:

extract_annotations_dataset(
    nii_folder="data/segmentations/",
    output_path="data/bbox_labels/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    target_size=(512, 512),
    save_stats=True
)
# Coordinates adjusted for 512x512 padded images

Volume-Based Extraction

Create a single CSV per case containing all of its annotations:

extract_annotations_dataset(
    nii_folder="annotations/",
    output_path="volume_labels/",
    view="sagittal",
    saving_mode="case",
    extraction_mode="volume",
    data_mode="center",
    save_stats=True
)
# For each file, calls extract_annotations with saving_mode="volume"
# Creates: volume_labels/case_001/sagittal/case_001.csv

Complete Image-Annotation Pipeline

Extract aligned images and annotations for training:

from nidataset.slices import extract_slices_dataset, extract_annotations_dataset

# Step 1: Extract images with padding
extract_slices_dataset(
    nii_folder="data/scans/",
    output_path="training_data/images/",
    view="axial",
    saving_mode="case",
    target_size=(512, 512),
    normalization="min-max",
    save_stats=True
)

# Step 2: Extract annotations with matching adjustment
extract_annotations_dataset(
    nii_folder="data/masks/",
    output_path="training_data/labels/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    target_size=(512, 512),  # Must match image extraction
    save_stats=True
)

# Result: Aligned image-annotation pairs ready for training
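
Once both folders exist, images and labels still need to be matched up before training. One simple approach is to pair files that share a filename stem. This is a sketch under a loud assumption: that the image files written by extract_slices_dataset use the same stem as the annotation CSVs (for example `case_001_axial_000`). This page does not confirm the image naming or extension, so `pair_by_stem` is a hypothetical helper and `.tif` in the demo is only a placeholder.

```python
import tempfile
from pathlib import Path


def pair_by_stem(image_root, label_root):
    """Pair each image with the CSV that has the same filename stem."""
    labels = {path.stem: path for path in Path(label_root).rglob("*.csv")}
    pairs = []
    for image in sorted(Path(image_root).rglob("*")):
        if image.is_file() and image.stem in labels:
            pairs.append((image, labels[image.stem]))
    return pairs


# Demo on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    images = Path(root, "images", "case_001", "axial")
    labels = Path(root, "labels", "case_001", "axial")
    images.mkdir(parents=True)
    labels.mkdir(parents=True)
    # ".tif" is a placeholder extension chosen for this demo only.
    (images / "case_001_axial_000.tif").touch()
    (images / "case_001_axial_001.tif").touch()  # no matching CSV
    (labels / "case_001_axial_000.csv").touch()
    pairs = pair_by_stem(Path(root, "images"), Path(root, "labels"))
    paired_stems = [image.stem for image, _ in pairs]
    print(paired_stems)
```

Images without a matching CSV are left out of the result, so comparing the number of pairs with the number of images is a quick way to spot gaps.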

Multi-View Extraction

Extract annotations from all three anatomical views:

from nidataset.slices import extract_annotations_dataset

views = ["axial", "coronal", "sagittal"]
base_path = "multi_view_annotations/"

for view in views:
    print(f"Extracting {view} view...")
    extract_annotations_dataset(
        nii_folder="dataset/labels/",
        output_path=base_path,
        view=view,
        saving_mode="view",
        extraction_mode="slice",
        data_mode="center",
        save_stats=True
    )

# Creates multi_view_annotations/<view>/ for each view, plus one <view>_annotations_stats.csv per view in multi_view_annotations/

Analyzing Statistics

Review the annotation distribution across the dataset:

import pandas as pd
from nidataset.slices import extract_annotations_dataset

# Extract with statistics
extract_annotations_dataset(
    nii_folder="annotations/",
    output_path="results/",
    view="axial",
    saving_mode="view",
    extraction_mode="slice",
    data_mode="center",
    save_stats=True
)

# Load and analyze statistics
stats = pd.read_csv("results/axial_annotations_stats.csv")

# Remove total row for per-file analysis
per_file = stats[stats['FILENAME'] != 'TOTAL_ANNOTATIONS'].copy()
per_file['NUM_ANNOTATIONS'] = pd.to_numeric(per_file['NUM_ANNOTATIONS'])

print("Annotation Statistics:")
print(f"  Total files: {len(per_file)}")
print(f"  Files with annotations: {(per_file['NUM_ANNOTATIONS'] > 0).sum()}")
print(f"  Average annotations per file: {per_file['NUM_ANNOTATIONS'].mean():.2f}")
print(f"  Max annotations: {per_file['NUM_ANNOTATIONS'].max()}")
print(f"  Min annotations: {per_file['NUM_ANNOTATIONS'].min()}")

# Files without annotations
empty = per_file[per_file['NUM_ANNOTATIONS'] == 0]
if not empty.empty:
    print(f"\nWarning: {len(empty)} files have no annotations:")
    print(empty['FILENAME'].tolist())

Quality Control Workflow

Verify annotation extraction quality:

import pandas as pd
from nidataset.slices import extract_annotations_dataset

# Extract annotations
extract_annotations_dataset(
    nii_folder="masks/",
    output_path="qa/annotations/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    save_stats=True
)

# Check a sample annotation file
sample_csv = "qa/annotations/case_001/axial/case_001_axial_010.csv"
df = pd.read_csv(sample_csv)

print(f"Sample slice has {len(df)} annotations")
print("\nBounding box sizes:")
df['width'] = df['X_MAX'] - df['X_MIN']
df['height'] = df['Y_MAX'] - df['Y_MIN']
print(df[['width', 'height']].describe())

# Identify potential issues
small_boxes = df[(df['width'] < 5) | (df['height'] < 5)]
if not small_boxes.empty:
    print(f"\nWarning: {len(small_boxes)} very small annotations detected")

Radius Mode for Analysis

Extract center and radius information for size analysis:

extract_annotations_dataset(
    nii_folder="nodule_masks/",
    output_path="nodule_analysis/",
    view="axial",
    saving_mode="case",
    extraction_mode="volume",
    data_mode="radius",
    save_stats=True
)
# Each case gets a single CSV with center coordinates and radii
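
A radius-mode CSV can then be summarized with the standard library alone. The sketch below reads volume-mode radius columns and derives the largest per-axis diameter of each annotation. The sample rows are made up for illustration, and treating the diameter as twice the radius assumes each radius is half the bounding box extent.

```python
import csv
import io

# Made-up sample standing in for a volume-mode, radius-format CSV.
sample = io.StringIO(
    "CENTER_X,CENTER_Y,CENTER_Z,RADIUS_X,RADIUS_Y,RADIUS_Z\n"
    "120.0,88.5,14.0,6.0,4.5,2.0\n"
    "300.0,210.0,14.0,3.0,3.0,1.0\n"
    "64.0,40.0,27.0,10.0,8.0,5.0\n"
)

largest_diameters = []
for row in csv.DictReader(sample):
    radii = [float(row[name]) for name in ("RADIUS_X", "RADIUS_Y", "RADIUS_Z")]
    # Diameter along the widest axis, in voxels (assumes radius = half extent).
    largest_diameters.append(2 * max(radii))

print(largest_diameters)
print(f"largest annotation spans {max(largest_diameters)} voxels")
```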

Typical Workflow

from nidataset.slices import extract_annotations_dataset
import pandas as pd

# 1. Define paths
annotation_folder = "data/segmentation_masks/"
output_folder = "data/extracted_labels/"

# 2. Extract annotations with statistics
extract_annotations_dataset(
    nii_folder=annotation_folder,
    output_path=output_folder,
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    target_size=(512, 512),
    save_stats=True
)

# 3. Review statistics
stats = pd.read_csv(f"{output_folder}/axial_annotations_stats.csv")
print(stats.head())

# 4. Use extracted annotations for training
# - Load corresponding images from extract_slices_dataset
# - Create dataloaders with image-annotation pairs
# - Train detection or segmentation models
