extract_annotations_dataset
Batch process annotation volumes to extract bounding box coordinates or centers as CSV files, with flexible organization and optional statistics tracking.
extract_annotations_dataset(
    nii_folder: str,
    output_path: str,
    view: str = "axial",
    saving_mode: str = "case",
    extraction_mode: str = "slice",
    data_mode: str = "center",
    target_size: Optional[Tuple[int, int]] = None,
    save_stats: bool = False
) -> None
Overview
This function processes all annotation masks in a dataset folder by applying the extract_annotations function to each file. It extracts annotation coordinates as CSV files with flexible control over organization and output format.
Key features:
- Batch processes entire datasets using extract_annotations internally
- Flexible organization: per-case folders or shared view folders
- Optional statistics tracking for dataset overview
- Works with extract_slices_dataset for aligned image-annotation pairs
- Progress tracking with tqdm
The function provides flexible control over:
- Anatomical view: Extract from axial, coronal, or sagittal slices
- Organization: Group by case or by view
- Granularity: Per-slice or per-volume extraction
- Data format: Bounding boxes, center points, or radius format
- Coordinate adjustment: Optional padding compensation for alignment with extracted images
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| nii_folder | str | required | Path to the directory containing annotation volumes in .nii.gz format. |
| output_path | str | required | Root directory where extracted annotations will be saved. |
| view | str | "axial" | Anatomical view for extraction: "axial", "coronal", or "sagittal". |
| saving_mode | str | "case" | Organization mode: "case" (folder per file) or "view" (shared folder). |
| extraction_mode | str | "slice" | Granularity: "slice" (CSV per slice) or "volume" (single CSV per case). |
| data_mode | str | "center" | Output format: "center" (point coordinates), "box" (bounding boxes), or "radius" (center + radius). |
| target_size | Optional[Tuple[int, int]] | None | Target dimensions (height, width) for coordinate adjustment to account for padding. |
| save_stats | bool | False | If True, saves annotation statistics as <view>_annotations_stats.csv. |
Returns
None – The function saves CSV files to disk.
Output Organization
Saving Modes
Case Mode (saving_mode="case")
Creates a separate folder for each annotation file (recommended for datasets):
output_path/
├── case_001/
│   └── axial/
│       ├── case_001_axial_000.csv
│       ├── case_001_axial_001.csv
│       └── ...
├── case_002/
│   └── axial/
│       └── ...
Note: Each case folder is passed to extract_annotations with its annotations organized in a view subfolder.
View Mode (saving_mode="view")
Groups all annotations in a single view folder:
output_path/
└── axial/
    ├── case_001_axial_000.csv
    ├── case_001_axial_001.csv
    ├── case_002_axial_000.csv
    └── ...
Note: All files are extracted to a shared view folder using extract_annotations.
Extraction Modes
The extraction_mode parameter is passed directly to extract_annotations as the saving_mode parameter:
Slice Mode (extraction_mode="slice")
Creates one CSV per slice:
- Filename pattern: <PREFIX>_<VIEW>_<SLICE_NUMBER>.csv
- Example: patient_042_axial_015.csv
- Maps to extract_annotations(..., saving_mode="slice")
Volume Mode (extraction_mode="volume")
Creates one CSV for the entire volume:
- Filename pattern: <PREFIX>.csv
- Contains annotations from all slices with slice index information
- Maps to extract_annotations(..., saving_mode="volume")
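The saving and extraction modes above combine into a predictable file location. The sketch below is only an illustration of that layout: expected_csv_path is a hypothetical helper, not part of nidataset, and it assumes that PREFIX is the file name without .nii.gz, that slice numbers are zero-padded to three digits as in the examples, and that volume-mode files in view mode land directly in the shared view folder.

```python
from pathlib import Path


def expected_csv_path(output_path, nii_filename, view="axial",
                      saving_mode="case", extraction_mode="slice",
                      slice_index=0):
    """Hypothetical helper: build the CSV path implied by the documented layout."""
    # Assumption: PREFIX is the file name with the .nii.gz suffix removed.
    suffix = ".nii.gz"
    if nii_filename.endswith(suffix):
        prefix = nii_filename[:-len(suffix)]
    else:
        prefix = nii_filename

    folder = Path(output_path)
    if saving_mode == "case":
        folder = folder / prefix / view   # one folder per case, then the view
    elif saving_mode == "view":
        folder = folder / view            # one shared folder per view
    else:
        raise ValueError(f"invalid saving_mode: {saving_mode!r}")

    if extraction_mode == "slice":
        # Assumption: three-digit zero padding, as in case_001_axial_000.csv.
        name = f"{prefix}_{view}_{slice_index:03d}.csv"
    elif extraction_mode == "volume":
        name = f"{prefix}.csv"
    else:
        raise ValueError(f"invalid extraction_mode: {extraction_mode!r}")
    return folder / name


print(expected_csv_path("output_path", "case_001.nii.gz", slice_index=1))
```

A helper like this can be handy for checking that an expected file exists after a run, but the files actually written by the library remain the reference.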
Data Formats
The data_mode parameter is passed directly to extract_annotations. See the extract_annotations documentation for complete details on each format.
Center Mode (data_mode="center")
CSV contains the center coordinates of each annotation’s bounding box.
Volume mode columns:
- CENTER_X: X coordinate of bounding box center
- CENTER_Y: Y coordinate of bounding box center
- CENTER_Z: Z coordinate (slice index)
Slice mode columns:
- CENTER_X: X coordinate of bounding box center
- CENTER_Y: Y coordinate of bounding box center
Box Mode (data_mode="box")
CSV contains full bounding box coordinates.
Volume mode columns:
- X_MIN, Y_MIN, Z_MIN: Minimum coordinates
- X_MAX, Y_MAX, Z_MAX: Maximum coordinates
Slice mode columns:
- X_MIN, Y_MIN: Minimum coordinates
- X_MAX, Y_MAX: Maximum coordinates
Radius Mode (data_mode="radius")
CSV contains center coordinates and radii from center to bounding box edges.
Volume mode columns:
- CENTER_X, CENTER_Y, CENTER_Z: Center coordinates
- RADIUS_X, RADIUS_Y, RADIUS_Z: Radii in each direction
Slice mode columns:
- CENTER_X, CENTER_Y: Center coordinates
- RADIUS_X, RADIUS_Y: Radii in each direction
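The three formats describe the same bounding box in different ways. The sketch below shows one plausible relationship between them for the slice-mode columns; box_to_center and box_to_radius are hypothetical helpers, and the plain midpoint and half-extent arithmetic is an assumption. The library may round or use integer arithmetic, so see the extract_annotations documentation for the exact behavior.

```python
def box_to_center(x_min, y_min, x_max, y_max):
    """Midpoint of a 2D box (assumed meaning of CENTER_X, CENTER_Y)."""
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def box_to_radius(x_min, y_min, x_max, y_max):
    """Center plus half-extents (assumed meaning of RADIUS_X, RADIUS_Y)."""
    center_x, center_y = box_to_center(x_min, y_min, x_max, y_max)
    radius_x = (x_max - x_min) / 2   # distance from center to left/right edge
    radius_y = (y_max - y_min) / 2   # distance from center to top/bottom edge
    return center_x, center_y, radius_x, radius_y


# A box spanning x in [10, 30] and y in [20, 60]
print(box_to_center(10, 20, 30, 60))
print(box_to_radius(10, 20, 30, 60))
```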
Anatomical Views
The view parameter determines which axis to extract along:
| View | Extraction Axis | Description |
|---|---|---|
| "axial" | Z-axis | Horizontal slices (top-down) |
| "coronal" | Y-axis | Frontal slices (front-back) |
| "sagittal" | X-axis | Lateral slices (left-right) |
Target Size and Coordinate Adjustment
When target_size is specified, coordinates are adjusted to account for padding applied during image extraction. This parameter is passed directly to extract_annotations for each file.
Important: Use the same target_size value for both extract_slices_dataset and extract_annotations_dataset.
Example:
# Extract images with padding to 512x512
extract_slices_dataset(..., target_size=(512, 512))
# Extract annotations with matching adjustment
extract_annotations_dataset(..., target_size=(512, 512))
See extract_annotations documentation for details on coordinate adjustment behavior.
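To make the idea of padding compensation concrete, the sketch below shifts a point by the padding offsets of a slice that has been centered inside target_size. This is an illustration only: the source does not state how padding is distributed, so symmetric (centered) padding, X running along the width, and Y running along the height are all assumptions, and both helper names are hypothetical.

```python
def centered_pad_offsets(original_size, target_size):
    """Return (top, left) padding for a slice centered in target_size.

    Both sizes are (height, width), matching the target_size parameter.
    Assumption: padding is split evenly, with any odd pixel on the far side.
    """
    height, width = original_size
    target_height, target_width = target_size
    if target_height < height or target_width < width:
        raise ValueError("target_size must be at least as large as the slice")
    return (target_height - height) // 2, (target_width - width) // 2


def shift_point(x, y, original_size, target_size):
    """Move a point from slice coordinates into padded-image coordinates."""
    top, left = centered_pad_offsets(original_size, target_size)
    # Assumption: X runs along the width and Y along the height.
    return x + left, y + top


# A 400 x 300 (height x width) slice padded to 512 x 512
print(centered_pad_offsets((400, 300), (512, 512)))
print(shift_point(10, 20, (400, 300), (512, 512)))
```

Whatever scheme the library actually uses, the same offsets must apply to images and annotations, which is why both calls need the same target_size.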
Statistics File
When save_stats=True, a CSV file is created with annotation counts per file:
| Column | Description |
|---|---|
| FILENAME | Annotation file name |
| NUM_ANNOTATIONS | Number of annotations in the file |
| TOTAL_ANNOTATIONS | Not a separate column: the label in FILENAME of the last row, whose NUM_ANNOTATIONS holds the sum across all files |
The file is named <view>_annotations_stats.csv and saved in output_path.
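The layout of the statistics file can be pictured with a small standard-library sketch. It assumes, based on the Analyzing Statistics example further down, that the total is stored as a final row labelled TOTAL_ANNOTATIONS in the FILENAME column; stats_rows and to_csv_text are hypothetical helpers and the counts are made-up sample values, not output from the library.

```python
import csv
import io


def stats_rows(counts):
    """Build per-file rows plus a final total row from {filename: count}."""
    rows = [{"FILENAME": name, "NUM_ANNOTATIONS": count}
            for name, count in counts.items()]
    # Assumption: the total is appended as one extra row at the end.
    rows.append({"FILENAME": "TOTAL_ANNOTATIONS",
                 "NUM_ANNOTATIONS": sum(counts.values())})
    return rows


def to_csv_text(rows):
    """Serialize the rows with the two documented column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["FILENAME", "NUM_ANNOTATIONS"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample counts for illustration
print(to_csv_text(stats_rows({"case_001.nii.gz": 3, "case_002.nii.gz": 0})))
```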
Exceptions
| Exception | Condition |
|---|---|
| FileNotFoundError | The nii_folder does not exist or contains no .nii.gz files |
| ValueError | Invalid view, saving_mode, extraction_mode, or data_mode |
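The ValueError condition can be caught before a long batch run starts. A minimal sketch, assuming the accepted values are exactly those listed in the Parameters table; check_options is a hypothetical pre-flight helper and is not part of nidataset.

```python
# Allowed values copied from the Parameters table.
ALLOWED_OPTIONS = {
    "view": {"axial", "coronal", "sagittal"},
    "saving_mode": {"case", "view"},
    "extraction_mode": {"slice", "volume"},
    "data_mode": {"center", "box", "radius"},
}


def check_options(**options):
    """Raise ValueError if any given option has a value outside the table."""
    for name, value in options.items():
        allowed = ALLOWED_OPTIONS.get(name)
        if allowed is None:
            raise KeyError(f"unknown option name: {name}")
        if value not in allowed:
            raise ValueError(
                f"invalid {name}={value!r}; expected one of {sorted(allowed)}"
            )


# Passes silently when every value is in the table
check_options(view="axial", saving_mode="case",
              extraction_mode="slice", data_mode="box")
```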
Usage Notes
- Input Format: Only .nii.gz files are processed
- Progress Display: Shows progress bar with tqdm for batch processing
- Error Handling: Files that fail extraction are skipped with error messages
- Coordinate System: Coordinates are in voxel space (0-indexed)
- Underlying Function: Each file is processed using extract_annotations
- Statistics: Tracked across all files and saved when save_stats=True
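Because only .nii.gz files are picked up, it can help to preview the input folder before running the batch. The sketch below uses only the standard library; list_annotation_files is a hypothetical helper, and the sorted order is a choice made here for readability, not a statement about the order in which the library processes files.

```python
from pathlib import Path


def list_annotation_files(nii_folder):
    """Return the .nii.gz file names found directly inside nii_folder."""
    folder = Path(nii_folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"folder not found: {folder}")
    # Keep only names ending in .nii.gz, per the Input Format note.
    names = sorted(p.name for p in folder.iterdir()
                   if p.is_file() and p.name.endswith(".nii.gz"))
    if not names:
        raise FileNotFoundError(f"no .nii.gz files in {folder}")
    return names
```

Calling list_annotation_files("dataset/annotations/") before the extraction gives a quick count of the cases to expect in the output and in the statistics file.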
Examples
Basic Usage - Slice-Based Extraction
Extract center coordinates for each axial slice across all files:
from nidataset.slices import extract_annotations_dataset
extract_annotations_dataset(
    nii_folder="dataset/annotations/",
    output_path="extracted/annotations/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="center"
)
# For each file, calls extract_annotations with saving_mode="slice"
# Creates: extracted/annotations/case_001/axial/case_001_axial_000.csv, ...
With Statistics Tracking
Enable annotation statistics for dataset overview:
extract_annotations_dataset(
    nii_folder="dataset/masks/",
    output_path="output/labels/",
    view="coronal",
    saving_mode="view",
    extraction_mode="slice",
    data_mode="center",
    save_stats=True
)
# Creates: output/labels/coronal_annotations_stats.csv
Full Bounding Boxes with Padding Adjustment
Extract complete bounding boxes with coordinate adjustment:
extract_annotations_dataset(
    nii_folder="data/segmentations/",
    output_path="data/bbox_labels/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    target_size=(512, 512),
    save_stats=True
)
# Coordinates adjusted for 512x512 padded images
Volume-Based Extraction
Create single CSV per case with all annotations:
extract_annotations_dataset(
    nii_folder="annotations/",
    output_path="volume_labels/",
    view="sagittal",
    saving_mode="case",
    extraction_mode="volume",
    data_mode="center",
    save_stats=True
)
# For each file, calls extract_annotations with saving_mode="volume"
# Creates: volume_labels/case_001/sagittal/case_001.csv
Complete Image-Annotation Pipeline
Extract aligned images and annotations for training:
from nidataset.slices import extract_slices_dataset, extract_annotations_dataset
# Step 1: Extract images with padding
extract_slices_dataset(
    nii_folder="data/scans/",
    output_path="training_data/images/",
    view="axial",
    saving_mode="case",
    target_size=(512, 512),
    normalization="min-max",
    save_stats=True
)
# Step 2: Extract annotations with matching adjustment
extract_annotations_dataset(
    nii_folder="data/masks/",
    output_path="training_data/labels/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    target_size=(512, 512),  # Must match image extraction
    save_stats=True
)
# Result: Aligned image-annotation pairs ready for training
Multi-View Extraction
Extract annotations from all three anatomical views:
from nidataset.slices import extract_annotations_dataset
views = ["axial", "coronal", "sagittal"]
base_path = "multi_view_annotations/"
for view in views:
    print(f"Extracting {view} view...")
    extract_annotations_dataset(
        nii_folder="dataset/labels/",
        output_path=base_path,
        view=view,
        saving_mode="view",
        extraction_mode="slice",
        data_mode="center",
        save_stats=True
    )
# Creates separate folders for each view with statistics
Analyzing Statistics
Review annotation distribution across dataset:
import pandas as pd
from nidataset.slices import extract_annotations_dataset
# Extract with statistics
extract_annotations_dataset(
    nii_folder="annotations/",
    output_path="results/",
    view="axial",
    saving_mode="view",
    extraction_mode="slice",
    data_mode="center",
    save_stats=True
)
# Load and analyze statistics
stats = pd.read_csv("results/axial_annotations_stats.csv")
# Remove total row for per-file analysis
per_file = stats[stats['FILENAME'] != 'TOTAL_ANNOTATIONS'].copy()
per_file['NUM_ANNOTATIONS'] = pd.to_numeric(per_file['NUM_ANNOTATIONS'])
print("Annotation Statistics:")
print(f" Total files: {len(per_file)}")
print(f" Files with annotations: {(per_file['NUM_ANNOTATIONS'] > 0).sum()}")
print(f" Average annotations per file: {per_file['NUM_ANNOTATIONS'].mean():.2f}")
print(f" Max annotations: {per_file['NUM_ANNOTATIONS'].max()}")
print(f" Min annotations: {per_file['NUM_ANNOTATIONS'].min()}")
# Files without annotations
empty = per_file[per_file['NUM_ANNOTATIONS'] == 0]
if not empty.empty:
    print(f"\nWarning: {len(empty)} files have no annotations:")
    print(empty['FILENAME'].tolist())
Quality Control Workflow
Verify annotation extraction quality:
import pandas as pd
from nidataset.slices import extract_annotations_dataset
# Extract annotations
extract_annotations_dataset(
    nii_folder="masks/",
    output_path="qa/annotations/",
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    save_stats=True
)
# Check a sample annotation file
sample_csv = "qa/annotations/case_001/axial/case_001_axial_010.csv"
df = pd.read_csv(sample_csv)
print(f"Sample slice has {len(df)} annotations")
print("\nBounding box sizes:")
df['width'] = df['X_MAX'] - df['X_MIN']
df['height'] = df['Y_MAX'] - df['Y_MIN']
print(df[['width', 'height']].describe())
# Identify potential issues
small_boxes = df[(df['width'] < 5) | (df['height'] < 5)]
if not small_boxes.empty:
    print(f"\nWarning: {len(small_boxes)} very small annotations detected")
Radius Mode for Analysis
Extract center and radius information for size analysis:
extract_annotations_dataset(
    nii_folder="nodule_masks/",
    output_path="nodule_analysis/",
    view="axial",
    saving_mode="case",
    extraction_mode="volume",
    data_mode="radius",
    save_stats=True
)
# Each case gets a single CSV with center coordinates and radii
Typical Workflow
from nidataset.slices import extract_annotations_dataset
import pandas as pd
# 1. Define paths
annotation_folder = "data/segmentation_masks/"
output_folder = "data/extracted_labels"  # no trailing slash; joined with "/" in step 3
# 2. Extract annotations with statistics
extract_annotations_dataset(
    nii_folder=annotation_folder,
    output_path=output_folder,
    view="axial",
    saving_mode="case",
    extraction_mode="slice",
    data_mode="box",
    target_size=(512, 512),
    save_stats=True
)
# 3. Review statistics
stats = pd.read_csv(f"{output_folder}/axial_annotations_stats.csv")
print(stats.head())
# 4. Use extracted annotations for training
# - Load corresponding images from extract_slices_dataset
# - Create dataloaders with image-annotation pairs
# - Train detection or segmentation models