`dataset_images_info`

Extract comprehensive metadata from all NIfTI volumes in a folder and save the summary as a CSV file for dataset analysis and quality control.

dataset_images_info(
    nii_folder: str,
    output_path: str
) -> None

Overview

This function generates a detailed metadata summary for medical imaging datasets. It extracts key properties from each NIfTI file including spatial dimensions, voxel sizes, intensity statistics, and tissue volume measurements. The resulting CSV is useful for:

Dataset quality control and validation
Identifying outliers or corrupted files
Understanding data distribution before preprocessing
Documentation and reproducibility
Comparing multiple datasets

All metadata is saved to dataset_images_info.csv in the specified output directory.

Parameters

Name	Type	Default	Description
`nii_folder`	`str`	required	Path to the directory containing NIfTI volumes in `.nii.gz` format.
`output_path`	`str`	required	Directory where the CSV file will be saved. Created automatically if it doesn’t exist.

Returns

None – The function saves results to disk.

Output File

CSV Structure

The function creates dataset_images_info.csv with the following columns:

Column	Description
`FILENAME`	Name of the NIfTI file
`SHAPE (X, Y, Z)`	Image dimensions in voxels as `(width, height, depth)`
`VOXEL SIZE (mm)`	Physical size of each voxel as `(x_size, y_size, z_size)`
`DATA TYPE`	NumPy data type (e.g., `float64`, `int16`, `uint8`)
`MIN VALUE`	Minimum intensity value in the volume
`MAX VALUE`	Maximum intensity value in the volume
`BRAIN VOXELS`	Count of non-zero voxels (tissue volume)
`BRAIN VOLUME (mm³)`	Physical volume of non-zero voxels in cubic millimeters
`BBOX MIN (X, Y, Z)`	Minimum coordinates of the bounding box around non-zero voxels
`BBOX MAX (X, Y, Z)`	Maximum coordinates of the bounding box around non-zero voxels

Example CSV Output

FILENAME,SHAPE (X, Y, Z),VOXEL SIZE (mm),DATA TYPE,MIN VALUE,MAX VALUE,BRAIN VOXELS,BRAIN VOLUME (mm³),BBOX MIN (X, Y, Z),BBOX MAX (X, Y, Z)
scan_001.nii.gz,"(512, 512, 300)","(0.5, 0.5, 1.0)",float64,0.0,4095.0,45678900,22839450.0,"[50, 60, 20]","[462, 452, 280]"
scan_002.nii.gz,"(256, 256, 150)","(1.0, 1.0, 1.5)",float32,-1024.0,3071.0,12456780,18685170.0,"[30, 40, 15]","[226, 216, 135]"

Metadata Details

Shape and Voxel Size

Shape represents the number of voxels in each dimension
Voxel size indicates the physical spacing between voxels in millimeters
Together, they determine the physical dimensions: physical_size = shape × voxel_size

Intensity Range

MIN VALUE and MAX VALUE show the intensity range in the volume
Useful for detecting preprocessing issues or unexpected value ranges
Different modalities have different typical ranges (e.g., Hounsfield units for CT)

Tissue Volume Metrics

BRAIN VOXELS: Count of non-zero voxels, representing tissue or contrast-enhanced regions
BRAIN VOLUME: Physical volume calculated as non_zero_count × voxel_x × voxel_y × voxel_z
Note: “BRAIN” terminology is used generically for non-zero regions, applicable to any tissue type

Bounding Box

Minimum and maximum coordinates defining the smallest box containing all non-zero voxels
Useful for automatic cropping and region of interest extraction
Coordinates are in voxel space (0-indexed)

Exceptions

Exception	Condition
`FileNotFoundError`	The `nii_folder` does not exist or contains no `.nii.gz` files

Usage Notes

Input Format: Only .nii.gz files are processed
Progress Display: Shows a progress bar during metadata extraction
Error Handling: Files that fail to process are skipped with error messages
Output Directory: Automatically created if it doesn’t exist
Non-zero Definition: Voxels with intensity > 0 are considered tissue

Examples

Basic Usage

Extract metadata for all volumes in a folder:

from nidataset.utility import dataset_images_info

dataset_images_info(
    nii_folder="dataset/scans/",
    output_path="dataset/metadata/"
)
# Creates: dataset/metadata/dataset_images_info.csv

Quality Control Analysis

Load and analyze the metadata to identify outliers:

import pandas as pd
from nidataset.utility import dataset_images_info

# Extract metadata
dataset_images_info(
    nii_folder="data/raw_scans/",
    output_path="data/qa/"
)

# Load and analyze
df = pd.read_csv("data/qa/dataset_images_info.csv")

# Check for dimension consistency
print("Unique shapes in dataset:")
print(df['SHAPE (X, Y, Z)'].value_counts())

# Check for unusual voxel sizes
print("\nVoxel size distribution:")
print(df['VOXEL SIZE (mm)'].value_counts())

# Identify volumes with unusual intensity ranges
print("\nIntensity range summary:")
print(df[['MIN VALUE', 'MAX VALUE']].describe())

# Find potentially corrupted files (very small volumes)
min_expected_volume = 100000  # mm³
suspicious = df[df['BRAIN VOLUME (mm³)'] < min_expected_volume]
if not suspicious.empty:
    print(f"\nWarning: {len(suspicious)} files with unusually small volumes:")
    print(suspicious[['FILENAME', 'BRAIN VOLUME (mm³)']])

Dataset Comparison

Compare metadata across multiple datasets:

import pandas as pd
from nidataset.utility import dataset_images_info

# Extract metadata for multiple datasets
datasets = {
    'Training': 'data/train/',
    'Validation': 'data/val/',
    'Testing': 'data/test/'
}

for name, folder in datasets.items():
    dataset_images_info(
        nii_folder=folder,
        output_path=f"metadata/{name}/"
    )

# Compare datasets
for name in datasets.keys():
    df = pd.read_csv(f"metadata/{name}/dataset_images_info.csv")
    print(f"\n{name} Dataset:")
    print(f"  Files: {len(df)}")
    print(f"  Avg volume: {df['BRAIN VOLUME (mm³)'].mean():.0f} mm³")
    print(f"  Shape consistency: {df['SHAPE (X, Y, Z)'].nunique()} unique shapes")

Preprocessing Planning

Use metadata to determine appropriate preprocessing parameters:

import pandas as pd
import ast
from nidataset.utility import dataset_images_info

# Extract metadata
dataset_images_info(
    nii_folder="data/original/",
    output_path="data/analysis/"
)

# Analyze bounding boxes to determine crop size
df = pd.read_csv("data/analysis/dataset_images_info.csv")

# Calculate bounding box dimensions
bbox_sizes = []
for idx, row in df.iterrows():
    bbox_min = ast.literal_eval(row['BBOX MIN (X, Y, Z)'])
    bbox_max = ast.literal_eval(row['BBOX MAX (X, Y, Z)'])
    size = [bbox_max[i] - bbox_min[i] for i in range(3)]
    bbox_sizes.append(size)

bbox_df = pd.DataFrame(bbox_sizes, columns=['X', 'Y', 'Z'])

print("Bounding box size statistics:")
print(bbox_df.describe())

# Recommend target shape (95th percentile)
recommended_shape = tuple(bbox_df.quantile(0.95).astype(int).values)
print(f"\nRecommended target shape for crop_and_pad: {recommended_shape}")

Data Type Verification

Check if data types are consistent across the dataset:

import pandas as pd
from nidataset.utility import dataset_images_info

dataset_images_info(
    nii_folder="data/scans/",
    output_path="data/info/"
)

df = pd.read_csv("data/info/dataset_images_info.csv")

# Check data type consistency
print("Data types in dataset:")
print(df['DATA TYPE'].value_counts())

# Identify files with unexpected data types
expected_dtype = 'float64'
unexpected = df[df['DATA TYPE'] != expected_dtype]
if not unexpected.empty:
    print(f"\nWarning: {len(unexpected)} files with unexpected data type:")
    print(unexpected[['FILENAME', 'DATA TYPE']])

Export Summary Report

Generate a human-readable summary report:

import pandas as pd
from nidataset.utility import dataset_images_info

# Extract metadata
dataset_images_info(
    nii_folder="dataset/images/",
    output_path="dataset/reports/"
)

# Load and create summary
df = pd.read_csv("dataset/reports/dataset_images_info.csv")

summary = f"""
Dataset Summary Report
=====================

Total Files: {len(df)}

Dimensions:
  - Shapes: {df['SHAPE (X, Y, Z)'].nunique()} unique
  - Most common: {df['SHAPE (X, Y, Z)'].mode()[0]}

Voxel Spacing:
  - Voxel sizes: {df['VOXEL SIZE (mm)'].nunique()} unique
  - Most common: {df['VOXEL SIZE (mm)'].mode()[0]}

Intensity:
  - Global min: {df['MIN VALUE'].min()}
  - Global max: {df['MAX VALUE'].max()}

Volume:
  - Mean brain volume: {df['BRAIN VOLUME (mm³)'].mean():.0f} mm³
  - Std brain volume: {df['BRAIN VOLUME (mm³)'].std():.0f} mm³
  - Range: [{df['BRAIN VOLUME (mm³)'].min():.0f}, {df['BRAIN VOLUME (mm³)'].max():.0f}] mm³
"""

print(summary)

# Save report
with open("dataset/reports/summary.txt", "w") as f:
    f.write(summary)

Typical Workflow

from nidataset.utility import dataset_images_info
import pandas as pd

# 1. Extract metadata for your dataset
dataset_images_info(
    nii_folder="data/medical_scans/",
    output_path="data/metadata/"
)

# 2. Load the results
df = pd.read_csv("data/metadata/dataset_images_info.csv")

# 3. Perform quality checks
print(f"Dataset contains {len(df)} volumes")
print(f"Dimension consistency: {df['SHAPE (X, Y, Z)'].nunique()} unique shapes")

# 4. Use metadata to inform preprocessing decisions
# - Determine appropriate crop sizes
# - Identify files needing special handling
# - Verify data type consistency
# - Check for outliers or corrupted files