register_CTA_dataset
Batch register all medical imaging volumes in a dataset folder to a reference template using intensity-based registration with mutual information.
register_CTA_dataset(
nii_folder: str,
mask_folder: str,
template_path: str,
template_mask_path: str,
output_path: str,
saving_mode: str = "case",
cleanup: bool = False,
debug: bool = False,
number_histogram_bins: int = 128,
learning_rate: float = 0.0001,
number_iterations: int = 2000,
initialization_strategy: int = sitk.CenteredTransformInitializerFilter.MOMENTS,
sigma_first: float = 2.0,
sigma_second: float = 3.0,
metric_sampling_percentage: float = 0.5,
initial_transform = None
) -> None
Overview
This function processes all volumes in a dataset folder by aligning them to a common reference template through image registration. Each volume undergoes:
- Preprocessing: Gaussian filtering to remove noise and outliers
- Mask application: Brain region isolation using provided masks
- Registration: Alignment to template using mutual information metric
- Transform saving: Storing transformation parameters for later use
Registration is essential for:
- Standardizing spatial orientation across datasets
- Enabling voxel-wise analysis and comparisons
- Creating anatomically aligned datasets for machine learning
- Normalizing scan positions and orientations
- Building anatomical atlases
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
nii_folder | str | required | Path to the directory containing input volumes in .nii.gz format. |
mask_folder | str | required | Path to the directory containing brain masks. Must have matching filenames. |
template_path | str | required | Path to the reference template volume for registration. |
template_mask_path | str | required | Path to the template’s brain mask. |
output_path | str | required | Base directory for all outputs. Structure depends on saving_mode. |
saving_mode | str | "case" | Organization mode: "case" (folder per volume) or "folder" (shared folders). |
cleanup | bool | False | If True, deletes intermediate Gaussian-filtered files after registration. |
debug | bool | False | If True, prints detailed registration information for each volume. |
number_histogram_bins | int | 128 | Number of histogram bins for Mattes Mutual Information metric. Common values: 10, 50, 64, 128. |
learning_rate | float | 0.0001 | Learning rate for Gradient Descent optimizer. Common values: 0.0001-1.0. |
number_iterations | int | 2000 | Maximum number of optimization iterations. Common values: 500-5000. |
initialization_strategy | int | MOMENTS | Strategy for initializing transformation: MOMENTS (center of mass) or GEOMETRY (center/orientation). |
sigma_first | float | 2.0 | Standard deviation for the first Gaussian smoothing filter. |
sigma_second | float | 3.0 | Standard deviation for the second Gaussian smoothing filter. |
metric_sampling_percentage | float | 0.5 | Percentage of voxels to sample for metric evaluation (0.0-1.0). Default: 0.5 (50%). |
initial_transform | None | None | Initial transformation object. If None, defaults to sitk.Euler3DTransform(). |
Returns
None – The function saves registered volumes and transformations to disk.
Output Organization
Saving Modes
The function supports two organizational strategies for output files:
Case Mode (saving_mode="case")
Creates a separate folder for each volume (recommended for dataset organization):
Without cleanup (cleanup=False):
output_path/
├── patient_001/
│ ├── patient_001_registered.nii.gz
│ ├── patient_001_gaussian_filtered.nii.gz ← intermediate file kept
│ └── patient_001_transformation.tfm
├── patient_002/
│ ├── patient_002_registered.nii.gz
│ ├── patient_002_gaussian_filtered.nii.gz ← intermediate file kept
│ └── patient_002_transformation.tfm
With cleanup (cleanup=True):
output_path/
├── patient_001/
│ ├── patient_001_registered.nii.gz
│ └── patient_001_transformation.tfm ← gaussian_filtered.nii.gz deleted
├── patient_002/
│ ├── patient_002_registered.nii.gz
│ └── patient_002_transformation.tfm ← gaussian_filtered.nii.gz deleted
Folder Mode (saving_mode="folder")
Separates images and transformations into dedicated subdirectories. This mode uses temporary directories during processing.
Processing flow:
- For each volume, creates a temporary directory:
output_path/_temp_<PREFIX>/ - Runs registration, generating files in the temporary directory
- Moves
<PREFIX>_registered.nii.gztooutput_path/registered/ - Moves
<PREFIX>_transformation.tfmtooutput_path/transforms/ - If
cleanup=True, deletes the temporary directory - If
cleanup=False, keeps the temporary directory with gaussian-filtered files
Without cleanup (cleanup=False):
output_path/
├── _temp_patient_001/
│ └── patient_001_gaussian_filtered.nii.gz ← intermediate kept in temp
├── _temp_patient_002/
│ └── patient_002_gaussian_filtered.nii.gz ← intermediate kept in temp
├── _temp_patient_003/
│ └── patient_003_gaussian_filtered.nii.gz ← intermediate kept in temp
├── registered/
│ ├── patient_001_registered.nii.gz
│ ├── patient_002_registered.nii.gz
│ └── patient_003_registered.nii.gz
└── transforms/
├── patient_001_transformation.tfm
├── patient_002_transformation.tfm
└── patient_003_transformation.tfm
With cleanup (cleanup=True):
output_path/
├── registered/
│ ├── patient_001_registered.nii.gz
│ ├── patient_002_registered.nii.gz
│ └── patient_003_registered.nii.gz
└── transforms/
├── patient_001_transformation.tfm
├── patient_002_transformation.tfm
└── patient_003_transformation.tfm
Important notes for folder mode:
- Temporary directories (
_temp_<PREFIX>/) are created for each volume during processing - If
cleanup=True, temporary directories are deleted after moving registered images and transformations - If
cleanup=False, temporary directories are kept with their gaussian-filtered intermediate files - This allows you to inspect preprocessing steps or recover gaussian-filtered volumes if needed
Output Files
For each input volume, the function generates:
| File | Description | Kept After Cleanup |
|---|---|---|
<PREFIX>_registered.nii.gz | Volume aligned to template space | Yes |
<PREFIX>_gaussian_filtered.nii.gz | Preprocessed volume used for registration | No (if cleanup=True) |
<PREFIX>_transformation.tfm | Transformation parameters | Yes |
Example: Input scan_042.nii.gz produces:
scan_042_registered.nii.gzscan_042_gaussian_filtered.nii.gz(temporary)scan_042_transformation.tfm
Registration Pipeline
The registration process for each volume:
- Gaussian Filtering
- Smooths volume to reduce noise
- Clips negative and extreme values
- Prepares data for robust registration
- Mask-Based Registration
- Uses mutual information metric
- Constrains optimization to brain regions
- Moment-based or geometry-based initialization
- Gradient descent optimization
- Transform Application
- Applies computed transformation
- Resamples volume to template space
- Preserves intensity characteristics
- Save Outputs
- Registered volume
- Transformation file (for later reuse)
- Optional intermediate files
Registration Method
Metric: Mattes Mutual Information
- Histogram bins: Configurable (default: 128)
- Sampling: Configurable percentage (default: 50%)
- Robust to intensity differences between modalities
- Suitable for mono-modal (same imaging type) registration
- Handles intensity variations across scanners
Optimization: Gradient Descent
- Learning rate: Configurable (default: 0.0001)
- Iterations: Configurable maximum (default: 2000)
- Iteratively refines alignment
- Balances speed and accuracy
- Uses moment-based or geometry-based initialization
Transform: Euler3D (default) or custom
- 6 degrees of freedom (rigid transformation)
- Can be customized via
initial_transformparameter
Masking: Brain region constraint
- Focuses registration on relevant anatomy
- Ignores background and skull
- Improves robustness and accuracy
Exceptions
| Exception | Condition |
|---|---|
FileNotFoundError | The nii_folder does not exist or contains no .nii.gz files |
ValueError | Invalid saving_mode parameter (must be "case" or "folder") |
Usage Notes
- Input Format: Only
.nii.gzfiles are processed - Mask Requirement: Each volume must have a corresponding mask with matching filename in
mask_folder - Template Selection: Choose a representative template from your dataset
- Sequential Processing: Volumes are registered one at a time with a progress bar (tqdm)
- Output Directories: Automatically created if they don’t exist
- Progress Display: Shows real-time progress during batch processing
- Parameter Tuning: All registration parameters can be customized for dataset-specific optimization
Examples
Basic Usage - Case Mode
Register all volumes with per-case organization:
from nidataset.preprocessing import register_CTA_dataset
register_CTA_dataset(
nii_folder="dataset/scans/",
mask_folder="dataset/brain_masks/",
template_path="templates/standard_template.nii.gz",
template_mask_path="templates/standard_mask.nii.gz",
output_path="dataset/registered/",
saving_mode="case"
)
# Creates: dataset/registered/case_001/case_001_registered.nii.gz, ...
Custom Registration Parameters
Fine-tune registration for your dataset:
import SimpleITK as sitk
from nidataset.preprocessing import register_CTA_dataset
register_CTA_dataset(
nii_folder="dataset/scans/",
mask_folder="dataset/masks/",
template_path="template/CTA_template.nii.gz",
template_mask_path="template/CTA_template_mask.nii.gz",
output_path="dataset/registered/",
saving_mode="case",
number_histogram_bins=64,
learning_rate=0.01,
number_iterations=1000,
initialization_strategy=sitk.CenteredTransformInitializerFilter.GEOMETRY,
sigma_first=1.5,
sigma_second=2.5,
metric_sampling_percentage=0.7,
cleanup=True,
debug=True
)
With Cleanup
Remove intermediate files to save space:
register_CTA_dataset(
nii_folder="data/raw_cta/",
mask_folder="data/cta_masks/",
template_path="template/atlas.nii.gz",
template_mask_path="template/atlas_mask.nii.gz",
output_path="data/aligned/",
saving_mode="case",
cleanup=True, # Remove Gaussian-filtered intermediates
debug=True
)
# Only registered volumes and transformations are kept
Folder Mode Organization
Separate images and transformations:
register_CTA_dataset(
nii_folder="scans/",
mask_folder="masks/",
template_path="reference.nii.gz",
template_mask_path="reference_mask.nii.gz",
output_path="results/",
saving_mode="folder",
debug=True
)
# Images in results/registered/, transforms in results/transforms/
Using Custom Initial Transform
Start with an affine transform instead of rigid:
import SimpleITK as sitk
from nidataset.preprocessing import register_CTA_dataset
# Create a custom initial transform
affine_transform = sitk.AffineTransform(3)
register_CTA_dataset(
nii_folder="scans/",
mask_folder="masks/",
template_path="template.nii.gz",
template_mask_path="template_mask.nii.gz",
output_path="registered/",
saving_mode="case",
initial_transform=affine_transform,
number_iterations=3000, # More iterations for affine
debug=True
)
Quality Control Workflow
Register and verify alignment quality:
from nidataset.preprocessing import register_CTA_dataset
import nibabel as nib
import numpy as np
# Register dataset
register_CTA_dataset(
nii_folder="qa/scans/",
mask_folder="qa/masks/",
template_path="qa/template.nii.gz",
template_mask_path="qa/template_mask.nii.gz",
output_path="qa/registered/",
saving_mode="case",
debug=True
)
# Load template and a registered volume
template = nib.load("qa/template.nii.gz")
template_data = template.get_fdata()
registered = nib.load("qa/registered/sample/sample_registered.nii.gz")
registered_data = registered.get_fdata()
# Verify shapes match
print(f"\nQuality Control:")
print(f" Template shape: {template_data.shape}")
print(f" Registered shape: {registered_data.shape}")
print(f" Shapes match: {template_data.shape == registered_data.shape}")
# Check alignment in a specific region
roi = template_data[100:150, 100:150, 50:100]
roi_reg = registered_data[100:150, 100:150, 50:100]
# Calculate correlation as alignment metric
correlation = np.corrcoef(roi.flatten(), roi_reg.flatten())[0, 1]
print(f" ROI correlation: {correlation:.3f}")
print(f" Good alignment: correlation > 0.7")
Creating Custom Template
Select a representative scan as template:
import nibabel as nib
import shutil
import numpy as np
from nidataset.preprocessing import register_CTA_dataset
# Step 1: Select template (e.g., scan with median brain size)
scan_files = ["scan_001.nii.gz", "scan_002.nii.gz", "scan_003.nii.gz"]
volumes = []
for scan in scan_files:
img = nib.load(f"scans/{scan}")
mask = nib.load(f"masks/{scan}")
mask_data = mask.get_fdata()
brain_volume = np.sum(mask_data > 0)
volumes.append((scan, brain_volume))
# Sort by volume and pick median
volumes.sort(key=lambda x: x[1])
template_scan = volumes[len(volumes)//2][0]
print(f"Selected template: {template_scan}")
# Step 2: Copy to template folder
shutil.copy(f"scans/{template_scan}", "templates/custom_template.nii.gz")
shutil.copy(f"masks/{template_scan}", "templates/custom_template_mask.nii.gz")
# Step 3: Register all scans to this template
register_CTA_dataset(
nii_folder="scans/",
mask_folder="masks/",
template_path="templates/custom_template.nii.gz",
template_mask_path="templates/custom_template_mask.nii.gz",
output_path="registered/",
saving_mode="case",
cleanup=True,
debug=True
)
Multi-Center Dataset Harmonization
Align scans from different sites:
from nidataset.preprocessing import register_CTA_dataset
centers = ["center_A", "center_B", "center_C"]
# Use same template for all centers
template = "standard_atlas/template.nii.gz"
template_mask = "standard_atlas/template_mask.nii.gz"
for center in centers:
print(f"\nProcessing {center}...")
register_CTA_dataset(
nii_folder=f"data/{center}/scans/",
mask_folder=f"data/{center}/masks/",
template_path=template,
template_mask_path=template_mask,
output_path=f"harmonized/{center}/",
saving_mode="case",
cleanup=True,
debug=True
)
print("\nAll centers registered to common template")
Applying Saved Transformations
Reuse transformations for other data:
import SimpleITK as sitk
import os
from nidataset.preprocessing import register_CTA_dataset
# Step 1: Register structural scans
register_CTA_dataset(
nii_folder="structural/",
mask_folder="masks/",
template_path="template.nii.gz",
template_mask_path="template_mask.nii.gz",
output_path="registered_structural/",
saving_mode="case",
cleanup=False,
debug=True
)
# Step 2: Apply same transformations to functional scans
transform_folder = "registered_structural/"
functional_folder = "functional/"
for case in os.listdir(transform_folder):
if os.path.isdir(os.path.join(transform_folder, case)):
# Load transformation
tfm_file = f"{transform_folder}/{case}/{case}_transformation.tfm"
transform = sitk.ReadTransform(tfm_file)
# Load functional scan
func_file = f"{functional_folder}/{case}.nii.gz"
func_img = sitk.ReadImage(func_file)
# Apply transformation
template_img = sitk.ReadImage("template.nii.gz")
registered_func = sitk.Resample(
func_img,
template_img,
transform,
sitk.sitkLinear,
0.0
)
# Save
output_file = f"registered_functional/{case}_registered.nii.gz"
os.makedirs("registered_functional/", exist_ok=True)
sitk.WriteImage(registered_func, output_file)
print("Transformations applied to functional scans")
Assessing Registration Quality
Evaluate alignment across dataset:
from nidataset.preprocessing import register_CTA_dataset
import nibabel as nib
import numpy as np
import pandas as pd
import os
# Register dataset
register_CTA_dataset(
nii_folder="scans/",
mask_folder="masks/",
template_path="template.nii.gz",
template_mask_path="template_mask.nii.gz",
output_path="registered/",
saving_mode="case",
debug=True
)
# Load template
template = nib.load("template.nii.gz")
template_data = template.get_fdata()
template_mask = nib.load("template_mask.nii.gz").get_fdata()
# Assess each registered volume
results = []
for case_folder in os.listdir("registered/"):
case_path = os.path.join("registered/", case_folder)
if os.path.isdir(case_path):
reg_file = f"{case_path}/{case_folder}_registered.nii.gz"
reg_img = nib.load(reg_file)
reg_data = reg_img.get_fdata()
# Calculate metrics within brain mask
mask_indices = template_mask > 0
template_roi = template_data[mask_indices]
registered_roi = reg_data[mask_indices]
# Correlation
correlation = np.corrcoef(template_roi, registered_roi)[0, 1]
# Normalized mutual information (approximate)
hist, _, _ = np.histogram2d(
template_roi.flatten(),
registered_roi.flatten(),
bins=50
)
pxy = hist / hist.sum()
px = pxy.sum(axis=1)
py = pxy.sum(axis=0)
# MI calculation
px_py = px[:, None] * py[None, :]
nz = pxy > 0
mi = np.sum(pxy[nz] * np.log(pxy[nz] / px_py[nz]))
results.append({
'case': case_folder,
'correlation': correlation,
'mutual_information': mi
})
# Summary
df = pd.DataFrame(results)
print("\nRegistration Quality Assessment:")
print(f" Mean correlation: {df['correlation'].mean():.3f}")
print(f" Min correlation: {df['correlation'].min():.3f}")
print(f" Cases with correlation < 0.7: {(df['correlation'] < 0.7).sum()}")
# Identify poor registrations
poor_reg = df[df['correlation'] < 0.7]
if not poor_reg.empty:
print(f"\nCases needing review:")
print(poor_reg[['case', 'correlation']])
Creating Dataset Statistics
Track registration parameters:
from nidataset.preprocessing import register_CTA_dataset
import SimpleITK as sitk
import pandas as pd
import numpy as np
import os
# Register with debug to see details
register_CTA_dataset(
nii_folder="data/scans/",
mask_folder="data/masks/",
template_path="template.nii.gz",
template_mask_path="template_mask.nii.gz",
output_path="data/registered/",
saving_mode="case",
cleanup=False,
debug=True
)
# Extract transformation parameters
transform_stats = []
for case in os.listdir("data/registered/"):
if os.path.isdir(f"data/registered/{case}"):
tfm_file = f"data/registered/{case}/{case}_transformation.tfm"
# Read transform
transform = sitk.ReadTransform(tfm_file)
params = transform.GetParameters()
# Extract translation components (for Euler3D: 6 params)
if len(params) >= 6:
rotations = params[0:3]
translations = params[3:6]
transform_stats.append({
'case': case,
'rotation_x': rotations[0],
'rotation_y': rotations[1],
'rotation_z': rotations[2],
'translation_x': translations[0],
'translation_y': translations[1],
'translation_z': translations[2],
'total_translation': np.sqrt(sum(t**2 for t in translations))
})
# Analyze
df = pd.DataFrame(transform_stats)
print("\nRegistration Statistics:")
print(f" Mean translation: {df['total_translation'].mean():.2f} mm")
print(f" Max translation: {df['total_translation'].max():.2f} mm")
print(f" Cases with large shifts (>20mm): {(df['total_translation'] > 20).sum()}")
Comparing Different Registration Settings
Compare standard vs. high-quality registration:
from nidataset.preprocessing import register_CTA_dataset
import nibabel as nib
import numpy as np
import os
# Standard registration
register_CTA_dataset(
nii_folder="comparison/scans/",
mask_folder="comparison/masks/",
template_path="template.nii.gz",
template_mask_path="template_mask.nii.gz",
output_path="comparison/standard/",
saving_mode="case",
cleanup=True
)
# High-quality registration
register_CTA_dataset(
nii_folder="comparison/scans/",
mask_folder="comparison/masks/",
template_path="template.nii.gz",
template_mask_path="template_mask.nii.gz",
output_path="comparison/high_quality/",
saving_mode="case",
number_histogram_bins=256,
learning_rate=0.001,
number_iterations=5000,
metric_sampling_percentage=0.8,
cleanup=True
)
# Compare results
template_data = nib.load("template.nii.gz").get_fdata()
template_mask = nib.load("template_mask.nii.gz").get_fdata()
mask_indices = template_mask > 0
template_roi = template_data[mask_indices]
results = []
for case in os.listdir("comparison/standard/"):
if os.path.isdir(f"comparison/standard/{case}"):
# Load both versions
standard = nib.load(f"comparison/standard/{case}/{case}_registered.nii.gz")
hq = nib.load(f"comparison/high_quality/{case}/{case}_registered.nii.gz")
standard_roi = standard.get_fdata()[mask_indices]
hq_roi = hq.get_fdata()[mask_indices]
# Compare correlations
corr_standard = np.corrcoef(template_roi, standard_roi)[0, 1]
corr_hq = np.corrcoef(template_roi, hq_roi)[0, 1]
results.append({
'case': case,
'standard_corr': corr_standard,
'hq_corr': corr_hq,
'improvement': corr_hq - corr_standard
})
df = pd.DataFrame(results)
print(f"\nAverage improvement: {df['improvement'].mean():.3f}")
print(f"Cases improved: {(df['improvement'] > 0).sum()}/{len(df)}")
Typical Workflow
from nidataset.preprocessing import register_CTA_dataset
import nibabel as nib
# 1. Prepare inputs
scan_folder = "dataset/angiography_scans/"
mask_folder = "dataset/brain_masks/"
template = "atlas/standard_template.nii.gz"
template_mask = "atlas/standard_mask.nii.gz"
# 2. Register all scans to template
register_CTA_dataset(
nii_folder=scan_folder,
mask_folder=mask_folder,
template_path=template,
template_mask_path=template_mask,
output_path="dataset/registered/",
saving_mode="case",
cleanup=True, # Save space
debug=True
)
# 3. Verify a sample result
template_img = nib.load(template)
sample_reg = nib.load("dataset/registered/sample/sample_registered.nii.gz")
print(f"Template shape: {template_img.shape}")
print(f"Registered shape: {sample_reg.shape}")
# 4. Use registered volumes for:
# - Voxel-wise analysis across subjects
# - Creating population atlases
# - Group comparisons
# - Machine learning with spatial features
Parameter Tuning Guide
| Parameter | Effect | Recommendations |
|---|---|---|
number_histogram_bins | Higher values = finer intensity discretization | 64-128 for most cases; 256 for high-contrast images |
learning_rate | Higher values = faster but less stable convergence | 0.0001-0.001 for standard; 0.01+ for fast initial alignment |
number_iterations | More iterations = potential for better alignment | 1000-2000 standard; 3000-5000 for difficult cases |
metric_sampling_percentage | Higher sampling = more accurate but slower | 0.3-0.5 for speed; 0.7-1.0 for accuracy |
sigma_first / sigma_second | Controls smoothing strength | Lower for sharp features; higher for noisy images |
initialization_strategy | MOMENTS vs GEOMETRY | MOMENTS for asymmetric anatomy; GEOMETRY for symmetric |
saving_mode | Organization of outputs | “case” for per-subject analysis; “folder” for simpler structure |