
From Misclassifications to Outliers: Joint Reliability Assessment in Classification


Official PyTorch implementation of the paper "From Misclassifications to Outliers: Joint Reliability Assessment in Classification".




🎯 Overview

Misclassification detection and out-of-distribution (OOD) detection are typically treated as separate problems. This repository provides a unified framework for reliability assessment that:

  • Jointly addresses selective risk and OOD detection within a single pipeline
  • Supports both training and evaluation
  • Implements our proposed DS Metrics for reliability analysis

The framework is compatible with ResNet-18 and DINOv3 (ViT-L/16) and integrates with OpenOOD for standardized benchmarking.


πŸ” Motivation: The Limitations of Single Score

Existing approaches typically treat misclassification detection and out-of-distribution (OOD) detection as separate problems. They optimize for either ID accuracy or OOD detection, but not both jointly. This leads to:

  • Poor trade-offs between classification accuracy and reliability
  • Incomplete evaluation of model reliability
  • Suboptimal performance in real-world deployment scenarios

Figure: comparison between Cross-Entropy (CE) and CutMix training strategies across multiple metrics. While CutMix improves OOD detection (OOD AUROC), it underperforms CE on joint reliability assessment (DS-F1).

Figure: 3D visualization of DS-F1 scores as a function of the ID threshold (τ_ID) and the OOD threshold (τ_OOD). The surface shows that CE achieves a higher maximum DS-F1 score (0.565) than CutMix (0.539).

📊 Double Scoring Metrics

We propose Double Scoring (DS) metrics, including DS-F1 and DS-AURC, that simultaneously evaluate a model's ability to identify misclassifications and detect OOD samples within a unified framework.

DS-F1 (Double Scoring F1 Score)

DS-F1 extends the traditional F1 score to jointly consider both misclassification detection and OOD detection.

DS-AURC (Double Scoring Area Under Risk-Coverage Curve)

DS-AURC extends the selective classification risk-coverage framework to incorporate OOD detection.

See the paper for more details.
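The paper gives the formal definitions of the DS metrics; as a rough illustration only, here is one plausible reading of the double-scoring idea in plain NumPy. The function name, the decision rule (accept a sample only if it passes both an OOD threshold and an ID confidence threshold), and the choice of "correctly classified ID samples" as the positive class are all assumptions for illustration, not the paper's exact metric.

```python
import numpy as np

def ds_f1_sketch(correct, is_ood, conf, ood_score, tau_id, tau_ood):
    """Illustrative double-scoring rule (NOT the paper's exact definition):
    a sample is *accepted* only if it looks in-distribution
    (ood_score < tau_ood) AND the classifier is confident (conf >= tau_id).
    Positives are accepted samples that are truly ID and correctly classified.
    """
    accept = (ood_score < tau_ood) & (conf >= tau_id)
    good = (~is_ood) & correct              # samples we want to accept
    tp = np.sum(accept & good)
    fp = np.sum(accept & ~good)
    fn = np.sum(~accept & good)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```

Sweeping tau_id and tau_ood over a grid and taking the maximum of such a score is what produces surfaces like the DS-F1 visualization above.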


🚀 SURE+ Training Strategy

We propose SURE+, a comprehensive training strategy that combines four key components to achieve state-of-the-art reliability performance:

  • RegMixup: Regularized mixup augmentation for improved calibration and robustness
  • RegPixMix: Regularized pixel-level mixup that preserves semantic information
  • F-SAM: Fisher-information-guided Sharpness-Aware Minimization
  • EMA (ReBN): Exponential Moving Average with Re-normalized Batch Normalization
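The EMA component keeps a shadow copy of the model weights that is blended toward the live weights after each optimizer step. A minimal framework-agnostic sketch of the standard update is below; the decay value is illustrative, and the ReBN part (re-estimating BatchNorm statistics for the averaged weights, see utils/ema.py) is not shown.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Exponential moving average of model weights:
    ema <- decay * ema + (1 - decay) * live
    Called once after every optimizer step; parameters are plain
    name -> value mappings here for illustration.
    """
    for name, w in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1 - decay) * w
    return ema_params
```

At evaluation time the EMA weights, not the live weights, are used, which typically yields smoother and better-calibrated predictions.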

πŸ› οΈ Installation

Prerequisites

  • Python >= 3.10
  • PyTorch >= 2.0
  • CUDA >= 11.4

Setup

# Clone the repository
git clone https://github.com/Intellindust-AI-Lab/SUREPlus.git
cd SUREPlus

# Create virtual environment
conda create -n sure_plus python=3.10
conda activate sure_plus

# Install dependencies
pip install -r requirements.txt

# For CUDA 12.4 (recommended)
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

πŸ“ Data Preparation

For instructions on downloading and preparing the dataset, please refer to the official guide from OpenOOD.

PixMix Dataset

Download PixMix augmentation images.

Training Dataset Structure

Organize your datasets in ImageFolder format:

/path/to/dataset/
├── train/
│   ├── class_001/
│   │   ├── img_001.jpg
│   │   └── ...
│   └── ...
└── val/
    ├── class_001/
    └── ...

🔥 Pretrained Models

We provide pretrained checkpoints for SURE+.

DINOv3 Setup

For DINOv3, download the official pretrained weights from Meta AI:

# Set paths in your training scripts
--dinov3-path /path/to/dinov3_vitl16.pth \
--dinov3-repo /path/to/dinov3

🚀 Training

Training scripts are located in run/train/. We support both single-GPU and multi-GPU (DDP) training.

Quick Start

ResNet-18 on CIFAR-100 (SURE+)

python main.py \
  --gpu 0 \
  --lr 0.05 \
  --batch-size 128 \
  --epochs 200 \
  --model-name resnet18 \
  --optim-name fsam \
  --pixmix-weight 1.0 \
  --regmixup-weight 1.0 \
  --rebn \
  --pixmix-path ./PixMixSet/fractals_and_fvis/first_layers_resized256_onevis/ \
  --save-dir ./checkpoints/ResNet18-Cifar100/SURE+ \
  Cifar100

Or use the provided script:

bash run/train/resnet18/SURE+.sh

DINOv3-L/16 on ImageNet-1K (SURE+)

python main.py \
  --gpu 0 1 2 3 4 5 6 \
  --lr 1e-5 \
  --weight-decay 5e-6 \
  --batch-size 64 \
  --epochs 20 \
  --model-name dinov3_l16 \
  --optim-name fsam \
  --pixmix-weight 1.0 \
  --mixup-weight 1.0 \
  --mixup-beta 10.0 \
  --rebn \
  --dinov3-repo ./dinov3 \
  --dinov3-path ./dinov3/dinov3_vitl16_pretrain.pth \
  --save-dir ./checkpoints/DinoV3_L16-ImageNet1k/SURE+ \
  ImageNet1k

Or use the provided script:

bash run/train/dinov3/SURE+.sh

📊 Evaluation

Testing scripts are in run/test/ and are fully compatible with OpenOOD.

Evaluation Workflow

  1. Baseline Evaluation: Save raw logits
  2. Post-processing: Apply various OOD detectors

Supported Post-processors

Post-processors follow the implementations in OpenOOD. In addition, this repository includes the SIRC post-processor. By default, MSP is used as the ID confidence score.
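Step 2 turns the saved logits into per-sample confidence scores. With the default MSP score, post-processing is just a row-wise softmax over the logits followed by a max; a minimal NumPy sketch (the repo's actual post-processors follow OpenOOD's implementations):

```python
import numpy as np

def msp_score(logits):
    """Maximum Softmax Probability: the default ID confidence score.
    Higher means the sample looks more in-distribution / more reliable."""
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

logits = np.array([[4.0, 1.0, 0.0],    # confident prediction
                   [1.1, 1.0, 0.9]])   # near-uniform, likely unreliable
print(msp_score(logits))  # first score ~0.94, second ~0.37
```

Thresholding such a score is what drives both misclassification detection and OOD rejection in the joint evaluation.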

Quick Evaluation

# Evaluate ResNet-18 on CIFAR-100
bash run/test/resnet18/test.sh

Custom Evaluation

export CUDA_VISIBLE_DEVICES=0

PYTHONPATH='.':$PYTHONPATH \
python openood/main.py \
  --config openood/configs/datasets/cifar100/cifar100.yml \
  openood/configs/datasets/cifar100/cifar100_ood.yml \
  openood/configs/networks/resnet18_32x32.yml \
  openood/configs/pipelines/test/test_ood.yml \
  openood/configs/preprocessors/base_preprocessor.yml \
  openood/configs/postprocessors/msp.yml \
  --network.checkpoint "./checkpoints/ResNet18-Cifar100/SURE+/best_1.pth" \
  --network.name resnet18_32x32 \
  --output_dir "./results/SURE+"

Cosine Classifier Models

For cosine similarity classifier (CSC) models, use the appropriate network config:

--network.name resnet18_32x32_csc  # For ResNet-18
--network.name dinov3_l_csc        # For DINOv3
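The CSC head replaces the usual linear classifier with scaled cosine similarities between L2-normalized features and L2-normalized class prototypes (see model/classifier.py). A framework-agnostic NumPy sketch of that idea; the function name and the scale value are illustrative assumptions:

```python
import numpy as np

def cosine_logits(features, weights, scale=16.0):
    """Cosine classifier head: logits are scaled cosine similarities
    between L2-normalized features and L2-normalized class prototypes,
    so each logit lies in [-scale, scale]."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return scale * f @ w.T   # shape: (batch, num_classes)
```

Because the logit magnitude is bounded by the scale, confidence cannot grow unboundedly with feature norm, which tends to help reliability scoring.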

πŸ† Results

ResNet-18 on CIFAR-100

DINOv3 on ImageNet-1K

More results available in the paper.


πŸ“ Project Structure

SURE-plus/
├── 📄 main.py                  # Main training entry point
├── 📄 train.py                 # Training loop implementation
├── 📁 model/                   # Model definitions
│   ├── resnet18.py            # ResNet-18 backbone
│   ├── classifier.py          # Cosine classifier
│   └── get_model.py           # Model factory
├── 📁 data/                    # Data loading utilities
│   ├── dataset.py             # Dataset and DataLoader
│   └── sampler.py             # Custom samplers
├── 📁 utils/                   # Utility functions
│   ├── option.py              # Argument parser
│   ├── optim.py               # Optimizers & schedulers
│   ├── ema.py                 # Exponential moving average
│   ├── sam.py / fsam.py       # SAM implementations
│   ├── valid.py               # Validation metrics
│   └── utils.py               # Helper functions
├── 📁 openood/                 # OpenOOD integration
│   ├── configs/               # Dataset & model configs
│   └── main.py                # OpenOOD evaluation
├── 📁 run/                     # Training & testing scripts
│   ├── train/                 # Training scripts
│   └── test/                  # Testing scripts
├── 📄 requirements.txt         # Python dependencies
└── 📄 README.md                # This file

πŸ“ Citation

If you find this work useful, please consider citing:

@article{li2026from,
  title={From Misclassifications to Outliers: Joint Reliability Assessment in Classification},
  author={Li, Yang and Sha, Youyang and Wang, Yinzhi and Hospedales, Timothy and Hu, Shell Xu and Shen, Xi and Yu, Xuanlong},
  journal={arXiv preprint arXiv:2603.03903},
  year={2026}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgements

This work builds upon the following excellent open-source projects:

  • OpenOOD - OOD detection benchmark
  • DINOv3 - Self-supervised vision transformer

We thank the authors for sharing their high-quality code and pretrained models.


⭐ Star us on GitHub — it motivates us a lot!
