By Xiuquan Hou, Meiqin Liu, Senlin Zhang, Shaoyi Du.
This repo is the official implementation of DAPE: Harmonizing Content-Position Encoding for Versatile Dense Visual Prediction, accepted to AAAI 2026 (review scores 7/6/6/6/5).
💖 If our DAPE is helpful to your research or projects, please star this repository. Thanks! 🤗
- Harmonized Sampling: Strictly aligning content and position distributions for superior Transformer performance.
- Memory-Efficient Training: Reducing VRAM usage via low-rank positional encoder.
- Unified Architecture: One architecture for detection and all-in-one segmentation (semantic/instance/panoptic).
- [2026-01-13] Code for semantic segmentation and panoptic segmentation is available now!
- [2026-01-07] We release the configs and weights for large relation ranks, which further increases the performance.
- [2026-01-07] Code for object detection and instance segmentation is available now!
- [2026-01-06] The pretrained weights for DAPE are available here!
- [2025-11-08] DAPE is accepted to AAAI2026.
| Model | Backbone | Epoch | Download | mAP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|---|
| DAPE | ResNet50 | 12 | config / checkpoint | 51.8 | 69.7 | 56.5 | 36.0 | 55.5 | 66.0 |
| DAPE<sub>r=64</sub> | ResNet50 | 12 | config / checkpoint | 51.9 | 69.5 | 56.5 | 35.7 | 55.7 | 66.4 |
| DAPE<sub>r=128</sub> | ResNet50 | 12 | config / checkpoint | 52.0 | 69.7 | 56.8 | 36.3 | 55.5 | 66.1 |
| Model | Backbone | Epoch | Download | mAP<sup>m</sup> | mAP<sup>b</sup> | AP50<sup>m</sup> | AP75<sup>m</sup> | APS<sup>m</sup> | APM<sup>m</sup> | APL<sup>m</sup> |
|---|---|---|---|---|---|---|---|---|---|---|
| Mask-DAPE | ResNet50 | 12 | config / checkpoint | 44.3 | 50.6 | 66.2 | 47.7 | 23.9 | 47.5 | 64.1 |
The superscripts m and b represent the results for mask-style IoU and box-style IoU respectively.
| Model | Backbone | Iteration | Download | AP | AP50 | person | rider | car | truck | bus | train | motorcycle | bicycle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mask-DAPE | ResNet50 | 90k | config / checkpoint | 38.2 | 62.6 | 35.0 | 29.0 | 55.6 | 39.2 | 59.3 | 43.6 | 21.7 | 22.5 |
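As a quick sanity check of the Cityscapes table above, the overall AP is the unweighted mean of the eight per-class APs, as expected for COCO-style evaluation. A minimal Python illustration (values copied from the table):

```python
# Per-class AP values from the Cityscapes table above.
per_class_ap = {
    "person": 35.0, "rider": 29.0, "car": 55.6, "truck": 39.2,
    "bus": 59.3, "train": 43.6, "motorcycle": 21.7, "bicycle": 22.5,
}

# The overall AP averages uniformly over classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mean AP = {mean_ap:.1f}")  # matches the reported AP of 38.2
```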
1. Installation
- Clone the repository:
```shell
git clone https://github.com/xiuqhou/DAPE
cd DAPE
```
- Install PyTorch and Torchvision following the instruction on https://pytorch.org/get-started/locally/. We provide the version used for our experiments below. Other versions may also work.
```shell
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
```
- Install other requirements:
```shell
pip install -r requirements.txt
```
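After installation, you can optionally check that the core dependencies resolve before launching any scripts. A minimal sketch using only the standard library; the package list is illustrative and can be extended:

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# torch and torchvision are required by the repo; add others as needed.
missing = missing_packages(["torch", "torchvision"])
print("missing packages:", missing or "none")
```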
2. Prepare datasets
For object detection and instance segmentation, download train2017, val2017, instances_train2017.json, and instances_val2017.json from https://cocodataset.org/. For panoptic segmentation, additionally download the panoptic files panoptic_train2017, panoptic_val2017, panoptic_train2017.json, and panoptic_val2017.json.
```
coco/
├── train2017/
├── val2017/
├── panoptic_train2017/
├── panoptic_val2017/
└── annotations/
    ├── instances_train2017.json
    ├── instances_val2017.json
    ├── panoptic_train2017.json
    └── panoptic_val2017.json
```

LVIS shares the same images as COCO, so you only need to download the annotation files lvis_v1_train.json and lvis_v1_val.json from https://www.lvisdataset.org and lvis_v1_minival_inserted_image_name.json from https://huggingface.co/GLIPModel/GLIP/tree/main. Put them into the annotations subdirectory of the COCO dataset, as follows:
```
coco/
├── ...
└── annotations/
    ├── ...
    ├── lvis_v1_train.json
    ├── lvis_v1_val.json
    └── lvis_v1_minival_inserted_image_name.json
```

For Cityscapes, download leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from https://www.cityscapes-dataset.com/downloads/ and extract them as follows:
```
cityscapes/
├── gtFine/
│   ├── train/
│   ├── val/
│   └── test/
└── leftImg8bit/
    ├── train/
    ├── val/
    └── test/
```

The final datasets should be organized as follows:
```
data/
├── coco/
│   ├── train2017/
│   ├── val2017/
│   ├── panoptic_train2017/
│   ├── panoptic_val2017/
│   └── annotations/
│       ├── instances_train2017.json
│       ├── instances_val2017.json
│       ├── lvis_v1_train.json
│       ├── lvis_v1_minival_inserted_image_name.json
│       ├── lvis_v1_val.json
│       ├── panoptic_train2017.json
│       └── panoptic_val2017.json
└── cityscapes/
    ├── gtFine/
    └── leftImg8bit/
```

3. Train a model
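Before training, the dataset layout above can be sanity-checked with a short script. This is a sketch: the `data` root and the path list mirror the directory tree above and may need adjusting for your setup.

```python
from pathlib import Path

# Required files/directories, relative to the data root, mirroring the tree above.
REQUIRED = [
    "coco/train2017", "coco/val2017",
    "coco/panoptic_train2017", "coco/panoptic_val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/annotations/panoptic_train2017.json",
    "coco/annotations/panoptic_val2017.json",
    "cityscapes/gtFine", "cityscapes/leftImg8bit",
]

def missing_entries(data_root):
    """Return the required entries that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in REQUIRED if not (root / p).exists()]

missing = missing_entries("data")
print("missing:", missing or "none")
```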
Use CUDA_VISIBLE_DEVICES to specify the GPU(s) and run the following script to start training. If it is not specified, the script uses all available GPUs on the node. Replace <config_file> with the path to the config file.
```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch train.py <config_file>    # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch train.py <config_file>  # train with 2 GPUs

# example:
# CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train.py configs/dape/object_detection/coco/dape_r50_rank16_coco_1x.py
```

4. Evaluate pretrained models
To evaluate a model with one or more GPUs, specify CUDA_VISIBLE_DEVICES, <config_file>, and <checkpoint_file>:
```shell
CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py <config_file> --checkpoint <checkpoint_file>

# example:
# CUDA_VISIBLE_DEVICES=0,1,2,3 \
#     accelerate launch test.py \
#     configs/dape/object_detection/coco/dape_r50_rank16_coco_1x.py \
#     --checkpoint https://github.com/xiuqhou/DAPE/releases/download/v1.0.0/dape_r50_rank16_coco_1x.pth
```

DAPE is released under the Apache 2.0 license. Please see the LICENSE file for more information.
If you find our work helpful, please consider citing:
```bibtex
@article{hou2025dape,
    title={DAPE: Harmonizing Content-Position Encoding for Versatile Dense Visual Prediction},
    author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Du, Shaoyi},
    journal={Proceedings of the AAAI Conference on Artificial Intelligence},
    year={2025}
}
```

If you find our work helpful, please also consider citing our previous papers related to this work:
```bibtex
@inproceedings{hou2024relation,
    title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
    author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
    booktitle={European Conference on Computer Vision},
    year={2024},
    organization={Springer}
}

@inproceedings{Hou_2024_CVPR,
    author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
    title={Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2024},
    pages={17574-17583}
}
```

Many thanks to these excellent open-source projects.


