DAPE

By Xiuquan Hou, Meiqin Liu, Senlin Zhang, Shaoyi Du.

This repo is the official implementation of DAPE: Harmonizing Content-Position Encoding for Versatile Dense Visual Prediction, accepted to AAAI2026 (review scores: 7, 6, 6, 6, 5).

💖 If our DAPE is helpful to your research or projects, please star this repository. Thanks! 🤗

Features

  • Harmonized Sampling: Strictly aligning content and position distributions for superior Transformer performance.
  • Memory-Efficient Training: Reducing VRAM usage via low-rank positional encoder.
  • Unified Architecture: One architecture for detection and all-in-one segmentation (semantic/instance/panoptic).
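The exact encoder design lives in the paper and code; as a back-of-the-envelope illustration of the memory argument behind the low-rank positional encoder, the sketch below compares parameter counts for a dense per-position embedding table against a rank-r factorization. The function names and sizes here are illustrative, not taken from the repo.

```python
# A dense positional table stores one d_model-dim embedding per position:
# n_pos * d_model parameters. A rank-r factorization stores an (n_pos x r)
# and an (r x d_model) matrix instead: n_pos * r + r * d_model parameters.

def dense_params(n_pos: int, d_model: int) -> int:
    return n_pos * d_model

def low_rank_params(n_pos: int, d_model: int, rank: int) -> int:
    return n_pos * rank + rank * d_model

n_pos, d_model = 100 * 100, 256  # e.g. positions on a 100x100 feature map
for rank in (16, 64, 128):       # ranks matching the r=16/64/128 configs
    full = dense_params(n_pos, d_model)
    lr = low_rank_params(n_pos, d_model, rank)
    print(f"rank={rank}: {lr:,} vs {full:,} dense params ({lr / full:.1%})")
```

For r much smaller than d_model and n_pos, the factorized count grows roughly linearly in r, which is why the larger-rank configs below trade a modest amount of memory for extra accuracy.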

Updates

  • [2026-01-13] Code for semantic segmentation and panoptic segmentation is available now!
  • [2026-01-07] We release the configs and weights for large relation ranks, which further increases the performance.
  • [2026-01-07] Code for object detection and instance segmentation is available now!
  • [2026-01-06] The pretrained weights for DAPE are available here!
  • [2025-11-08] DAPE is accepted to AAAI2026.

Model Zoo

Object Detection

COCO

| Model | Backbone | Epochs | Download | mAP | AP50 | AP75 | APS | APM | APL |
| --- | --- | :-: | --- | --- | --- | --- | --- | --- | --- |
| DAPE | ResNet50 | 12 | config / checkpoint | 51.8 | 69.7 | 56.5 | 36.0 | 55.5 | 66.0 |
| DAPE<sub>r=64</sub> | ResNet50 | 12 | config / checkpoint | 51.9 | 69.5 | 56.5 | 35.7 | 55.7 | 66.4 |
| DAPE<sub>r=128</sub> | ResNet50 | 12 | config / checkpoint | 52.0 | 69.7 | 56.8 | 36.3 | 55.5 | 66.1 |

Instance Segmentation

COCO

| Model | Backbone | Epochs | Download | mAP<sup>m</sup> | mAP<sup>b</sup> | AP50<sup>m</sup> | AP75<sup>m</sup> | APS<sup>m</sup> | APM<sup>m</sup> | APL<sup>m</sup> |
| --- | --- | :-: | --- | --- | --- | --- | --- | --- | --- | --- |
| Mask-DAPE | ResNet50 | 12 | config / checkpoint | 44.3 | 50.6 | 66.2 | 47.7 | 23.9 | 47.5 | 64.1 |

The superscripts m and b denote results under mask-style IoU and box-style IoU, respectively.

Cityscapes

| Model | Backbone | Iterations | Download | AP | AP50 | person | rider | car | truck | bus | train | motorcycle | bicycle |
| --- | --- | :-: | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mask-DAPE | ResNet50 | 90k | config / checkpoint | 38.2 | 62.6 | 35.0 | 29.0 | 55.6 | 39.2 | 59.3 | 43.6 | 21.7 | 22.5 |

Get started

1. Installation
  1. Clone the repository:
```shell
git clone https://github.com/xiuqhou/DAPE
cd DAPE
```
  2. Install PyTorch and Torchvision following the instructions at https://pytorch.org/get-started/locally/. The versions used in our experiments are given below; other versions may also work.
```shell
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
```
  3. Install the remaining requirements:
```shell
pip install -r requirements.txt
```
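After installation, a quick sanity check can confirm the key packages are actually visible to your interpreter. The snippet below is an illustrative helper, not part of the repo:

```python
from importlib import metadata

def installed_version(pkg: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("torch", "torchvision", "accelerate"):
    print(pkg, installed_version(pkg) or "not installed")
```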
2. Prepare datasets

COCO

For object detection and instance segmentation, download the image sets train2017 and val2017 together with the annotation files instances_train2017.json and instances_val2017.json; for panoptic segmentation, also download panoptic_train2017, panoptic_val2017, panoptic_train2017.json, and panoptic_val2017.json. All are available from https://cocodataset.org/. Organize them as follows:

```
coco/
  ├── train2017/
  ├── val2017/
  ├── panoptic_train2017/
  ├── panoptic_val2017/
  └── annotations/
         ├── instances_train2017.json
         ├── instances_val2017.json
         ├── panoptic_train2017.json
         └── panoptic_val2017.json
```

LVIS

LVIS shares its images with COCO, so you only need to download the annotations lvis_v1_train.json and lvis_v1_val.json from https://www.lvisdataset.org and lvis_v1_minival_inserted_image_name.json from https://huggingface.co/GLIPModel/GLIP/tree/main. Put them into the annotations subdirectory of the COCO dataset, as follows:

```
coco/
  ├── ...
  └── annotations/
         ├── ...
         ├── lvis_v1_train.json
         ├── lvis_v1_val.json
         └── lvis_v1_minival_inserted_image_name.json
```

Cityscapes

Download leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip from https://www.cityscapes-dataset.com/downloads/ and extract them as follows:

```
cityscapes/
  ├── gtFine/
  │     ├── train/
  │     ├── val/
  │     └── test/
  │
  └── leftImg8bit/
        ├── train/
        ├── val/
        └── test/
```

The final datasets should be organized as follows:

```
data/
  ├─ coco/
  │  ├── train2017/
  │  ├── val2017/
  │  ├── panoptic_train2017/
  │  ├── panoptic_val2017/
  │  └── annotations/
  │         ├── instances_train2017.json
  │         ├── instances_val2017.json
  │         ├── lvis_v1_train.json
  │         ├── lvis_v1_minival_inserted_image_name.json
  │         ├── lvis_v1_val.json
  │         ├── panoptic_train2017.json
  │         └── panoptic_val2017.json
  │
  └─ cityscapes/
      ├── gtFine/
      └── leftImg8bit/
```

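Before launching a run, it can save time to verify the layout programmatically. A minimal sketch (the path list mirrors the tree above and is not exhaustive; the helper is illustrative, not part of the repo):

```python
from pathlib import Path

# Expected entries under data/, mirroring the tree above (not exhaustive).
EXPECTED = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "cityscapes/gtFine",
    "cityscapes/leftImg8bit",
]

def missing_entries(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    print(missing_entries("data") or "layout looks complete")
```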
3. Train a model

Use CUDA_VISIBLE_DEVICES to specify which GPU(s) to use, then run the following script to start training. If it is not set, the script will use all available GPUs on the node. Replace <config_file> with the path to the config file.

```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch train.py <config_file>    # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch train.py <config_file>  # train with 2 GPUs

# example:
# CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train.py configs/dape/object_detection/coco/dape_r50_rank16_coco_1x.py
```
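The number of GPUs the launcher sees is determined entirely by how CUDA_VISIBLE_DEVICES is parsed: an unset variable exposes every GPU, while a comma-separated list restricts visibility. A small illustrative helper (not part of the repo) showing that behavior:

```python
import os

def visible_gpu_count(value):
    """GPUs exposed by a CUDA_VISIBLE_DEVICES string; None means 'all GPUs'."""
    if value is None:
        return None  # variable unset: frameworks see every GPU on the node
    return len([v for v in value.split(",") if v.strip()])

print(visible_gpu_count(os.environ.get("CUDA_VISIBLE_DEVICES")))
```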
4. Evaluate pretrained models

To evaluate a model with one or more GPUs, specify CUDA_VISIBLE_DEVICES, <config_file>, and <checkpoint_file>:

```shell
CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py <config_file> --checkpoint <checkpoint_file>

# example:
# CUDA_VISIBLE_DEVICES=0,1,2,3 \
#   accelerate launch test.py \
#   configs/dape/object_detection/coco/dape_r50_rank16_coco_1x.py \
#   --checkpoint https://github.com/xiuqhou/DAPE/releases/download/v1.0.0/dape_r50_rank16_coco_1x.pth
```
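Note that the example passes a GitHub release URL directly to --checkpoint rather than a local file; how test.py resolves the two forms is up to the script itself, but the usual way to distinguish them is by URL scheme. A minimal sketch (the helper name is illustrative, not from the repo):

```python
from urllib.parse import urlparse

def is_remote_checkpoint(path: str) -> bool:
    """True when the checkpoint argument is an http(s) URL rather than a local path."""
    return urlparse(path).scheme in ("http", "https")

print(is_remote_checkpoint(
    "https://github.com/xiuqhou/DAPE/releases/download/v1.0.0/dape_r50_rank16_coco_1x.pth"))
print(is_remote_checkpoint("checkpoints/dape_r50_rank16_coco_1x.pth"))
```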

License

DAPE is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Bibtex

If you find our work helpful, please consider citing:

```bibtex
@article{hou2025dape,
  title={DAPE: Harmonizing Content-Position Encoding for Versatile Dense Visual Prediction},
  author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Du, Shaoyi},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}
```
If you find our work helpful, please also consider citing our previous papers related to this work:

```bibtex
@inproceedings{hou2024relation,
  title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
  author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}
```

```bibtex
@inproceedings{Hou_2024_CVPR,
  title={Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
  author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2024},
  pages={17574-17583}
}
```

Acknowledgements

Many thanks to these excellent open-source projects.
