Depth Anything with Any Prior

Zehan Wang1*      Siyu Chen1*      Lihe Yang2      
Jialei Wang1      Ziang Zhang1      Hengshuang Zhao2      Zhou Zhao1
1ZJU                2HKU
* Equal Contribution       

Abstract

This work presents Prior Depth Anything, a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction, generating accurate, dense, and detailed metric depth maps for any scene. To this end, we design a coarse-to-fine pipeline to progressively integrate the two complementary depth sources. First, we introduce pixel-level metric alignment and distance-aware weighting to pre-fill diverse metric priors by explicitly using depth prediction. It effectively narrows the domain gap between prior patterns, enhancing generalization across varying scenarios. Second, we develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise of depth priors. By conditioning on the normalized pre-filled prior and prediction, the model further implicitly merges the two complementary depth sources. Our model showcases impressive zero-shot generalization across depth completion, super-resolution, and inpainting on 7 real-world datasets, matching or even surpassing previous task-specific methods. More importantly, it performs well on challenging, unseen mixed priors and enables test-time improvements by switching prediction models, providing a flexible accuracy-efficiency trade-off while evolving with advancements in MDE models.

Core Motivation

We progressively integrate complementary information from metric measurements (accurate metric values) and relative predictions (complete coverage and fine details) to produce dense and fine-grained metric depth maps.

[Figure: motivation]

Applicable Scenarios

Applicable scenarios of current prior-based monocular depth estimation models. SfM: sparse matching points from SfM, LiDAR: sparse LiDAR line patterns, Extreme: extremely sparse points (100 points), Range: missing depth within a specific range, Shape: missing regular-shaped areas, Object: missing depth of an object.

[Figure: pipeline]
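For concreteness, here is a minimal sketch of how a few of these prior patterns can be synthesized from a dense depth map. The array convention (zeros mark missing values), the function names, and the default parameters are illustrative assumptions, not the exact protocol used in our evaluation.

import numpy as np

def sparse_points_prior(depth, num_points=100, seed=0):
    """Keep only num_points random valid pixels (the "Extreme" pattern)."""
    rng = np.random.default_rng(seed)
    prior = np.zeros_like(depth)
    ys, xs = np.nonzero(depth > 0)
    idx = rng.choice(len(ys), size=min(num_points, len(ys)), replace=False)
    prior[ys[idx], xs[idx]] = depth[ys[idx], xs[idx]]
    return prior

def range_prior(depth, near=0.0, far=3.0):
    """Drop depth inside a specific metric range (the "Range" pattern)."""
    prior = depth.copy()
    prior[(depth >= near) & (depth <= far)] = 0.0
    return prior

def square_hole_prior(depth, size=160, top=40, left=40):
    """Zero out a regular-shaped region (the "Shape" pattern)."""
    prior = depth.copy()
    prior[top:top + size, left:left + size] = 0.0
    return prior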

Refinement of Real-World Metric Measurements

Given an RGB image (1st column) and its corresponding ground-truth measurement (2nd column), our model effectively corrects the noise in the labels, fills the vacant areas, and outputs a depth map (3rd column) that is detailed, complete, and metrically precise. These "beyond ground truth" cases highlight the potential of our approach in addressing the inherent noise of depth measurement techniques.


[Image grid: RGB image / ground-truth measurement / our prediction, one row per scene (bedroom_0090, computer_lab_0002, kitchen_0029, scene0111_02, scene0803_00, study_room_0004)]

Framework

Given an RGB image, a depth prior D_prior in any form, and the relative prediction D_pred from a frozen MDE model, coarse metric alignment first explicitly combines the metric data in D_prior with the geometric structure in D_pred to fill the incomplete areas of D_prior. Fine structure refinement then implicitly merges the complementary information to produce the final metric depth map.

[Figure: pipeline]
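As a rough illustration of the two stages (a minimal sketch, not the released implementation): the coarse step below uses a single global least-squares scale and shift, whereas the actual pipeline performs pixel-level metric alignment with distance-aware weighting; conditioned_mde is a hypothetical placeholder for the fine-stage network.

import numpy as np

def coarse_metric_alignment(d_prior, d_pred):
    """Fill missing prior pixels (zeros) with the aligned relative prediction."""
    valid = d_prior > 0
    a = np.stack([d_pred[valid], np.ones(valid.sum())], axis=1)
    scale, shift = np.linalg.lstsq(a, d_prior[valid], rcond=None)[0]
    return np.where(valid, d_prior, scale * d_pred + shift)

def prior_depth_anything(image, d_prior, d_pred, conditioned_mde):
    """Coarse pre-fill, then implicit fusion by the conditioned MDE model."""
    d_filled = coarse_metric_alignment(d_prior, d_pred)
    # The conditioned model takes the image together with the (normalized)
    # pre-filled prior and the relative prediction, and outputs the final
    # metric depth map.
    return conditioned_mde(image, d_filled, d_pred)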

Zero-Shot Depth Estimation with Mixed Priors

In real-world scenarios, different depth priors (e.g., sparse points, low-resolution maps) often coexist. Handling these mixed priors is vital for practical applications. We report quantitative results of our model on mixed priors that simulate these more complex scenarios. All results are reported in AbsRel↓. "S": extremely sparse points (100 points); "L": x16 low-resolution; "M": missing square areas of 160x160. We highlight the best and second-best results. "Depth Pro+ViT-B" indicates the frozen MDE and the conditioned MDE, respectively. DAv2-B: Depth Anything v2 ViT-B; SDv2: Stable Diffusion v2.

[Figure: mixed_area]
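As a minimal sketch of how such mixed priors can be simulated (zeros mark missing values; the exact evaluation protocol is not reproduced here, and all names and defaults are illustrative), the snippet below composes the "L" (x16 low-resolution) and "M" (160x160 missing square) degradations:

import numpy as np

def lowres_prior(depth, factor=16):
    """Nearest-neighbour downsample then upsample back (the "L" degradation)."""
    h, w = depth.shape
    small = depth[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)[:h, :w]

def mixed_prior_lm(depth, factor=16, hole=160, top=40, left=40):
    """Compose the "L" and "M" degradations into one mixed prior."""
    prior = lowres_prior(depth, factor)
    prior[top:top + hole, left:left + hole] = 0.0  # "M": missing square area
    return prior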

Citation

@misc{wang2025depthprior,
      title={Depth Anything with Any Prior}, 
      author={Zehan Wang and Siyu Chen and Lihe Yang and Jialei Wang and Ziang Zhang and Hengshuang Zhao and Zhou Zhao},
      year={2025},
      eprint={2505.10565},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.10565}, 
}