This work presents Prior Depth Anything, a framework that combines the incomplete but precise metric information from depth measurement with the relative but complete geometric structures from depth prediction, generating accurate, dense, and detailed metric depth maps for any scene. To this end, we design a coarse-to-fine pipeline that progressively integrates the two complementary depth sources. First, we introduce pixel-level metric alignment and distance-aware weighting to pre-fill diverse metric priors by explicitly using the depth prediction. This effectively narrows the domain gap between prior patterns, enhancing generalization across varying scenarios. Second, we develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise in depth priors. By conditioning on the normalized pre-filled prior and prediction, the model further implicitly merges the two complementary depth sources. Our model shows impressive zero-shot generalization across depth completion, super-resolution, and inpainting on 7 real-world datasets, matching or even surpassing previous task-specific methods. More importantly, it performs well on challenging, unseen mixed priors and enables test-time improvement by switching prediction models, providing a flexible accuracy-efficiency trade-off while evolving with advances in MDE models.
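For intuition, the coarse stage builds on the classic global scale-and-shift alignment that maps a relative prediction onto sparse metric measurements; the pixel-level, distance-aware alignment described above replaces this single global fit with local ones (a sketch of that variant follows the pipeline description below). Here is a minimal NumPy sketch of the global baseline, assuming missing prior pixels are encoded as NaN (the function name and the NaN convention are our choices, not the released code):

```python
import numpy as np

def global_scale_shift_align(d_pred, d_prior):
    """Least-squares alignment of a relative prediction to sparse metric
    measurements: solves min_{s,t} sum_valid (s * d_pred + t - d_prior)^2.
    A common baseline; the paper's pixel-level alignment refines it."""
    valid = np.isfinite(d_prior)            # pixels where the prior exists
    x = d_pred[valid]
    y = d_prior[valid]
    A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * d_pred + t                    # metric-scaled dense depth
```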
We progressively integrate complementary information from metric measurements (metric accuracy) and relative predictions (completeness and fine details) to produce dense and fine-grained metric depth maps.
Applicable scenarios of current prior-based monocular depth estimation models. SfM: sparse matching points from SfM, LiDAR: sparse LiDAR line patterns, Extreme: extremely sparse points (100 points), Range: missing depth within a specific range, Shape: missing regular-shaped areas, Object: missing depth of an object.
Taking an RGB image (1st column) and its corresponding ground-truth measurement (2nd column), our model effectively corrects the noise in the labels, fills the vacant areas, and outputs a depth map (3rd column) that is detailed, complete, and metrically precise. These "beyond ground truth" cases highlight the potential of our approach for addressing the inherent noise in depth measurement techniques.
Given an RGB image, any form of depth prior D_prior, and a relative prediction D_pred from a frozen MDE model, coarse metric alignment first explicitly combines the metric data in D_prior with the geometric structure in D_pred to fill the incomplete areas of D_prior. Fine structure refinement then implicitly merges the complementary information to produce the final metric depth map.
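Below is a minimal sketch of how the coarse pre-filling could look. It is our reading of "pixel-level metric alignment" with "distance-aware weighting", not the released implementation: the neighbour count k, the inverse-distance kernel, and the scale-only (no shift) per-pixel fit are all assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def pixelwise_metric_fill(d_prior, d_pred, k=5, eps=1e-6):
    """Fill holes in a metric prior using the relative prediction.

    d_prior: (H, W) metric depth, NaN where missing.
    d_pred : (H, W) relative depth from a frozen MDE model.
    """
    valid = np.isfinite(d_prior)
    vy, vx = np.nonzero(valid)               # coordinates of known prior pixels
    hy, hx = np.nonzero(~valid)              # coordinates of holes to fill
    filled = d_prior.copy()
    if hy.size == 0 or vy.size == 0:
        return filled

    # k nearest known prior pixels for every hole pixel.
    k = min(k, vy.size)
    tree = cKDTree(np.stack([vy, vx], axis=1))
    dists, idx = tree.query(np.stack([hy, hx], axis=1), k=k)
    dists, idx = np.atleast_2d(dists), np.atleast_2d(idx)

    # Distance-aware weights: closer measurements contribute more.
    w = 1.0 / (dists + eps)
    w /= w.sum(axis=1, keepdims=True)

    # Per-neighbour scale mapping the prediction onto the metric prior,
    # blended by the weights to produce a pixel-level scale.
    s = d_prior[vy[idx], vx[idx]] / (d_pred[vy[idx], vx[idx]] + eps)
    filled[hy, hx] = (w * s).sum(axis=1) * d_pred[hy, hx]
    return filled
```

Because each hole pixel gets its own locally fitted scale, the fill respects nearby metric evidence instead of a single global fit, which is one way to read why this narrows the domain gap across prior patterns.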
In real-world scenarios, different depth priors (e.g., sparse points, low resolution, etc.) often coexist. Handling these mixed priors is vital for practical applications. We report quantitative results of our model on mixed priors that simulate these more complex scenarios. All results are reported in AbsRel↓. "S": extremely sparse points (100 points); "L": ×16 low-resolution prior; "M": missing square areas of 160×160. We highlight the best and second-best results. "Depth Pro + ViT-B" denotes the frozen MDE model and the conditioned MDE model, respectively. DAv2-B: Depth Anything v2 ViT-B; SDv2: Stable Diffusion v2.
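To make the table setting concrete, here is one plausible way to simulate such mixed priors from a dense ground-truth depth map. This is our own construction: the composition order (degrade resolution first, then subsample or mask), the NaN encoding of missing values, and the function name are assumptions, and the paper's exact protocol may differ.

```python
import numpy as np

def simulate_mixed_prior(gt_depth, rng, sparse_n=100, lowres_factor=16,
                         square=160, kinds=("S", "L")):
    """Build a mixed depth prior from dense ground truth.

    "S": keep only `sparse_n` random points.
    "L": x`lowres_factor` low-resolution prior (block-averaged, then
         pasted back at full resolution; edges cropped to a multiple of f).
    "M": mask out a random `square` x `square` area.
    """
    prior = gt_depth.astype(np.float64).copy()

    if "L" in kinds:
        f = lowres_factor
        H, W = prior.shape
        low = prior[: H // f * f, : W // f * f]
        low = low.reshape(H // f, f, W // f, f).mean(axis=(1, 3))
        prior = np.repeat(np.repeat(low, f, axis=0), f, axis=1)
    if "S" in kinds:
        keep = np.zeros(prior.shape, dtype=bool)
        ys = rng.integers(0, prior.shape[0], sparse_n)
        xs = rng.integers(0, prior.shape[1], sparse_n)
        keep[ys, xs] = True
        prior = np.where(keep, prior, np.nan)
    if "M" in kinds:
        y0 = rng.integers(0, max(prior.shape[0] - square, 1))
        x0 = rng.integers(0, max(prior.shape[1] - square, 1))
        prior[y0:y0 + square, x0:x0 + square] = np.nan
    return prior

# Example: a "L + M" prior, i.e. a x16 low-resolution map with a
# 160x160 missing square.
# prior = simulate_mixed_prior(gt, np.random.default_rng(0), kinds=("L", "M"))
```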
@misc{wang2025depthprior,
  title         = {Depth Anything with Any Prior},
  author        = {Zehan Wang and Siyu Chen and Lihe Yang and Jialei Wang and Ziang Zhang and Hengshuang Zhao and Zhou Zhao},
  year          = {2025},
  eprint        = {2505.10565},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2505.10565},
}