XDen-1K: A Density Field Dataset of Real-World Objects


Jingxuan Zhang*, Tianqi Yu*, Yatu Zhang*, Jinze Wu, Kaixin Yao, Jingyang Liu,
Yuyao Zhang, Jiayuan Gu, Jingyi Yu
ShanghaiTech University

*Equal Contribution, Corresponding Authors

XDen-1K teaser

We present XDen-1K, a multi-modal dataset of real-world objects, featuring paired biplanar X-ray scans and reconstructed density fields.

Abstract

A deep understanding of the physical world is a central goal for embodied AI and realistic simulation. While current models excel at capturing an object's surface geometry and appearance, they largely neglect its internal physical properties. This omission is critical, as properties like volumetric density are fundamental for predicting an object's center of mass, stability, and interaction dynamics in applications ranging from robotic manipulation to physical simulation. The primary bottleneck has been the absence of large-scale, real-world data. To bridge this gap, we introduce XDen-1K, the first large-scale, multi-modal dataset designed for real-world physical property estimation, with a particular focus on volumetric density. The core of this dataset consists of 1,000 real-world objects across 148 categories, for which we provide comprehensive multi-modal data, including a high-resolution 3D geometric model with part-level annotations and a corresponding set of real-world biplanar X-ray scans. Building upon this data, we introduce a novel optimization framework that recovers a high-fidelity volumetric density field of each object from its sparse X-ray views. To further enhance the dataset's versatility, we have additionally annotated approximately 7.6k synthetic objects and collected 20 CT scans, successfully integrating the real-world X-ray scans, high-precision CT validation, and large-scale synthetic data. To demonstrate its practical value, we add X-ray images as a conditioning signal to an existing segmentation network and perform volumetric segmentation. Furthermore, we conduct experiments on downstream robotics tasks. The results show that leveraging the dataset can effectively improve the accuracy of center-of-mass estimation and the success rate of robotic manipulation. We believe XDen-1K will serve as a foundational resource and a challenging new benchmark, catalyzing future research in physically grounded visual inference and embodied AI.

Video

Data Collection Pipeline


Pipeline

For every collected real-world object, we first capture its X-ray map with a biplanar X-ray machine. We then reconstruct its mesh with a multi-view generation method and segment the mesh into parts. Based on the captured X-ray map and the segmented mesh, we develop a differentiable X-ray rendering process that estimates a density field for the mesh.

Differentiable X-ray Rendering

The density estimation task is formulated as a differentiable X-ray reconstruction problem. Given a voxelized mesh with an initialized linear attenuation coefficient (LAC) for each part, biplanar X-ray images are rendered according to the Beer-Lambert law. The LACs are then optimized by minimizing the discrepancy between the rendered and captured (ground-truth) biplanar X-ray images. Finally, the optimized LACs are approximately converted into densities to obtain a volumetric density field.
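The optimization described above can be sketched as follows. This is a minimal parallel-beam stand-in, assuming axis-aligned orthographic projections, a fixed per-part voxel labeling, and hand-derived gradients; the function names (`render_xray`, `fit_lacs`) and hyperparameters are illustrative, not the paper's actual implementation, which renders with the real scanner geometry.

```python
import numpy as np

def render_xray(labels, mu, axis, dl=1.0):
    """Beer-Lambert parallel-beam projection: I/I0 = exp(-sum_i mu_i * dl).

    labels: (D, H, W) integer part label per voxel; mu: (P,) per-part LACs.
    """
    line_integral = mu[labels].sum(axis=axis) * dl
    return np.exp(-line_integral)

def part_path_lengths(labels, n_parts, axis, dl=1.0):
    """Path length of each ray through each part, shape (P, A, B)."""
    return np.stack([(labels == p).sum(axis=axis) * dl
                     for p in range(n_parts)])

def fit_lacs(labels, targets, n_parts, views=(0, 1), lr=0.01, iters=1000):
    """Recover per-part LACs by gradient descent on the image discrepancy.

    Gradients are analytic: the rendered image is exp(-sum_p mu_p * P_p)
    for fixed path-length maps P_p, so d img / d mu_p = -P_p * img.
    """
    mu = np.full(n_parts, 0.1)
    paths = {ax: part_path_lengths(labels, n_parts, ax) for ax in views}
    for _ in range(iters):
        grad = np.zeros(n_parts)
        for ax, target in zip(views, targets):
            img = render_xray(labels, mu, ax)
            resid = img - target  # per-pixel discrepancy
            grad += np.array([(2.0 * resid * (-paths[ax][p] * img)).mean()
                              for p in range(n_parts)])
        mu = np.clip(mu - lr * grad, 0.0, None)  # LACs are non-negative
    return mu
```

With two orthogonal views as in the biplanar setup, a two-part toy volume suffices to recover both LACs to within a few percent.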

XDen-1K Dataset

Gallery

Gallery of the XDen-1K dataset. Each object in XDen-1K comes with an RGB image, scanned biplanar X-ray images, and an approximately reconstructed density field.

RGB images
Biplanar X-ray images

Stats of the XDen-1K dataset: Our dataset contains 1,000 real-world objects spanning 148 categories, covering a broad spectrum of everyday items such as tools, kitchenware, and electronics, and providing rich variation in geometry, size, and materials.

XDen-1K Application

X-ray-conditioned volume segmentation

X-Field Pipeline

X-ray images contain depth-collapsed information that reveals internal material structure and can guide volumetric segmentation and density estimation in a physically grounded way. We therefore propose X-Field, which adapts PartField, a strong part-aware feature extractor, to volumetric density estimation. We additionally collect a synthetic dataset based on PartNeXt to train X-Field. The resulting volumetric density can then be clustered into material-aware parts.
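The final clustering step can be illustrated with a toy stand-in. The real X-Field clusters learned per-voxel features; here, purely for illustration, raw voxel densities are grouped with a simple 1-D k-means (the function name `cluster_density_parts` and the quantile initialization are our assumptions, not the paper's method):

```python
import numpy as np

def cluster_density_parts(density, k, iters=50):
    """Cluster a (D, H, W) density field into k material-like parts.

    Plain 1-D k-means over voxel densities, initialized at density
    quantiles so the starting centers span the value range.
    """
    vals = density.reshape(-1, 1).astype(float)
    centers = np.quantile(vals, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign each voxel to its nearest center, then recenter.
        assign = np.abs(vals - centers[None, :]).argmin(axis=1)
        for c in range(k):
            mask = assign == c
            if mask.any():
                centers[c] = vals[mask].mean()
    return assign.reshape(density.shape), centers
```

On a volume made of two materials (e.g. wood at ~1.0 and steel at ~7.8 g/cm³), the two recovered clusters separate the regions and their centers land on the material densities.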

X-Field result

We then evaluate the volumetric segmentation ability of X-Field, our X-ray-conditioned model, and of the original PartField on both synthetic and real datasets. The results show that with X-ray conditioning, the model performs better at volumetric segmentation, and especially at predicting inner structure.

CoM-aware Robot Manipulation



Physical properties such as the center of mass (CoM) and moment of inertia can be computed directly from density-field data. To study how the CoM influences manipulation outcomes, we conduct three representative experiments on a Franka Emika Panda robotic arm equipped with a parallel gripper. Pick: grasping a hammer away from its CoM generates a torque that makes it rotate, while grasps near the CoM remain stable. Place: when the CoM lies away from the contact point, gravity generates a torque about that point; the cup tips when placed at a large tilt angle and stays upright at a small one. Push: a push point above the CoM generates a torque about the contact point; high pushes tip the bottle over, whereas lower pushes produce stable motion.
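The quantities driving these experiments can be computed directly from a voxelized density field. Below is a minimal sketch, assuming a uniform voxel grid with densities in arbitrary units; the helper names (`center_of_mass`, `grasp_torque`) are illustrative:

```python
import numpy as np

def center_of_mass(density, voxel_size=1.0, origin=(0.0, 0.0, 0.0)):
    """CoM of a (D, H, W) density field: mass-weighted mean of voxel centers."""
    idx = np.indices(density.shape).reshape(3, -1).T
    coords = (idx + 0.5) * voxel_size + np.asarray(origin)
    mass = density.reshape(-1)
    return (coords * mass[:, None]).sum(axis=0) / mass.sum()

def grasp_torque(com, grasp_point, mass, g=9.81):
    """Gravity torque about the grasp point (z is up); zero at the CoM."""
    r = np.asarray(com, float) - np.asarray(grasp_point, float)
    return np.cross(r, np.array([0.0, 0.0, -mass * g]))
```

For a hammer-like object (a dense head on a light handle), the CoM shifts toward the head, and a grasp away from it yields a nonzero torque, matching the rotation observed in the Pick experiment.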

BibTeX

@misc{zhang2025xden1kdensityfielddataset,
  title        = {XDen-1K: A Density Field Dataset of Real-World Objects},
  author       = {Jingxuan Zhang and Tianqi Yu and Yatu Zhang and Jinze Wu and Kaixin Yao and Jingyang Liu and Yuyao Zhang and Jiayuan Gu and Jingyi Yu},
  year         = {2025},
  eprint       = {2512.10668},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2512.10668}
}