LRM Code Walkthrough 1-[Datasets]

base.py

Defines the BaseDataset class, which inherits from torch.utils.data.Dataset and ABC.

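Worth noting (1): ABC is used as a parent class to build BaseDataset.

The class uses both @abstractmethod and @staticmethod.

An abstractmethod plays the same role as a method declared in a Java interface.

In the base class its body is just pass; each subclass then defines it according to its own purpose.

What makes an ABC more powerful is that it can also carry static methods that are real, implemented methods (in this respect it is closer to a Java abstract class than to a pure interface).

The two are combined here: built-in indexing (dataset[idx]) directly calls the subclass's inner_get_item.

As a result, each subclass only needs to override inner_get_item.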
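
The pattern above can be sketched roughly as follows. This is a simplified illustration, not the actual base.py: it leaves out the torch.utils.data.Dataset parent so that it runs with the standard library alone, and ToyDataset and _describe are hypothetical names made up for the example.

```python
from abc import ABC, abstractmethod


class BaseDataset(ABC):
    """Simplified stand-in; the real class also inherits torch.utils.data.Dataset."""

    def __init__(self, uids):
        self.uids = list(uids)

    def __len__(self):
        # The dataset length is the number of UIDs.
        return len(self.uids)

    @abstractmethod
    def inner_get_item(self, idx):
        """Subclasses implement the actual loading logic."""
        pass

    def __getitem__(self, idx):
        # Built-in indexing (dataset[idx]) delegates to the subclass hook.
        return self.inner_get_item(idx)

    @staticmethod
    def _describe(uid):
        # A concrete helper that needs no instance state (hypothetical name).
        return f"sample:{uid}"


class ToyDataset(BaseDataset):
    def inner_get_item(self, idx):
        return self._describe(self.uids[idx])


dataset = ToyDataset(["a", "b", "c"])
first_item = dataset[0]
```

Instantiating BaseDataset directly raises a TypeError because inner_get_item is still abstract, which is what forces every subclass to provide it.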

Static methods

self.uids = json.load() holds the sample UIDs; its length is the dataset length.

_load_rgba_image converts Image.open() → np.array() → torch.from_numpy

Converting RGBA to RGB:

rgba[:, :3, :, :] * rgba[:, 3:4, :, :]

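Each RGB channel is multiplied by the a channel (alpha, which represents opacity).

Then bg_color * (1 - rgba[:, 3:, :, :]) computes the background color contribution wherever the pixel is not fully opaque. If a pixel is fully transparent (a channel is 0), the background color shows through completely.

If a pixel is fully opaque (a channel is 1), the background color is not visible at all.

The background term and the image term are then combined (in standard alpha compositing the two are added element-wise rather than concatenated along a dimension).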
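
A small sketch of this blending step, assuming an (N, 4, H, W) layout with values in [0, 1]. It uses NumPy instead of torch to keep the example light; the slicing and broadcasting behave the same way. The function name blend_rgba_to_rgb is made up for illustration.

```python
import numpy as np


def blend_rgba_to_rgb(rgba, bg_color=1.0):
    """Composite an (N, 4, H, W) RGBA array in [0, 1] over a solid background."""
    rgb = rgba[:, :3, :, :]      # color channels, shape (N, 3, H, W)
    alpha = rgba[:, 3:4, :, :]   # keep the channel axis so it broadcasts over RGB
    return rgb * alpha + bg_color * (1.0 - alpha)


# One 1x2 image: a fully opaque red pixel and a fully transparent red pixel.
rgba = np.zeros((1, 4, 1, 2), dtype=np.float32)
rgba[0, 0, 0, :] = 1.0   # red channel set for both pixels
rgba[0, 3, 0, 0] = 1.0   # alpha = 1 for the first pixel, 0 for the second
rgb = blend_rgba_to_rgb(rgba, bg_color=1.0)
```

With a white background (bg_color=1.0), the opaque pixel stays red and the transparent pixel becomes white.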

There is also _locate_datadir, which looks up the data directory.
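
A directory lookup of this kind might look like the sketch below. The signature and the rule "return the first root that contains a folder named after the uid" are assumptions for illustration; the note does not show the real _locate_datadir.

```python
import os
import tempfile


def locate_datadir(root_dirs, uid):
    """Return the first root directory that contains a folder named `uid` (assumed rule)."""
    for root_dir in root_dirs:
        if os.path.isdir(os.path.join(root_dir, uid)):
            return root_dir
    raise FileNotFoundError(f"No data directory found for uid {uid!r}")


# Usage: two temporary roots, only the second holds the sample.
with tempfile.TemporaryDirectory() as root_a, tempfile.TemporaryDirectory() as root_b:
    os.makedirs(os.path.join(root_b, "uid_001"))
    found = locate_datadir([root_a, root_b], "uid_001")
    found_in_second_root = (found == root_b)
```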

cam_utils.py

An extrinsic matrix represents the pose (position and orientation) of a camera or an object in 3D space relative to a world coordinate system. It typically combines a rotation matrix and a translation vector to transform points from the world coordinate system to the camera coordinate system. A general 3×3 linear map can also encode scaling, shear, or reflection, but the rotation block of an extrinsic matrix is a pure rotation.

To summarize: R is (3, 3), T is (3,), RT is (3, 4), and the extrinsic matrix is (4, 4).
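
The shapes can be checked with a short NumPy sketch. The values of R and T are arbitrary placeholders, and this is not the code from cam_utils.py.

```python
import numpy as np

R = np.eye(3)                      # rotation, shape (3, 3)
T = np.array([0.0, 0.0, 2.0])      # translation, shape (3,)

RT = np.concatenate([R, T[:, None]], axis=1)        # [R | T], shape (3, 4)
bottom = np.array([[0.0, 0.0, 0.0, 1.0]])           # homogeneous row, shape (1, 4)
extrinsic = np.concatenate([RT, bottom], axis=0)    # shape (4, 4)

# Decomposing is just slicing the same blocks back out.
R_back = extrinsic[:3, :3]
T_back = extrinsic[:3, 3]
```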

The two decompose functions that follow are simple, so they are not pasted here.

Next, the camera extrinsic matrices (poses) are normalized by applying a transformation that centers the cameras based on their distance to a pivotal point (often the center of the scene or object of interest).

Here I learned about another general calibration approach:

Intrinsics * Extrinsics builds the calibration matrix.

It maps a world point to an image point (3D → 2D).
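
A minimal numeric sketch of that projection, assuming a pinhole camera and world-to-camera extrinsics. The intrinsic values are made up, and only the (3, 4) [R | t] part of the extrinsic matrix is used so that the product is a (3, 4) matrix.

```python
import numpy as np

# Hypothetical pinhole intrinsics: focal length 100 px, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])

# World-to-camera extrinsics [R | t]: no rotation, world origin 2 units in front.
RT = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])

P = K @ RT  # (3, 4) calibration / projection matrix


def project(P, point_world):
    """Map a 3D world point to 2D pixel coordinates."""
    homogeneous = np.append(point_world, 1.0)   # (x, y, z, 1)
    u, v, w = P @ homogeneous
    return np.array([u / w, v / w])             # perspective divide


pixel = project(P, np.array([0.1, -0.2, 0.0]))
```

For this setup the world point (0.1, -0.2, 0.0) lands at pixel (55, 40): the camera-space point is (0.1, -0.2, 2.0), and dividing (110, 80, 2) by the depth gives the image coordinates.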
