
1. IndexGrid mechanism

Linear offset from a signed coordinate (i, j, k) to values in an external array
 
[m] Linear index of (i, j, k) inside the leaf node's dense 512-value block
[n] Offset of the 64-bit word in mBitMask recording the active states; each mask covers the leaf's 512 values
[w] The 64-bit word of mBitMask containing the bit for (i, j, k)
 
  • Inputs:
    • (i, j, k) are the signed coordinates representing a position within a grid or node.
    • mOffset is the starting point in an external array where values are stored.
    • m ∈ {0, …, 511} is the linear index inside a 512-element array (since 512 = 8 × 8 × 8).
    • n ∈ {0, …, 7} is the offset into the array of 64-bit words mBitMask, which stores the active status of the 512 values.
    • mBitMask is an array where each bit records whether the corresponding value in the dense 512-element array is "on" (active) or "off" (inactive).
  • Active Value Mapping:
    • The 512 values are stored in a compressed format, where only the active values (those "on" in mBitMask) are stored in the external array.
    • w is the specific 64-bit word of mBitMask covering the block that contains the value at position (i, j, k).
    • The bits set to 1 in w mark which positions in that 64-value block are active.
  • Masking Process:
    • The variable mask filters out the higher-order bits of w, so that only the bits corresponding to positions before (i, j, k) are kept. This is what makes it possible to count the number of "on" values before the current position.
  • Checking for Activity (Line 6):
    • If (i, j, k) corresponds to an inactive value (i.e., its bit in w is 0), the function returns a zero offset. This maps to a "background index" or default value, indicating no data at that position.
  • Preceding Active Values (Line 7):
    • If w is not the first word in mBitMask, the code uses mPrefixSum to look up the count of active values in the previous 64-bit words of mBitMask. mPrefixSum encodes prefix sums, i.e., cumulative counts of the active values in the preceding words.
    • Each prefix sum covers a block of 64 values, so the code reads off the count of "on" values before the current word instead of recounting them, speeding up the calculation of the linear index.
  • Counting Active Values (Line 8):
    • The code counts how many bits in w are "on" before the bit corresponding to (i, j, k). This gives the count of active values in the current word up to (i, j, k).
    • The sum of the active values from the preceding words (from mPrefixSum) and from the current word gives the final linear index of the position (i, j, k).
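The lookup described above (Lines 6–8) can be sketched in Python. The names bit_mask, prefix_sum, and m_offset mirror the members discussed in the text, but this is a toy model of the indexing logic, not NanoVDB's actual C++ implementation.

```python
def linear_offset(m, bit_mask, prefix_sum, m_offset):
    """Toy IndexGrid lookup: map dense leaf index m to an external-array offset.

    m          : linear index in [0, 512) inside the leaf (m = 64*n + bit)
    bit_mask   : list of 8 ints, each a 64-bit word of active-state bits
    prefix_sum : prefix_sum[n-1] = number of active bits in words 0..n-1
    m_offset   : start of this leaf's values in the external array
    """
    n, bit = divmod(m, 64)            # which 64-bit word, which bit inside it
    w = bit_mask[n]
    if not (w >> bit) & 1:            # inactive voxel -> background index 0 (Line 6)
        return 0
    count = prefix_sum[n - 1] if n > 0 else 0   # active values in earlier words (Line 7)
    mask = (1 << bit) - 1             # keep only bits *before* our position
    count += bin(w & mask).count("1")           # active values before us in w (Line 8)
    return m_offset + count
```

Only active values consume a slot in the external array, which is why an inactive query can short-circuit to the shared background index.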
 
 

2. GridBatch & JaggedTensor

 
Each grid has a randomly chosen origin in the 3D world and a randomly chosen voxel size.
The fVDB documentation has more useful examples for these cases, using functions like sparse_grid_from_points, sparse_grid_from_dense, and sparse_grid_from_mesh.
 
The features stored in a JaggedTensor can be of any type that PyTorch supports, including float, float64, float16, bfloat16, int, etc., and we can have an arbitrary number of feature channels per voxel.
For instance, a JaggedTensor could hold 1 float feature representing a signed distance field in each grid, 3 float features representing an RGB color in each voxel, or 192 float features representing a learned feature vector for each voxel in each grid.
 
VDBTensor wraps a GridBatch and a JaggedTensor together,
and provides operators that work on both at the same time.
 
VDBTensor concatenation has two different definitions.
  1. Concatenating along dimension 0 is a concatenation along the batch dimension.
    • If J1 has 10 member grids in the batch and J2 has 20 members in the batch, then VDBTensor.cat([J1, J2], dim=0) will have 30 members.
      All input VDBTensors must have the same number of features.
  2. Concatenating along dimension 1 concatenates the features of the VDBTensors together.
    • If J1 has 3 features and J2 has 4 features, then VDBTensor.cat([J1, J2], dim=1) will have 7 features.
      All input VDBTensors must have the same number of grids in the batch and the same number of voxels in each grid.
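The two concatenation semantics above can be modeled with plain nested lists (grid → voxel → feature). This is a toy illustration of the rules, not the real fVDB VDBTensor.cat implementation.

```python
def cat_batches(a, b, dim):
    """Toy model of the two VDBTensor.cat semantics (nested lists, not fVDB)."""
    if dim == 0:
        # batch concatenation: all inputs must have the same feature count
        nfeat = len(a[0][0])
        assert all(len(v) == nfeat for grid in a + b for v in grid)
        return a + b
    if dim == 1:
        # feature concatenation: same batch size and per-grid voxel counts required
        assert len(a) == len(b)
        assert all(len(ga) == len(gb) for ga, gb in zip(a, b))
        return [[va + vb for va, vb in zip(ga, gb)] for ga, gb in zip(a, b)]
    raise ValueError("dim must be 0 or 1")
```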
 
fvdb.nn.Linear operates on a VDBTensor but only changes the JaggedTensor (the computation happens only along the feature dimension).
The GridBatch remains the same object (it hasn't actually changed the topology of the grids): VDBTensor.same_grid(out_vdbtensor.grid, vdbtensor.grid) == True
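The "feature-only" behavior can be sketched in NumPy: a weight matrix is applied to every voxel's feature row of the flat jagged data, while the grid structure (which voxels exist, and where) is untouched. The shapes here are made up for illustration.

```python
import numpy as np

# Sketch: a Linear(3 -> 7) applied to stand-in JaggedTensor data.
rng = np.random.default_rng(0)
jdata = rng.normal(size=(5, 3))   # 5 voxels, 3 features each (flat jagged data)
W = rng.normal(size=(3, 7))       # dense weight, acts only on the feature axis
out = jdata @ W                   # same 5 voxels, now 7 features; topology unchanged
```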
 
Summary: the main goal is to split the ijk data from the feature data.
The GridBatch stores the ijk coordinates and the JaggedTensor stores the features, linked by the j_index.
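The split described above can be illustrated with NumPy: features live flat in the jagged data, and an index array (named jidx here, following the text's j_index; this is a toy model, not the real fVDB class) records which grid owns each row.

```python
import numpy as np

# Toy jagged layout: all voxels' features stacked flat, plus grid membership.
jdata = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # features, all grids stacked
jidx = np.array([0, 0, 0, 1, 1])                       # which grid each row belongs to

grid0_feats = jdata[jidx == 0]   # features of the first grid in the batch
grid1_feats = jdata[jidx == 1]   # features of the second grid
```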

3. Actually training

This section is a demonstration.
 
The model itself is nothing particularly noteworthy.
 
The ijk values are assigned to each of the feature's C channels. ijk = feature??
They should still go through a transformation first:
  • data.grid.grid_to_world(): this function transforms coordinates from grid space to world space (real-world coordinates). It maps (i, j, k) grid positions to actual world coordinates via some transformation (e.g., scaling, rotation).
How should the number of features be set in real training? This scheme only works for learning simple 3D shapes.
features = grid.grid_to_world(grid.ijk.float())
 
Training target:
The final jdata will be real-world coordinates.
data.grid.ijk.float() → gathers the ijk coordinates of the grid (this step is needed because the grid is sparse)
data.grid.grid_to_world(data.grid.ijk.float()) → produces the features, i.e., the JaggedTensor of 3D positions we are learning
data.grid.grid_to_world(data.grid.ijk.float()).jdata → yields the actual 3D positions
 
target = (dist < 1).float()
 
BCE loss (binary cross-entropy)
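The target construction above can be sketched in NumPy (a stand-in for the fVDB/PyTorch tensors): voxel centers within distance 1 of the origin are labeled "inside" (1.0), everything else "outside" (0.0), and predictions are scored with binary cross-entropy. The coordinates and predictions below are made-up toy values.

```python
import numpy as np

# World-space voxel positions (the "features" from grid_to_world).
coords = np.array([[0.0, 0.0, 0.0],
                   [0.5, 0.5, 0.5],
                   [2.0, 0.0, 0.0]])
dist = np.linalg.norm(coords, axis=1)       # distance to the shape's center
target = (dist < 1).astype(np.float32)      # mirrors target = (dist < 1).float()

# Hypothetical model outputs in (0, 1), clipped for numerical safety.
pred = np.clip(np.array([0.9, 0.8, 0.1]), 1e-7, 1 - 1e-7)
bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
```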
 
 

4. Perform Conv

fvdb.nn.SparseConv3d
conv_layer = SparseConv3d