-1 NeRF (Neural Radiance Field) Introduction
Goals:
- march camera rays through the scene to generate a sampled set of 3D points
- use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities
- use classical volume rendering techniques to accumulate those colors and densities into a 2D image
Basic approach:
Use gradient descent to optimize this model by minimizing the error between each observed image and the corresponding views rendered from our representation
Problem:
A basic implementation of optimizing a neural radiance field representation for a complex scene does not converge to a sufficiently high-resolution representation and is inefficient in the required number of samples per camera ray
Improvements:
- Transform the input 5D coordinates with a positional encoding that enables the MLP to represent higher-frequency functions (map each input 5D coordinate into a higher-dimensional space)
- Use a hierarchical sampling procedure to reduce the number of queries required to adequately sample this high-frequency scene representation
Comparison with discretized volumetric representations:
- Represents complex real-world geometry and appearance (from standard RGB images)
- Well suited for gradient-based optimization using projected images
- Overcomes the prohibitive storage costs of discretized voxel grids when modeling complex scenes at high resolutions
- (Discrete 3D voxel grids render fine detail but are not flexible enough)
5D vector-valued function whose input is a 3D location x = (x, y, z) and 2D viewing direction d = (θ, φ), and whose output is an emitted color c = (r, g, b) and volume density σ
MLP network FΘ : (x, d) → (c, σ)
- Predict the volume density σ as a function of only the location x, while allowing the RGB color c to be predicted as a function of both location and viewing direction
- x → 8 fully-connected layers(256 channels) → σ + 256 dim feature vector
- 256 dim feature vector + d → 1 fully-connected layer(128 channels) → c(RGB)
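The architecture above can be sketched in PyTorch. This is a minimal illustration, not the exact NeRF code: the class name `TinyNeRF` is mine, and the positional encoding (discussed later) is omitted, so the inputs here are the raw 3D position and viewing direction.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Sketch of the NeRF MLP F_Theta: (x, d) -> (c, sigma).

    Layer sizes follow the text: 8 fully-connected layers (256 channels)
    on the position produce sigma and a 256-dim feature vector; one
    128-channel layer maps (feature, d) to RGB."""

    def __init__(self, pos_dim=3, dir_dim=3, hidden=256):
        super().__init__()
        # 8 fully-connected layers (256 channels) on the position alone
        layers = [nn.Linear(pos_dim, hidden), nn.ReLU()]
        for _ in range(7):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        self.trunk = nn.Sequential(*layers)
        # heads producing sigma and the 256-dim feature vector
        self.sigma_head = nn.Linear(hidden, 1)
        self.feature_head = nn.Linear(hidden, hidden)
        # one 128-channel layer mapping (feature, d) -> RGB in [0, 1]
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        h = self.trunk(x)
        # density depends only on location x; ReLU keeps it non-negative
        sigma = torch.relu(self.sigma_head(h)).squeeze(-1)
        # color depends on both location (via the feature) and direction
        c = self.color_head(torch.cat((self.feature_head(h), d), dim=-1))
        return c, sigma
```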
Partition [t_n, t_f] into N bins and draw one sample uniformly at random within each bin (stratified sampling)
-2 Create and calculate ray transmittance
If the formulas don't click right away, start from the code. All the code below is copied from a reference implementation.
Where:
- `ray_origins`: the origin points of the rays (i.e., the camera location)
- `ray_directions`: the directions of the rays (i.e., from the camera into the scene)
- `hn`, `hf`: the near and far bounds for sampling along the rays (as distances from the ray origin)
- `nb_bins`: the number of sample points along each ray
Split each ray between the near and far bounds into N bins; for each of the batch_size = ray_origins.shape[0] rays:
- use linspace to create an even partition t of the ray length
- split t into lower and upper bin edges, then perturb with a random number within each bin
- set the spacing between consecutive bins as delta, with the last entry very large to represent the ray terminating
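The steps above can be sketched as follows. The function name `sample_along_rays` is my own; the variable names (`hn`, `hf`, `nb_bins`, `t`, `delta`) follow the text, and 1e10 is a typical choice for the terminal spacing.

```python
import torch

def sample_along_rays(ray_origins, hn=0.0, hf=1.0, nb_bins=192):
    """Stratified sampling of depths t along each ray.
    Returns t (b, nb_bins) and bin spacings delta (b, nb_bins)."""
    b = ray_origins.shape[0]
    # even partition of [hn, hf], one copy per ray
    t = torch.linspace(hn, hf, nb_bins).expand(b, nb_bins)
    # lower/upper bin edges, then jitter with a uniform random number
    mid = (t[:, :-1] + t[:, 1:]) / 2.0
    lower = torch.cat((t[:, :1], mid), dim=1)
    upper = torch.cat((mid, t[:, -1:]), dim=1)
    u = torch.rand(b, nb_bins)
    t = lower + (upper - lower) * u                     # (b, nb_bins)
    # bin spacings; the last one is huge, marking ray termination
    delta = torch.cat((t[:, 1:] - t[:, :-1],
                       torch.tensor([1e10]).expand(b, 1)), dim=1)
    return t, delta
```

Jittering within fixed bins (rather than sampling freely in [hn, hf]) keeps the samples roughly evenly spread while still covering different depths across training iterations.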
x: 3D positions along the ray, used as input to the model.
This is ray_origins + t * ray_directions, i.e. with shapes
(b, pos_dim) + (b, N_bins) * (b, pos_dim) [pos_dim = 3 (x, y, z)]
x = ray_origins.unsqueeze(1) + t.unsqueeze(2) * ray_directions.unsqueeze(1)
unsqueeze adds a dimension so the broadcasted multiplication works; the result has shape [b, N_bins, pos_dim].
Broadcast ray_directions to match: ray_directions.expand(...) keeps the existing dimensions and repeats the values along the new one.
Both tensors are then flattened so the model input is (b * nb_bins, 3).
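Putting the shape gymnastics together (the function name `ray_points` is hypothetical; the `expand(...).transpose(0, 1)` pattern is a common way to replicate per-ray directions for every sample):

```python
import torch

def ray_points(ray_origins, ray_directions, t):
    """Turn per-ray depths t (b, nb_bins) into 3D sample positions and
    matching view directions, flattened for the MLP."""
    b, nb_bins = t.shape
    # (b, 1, 3) + (b, nb_bins, 1) * (b, 1, 3) -> (b, nb_bins, 3)
    x = ray_origins.unsqueeze(1) + t.unsqueeze(2) * ray_directions.unsqueeze(1)
    # repeat each ray's direction for every sample along it
    d = ray_directions.expand(nb_bins, b, 3).transpose(0, 1)
    # flatten to (b * nb_bins, 3) for the model
    return x.reshape(-1, 3), d.reshape(-1, 3)
```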
- colors: (b, N_bins, 3), color at each sample point
- sigma: (b, N_bins), density of the ray at each sample point
- Compute alpha (per-sample opacity, i.e. one minus the bin's transmittance): alpha = 1 - exp(-sigma * delta). This is the fraction of light absorbed at each sample point; higher sigma means more density, so more light is absorbed (higher alpha).
- Compute the sample weights from accumulated transmittance:
weights = compute_accumulated_transmittance(1 - alpha).unsqueeze(2) * alpha.unsqueeze(2)
The accumulated transmittance is how much light still passes through all previous points along the ray; multiplying it by each sample's own alpha gives how much that sample contributes to the final pixel color, accounting for both its own opacity and the transparency of everything in front of it.
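A common implementation of `compute_accumulated_transmittance` (matching the call above; the cumulative product is shifted so the first sample sees full transmittance):

```python
import torch

def compute_accumulated_transmittance(betas):
    """Accumulated transmittance T_i = prod_{j<i} (1 - alpha_j).

    betas = 1 - alpha, shape (b, nb_bins). The product is shifted right
    by one so the first sample along each ray has transmittance 1."""
    accumulated = torch.cumprod(betas, dim=1)
    return torch.cat((torch.ones(betas.shape[0], 1),
                      accumulated[:, :-1]), dim=1)
```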
- c = (weights * colors).sum(dim=1): the weighted sum of colors along the ray, using the computed weights. This gives the final RGB value for each pixel.
Handle white-background regularization:
- weight_sum = weights.sum(-1).sum(-1): the total weight along each ray. If a ray only passes through nearly transparent regions (low sigma), this sum is close to 0.
- return c + 1 - weight_sum.unsqueeze(-1): the missing weight is added back as white, so rays that never hit dense objects render as a white background.
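The whole compositing step, from alpha to the white-background output, can be collected in one sketch (the function name `composite` is mine; the transmittance product is inlined here so the block is self-contained):

```python
import torch

def composite(colors, sigma, delta):
    """Volume-rendering compositing against a white background.
    colors: (b, nb_bins, 3), sigma: (b, nb_bins), delta: (b, nb_bins)."""
    # per-sample opacity
    alpha = 1 - torch.exp(-sigma * delta)                    # (b, nb_bins)
    # accumulated transmittance, shifted so the first sample sees T = 1
    T = torch.cumprod(1 - alpha, dim=1)
    T = torch.cat((torch.ones(alpha.shape[0], 1), T[:, :-1]), dim=1)
    weights = (T * alpha).unsqueeze(2)                       # (b, nb_bins, 1)
    c = (weights * colors).sum(dim=1)                        # (b, 3)
    weight_sum = weights.sum(-1).sum(-1)                     # (b,)
    # rays with little accumulated weight fall back to white
    return c + 1 - weight_sum.unsqueeze(-1)
```

Two sanity checks: a fully transparent ray (sigma = 0) returns pure white, and a ray whose first sample is fully opaque returns that sample's color.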
-3 Model construction
L denotes the number of encoding frequencies (embedding_dim)
The defaults are embedding_dim_pos=10 and embedding_dim_direction=4
The encoding simply concatenates sin and cos at each of these L frequencies
block1 and block2 produce sigma and a feature vector h; h then passes through block3 and block4 to produce c
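The encoding that `embedding_dim_pos=10` and `embedding_dim_direction=4` refer to is commonly implemented as follows (the function name is mine; whether the raw input is concatenated alongside the sin/cos terms varies between implementations, and this sketch includes it):

```python
import torch

def positional_encoding(x, L):
    """Map each coordinate to [x, sin(2^0 x), cos(2^0 x), ...,
    sin(2^{L-1} x), cos(2^{L-1} x)], concatenated on the last dim.
    Output size is dim * (2L + 1)."""
    out = [x]
    for j in range(L):
        out.append(torch.sin(2 ** j * x))
        out.append(torch.cos(2 ** j * x))
    return torch.cat(out, dim=-1)
```

For a 3D position with L=10 this gives 3 * (2*10 + 1) = 63 input channels, letting the MLP represent much higher-frequency functions of position than the raw coordinates would.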
-4 Training
Compare the output of render_rays against the corresponding pixels of the ground-truth 2D image
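A hypothetical training step, assuming a `render_rays` callable like the one described above and a standard MSE-style loss against the ground-truth pixels (names and signature are assumptions, not the exact code):

```python
import torch

def training_step(render_rays, optimizer, ray_origins, ray_directions, gt_pixels):
    """One gradient step: render a batch of rays and regress the
    predicted colors (b, 3) toward the ground-truth pixels (b, 3)."""
    pred = render_rays(ray_origins, ray_directions)   # (b, 3) rendered colors
    loss = ((pred - gt_pixels) ** 2).sum()            # squared error vs gt image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```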
Not finished yet; I still need to re-implement the code myself, but I now roughly understand how NeRF works.
- Author:ran2323
- URL:https://www.blueif.me//article/10871a79-6e22-80b3-b25e-d2200490fb63
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!