
-1 Introduction to NeRF (neural radiance field)

Goal:
  1. March camera rays through the scene to generate a sampled set of 3D points
  2. Use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities
  3. Use classical volume rendering techniques to accumulate those colors and densities into a 2D image
 
Basic method:
Use gradient descent to optimize this model by minimizing the error between each observed image and the corresponding view rendered from our representation.
 
Problem:
A basic implementation of optimizing a neural radiance field representation for a complex scene does not converge to a sufficiently high-resolution representation and is inefficient in the required number of samples per camera ray.
 
Improvements:
  1. Transform the input 5D coordinates with a positional encoding that enables the MLP to represent higher-frequency functions (each input 5D coordinate is mapped into a higher-dimensional space).
 
  2. Use a hierarchical sampling procedure to reduce the number of queries required to adequately sample this high-frequency scene representation.
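
The hierarchical sampling idea can be sketched as inverse-transform sampling: a coarse pass gives one weight per bin, the weights are normalized into a piecewise-constant PDF, and the fine samples are drawn from that PDF so more queries land where the scene is dense. This is a minimal, dependency-free sketch. The names `sample_pdf`, `bin_edges`, and `weights` are hypothetical and not taken from the original code.

```python
import bisect
import random

def sample_pdf(bin_edges, weights, n_samples, rng=random):
    """Draw n_samples depths from the piecewise-constant PDF given by weights."""
    total = sum(weights)
    n_bins = len(weights)
    # Fall back to uniform bins if the coarse pass saw nothing at all.
    probs = [w / total for w in weights] if total > 0 else [1.0 / n_bins] * n_bins
    # Cumulative distribution at each bin edge: cdf[0] = 0, cdf[-1] is about 1.
    cdf = [0.0]
    for p in probs:
        cdf.append(cdf[-1] + p)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # Find the bin whose CDF interval contains u.
        i = min(bisect.bisect_right(cdf, u) - 1, n_bins - 1)
        span = cdf[i + 1] - cdf[i]
        frac = (u - cdf[i]) / span if span > 0 else 0.0
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)

# The coarse pass found almost all of the density in the third of four bins.
edges = [2.0, 3.0, 4.0, 5.0, 6.0]
coarse_weights = [0.01, 0.01, 0.97, 0.01]
fine_t = sample_pdf(edges, coarse_weights, 64, random.Random(0))
```

Most of the 64 fine samples end up between depth 4 and 5, which is where the coarse weights say the surface is.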
 
Compared with volumetric representations:
Volumetric representations can represent complex real-world geometry and appearance (from standard RGB images) and are well suited for gradient-based optimization using projected images.
 
NeRF overcomes the prohibitive storage costs of discretized voxel (volume pixel) grids when modeling complex scenes at high resolutions.
Voxel grids: 3D cubes; they produce finely detailed images but are not flexible enough.
 
 
A 5D vector-valued function whose input is a 3D location x = (x, y, z) and a 2D viewing direction d = (θ, φ), and whose output is an emitted color c = (r, g, b) and a volume density σ.
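
In practice the NeRF paper expresses the viewing direction as a 3D Cartesian unit vector rather than as two angles. Below is a small sketch of that conversion. It assumes θ is the polar angle measured from the +z axis and φ is the azimuth in the x-y plane. That convention is an assumption, since the note does not fix one, and `direction_from_angles` is a hypothetical helper name.

```python
import math

def direction_from_angles(theta, phi):
    """Convert spherical angles (radians) to a 3D unit vector (assumed convention)."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )

d = direction_from_angles(math.pi / 2, 0.0)  # a ray looking along +x
```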
 
MLP network FΘ : (x, d) → (c, σ)
  • Predict the volume density σ as a function of only the location x, while allowing the RGB color c to be predicted as a function of both location and viewing direction
 
  1. x → 8 fully-connected layers (256 channels) → σ + 256-dim feature vector
  2. 256-dim feature vector + d → 1 fully-connected layer (128 channels) → c (RGB)
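
The two-stage layout above can be sketched in plain Python to show the data flow: density comes from the position only, and color also sees the direction. This is a rough sketch under assumptions, not the original model. The weights are random, the inputs are raw 3D vectors without positional encoding, and the paper's skip connection is omitted. The final 128 → 3 projection with a sigmoid is also an assumption, and `TinyNeRF` and its helpers are hypothetical names.

```python
import math
import random

def make_linear(n_in, n_out, rng):
    """Random weights and zero bias for one fully-connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out

def linear(layer, x):
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

class TinyNeRF:
    def __init__(self, width=256, seed=0):
        rng = random.Random(seed)
        sizes = [3] + [width] * 8                      # 8 fully-connected layers
        self.trunk = [make_linear(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
        self.head = make_linear(width, 1 + width, rng)  # sigma + feature vector
        self.color_hidden = make_linear(width + 3, width // 2, rng)
        self.color_out = make_linear(width // 2, 3, rng)  # assumed RGB projection

    def forward(self, x, d):
        h = x
        for layer in self.trunk:
            h = relu(linear(layer, h))
        out = linear(self.head, h)
        sigma = max(0.0, out[0])       # density depends on the position only
        feature = out[1:]
        h = relu(linear(self.color_hidden, feature + list(d)))
        rgb = sigmoid(linear(self.color_out, h))
        return rgb, sigma

# A small width keeps this pure-Python demo fast; the note's sizes are 256 and 128.
model = TinyNeRF(width=32)
rgb, sigma = model.forward([0.1, 0.2, 0.3], [0.0, 0.0, 1.0])
```

Calling `forward` with the same `x` and a different `d` returns the same `sigma`, which is the multi-view consistency constraint described in the bullet above.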
 
Partition the interval [t_n, t_f] into N evenly spaced bins and draw one sample uniformly at random from each bin.

-2 Create and calculate ray transmittance

If the formulas don't click right away, start from the code instead. All of the code below is copied.
 
Where:
  • ray_origins: The origin points of the rays (i.e., the camera location).
  • ray_directions: The directions of the rays (i.e., the direction from the camera into the scene).
  • hn, hf: The near and far bounds for sampling along the rays (in terms of distance from the ray origin).
  • nb_bins: The number of sample points along each ray.
 
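Split the ray length into N bins. For every ray in the batch (batch_size = ray_origins.shape[0]), linspace creates the evenly spaced split points, which become t.
 
t is then split into a lower and an upper bound for each bin, and a random number in [0, 1) picks a point between the two.
 
Next, delta is set to the distance between adjacent samples. The last distance is set to a very large value to represent the end of the ray.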
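
The sampling steps above can be sketched without tensors. This is a list-based approximation of the described logic, not the original PyTorch code. `sample_along_rays` is a hypothetical name, and it assumes nb_bins is at least 2.

```python
import random

def sample_along_rays(batch_size, hn, hf, nb_bins, rng=random):
    """Return jittered depths t and spacings delta, both [batch_size][nb_bins]."""
    step = (hf - hn) / (nb_bins - 1)
    base = [hn + i * step for i in range(nb_bins)]   # like linspace(hn, hf, nb_bins)
    mid = [(a + b) / 2 for a, b in zip(base[:-1], base[1:])]
    lower = [base[0]] + mid                          # lower bound of each bin
    upper = mid + [base[-1]]                         # upper bound of each bin
    all_t, all_delta = [], []
    for _ in range(batch_size):
        # One uniform random sample inside every bin.
        t = [lo + (up - lo) * rng.random() for lo, up in zip(lower, upper)]
        # Distance to the next sample; the last one is "infinite".
        delta = [b - a for a, b in zip(t[:-1], t[1:])] + [1e10]
        all_t.append(t)
        all_delta.append(delta)
    return all_t, all_delta

t, delta = sample_along_rays(4, 2.0, 6.0, 8, random.Random(0))
```

Because each sample stays inside its own bin, t is sorted along every ray and every delta is non-negative.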
 
x: 3D positions along the ray, used as input for the model
This is ray_origins + t * ray_directions, which in shapes is
(b, pos_dim) + (b, N_bins) * (b, pos_dim) [pos_dim = 3 (x, y, z)]
These shapes do not broadcast until unsqueeze adds the missing dimensions:
x = ray_origins.unsqueeze(1) + t.unsqueeze(2) * ray_directions.unsqueeze(1)
The result has shape [b, N_bins, pos_dim].
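
To make the shape bookkeeping explicit, here is the same computation written with nested lists instead of broadcasting. It is a sketch for illustration only, and `ray_points` is a hypothetical name.

```python
def ray_points(ray_origins, ray_directions, t):
    """ray_origins, ray_directions: [b][3]; t: [b][n_bins] -> x: [b][n_bins][3]."""
    return [
        [[o_k + t_i * d_k for o_k, d_k in zip(o, d)] for t_i in t_ray]
        for o, d, t_ray in zip(ray_origins, ray_directions, t)
    ]

origins = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
directions = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
t = [[2.0, 3.0, 4.0], [2.0, 3.0, 4.0]]
x = ray_points(origins, directions, t)
# x[1][2] is the third sample on the second ray: [1.0, 4.0, 0.0]
```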
 
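Broadcast ray_directions
Here ray_directions.expand is used: the existing dimensions are kept and the values are repeated along the new one, so every sample on a ray gets that ray's direction.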
 
The model input has shape (b * N_bins, 3).
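
A list-based sketch of the expand-then-flatten step: each ray's direction is repeated once per sample, and both positions and directions are flattened so the model sees one row per sample. `flatten_for_model` is a hypothetical name used only for illustration.

```python
def flatten_for_model(x, ray_directions):
    """x: [b][n_bins][3], ray_directions: [b][3] -> two lists of shape [b * n_bins][3]."""
    flat_x, flat_d = [], []
    for points, d in zip(x, ray_directions):
        for p in points:
            flat_x.append(p)
            flat_d.append(d)   # same direction repeated for every sample on the ray
    return flat_x, flat_d

x = [[[0.0, 0.0, 2.0], [0.0, 0.0, 3.0]], [[1.0, 2.0, 0.0], [1.0, 3.0, 0.0]]]
directions = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
flat_x, flat_d = flatten_for_model(x, directions)
```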
 
colors: (b, N_bins, 3)
sigma: (b, N_bins) Density of ray at each sample point
  • Compute alpha (opacity; 1 - alpha is the transmittance through one segment):
    • Alpha is calculated as 1 - exp(-sigma * delta). This represents the fraction of light that is absorbed at each sample point. Higher sigma means more density, so more light is absorbed (higher alpha).
  • Compute accumulated transmittance (weights):
    • weights = compute_accumulated_transmittance(1 - alpha).unsqueeze(2) * alpha.unsqueeze(2): The weights for each sample are computed using accumulated transmittance (how much light passes through previous points along the ray). This is used to weigh the contributions of each sample point to the final pixel color.
    • This line effectively computes how much each sample point contributes to the final color based on its transparency and the transparency of previous points along the ray.
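
For a single ray, the alpha and weight computation can be sketched as a running product. It assumes the accumulated transmittance at sample i is the product of (1 - alpha) over all earlier samples, so the first sample sees a transmittance of 1. `compute_weights` is a hypothetical name.

```python
import math

def compute_weights(sigma, delta):
    """sigma, delta: per-sample lists for one ray -> (alpha, weights)."""
    alpha = [1.0 - math.exp(-s * d) for s, d in zip(sigma, delta)]
    weights = []
    transmittance = 1.0            # light that survived all earlier samples
    for a in alpha:
        weights.append(transmittance * a)
        transmittance *= 1.0 - a
    return alpha, weights

# Two empty samples followed by two dense ones; the last delta is "infinite".
sigma = [0.0, 0.0, 5.0, 5.0]
delta = [1.0, 1.0, 1.0, 1e10]
alpha, weights = compute_weights(sigma, delta)
```

Empty samples get zero weight, the first dense sample takes almost all of it, and the weights sum to 1 because the final alpha is 1.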
 
c = (weights * colors).sum(dim=1): This computes the weighted sum of colors along the ray, using the computed weights. This gives the final RGB value for each pixel.
Handle background regularization:
  • weight_sum = weights.sum(-1).sum(-1): This computes the sum of the weights, which is used to handle the "white background" regularization. If a ray passes through fully transparent regions (low sigma), it contributes to a white background.
  • return c + 1 - weight_sum.unsqueeze(-1): The final result adds the regularization term, so that rays that don't hit dense objects result in a white background.
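
The last two steps for one pixel, as a sketch: the weighted color sum plus whatever weight is left over, which is filled with white (1.0 per channel). `composite_on_white` is a hypothetical name.

```python
def composite_on_white(weights, colors):
    """weights: [n_bins]; colors: [n_bins][3] -> RGB of one pixel on a white background."""
    rgb = [sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3)]
    weight_sum = sum(weights)
    # Any weight not claimed by the samples is filled with white.
    return [channel + 1.0 - weight_sum for channel in rgb]

red = [1.0, 0.0, 0.0]
empty_ray = composite_on_white([0.0, 0.0, 0.0], [red, red, red])  # hits nothing
solid_ray = composite_on_white([0.0, 1.0, 0.0], [red, red, red])  # hits a red surface
```

Here `empty_ray` comes out pure white and `solid_ray` pure red.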
 

-3 Model construction

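L stands for embedding_dim, the number of frequency bands.
The defaults are embedding_dim_pos=10 and embedding_dim_direction=4.
The encoding simply concatenates sin and cos terms at that many frequencies.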
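
A minimal sketch of such an encoding for one input vector. It assumes the raw input is kept as the first entries and the frequencies are 2^0 to 2^(L-1). The paper's formula also multiplies the argument by π, which is left out here as a simplification.

```python
import math

def positional_encoding(x, L):
    """x: list of coordinates -> x followed by sin/cos of 2^j * x for j in 0..L-1."""
    out = list(x)
    for j in range(L):
        out.extend(math.sin(2 ** j * v) for v in x)
        out.extend(math.cos(2 ** j * v) for v in x)
    return out

pos_features = positional_encoding([0.1, 0.2, 0.3], 10)  # 3 + 3 * 2 * 10 = 63 values
dir_features = positional_encoding([0.0, 0.0, 1.0], 4)   # 3 + 3 * 2 * 4 = 27 values
```

With the default values above, a position becomes a 63-dimensional input and a direction a 27-dimensional one.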
 
block1 and block2 produce sigma and a feature vector h; h then goes through block3 and block4 to produce c.
 

-4 Training

 
Compare the output of render_ray with the ground-truth 2D image.
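
The comparison can be sketched as a squared-error loss over the pixels of a batch, which is the kind of error that gradient descent would minimize. The note does not name the exact loss, so a mean over all channels is assumed here, and `mse_loss` is a hypothetical name.

```python
def mse_loss(rendered, ground_truth):
    """Mean squared error over all pixels and RGB channels."""
    diffs = [
        (r - g) ** 2
        for pixel_r, pixel_g in zip(rendered, ground_truth)
        for r, g in zip(pixel_r, pixel_g)
    ]
    return sum(diffs) / len(diffs)

rendered = [[1.0, 1.0, 1.0], [0.5, 0.0, 0.0]]      # two rendered pixels
ground_truth = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]  # the matching image pixels
loss = mse_loss(rendered, ground_truth)            # 0.25 / 6
```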
 
Not finished yet: I still need to reimplement the code myself, but I now roughly understand how NeRF works.
 
 