Diagonal Gaussian Distribution
Samples a point from a normal distribution with the given mean and std
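A minimal sketch of that sampling step, using plain Python lists so it runs without any tensor library. The function name `sample_diagonal_gaussian` is hypothetical and not taken from the original code; it only illustrates the usual reparameterized draw `mean + std * eps` applied independently per dimension.

```python
import random

def sample_diagonal_gaussian(mean, std, rng=random):
    """Draw one sample; each dimension is an independent normal (hypothetical helper)."""
    # z_i = mean_i + std_i * eps_i, with eps_i ~ N(0, 1)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]

rng = random.Random(0)  # seeded only so the sketch is repeatable
z = sample_diagonal_gaussian([0.0, 2.0, -1.0], [1.0, 0.5, 0.0], rng)
# A std of 0 makes that dimension deterministic: z[2] is exactly the mean.
```

With every std set to zero the sample collapses to the mean, which is the deterministic case the notes further down refer to.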
Attend
Still uses rotary_emb for pixels
Note: here dim = head_dim // 4 ?
Gives each pixel a frequency embedding based on its position
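To make the idea of a position-dependent rotation concrete, here is a toy sketch of an axial rotary embedding on a 4-feature vector. It is an illustration under assumptions, not the module the notes describe: `apply_axial_rotary` and the single frequency `freq` are hypothetical, and it does not settle the `head_dim // 4` question above. It rotates one feature pair by the row index and another by the column index, which makes the query-key dot product depend only on the relative offset between two pixels.

```python
import math

def rotate_pair(a, b, angle):
    """Rotate the 2-D feature pair (a, b) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return a * c - b * s, a * s + b * c

def apply_axial_rotary(vec, row, col, freq=0.5):
    """Rotate features (0, 1) by the row position and (2, 3) by the column position.

    `freq` is a single hypothetical frequency; real modules use a bank of them.
    """
    r0, r1 = rotate_pair(vec[0], vec[1], row * freq)
    c0, c1 = rotate_pair(vec[2], vec[3], col * freq)
    return [r0, r1, c0, c1]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Same relative offset (+2 rows, -1 column) at two different absolute positions.
score_a = dot(apply_axial_rotary(q, 3, 1), apply_axial_rotary(k, 1, 2))
score_b = dot(apply_axial_rotary(q, 7, 5), apply_axial_rotary(k, 5, 6))
# score_a and score_b agree up to floating-point error.
```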
Attend Block
Note: no mod or activation here anymore, only norm
Autoencoder KL
input_image → patch_emb
encoder = enc_depth * AttentionBlock(enc_dim, enc_heads)
bottleneck:
decoder = dec_depth * AttentionBlock(dec_dim, dec_heads)
self.predictor = nn.Linear(dec_dim, self.patch_dim)
*self.patch_dim = 3 * patch_size**2
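After the encoder:
Sample from moments; if not variational → deterministic
var is set to zeros_like(moments), and all of moments is used as the mean
If there is a var, are the existing moments split into two halves? But then wouldn't the sampled x have a different shape?
→ Because without var no sampling is done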
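The two paths can be sketched as follows, for a single token and with plain lists. This is a sketch under assumptions: the function name `latent_from_moments` is hypothetical, and the layout "first half mean, second half log-variance" is a common convention rather than something confirmed by the notes. It shows why the shape question comes up: with the same moments width, the variational path returns a latent of half that width.

```python
import math
import random

def latent_from_moments(moments, variational, rng=random):
    """Turn one token's bottleneck output into a latent vector (hypothetical helper)."""
    if not variational:
        # Deterministic path: every entry of `moments` is the mean, variance is zero.
        return list(moments)
    # Variational path (assumed layout): first half mean, second half log-variance.
    half = len(moments) // 2
    mean, logvar = moments[:half], moments[half:]
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, logvar)]

moments = [0.1, -0.4, 0.8, 1.5]
z_det = latent_from_moments(moments, variational=False)
z_var = latent_from_moments(moments, variational=True, rng=random.Random(0))
print(len(z_det), len(z_var))  # 4 2
```

Under this assumed layout, whichever layer produces the moments or consumes the latent would have to account for the factor of two; the notes do not say how the original code handles it.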
After decoding, apply the predictor
Then unpatchify
Finally, return type
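As a reference for what unpatchify has to undo, here is a dependency-free round trip on a tiny image. The helper names and the per-patch ordering (channel, then row, then column) are assumptions made for illustration; a real implementation would use tensor reshapes, and the original code may order the values differently.

```python
def patchify(img, p):
    """img: nested lists [C][H][W] -> list of patches, each of length C * p * p."""
    C, H, W = len(img), len(img[0]), len(img[0][0])
    return [
        [img[c][i + di][j + dj] for c in range(C) for di in range(p) for dj in range(p)]
        for i in range(0, H, p)
        for j in range(0, W, p)
    ]

def unpatchify(patches, p, C, H, W):
    """Inverse of patchify: scatter each flat patch back to its pixel block."""
    img = [[[0] * W for _ in range(H)] for _ in range(C)]
    cols = W // p  # number of patches in one row of the patch grid
    for n, patch in enumerate(patches):
        i, j = (n // cols) * p, (n % cols) * p  # top-left pixel of patch n
        values = iter(patch)
        for c in range(C):
            for di in range(p):
                for dj in range(p):
                    img[c][i + di][j + dj] = next(values)
    return img

C, H, W, p = 3, 4, 6, 2
img = [[[c * 100 + y * 10 + x for x in range(W)] for y in range(H)] for c in range(C)]
patches = patchify(img, p)          # 6 patches, each of length 3 * 2 * 2 = 12
restored = unpatchify(patches, p, C, H, W)
```

Each patch has length `C * p * p`, which matches the `3 * patch_size**2` output width of the predictor.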
enc and dec use the same dim and heads; dec depth is twice the enc depth
360*640 → patchify to size of 20*20
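A quick arithmetic check of those sizes, assuming "20*20" is the patch size in pixels (it divides both 360 and 640 evenly) and that the image has 3 channels, as the `3 * patch_size**2` formula suggests:

```python
height, width = 360, 640
patch_size = 20   # assumed: "20*20" read as the patch size in pixels
channels = 3

grid_h, grid_w = height // patch_size, width // patch_size
num_patches = grid_h * grid_w            # sequence length seen by the attention blocks
patch_dim = channels * patch_size ** 2   # width of each flattened patch

print(grid_h, grid_w, num_patches, patch_dim)  # 18 32 576 1200
```

Under that reading, each frame becomes a sequence of 576 tokens, and the predictor maps dec_dim to 1200 values per token.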
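Recent thoughts:
Staring at the 5 pieces of meat and 4 peppers left in the fridge, lost in thought
All I can say this round: if I have to stall, I'll stall hard, but I am going to hold on!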
- Author:ran2323
- URL:https://www.blueif.me//article/14c71a79-6e22-8069-984e-dea444810087
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!