AI-Generated Minecraft (open-oasis): A Quick Code Read, Part 3 (VAE)

Sample a point from a normal distribution with the given mean and std.
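Below is a minimal numpy sketch of that sampling step. It assumes the usual reparameterization z = mean + std * eps, with eps ~ N(0, 1). The name `sample_normal` is hypothetical and does not come from the repo.

```python
import numpy as np


def sample_normal(mean, std, rng=None):
    """Draw z ~ N(mean, std**2) elementwise as mean + std * eps (reparameterization)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(np.shape(mean))  # eps ~ N(0, 1), same shape as mean
    return mean + std * eps


rng = np.random.default_rng(0)
z = sample_normal(np.full((2, 4), 3.0), 0.5, rng)
print(z.shape)  # (2, 4)
```

The noise is drawn separately and then only scaled and shifted, so gradients can flow back into mean and std. With std = 0 the sample is exactly the mean.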

Rotary embeddings (rotary_emb) are still used, here in their pixel variant.

Note that dim = head_dim // 4 here. Why a quarter?

Each pixel gets a frequency embedding based on its position.
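Here is a numpy sketch of how axial "pixel" rotary frequencies can be built and applied. Several details are assumptions that I have not checked against the repo:

- The base frequencies are linspace(1, max_freq / 2, dim // 2) * pi.
- Positions are normalized to [-1, 1] on each axis.
- Each frequency is repeated once per channel pair.
- The height and width frequencies are concatenated.

Under these assumptions, dim = head_dim // 4 per axis means only head_dim // 2 channels of each head are rotated, and the remaining channels pass through unchanged. All function names are hypothetical.

```python
import numpy as np


def pixel_freqs(dim, max_freq):
    # dim // 2 base frequencies, linearly spaced (assumed "pixel" schedule)
    return np.linspace(1.0, max_freq / 2, dim // 2) * np.pi


def axial_pixel_freqs(height, width, dim, max_freq):
    """Rotation angles of shape (height, width, 2 * dim): dim for rows, dim for columns."""
    base = pixel_freqs(dim, max_freq)
    pos_h = np.linspace(-1.0, 1.0, height)  # normalized row positions
    pos_w = np.linspace(-1.0, 1.0, width)   # normalized column positions
    # angle = position * frequency; repeat each angle for its (x1, x2) channel pair
    f_h = np.repeat(np.outer(pos_h, base), 2, axis=-1)  # (height, dim)
    f_w = np.repeat(np.outer(pos_w, base), 2, axis=-1)  # (width, dim)
    f_h = np.broadcast_to(f_h[:, None, :], (height, width, dim))
    f_w = np.broadcast_to(f_w[None, :, :], (height, width, dim))
    return np.concatenate([f_h, f_w], axis=-1)


def rotate_half(x):
    # each adjacent channel pair (x1, x2) -> (-x2, x1)
    out = np.empty_like(x)
    out[..., 0::2] = -x[..., 1::2]
    out[..., 1::2] = x[..., 0::2]
    return out


def apply_rotary(freqs, t):
    # rotate the first freqs.shape[-1] channels, pass the rest through unchanged
    rot = freqs.shape[-1]
    t_rot, t_pass = t[..., :rot], t[..., rot:]
    t_rot = t_rot * np.cos(freqs) + rotate_half(t_rot) * np.sin(freqs)
    return np.concatenate([t_rot, t_pass], axis=-1)


head_dim, H, W = 64, 18, 32  # example sizes
freqs = axial_pixel_freqs(H, W, head_dim // 4, max_freq=H * W)  # max_freq choice is an assumption
q = np.random.default_rng(0).standard_normal((H, W, head_dim))   # one head's queries on the grid
print(freqs.shape, apply_rotary(freqs, q).shape)  # (18, 32, 32) (18, 32, 64)
```

Each channel pair is rotated by an angle equal to position times frequency, so vector norms are preserved. For q and k rotated this way, their dot product depends only on the relative offset along each axis.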

Note that modulation (mod) and its activation are gone here; only the norm remains.
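The sketch below illustrates that difference, assuming a standard pre-norm transformer block.

- `plain_block` shows what "only norm" looks like.
- `adaln_block` is a DiT-style modulated block, kept only for contrast. Its shift, scale and gate come from a conditioning vector passed through a SiLU and a linear map.

All names, including the stand-in attn/mlp callables, are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # normalize over the feature dim (no learned affine, for brevity)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def plain_block(x, attn, mlp):
    # VAE-style: pre-norm residual block, no conditioning input
    x = x + attn(layer_norm(x))
    x = x + mlp(layer_norm(x))
    return x


def silu(x):
    return x / (1.0 + np.exp(-x))


def modulate(x, shift, scale):
    return x * (1.0 + scale) + shift


def adaln_block(x, c, attn, mlp, w_ada):
    # DiT-style, for contrast: activation + linear map the condition c to 6 vectors
    shift1, scale1, gate1, shift2, scale2, gate2 = np.split(silu(c) @ w_ada, 6, axis=-1)
    x = x + gate1 * attn(modulate(layer_norm(x), shift1, scale1))
    x = x + gate2 * mlp(modulate(layer_norm(x), shift2, scale2))
    return x
```

With shift = scale = 0 and gate = 1, the modulated block reduces to the plain one. Dropping the conditioning path removes exactly the mod and activation mentioned above.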

input_image → patch_emb

encoder = enc_depth * AttentionBlock(enc_dim, enc_heads)

bottleneck: (described below, after the encoder)

decoder = dec_depth * AttentionBlock(dec_dim, dec_heads)

self.predictor = nn.Linear(dec_dim, self.patch_dim)

where self.patch_dim = 3 * patch_size**2 (3 RGB channels × pixels per patch)
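To make patch_dim concrete, here is a numpy sketch that flattens an image into patch tokens, each 3 * patch_size**2 values long. It only shows the reshape. patch_emb would then map each token to enc_dim, and the predictor maps dec_dim back to this same length. The (row, col, channel) order inside each patch is an assumption of this sketch, and `patchify` here is a hypothetical helper.

```python
import numpy as np


def patchify(img, p):
    """(C, H, W) image -> (num_patches, p * p * C) tokens, patches in row-major grid order."""
    c, h, w = img.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    x = img.reshape(c, h // p, p, w // p, p)  # (C, gh, p, gw, p)
    x = x.transpose(1, 3, 2, 4, 0)            # (gh, gw, p, p, C)
    return x.reshape((h // p) * (w // p), p * p * c)


tokens = patchify(np.zeros((3, 40, 60)), 20)
print(tokens.shape)  # (6, 1200): a 2x3 grid, each token 3 * 20**2 values
```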

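After the encoder:

Sample from the moments; if not variational → deterministic.

In that case var is set to moments.zeros_like, and all of the moments are used as the mean.

If there is a var, the existing moments are split into two halves? But then wouldn't the sampled x end up with a different shape?

→ Because when there is no var, no sampling is done.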
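This numpy sketch shows one way the shapes can stay consistent. It rests on an assumption not stated in these notes: the projection feeding the bottleneck outputs 2 * latent_dim channels when variational and latent_dim otherwise.

- In the variational case, the split leaves the mean with latent_dim channels.
- In the deterministic case, std is zero and the latent is simply the mean, which also has latent_dim channels.

`split_moments` and `latent_from_moments` are hypothetical names, and the sizes below are illustrative.

```python
import numpy as np


def split_moments(moments, use_variational):
    """Return (mean, std) for the latent distribution."""
    if use_variational:
        # assumed layout: first half is the mean, second half the log-variance
        mean, logvar = np.split(moments, 2, axis=-1)
        std = np.exp(0.5 * logvar)
    else:
        # deterministic: all of moments is the mean, std is all zeros
        mean = moments
        std = np.zeros_like(moments)
    return mean, std


def latent_from_moments(moments, use_variational, rng, sample=True):
    mean, std = split_moments(moments, use_variational)
    if use_variational and sample:
        return mean + std * rng.standard_normal(mean.shape)
    return mean  # no sampling: use the mean (mode)


latent_dim, num_tokens = 16, 576  # illustrative sizes
rng = np.random.default_rng(0)
z_var = latent_from_moments(rng.standard_normal((num_tokens, 2 * latent_dim)), True, rng)
z_det = latent_from_moments(rng.standard_normal((num_tokens, latent_dim)), False, rng)
print(z_var.shape, z_det.shape)  # (576, 16) (576, 16)
```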

After decoding, apply the predictor.

Then unpatchify.
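Here is the unpatchify step sketched in numpy, as the inverse of a row-major patch flattening. Predictor outputs of shape (num_patches, 3 * patch_size**2) are folded back into a (3, H, W) image. The (row, col, channel) order inside each token is an assumption; it must match whatever order the predictor was trained to emit. The function name is hypothetical.

```python
import numpy as np


def unpatchify(tokens, p, h, w, c=3):
    """(num_patches, p * p * c) tokens -> (c, h, w) image; patches in row-major grid order."""
    gh, gw = h // p, w // p
    assert tokens.shape == (gh * gw, p * p * c), "token count or length does not match the image size"
    x = tokens.reshape(gh, gw, p, p, c)  # (gh, gw, row in patch, col in patch, channel)
    x = x.transpose(4, 0, 2, 1, 3)       # (c, gh, row in patch, gw, col in patch)
    return x.reshape(c, h, w)


img = unpatchify(np.zeros((6, 1200)), 20, 40, 60)
print(img.shape)  # (3, 40, 60)
```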

Finally, the return type.

Encoder and decoder share the same dim and heads; the decoder is twice as deep (dec_depth = 2 × enc_depth).

A 360×640 frame is patchified with 20×20 patches, giving an 18×32 grid, i.e. 576 tokens.
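The same arithmetic fits in a tiny helper (the name is hypothetical). It also gives the per-token length the predictor has to produce.

```python
def patch_grid(height, width, patch_size, channels=3):
    """Return (seq_h, seq_w, num_tokens, patch_dim) for a given frame and patch size."""
    assert height % patch_size == 0 and width % patch_size == 0
    seq_h, seq_w = height // patch_size, width // patch_size
    return seq_h, seq_w, seq_h * seq_w, channels * patch_size ** 2


print(patch_grid(360, 640, 20))  # (18, 32, 576, 1200)
```

So each 360×640 frame becomes 576 tokens, and the predictor decodes each token back into 1200 pixel values.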