 

Diagonal Gaussian Dist

 
Samples a point from a normal distribution with the given mean and std.
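A minimal NumPy sketch of such a distribution, standing in for the PyTorch original. It assumes the encoder output (the "moments") is split in half along the channel axis into a mean and a log-variance, and that a `deterministic` flag zeroes the std so that sampling returns the mean. The class name `DiagonalGaussian`, the clamp range and the `kl()` helper are illustrative assumptions, not the repository's code.

```python
import numpy as np


class DiagonalGaussian:
    """Diagonal Gaussian parameterized by a mean and a log-variance (sketch)."""

    def __init__(self, moments: np.ndarray, deterministic: bool = False, axis: int = -1):
        # Assumption: first half of `axis` is the mean, second half the log-variance.
        self.mean, self.logvar = np.split(moments, 2, axis=axis)
        # Clamp the log-variance for numerical stability (range chosen for illustration).
        self.logvar = np.clip(self.logvar, -30.0, 20.0)
        self.deterministic = deterministic
        self.std = np.exp(0.5 * self.logvar)
        self.var = np.exp(self.logvar)
        if deterministic:
            # No variance: every sample collapses onto the mean.
            self.std = np.zeros_like(self.mean)
            self.var = np.zeros_like(self.mean)

    def sample(self, rng: np.random.Generator) -> np.ndarray:
        # Reparameterization: x = mean + std * eps with eps ~ N(0, I).
        eps = rng.standard_normal(self.mean.shape)
        return self.mean + self.std * eps

    def mode(self) -> np.ndarray:
        return self.mean

    def kl(self) -> np.ndarray:
        # KL(N(mean, var) || N(0, I)) per batch item, summed over the remaining axes.
        if self.deterministic:
            return np.zeros(self.mean.shape[0])
        kl = 0.5 * (self.mean**2 + self.var - 1.0 - self.logvar)
        return kl.reshape(kl.shape[0], -1).sum(axis=1)


rng = np.random.default_rng(0)
# Batch of 2, 4 tokens, 8 latent channels: mean = 1, log-variance = 0 (so std = 1).
moments = np.concatenate([np.ones((2, 4, 8)), np.zeros((2, 4, 8))], axis=-1)
dist = DiagonalGaussian(moments)
x = dist.sample(rng)
```

Sampling uses the reparameterization trick, so all randomness lives in `eps`; in an autodiff framework, gradients can still flow through `mean` and `std`.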
 
 

Attend

 
Rotary embedding (rotary_emb) is used again, this time with pixel frequencies.

Note: here dim = head_dim // 4. Why?
Each pixel is given a frequency embedding based on its position.
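A NumPy sketch of axial 2D rotary embedding for a single query/key vector. Assumptions, none confirmed from the source: frequencies are linearly spaced "pixel" frequencies times pi, pixel positions are normalized coordinates, the first `axis_dim` channels carry the row angle and the next `axis_dim` the column angle (as interleaved channel pairs), and channels beyond `2 * axis_dim` are left unrotated. Under this layout, `axis_dim = head_dim // 4` would rotate half of each head's channels.

```python
import numpy as np


def pixel_freqs(axis_dim: int, max_freq: float) -> np.ndarray:
    # Assumption: "pixel"-style frequencies, linearly spaced from 1 to max_freq / 2, times pi.
    return np.linspace(1.0, max_freq / 2, axis_dim // 2) * np.pi


def apply_axial_rope(x, row, col, axis_dim, max_freq):
    """Rotate the first 2 * axis_dim channels of x by angles derived from (row, col).

    Channels [0, axis_dim) encode the row, [axis_dim, 2 * axis_dim) the column;
    channels beyond 2 * axis_dim pass through unchanged.
    """
    assert axis_dim % 2 == 0, "axis_dim must be even (channels are rotated in pairs)"
    f = pixel_freqs(axis_dim, max_freq)
    # Each frequency drives one interleaved channel pair: (x0, x1), (x2, x3), ...
    theta = np.concatenate([row * f, col * f])
    rot = 2 * axis_dim
    assert rot <= x.shape[-1], "rotated channels cannot exceed head_dim"
    a, b = x[..., 0:rot:2], x[..., 1:rot:2]
    cos, sin = np.cos(theta), np.sin(theta)
    out = x.astype(float)  # astype returns a copy, so x is not modified
    out[..., 0:rot:2] = a * cos - b * sin
    out[..., 1:rot:2] = a * sin + b * cos
    return out


rng = np.random.default_rng(0)
head_dim = 64
axis_dim = head_dim // 4  # 16 channels per axis, so 32 of 64 channels get rotated
q = rng.standard_normal(head_dim)
k = rng.standard_normal(head_dim)
q_rot = apply_axial_rope(q, row=0.25, col=-0.5, axis_dim=axis_dim, max_freq=16)
```

Because each pair is rotated by an angle that is linear in position, the rotated q·k depends only on the row and column offsets between two pixels, not on their absolute positions.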
 

Attend Block

Note that modulation (mod) and the activation are no longer used here; there is only norm.
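The block structure described above, a residual block with only a LayerNorm before each sub-layer, can be sketched in NumPy. Assumptions: plain multi-head self-attention without the rotary embedding, hypothetical weights `w_qkv`/`w_out`, a hypothetical token-wise `mlp` callable, and LayerNorm without a learned affine. "No modulation" is read here as no conditioning-dependent shift/scale/gate around the norms.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # LayerNorm over the channel axis, without the learned scale/bias for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def self_attention(x, w_qkv, w_out, heads):
    # x: (tokens, dim); w_qkv: (dim, 3 * dim); w_out: (dim, dim). No positional encoding.
    n, dim = x.shape
    hd = dim // heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)

    def split_heads(t):
        return t.reshape(n, heads, hd).transpose(1, 0, 2)  # (heads, tokens, head_dim)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(1, 0, 2).reshape(n, dim)
    return out @ w_out


def attention_block(x, w_qkv, w_out, heads, mlp):
    # Pre-norm residual block: plain LayerNorm before each sub-layer,
    # no shift / scale / gate modulation from a conditioning vector.
    x = x + self_attention(layer_norm(x), w_qkv, w_out, heads)
    x = x + mlp(layer_norm(x))
    return x


rng = np.random.default_rng(0)
tokens, dim, heads = 6, 16, 4
w_qkv = rng.standard_normal((dim, 3 * dim)) * 0.1
w_out = rng.standard_normal((dim, dim)) * 0.1
w_mlp = rng.standard_normal((dim, dim)) * 0.1


def mlp(h):
    # Hypothetical token-wise stand-in for the real MLP.
    return h @ w_mlp


x = rng.standard_normal((tokens, dim))
y = attention_block(x, w_qkv, w_out, heads, mlp)
```

Without positional information such a block is permutation-equivariant: shuffling the input tokens only shuffles the outputs, which is why a positional scheme like the rotary embedding is needed.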
 
 

Autoencoder KL

 
input_image → patch_emb
 
encoder = enc_depth * AttentionBlock(enc_dim, enc_heads)
 
bottleneck:
 
decoder = dec_depth * AttentionBlock(dec_dim, dec_heads)
 
self.predictor = nn.Linear(dec_dim, self.patch_dim)
where self.patch_dim = 3 * patch_size**2
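The shape flow from image to patch tokens, and from decoder tokens through the predictor, can be sketched in NumPy. Assumptions: the patch embedding is a linear map on flattened patches (equivalent to a convolution with kernel size and stride `patch_size`), patches are flattened MAE-style in (row, col, channel) order, and all weights are random stand-ins; the encoder, bottleneck and decoder are skipped.

```python
import numpy as np


def patchify(img, p):
    """(C, H, W) image -> (num_patches, p * p * C), patches in row-major grid order."""
    c, h, w = img.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by patch_size"
    gh, gw = h // p, w // p
    x = img.reshape(c, gh, p, gw, p)
    # Assumption (MAE-style): flatten each patch in (row, col, channel) order.
    x = x.transpose(1, 3, 2, 4, 0)  # (gh, gw, p, p, C)
    return x.reshape(gh * gw, p * p * c)


def patch_embed(img, w_embed, p):
    # Linear projection of flattened patches to enc_dim.
    return patchify(img, p) @ w_embed


def predictor(h, w_pred):
    # Maps each decoder token (dec_dim) to patch_dim = 3 * p**2 pixel values.
    return h @ w_pred


rng = np.random.default_rng(0)
p, enc_dim, dec_dim = 4, 32, 32
img = rng.standard_normal((3, 8, 12))  # a 2 x 3 grid of 4 x 4 patches
w_embed = rng.standard_normal((3 * p * p, enc_dim))
tokens = patch_embed(img, w_embed, p)  # (6, enc_dim)
# Encoder, bottleneck and decoder skipped: pretend `tokens` is the decoder output.
pixels = predictor(tokens, rng.standard_normal((dec_dim, 3 * p * p)))  # (6, 48)
```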
 
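After the encoder:

Sample from the moments. If not variational → deterministic:
the var is set to zeros_like(moments), and all of the moments are used as the mean.
If there is a var, the existing moments are split into two halves? But then wouldn't the sampled x end up with a different shape?

→ Because when there is no var, no sampling is done.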
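One way the shapes work out in both branches, sketched in NumPy under a stated assumption: the bottleneck projection `w_quant` (a hypothetical name) outputs `2 * latent_dim` channels when variational and `latent_dim` otherwise. The non-variational branch pads the moments with zeros so the same half-split still applies, then skips sampling, so both branches produce latents of shape `(seq_len, latent_dim)`.

```python
import numpy as np


def bottleneck(h, w_quant, use_variational):
    """Encoder tokens h (L, enc_dim) -> (mean, logvar), each (L, latent_dim).

    Assumption: w_quant projects enc_dim -> 2 * latent_dim when variational and
    enc_dim -> latent_dim otherwise, so both branches end with the same shapes.
    """
    moments = h @ w_quant
    if not use_variational:
        # Pad with zeros so the same "split in half" step works: every original
        # channel becomes the mean, and the zero half becomes the log-variance.
        moments = np.concatenate([moments, np.zeros_like(moments)], axis=-1)
    mean, logvar = np.split(moments, 2, axis=-1)
    return mean, logvar


def to_latent(mean, logvar, use_variational, rng):
    if use_variational:
        # Variational: sample with the reparameterization trick.
        return mean + np.exp(0.5 * logvar) * rng.standard_normal(mean.shape)
    # Deterministic: no sampling, the mean itself is the latent.
    return mean


rng = np.random.default_rng(0)
seq_len, enc_dim, latent_dim = 6, 32, 16
h = rng.standard_normal((seq_len, enc_dim))
w_var = rng.standard_normal((enc_dim, 2 * latent_dim))
w_det = rng.standard_normal((enc_dim, latent_dim))
mean_v, logvar_v = bottleneck(h, w_var, use_variational=True)
z_var = to_latent(mean_v, logvar_v, True, rng)
mean_d, logvar_d = bottleneck(h, w_det, use_variational=False)
z_det = to_latent(mean_d, logvar_d, False, rng)
```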
 
After decoding, pass the output through the predictor,
then unpatchify.
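Unpatchify is the inverse reshape of the patch flattening. A NumPy sketch, assuming each predictor output of size `patch_dim = 3 * patch_size**2` is laid out per patch in (row, col, channel) order and that tokens are in row-major grid order; if the real code flattens patches differently, the transpose changes accordingly.

```python
import numpy as np


def unpatchify(tokens, gh, gw, p, c=3):
    """(gh * gw, p * p * C) predictor output -> (C, gh * p, gw * p) image."""
    assert tokens.shape == (gh * gw, p * p * c)
    x = tokens.reshape(gh, gw, p, p, c)  # (grid_row, grid_col, row, col, C)
    x = x.transpose(4, 0, 2, 1, 3)  # (C, grid_row, row, grid_col, col)
    return x.reshape(c, gh * p, gw * p)


gh, gw, p = 2, 3, 4
tokens = np.arange(gh * gw * p * p * 3, dtype=float).reshape(gh * gw, -1)
img = unpatchify(tokens, gh, gw, p)
print(img.shape)  # (3, 8, 12)
```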
 
Finally, return type.
 
The encoder and decoder use the same dim and number of heads; the decoder is twice as deep.
360×640 input → patchified into 20×20 patches
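Worked arithmetic for this configuration, assuming square 20×20 patches over 3 RGB channels: 360 / 20 = 18 rows and 640 / 20 = 32 columns of patches, i.e. 576 tokens, each of which the predictor maps to 3 * 20**2 = 1200 pixel values. The helper name `patch_grid` is hypothetical.

```python
def patch_grid(height, width, patch_size, channels=3):
    """Patch grid, token count and flattened patch size for square patches."""
    assert height % patch_size == 0 and width % patch_size == 0
    gh, gw = height // patch_size, width // patch_size
    return gh, gw, gh * gw, channels * patch_size**2


gh, gw, num_tokens, patch_dim = patch_grid(360, 640, 20)
print(gh, gw, num_tokens, patch_dim)  # 18 32 576 1200
```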
 
 
 
 
 
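Recent thoughts:
Staring at the 5 pieces of meat and 4 chili peppers left in the fridge, lost in thought.
All I can say this time: if I'm going to stall, I'll stall hard, and I'm holding out no matter what!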
 
ran2323