Blog Generative Models Related



2D Generation

[CVPR 2023] DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [pdf] [project]

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

[SIGGRAPH 2023] 3D Gaussian Splatting for Real-Time Radiance Field Rendering [pdf]

[arXiv 2024] FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model [pdf]

Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang

[CVM 2024] Recent Advances in 3D Gaussian Splatting [pdf]

Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao

[Google Research] StyleDrop: Text-to-Image Generation in Any Style [pdf] [blog]

Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan

[NeurIPS 2023] SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs [pdf]

Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

[CVPR 2023] DynIBaR: Neural Dynamic Image-Based Rendering [pdf] [project]

Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely

[arXiv 2024] Explorative Inbetweening of Time and Space [pdf]

Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang



3D Generation

[arXiv 2024] V3D: Video Diffusion Models are Effective 3D Generators [pdf] [code](https://github.com/heheyas/V3D)

Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu

[arXiv 2024] 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [pdf]

Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou

[NeurIPS 2023] DreamHuman: Animatable 3D Avatars from Text [pdf]

Nikos Kolotouros, Thiemo Alldieck, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Fieraru, Cristian Sminchisescu

[SIGGRAPH 2022] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars [pdf]

Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

[ICCV 2023] Synthesizing Diverse Human Motions in 3D Indoor Scenes [pdf]

Kaifeng Zhao, Yan Zhang, Shaofei Wang, Thabo Beeler, Siyu Tang

[ICCV 2023] AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

Ruixiang Jiang, Can Wang, Jingbo Zhang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

[ICCV 2023] DreamBooth3D: Subject-Driven Text-to-3D Generation [pdf]

Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, Yuanzhen Li, Varun Jampani

[arXiv 2024] Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering [pdf]

Antoine Guédon, Vincent Lepetit

[arXiv 2024] GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation [pdf]

Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein

[arXiv 2024] UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling

Yujiao Jiang, Qingmin Liao, Xiaoyu Li, Li Ma, Qi Zhang, Chaopeng Zhang, Zongqing Lu, Ying Shan

[arXiv 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model [pdf]

Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

[arXiv 2024] Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation [pdf]

Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi

[arXiv 2024] Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation

Anton Pelykh, Ozge Mercanoglu Sincan, Richard Bowden

[arXiv 2024] DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models [pdf]

Zhiyao Sun, Tian Lv, Sheng Ye, Matthieu Gaetan Lin, Jenny Sheng, Yu-Hui Wen, Minjing Yu, Yong-Jin Liu



Other Interesting Works

[Google Research] AMIE: A research AI system for diagnostic medical reasoning and conversations [pdf] [blog]

Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan

[NeurIPS 2023] Towards Generalist Biomedical AI [pdf]

Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, Dale Webster, Greg S. Corrado, Yossi Matias, Karan Singhal, Pete Florence, Alan Karthikesalingam, Vivek Natarajan

[Google Research] A new quantum algorithm for classical mechanics with an exponential speedup [pdf]

Robin Kothari, Rolando Somma

[ICCV 2023] SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection [pdf]

Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

[ICCV 2023] Audiovisual Masked Autoencoders [pdf]

Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab



VALL-E [pdf] [reimplementation]

EnCodec: High Fidelity Neural Audio Compression [pdf] [code]
