TL;DR - I think the paper is a good contribution and basically holds up, but Figure 2 seems suspicious and the released repo doesn't include the pieces (AE training code and pretrained 4096-element AEs) that would be needed to make DC-AE practically competitive with SD/SDXL VAEs.
DC-AE is an MIT / Tsinghua / NVIDIA paper about improving generative autoencoders (like the SD VAE) under the high-spatial-compression ratio regime.
I am interested in improved autoencoders, so this gist/thread is my attempt to analyze and review some key claims from the DC-AE paper.
(Disclaimer: I work at NVIDIA in an unrelated org :) - this review is written in my personal capacity as an autoencoder buff).

you could go to their OpenReview to ask your questions
https://openreview.net/forum?id=wH8XXUOUZU