Also, a shit ton of global variables have leaked into the rest of the code. I know it's terrible, but I was sprinting through this just to get the piece of shit to work, so don't hate :)
I should also mention that this uses x_0 prediction directly. For small datasets/models like this, epsilon prediction is too challenging: there's generally not enough data to pull the noise out given the long noise schedule that's required. The code should still be fairly flexible, though; you can easily swap the loss function to predict noise directly if you want. And yes, I'm aware the model will suffer posterior collapse. I decided that's fine for now, since I need to make some progress without getting bogged down in hyperparameter-tuning hell.
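For concreteness, here's a minimal sketch of what that swap looks like in a generic DDPM-style training step. None of these names (`model`, `alpha_bar`, `diffusion_loss`) come from the gist, and the gist may well use a different framework; the point is just that the only thing that changes is the regression target:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, t, alpha_bar, target="x0"):
    # x0:        (batch, ...) clean poses/actions
    # t:         (batch,) integer timesteps
    # alpha_bar: (num_timesteps,) cumulative product of the noise schedule
    eps = torch.randn_like(x0)
    a_bar = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward noising

    pred = model(x_t, t)
    if target == "x0":
        return F.mse_loss(pred, x0)   # x_0 prediction: regress the clean sample
    return F.mse_loss(pred, eps)      # epsilon prediction: regress the injected noise
```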
I also wanted to get my view down on why this line is needed:
```python
desired_poses = input_project(einops.pack([self.desired_poses_start, desired_poses], "* r d")[0][:-1], self.desired_poses_positional_embedder)
```
The basic premise is that future actions can view the denoised prior actions while making their own decision. This allows for autoregressive action prediction while we are still doing diffusion. Generalizing the perceiver has been challenging because I'm not sure how to properly abstract this without letting the details of diffusion + autoregressive generation leak into it.
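Spelled out as a sketch (names reused from the line above; the `input_project` / positional-embedder pieces are left out since they live in the surrounding code, and the shapes are my guess at what the gist uses):

```python
import einops

def shift_with_start(desired_poses, desired_poses_start):
    # desired_poses:       (num_poses, r, d)  -- poses being denoised this pass
    # desired_poses_start: (1, r, d)          -- learned "no prior action" token
    #
    # After prepending the start token and dropping the last pose, the token at
    # position i is pose i-1. Combined with causal attention, the prediction for
    # pose i can see every denoised pose before it but not itself or the future,
    # which is what lets autoregressive prediction happen inside a diffusion pass.
    packed, _ = einops.pack([desired_poses_start, desired_poses], "* r d")
    return packed[:-1]  # nothing conditions on the final pose's own output
```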
This has very much not been cleaned up, and most of the dependencies required to make this file work live in separate libraries.