Latent space usually consists of multiple dimensions, some of them are redundant, but we calculate the average of the entire latent space for each frame. The absolute value provides us a rough clue of at which stage the fish is.
One potential thing worth to try is TSNE embedding.
== Results ==