Changes

== Suggested/tested approaches ==
The pipeline consists of multiple steps, and each step has a drastic influence on the quality of the result. In the tracking stage, either the scene contains a single fish, or each fish is tracked individually, independently of the other fish present in the scene. The tracking stage yields a bounding box. The image within the bounding box is then fed to a minimalist encoder-decoder network whose latent space is used to detect the different stages of the fish's heart beat (and hence the beat rate).
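The crop-and-rescale step between tracking and the encoder can be sketched as follows. This is a minimal numpy sketch; the `(x, y, w, h)` box format and the nearest-neighbour resizing are assumptions for illustration, not the pipeline's actual implementation:

```python
import numpy as np

def crop_and_resize(frame, box, size=64):
    """Crop the tracked bounding box from a frame and rescale it to
    size x size with nearest-neighbour sampling.  box = (x, y, w, h)."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    rows = np.arange(size) * h // size   # nearest-neighbour row indices
    cols = np.arange(size) * w // size   # nearest-neighbour column indices
    return crop[rows][:, cols]

frame = np.random.rand(480, 640, 3)              # one video frame (H, W, C)
patch = crop_and_resize(frame, (100, 50, 200, 120))
print(patch.shape)                               # (64, 64, 3)
```

The resulting 64×64×3 patch is what the encoder-decoder network consumes.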
The architecture of the network is very minimalist due to the extremely small training dataset, which on average consists of no more than 2000 frames: a stack of consecutive 3×3 convolutions, with the input image re-scaled to 64×64 pixels with three channels. Over-fitting is prevented by a very short training stage (100 iterations, incomparably less than what is usually used). Bounding boxes from different videos cannot be combined, because the setting (fish, scene) and the alignment differ for each video.
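The training regime can be illustrated with a stand-in model: a plain linear autoencoder in numpy rather than the actual 3×3 convolutional encoder-decoder. Only the 64×64×3 input size and the 100-iteration budget come from the text; the latent width, frame count, learning rate, and linear architecture are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# 64x64x3 patches flattened; deliberately few frames and few iterations.
D, LATENT, STEPS, LR = 64 * 64 * 3, 8, 100, 0.1

X = rng.standard_normal((200, D)) * 0.1          # tiny single-video dataset
W_enc = rng.standard_normal((D, LATENT)) * 0.01  # encoder weights
W_dec = rng.standard_normal((LATENT, D)) * 0.01  # decoder weights

def loss(X, W_enc, W_dec):
    err = X @ W_enc @ W_dec - X                  # reconstruction error
    return float(np.mean(err ** 2))

loss_before = loss(X, W_enc, W_dec)
for _ in range(STEPS):                           # very short training stage
    Z = X @ W_enc                                # latent code per frame
    err = Z @ W_dec - X
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_enc -= LR * grad_enc
    W_dec -= LR * grad_dec
loss_after = loss(X, W_enc, W_dec)
print(loss_after < loss_before)                  # True
```

Stopping after 100 iterations acts as early-stopping regularisation: the model learns coarse structure of the frames without memorising the tiny dataset.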
The latent space usually consists of multiple dimensions, some of them redundant, so we simply calculate the average of the entire latent vector for each frame. The absolute value of this average gives a rough indication of which stage the fish is in.
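The per-frame averaging can be sketched as follows. The latent codes here are synthetic, with an assumed periodic component standing in for the heart beat; in the real pipeline they would come from the encoder:

```python
import numpy as np

T, LATENT = 300, 8
t = np.arange(T)
rng = np.random.default_rng(1)

# Synthetic latent codes: mostly redundant noisy dimensions, plus one
# dimension carrying an assumed periodic "beat" (1 cycle per 30 frames).
Z = rng.standard_normal((T, LATENT)) * 0.1
Z[:, 0] += np.sin(2 * np.pi * t / 30)

# The per-frame scalar: average over the entire latent vector.
signal = Z.mean(axis=1)

# The beat survives the averaging: the dominant FFT bin is 10 cycles
# over 300 frames, i.e. one beat every 30 frames.
k = int(np.argmax(np.abs(np.fft.rfft(signal - signal.mean()))))
print(k)  # 10
```

Even though the redundant dimensions dilute the periodic component, averaging keeps it detectable, which is why a single scalar per frame is enough for a rough stage estimate.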
The popular approach of embedding the whole latent space (t-SNE) unfortunately seems not to work (see results, left figure).
== Results ==