Gigabug

image factorization in the brain

Interested in perception and signaling systems, particularly in humans. An earlier project used a basic RNN to predict dash-camera misalignment. Now reading about visual processing models inspired by the brain.

todos

[1] Invariance and equivariance in brains and machines
[2] Disentangling images with Lie group transformations and sparse coding
[3] Analog Memory and High-Dimensional Computation
[4] Neuromorphic Visual Scene Understanding with Resonator Networks
[5] Hippocampal memory, cognition, and the role of sleep
[6] Computing with Residue Numbers in High-Dimensional Representation
[7] Visual scene analysis via factorization of HD vectors

The brain computes and represents

Cone photoreceptors in the retina at the back of your eye transduce incoming scene information.

The set of images produced by smoothly transforming a pattern (e.g. translating it) lives on a nonlinear manifold in pixel space. The straight-line midpoint between two translated copies is not a half-translated pattern but a superposition of both patterns at half contrast, so the manifold has to be curved.
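A quick numpy sketch of that curvature argument (the bump pattern, grid size, and shift amounts are arbitrary choices for illustration):

```python
import numpy as np

# A 1-D "image": a bump pattern on a periodic grid (toy example).
n = 64
x = np.arange(n)
pattern = np.exp(-0.5 * ((x - 20) / 3.0) ** 2)

shifted = np.roll(pattern, 16)          # the same pattern translated by 16 pixels
halfway_shift = np.roll(pattern, 8)     # the point "halfway along" the manifold
midpoint = 0.5 * (pattern + shifted)    # straight-line (chord) midpoint in pixel space

# The chord midpoint is two half-contrast bumps, not one bump shifted by 8 pixels,
# so the straight line between the two images leaves the manifold: it is curved.
print(np.allclose(midpoint, halfway_shift))    # False
print(midpoint.max(), halfway_shift.max())     # ~0.5 vs 1.0
```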

Different patterns trace out different manifolds, but importantly the Lie operator (the generator of the transformation) is the same. The theory of Lie groups tells us that all matrices moving points along such a manifold belong to the same group. Other examples include the groups of n-dimensional rotations SO(n) (special orthogonal) and rigid motions SE(n) (special Euclidean).
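A small numerical sketch of the same-generator idea using SO(2) and the matrix exponential; the starting points are arbitrary:

```python
import numpy as np
from scipy.linalg import expm

# One generator of SO(2): a skew-symmetric matrix A.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def orbit(x0, s_values):
    """Trace the one-parameter orbit {exp(A s) x0} of a starting point x0."""
    return np.stack([expm(A * s) @ x0 for s in s_values])

s_values = np.linspace(0, 2 * np.pi, 50)
orbit_a = orbit(np.array([1.0, 0.0]), s_values)   # unit circle
orbit_b = orbit(np.array([2.0, 1.0]), s_values)   # a different circle (radius sqrt(5))

# Different starting patterns trace different manifolds (circles of different radii),
# but every transformation along either orbit is exp(A s) for the *same* generator A,
# i.e. an element of the same group SO(2).
print(np.allclose(np.linalg.norm(orbit_a, axis=1), 1.0))
print(np.allclose(np.linalg.norm(orbit_b, axis=1), np.sqrt(5)))
```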

Images are explained as the product of a transformation and a shape. This is reminiscent of Plato’s theory of forms, in which physical objects and matter are merely imitations of non-physical ideal forms. Elaborate more.

Formally, [2] combines Lie group transformation learning and sparse coding within a Bayesian model. An image is generated as a sparse superposition of shape components, followed by a transformation parameterized by n continuous variables. The generative model is

$$ I = T(s) \Phi \alpha + \epsilon $$
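A toy sample from a model of this form (not the paper's trained model; the dimensions, the random dictionary, and the single skew-symmetric generator are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

d, k = 64, 16                              # image dimension, number of shape components
Phi = rng.standard_normal((d, k))          # shape dictionary (columns are components)

# Sparse coefficients alpha: most entries zero, a few active components.
alpha = np.zeros(k)
alpha[rng.choice(k, size=3, replace=False)] = rng.standard_normal(3)

# A one-parameter transformation T(s) = exp(A s) generated by a skew-symmetric A,
# so that T(s) is orthogonal (a compact, commutative group action on the image).
A = rng.standard_normal((d, d))
A = 0.5 * (A - A.T)
s = 0.7
T = expm(A * s)

noise = 0.01 * rng.standard_normal(d)      # epsilon
I = T @ (Phi @ alpha) + noise              # I = T(s) Phi alpha + epsilon
print(I.shape)
```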

The transformations T(s) are parameterized as actions of compact, connected, commutative Lie groups on images. They can be decomposed using the Peter-Weyl theorem to get

$$ I = e^{A s} \Phi \alpha + \epsilon $$

$$ I = W e^{\Sigma s} W^T \Phi \alpha + \epsilon $$

$$ I = W R(s) W^T \Phi \alpha + \epsilon $$
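The decomposition can be illustrated numerically: for a skew-symmetric generator A, the real Schur form gives an orthogonal W and a block-diagonal Σ of 2×2 skew blocks, so e^{As} = W e^{Σs} Wᵀ = W R(s) Wᵀ with R(s) made of 2×2 rotation blocks. A sketch (the Schur route and the tiny dimension are my choices, not necessarily how [2] computes it):

```python
import numpy as np
from scipy.linalg import expm, schur

rng = np.random.default_rng(1)

d = 6
A = rng.standard_normal((d, d))
A = 0.5 * (A - A.T)                  # skew-symmetric generator

# Real Schur form of a skew-symmetric matrix: A = W Sigma W^T with W orthogonal
# and Sigma block-diagonal (2x2 skew blocks). This plays the role of the
# Peter-Weyl decomposition in this toy setting.
Sigma, W = schur(A, output='real')

s = 0.9
R = expm(Sigma * s)                  # block-diagonal: 2x2 rotation blocks R(s)

# exp(A s) = W R(s) W^T
print(np.allclose(expm(A * s), W @ R @ W.T))
# R(s) is (numerically) block-diagonal: entries outside the 2x2 blocks are ~0
print(np.round(R, 3))
```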

Here W is orthogonal and R(s) = e^{\Sigma s} is block diagonal, built from 2×2 rotation blocks. Restricting to compact, connected, commutative groups imposes theoretical constraints on which transformations are exactly representable, but other transformations can still be approximately learned in practice.

The block-diagonal structure allows for efficient inference and learning. Projecting into the W basis reformulates inference as a vector factorization problem:

$$ W^T I = R(s) W^T \Phi \alpha + \epsilon $$

$$ \tilde{I} = z(s) \odot \tilde{O}(\alpha) $$
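In the W basis the block-diagonal action becomes an elementwise product of phasors: pairing each 2×2 block's coordinates into a complex number, R(s) multiplies it by z_j(s) = e^{iω_j s}. A self-contained sketch using the same toy generator construction as above:

```python
import numpy as np
from scipy.linalg import expm, schur

rng = np.random.default_rng(2)

d = 6
A = rng.standard_normal((d, d)); A = 0.5 * (A - A.T)
Sigma, W = schur(A, output='real')
omegas = np.array([Sigma[2 * j + 1, 2 * j] for j in range(d // 2)])  # block frequencies

s = 0.9
v = rng.standard_normal(d)          # stands in for W^T Phi alpha, i.e. O~(alpha)

# Apply the block-diagonal transformation directly ...
lhs = expm(Sigma * s) @ v

# ... or, equivalently, pack each 2x2 block's coordinates into one complex number
# and multiply elementwise by a phasor z_j(s) = exp(i * omega_j * s).
v_c = v[0::2] + 1j * v[1::2]
z = np.exp(1j * omegas * s)
rhs_c = z * v_c                     # the elementwise product  I~ = z(s) ⊙ O~(alpha)
rhs = np.empty(d); rhs[0::2] = rhs_c.real; rhs[1::2] = rhs_c.imag

print(np.allclose(lhs, rhs))
```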

Inferring the transformation by direct gradient descent fails because the objective has too many local minima. Instead, [2] performs gradient ascent on the log-likelihood, approximating its gradient as an expectation over the posterior on s given the image and the estimated sparse coefficients $\hat{\alpha}$:

$$ \nabla_{\theta} \ln P_{\theta}(\mathbf{I}) \approx \mathbb{E}_{s \sim P_{\theta}(s \mid \mathbf{I}, \hat{\alpha})} \left[ \nabla_{\theta} \ln P_{\theta}(\mathbf{I} \mid s, \hat{\alpha}) \right] $$
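A toy grid-based version of this gradient estimate, assuming a Gaussian likelihood, a uniform prior on s, a single fixed generator, and treating only Φ as θ (all simplifications relative to [2]):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
d, k, sigma = 16, 4, 0.1

# Toy setup: theta = Phi (shape dictionary); one skew-symmetric generator A (fixed here).
Phi = rng.standard_normal((d, k))
A = rng.standard_normal((d, d)); A = 0.5 * (A - A.T)
alpha_hat = rng.standard_normal(k)            # estimated sparse code (given)
I = expm(A * 0.4) @ (Phi @ alpha_hat) + sigma * rng.standard_normal(d)

# Discretize s on a grid and form the posterior P(s | I, alpha_hat) by normalizing
# the Gaussian likelihoods (uniform prior on s assumed in this sketch).
s_grid = np.linspace(-1.0, 1.0, 41)
T_grid = [expm(A * s) for s in s_grid]
log_lik = np.array([-np.sum((I - T @ Phi @ alpha_hat) ** 2) / (2 * sigma ** 2) for T in T_grid])
post = np.exp(log_lik - log_lik.max()); post /= post.sum()

# Gradient of ln P(I | s, alpha_hat) w.r.t. Phi for a Gaussian likelihood, averaged
# under the posterior over s -- the expectation in the update rule above.
grad_Phi = sum(
    p * (T.T @ (I - T @ Phi @ alpha_hat))[:, None] * alpha_hat[None, :] / sigma ** 2
    for p, T in zip(post, T_grid)
)
print(grad_Phi.shape)   # (d, k): same shape as Phi, usable in a gradient ascent step
```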

Optimization uses Riemannian ADAM, which keeps constrained parameters such as the orthogonal W on their manifold during learning.
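A minimal sketch of Riemannian ADAM on the Stiefel (orthogonal) manifold, assuming the geoopt library and a stand-in objective; the paper's actual training loop will differ:

```python
import torch
import geoopt

d = 16
# W must stay orthogonal, so it lives on the Stiefel manifold; Riemannian ADAM
# takes steps that respect that constraint. (geoopt is one library offering this.)
W = geoopt.ManifoldParameter(torch.linalg.qr(torch.randn(d, d))[0],
                             manifold=geoopt.Stiefel())
optimizer = geoopt.optim.RiemannianAdam([W], lr=1e-3)

target = torch.randn(d, d)
for _ in range(100):
    optimizer.zero_grad()
    loss = ((W - target) ** 2).sum()    # stand-in objective for illustration
    loss.backward()
    optimizer.step()

# W remains (numerically) orthogonal after the updates
print(torch.allclose(W.T @ W, torch.eye(d), atol=1e-4))
```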

A set of values {x} can be represented in a weighted superposition of high-dimensional vectors. Factorizing such a representation reduces the complexity from a product to a sum: instead of enumerating all ∏ᵢNᵢ combinations, only ∑ᵢNᵢ codevectors need to be stored and searched over.
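A toy resonator-network sketch in the spirit of [4, 6, 7], assuming bipolar hypervectors and elementwise binding; the dimension and codebook sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
D = 1000                          # dimension of the hypervectors
N1, N2, N3 = 10, 12, 15           # codebook sizes: product space has N1*N2*N3 = 1800
                                  # combinations, but only N1+N2+N3 = 37 vectors are stored.

def codebook(n):
    return rng.choice([-1.0, 1.0], size=(n, D))

X, Y, Z = codebook(N1), codebook(N2), codebook(N3)

# Composite vector: elementwise ("binding") product of one vector from each codebook.
i, j, k = 3, 7, 9
c = X[i] * Y[j] * Z[k]

# Resonator network: start from the superposition of all candidates in each factor
# and iteratively clean up each estimate against its codebook.
x_hat = np.sign(X.sum(0)); y_hat = np.sign(Y.sum(0)); z_hat = np.sign(Z.sum(0))
for _ in range(50):
    x_hat = np.sign(X.T @ (X @ (c * y_hat * z_hat)))
    y_hat = np.sign(Y.T @ (Y @ (c * x_hat * z_hat)))
    z_hat = np.sign(Z.T @ (Z @ (c * x_hat * y_hat)))

# Read out the recovered indices by nearest codevector; should recover 3 7 9.
print(np.argmax(X @ x_hat), np.argmax(Y @ y_hat), np.argmax(Z @ z_hat))
```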