====== Encoding Information in Video Frames ======

  * Content is encoded into base64 prior to transmission.
  * We fix the frame size at 640x480 (the common VGA resolution).

====== Overlaying information in the background video ======

We have two approaches for overlaying the payload on a video:

  * **Spatial domain:** The overlay has a given fixed size which occludes a portion of the background video. To retain the background video characteristics, only a given fraction of the video may be overlaid. Motion vector estimation for the overlay is expected to differ from that of the regular background video, either by introducing movement where it originally did not occur, or by occluding rapid movement scenes.

{{ covert:matrix2.png?320x240 }}

  * **Frequency domain:** The overlay could occupy the whole background video area. Instead of completely occluding the background video, the overlay could be applied with a given alpha channel so that motion vector estimation stays close to that of the background video. (This implies differences in color, however. How do these translate into H264 encoding differences and unobservability?)

{{ covert:matrix.png?320x240 }}

====== Overlay matrix construction ======

===== A Bit per Pixel =====

A first approach to encoding information into a frame's pixels is to encode one data bit into each available pixel. Using a black/white color scheme for 0 and 1, respectively, a 640x480 SD frame can hold about 38,4 KB of data. This represents the maximum capacity of this scheme.

However, once a video is encoded and decoded by Skype at both ends, the final image exhibits a lot of color blending between adjacent pixels. This means that a pixel between a black one and a white one can come out gray, leaving us clueless about its true value.

===== A Bit per Group of Pixels =====

To mitigate the aforementioned issue of deciding the color of a pixel, another approach is to use a group of pixels to encode a given bit.
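A minimal sketch of this scheme (hypothetical helper names, assuming 8-bit grayscale frames and numpy): each bit is painted as a solid black or white square cell, and recovered by thresholding the cell's mean intensity, so that per-pixel blending noise averages out.

```python
import numpy as np

CELL = 8  # pixels per side of a group; 8x8 cells on 640x480 give 80x60 = 4800 bits


def encode_bits(bits, width=640, height=480, cell=CELL):
    """Paint each bit as a solid black (0) or white (1) square cell."""
    cols = width // cell
    frame = np.zeros((height, width), dtype=np.uint8)
    for i, bit in enumerate(bits):
        r, c = divmod(i, cols)
        frame[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = 255 if bit else 0
    return frame


def decode_bits(frame, nbits, cell=CELL):
    """Recover bits by averaging each cell and thresholding at mid-gray."""
    cols = frame.shape[1] // cell
    out = []
    for i in range(nbits):
        r, c = divmod(i, cols)
        group = frame[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
        out.append(1 if group.mean() >= 128 else 0)
    return out
```

Even when simulated per-pixel noise is added to the frame, the cell averages stay on the correct side of the mid-gray threshold, which is the point of grouping.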
For instance, if a 2x2 pixel group encodes a single bit, an average of the pixels in the group yields a more accurate estimate of the bit value.

  * Bit per 2x2 group -> 320x240 bits -> 9,6 KB of data
  * Bit per 4x4 group -> 160x120 bits -> 2,4 KB of data
  * Bit per 8x8 group -> 80x60 bits -> 600 B of data

The correctness of the bit value retrieval increases with the size of the pixel group, with the trade-off of less data per frame.

===== A Byte per Group of Pixels =====

A second approach to increasing the encoding efficiency is to have each group of pixels encode a given symbol, instead of a single bit. The base64 alphabet represents binary data in a radix-64 representation, and each character of the base64-encoded payload occupies one byte of the stream. This means that if we are able to encode the 64 symbols as distinct colors, we may be able to encode a full 76,8 KB of the base64 stream in a single frame where a symbol is represented by a 2x2 group of pixels.

  * Byte per 2x2 group -> 320x240 bytes -> 76,8 KB of data
  * Byte per 4x4 group -> 160x120 bytes -> 19,2 KB of data
  * Byte per 8x8 group -> 80x60 bytes -> 4,8 KB of data

Due to color blending on encoding/decoding, the insight is to encode symbols as RGB values which are as far apart as possible, in order to diminish the possibility of overlap in color measurements between adjacent colored cells. In our experiments, due to quantization and the codec's block prediction modes, we were unable to obtain enough separate color ranges to identify a given symbol out of 64.

===== A Nibble per Group of Pixels =====

Same approach as above. The difference is the expansion in color separation that we are able to achieve with only 16 different cell colors.

===== Three bits per Group of Pixels =====

Same approach as above, with 8 different cell colors.
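The nibble variant can be sketched as follows (hypothetical names; the 16-color palette below is an illustrative choice of well-separated RGB values, not the one used in our experiments). Each 4-bit symbol is painted as a solid colored cell, and decoded by matching the cell's average color against the nearest palette entry:

```python
import numpy as np

# Illustrative 16-color palette: the 8 corners of the RGB cube plus the
# corners of an inner cube, chosen so entries are far apart in RGB space.
PALETTE = np.array(
    [(r, g, b) for r in (0, 255) for g in (0, 255) for b in (0, 255)]
    + [(r, g, b) for r in (64, 192) for g in (64, 192) for b in (64, 192)],
    dtype=np.int16)  # int16 so distance arithmetic below cannot overflow

CELL = 8  # pixels per side of a group


def encode_nibbles(nibbles, width=640, height=480, cell=CELL):
    """Paint each 4-bit symbol as a solid cell of its palette color."""
    cols = width // cell
    frame = np.zeros((height, width, 3), dtype=np.uint8)
    for i, n in enumerate(nibbles):
        r, c = divmod(i, cols)
        frame[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = PALETTE[n]
    return frame


def decode_nibbles(frame, count, cell=CELL):
    """Average each cell's color and pick the nearest palette entry."""
    cols = frame.shape[1] // cell
    out = []
    for i in range(count):
        r, c = divmod(i, cols)
        cell_px = frame[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
        mean = cell_px.reshape(-1, 3).mean(axis=0)
        out.append(int(np.argmin(((PALETTE - mean) ** 2).sum(axis=1))))
    return out
```

The three-bit variant is the same sketch with an 8-entry palette (e.g. only the cube corners), trading another factor of capacity for even larger color separation.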