Outline
- Vision in the Mind
- Eye Movement
- Implications for VR
- In-Class Exercise
Introduction
- On Monday, we went over the parts of the eye that convert photons into neural impulses: the retina and, in particular, a sensitive subregion of the retina called the fovea.
- We talked about the two kinds of photoreceptor cells in the retina, the rods and the cones, and how each responds to light.
- We said neural signals from receptors leave the eye through the optic disk, a blind spot in the retina, along the optic nerve.
- We said that these signals are processed in a hierarchical fashion by the visual cortex.
- Let's pick up the discussion from there...
Neural Layers of the Eye
- From the rods and cones, the neural impulses immediately traverse the bipolar, amacrine, and horizontal cells (together called the inner nuclear layer of the eye).
- Because of our contorted evolutionary history, in vertebrates these cell layers actually sit above the rods and cones (i.e., standing between the light and the photoreceptors). You can see this in the first image on the left.
- Invertebrates, like octopuses, use the more sensible arrangement, with the rods and cones facing the incoming light. (Right side of first image.)
- A ganglion cell layer lies outside these layers; the ganglion cell axons traverse the retina to a hole (the optic disk) to form the optic nerve.
- Bipolar cells connect to the rods and cones: usually around 1-10 cones per bipolar cell and 30-50 rods per bipolar cell.
- There are two types of bipolar cells.
- For rods, ON bipolar cells react when the rate of photon absorption in their connected photoreceptors increases; OFF bipolar cells react when it decreases.
- For cones, these two types are called vertical cells, which connect directly to ganglion cells, and horizontal cells, which output to photoreceptors and act as inhibitors.
- Amacrine cells connect horizontally between bipolar cells and other amacrine cells, and vertically to ganglion cells.
- The best understood types of ganglion cells are midget, parasol, and bistratified cells. They perform simple filtering operations over their receptive fields based on spatial, temporal, and spectral variations. For example, in the right figure above, the ganglion cell might respond to red in the center of its receptive field together with not-green in the surrounding area (called spatial opponency).
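Center-surround receptive fields like the one described above are commonly modeled as a difference of Gaussians: an excitatory center minus a broader inhibitory surround. A minimal sketch of that model (the widths `sigma_c`, `sigma_s` and the surround gain `k` are illustrative values, not measured ones):

```python
import math

def dog_weight(r, sigma_c=1.0, sigma_s=3.0, k=0.5):
    """Difference-of-Gaussians weight at distance r from the receptive-field
    center: a narrow excitatory center minus a wider inhibitory surround.
    Parameters are illustrative, not physiological measurements."""
    center = math.exp(-r**2 / (2 * sigma_c**2))
    surround = k * math.exp(-r**2 / (2 * sigma_s**2))
    return center - surround

# Stimulation at the center is excitatory; far out, the surround dominates:
print(dog_weight(0.0))   # positive (center wins)
print(dog_weight(4.0))   # negative (surround wins)
```

With opponent inputs (e.g., red center, green surround), the same weighting scheme yields the spatial opponency behavior mentioned above.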
Away from the Eye
- The optic nerve connects to a part of the thalamus called the lateral geniculate nucleus (LGN).
- The LGN both performs some processing and routes the incoming sense signals to various parts of the brain.
- For vision, the LGN sends image information to a region at the back of the brain called the primary visual cortex (V1). This is shown in the left image above.
- The visual cortex (highlighted middle image) contains several interconnected areas that each perform specialized functions.
- One such specialized function, orientation tuning, has been well studied and the right image above shows the response of single neurons in this sub-area to images of various orientations.
- Understanding how all these areas operate together and integrate their responses is an active area of research.
Eye Movement
- Eye movements are important for seeing the world because, as we said earlier, the eye must position the fovea toward a feature of interest in order to see that feature clearly.
- To get a clear, coherent, detailed view of a whole scene, the eyes rapidly scan over the scene while fixating on points of interest. The figure above shows this in action: it shows the output of eye-tracking software recording someone looking at a face.
- Another reason for eye movement is that our photoreceptors are slow (10 ms-100 ms), so we may need to move the eyes to keep the image of the feature we are interested in on the fovea.
- One more reason for eye motion is to allow for a stereoscopic view of an object even after looking at it for a while.
- One can experimentally show that if one totally suppresses eye movement, then visual perception disappears completely.
Eye Muscles
- Each eye is controlled by six muscles, each attached to the outer layer of the eye, the sclera, by a tendon. (See images above.)
- These muscles pull on the eye in opposite pairs.
- For example, to perform a side-to-side motion (yaw), the tensions on the medial rectus and lateral rectus are varied while the other muscles are left unchanged.
- Besides yaw, the eyes can pitch (up-down, using the two top and two bottom muscles) and combine yaws and pitches (using all six muscles).
- They cannot easily do rolls (look straight ahead, yet turn the eyes upside down in their sockets), so eye motions are typically modeled as 2D rather than 3D.
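Since rolls are excluded, a gaze direction can be parameterized by yaw and pitch alone. A minimal sketch of mapping that 2D parameterization to a 3D unit gaze vector (the axis convention here is an assumption for illustration, not from these notes):

```python
import math

def gaze_direction(yaw_deg, pitch_deg):
    """Convert a 2D eye orientation (yaw, pitch) into a 3D unit gaze vector.
    Assumed convention: +z is straight ahead, +x is right, +y is up."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = math.cos(pitch) * math.sin(yaw)   # left-right component
    y = math.sin(pitch)                   # up-down component
    z = math.cos(pitch) * math.cos(yaw)   # forward component
    return (x, y, z)

# Looking straight ahead:
print(gaze_direction(0.0, 0.0))   # → (0.0, 0.0, 1.0)
```

Because there is no roll, two angles fully determine the gaze, which is why eye trackers typically report (yaw, pitch) pairs.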
Types of Eye Movements
- There are seven main categories of eye movements:
- Saccades. These are fast movements that relocate the fovea onto the next important feature in a scene. They last less than 45 ms and can reach 900 degrees/second. The brain uses saccadic masking to hide the intervals during which saccades occur from our perception. Although they occur frequently, we have little or no awareness of them. We can control them consciously to some degree by choosing to fixate on some feature of a scene.
- Smooth Pursuit. These are slower eye movements (less than 30 degrees/second) that track a moving target feature. They are done to reduce motion blur on the retina (i.e., for image stabilization). For fast-moving targets, smooth pursuit and saccades are sometimes combined.
- Vestibulo-ocular reflex (VOR). This is a largely involuntary reflex that allows one to fixate on an object while the head is moving. The eye motion is controlled based on angular accelerations sensed by the vestibular organs.
- Optokinetic reflex. This is the combination of smooth pursuit and saccades mentioned above used to track fast objects.
- Vergence. Motions used to align the two eyes on the same object (both eyes fixating on the same object is called stereopsis). If an object is closer than the previous fixation, convergence motions are used (the eyes rotate toward each other); if it is further, the opposite divergence motions are used.
- Microsaccades. These are fast erratic motions of less than a degree of arc. They are used to do things like improve visual acuity, reduce perceptual fading due to adaptation, and resolve perceptual ambiguities.
- Rapid Eye Movements. These occur while dreaming when asleep, so are not really applicable to VR.
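Because saccades are so much faster than smooth pursuit, eye-tracking pipelines commonly separate the two with a simple velocity threshold (often called I-VT). This is a common technique rather than something from these notes; the 100 degrees/second threshold is an illustrative choice, well above smooth pursuit (< 30 degrees/second) and well below peak saccade speeds (~900 degrees/second):

```python
def classify_saccades(angles_deg, dt, velocity_threshold=100.0):
    """Label each gaze sample as saccade (True) or fixation/pursuit (False)
    using velocity thresholding (I-VT).
    angles_deg: 1D gaze-angle samples in degrees; dt: sampling interval (s).
    The threshold is an illustrative value, not a calibrated one."""
    labels = [False]  # no velocity estimate for the first sample
    for prev, cur in zip(angles_deg, angles_deg[1:]):
        velocity = abs(cur - prev) / dt  # degrees per second
        labels.append(velocity > velocity_threshold)
    return labels

# 500 Hz samples: a fixation, a fast ~10-degree jump, then a fixation.
samples = [0.0, 0.0, 0.1, 5.0, 10.0, 10.1, 10.1]
print(classify_saccades(samples, dt=0.002))
# → [False, False, False, True, True, False, False]
```

Note that during the two `True` samples the gaze sweeps thousands of degrees per second, which is exactly the interval that saccadic masking hides from perception.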
- Most of the time the head and eyes move together. The total range of vision while not moving one's body is the combination of the two.
- The eye can look left to right about 35 degrees, can look up 20 degrees and down 25 degrees. The figure above shows this in combination with head motion ranges.
- The fact that the eye can't look up as much as down means it is optimal to place the center of a VR display scene slightly below the pupils when looking directly forward.
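The "slightly below" figure follows from simple arithmetic on the ranges above; a small sketch (the 1.5 inch effective viewing distance is an illustrative value borrowed from the VR optics discussion, not a recommendation):

```python
import math

# From the notes: the eye comfortably looks up ~20 degrees and down ~25
# degrees, so the comfortable vertical range is centered below forward gaze.
up_deg, down_deg = 20.0, 25.0
center_offset_deg = (down_deg - up_deg) / 2   # 2.5 degrees below forward

# At an effective viewing distance d, that angle maps to a screen offset
# via s = d * tan(theta). d = 1.5 in is an illustrative VR-optics value.
d_in = 1.5
offset_in = d_in * math.tan(math.radians(center_offset_deg))
print(f"center the display ~{offset_in:.3f} in below forward gaze")
```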
Implications for VR
- Physiological properties such as photoreceptor density or VOR circuitry directly impact the engineering requirements for visual display hardware -- we only need the system to be good enough to fool our senses, we don't need levels of quality that are well beyond the limits of our receptors.
- For a VR display, three crucial factors for "good enough" are:
- Spatial Resolution. How many pixels per square area are needed?
- Intensity Resolution and Range. How many intensity values can be produced, and what are the minimum and maximum intensity values? (We will look at color resolution and range later. Scotopic vision is not handled by any current VR systems)
- Temporal Resolution. How fast do displays need to change their pixels?
How Many Pixels is Enough?
- The middle figure above shows a highly magnified display.
- Even zooming out to the image on the left, we still perceive diagonal lines as jagged, a phenomenon known as aliasing.
- Another kind of artifact is the screen-door effect apparent in the middle image where we can see the darkness around and between pixels.
- In 2010, Apple's Steve Jobs claimed that having a 326 pixel/inch display was enough to avoid such issues. One could ask: was this reasonable, and how does it apply to VR displays?
- One issue is that red, green, and blue cones are arranged in a mosaic pattern in the retina, not in precise grids, and not with the same densities.
- Vision scientists have looked at the smallest dots of particular colors that can be perceived on a surface.
- One commonly used concept to measure some components of this is cycles per degree: the number of stripes that can be seen separately along a viewing arc. (Stripes in the letter E in the right image above.)
- Normal vision on a Snellen eye chart (20/20 vision, or 6/6 in metric) corresponds to barely making out the horizontal stripes of a letter "E" from 20 feet away. Here the letter height is scaled to correspond to 30 cycles/degree from 20 feet away, so each stripe subtends 1/60 of a degree of arc (one arcminute).
- Using trigonometry `s = d tan theta`, so we can determine that moving the chart to 10 feet would make the letters appear twice as large on the retina.
- To be able to generate 30 cycles/degree we need to display at least 60 pixels per degree at 20 feet. This works out to 14.32 PPI at 20 feet. At 12 inches from the eye, it works out to 286.4 PPI. In the case of VR, where with the lens the effective distance is about 1.5 inches, we would need 2291.6 PPI. In comparison, a Gear VR system gives us about 577 PPI (the Oculus Go is not much better).
- For people with really good vision, you might need as high as 4583 PPI.
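The PPI figures above can be reproduced directly from `s = d tan theta`; a small sketch:

```python
import math

def required_ppi(distance_in, pixels_per_degree=60):
    """Pixels per inch needed to show `pixels_per_degree` pixels along one
    degree of viewing arc at the given viewing distance (in inches),
    using s = d * tan(theta) with theta = 1 degree."""
    inches_per_degree = distance_in * math.tan(math.radians(1.0))
    return pixels_per_degree / inches_per_degree

for d in (240, 12, 1.5):   # 20 ft, 12 in, ~1.5 in (VR effective distance)
    print(f"{d} in: {required_ppi(d):.1f} PPI")
# → 240 in: 14.3 PPI
# → 12 in: 286.4 PPI
# → 1.5 in: 2291.6 PPI
```

Doubling `pixels_per_degree` to 120 for people with really good vision doubles each figure, which is where the 4583 PPI number comes from.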
How much field of view is enough?
- From our earlier discussion, the maximum human field of view is about 270 degrees, much larger than what can be provided by a flat screen (180 degrees).
- Bringing the screen closer, so that it fills more of the field of view, would require higher pixel densities and also cause lens aberrations.
- You don't want your eyelashes to bonk into the screen -- so you can't make it too close.
- Curved screens might help solve some of these issues.