Last week, we introduced JavaScript, the language we will use to script WebVR scenes.
We then talked about A-Frame.
We looked at basic A-Frame primitives, how to position them, and how to change their properties.
We looked at how to make links, cursors, and controls in A-Frame, and how to make objects respond to events.
We then started talking about Chapter 2 of our book, which covers how VR systems are implemented in hardware and software.
We learned about sensors, sense organs, and degrees of freedom of a sense organ's configuration space.
Today we continue our discussion of VR hardware and software.
Virtual World Generators
The image on the left above is from last lecture and represents a simple model of how a sense organ interacts with the brain and a VR system.
We define a virtual world generator to be software that runs on a computer and that produces "another world", which could be either a recording of the real world or synthetic.
A human perceives the virtual world through each targeted sense organ using a display, which emits energy that is specifically designed to mimic the type of stimulus that would appear without VR.
The process of converting from the VWG internal storage into output for the display is called rendering.
A display need not be visual. For example, the display for the sense of hearing is often called a speaker.
If the VR system is effective, then we hope to "fool" the brain as in the image on the right.
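As a sketch of these ideas, a VWG's main loop alternates between updating the internal world state and rendering it for a display. The JavaScript below is a minimal illustration with made-up names (createWorld, updateWorld, renderFrame), not any real engine's API:

```javascript
// Minimal sketch of a virtual world generator's main loop (made-up names).
// The internal state is the "other world"; rendering converts that state
// into output for a display.
function createWorld() {
  return { time: 0, cube: { x: 0, y: 1.5, z: -3 } };
}

function updateWorld(world, dt) {
  world.time += dt;
  world.cube.x = Math.sin(world.time); // the cube sways side to side
}

// "Rendering": convert internal state into display output. A real renderer
// rasterizes lit, textured triangles; here we just describe the frame.
function renderFrame(world) {
  const { x, y, z } = world.cube;
  return `cube at (${x.toFixed(2)}, ${y.toFixed(2)}, ${z.toFixed(2)})`;
}

const world = createWorld();
for (let i = 0; i < 3; i++) {        // three simulated frames at 60 Hz
  updateWorld(world, 1 / 60);
  console.log(renderFrame(world));
}
```

The essential division of labor is the same whatever the display: the VWG maintains internal state, and rendering translates that state into stimulus for a targeted sense organ.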
World-Fixed versus User-Fixed Sound
Aural displays such as the surround sound system above are world-fixed in that the user moves around within the display.
Headphones, on the other hand, provide input directly into the ear as a sensory organ and so are user-fixed.
Some differences between world and user fixed systems are the following:
The stimulus is generated farther away in a world-fixed system than in a user-fixed system.
Because of the above, world-fixed systems usually require more power.
World-fixed systems tend to be less private than user-fixed systems.
On the other hand, world-fixed systems tend to be more comfortable over longer time-scales.
Several people can also more easily enjoy the same experience with a world-fixed system.
3D sound tends to be harder to achieve with a user-fixed system.
In the real world, moving your head affects the sound you hear; this is not so easy to accomplish in a user-fixed system without head tracking.
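To illustrate why head tracking matters for headphone audio, here is a deliberately crude JavaScript sketch of stereo panning (a stand-in for real spatial-audio techniques such as head-related transfer functions; stereoGains is a made-up helper). If the tracked head yaw is fed into the pan calculation, a world-fixed source stays put as the head turns:

```javascript
// Crude stereo panning sketch (not real spatial audio, which uses
// head-related transfer functions). stereoGains is a made-up helper.
// Angles are in radians; 0 means straight ahead; positive is to the right.
// Note this toy model cannot distinguish front from back.
function stereoGains(sourceAngle, headYaw) {
  const rel = sourceAngle - headYaw;   // source direction relative to the head
  const pan = Math.sin(rel);           // -1 = full left, +1 = full right
  // Constant-power pan law: equal gains when the source is dead ahead.
  return {
    left: Math.cos((pan + 1) * Math.PI / 4),
    right: Math.sin((pan + 1) * Math.PI / 4),
  };
}
```

With headYaw coming from a tracker, turning the head toward the source (headYaw equal to sourceAngle) re-centers it in both ears, which is exactly what a world-fixed speaker gives you for free.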
World-Fixed versus User-Fixed Visual Input
For video displays, an example world-fixed display is a cave automatic virtual environment (CAVE) as shown in the figure to the left above.
An example user-fixed, video display system could be a headset such as the Rift (as seen on the right above), Vive, or Go.
For a convincing experience, tracking is even more important for headsets than for headphones.
If your head moves in one direction, it is critical for the headset to rotate the scene in the opposite direction by precisely the correct amount to simulate what you would be looking at.
If tracking is not timely and accurate, users tend to experience VR sickness.
Even for CAVEs, you might need to do tracking and update the screens if your eye position within the scene changes.
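The counter-rotation idea can be shown with a yaw-only JavaScript sketch (worldToView is a hypothetical helper; real systems use the full 3D head pose and transformation matrices, but the principle is the same):

```javascript
// Sketch: a yaw-only view transform (worldToView is a made-up helper).
// When the head rotates by headYaw, the scene must be rotated by -headYaw
// so that world-fixed objects appear stationary.
function worldToView(point, headYaw) {
  const c = Math.cos(-headYaw);
  const s = Math.sin(-headYaw);
  // Rotate the world point about the vertical axis by the inverse angle.
  return { x: c * point.x - s * point.z, z: s * point.x + c * point.z };
}
```

With a convention of -z being straight ahead, an object one meter in front of you at headYaw 0 swings to the side of the view exactly as fast as the head turns the other way, which is why the rotation amount must be precisely correct.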
Quiz
Which of the following statements is true?
A-Frame extends basic HTML with custom tags.
JavaScript is a strongly typed language.
Rigid objects that can move through space have at most 3 degrees of freedom.
Hardware Components of a VR System
The hardware components of VR systems are classified as:
Displays: Devices that each stimulate a sense organ.
Sensors: Devices that extract information from the real world.
Computers: Devices that process inputs and outputs sequentially.
Components in Detail - Displays
Most CAVE systems use a combination of digital projectors and mirrors.
As flat-panel monitors have dropped in price, they are increasingly being used to replace projectors.
Headsets often leverage the latest LED display technology from the mobile phone industry.
Headsets nowadays typically provide between 1 and 2 megapixels per eye, with a 60 to 90 Hz refresh rate, depending on the headset.
Vergence-accommodation mismatch is a problem with some VR displays in which the eyes tire: in the real world, the eyes must change both focus and convergence for nearer objects, but in VR headsets they may only need to change convergence. This causes problems because the eye still wants to change focus.
Display technologies like light field displays and multi-focal-plane displays try to solve this problem.
For sound displays, classic speaker technology is often used.
Alternatively, bone conduction methods, which vibrate the skull, propagating the sound to the inner ear, might be used.
For the sense of touch, haptic displays as shown in the images above might provide pressure, vibrational, or temperature feedback.
Components in Detail - Sensors
We now consider what things a VR system needs to sense.
For visual and auditory user-fixed displays, we need to track the position and orientation of the targeted sense organ.
Orientation is usually tracked by an inertial measurement unit (IMU), which typically contains a gyroscope that measures the rate of rotation as three components of angular velocity.
The gyroscopes used in phones and headsets are commonly microelectromechanical systems (MEMS) (see figure above). These rely on vibrating structures along each axis and measure the extent to which those structures experience Coriolis forces to detect changes in orientation.
Angular velocity measurements are integrated over time to determine a cumulative change in orientation.
Drift error in this total change in orientation is corrected for by using additional sensors (also implemented in silicon) such as accelerometers (measuring forces of acceleration) and magnetometers (measuring magnetic field changes).
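A yaw-only JavaScript sketch of this dead reckoning shows both the integration and the drift that an uncorrected gyroscope bias produces (integrateYaw is a made-up name; real sensor fusion works in three dimensions and uses the accelerometer and magnetometer to estimate the bias):

```javascript
// Sketch: dead-reckoning orientation from gyroscope readings (yaw only).
// Each reading is an angular velocity in degrees per second; integrating
// over time accumulates the change in orientation -- and any sensor bias.
function integrateYaw(readings, dt, bias = 0) {
  let yaw = 0;
  for (const omega of readings) {
    yaw += (omega - bias) * dt;  // subtract the estimated bias, then integrate
  }
  return yaw;
}

// 1 second of a constant 90 deg/s turn, sampled at 100 Hz:
const readings = new Array(100).fill(90);
console.log(integrateYaw(readings, 0.01));   // close to 90 degrees

// With an uncorrected 1 deg/s bias, the estimate drifts by 1 degree
// every second -- small per sample, but it accumulates without bound:
const biased = readings.map(w => w + 1);
console.log(integrateYaw(biased, 0.01));     // close to 91 degrees
```

This is why the additional sensors matter: integration alone turns a tiny constant error into an ever-growing orientation error.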
Cameras are another important type of sensor. They often exploit line-of-sight visibility to track moving objects against a stationary background.
Cameras can be used to track eyes, heads, hands, whole bodies, or other objects in the physical world.
Depth cameras, such as the Microsoft Kinect, can do more accurate tracking in 3D; they often bounce infrared (IR) signals off objects and determine distance from the reflections.
Components in Detail - Computers
Computers are responsible for executing the code of the Virtual World Generator.
The location of the computers is less important for world-fixed systems.
For headset based systems, the computer can either be in the device itself (Go, smart phone based systems) or a PC attached to the device by a tether (Oculus Rift, HTC Vive).
In addition to the main computing processors, one typically also has graphics processing units which have been optimized for quickly rendering graphics to a screen.
Microcontrollers are frequently used to gather information from sensing devices and send it to the main computer using a standard protocol such as USB.
Above we see a teardown of a developer version of the Oculus Rift.
VR Software
A VR Engine is a software platform which allows users to provide high level descriptions of a VR environment and then determines the low-level details automatically to simulate this environment.
It plays roughly the same role that a game engine plays for developing video games.
At this point, there are no well-established VR engines.
A-Frame provides some of the features we would like from such an engine.
Typically, however, people either start with a basic Software Development Kit (SDK) from a VR headset vendor and develop a VWG from scratch, or choose a ready-made VWG geared to a particular domain, such as games, and make use of its VR extensions (this is the Unity 3D approach).
Software Components needed for a VR Experience
A VWG needs to maintain enough information about its internal reality, and process enough information about its external reality, so as to be able to calculate outputs for its displays.
This might involve keeping track of lights, textures, graphics models made of triangles, video, audio sources, and virtual world coordinates of these to represent its internal world.
It needs to also be able to match motions from the external world of the organism to internal motions in the virtual world.
Matched motions often involve a safe region in the external world, called a matched zone, in which the user is allowed to move and within which the virtual and real worlds perfectly align.
Since users typically want to move through the virtual world beyond the bounds of this zone, the VWG needs to handle locomotion over longer distances while, in the real world, the user moves only within the matched zone. This might involve various in-world ways to teleport from location to location.
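A minimal JavaScript sketch of this idea (virtualPosition and teleport are hypothetical helpers): the virtual position is the tracked real-world position plus an offset, and teleporting just changes the offset while the body stays inside the matched zone.

```javascript
// Sketch: locomotion beyond the matched zone (made-up helper names).
// The user's virtual position is their tracked position inside the
// matched zone plus an offset; teleportation changes only the offset,
// so the real body never leaves the safe region.
function virtualPosition(trackedPos, offset) {
  return { x: trackedPos.x + offset.x, z: trackedPos.z + offset.z };
}

function teleport(offset, from, to) {
  // Shift the offset so the user's virtual position jumps from -> to
  // while their tracked (real-world) position is unchanged.
  return { x: offset.x + (to.x - from.x), z: offset.z + (to.z - from.z) };
}
```

After a teleport, small physical steps within the matched zone still map one-to-one onto virtual motion around the new location, which is what keeps matched motions feeling natural.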
Objects in the virtual world will typically move, and so the VWG needs to be able to handle how to update objects, simulate their physics, detect if objects are colliding and if so how to handle it.
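As one small example of the collision-detection step, a VWG might approximate each object by a bounding sphere and test for overlap every frame (a common simplification, not the book's specific method):

```javascript
// Sketch: the simplest collision test a VWG might run each frame --
// approximate each object by a bounding sphere and check for overlap.
// Each sphere is { x, y, z, r } with center coordinates and radius.
function spheresCollide(a, b) {
  const dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  const d2 = dx * dx + dy * dy + dz * dz;  // squared center distance
  const r = a.r + b.r;
  return d2 <= r * r;  // overlap iff distance <= sum of radii
}
```

Comparing squared distances avoids a square root per pair, which matters when many object pairs must be tested every frame; real engines refine positive hits with tighter shapes afterward.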
Finally, we often want multiple people to be able to enjoy a VR experience together, so the VWG should be able to handle networking between different users and their interactions.
Human Physiology and Perception
Perceptual psychology is the science of understanding how the brain converts sensory stimulation into perceived phenomena.
It can inform how we design our VWG by allowing us to give more realistic answers to questions such as:
How far away does an object appear to be to the user (as opposed to how far away it actually is)?
How much video resolution is needed to avoid seeing pixels?
How many frames per second are enough to perceive motion as continuous?
Is the user's head appearing at the proper height in the virtual world?
Where is a given virtual sound coming from?
Why is this VR system making me throw up?
Why are some VR experiences more tiring than others?
What is presence?
Studying (1) the basic physiology of the human body, its sense organs, and neural pathways, (2) key theories and insights of experimental perceptual psychology, and (3) the interference between VR systems and our own perceptual processes can help us answer these questions.
Optical illusions
Optical illusions, such as the Ponzo illusion (two identical-length yellow bars in a one-point-perspective scene) or the checker-shadow illusion (two identically shaded squares, one supposedly in shadow and one not), illustrate that the brain does a variety of behind-the-scenes processing to determine depth and color.
This inference of what the real world must be like requires computational energy.
They also show that it is unwise to make the situation harder for a user by going against visual expectations.
Classification of Senses
Perceptions and illusions are not limited to the eye.
Above is a chart of the basic senses indicating for each sense the kind of stimulus it handles, the kind of receptors that are used to measure that stimulus, and the sense organ in which those receptors reside.
Notice each receptor is targeted to a particular kind of stimulus. This is called sensory system selectivity.
For example, in your eye you have around 100 million photoreceptors that target electromagnetic energy in the frequencies of visible light.
The least familiar organ in the above chart is the vestibular organ of the inner ear.
Most senses have engineering equivalents. For example, chemical sensors and pH meters can approximate taste, and pressure sensors can approximate touch.
We could, in theory, using such engineering equivalents record accurately all of our senses and replay them either later or in a different location (in say, a telepresence system).
Brains and Processing
Perception happens when the sense organs convert the stimuli into neural impulses.
These then go to the brain which is made up of 86 billion or so neurons.
The brain then processes these impulses.
About 20 billion neurons in your brain, those of the cerebral cortex, are dedicated to processing perception as well as many other high-level functions such as attention, memory, language, and consciousness.
In comparison, a round worm has 302 neurons, a fruit fly has 100,000, a rat has 200 million, and an elephant has 250 billion.
Only mammals have a cerebral cortex.
Another important factor in perception, and overall cognitive ability, is the interconnections between neurons.
In the human brain, each neuron is connected to around 7000 other neurons via synaptic connections.
If we imagine the brain as a graph, it would thus have about `10^15` edges.
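That estimate follows directly from the two numbers above: 86 billion neurons times roughly 7000 connections each gives about 6 x 10^14 synapses, which is on the order of 10^15 edges if each synaptic connection counts as one directed edge.

```javascript
// Back-of-the-envelope count of brain "edges", treating each synaptic
// connection as one directed edge in the brain-as-graph picture.
const neurons = 86e9;            // ~86 billion neurons
const synapsesPerNeuron = 7000;  // ~7000 connections per neuron
const edges = neurons * synapsesPerNeuron;
console.log(edges);              // 602000000000000, on the order of 10^15
```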
Hierarchical Processing in the Brain
Upon leaving the sense-organ receptors, signals propagate among neurons to eventually reach the cerebral cortex.
Along the way, hierarchical processing is performed.
Through selectivity, each receptor responds to a narrow range of stimuli across time, space, frequency, and so on.
After passing through several neurons, signals from numerous receptors are simultaneously taken into account.
This allows for increasingly complex patterns to be detected.
In the cerebral cortex, these signals from sensors are combined with anything else from our own prior experiences that may be relevant in interpreting the stimuli.
Perceptual phenomena, such as recognizing faces, occur here.
Together these combine to give a global picture of the world around us.