Digging Deep

My last blog post looked at some of the challenges of pushing a living system to change, since such systems often resist.  I suggested listening and blending as critical components, since they provide an understanding of the values that anchor the system.  In the Push Hands analogy, the drive to remain upright and “rooted” can sometimes cause a system to contort itself and make itself vulnerable.  People who cling to a particular vision or value too strongly and inflexibly can find that the work of clinging works against them.  A deeper understanding of where values and meaning come from allows you to root even deeper, surrendering short-term objectives in particular situations in order to maintain integrity and a larger picture.

Although I implied that it is important to learn the goals, objectives, and values of whomever or whatever system you engage with, it is also important that the listening go deep enough.  It takes time and work to explore a system, and it is tempting to stop as soon as a clear vision appears that can guide action.  Indeed, sometimes that is all we have time and energy for.  Nonetheless, deep listening is an important skill to practice so that it becomes easier and more efficient, because you will eventually have to reassess.  Values and goals change as one grows from child to teen to adult.  The environment is constantly changing, and the technological pressures I raised before are shifting the playing field so that goals have to be constantly redefined.

One advantage of spending the time to go deep is that systems tend to change less quickly as you dig into the lower layers.  History embeds itself into systems, whether as physical scars in tendons and ligaments, emotional scars of past traumas, or patterns of social behavior that keep returning to a nation or other self-identified group of people.  “Going deeper” is a process of probing different layers of time.

The Grand Canyon, carved out of the Colorado Plateau by the Colorado River

It is easy to see layers in the exposed canyon walls of Zion or the Grand Canyon.  These layers reveal a history of oceans coming and going, recorded in the sedimentation of sand, silt and corals that created sandstone, shale, and limestone.  We can find evidence of past erosion and geologic uplift.  To know the earth (where it has been and where it might go), we need to learn the implications of what we find in each layer and how it relates to other layers.  We want to imagine the different possibilities: what would make a layer thicker or thinner?  What if the weather or the composition of the atmosphere changes?  How about a change in the tilt of the earth’s axis or the rate of its precession?  These mental experiments help clarify the connections between the different processes that create geologic history.  They also reveal the assumptions inherent in the deductions we make from evidence, identifying what we believe is important for prediction and what we believe is unimportant.  For scientists, this process gives birth to the hypotheses we must test to establish the relevance and accuracy of our ideas.

For more information about the layers of rock in the Grand Canyon and how they formed, one online resource is at kaibab.org.  It describes some of the geologic processes that formed the different types of rock, as well as how we acquired that understanding.  Geology interfaces with meteorology, the layers of the atmosphere and the flows within them, which in turn connect to astrophysics: the flows of the solar wind, magnetic fields and the orbit of the planet.  Precession is the slow change in orientation of the rotation axis of a spinning body; for the earth, it creates a cycle of roughly 26,000 years in the movement of the stars in the sky, echoed in related myths from the solar ram to the celestial ocean.
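The 26,000-year figure follows from the rate at which the axis precesses.  A minimal sketch of that arithmetic, assuming a rate of about 50.3 arcseconds per year (an approximate textbook value, not one taken from this post).  The same rate gives the time the equinox point needs to drift across one 30-degree sign of the zodiac, the kind of slow shift that the myths above track.

```python
# Rough precession arithmetic. The rate is an assumed approximate value for
# Earth's precession (about 50.3 arcseconds of drift per year).
ARCSEC_PER_DEGREE = 3600
PRECESSION_ARCSEC_PER_YEAR = 50.29  # assumption: approximate published rate


def years_to_sweep(degrees: float, rate: float = PRECESSION_ARCSEC_PER_YEAR) -> float:
    """Years for the equinox point to drift `degrees` along the ecliptic."""
    return degrees * ARCSEC_PER_DEGREE / rate


full_cycle = years_to_sweep(360)  # one full turn of the axis
one_sign = years_to_sweep(30)     # one 30-degree sign of the zodiac

print(f"full cycle: {full_cycle:,.0f} years")
print(f"one sign:   {one_sign:,.0f} years")
```

Because the rate is only approximate, the result lands near, not exactly on, the round 26,000 years usually quoted.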

Neurons of the retina are organized in layers.

Our brains are also layered, in planes of associated neurons.  Neuroscientists have been exploring how these layers are constructed and how they combine to create the functions of perception and understanding.  In vision, light activates the rod and cone photoreceptors at the back of the retina, triggering shifts in the patterns of electrical response that propagate through bipolar neurons in the inner nuclear layer and then to ganglion cells in the ganglion cell layer.  These signals are relayed through the layered lateral geniculate nucleus of the thalamus to the layered visual neocortex.

Our understanding of how the layers of neocortex function grew out of the research of Hubel and Wiesel, beginning with their famous 1959 paper, Receptive fields of single neurones in the cat’s striate cortex.  Their work eventually earned them the Nobel Prize in Physiology or Medicine in 1981.  A readable account that puts their research in context is Recounting the impact of Hubel and Wiesel.  Their 1962 paper, Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex, describes how the “simple cells” of the visual (striate) cortex could take the center/surround contrast information encoded in the output of retinal ganglion cells and in turn create their own output, encoding edge orientation and location for a next layer of “complex” cells.  They also describe the processing steps of the “complex” and “hypercomplex” cells.
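Hubel and Wiesel proposed that a simple cell could gain its orientation preference by pooling the outputs of several center/surround cells whose receptive-field centers lie along a straight line.  The sketch below is a toy version of that idea, not their model's actual parameters: the 9x9 images, the 3x3 ON-center weights and the five pooled positions are all illustrative assumptions.

```python
# Toy feed-forward model in the spirit of Hubel and Wiesel (1962): a "simple cell"
# sums rectified center/surround responses whose centers lie along a vertical line.
# Image size, kernel weights and pooled positions are illustrative assumptions.


def bar(orientation, n=9, pos=4):
    """An n x n image containing a one-pixel-wide bright bar."""
    img = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if orientation == "vertical":
            img[i][pos] = 1.0
        else:
            img[pos][i] = 1.0
    return img


def on_center(img, r, c):
    """ON-center/OFF-surround unit: center pixel minus the mean of its 8 neighbours, rectified."""
    surround = [img[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    return max(0.0, img[r][c] - sum(surround) / len(surround))


def simple_cell_vertical(img, col=4, rows=range(2, 7)):
    """Pools center/surround units stacked vertically at `col`, so it prefers vertical bars there."""
    return sum(on_center(img, r, col) for r in rows)


for name in ("vertical", "horizontal"):
    print(name, simple_cell_vertical(bar(name)))
print("vertical, shifted", simple_cell_vertical(bar("vertical", pos=6)))
```

Each center/surround unit responds to a bar of either orientation passing through its center, but only a vertical bar at the right location drives all five pooled units at once, so the simple cell encodes both orientation and position.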
Note that although I present a serial description of the flow of information that implies functional layers, these cell types are not located in separate physical layers.

Comparison of the layers of visual cortex in different primates, from P. Balaram and J. H. Kaas, Frontiers in Neuroanatomy, 2014.

Mammalian neocortex usually has six layers.  The output from the eye, relayed through the lateral geniculate nucleus of the thalamus, is distributed to cells in layer IVc about 66 msec after stimulation.  A 1977 study by Sillito found “simple cells” in layers II, III, IV and V (mostly layer IV).  It identified three classes of “complex” cells: type 1, found in layers II, III and IV (mostly III); type 2, found in layers II-V (mostly III and IV); and type 3, found mostly in layer V.  “Hypercomplex” cells were found only in layers II and III.  Visual learners who want a better understanding of all this can visit the Brain.

In this case, time allows us to track the progression of information through ‘layers’ of the brain.  Whether in the layered retina or the layers of the neocortex, energy and chemical changes cascade in staged patterns that give insight into how we perceive, how we react, and the history of previous cascades, which may learn or get stuck in loops.

This understanding of the layered transmission of information has inspired the design of artificial neural networks, particularly convolutional neural networks.  Unlike programming paradigms in which the programmer specifies a set of instructions describing what pattern to look for and how to look for it, these networks learn to recognize patterns through exposure to hundreds of thousands of training examples, each accompanied by a simple target response that “supervises” training.  The target defines what is “good”.  These systems rely less on the intelligence of a specific programmer and more on the availability of big data sets to define what is desired.
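The contrast between programmed and learned behavior can be shown with the simplest supervised learner, a perceptron.  In this sketch (the logical AND/OR data sets and the learning rate are illustrative assumptions, far smaller than the data sets described above), the code never states which pattern to detect; the same training loop learns different behavior depending only on the targets it is given.

```python
# Minimal supervised learning: a perceptron adjusts its weights until its outputs
# match the target labels. Data sets (AND / OR) and learning rate are assumptions.


def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs plus bias is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(examples, lr=1.0, max_epochs=100):
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in examples:
            error = target - predict(weights, bias, x)  # the target defines what is "good"
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every example now matches its target
            break
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = list(zip(inputs, [0, 0, 0, 1]))
OR = list(zip(inputs, [0, 1, 1, 1]))

for name, data in (("AND", AND), ("OR", OR)):
    w, b = train_perceptron(data)
    print(name, [predict(w, b, x) for x, _ in data])
```

Swapping the target column is the only difference between the two runs, which is the sense in which the supervision, not the programmer, decides what the system learns.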

The convolutional neural net (CNN), when applied to vision, uses multiple kernels (small grids of weights) that are each mathematically convolved with the matrix of pixels forming an image, producing a filtered image, or feature map.  The weight given to each position within a kernel is adjusted based on the error between the prediction the network makes and the desired output defined by supervision.  Thus, through repetition, each filter is trained to better detect a feature that helps produce the desired output.
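A minimal sketch of both halves of that paragraph in plain Python: a 3x3 kernel slid across an image to produce a feature map, and gradient descent on a squared-error loss that adjusts each kernel weight.  As in most deep-learning libraries, the “convolution” here is implemented as cross-correlation (the kernel is not flipped).  To keep it short, the supervision is a target feature map produced by a hidden Sobel edge kernel rather than a class label; the random image, learning rate and step count are illustrative assumptions.

```python
import random


def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over every position where it fits."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def loss_and_grad(image, kernel, target):
    """Loss 0.5 * sum((output - target)^2) and its gradient with respect to each kernel weight."""
    out = conv2d(image, kernel)
    err = [[o - t for o, t in zip(out_row, t_row)] for out_row, t_row in zip(out, target)]
    loss = 0.5 * sum(e * e for row in err for e in row)
    # Weight (i, j) touched image[r+i][c+j] at every output (r, c), so its gradient
    # is the error-weighted sum of those pixels.
    grad = [[sum(err[r][c] * image[r + i][c + j]
                 for r in range(len(err)) for c in range(len(err[0])))
             for j in range(len(kernel[0]))]
            for i in range(len(kernel))]
    return loss, grad


rng = random.Random(0)
image = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(10)]
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # hidden "true" vertical-edge detector
target = conv2d(image, sobel_x)                  # supervision: the feature map we want

kernel = [[0.0] * 3 for _ in range(3)]           # start with a blank filter
learning_rate = 0.01
for step in range(1000):
    loss, grad = loss_and_grad(image, kernel, target)
    kernel = [[k - learning_rate * g for k, g in zip(k_row, g_row)]
              for k_row, g_row in zip(kernel, grad)]

print("final loss:", loss_and_grad(image, kernel, target)[0])
print("learned kernel:", [[round(k, 2) for k in row] for row in kernel])
```

Starting from all zeros, the learned kernel converges toward the hidden Sobel weights, since each update moves every weight in proportion to how much it contributed to the error.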

This process is described in detail in LeCun et al.’s classic 1998 paper, Gradient-based learning applied to document recognition, in which the authors applied these principles to recognizing the ten individual handwritten digits of the MNIST data set.  After training on a set of 60,000 examples, their “LeNet” achieved an error rate of less than 1%.  It used two layers of convolution filters, which allowed the network to build a second-level (complex) feature map from the first-level (simple) feature maps.  Other layers completed the network by connecting the outputs of the feature maps to a mapping of the desired output (that’s a one, that’s an eight).
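The layer sizes below are those reported for LeNet-5 in the 1998 paper: a 32x32 input, 5x5 convolutions and 2x2 subsampling.  This sketch only tracks how each stage shrinks the feature maps, from simple feature maps to more compact, more abstract ones; it does not implement the network itself.

```python
# Shape bookkeeping for a LeNet-5-style network, using the layer sizes reported
# by LeCun et al. (1998). Only map counts and spatial sizes are tracked.


def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, window):
    """Spatial size after non-overlapping subsampling."""
    return size // window


layers = [
    ("C1 conv 5x5", "conv", 5, 6),
    ("S2 subsample 2x2", "pool", 2, 6),
    ("C3 conv 5x5", "conv", 5, 16),
    ("S4 subsample 2x2", "pool", 2, 16),
    ("C5 conv 5x5", "conv", 5, 120),
]

size = 32  # input image is 32x32
shapes = []
for name, kind, k, maps in layers:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    shapes.append((name, maps, size))
    print(f"{name}: {maps} maps of {size}x{size}")

print("F6 fully connected: 84 units")
print("Output: 10 units, one per digit")
```

By C5 each map has shrunk to a single value, so the final layers are effectively fully connected, mapping features to the ten digit classes.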

Example from ILSVRC

More recently, in the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the competition tasks were to name and locate objects in photos, given about 1.2 million training examples like the one shown.  The “GoogLeNet” team won the classification task with a 22-layer convolutional network, attaining a 6.7% top-5 error.  They published their work in the article Going deeper with convolutions, which is freely available in an Open Access version.  For those who prefer a talk, I will be presenting on Deep Learning using the MXNet framework in Julia at CSUMB in April.
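ILSVRC classification results such as GoogLeNet’s are usually reported as top-5 error: a prediction counts as correct if the true label is among the five classes the network ranks highest.  A minimal sketch of that metric, with made-up scores for four images over seven hypothetical classes:

```python
# Top-k error: the fraction of images whose true label is NOT among the k
# highest-scoring classes. Scores and labels below are invented for illustration.


def top_k_error(scores, labels, k=5):
    misses = 0
    for image_scores, label in zip(scores, labels):
        ranked = sorted(range(len(image_scores)), key=lambda c: image_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


scores = [
    [0.10, 0.50, 0.05, 0.15, 0.08, 0.07, 0.04],  # true class 1 ranked first
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.04, 0.03],  # true class 4 ranked fifth
    [0.01, 0.04, 0.09, 0.20, 0.25, 0.30, 0.11],  # true class 0 ranked last
    [0.60, 0.10, 0.09, 0.08, 0.07, 0.04, 0.02],  # true class 0 ranked first
]
labels = [1, 4, 0, 0]

print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

The second image is a miss under top-1 scoring but a hit under top-5, which is why top-5 error is the more forgiving number when many object categories look alike.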

As computer scientists and companies like Microsoft, Facebook and Google delve more into “deep networks”, it is becoming clear that we can create progressively deeper networks that can recognize progressively more complex patterns.  These systems are already being used for speech recognition in phones, face recognition in cameras, and emotion recognition in texts and tweets.  The ubiquity of devices that can create large amounts of data dovetails with the training demands of these systems.  These trends may also converge with the tendency to feel overwhelmed with technology and modern life demands and a desire to have machines solve our problems for us.  People want cameras to automatically focus on a face and not a nearby bush.  They want to be able to talk to their computers and be understood, without having to go through the work of precisely defining what they want.

Some people want to focus their cameras manually.  Others want to specify directly how their computers operate.  However, in an increasingly complex and demanding society, fewer people will have the energy, knowledge and time to grow their own crops, clean their own house, complete their own taxes, maintain their own cars, or negotiate directly with neighbors who are polluting the air or hoarding the water.  We will become increasingly reliant on our technology.  As that technology begins to learn and respond to more details of who we are and what we want, it is incumbent upon us to be clear about what is important to us and how we define it.  Paranoid people might want guns or bombs to keep themselves safe, but those tools are not paranoid in themselves; only the way they are applied might be.  When tools begin to learn from example, however, it may be that paranoid people make paranoid systems, aggressive people make aggressive systems, and balanced people make balanced systems.  The paranoid systems would prioritize personal safety, aggressive systems would focus on how to exert more force, and balanced systems would attempt to link those systems with their opposing counterparts.

We need to supervise our deep networks, and to do so, we will need to look deeper into ourselves and our values and develop a deeper understanding of how we connect, survive, and thrive.  Digging into the past and learning the slower rhythms that compose our core is critical if we want our next generation of tools to support our sense of self and our humanity.