A thesis submitted to the School of Design, Carnegie Mellon University, for the degree of Master of Design in Design for Interactions.
May 2019
Spatial computing (VR, etc) reveals an expansive and underexplored possibility-space of interactions in which the physics subtending affordances and phenomena can itself be designed, rewarding novel approaches to interaction design.
Through reviewing literature and prototyping spatial interactions, this thesis explores how previously unencountered physical dynamics affect the development of familiarity with such systems, and identifies significant ways of representing external objects and the body itself, with an eye towards the larger goal of transformative tools for thought.
Drawing promising avenues from the convergence of sources in the literature, this thesis synthesizes research and investigates applications of interactional dynamics by designing prototypes of spatial interactions under different materialities and computed physics, gaining insight through direct engagement with novel spatial phenomena. These VR prototypes illustrate design considerations for newly accessible interactional and material unorthodoxies, identifying consequences and applications for embodiment spanning body-environment fusion (in depiction and interaction), multisensory integration, and high-dimensional dataset traversal.
Historically, the media and notations available in a time period have contributed to, and somewhat delineated, the set of available thoughts. As a medium, spatial computing, with its nuance of bodily input and output and its rigor of calculation, likely affords domains of thought and experience barely conceivable at present.
Spatial computing offers a unique and mostly unexplored design space in which the body can spatially engage with a vast parameter-space of novel, interactive, computed and simulated physics systems ("hyperphysics"). Since most of these computed phenomena diverge greatly from the laws of physics we grow up immersed in, there is opportunity to grow newly familiar with them through direct, embodied engagement.
The wide set of behaviors and coordinations of hyperphysical materials and structures enables not only powerful tools and external representations, but also novel representations and functions of the body in space. This is largely unexplored territory, as design has classically operated only on external structures. In spatial computing, the environment and the body are equally available to design, enabling new species of depiction and agency relevant to a spatial interface designer's conception of perception and neuroplasticity.
Through the course of this thesis, I worked primarily with Leap Motion hand tracking in VR, documenting how familiarity develops from direct, embodied engagement with novel hyperphysics, through a series of prototypes spanning body-environment fusion (in depiction and interaction), multisensory integration, and high-dimensional dataset traversal.
The process of design classically operated solely in the domain of the physical, affecting only material objects and systems. Only relatively recently did design begin operating on computed systems, involving visual structures depicted on screen rather than structures subtended by the material elements that give them their properties. That classical digital depiction was nevertheless a generally two-dimensional affair, with flat representations and comparatively limited user input (in the form of discrete buttons and exclusively planar mouse- or finger-movement). With the advent of modern motion-tracking systems, computers' depictive capabilities vastly expanded, gaining the ability to (1) provide coherent optical flow with precisely-tracked head movement and (2) represent the motion of the body in space, mapped to the tracked motion of limbs.
Here, now, came the ability to represent entire 3D environments and, critically, the subjective experience of being bodily immersed in an environment rather than the previous experience of manipulating control surfaces of a computer and seeing the effect isolated in a tiny fraction of the visual field.
Designers can now design immersive embodied experiences of different environments with different and arbitrary physical laws, whereas previously they were confined to everyday materials and the familiar set of physical laws subtending their behavior. The entirety of a designer's previous experience with immersive environments involved a singular set of physics, and now the design space is opened to entirely new classes of physics and phenomena, and thus new classes of affordances and designed objects.
The dichotomy here involves the dynamic between intuitions about the behavior of objects and their affordances, and the novel space of design that is opened up by spatial computing. Much modern design follows the trope that the best UI is as intuitive as possible, but how should this evolve within the domain of novel environments where no intuitions have yet been developed? An early answer might be to bring in, and therefore perpetuate, current UI and interactional grammars, as both users and designers are already intimately familiar with their dynamics and are already attuned to the structure and behavior of classical affordances. However, this serves to dilute the unique aspects of spatial computing, and replaces what can inherently support novel and expansive behaviors with a mere simulation of the mundane. This cycle, this inertia of interactional grammars, is re-established with each successive new medium. It has calcified, for example, the inefficient QWERTY keyboard layout well past the point at which typewriters would mechanically jam under any more optimal layout, and preserved the paper-based document model even when screens could represent 3D forms, opting for the prior and the familiar over the new.
My caution here is that, even presented with this supremely expansive and unexplored domain of spatial interaction design and physical phenomena in immersive computing, designers will hamper themselves and their users by perpetuating old UI mechanics, turning VR etc into a rough simulacrum of the constrained physical world rather than the means for its transcendence.
Further, since so many of the established UI norms are constrained to the 2D input mechanisms of computing systems previously described, their reapplication into spatial computing fails to fully leverage the immense nuance of physical input that motion tracking affords.
If every designer and craftsperson aimed to only make objects that were intuitive and immediately apparent, could virtuosity exist? Does virtuosity not arise through practice with high-dimensional-input artifacts, 'tuning' the brain and body to the specific dynamics of the available affordances? If we constrain our movements to the limited scope of previous media, we might make intuitive artifacts, at the expense of allowing users to develop new intuitions for the new spaces they are gaining access to. The draw to provide already-intuitive interfaces hampers the full exploration of interface-parameter-space. More critically, if designers reside only in bubbles of their current intuition, they constrain their conception of what can be designed, and what might be.
People build familiarity with ordinary materials and objects, the "interactional grammar" of physical affordances. This presents a challenge if computed environments can diverge from that familiarity, and users expect certain behaviors from the start that confine the designer's hand (and mind) to providing only what aligns with expectation. On the other hand, leveraging these expectations while selectively breaking them with confined novel behaviors provides opportunities to slowly wean users away from their ossifications.
· · · · · ·
The physics system that we find ourselves in at birth determines the nature of our bodies, the phenomena we encounter, and the affordances available in the environment. We become familiar with and develop intuitions about the 'interactional grammars' that we repeatedly come into contact with. Or, as Sutherland (1965) puts it, "We live in a physical world whose properties we have come to know well through long familiarity. We sense an involvement with this physical world which gives us the ability to predict its properties well." This is the default state that designers have operated within since antiquity.
With the advent of computer-rendered dynamic media, phenomena could be represented that diverged from the phenomena driven by the physical laws classically confining designed artifacts. This larger space of possible physical dynamics, of which the physics of our universe is but a subset, I refer to as hyperphysics. Since these phenomena are observed and interacted with by users who developed in ordinary physics, most users are presumably already attuned to the nuances of ordinary phenomena (or do "not enter devoid of expectations that come from their previous experience" (Blom, 2007)) and may be immediately aware of the similarities, recognizing that "content that is familiar to the user from the real world will be initially and automatically considered the same as a real object" (Blom, 2010). Or, as Golonka & Wilson (2018) state: "When we encounter a novel object or event, it will likely project at least some familiar information variables (e.g., whether it is moveable, alive, etc), giving us a basis for functional action in a novel context". The challenge is how to communicate hyperphysical affordances that have no exact analogues in ordinary physics.
This familiarity is built up via repeated exposure to consistent observed physical behavior, where covariance of stimuli unifies the parallel streams of input into singular percepts. Relevantly, this incentivizes designers to provide multiple sensory responses for a given phenomenon or user action, fleshing out the validity of the subjective experience. A difficulty, however, is that without coordination between designers across experiences, the preponderance of divergent interactional grammars and hypermaterial depictions might inhibit users from developing overarching familiarities.
· · · · · ·
We find much of mundane physics intuitive because we develop and spend our whole lives fully immersed in it. When the environment offers a consistent set of phenomena and consistent responses to input, the brain becomes accustomed to the perceived patterns and builds a set of intuitions about the phenomena. Piaget (1952) notes that "adaptation does not exist if the new reality has imposed motor or mental attitudes contrary to those which were adopted on contact with other earlier given data: adaptation only exists if there is coherence, hence assimilation." This consistency comes from the fact that ordinary physics does not change over time or location, and the perception of the unity of events arises from multiple senses receiving coordinated impulses. In Gibson's (1979) parlance, "when a number of stimuli are completely covariant, when they always go together, they constitute a single 'stimulus'". Piaget (1952), in noting that "the manual schemata only assimilate the visual realm to the extent that the hand conserves and reproduces what the eyes see of it", communicates the unification of tactile and visual sensory input: merely "the act of looking at the hand seems to augment the hand's activity or on the contrary to limit its displacements to the interior of the visual field."
The usefulness of an environment is a function of its physical capacities, and thus the expanded set of hyperphysics within simulated systems supports, in principle, a proportionally-expanded usefulness. Direct bodily engagement is possible not only with simulations of micro- and macroscopic phenomena, but even more esoteric and unorthodox phenomena not directly realizable within our universe's laws. This vastly expands the space of interaction design, and rewards open and explorative mindsets and design approaches. Our neuroplasticity enables us to attune ourselves to the nuances of whatever our senses happen to provide, and this expanded space of computer-mediated experience supports untold applications of that plasticity.
· · · · · ·
Concepts which never before had any visual representation can be shown, for example the "constraints" in Sketchpad. By working with such displays of mathematical phenomena we can learn to know them as well as we know our own natural world. (Sutherland, 1965)
We lack corresponding familiarity with the forces on charged particles, forces in non-uniform fields, the effects of nonprojective geometric transformations, and high-inertia, low friction motion. A display connected to a digital computer gives us a chance to gain familiarity with concepts not realizable in the physical world. (Sutherland, 1965)
It is fundamentally an accident of birth to have been born into ordinary physics, but the mind is in principle capable of becoming fluent in many other physics:
Our perceptions are but what they are, amidst all those which could possibly be conceived. Euclidean space which is linked to our organs is only one of the kinds of space which are adapted to physical experience. In contrast, the deductive and organizing activity of the mind is unlimited and leads, in the realm of space, precisely to generalizations which surpass intuition. (Piaget, 1952)
A key constraint then becomes the ability of designers to envision novel physics to then manifest, as
...computers are so versatile in crafting interactive environments that we are more limited by our theoretical notions of learning and our imaginations. We can go far beyond the constraints of conventional materials... (diSessa, 1988)
Hyperphysics supports novel behaviors that have no necessary analogue in ordinary physics. Thus the entire structural, visual, and dynamic "language" of ordinary affordances is inadequate to fully cover all possible transformations and behaviors that hyperphysics supports. Even fundamental material behaviors like collision are not in principle guaranteed. Dourish (2004) describes how collision can be an essential property for certain useful arrangements:
Tangible-computing designers have sought to create artifacts whose form leads users naturally to the functionality that they embody while steering them away from inconsistent uses by exploiting physical constraints. As a simple example, two objects cannot be in the same place at the same time, so a "mutual exclusion" constraint can be embodied directly in the mapping of data objects onto physical ones; or objects can be designed so that they fit together only in certain ways, making it impossible for users to connect them in ways that might make sense physically, but not computationally.
However, the greater space of possible physical behaviors offers opportunities to create new affordances with new interactional grammars that can take advantage of the specificity of computing power and the precise motion tracking of the body.
The body's relationship to tools is often quite fluid: prolonged use allows tools to be mentally fused with the body, and engagement with the world is perceived at the tool's interface with the world rather than the body's interface with the tool. Blind people can build a relationship with their cane such that "the cane is ... incorporated into [their] body schema and is experienced as a transparent extension of [their] motor system" (Heersmink, 2014). The opportunities for spatial computing are even more potent here, where the medium's capacities for tracking the body's motion allow a far closer mapping between the rendered environment's behavior and the user's motion than ordinary dynamic media, constrained to two-dimensional screens and rudimentary inputs, could offer.
The ability to depict the body in novel and hyperphysical ways, while still mapping the depicted body's movement to the base movements of the user, enables profoundly transformative computer interfaces such as increasing the number of limbs,
Participants could hit more targets using an avatar with three upper limbs, which allowed greater reach with less physical movement. This was true even though motions mapped from the participants' tracked movements were rendered in a different modality (rotation of the wrist moved the avatar's third limb in arcs corresponding to pitch and yaw). Use of more intuitive mappings might enable even faster adaptation and greater success. (Won et al, 2015)
or changing the physical form of the hands to better interface with a task, as explored by Leithinger et al (2014): "...we can also morph into other tools that are optimal for the task, while controlled by the user. Examples include grippers, bowls, ramps, and claws — tools with specific properties that facilitate or constrain the interactions". The question then becomes how many familiar aspects to include so as to conserve intuition, framed by Won et al (2015) as "...what affordances are required for people to use a novel body to effectively interact with the environment?", especially when "such realism may reinforce the user's desire to move as he or she would in the physical world." Though, critically, the brain's plasticity allows for novel environments to eventually become quite literally second-nature, as in the classic Heideggerian example of the hammer, articulated by Heersmink (2014): "When I first start using a hammer, my skills are underdeveloped and the hammer is not yet transparent. But gradually my hammer-using skills develop and the artifact becomes transparent which will then alter my stance towards the world."
Ideally, increased adoption of and bodily engagement with hyperphysics will provide us with new tools to understand and represent not only the world around us at scales heretofore inaccessible (as Sutherland (1965) envisions for subatomic particles: "With such a display, a computer model of particles in an electric field could combine manual control of the position of a moving charge, replete with the sensation of forces on the charge, with visual presentation of the charge's position"), but also purer forms of knowledge such as mathematical relationships, and will lift our minds to new heights as previous notations for thought have already done. Gooding (2001) articulates it well:
Computer-based simulation methods may turn out to be a similar representational turning point for the sciences. An important point about these developments is that they are not merely ways of describing. Unlike sense-extending devices such as microscopes, telescopes or cosmic ray detectors, each enabled a new way of thinking about a particular domain.
The sciences frequently run up against the limitations of a way of representing aspects of the world — from material objects such as fundamental particles to abstract entities such as numbers or space and time. One of the most profound changes in our ability to describe aspects of experience has involved developing new conceptions of what it is possible to represent.
As the scale and complexity of problems experienced by humanity grows, it is critical to augment our problem-solving ability, a large part of which involves the creation of new forms of representation, ideally giving us a better grasp on the most fundamental questions. Gooding (2001), again, articulates it well:
But the environment is increasingly populated by artefacts which function as records and as guides for reasoning procedures that are too complex to conduct solely with internal or mental representations. In this way we are continually enhancing the capacity of our environment for creative thought, by adding new cognitive technologies.
These tools are still in their infancy, and only through an open exploration of the frontiers of their possibility-space will we find the most powerful means to augment our intellect.
I aim to frame simulated environments, such as those found in virtual reality, as comparable to (or on the same spectrum as) notations (like mathematical notation) in that they are "environments" with rulesets that can be internalized. This is a framing I haven't explored completely, and I aim to use this section as an attempt to assay its consonance with other areas of interest within my overall thesis, namely the behavior of hyperphysical affordances and the opportunities for embodiment with simulated objects.
Our bodies have aspects that afford certain approaches to the world. By default these are determined by our physiology, and then circumstance edits that phenotype, augmenting our bodies with environmental objects that can be embodied. We are provided at birth with a body (itself an environmental object bound to classical physics) that we gain facility in maneuvering, and that we feel identified and embodied with. When I use the term internalize, I mean to evoke this ability of the body and brain to gather information about the dynamics and behavior of objects sensed from the world via observation and, crucially, direct manipulation and physical engagement, such that the brain restructures to better model the perceived phenomena ("schema") as a sort of internal model. This is the development of familiarity, the transition from explicit guiding commands for limbs and attention to a more automatic, fluent engagement.
Objects in the world can be found or fashioned and incorporated into the body, and that now-changed body encounters the environment in different ways. Critically, as the environment is encountered repeatedly, the (perhaps newfound) capacities of the body collide and interface with the environment, giving the user/owner simultaneous opportunities to internalize the dynamics of their body and the dynamics of the environment (particularly in the ways the environment becomes newly accessible or perceivable through the body's new capacities gained by embodied object augmentation).
The available elements within the environment need not be exclusively materials to be manipulatable. I see the manipulation of elements on/with
pages (as with algebraic notation),
screens (as with the interactional "grammar" of a certain software), or
materials (as with the pattern of operation/manipulation of beads on a soroban/abacus)
as being the manipulation of what the brain treats as an internally-coherent environment whose rules and parameter space can be explored and learned.
To take algebraic notation as an example, its spatial, modular structure of coefficients, variables, and operators has specific rules the user must follow when rearranging elements to maintain equality and mathematical truth. Crucially, the spatial operativity of modern algebraic notation engages the user in ways decidedly unavailable with prior notation-attempts. Earlier notations could describe the same mathematical relationships: the paragraphic notation prevalent in Ancient Greece articulates the mathematics accurately, yet it is unavailable to the kind of spatial rearrangement through which algebraic notation engages spatial intuition. As a tool for thought, it does not afford the manual manipulation of elements through which algebraic notation lets the user explore the system. It is this manual operativity that I see as a quality of explorable environments, whether manifested notationally on static paper, dynamically on screens, or with spatially-interactable objects (be they material or simulated).
These notations are, in a sense, internally-coherent environments created by humans, able to be partially inhabited (through the affordances of their supporting medium, classically though perhaps too often paper) by the body and thus the mind. The most powerful thing about some notations is that their ruleset is more internalizable: their mode of operation can become purely mental, no longer requiring the initially-supporting medium. Their internalization scaffolds a mental model or simulation of that "environment's" ruleset in the same way that we develop a mental model (or models) of our classical environment's laws, our brains able to simulate hypotheses and, without even desiring to, pursue causal chains so ingrained in our set of expectations that it doesn't even feel like thinking or analysis, but something far more direct.
I now wonder if these schemata, these mentally-internalized models of experienced environments (be they classically spatial or more notational), form a sort of Gibsonian ecology in our own minds that, via repeated engagement, arranges itself into alignment with our external circumstances, whatever they may be (this is where I see the hyperphysics of simulated, virtual environments entering into relevance). Might this be relevant for the development of expectation, preparedness, and familiarity? I've wondered how Gibson treats prediction, as that does seem to require a sort of internal model or representation independent of current sense data (though at basis directly dependent on prior sense impressions).
This power of the brain, to plastically incorporate objects into itself when given enough time to wield them, as it learns to wield the genetically-provided object of the body, becomes especially powerful when the objects to wield and embody have a range of behaviors beyond what classical physics allows, as is the case with computer-depicted-and-simulated objects interactable in VR. This connects back to my framing of notations as alternate "environments", with the key difference that the rules for (for example, paper-based) notation are maintained and carried forward by human mastery of that ruleset, and the environment breaks into inaccurate depiction if the rules are forgotten or a single operation is performed incorrectly, whereas the computer is ostensibly rigidly locked into self-accuracy, not to mention capable of orders of magnitude greater depth of ruleset simulation through digital computation.
This greater range of possible behaviors boggles the mind, which makes the job of the designer difficult, and the exploration of the parameter space of possible "universes" of behavior rulesets to find the most useful (and embodiable) simulated objects/phenomena will be a cultural, likely generational project.
A role of many designers has involved tool design within the classical physics of our lived environment. As computers became ascendant as tools, their simulating ability allowed the design of phenomena (UI) that could behave in ways other than classical physics, specifically allowing novel tools for thought and thus novel ways of situating and scaffolding the mind. However, the depictive media (e.g. screens) available to represent computed phenomena were too often exclusively two-dimensional, with only two-dimensional input, failing to leverage the nuanced spatial facility of the body. Now there exist computing systems capable of tracking the motion of the head (and thus orientation within possible optical arrays) and the motion of any prehensile limb, capable of simulating three-dimensional phenomena and providing a coherent and interpretable optical array as if the user were themself present amidst the simulated phenomena.
Critically, a role of the designer no longer purely involves the design of phenomena within physics, but has come to also encompass the design of the physics themselves, exploring how different system-parameters can sustain different phenomena, different notations, and thus new modes of behavior, productivity, and thought.
In the following sections I am going to discuss prototypes (available at github.com/graycrawford/masters-thesis-VR) I built to investigate the development of familiarity with novel embodied interactions.
Raymarching signed distance fields (SDFs) is a method of rendering 3D shapes without using polygons. Each object is defined by the distance function of its geometric primitive, each contributing to a shared surrounding distance field, and an isosurface is rendered at a given radius r away from each primitive, visually fusing any objects that come within 2r of each other. This property produces very organic forms, where any collision smoothly joins the objects into a melted, singular mass.
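As a rough sketch of the blending at work (illustrative Python rather than the shader code used in the prototype; the function names, blend constant, and radii are assumptions), a smooth-minimum combination of two sphere distance functions is enough to produce the fusing described below:

import numpy as np

def sphere_sdf(p, center, radius):
    # Signed distance from point p to a sphere: negative inside, positive outside.
    return np.linalg.norm(p - center) - radius

def smooth_min(a, b, k=0.02):
    # Polynomial smooth minimum: blends two distance fields so nearby surfaces
    # merge into one rounded isosurface instead of intersecting sharply.
    h = max(k - abs(a - b), 0.0) / k
    return min(a, b) - h * h * k * 0.25

def scene_sdf(p, thumb_tip, index_tip, r=0.012):
    # Combined field for two fingertip spheres; a raymarcher renders the zero
    # isosurface of this function, so the spheres fuse as the fingers pinch.
    return smooth_min(sphere_sdf(p, thumb_tip, r), sphere_sdf(p, index_tip, r))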
I had seen this technique used for external objects, but never for the rendering of the hands themselves, and I suspected it might be quite compelling.
After placing two raymarched spheres on my thumbtip and index fingertip, upon pinching, the spheres smoothly transitioned from separate objects to a rounded hourglass to a single ellipsoid. I added the other eight fingertips and populated the world with a sphere and a couple of cylinders to observe raymarched fusion with external objects. This was immediately mesmerizing, and changing the effective isosurface radius transformed my hands from separate spheres, overlapping only within close proximity, into a singular doughy mass in which the underlying proprioceptive motion remained legible, if slightly masked.
I added spheres for the rest of my finger joints and knuckles, and found that it felt slightly more dynamic to only include the joints that I could move separately. My knuckles weren't contributing to the prehensility and only added to the lumpiness's visual extent, so I removed them.
I envisioned that this rendering technique might allow hands wherein the UI was fused with or emitted out of the body directly, elements stored in the palm until their activation.
I used a torus as my palm, as it leaves a circular hole that a spherical UI element could fit in. Upon activation when the sphere floats above the palm, the torus offers negative space behind the sphere which provides extra visual contrast, heightening the appearance of the floating UI. By rising above the palm, the sphere delineates itself from its previously-fused state, spatially and kinetically demonstrating its activeness and availability. This materiality prototype operates more as a wireframe, as a chance to engage with the dynamics of these species of meldings without an immediate application. The sphere is pokable and pinchable, perhaps the type of object that could be pulled away from its anchor and placed somewhere in space (expanding into a larger set of UI elements).
On my right hand, instead of a prehendable object, I wished to see how something closer to a flat UI panel might behave amidst the hand. To remain consistent, I again chose the torus as the palm, and embedded a thin disk in its center that, when the palm faces me, rises a few centimeters above the palm. While docked, the restrained real estate of the torus again gives the panel breathing room, such that the pair do not, in their fusing, expand to occupy a disproportionate volume. In its current implementation, the panel remains the same size throughout its spatial translation. Future development will change its size during translation such that in its active state it is much larger and might perhaps be removable, existing apart from the hand as a separate panel.
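A minimal sketch of the "rises when the palm faces me" behavior, assuming access to the tracked palm normal and a vector from the palm toward the headset (the names, threshold, and rise distance are illustrative, not the prototype's Unity implementation):

import numpy as np

def palm_facing_amount(palm_normal, palm_to_head):
    # 1.0 when the palm points directly at the viewer, 0.0 at or past 90 degrees.
    n = palm_normal / np.linalg.norm(palm_normal)
    v = palm_to_head / np.linalg.norm(palm_to_head)
    return max(np.dot(n, v), 0.0)

def panel_height(palm_normal, palm_to_head, max_rise=0.05, threshold=0.7):
    # The docked disk stays fused in the torus palm until the palm turns toward
    # the viewer, then rises smoothly to a few centimeters above it.
    facing = palm_facing_amount(palm_normal, palm_to_head)
    if facing < threshold:
        return 0.0
    return max_rise * (facing - threshold) / (1.0 - threshold)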
These experiments begin to touch on this novel materiality, and point at ways that UI might be stored within the body, perhaps reinforcing an eventual bodily identification with the UI itself. Further, the ways that grabbed objects fuse with the hand mirrors how the brain assimilates tools into its body schema, and begins to more directly blur the line between user and tool, body and environment, internal and external.
It is common in mundane physics to act upon objects directly or via intermediating structures that mechanically extend us. Thus we become familiar with relying on our bodies and tools to interact with external structures. However, computers can simulate hyperphysics that support action at an (apparent) distance, the simulation itself being the subtending structure, allowing mediation not only across physical movement but across any pairing of parameters. How might this capability affect the perception of causality between events and of user agency?
In the simple case of a hyperphysically simulated, purely movement-based causal link, the computer keeps track of every detail of the interaction, translating the movement of the acting object to the affected object even if the objects are not mechanically in contact as we would expect in mundane physics (of course magnetism and gravitation act at a distance, but I refer to more common human-scale interactions). This leads to hyperphysical interactions without any necessary or apparent visually intermediating structures, potentially confusing users who expect consistency with mundane physics.
When phenomena are driven by body motion, and those mappings grow increasingly distant or slight, what is the sensation when discerning the connection between our proprioception and the visual structures?
For example, a direct 1:1 mapping of a sphere's position to the palm's position allows immediate identification with the driven structure, as visual movement is immediately correlatable with proprioception, even if the structure differs from our physiology. However, a mapping of [body part location :: sound frequency or amplitude] is not a sensorially-equivalent mapping of like with like; the agency is still apparent (provided the scaling makes the sound changes discernible), but that recognition only falls out of direct engagement and perception of the effects.
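A minimal sketch of such a cross-sensory mapping, assuming a tracked palm height in meters (the ranges, frequencies, and function names are illustrative):

def clamp01(t):
    return max(0.0, min(1.0, t))

def palm_height_to_frequency(palm_y, y_min=0.8, y_max=1.6, f_min=110.0, f_max=880.0):
    # Maps palm height onto an audible frequency range. The mapping is causal and
    # reversible, but the output is non-spatial: agency must be discovered by
    # moving and listening rather than by visual correspondence.
    t = clamp01((palm_y - y_min) / (y_max - y_min))
    return f_min + (f_max - f_min) * t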
It can become increasingly difficult to identify one's self with the observed phenomena as the amount of abstraction between the input body data and the output environmental behavior increases, as seen in Dust Bodies. The Unity VFX Graph particles conform to the signed distance field of my hand, and the resulting visuals vary greatly though are nevertheless causally mapped to my bodily motion. Some particles lie so close to the hand SDF that they are drawn essentially 1:1 with my hand's surface, being immediately correlatable and thus identifiable as me, whereas further-flung particles receive weaker forces drawing them towards the SDF and enter ever-changing, decaying orbits that somewhat mask the structural basis for their motion.
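A rough sketch of the kind of distance-weighted attraction at play (illustrative Python, not the VFX Graph implementation; the force law, constants, and the assumption that the nearest surface point is known are all placeholders):

import numpy as np

def attract_to_surface(position, velocity, nearest_surface_point, dt, strength=4.0, falloff=6.0):
    # Particles near the surface receive strong forces and track the hand nearly
    # 1:1; distant particles receive weak forces and drift in loose, lagging orbits.
    to_surface = nearest_surface_point - position
    dist = np.linalg.norm(to_surface)
    if dist < 1e-6:
        return position, velocity
    force = strength * np.exp(-falloff * dist) * (to_surface / dist)
    velocity = velocity + force * dt
    return position + velocity * dt, velocity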
Whereas the previous particle systems were reactive on a massive scale, they were unable to collide with each other. In the search for a more physically reactive system, I discovered NVIDIA's Flex particle simulation library for Unity. Flex allows for many thousands of colliding particles, and as long as they are all the same radius, they can be meshed into flexible fabrics, enclosed volumes, rigid- and soft-bodies, or remain free-flowing fluids.
Without a strict plan of attack, I began placing particle emitters on my head and hands, any place that offered me manual control over the placement of the particles in realtime. I immediately found the physical dynamics captivating, spending multiple hours tuning the available parameters. As the particles were emitted from my palm and came to collide with a flat surface I erected, I changed the parameters of the simulation to greatly increase friction, causing the particles to bunch up immediately upon colliding with each other or the surface, behaving more like a highly viscous goo.
As mentioned in the Coherence and Coordination of Phenomena section, simultaneous stimuli across the senses reinforce each other to form more concrete perceptions of given phenomena.
Physical controllers such as the Oculus Touch provide haptic feedback with embedded vibration motors, and their haptics are used very successfully in many VR experiences. Valve's Longbow, in The Lab, provides haptics when the user draws the bowstring back, replicating the subtle clicks of tensioning a string. When combined with stereo audio feedback spatialized from the location of the string, the visual feedback of the string, and the proprioception of the pulling finger, these sensory channels fuse into a relatively rich perception of a bow. Curiously, sensations not directly provided by the VR output surfaces can be somewhat provoked or confabulated out of the perceptions from other senses. The combination of sense inputs from Longbow also serves to hint at the sensation of tension between the arms, forces a real bow is able to impart but that are impossible with the provided technology. Similarly, passing the hands through the stack of physically-reactive cloths in my 2018 VR music piece Strata has, for myself and others independently, produced the sensation of cobwebs or some light tactile texture, even though there is only visual indication of the dynamics and materiality of the physical structures.
Spatial, embodied familiarity can also be applied to the control and exploration of non-spatial structures, such as those that exist purely mathematically, with no explicit physics applied to objects with spatial extent. Though these structures are not spatial in the sense that our bodies and minds experience mundane space (or most spatial computing experiences), the spatial body is still tied to the control of the computer generating them, so a comparable embodied familiarity is available, unique in its divergence from the body's motion's conventional unification with explicitly spatial outcomes.
BigGAN and High-Dimensional Space
In the midst of this thesis a new image generation technique was published: BigGAN. This species of neural network, when trained on a massive dataset of photographs, comes to generate strikingly photoplausible images via the coadaptation of two opposing (adversarial) networks. The "generator" network attempts to produce, from scratch, images that increasingly appear as though they were taken from the training dataset, while the "discriminator" network attempts to increasingly accurately differentiate the generated images from the ground-truth dataset images. The discriminator's successes and failures at telling the two apart are backpropagated through both networks, adjusting the mathematical weights amidst the neurons so that, over time, the generator and discriminator attune themselves to increasingly subtle visual and structural characteristics present in the original dataset.
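A toy sketch of this adversarial loop (nothing like BigGAN's scale or architecture; the network sizes, optimizer settings, and the random stand-in "dataset" are all placeholders), showing that it is the discriminator's judgments, rather than pixel differences, that supply the training gradients:

import torch
import torch.nn as nn

z_dim, img_dim = 16, 64
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=32):
    # Stand-in for a batch of training photographs, flattened to vectors.
    return torch.rand(n, img_dim) * 2 - 1

for step in range(1000):
    real = real_batch()
    fake = G(torch.randn(real.size(0), z_dim))
    # The discriminator learns to label real images 1 and generated images 0.
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # The generator learns to make the discriminator label its images as real.
    g_loss = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()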
In practice, BigGAN (as trained by Google on fourteen million photographs from ImageNet, those neural weights made freely available) can be made to generate unique images ranging from the startlingly realistic, to the curious and absurd, to the extremities of human conceivability. BigGAN develops a structure which organizes the visual patterns relevant to identifying the provided categories of images into clumped localities where, for instance, all the lion-categorized images occupy a locally adjacent space, the more similar lion images themselves clumping together more tightly, and so on. A different clump far distant would contain all the images similar to, for instance, bedrooms. And a spot equidistant between those two clumps would contain a smooth interpolation of visual structure and color, intermixing bedroom and lion.
This structure, or "embedding", has one thousand dimensions / sliders / parameters controlling the relative influence of each of the one thousand image categories chosen by Google to train on, taken from ImageNet.
BigGAN takes a second type of input, called the z-vector: a 128-dimensional vector with 128 components / sliders / parameters. The z-vector primarily affects compositional and structural aspects of the image, whereas the 1000-dimensional category vector primarily affects its colors and textures. BigGAN creates an image when it receives the full 1128-dimensional vector that is the coordinate of the specified image.
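A minimal sketch of assembling such a coordinate (the category indices, weights, and the biggan_generate call are hypothetical placeholders, not the real model's API):

import numpy as np

NUM_CLASSES, Z_DIM = 1000, 128

def make_coordinate(class_weights, z):
    # Concatenates a 1000-d category vector and a 128-d z-vector into the
    # 1128-d coordinate that specifies a single BigGAN image.
    class_vector = np.zeros(NUM_CLASSES)
    for index, weight in class_weights.items():
        class_vector[index] = weight
    return np.concatenate([class_vector, z])

z = np.random.randn(Z_DIM) * 0.4                       # small structural variation
coordinate = make_coordinate({291: 0.5, 564: 0.5}, z)  # an even blend of two categories

# A real system would pass the two halves to the pretrained generator, e.g.
# image = biggan_generate(coordinate[:NUM_CLASSES], coordinate[NUM_CLASSES:])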
I collaborated with fellow student Aman Tiwari (CMU, 2019) over the course of one month to create an interactive BigGAN explorer controlled by body motion entitled Xoromancy. By surfacing fourteen components of the z-vector to be controlled by the motion of the hands, seven per hand, Xoromancy provides opportunities for proprioception to sync with the visual output from BigGAN, increasing the rate of exploration and reinforcing the perceived agency over and unity with the dynamics of the output.
Though the body classically builds fluency with spatial, physical interactions, Xoromancy demonstrates that the spatial extent of the body can be tied to explicitly non-spatial outputs while nevertheless retaining the species of familiarity-development common to spatial interactions, as the output responds in concert with specific body motions.
The central challenge of Xoromancy's interaction design involved the mapping between input movements and output dimension modulation. We agreed early on to use the Leap Motion hand sensor as the input source, and though there is a massive set of possible orientations and contortions that the hand can mechanically undergo, the choices for structuring the input/output mapping are far fewer, in part due to the limitations of Leap Motion's tracking. The final mapping, for each hand, assigns translational movement along the x, y, and z axes to three components of the z-vector. The rotation of each hand is mapped to four more components via the 4D quaternion decomposition of 3D rotation, giving Xoromancy one extra z-vector dimension of control per hand over what a classical Eulerian xyz decomposition would allow.
Work had to be done to tune the scaling factor of each transform to fit within the range of z-vector component magnitudes that produced coherent output imagery. Outside of a certain range, z-vector values produce visuals that initially seem to break out of the (unchanged) category vector, intermixing subjects, before extreme z-vector values dissolve into harsh visual artifacting.
The choice of the seven dimensions per hand was most constrained by the desire/requirement that each dimension be traversable orthogonally to any other — that is, that no action to modulate a given vector component would necessarily modulate any other, that they be independent unless otherwise desired. For example, one could rotate one's hand (affecting the quaternion-controlled dimensions) without affecting the dimension mapped onto translation along the y-axis.
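A minimal sketch of this per-hand mapping, assuming tracked palm positions and orientation quaternions from the Leap Motion (the scaling factors, clamp limits, and function names are illustrative, not the Xoromancy implementation):

import numpy as np

def hand_to_z_components(position, quaternion, position_scale=1.5, rotation_scale=2.0, limit=2.0):
    # Seven components per hand: three from translation along x, y, z and four
    # from the orientation quaternion (w, x, y, z). Values are clamped to the
    # range that still produces coherent BigGAN imagery.
    components = np.concatenate([np.asarray(position) * position_scale,
                                 np.asarray(quaternion) * rotation_scale])
    return np.clip(components, -limit, limit)

def hands_to_z(left_hand, right_hand, z_dim=128):
    # Writes the fourteen hand-driven values into the first fourteen slots of
    # the 128-dimensional z-vector; the remaining components stay fixed.
    z = np.zeros(z_dim)
    z[:7] = hand_to_z_components(*left_hand)
    z[7:14] = hand_to_z_components(*right_hand)
    return z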
Attempting to map all 128 z-vector components onto the hand is a near-impossibility given current hand-tracking fidelity, and much work needs to be done to explore what the maximum number of orthogonal dimensions mappable onto the hand's set of contortions might be. Conversations with computational artist Memo Akten illuminated the possibility of treating the high-dimensional space of all comfortable, mechanically valid conformations of the hand as a subsidiary, embedded "manifold" within the space of all possible 3D arrangements of the 20-odd joint vertices (ignoring mechanical limitations). A neural network could be trained to project between this mechanically-valid manifold and the 128- or 1128-dimensional space of BigGAN images, optimizing for maximum visual difference of output per given input movement. Such an approach highlights the size of the design space once neural network optimization is used to descend towards mappings that might be impossible to arrive at manually, but that a computer would have little trouble balancing.
Upon mapping hand motion in those fourteen total dimensions to the first fourteen components of the z-vector, the nuance possible was immediately apparent. Although there is no explicit meaning behind the ordering or operation of each z-vector component, once each component is mapped to a body movement, merely moving the body instantly displays how the visual form is controlled. Visual patterns of growth or sliding or transformation or inversion are apparent, and are reinforced when the proprioceptive awareness of the hands' movements is correlated with the visual output. Since hand movements are reversible, and the visual feedback reflects that reversal in realtime, coordinates in BigGAN's "latent space" can be coherently returned to.
Xoromancy is distinct amongst the other prototypes in that 1. it is not VR-based, and 2. it is the only one deployed into the world and tested beyond myself, having had multiple public exhibitions.
First premiering at CMU's Frame Gallery, February 22-24, 2019, Xoromancy enabled the audience to explore curated BigGAN images. Participants had a range of reactions: some used exclusively gross movements, whereas others were quite meticulous and measured in their BigGAN traversal. Multiple participants expressed the sense that they were "mastering" it over time, which matched my and Tiwari's personal experience, where repeated engagement gave opportunities to increasingly attune oneself to the intricacies of the BigGAN model.
Xoromancy premiered to the general public at New York Live Arts for their 2019 Live Ideas festival from May 8-22.
Xoromancy was also accepted to IEEE-GEM [Games Entertainment & Media] Conference 2019 as Xoromancy: Image Creation via Gestural Control of High-Dimensional Spaces (Tiwari & Crawford, 2019), and showed from June 19-22 at Yale University.
· · · · · ·
BigGAN's latent space, while not being literally a "space" containing physical dynamics or affording embodied movement within, is nevertheless a space that the body and mind can grow direct, embodied familiarity with, provided that some species of motion tracking enables mapping between the body and the output. Qualitatively distinct experiences of this latent space are possible when the latent space is itself attached to you, where the only visible results of your motions are explicitly non-embodied and non-spatial, instead being 2D images that change in deeply nuanced ways that coherently reflect your motions.
The scale of the GAN latent space situates GANs and other generative neural networks as a pivotal new opportunity for spatial, embodied interaction, where the wide ranges of motion of the body can interface with such nuanced and nigh-inexhaustible visual content. As machine learning continues its inexorable climb into greater relevance, I predict many rendering methods will come to incorporate it.
Though Xoromancy is not a spatial representation, I see no reason why entire coherent stereo optical arrays can't in principle be generated by neural networks in realtime, with all of the capabilities that current spatially-computed renders offer. This opens up an entirely new class of hyperphenomena, and deserves much investigation.
Though varied in form and implementation, my prototypes are the products of an approach to exploring spatial computing that provokes questions surrounding the nature of the body and its relationship with computed phenomena. As the field of spatial computing develops and evolves, new methods of generating sensory phenomena are developed and become available, in effect continually widening the available hyperphysical design space.
Many of my prototypes stemmed from seeing Twitter posts showcasing or announcing new rendering methods such as BigGAN or Unity's Visual Effect Graph. Implementing them in an embodied context produced novel results and provided insights that I would likely not have arrived at had I only used my imagination. At the moment, Twitter houses the majority of the field's public conversation around spatial computing, and the visually satisfying novelty of creative reapplications serves to amplify the collective awareness of the more expansive possibilities. As a unified learning and sharing platform, it accords quite well with the dynamics of the field's burgeoning medium.
By evaluating my prototypes solo and in real time, I was able to iterate rapidly. Assisted by runtime editing in Unity, this often took the form of associating a hyperphysical structure with the location or movement of my hand, tweaking parameters in the simulation as I actively modulated the structure with my tracked hand's movements. This expressivity of bodily motion brought out nuances of physical reactivity that I might not have arrived at had I only watched the reactivity rather than having been its causal source.
What I explored was a minute section of the available exploration space, nevertheless revealing many insights into new dynamics within spatial computing. It is my hope that designers operating in this field take opportunities to experiment with novel materialities and mappings, as collective effort to discover and map hyperphysical frontiers furthers the field and deepens the collective understanding. Even more critically, the direct, embodied experience of these new phenomena gives a nuanced understanding that is impossible to fully convey with paragraphs, so in many ways the best method of propagating these understandings is through the creation and sharing of the experiences themselves.
The framing of hyperphysics urges designers to conceive of the classical physical and computational context as only one of a massive set of possible physics, ripe for exploration. Critically, the body's relevance in hyperphysical exploration cannot be overstated. Within spatial computing, the representation and behavior of the body is as available to designers as any other computed structure, and this heralds a transformation in how we perceive and conceive of our bodies and our relationships with the external.
Many invariants taken for granted when designing in the physical world fall away when opened up to the computer to simulate, and this space of design supports a range of phenomena that we haven't had access to before, requiring methods to explore and assess the available design space.
The series of prototypes produced in this thesis investigates the opportunities of representing the body with novel hypermaterials that can undergo transformations impossible for our physiology, and methods of mapping bodily motion into computed environments, enabling