See also: IBM Systems Journal paper

Things That Talk

by V. Gerasimov and W. Bender

Sound is explored as an inter-device communication medium. Considerations include robustness in various environments, potential interference, frequency limitations of conventional and piezoelectric devices, computational complexity, and strategies for ultrasonic and human audible frequencies. Algorithms include modem protocols, data hiding techniques, impulse coding, and dual tone modulation.

Why does a phone ring not tell your computer who is calling? Why does your wrist-watch alarm not signal your oven to turn off? Many commonplace devices and appliances contain information that can be shared with others. While many of these devices have mechanisms, e.g., audio or video, to transfer information to people, they are deficient in their ability to communicate information that can be important to other devices. For example, almost all modern electronic devices have internal clocks, but are incapable of automatically transferring the time and date settings from one device to another. The "Things That Talk" research focuses on the use of sound to enable device-to-device communication.

Table of Contents

About Sound

Sound in device-to-device communication

Sound signals and the environment

Communication Protocols

Research results

Medical concerns


Future work

Sound samples


About authors

About sound

Sound is radiant energy transmitted in longitudinal waves that consist of alternating compressions and rarefactions in a medium. The maximum possible frequency of sound in any medium is about . The speed of sound depends on the properties of the material through which it travels (see Table 1). Sound has propagation features quite different from those of light, which make it an excellent alternative in situations where light does not work well. For example, because sound travels slowly, the Doppler effect is strong enough to be used for motion detection of the relatively slow-moving objects around us.

Many animals use high-frequency sound to detect the position and speed of objects in their environment (see Table 2). This method of orientation is called echolocation. It is especially useful for animals such as bats, which live in dark caves and need to hunt for insects at night, and whales, which spend their lives in turbid water. Humans have not overlooked this property of sound. In medicine, low-power ultrasonic waves (300 kHz) are used in place of X-rays to image internal body structures. Focused ultrasound can cause localized heating and destroy bacteria or tissue without harmful effect on surrounding tissue. Ultrasonic (100-16,000 MHz) microscopy [1] provides better resolution than optical microscopy, and enables the investigation of a wide range of material properties in amazing detail. An important advantage of ultrasonic microscopes is their ability to distinguish between different parts of a live cell by viscosity. Also, because sound has minimal effect on the function of live cells, ultrasonic microscopy is used to study living cells. Finally, ultrasonic sonars are widely used for underwater exploration.

Table 1. The speed of sound in various media
Medium             Speed (meters/second)
Air at 0°C         331 (increases with temperature and pressure)
Water              1400-1600 (increases with temperature, salinity, and depth)
Lead               1210
Rolled aluminum    5000

Table 2. Sound frequencies used for echolocation by animals [1]
Species    Frequency range       Wavelength range
Whales     50-200 kHz (water)    7-32 mm
Bats       20-80 kHz (air)       4-11 mm

Table 3. Frequency range of hearing [1]
Species                Low (Hz)    High (Hz)
Humans                 20          20,000
Cats                   100         32,000
Dogs                   40          46,000
Elephants              16          12,000
Bats                   1,000       150,000
Grasshoppers           100         50,000
Rodents                1,000       100,000
Whales and Dolphins    70          200,000


In contrast to these specialized imaging applications, sound in computers and communications applications has been limited to music, simple sound effects, and speech synthesis and recognition. (Notable exceptions include dual-tone multifrequency (DTMF) signaling used in telephony and frequency-shift keying (FSK) used in modems and facsimile machines. In fact, modems and faxes are pseudo-acoustic devices and cannot effectively communicate through air. They work well only with electric signals and require a wire to isolate the system from external disturbances.) Sound may be a good alternative to optical computer vision; for example, rather than trying to recognize people by imaging their faces or fingers in visible light, sonar technology can be used to measure unique body parameters, such as volumes, shapes, and locations of different organs.

Many electronic devices in and around the man-made environment are designed to speak and/or listen. Many devices have a speaker or beeper to relay status to a human. Some of these devices, such as telephones or faxes, have microphones. Piezoelectric beepers in wristwatches can also be used as microphones.[2]

This paper explores the use of sound as an inter-device communication medium. Considerations include robustness in various environments, potential interference, frequency limitations of conventional and piezoelectric devices, computational complexity, and strategies for ultrasonic and human audible frequencies.


Sound in device-to-device communication

Sound has many positive features that make it an attractive alternative to infrared light (IR) and radio frequencies (RF) for device-to-device communication. Humans (and most of the other living beings) use sound to communicate. Therefore, sound is a likely choice wherever people live or work. The human environment is usually designed to localize and preserve sound, and in such environments, sound is more predictable than IR and RF. Sound, unlike IR and RF, does not have unexpected foes such as sunlight, rain, metal objects, etc. Sound can be hidden from people if its frequency is higher than 20,000 Hz (see Table 3). IR usually requires direct visibility, and does not work in sunlight or bright interior light. Also, IR has some unexpected features. For example, it propagates through materials that are not transparent in visible light, and reflects from different materials than visible light. RF suffers from interference problems, and can be blocked by metallic objects. Sound does not have these problems. It travels around corners, and can still be localized within a room.

However, one feature of sound makes it difficult to analyze: sound travels slowly (about 330-340 meters/second in air; see Table 1). This creates several problems: (1) A signal reaches its target with a noticeable delay; and (2) Reflections of the signal from walls or other objects can be comparable in intensity to the original signal, and can reach the receiver with different delays, interfering with the main signal. (Due to the slow speed of sound, an echo may create long disturbances of the signal, making it necessary to use relatively long coding packets.) Other problems include the non-linear characteristics of most audio equipment that tend to shift signal phase and environments that have very different and practically unpredictable acoustic characteristics. Since the environment has little effect upon the relative amplitude of a sound within narrow bands of frequency, spectral analysis is a reliable method of acoustic signal analysis in unknown environments. These relative amplitude changes of the spectral components of a transmitted signal can be detected by a receiver.

Audible sound and lower-frequency ultrasound have similar features. Many small speakers and microphones used in electronic devices perform reasonably well at frequencies up to 30,000 Hz. The frequency limit of piezoelectric devices is much higher. This means that the same devices can use ultrasound whenever inter-device communication is not relevant for people and might be distracting or disturbing.


Sound signals and the environment

Sound signals have several convenient mathematical representations. Each representation splits the received sound signal to a function of the original signal plus noise:

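$$ r(t) = \int_{0}^{\infty} g(x)\, s(t - x)\, dx + n(t) $$

Equation 1. Here $s$ is the transmitted signal, $r$ is the received signal, $g$ is the impulse response of the environment, and $n$ is the noise (the symbol names $r$, $s$, and $n$ are assumed).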

A received signal can be represented as a sum of the convolution of the transmitted signal with an impulse response of the environment and noise [3] (Equation 1). The lower bound in the convolution integral can be set to zero, since the system does not predict the future. Unfortunately, whenever air or any objects in a given environment move, the parameters of the system change, and as a result, the impulse response may change in time. Therefore, g(x) depends on t, and it cannot be calculated at the source and then used to analyze or predict the received signal. However, in any environment that can be used for acoustic communication, g(x) has to decrease exponentially as x tends to infinity. Also, the integral of g(x) over the whole axis must be less than or equal to 1.

The noise component in Equation 1 represents both the deviation of the system behavior from the regular convolution formula due to non-linearities and all other sources of sound in the environment. Noise is uncontrollable and unpredictable.

An impulse function gives the most complete description of an environment. Unfortunately, numerical methods that can find an impulse function and deconvolve a signal are not yet well developed, and modern computers are not yet fast enough to process signals at that level of abstraction in real time.

It is possible to use a simpler representation of the received signal. The original signal can reach the receiver either directly or after reflection from one or more surfaces. Therefore, the received signal can be represented as a sum of the directly received signals; all reflections (echoes) of the original signal that reach the receiver; and noise:

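$$ r(t) = \sum_i a_i\, s(t - \tau_i) + n(t) $$

Equation 2. Here $a_i$ is an energy reduction coefficient and $\tau_i$ is the corresponding delay of a reflected or directly received signal; $s$ is the original signal, $r$ the received signal, and $n$ the noise (symbol names assumed).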

Equation 2 has some useful features that help to explain why spectral analysis works for this kind of task. If the original signal is a sine wave, the sum of all reflections of the original signal is also a sine wave of the same frequency [4] (Equation 3). The amplitude of the reflected signal is proportional to the amplitude of the original signal, while its phase is hard to predict. In the worst case the resulting amplitude can be exactly zero. This happens if the receiver (microphone) is located exactly at a node of the interference pattern produced by the wave in the environment. Though it is theoretically possible, the probability of this occurring in a real environment is extremely low.

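$$ \sum_i a_i \sin\bigl(\omega (t - \tau_i)\bigr) = A \sin(\omega t + \varphi) $$

Equation 3. Here $\omega$ is the angular frequency of the original sine wave, $a_i$ and $\tau_i$ are the energy reduction coefficients and delays of Equation 2, and $A$ and $\varphi$ are the resulting amplitude and phase (symbol names assumed).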
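
The claim that echoes of a sine wave add up to a sine wave of the same frequency can be checked numerically. The following is a minimal sketch, not part of the original work: the echo coefficients and delays in `ECHOES` are hypothetical values chosen only for illustration. It sums the echoes as complex phasors and lets the result be compared with a direct sample-by-sample sum.

```python
import cmath
import math

def echo_sum_phasor(freq_hz, echoes):
    """Return (amplitude, phase) of sum_i a_i * sin(2*pi*f*(t - tau_i)).

    `echoes` is a list of (a_i, tau_i) pairs; the values are illustrative.
    """
    omega = 2 * math.pi * freq_hz
    # A delayed sine a*sin(omega*(t - tau)) corresponds to the phasor
    # a*exp(-j*omega*tau); phasors of equal frequency simply add.
    total = sum(a * cmath.exp(-1j * omega * tau) for a, tau in echoes)
    return abs(total), cmath.phase(total)

def echo_sum_direct(freq_hz, echoes, t):
    """Evaluate the sum of delayed sines directly at time t (seconds)."""
    omega = 2 * math.pi * freq_hz
    return sum(a * math.sin(omega * (t - tau)) for a, tau in echoes)

# Hypothetical room: a direct path plus two reflections.
ECHOES = [(1.0, 0.003), (0.6, 0.011), (0.3, 0.024)]
AMPLITUDE, PHASE = echo_sum_phasor(1470.0, ECHOES)
```

The direct sum equals `AMPLITUDE * sin(omega * t + PHASE)` at every `t`, so only the amplitude and phase of the tone change, never its frequency.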

Any signal can be broken down into a sum of sine waves. The spectrum of the signal gives the amplitudes of those waves. According to the previous results, all components of the spectrum of the received signal are proportional to the corresponding components of the spectrum of the source signal. Therefore, the spectrum of the original signal can be estimated using the spectrum of the received signal.

The Fourier transform allows the system to calculate the spectrum of a signal. However, if the sender uses only a few frequencies to encode data, there is not much sense in calculating the whole spectrum. It is sufficient to calculate only the Fourier coefficients that correspond to the coding frequencies.
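
As a rough illustration of computing only the coefficients that matter, the sketch below evaluates the squared Fourier amplitude of a block of samples at a single frequency. The function name `tone_power` and the example frame (120 samples at 44,100 samples/second) are assumptions made for this example.

```python
import math

def tone_power(samples, freq_hz, sample_rate):
    """Squared magnitude of the Fourier coefficient of `samples` at `freq_hz`."""
    re = im = 0.0
    step = 2 * math.pi * freq_hz / sample_rate
    for k, x in enumerate(samples):
        re += x * math.cos(step * k)
        im += x * math.sin(step * k)
    return re * re + im * im

SAMPLE_RATE = 44100
# Example frame: a 1470 Hz tone, exactly four waves in 120 samples.
FRAME = [math.sin(2 * math.pi * 1470 * k / SAMPLE_RATE) for k in range(120)]

power_at_1470 = tone_power(FRAME, 1470, SAMPLE_RATE)  # large
power_at_735 = tone_power(FRAME, 735, SAMPLE_RATE)    # close to zero
```

A receiver that expects a handful of coding frequencies needs one such calculation per frequency, instead of a full transform.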

Since sound reflects differently from different objects and propagates rather slowly, a sharp change in the original signal spectrum will cause several changes in the received signal spectrum. The receiver can correctly analyze the signal only after a certain amount of time known as the settle time. Therefore, a system can work only if either coding frames are longer than the average settle time, or it has an adequate echo tolerance reserve. Although the detectable echo tail in a medium-size room can be as much as several seconds long, the strongest echo components are the first reflections from large surfaces, e.g., walls, floor, ceiling, furniture, etc. The strongest echo components in a room usually have delays only up to several hundredths of a second.


Communication Protocols

Acoustic computer communication requires development of special protocols. The protocols used in modems, faxes, telephones, and remote controls are designed for mediums other than sound. However, all those protocols can be modified to work reliably with acoustic signals.

Modem protocols. An obvious first step in this research is to adapt existing fax/modem protocols for acoustic transmission. The fastest modem protocols exploit almost all of the properties of existing telephone lines. They rely on phase information, and they do not tolerate multiple echoes. Less sophisticated modem protocols (up to 2400 bps) are less dependent upon specific telephone-line characteristics and can be modified for use with through-the-air communication.

The audio signal stream in low-speed modem protocols consists of a sequence of short coding frames. The transmitter generates several predefined frames that must be recognized by the receiver. All of the coding frames have the same length. Different coding frames must have different spectral characteristics to be distinguished by the receiver [3, 5]. Coding frames can contain either a single frequency, or a mixture of several frequencies. A simple technique used in modems is FSK, which uses single-frequency frames. The FSK frequencies are chosen so that there is a whole number of waves per frame and the frames can be easily distinguished by the receiver.
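
The whole-number-of-waves rule can be sketched as follows. This is an illustrative binary FSK encoder, not a real modem standard: the frame length of 120 samples, the sampling rate of 44,100 samples/second, and the choice of two and four waves per frame are all assumptions.

```python
import math

def fsk_frequency(cycles_per_frame, frame_len, sample_rate):
    """Frequency (Hz) that fits exactly `cycles_per_frame` waves into one frame."""
    return cycles_per_frame * sample_rate / frame_len

def fsk_frame(cycles_per_frame, frame_len):
    """One single-frequency coding frame containing a whole number of waves."""
    return [math.sin(2 * math.pi * cycles_per_frame * k / frame_len)
            for k in range(frame_len)]

def fsk_encode(bits, frame_len=120, cycles_for_zero=2, cycles_for_one=4):
    """Concatenate one frame per bit (binary FSK, assumed parameters)."""
    signal = []
    for bit in bits:
        cycles = cycles_for_one if bit else cycles_for_zero
        signal.extend(fsk_frame(cycles, frame_len))
    return signal

signal = fsk_encode([0, 1, 1])
```

Because both frames contain a whole number of waves, their sample-by-sample product sums to zero over a frame, which is what lets a per-frequency filter tell them apart.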

Usually, after traveling through the air, the signal is blurred because of echoes. A technique for circumventing this echo problem is to increase the frame length beyond the average echo settle time. Echo tolerance can also be increased by adjusting the coding frequencies used in adjacent frames.

Data-hiding protocols. Bender et al. [6] suggest several possible techniques for adding recoverable inaudible data to a host acoustic signal. Spread-spectrum and phase encoding are useful for digital or high-quality analog transfers. One of their suggested techniques, echo coding, preserves hidden data after the sound has traveled by air.

Impulse protocols. To send data, IR remote controls use a sequence of light bursts with different delays between them. This method can be used with sound. An acoustic remote control can send short impulses with pauses between them. This protocol is easy to implement and requires minimal processing to recover the data. But it cannot deliver high continuous data transfer rates.

Touch-tones. Touch-tones used to dial telephone numbers and send information among telephone nodes work fine in air, but are slow and subject to noise. However, it is a ready-to-use method for computers to communicate with telephones and faxes. Recognition and generation algorithms are easy to implement. Each touch-tone code is a mixture of two tones chosen from a set of eight basic tones. Table 4 lists the standard telephone codes. The digits, "*", and "#" are used to dial numbers and "A"-"D" are used as special signals.

A personal computer (PC) sound board can be programmed to communicate with touch-tones. Patterns generated for each touch-tone code can be sent to the sound driver when the user presses the corresponding buttons. The recognition algorithm uses eight digital filters to detect basic tones. The touch-tones work well if they are at least 20 ms long. Thus, the maximum reliable data transmission speed that can be achieved using touch-tones is about 200 baud (50 tones per second, each carrying four bits). The standard touch-tones are 70 ms or longer, which is above an average echo settle time. Therefore, computers, telephones, and faxes can communicate reliably using touch-tones. The set of tones used in this protocol can be expanded to achieve higher data transfer rates.

Table 4. Standard touch-tone codes
          1209 Hz    1336 Hz    1477 Hz    1633 Hz
697 Hz    1          2          3          A
770 Hz    4          5          6          B
852 Hz    7          8          9          C
941 Hz    *          0          #          D
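
Touch-tone generation and recognition with eight filters can be sketched as below. This is a minimal illustration rather than the authors' program: the 8,000 samples/second rate, the equal tone amplitudes, and the function names are assumptions, and the detector simply picks the strongest row tone and the strongest column tone without any noise threshold.

```python
import math

ROW_FREQS_HZ = [697, 770, 852, 941]
COL_FREQS_HZ = [1209, 1336, 1477, 1633]
KEYPAD = ["123A", "456B", "789C", "*0#D"]

def dtmf_tone(key, duration_s=0.07, sample_rate=8000):
    """Samples of the two-tone mixture for one key of Table 4."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if len(key) == 1 and col >= 0:
            break
    else:
        raise ValueError(f"not a touch-tone key: {key!r}")
    step = 2 * math.pi / sample_rate
    return [0.5 * math.sin(ROW_FREQS_HZ[row] * step * k)
            + 0.5 * math.sin(COL_FREQS_HZ[col] * step * k)
            for k in range(int(duration_s * sample_rate))]

def tone_power(samples, freq_hz, sample_rate):
    """Squared Fourier amplitude of `samples` at one frequency."""
    re = im = 0.0
    step = 2 * math.pi * freq_hz / sample_rate
    for k, x in enumerate(samples):
        re += x * math.cos(step * k)
        im += x * math.sin(step * k)
    return re * re + im * im

def dtmf_detect(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone using eight filters."""
    row = max(range(4),
              key=lambda i: tone_power(samples, ROW_FREQS_HZ[i], sample_rate))
    col = max(range(4),
              key=lambda i: tone_power(samples, COL_FREQS_HZ[i], sample_rate))
    return KEYPAD[row][col]
```

For example, `dtmf_detect(dtmf_tone("7"))` recovers the key from a clean 70 ms tone; a practical receiver would also need a level threshold to reject silence and noise.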


Research Results

The Things That Talk research is an exploration of different data encoding schemes for acoustic data transmission. In order to verify robustness of the protocols, several working prototype systems were built that use sound to communicate.

Echo coding. In the echo coding experiments, a one-bit modulation was used. An echo added to a signal corresponds to one and subtracted from the signal corresponds to zero:

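$$ y(t) = s(t) + (2b - 1)\,\alpha\, s(t - \tau) $$

Equation 4. Echo modulation. Here $\tau$ is the echo delay; $\alpha$ is the echo/signal ratio; $b \in \{0, 1\}$ is the encoded bit; $s$ is the host signal and $y$ is the modulated signal (symbol names assumed).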

If the synthetic echo should change its sign abruptly, any change from zero to one or from one to zero will result in a click. A way to avoid these clicks is to increase and decrease the echo component linearly at either side of each frame.

There are several methods of detecting an echo with a given delay. The most robust methods require a Fourier transform of the signal and an analysis of its spectrum. Consequently, these methods need a great deal of processing power. Since one goal of this research is to use sound in "ordinary" devices, it is desirable to find methods of only modest computational complexity. A simplified echo coding method that requires only one multiplication per sample was developed:

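$$ F = \sum_{k=0}^{N-1} y(kT)\, y(kT - \tau) $$

Equation 5. Echo detection. Here $T$ is the sampling period; $\tau$ is the echo delay; $N$ is the number of samples used by the filter; $y$ is the received signal and $F$ is the filter value (symbol names assumed).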

When a modulated signal is fed to the filter, the filter value can be presented as a sum of two components:

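$$ F = \sigma\alpha \sum_{k=0}^{N-1} s^2(kT - \tau) + \sum_{k=0}^{N-1} \bigl[ s(kT)\,s(kT - \tau) + \sigma\alpha\, s(kT)\,s(kT - 2\tau) + \alpha^2 s(kT - \tau)\,s(kT - 2\tau) \bigr] $$

Equation 6. Filtering of a modulated signal. Here $\sigma = +1$ for an encoded one and $\sigma = -1$ for an encoded zero; $\alpha$ is the echo/signal ratio; $\tau$ is the echo delay; $T$ is the sampling period; $N$ is the number of samples used by the filter; $s$ is the original signal (symbol names assumed).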

The first component is a sum-of-squares of the original function, and the second component is a sum-of-products of the original function at different points. The detection of echo is based on the fact that the absolute value of the first component is usually greater than the absolute value of the second component. The sign of the filter output is defined by the encoded bit.
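
The echo modulation and the one-multiplication-per-sample detector described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the host signal is seeded pseudo-random noise, the frame length, delay, and echo/signal ratio are arbitrary example values, and the linear ramp that suppresses clicks at frame edges is omitted.

```python
import random

def echo_encode(signal, bits, frame_len, delay, ratio):
    """Add (bit 1) or subtract (bit 0) a delayed copy of the host signal."""
    if len(signal) < len(bits) * frame_len:
        raise ValueError("host signal is too short for the given bits")
    out = []
    for n in range(len(bits) * frame_len):
        sign = 1.0 if bits[n // frame_len] else -1.0
        delayed = signal[n - delay] if n >= delay else 0.0
        out.append(signal[n] + sign * ratio * delayed)
    return out

def echo_decode(received, n_bits, frame_len, delay):
    """Recover each bit from the sign of sum(y[n] * y[n - delay]) over a frame."""
    bits = []
    for i in range(n_bits):
        start = i * frame_len
        acc = 0.0
        for n in range(max(start, delay), start + frame_len):
            acc += received[n] * received[n - delay]  # one multiply per sample
        bits.append(1 if acc > 0 else 0)
    return bits

# Hypothetical host signal: seeded white noise, 2000 samples per bit.
rng = random.Random(1)
BITS = [1, 0, 1, 1, 0, 0, 1, 0]
host = [rng.gauss(0.0, 1.0) for _ in range(len(BITS) * 2000)]
coded = echo_encode(host, BITS, frame_len=2000, delay=40, ratio=0.5)
decoded = echo_decode(coded, len(BITS), frame_len=2000, delay=40)
```

With a noise-like host signal the sum-of-squares term dominates, so the sign of each frame's sum follows the encoded bit; a host signal with pauses or strong periodic components would weaken that margin.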

The program does not analyze the original signal, and, therefore, may have poor results if the signal contains pauses or resonance frequencies. It is necessary to use an error correction code and to verify check-sums to support reliable data transfers.

Experiments with echo coding indicated that this method can give positive results at speeds of up to 100 baud, but it requires too much computational power to implement a robust real time communication protocol. Although the simplified echo detection algorithm is fast enough to work in real time on 8 MHz computers, its noise tolerance and signal independence leave something to be desired. The algorithm can reliably recover only about ten bits/second. However, since the coding frame size is longer than an expected echo tail in a room, this method does not degrade due to echoes in the environment.

The principal advantage of the echo coding technique is that it adds information to any acoustic signal in the audible spectrum (speech, music, noise), so that people do not clearly distinguish the communication between devices. One possible use of echo coding is to mix a message for people with a message for devices.

Echo coding has three problems. First, silence has no echo; therefore the echo coding technique will not work well with signals that contain pauses. Silence can be avoided by adding noise to the signal, but this has the potential of spoiling the quality of the sound. An ultrasonic tone can be added to the signal, but receiving ultrasound requires special equipment. Second, speech and music may have fragments that resemble an echo. For example, if the original signal is a sine wave and the chosen echo delay is equal to an integer number of periods of the signal, an echo-like fragment results. Last, echo coding does not completely hide the data signal since humans can detect even very weak echoes.

Frequency-shift keying and amplitude modulation. Two protocols based upon variants of FSK and amplitude modulation were developed. The first protocol uses frames of two frequencies to transmit four bits simultaneously. 64-byte packets are sent as a synchronous sequence of four-bit frames, demarcated by a hail frame at the start of each sequence. The receiver, upon recognizing the hail signal, attempts to synchronize with the incoming data. Once synchronization is achieved, 64 bytes of information are transmitted. The system uses a partial Fourier transform to identify frequency patterns in the received signal. The second protocol uses an amplitude-modulated 18.4 kHz sound signal. Again, a hail signal is used for synchronization. Data is sent as a set of one-bit frames. Both protocols are implemented as part of Sonicom, a device-to-device communications prototype that uses a sound board to send and receive signals. Sonicom transfers data at speeds of up to 1.4 kbaud (see Tables 5 and 6).

Table 5. Sonicom two-frequency protocol. Technical data.
Transmitter output    1 channel, 8-bit, 44,100 samples/second
Receiver input        1 channel, 16-bit, 44,100 samples/second
Frame size            120 samples: ~2.72 ms
Coding frequencies    735 Hz, 1470 Hz, 2205 Hz, 2940 Hz, 3675 Hz, 4410 Hz
Hail frequency        5512.5 Hz
Encoding              4 bits/frame
Transmission speed    1470 baud

Table 6. Sonicom ultrasonic protocol. Technical data.
Transmitter output    1 channel, 8-bit, 44,100 samples/second
Receiver input        1 channel, 16-bit, 44,100 samples/second
Frame size            30 samples: ~0.68 ms
Coding frequency      18,375 Hz
Hail frequency        5512.5 Hz
Encoding              1 bit/frame
Transmission speed    1470 baud
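
The frame durations and transmission speeds in Tables 5 and 6 follow from the sampling rate. As a worked check, ignoring the hail frames and reading the quoted baud figures as bits per second:

```latex
\begin{align*}
\text{two-frequency:}\quad
  & \frac{120\ \text{samples}}{44{,}100\ \text{samples/s}} \approx 2.72\ \text{ms},
  & \frac{44{,}100}{120}\ \text{frames/s} \times 4\ \text{bits/frame} &= 1470\ \text{bits/s} \\
\text{ultrasonic:}\quad
  & \frac{30\ \text{samples}}{44{,}100\ \text{samples/s}} \approx 0.68\ \text{ms},
  & \frac{44{,}100}{30}\ \text{frames/s} \times 1\ \text{bit/frame} &= 1470\ \text{bits/s}
\end{align*}
```

Both protocols therefore reach the same raw speed: the ultrasonic frames carry a quarter of the data but are a quarter as long.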

The Sonicom transmitter program generates and holds samples for each coding frame. The two-frequency protocol uses two tones to encode four bits. Each frame consists of either two different tones mixed together or silence (see Figures 1 and 2).

In order to define sixteen different frames (four bits), the system uses six different frequencies. There are 15 frames composed of pairs of frequencies (the number of ways to choose two of the six). The sixteenth frame is silence.
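
The count of sixteen frames can be reproduced with a short sketch. The order in which four-bit values are assigned to frequency pairs, and the choice to send the high half of each byte first, are assumptions made for illustration; the source does not state the actual assignment.

```python
from itertools import combinations

CODING_FREQS_HZ = [735, 1470, 2205, 2940, 3675, 4410]

# 15 unordered pairs of the six coding frequencies, plus silence (None)
# as the sixteenth frame. The ordering is an assumed, hypothetical mapping.
FRAME_TABLE = list(combinations(CODING_FREQS_HZ, 2)) + [None]

def frame_for_nibble(value):
    """Map a 4-bit value (0-15) to a frequency pair, or None for silence."""
    return FRAME_TABLE[value]

def bytes_to_frames(data):
    """Split each byte into two 4-bit frames, high half first (assumed)."""
    frames = []
    for byte in data:
        frames.append(frame_for_nibble(byte >> 4))
        frames.append(frame_for_nibble(byte & 0x0F))
    return frames

packet_frames = bytes_to_frames(bytes(range(64)))
```

A 64-byte packet maps to 128 data frames under this scheme.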

Figure 1. A packet is comprised of the hail signal and 16 different coding frames.

Figure 2. The same packet, after traveling approximately 1 meter through air, includes noise and echo added by the environment

Figure 3. Sonicom data transfer and receive methods.

A hail frame designates the beginning of each data packet (128 frames). Hail frames use a different frequency than data frames. This ensures that the receiver is able to readily identify the synchronization impulses.

All of the frames are multiplied by the Blackman function [3] (which is often used as a framing function in discrete Fourier analysis):

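$$ w(n) = 0.42 - 0.5\cos\frac{2\pi n}{N} + 0.08\cos\frac{4\pi n}{N}, \qquad 0 \le n < N $$

Equation 7. The Blackman function. Here $N$ is the frame width and $n$ is the sample index within the frame.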

Performance is improved when the Blackman function is used by the transmitter to form frames. Use of the Blackman function also reduces the number of calculations required in the receiver.
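
Frame shaping with the Blackman function can be sketched as follows, using the commonly quoted Blackman coefficients 0.42, 0.5, and 0.08. The function names, the equal mixing of the two tones, and the use of the frame width itself as the window period are assumptions for this example.

```python
import math

def blackman(n, width):
    """Blackman window value at sample n of a frame `width` samples wide."""
    return (0.42
            - 0.5 * math.cos(2 * math.pi * n / width)
            + 0.08 * math.cos(4 * math.pi * n / width))

def windowed_frame(freq_pair, width=120, sample_rate=44100):
    """A coding frame: the mean of the given tones, shaped by the window.

    An empty `freq_pair` produces a silent frame.
    """
    if not freq_pair:
        return [0.0] * width
    frame = []
    for n in range(width):
        tone = sum(math.sin(2 * math.pi * f * n / sample_rate)
                   for f in freq_pair) / len(freq_pair)
        frame.append(blackman(n, width) * tone)
    return frame

frame = windowed_frame((735, 1470))
```

The window rises smoothly from zero at the frame edge to one at the center, so consecutive frames join without abrupt steps that would spread energy into neighboring frequencies.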

In general, it is always better to encode data using as few discrete frequencies as possible. Fewer frequencies require fewer filters, thereby reducing the amount of computation necessary to decode a signal. Also, using fewer frequencies improves the overall frequency separation and noise tolerance of the system.

It is possible to define four bits using only five frequencies: ten frequency pairs, five single frequencies, and silence. In practice, it is problematic to clearly recognize single-frequency frames because of echo from frames transmitted earlier. The six-frequency scheme proved to be more tolerant of echo and noise.

The amplitude protocol uses a high-frequency (ultrasonic) tone to encode data (see Table 6). The data frames are 30 samples wide at 44.1 kHz. With this protocol, each frame represents only one bit. Typical computer microphones and speakers can handle frequencies up to about 30 kHz, but amplifiers, digital-to-analog converters (D/As), and analog-to-digital converters (A/Ds) have a limit of approximately 22 kHz. Conventional sound boards have maximum sampling rates of 44.1 kHz. However, they do not work well near the upper frequency limit. An 18,375 Hz tone was chosen to correspond to one. Silence was used to correspond to zero. The same hailing frequency was used as in the two-frequency protocol. At 5.5 kHz, it was audible. This is adequate for a demonstration system, but must be increased if the protocol is to be truly ultrasonic.

The Sonicom receiver program conveys data from the sound board driver to a set of digital filters [7]. The system uses the filter outputs to detect and decode incoming data. The receiver program maintains an idle state when no incoming packets are detected. When in this idle state, the system uses only the one filter that is needed to detect the hailing frequency. When a hail frame is received, the filter output first increases and then decreases. This results in a peak that is one frame wide. By estimating the mid-point of the peak, the receiver is able to synchronize with the incoming data. It is only necessary to activate multiple filters in order to decode the incoming data packets. Thus, computational load is reduced between data transmissions.

The digital filters are written in assembler using the Intel 80386 instruction set. Fixed-point arithmetic is used, enabling the program to run in real time on Intel 80486DX 33 MHz computers. The value of each filter is calculated by the formula [4]:

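$$ F = \left( \sum_{k=0}^{N-1} x_k \cos\frac{2\pi f k}{f_s} \right)^2 + \left( \sum_{k=0}^{N-1} x_k \sin\frac{2\pi f k}{f_s} \right)^2 $$

Equation 8. Filter function. Here $N$ is the length of the filter in samples; $f$ is the filtering frequency; $f_s$ is the sampling frequency; $x_k$ are the sample values. The value of each filter is the squared amplitude of the Fourier coefficient corresponding to the filtered frequency.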

Filters are the most computationally expensive part of the system. The filter blocks are optimized so that they perform only three multiplication operations per filter per sample. The program calculates the values of sine and cosine during initialization and stores them in memory arrays for subsequent low cost filter state updates.
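
A table-driven filter with a low-cost per-sample update might look like the sketch below. It is written in floating-point Python for clarity and does not reproduce the authors' fixed-point assembler or their exact three-multiplication optimization; the class name and the ring-buffer update are assumptions. The sketch requires the filter frequency to fit a whole number of waves into the filter length, so that the tables can be reused cyclically.

```python
import math
from collections import deque

class SlidingToneFilter:
    """Running squared Fourier amplitude at one frequency over the last N samples."""

    def __init__(self, freq_hz, length, sample_rate=44100):
        cycles = freq_hz * length / sample_rate
        if abs(cycles - round(cycles)) > 1e-9:
            raise ValueError("frequency must fit a whole number of waves "
                             "into the filter length")
        # Sine and cosine tables are computed once, at initialization.
        step = 2 * math.pi * freq_hz / sample_rate
        self.cos_table = [math.cos(step * k) for k in range(length)]
        self.sin_table = [math.sin(step * k) for k in range(length)]
        self.products = deque([(0.0, 0.0)] * length, maxlen=length)
        self.re = 0.0
        self.im = 0.0
        self.index = 0

    def update(self, sample):
        """Feed one sample and return the current filter value."""
        c = sample * self.cos_table[self.index]
        s = sample * self.sin_table[self.index]
        old_c, old_s = self.products[0]
        self.products.append((c, s))  # the oldest pair drops out of the deque
        self.re += c - old_c
        self.im += s - old_s
        self.index = (self.index + 1) % len(self.cos_table)
        return self.re * self.re + self.im * self.im

# Feed two frames of a 1470 Hz tone to filters tuned to 1470 Hz and 735 Hz.
tone = [math.sin(2 * math.pi * 1470 * k / 44100) for k in range(240)]
on_freq = SlidingToneFilter(1470, 120)
off_freq = SlidingToneFilter(735, 120)
for x in tone:
    matched = on_freq.update(x)
    unmatched = off_freq.update(x)
```

Because the filter produces a value after every sample, its output can be watched for the rise and fall that marks a hail frame.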

The Sonicom (see Figure 3) has good echo tolerance. This makes it possible to use a short frame size. The protocol works well with distances between transmitter and receiver of up to two meters (without obstacles). The reception improves if the receiver uses two microphones and sums the inputs.

A "Battleship" demonstration was prepared for the MIT Media Laboratory's Tenth Anniversary. Two PCs played Battleship with each other. The computers used sound to negotiate moves during the game. Two "Talking Balloons" [2] were used as speakers. The Battleship program implemented the traditional game rules. It displayed the game in progress as well as the comments sent and the messages received. The program used modules from Sonicom to send/receive and modulate/demodulate acoustic signals. Due to the poor acoustic properties of the balloons, two low-frequency tones were used for one-bit data encoding. Also, the frame width was increased. The balloons could have been used as microphones but were not because the quality was unsatisfactory. There was also a feedback problem when a single balloon was connected to a sound board as both a speaker and a microphone. Therefore, conventional microphones were used to receive sound.

Impulse coding. Another protocol, impulse coding, is similar to Morse code, but instead of using impulses of different length, it sends impulses of constant length with variable length pauses between them. The protocol encodes one bit at a time. The value of an encoded bit corresponds to the distance between two consecutive sound impulses. A short pause equals zero, and a long pause equals one (see Table 7). A similar protocol is implemented in IR remote controls. The protocol is easy to implement, requiring little computational overhead by either the sender or the receiver.

Table 7. Impulse coding technical characteristics.
Transmitter output            1 channel, 1-bit, timer-modulated
Receiver input                1 channel, 16-bit, 44,100 samples/second
Coding impulse width          ~5.7 ms (50 semiwaves)
Coding impulse frequency      4,405 Hz
Bit 0 pause                   ~5.7 ms (50 semiwaves)
Bit 1 pause                   ~17 ms (150 semiwaves)
Average transmission speed    59 baud (7-8 bytes/second)

The receiver uses one digital filter to detect impulses. It dynamically adjusts its response level depending on the maximum filter output level from the last received impulse. The program measures the distance between the rising edges of two consecutive impulses. If the distance approximates the coding length of either a zero or a one, then the receiver adds the decoded bit to a buffer. Otherwise, it resets the buffer.
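
The timing side of the impulse protocol can be sketched as follows, using the impulse and pause lengths from Table 7. The sketch works on rising-edge times rather than audio samples, and the tolerance of 2 ms is a hypothetical value; the adaptive response level of the real receiver is not modeled.

```python
IMPULSE_MS = 5.7                 # coding impulse width (Table 7)
PAUSE_MS = {0: 5.7, 1: 17.0}     # pause after the impulse for each bit value

def encode_edges(bits):
    """Rising-edge times (ms) of the impulses that encode `bits`.

    Sending n bits takes n + 1 impulses, since each bit is the
    distance between two consecutive impulses.
    """
    t = 0.0
    edges = [t]
    for bit in bits:
        t += IMPULSE_MS + PAUSE_MS[bit]
        edges.append(t)
    return edges

def decode_edges(edges, tolerance_ms=2.0):
    """Decode bits from rising-edge times; reset the buffer on a bad interval."""
    buffer = []
    for prev, cur in zip(edges, edges[1:]):
        gap = cur - prev
        for bit, pause in PAUSE_MS.items():
            if abs(gap - (IMPULSE_MS + pause)) <= tolerance_ms:
                buffer.append(bit)
                break
        else:
            buffer = []  # interval matches neither code: discard what was read
    return buffer

sent = [1, 0, 0, 1, 1, 0]
received = decode_edges(encode_edges(sent))
```

An interval that matches neither code, for example one stretched by a missed impulse, clears the buffer so that a damaged record is not passed on.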

The impulse protocol is implemented on the handy-board (see Figure 4) [8], a computer developed by the Media Laboratory's Epistemology and Learning Group. It is capable of sending the contents of its notebook to other computers using sound. The board consists of a Motorola MC68HC11 CPU (8 MHz), 32K of static RAM, a 16x2 alphanumeric LCD display, a speaker, an infrared receiver, an RS-232 serial port, several analog and digital input/output ports, several LEDs, two buttons, and a knob. The implementation has two parts: (1) a program for sending data using sound; and (2) a receiver in the form of PC programs that decode the sound signals sent by a handy-board.

The handy-board has been programmed as a clock and a notebook. The handy-board program sends out a notebook record using the built-in speaker. A receive program is run on a host computer to update the contents of the notebook.

The data transfer rate of impulse coding is lower than that of protocols that use multifrequency encoding.

Figure 4. The handy-board

The impulse protocol suffers somewhat from echo, but otherwise proves to be reliable. However, if audible impulses are used, it creates a disturbing noise.


Medical concerns

Both audible sound and ultrasound have a psycho-physiological effect on people. Audible sound probably can be used safely either in rather short data exchanges, when people may want to control the transmission (e.g., phone rings, acoustic remote controls, data exchange between wrist-watches and computers), or for long data exchanges, when people are not present. Ultrasound in the 20,000-500,000 Hz (and probably higher) range is harmless, if it is not very loud. If ultrasound is extremely intense, it may cause heating and scattering of hard objects. Noise at frequencies lower than 100,000 Hz may be unpleasant for pets. There are many natural sources of ultrasound, such as bats and some insects, that may produce very loud ultrasonic noise. These sounds do not seem to affect people or animals. Almost all television sets and computer monitors constantly squeak with line-scanning frequency, which has not been recognized as a source of medical problems.



Five subjects from the MIT Media Laboratory participated in a study conducted to determine which encoding techniques produced the least disturbing noise. Subjects listened to examples of DTMF, ultrasonic coding, FSK, impulse coding, and echo coding. As expected, since ultrasonic and echo coding techniques operate near the threshold of the human auditory system, all of the subjects found these the least disruptive.

Four subjects preferred touch-tones to other audible techniques. They explained that either they were accustomed to touch-tones or that the data transmission sounded melodic to them. One subject pointed out that since FSK has a much faster data transfer-rate than touch-tones, it might be preferable to listen to an FSK squeal for a short period of time rather than to touch-tones or impulse coding sounds for a longer period.

One could conclude that ultrasound should be used for device-to-device communication and that audible signals should be sent only when devices communicate with people. FSK modulation can be used ultrasonically, achieving even higher data transfer rates than reported here. However, this will require special hardware that can operate at higher sampling rates than conventional sound boards or that can process the sound signal in analog form. Touch-tones are relatively attractive, not only because they are commonplace, but also because the combination of two frequencies produces a beat (characteristic pulsation) that makes the sound alternately soft and loud. This is similar to the sounds produced by musical instruments. In moderation, the tones seem pleasant. Touch-tones are typically much longer than frames in FSK modulation. This explains why those data coding techniques sound so different to human observers. Perhaps the attractiveness of double-frequency tones can be used to make an encoding that sounds like pleasant music. The only advantage of the impulse (remote control) modulation, which is used in the handy-board project, is its simplicity. It requires minimal acoustic hardware quality and processing power, and can be used in small devices with limited resources.


Future work

A wristwatch can use the sound from its beeper to transmit data to a computer. This may be of use as a communication back-channel for wristwatches that keep address and phone data. An airport public address announcement can transmit data for wristwatches, PDAs, or laptop computers. A computer may have an easier time than people in decoding the contents of those announcements.

Several computers close to the same wall, floor, ceiling, or table can use an acoustic transceiver to transfer data through a solid object. It would work as a "wireless" network. The sound would travel much faster and at higher frequencies than it does through air.

Three acoustic responders placed on a ceiling can serve as reference points to track the position of other objects in a room. A device that wants to know its position can send a "ping" to the responders, receive responses, measure time delays, and calculate the distances to each of the responders. This method can provide relative coordinates of the device with very high precision. The theoretical precision limit for digital processing with a typical sound board (44,100 samples/second) is about 15 mm, but analog ultrasonic devices can achieve much better results. The advantage of this method over electromagnetic sensors is that all of the coordinate calculations are linear.
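
The position calculation can be sketched as follows. The sketch assumes that all three responders sit at the same ceiling height, that each responder answers a ping without delay, and that the speed of sound is 343 meters/second; these values and the function names are assumptions for illustration. Subtracting the distance equations pairwise leaves two linear equations for the horizontal coordinates; the height then follows from one of the distances.

```python
import math

SPEED_OF_SOUND = 343.0  # meters/second, assumed room-temperature air

def distance_from_round_trip(delay_s, speed=SPEED_OF_SOUND):
    """Distance to a responder that answers a ping immediately (assumed)."""
    return speed * delay_s / 2.0

def locate(responders, distances, ceiling_height):
    """Position (x, y, z) of a device below three ceiling responders.

    `responders` holds the (x, y) positions of the three responders,
    `distances` the measured distances to them, all in meters.
    """
    (x1, y1), (x2, y2), (x3, y3) = responders
    d1, d2, d3 = distances
    # Subtracting the first sphere equation from the other two removes the
    # quadratic terms and leaves a 2x2 linear system in x and y.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = (x2**2 + y2**2 - x1**2 - y1**2) - (d2**2 - d1**2)
    b2 = (x3**2 + y3**2 - x1**2 - y1**2) - (d3**2 - d1**2)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("responders must not lie on one line")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    below = d1**2 - (x - x1)**2 - (y - y1)**2
    z = ceiling_height - math.sqrt(max(below, 0.0))
    return x, y, z

# Hypothetical room: responders on a 2.5 m ceiling, device at (1, 2, 1).
RESPONDERS = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
measured = [math.dist((1.0, 2.0, 1.0), (rx, ry, 2.5)) for rx, ry in RESPONDERS]
position = locate(RESPONDERS, measured, ceiling_height=2.5)
```

With exact distances the sketch returns the device position; measurement errors in the delays carry over directly into the solved coordinates.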


Sound Samples

Echo encoding (10 baud)

DTMF (Dual Tone)

Sonicom two-frequency

Sonicom ultrasonic

Impulse coding



  1. Encyclopedia Britannica.
  2. The Interactive Balloon: Sensing, Actuation, and Behavior in a Common Object, Joseph Paradiso (to appear in IBM Systems Journal, 35/3), 1996.
  3. Digital Signal Processing in Communication Systems, Marvin E. Frerking, 1994.
  4. Handbook of Mathematics (Справочник по математике), I. N. Bronshtein and K. A. Semendyaev, 1974.
  5. Signal Processing Methods for Audio, Images and Telecommunications, edited by Peter M. Clarkson and Henry Stark, 1995.
  6. Techniques for Data Hiding, Walter Bender, Daniel Gruhl, Norishige Morimoto (to appear in IBM Systems Journal, 35/3), 1996.
  7. Pipelined Adaptive Digital Filters, Naresh R. Shanbhag, 1994.
  8. The Handy Board, Fred Martin, 1995.
  9. Signal Coding and Processing, Graham Wade, 1994.
  10. Understanding Musical Sound with Forward Models and Physical Models, M. A. Casey, Connection Science, 1994.

This work was supported in part by the Things That Think Consortium at the MIT Media Laboratory.

Vadim Gerasimov, MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139. Mr. Gerasimov graduated with a Master of Science in Media Arts and Sciences from MIT in 1996.

Walter Bender, MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139. Mr. Bender is Associate Director of the MIT Media Laboratory and Director of the Laboratory's News in the Future Research Consortium.
