Monday, March 12, 2012

Turning Whalesong into Rainbows: 1000 Numbers to 1000 words

Art by Casey Roberts
Two weeks ago I wrote about how sound gets from a whale into your computer.  At the end of that post, I included a pretty rainbow-colored picture of some dolphin whistles.  But I never explained how all that digital data got translated from zeros and ones into a nice pretty picture.


Why are pretty pictures important to science?  Awards are given out every year by several prestigious groups for the best images in science and biology.  These images are not only beautiful, but they often help us understand complex concepts that are difficult to understand using words alone.  For instance, I can tell you that dolphins have fat deposits in their jaw bones that help transfer sound into their ears, but the following picture by Darlene Ketten's lab at Woods Hole really shows you how the auditory fats (orange) connect up with the inner ear bones (red) to help the dolphin hear.  This picture does a better job of communicating the concept than I could do in 1000 words, AND you're not bored to death, either.


A 3-D image generated from a CT scan highlights selected tissue groups of a bottlenose dolphin's head. (Courtesy of Darlene Ketten, WHOI)
Humans process most of their information visually, so we often need to translate acoustic information into visual information.


The acoustic data that I collect in my research is stored as binary data, which means that it is stored as a bunch of ones and zeros, like this:


01001000 01100101 01101100 01101100 01101111 


Each of these sets of ones and zeros corresponds to a letter or a number (the code I just used says "Hello"). Unfortunately, I can't look at binary code and understand what it means, because I'm not Neo from the Matrix.  Alas!
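If you'd like to check that translation yourself, here is a minimal Python sketch. It assumes each group of eight bits is one ASCII character, and the function name `decode_bits` is just something I made up for illustration.

```python
# The five groups of ones and zeros from above, separated by spaces.
bits = "01001000 01100101 01101100 01101100 01101111"

def decode_bits(bit_string):
    """Turn space-separated 8-bit groups into text, assuming ASCII."""
    # int(group, 2) reads the group as a base-2 number; chr() looks up its character.
    return "".join(chr(int(group, 2)) for group in bit_string.split())

message = decode_bits(bits)
print(message)  # Hello
```

Sound recordings are not stored as letters, of course, but the idea is the same: fixed-size groups of bits get read as numbers.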


However, my computer understands the language of binary, and it translates all of these ones and zeros into slightly shorter strings of numbers, which look something like this (except a whole lot longer):

0.000113379 0.000136054 0.00015873 0.000181406 0.000204082 0.000226757

Not really a whole lot better, is it?  

2-D

Fortunately, I know that these random-looking numbers are the pressure values for a sound wave that has been measured 80,000 times every second. This means that the first number has a time value of 0, followed by 1/80000, 2/80000, etc.  Now that I know what the time values are, I can make a graph with time on the x axis and the value for pressure on the y axis, like so:
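Here is a rough Python sketch of that time calculation. The sample rate of 80,000 comes from the text above; the pressure values and the name `sample_times` are made up purely for illustration.

```python
SAMPLE_RATE = 80_000  # measurements per second, as described above

def sample_times(num_samples, sample_rate=SAMPLE_RATE):
    """Return the time in seconds of each sample: 0, 1/rate, 2/rate, ..."""
    return [n / sample_rate for n in range(num_samples)]

# Hypothetical pressure values, not real recording data.
pressures = [0.0, 0.3, 0.5, 0.3, 0.0, -0.3]
times = sample_times(len(pressures))

# Each (time, pressure) pair is one point on the graph.
for t, p in zip(times, pressures):
    print(f"{t:.7f} s  {p:+.1f}")
```

Plotting the pairs with time on the x axis and pressure on the y axis gives the waveform.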

This is better: I can see that the sounds get louder and softer over time.  If the sound is louder, it will make bigger bumps on the graph, and if it is quieter, the bumps will be smaller.  It's still pretty hard for me to pick out the different sound frequencies, or pitches.

I can kind of see what is going on here, but it is like looking at only one line of pixels from an image of Marilyn Monroe; I don't really know what is going on [A]:


Fortunately for me, a French mathematician named Jean Baptiste Joseph Fourier (1768-1830) figured out a way to represent a continuous periodic signal (like a sound wave) as the sum of a bunch of sine waves.  This means that I can break up the signal above into a lot of simple sine waves.  I can also do the opposite, by adding up a bunch of simple sine waves to create a complicated one [B]:
Adding two sine waves together.
The top four waves combined (light blue) create the more complex wave at the bottom (dark blue) 
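Adding waves together is easy to try in Python. This is only a sketch: the frequencies, amplitudes, and sample rate are numbers I picked for the example, and `sine_wave` and `add_waves` are hypothetical helper names.

```python
import math

def sine_wave(freq_hz, amplitude, times):
    """One simple sine wave, evaluated at each time in seconds."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t) for t in times]

def add_waves(*waves):
    """Add several waves together, sample by sample."""
    return [sum(samples) for samples in zip(*waves)]

sample_rate = 8000  # assumed value, chosen only for this example
times = [n / sample_rate for n in range(80)]  # 10 milliseconds of time values

low = sine_wave(100, 1.0, times)   # a slow, tall wave
high = sine_wave(300, 0.5, times)  # a faster, shorter wave
combined = add_waves(low, high)    # the more complicated wave
```

Each value in `combined` is just the two simple waves added at that instant, which is all "combining" means here.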

Even REALLY complicated waves, like this one, can be created by combining fourteen simpler waves:


2-D Again - Frequency

Once we break the sound up into its component frequencies, we can create a different picture, one that shows how much of the sound is made up of each frequency.  In this picture, you can see that there are more low sounds than high sounds, and that there are two small "peaks" between the frequencies of 1250 Hz and 1500 Hz.  (Hertz measures the frequency of a sound.  You can hear between about 20 and 20,000 Hz, depending on your age.) These two small peaks are actually the two frequencies at which the humpback whale is singing at this particular moment in time.
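To show how peaks like these can be found, here is a small Python sketch using a plain (slow) discrete Fourier transform. It does not use the whale recording: the signal is a made-up pair of tones at 1300 Hz and 1450 Hz, the sample rate is an assumption, and `magnitude_spectrum` is a hypothetical name. Real analysis software uses a much faster algorithm, but the idea is the same.

```python
import cmath
import math

def magnitude_spectrum(samples, sample_rate):
    """Naive discrete Fourier transform.

    Returns (frequency_hz, magnitude) pairs from 0 Hz up to half the
    sample rate.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append((k * sample_rate / n, abs(total) / n))
    return spectrum

# Made-up stand-in for a recording: two tones at 1300 Hz and 1450 Hz.
sample_rate = 8000
samples = [
    1.0 * math.sin(2 * math.pi * 1300 * n / sample_rate)
    + 0.6 * math.sin(2 * math.pi * 1450 * n / sample_rate)
    for n in range(160)
]

spectrum = magnitude_spectrum(samples, sample_rate)
# Pick the two strongest frequencies, then list them from low to high.
strongest = sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:2]
peaks = sorted(freq for freq, _ in strongest)
print(peaks)  # [1300.0, 1450.0]
```

Plotting every pair in `spectrum`, with frequency on the x axis, gives a frequency picture of this kind.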

Now, it's like I'm looking at that picture of Marilyn vertically, but still only one row at a time.




Let's recap: To make a picture of a whale song, the computer breaks up the original sound recording into hundreds or thousands of individual segments.  For each of these segments, it does a Fourier transform, which breaks the sound wave up into its component frequencies and creates the graph shown above.
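The recap can be sketched in Python like this. It is a simplified illustration, not the software I actually use: the segments sit back to back with no overlap, the test signal is made up (a tone that jumps from 1000 Hz to 2000 Hz), and the names `dft_magnitudes` and `spectrogram` are hypothetical.

```python
import cmath
import math

def dft_magnitudes(segment):
    """Strength of each frequency in one segment (naive Fourier transform)."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(segment))) / n
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, segment_length):
    """Split the recording into back-to-back segments and transform each one."""
    columns = []
    for start in range(0, len(samples) - segment_length + 1, segment_length):
        columns.append(dft_magnitudes(samples[start:start + segment_length]))
    return columns

# Made-up signal: 1000 Hz for the first half, 2000 Hz for the second half.
sample_rate = 8000   # assumed value for this example
segment_length = 80  # 10 ms per segment, so frequency steps are 100 Hz
samples = (
    [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(160)]
    + [math.sin(2 * math.pi * 2000 * n / sample_rate) for n in range(160)]
)

columns = spectrogram(samples, segment_length)
# For each segment, report the frequency with the largest magnitude.
loudest_hz = [
    col.index(max(col)) * sample_rate / segment_length for col in columns
]
print(loudest_hz)  # [1000.0, 1000.0, 2000.0, 2000.0]
```

Each entry in `columns` is one vertical slice of the final picture: one moment in time, with a loudness value for every frequency.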

Next, the sound values are assigned a color.  Loud sounds are generally shown as red, and quiet sounds are blue.  If we stack these values next to each other, we start to see more of the picture:  
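As a rough sketch of the coloring step, here is one way to turn loudness values into colors in Python. It uses a simple blue-to-red ramp rather than a full rainbow, and both the function name `loudness_to_rgb` and the loudness numbers are made up for illustration.

```python
def loudness_to_rgb(value, quietest, loudest):
    """Map a loudness value to an (R, G, B) color: blue is quiet, red is loud."""
    if loudest == quietest:
        fraction = 0.0
    else:
        fraction = (value - quietest) / (loudest - quietest)
    # Keep out-of-range values at the ends of the color scale.
    fraction = min(1.0, max(0.0, fraction))
    red = round(255 * fraction)
    blue = round(255 * (1 - fraction))
    return (red, 0, blue)

# Hypothetical loudness values: each inner list is one time segment.
grid = [[0.1, 0.9], [0.5, 0.2]]
colors = [[loudness_to_rgb(v, 0.1, 0.9) for v in column] for column in grid]
print(colors[0])  # [(0, 0, 255), (255, 0, 0)]
```

Drawing each list of colors as a vertical strip, one strip per segment, builds up the picture from left to right.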


As more and more Fourier transforms are combined, they create the picture below, which shows the whistles of two common dolphins recorded in the Irish Sea. This picture allows us to see several important parts of the sound at once - how long it is, how loud it is, and how shrill or low the noise is.  I guess it's no lady in a red dress, but it's a whole lot more useful to me!



[A] Of course, this is not a perfect analogy, because a sound waveform is a combination of all the different frequency parts.  

[B] The explanations in this post about the Fourier transform are grossly, extremely simplified, and skip over most of the math.  If you want a more thorough explanation, please see The Scientist and Engineer's Guide to Digital Signal Processing.  If you have any comments on my over-simplification, please leave them on the post or shoot me an email.



