The oscilloscope is set to X-Y mode, which means that one channel deflects the beam horizontally and the other vertically.
What you hear is the actual sound waveform fed into the channels of the oscilloscope: the left channel drives the horizontal deflection and the right channel the vertical.
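To illustrate the idea, here is a minimal Python sketch of how a stereo signal turns into points on the screen. It has nothing to do with how the actual audio file was made. The function names and the test signal (a cosine on the left channel, a sine on the right) are my own assumptions, chosen because that pair should trace a circle in X-Y mode.

```python
import math

def stereo_to_xy(left, right):
    """Pair each left sample (horizontal) with the matching right sample (vertical)."""
    return list(zip(left, right))

def circle_signal(n_samples=8):
    # Hypothetical test signal: one period of cosine on the left channel
    # and one period of sine on the right channel.
    left = [math.cos(2 * math.pi * i / n_samples) for i in range(n_samples)]
    right = [math.sin(2 * math.pi * i / n_samples) for i in range(n_samples)]
    return left, right

left, right = circle_signal()
points = stereo_to_xy(left, right)  # each (x, y) is one beam position

for x, y in points:
    print(f"x={x:+.3f}  y={y:+.3f}")
```

Every printed pair lies on a circle of radius 1, which is the shape the beam would draw for this signal.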
I didn't create the audio file; the credit for that goes to Jerobeam.