hpr4648 :: Simple Podcasting - Episode 4 - Audio Analysis Fun

This episode is the fourth in a 4 part series on simple podcasting, covering Audio Analysis Fun.

Hosted by Whiskeyjack on Wednesday, 2026-05-27. The episode is flagged as Clean and is released under a CC-BY-SA license.

Duration: 00:26:09

01

This is the fourth episode in a four part series on simple podcasting.

02 Introduction

In this episode we will discuss alternatives to Audacity when it comes to analyzing audio spectrums to find the sources of unwanted noise.

I previously promised some gratuitous hackery, and we will get into that in this episode.

03

Recall that with Audacity you first import the audio file, then select the part of the audio you wish to analyze (or ctrl-A for all), and then select analyze > plot spectrum.

This is in fact the only feature of Audacity that I know how to use. I am definitely not an audio expert.

I do however have some background in processing and analyzing other signals, so some of the basics are familiar to me.

04

We can accomplish the same thing that Audacity does in this instance provided we can do the following.

First, we need to get the data out of the audio file and into a form which we can import into other software.

Second, we need to perform certain mathematical operations on this data.

Finally, we need to be able to plot the results of these calculations on a chart.

--------------------

05 Fourier Transforms

First though, we need a bit of mathematical background.

When Audacity shows a plot of amplitude versus frequency, it is showing the results of a Fourier Transform.

A Fourier Transform is a mathematical operation that converts a signal from the time domain into the frequency domain.

Any complex signal, audio or otherwise, can be broken down into a collection of sine waves of various frequencies.

For example, a simple square wave signal of say 100 hertz can be represented as a sine wave of frequency 100 hertz plus a collection of higher frequency sine waves which add together to give the sharp corners.
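
The square wave example can be sketched in a few lines of Python. This is only an illustration: the function name and the choice of 50 harmonics are made up here, and it uses the textbook series in which a square wave is the fundamental plus its odd harmonics (3, 5, 7 times the base frequency and so on), each one weaker than the last.

```python
import math

def square_wave_partial_sum(t, freq=100.0, harmonics=50):
    """Approximate a unit square wave at time t (seconds) by summing
    the fundamental and its odd harmonics (1, 3, 5, ...)."""
    total = 0.0
    for k in range(harmonics):
        n = 2 * k + 1                      # odd harmonic number
        total += math.sin(2 * math.pi * n * freq * t) / n
    return 4 / math.pi * total

# A 100 Hz wave has a 10 ms cycle. A quarter of the way through the cycle
# the sum should be close to +1, three quarters through close to -1.
print(square_wave_partial_sum(0.0025))
print(square_wave_partial_sum(0.0075))
```

Adding more harmonics makes the corners sharper, which is why a square wave shows up in a spectrum as a whole row of peaks rather than just one.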

06

A Fourier Transform finds these sine waves and sorts them out into separate bins, with each bin representing an individual frequency or a collection of closely related frequencies, depending on how fine grained the sorting is.
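
To make the idea of sorting into bins concrete, here is a minimal Python sketch of the plain (slow) discrete Fourier transform. It is not what Audacity or any FFT library actually runs, and the test signal (64 samples at 64 samples per second, so that each bin is exactly 1 Hz wide) is an assumption chosen to keep the numbers easy to read.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex output per frequency bin."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

# Hypothetical test signal: a 5 Hz tone plus a weaker 12 Hz tone.
rate = 64
signal = [
    math.sin(2 * math.pi * 5 * i / rate) + 0.5 * math.sin(2 * math.pi * 12 * i / rate)
    for i in range(rate)
]

magnitudes = [abs(c) for c in dft(signal)]
# Only the first half of the bins is examined here.
half = magnitudes[: rate // 2]
strongest = sorted(range(len(half)), key=lambda k: half[k], reverse=True)[:2]
print(sorted(strongest))
```

The two strongest bins land on the two frequencies that were mixed into the signal, which is the same thing we rely on when hunting for noise tones.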

07

This is exactly what we want when we are trying to figure out how to filter out noise.

Recall that earlier in this series we had to solve a problem with a high pitched background noise which was originating in my cheap microphone.

Analyzing this audio by frequency showed that it was a series of individual tones at 1 kHz intervals.

We were then able to use filters targeted at those frequencies to get rid of that noise.

08

There are several optimized versions of the Fourier Transform algorithm.

A very common one is the Fast Fourier Transform, commonly abbreviated to just "FFT".

This is so common that the term "FFT" is often used to simply mean any Fourier Transform even though this is not technically correct.

09

Typical FFT algorithms require that the number of data samples is exactly a power of two.

So the number of samples we need may be something like 4096, 8192, or 65536, to give a few random examples.

When we transform from the time domain to the frequency domain, each sample becomes a single frequency "bin". So the more samples we have, the finer the resolution we get in terms of frequency.

10

If we assume we are dealing with flac files recorded at a 44.1 kHz sample rate, that is, 44100 samples per second, then if we have 32768 samples, each "bin" represents slightly more than 1 hertz.

If we have 65536 samples, then each "bin" represents a fraction of a hertz.

For our purposes we will pick 65536 samples.

That means we need about 1.49 seconds of data (65536 samples divided by 44100 samples per second).

For simplicity's sake we will record at least 2 seconds of data and then just discard the samples that we don't need.
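
The arithmetic behind these figures is short enough to check in Python. The function names below are made up for this sketch; the sample rate and sample counts are the ones used in the text.

```python
SAMPLE_RATE = 44100  # samples per second, as assumed in the text

def bin_width_hz(sample_count, sample_rate=SAMPLE_RATE):
    """Width of one frequency bin in hertz."""
    return sample_rate / sample_count

def duration_seconds(sample_count, sample_rate=SAMPLE_RATE):
    """Length of recording needed to supply that many samples."""
    return sample_count / sample_rate

for count in (32768, 65536):
    print(count, round(bin_width_hz(count), 3), round(duration_seconds(count), 3))
```

Doubling the number of samples halves the bin width, at the cost of needing twice as long a recording.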

11

There is a further complication here. Fourier Transforms normally work with complex numbers.

Recall from your school days that as well as integers and real numbers there are complex numbers.

Each complex number consists of two parts, a real component and an imaginary component.

I won't go into the details of this, just accept that each sample needs to have two components.

Fortunately, if we don't have complex number data we can just set the imaginary component to zero and use that.

That is enough theory. Let's get into the practical details.

--------------------

12 Extracting Data from Audio Files

First we will look at how to extract the data from the audio files.

Fortunately, one of the programs which we have already been using can do this.

To do this we will use Sox.

I am not aware of an equivalent feature in ffmpeg.

13

Sox calls itself "SoX - Sound eXchange, the Swiss Army knife of audio manipulation"

Sox is free software and is licensed under the GPL V2 or later.

In this case we want to use a feature which allows us to convert a binary audio signal file to a text data file.

To convert the file to text data we just give the output file a ".dat" file extension and Sox will do this for us.

14

Here is a command example.

sox inputfile.flac tdata.dat

15

This gives us a file in the following format, assuming this is a mono audio recording.

; Sample Rate 44100

; Channels 1

0 0.045471191406

2.2675737e-05 0.055023193359

4.5351474e-05 0.048217773438

6.8027211e-05 0.053192138672

etc.

The first line states the sample frequency.

The second line states the number of channels, which is 1 for a mono recording.

The data starts on the third line.

Column 1 is the time in seconds.

Column 2 is the waveform data point.
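
As a sketch of how this layout could be read by a program, the following Python splits the header lines from the data rows. The function name is hypothetical, the sample lines are the ones shown above, and it assumes a mono file with exactly two columns.

```python
def parse_sox_dat(lines):
    """Split SoX-style text lines into header values and (time, value) rows."""
    header = {}
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # e.g. "; Sample Rate 44100" -> key "Sample Rate", value 44100
            key, _, value = line[1:].strip().rpartition(" ")
            header[key] = int(value)
        else:
            fields = line.split()
            rows.append((float(fields[0]), float(fields[1])))
    return header, rows

example = """; Sample Rate 44100
; Channels 1
0 0.045471191406
2.2675737e-05 0.055023193359
4.5351474e-05 0.048217773438
6.8027211e-05 0.053192138672
""".splitlines()

header, rows = parse_sox_dat(example)
print(header)
# The time column advances by one sample period, which is 1 / sample rate.
print(rows[1][0], 1 / header["Sample Rate"])
```

The second print shows that the time step between rows matches one sample period, so the time column carries no information that the sample rate does not already give us.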

16

To analyze the data we want a subset of these samples.

When we convert from the time domain to the frequency domain, our resolution will be determined by the number of samples. We would like therefore to have at least as many samples as the sampling rate.

We also want the sample size to be a power of two.

The number of points we want to have is equal to the next power of two above our chosen sampling rate, 44,100 Hz.

This number would be 65536.
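
Finding the next power of two at or above a given number is easy to sketch in Python. The function name is made up for the illustration.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    power = 1
    while power < n:
        power *= 2
    return power

print(next_power_of_two(44100))
```

For a different sampling rate, such as 48000, the same function can be used to pick the sample count.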

17

To extract this data from the file we can do the following.

tail -n +3 tdata.dat | head -n 65536 | awk '{printf "%s\n", $2}' > tdata.csv

18

We use tail to start output at line 3, which skips over the two header lines.

We use head to take the next 65536 lines and discard the rest.

We use awk to extract the second column which we will use as the real component.

We now have this data as a csv file in one column.
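
For anyone who would rather do this step in a script than in a shell pipeline, here is a rough Python equivalent of the tail, head, and awk stages. The function name and the stand-in data are assumptions for the sketch; a real run would read the lines from the ".dat" file that Sox produced.

```python
def extract_samples(lines, header_lines=2, count=65536):
    """Skip the header lines, keep at most `count` rows, return column 2."""
    data = lines[header_lines:header_lines + count]
    return [line.split()[1] for line in data]

# Hypothetical stand-in for the contents of tdata.dat.
dat_lines = [
    "; Sample Rate 44100",
    "; Channels 1",
    "0 0.045471191406",
    "2.2675737e-05 0.055023193359",
    "4.5351474e-05 0.048217773438",
]
print("\n".join(extract_samples(dat_lines, count=2)))
```

The values are kept as text rather than converted to numbers, since all we are doing at this stage is copying one column into a new file.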

--------------------

19 Analyzing the Data

To analyze the data we need software which can calculate FFTs.

I will now show two examples of this, a very simple case using Libre Office Calc, and a more complex but more complete one using GNU Octave.

20 Using Libre Office

We can do Fourier analysis and plot charts using Libre Office.

Take the csv file of data that we previously created.

For this example I used data from a recording of silence so that I could see what internal noise was being generated by the headset.

Open the csv file and import it into Libre Office Calc.

21

Now select all 65536 rows of column A.

The Fourier function will automatically fill the imaginary component with zeros if we don't provide a column of imaginary numbers, so we don't need to provide a column of zeros.

Then select Data > Statistics > Fourier Analysis.

22

A window will open allowing you to select various parameters.

For Results to:, enter "D1".

Grouped by Columns.

Select OK.

23

New data should now appear starting in cell D1.

The first line will say " Fourier Transform"

The second line will state the input range.

The third line will state "Real" in column D, and "Imaginary" in column E.

The data will start in row 4.

24

For our simple example we will ignore the imaginary data and just use the real data, which will form our Y component when we plot it on a chart.

We now need to create the X axis data.

25

Each cell is a "bin" of frequencies.

Each cell therefore represents (sample frequency) / (Number of samples) Hz.

26

To create the X axis data showing frequency, enter the following formula into column C, in each row alongside the column D data, starting at row 4.

=(44100/65536) * (ROW() - 4)
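
The same row to frequency mapping can be sketched in Python, which also makes it easy to go the other way and find which row to look at for a given frequency. The function and constant names are made up for the illustration; the numbers are the ones used in the spreadsheet.

```python
SAMPLE_RATE = 44100
SAMPLE_COUNT = 65536
FIRST_DATA_ROW = 4  # the Fourier output starts in row 4 of the sheet

def row_to_frequency(row):
    """Frequency in Hz for the bin on a given spreadsheet row."""
    return (SAMPLE_RATE / SAMPLE_COUNT) * (row - FIRST_DATA_ROW)

def frequency_to_row(freq_hz):
    """Nearest spreadsheet row for a given frequency in Hz."""
    return round(freq_hz * SAMPLE_COUNT / SAMPLE_RATE) + FIRST_DATA_ROW

print(row_to_frequency(4))                       # the first bin is 0 Hz
row = frequency_to_row(1000)
print(row, row_to_frequency(row))                # the row nearest to 1 kHz
```

Because each bin is about 0.67 Hz wide, a round number such as 1000 Hz will not usually fall exactly on a row, only very close to one.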

27

We can now create an XY chart showing the frequency analysis.

You may need to exclude the first couple of dozen rows as very low frequency components which cannot be heard may otherwise overwhelm the data we are interested in.

Also, you only need the first half of the chart.

The FFT mirrors the data from the first half of the array into the second half.
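
The mirroring can be seen on a small scale with a few made up samples. This Python sketch uses a slow, naive transform rather than a real FFT, and the sample values are arbitrary. For real valued input, bin k and bin n - k always have the same magnitude.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, fine for a handful of samples."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

# Arbitrary real-valued samples, made up for the illustration.
samples = [0.0, 0.3, 0.9, 0.2, -0.5, -0.8, -0.1, 0.4]
spectrum = dft(samples)
n = len(samples)

for k in range(1, n // 2):
    # Compare each bin in the first half with its partner in the second half.
    print(k, round(abs(spectrum[k]), 6), round(abs(spectrum[n - k]), 6))
```

Since the second half carries no new information about magnitudes, plotting only the first half loses nothing.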

28

Because characterizing a sine wave requires at least two samples per cycle, a sample frequency of 44.1 kHz can only capture sound waves up to a maximum of half that, or 22.05 kHz.

Create the chart with lines only.

If you followed the above instructions, you should see something resembling what we saw in Audacity, except with each bin more sharply defined.

29

In the data that I had from a recording of unfiltered headset noise, I could see a distinct noise spike every 1000 hertz.

30

However, we have taken several shortcuts.

First, the imaginary component of the data was ignored.

Second, the magnitude (that is, Y axis) has both positive and negative peaks.

Third, the data is not scaled to dB sound units, so we just have a relative measure.

However, that by itself is enough to tell us where the frequencies are that we need to construct filters to deal with.
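
As a sketch of how the first two shortcuts could be tidied up, the real and imaginary parts of each bin can be combined into a single magnitude, which is never negative. The Python below also converts the magnitudes to decibels relative to the largest peak. That is only a relative scale chosen for this illustration, not the calibrated level that Audacity displays, and the function names and sample values are made up.

```python
import math

def magnitude(real, imag):
    """Combine the real and imaginary parts of one bin into a magnitude."""
    return math.hypot(real, imag)

def relative_db(magnitudes, floor_db=-120.0):
    """Express each magnitude in decibels relative to the largest one."""
    peak = max(magnitudes)
    result = []
    for m in magnitudes:
        if m <= 0 or peak <= 0:
            result.append(floor_db)        # avoid taking the log of zero
        else:
            result.append(max(20 * math.log10(m / peak), floor_db))
    return result

# Hypothetical (real, imaginary) pairs as they might appear in columns D and E.
bins = [(3.0, 4.0), (-0.5, 0.0), (0.0, 0.05), (0.0, 0.0)]
mags = [magnitude(r, i) for r, i in bins]
print(mags)
print([round(db, 1) for db in relative_db(mags)])
```

In a spreadsheet the same two steps would each be one extra column of formulae.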

31

We could refine this spreadsheet a bit more to deal with the above issues, but I think we have demonstrated the basic principle, and working with a spreadsheet can be a bit awkward.

However, if working with a spreadsheet is what you want to do, then you can add more columns and more formulae to improve on it.

--------------------

32 Other Analysis Software

I will go on to GNU Octave in a moment, but I want to get a few other alternatives out of the way first.

I won't go into any detail on them other than to point them out to people who want to have a go at trying these themselves.

33 Grace

There is math and plotting software called Grace.

This is free software, released under the GPL V2.

According to the documentation, it seems to have the features we need, including an FFT function.

However, I could not get it to work properly on Ubuntu 24.04.

I could not get it to load a data file and plot data.

34

The error messages were vague and unhelpful.

The file navigation system didn't work.

There was no obvious path to success, and if it isn't easy to use then there is no point to it.

This is fairly old software, designed for X Window and Motif.

I gave up on it as not suitable for this series as I am looking for some fairly low effort things for people to try themselves.

If someone else can get it to work on their PC, perhaps they could do an HPR episode on this themselves.

35 Command Line FFT Packages

There are several command line FFT packages.

They will read data from standard input or from a file and output the FFT.

However, these are not packaged for Ubuntu and appear to be distributed as C source code which you would download and compile.

You can experiment with those if you wish, but I felt they were a bit out of scope for discussion here as I am looking at common tools that are ready to use.

36

Here are two examples.

One is

Command-line Fast Fourier Transform utility

https://github.com/gregfjohnson/fft

Another is

cli-fft

https://github.com/jonolafur/cli-fft

37

I have not tried these and cannot say whether they are any good or not.

Similarly, there are a number of FFT packages that are libraries for languages such as Python.

If you want to take the time to write a short program to go with them, you can create a dedicated FFT command line program.

However, I felt that this too was out of scope for what I was trying to do here.
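
To give an idea of how small such a program can be, here is a sketch in plain Python with no extra libraries. It uses the textbook recursive radix-2 method, which is an assumption for this illustration and not necessarily what any particular library does. The function names are made up. A library routine such as NumPy's fft would be much faster on 65536 samples.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 FFT. The input length must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2])
    odd = fft(values[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

def fft_lines(lines):
    """Take one sample per text line and return 'real imaginary' output lines."""
    samples = [float(line) for line in lines if line.strip()]
    return ["%g %g" % (v.real, v.imag) for v in fft(samples)]

# A single spike at the start contains every frequency equally.
print("\n".join(fft_lines(["1", "0", "0", "0", "0", "0", "0", "0"])))
```

Feeding `fft_lines` the lines of the one column data file and writing the result to another file would give a basic command line FFT tool.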

38 Doing it the Hard Way

Hypothetically, it may be possible to write an FFT function in bc, the arbitrary precision calculator language which is a standard utility on most Linux systems and is easily called from bash.

I say hypothetically, because I have not tried it.

I think it would be an interesting challenge, but I don't have the time at the moment to try it.

If anyone feels motivated to give it a try, they're welcome to give it a go and then do a podcast episode on it.

--------------------

39 GNU Octave

We have seen that as well as using features built into Audacity to analyze the audio spectrum to see the frequencies of undesired noises, we were able to do the same using a Libre Office spreadsheet.

40

Now we'll look at another bit of software, GNU Octave. GNU Octave is free software, licensed under the GPL V3 or later. It is a mathematical scripting language, very similar to Matlab. People use it for mathematical, engineering, and scientific work. It can be found in most Linux distros and is available for some other operating systems as well.

41

Octave has two features built in that we need for our purposes. It does FFTs, and it has a plotting system built in to produce graphs.

--------------------

42

We will take the same audio test file that we used with Audacity and Libre Office and use it here as well.

The bash script to convert the flac file to text data is essentially the same, with the exception that the file extension on the output file is ".txt" instead of ".csv". This latter change was an arbitrary decision on my part.

43

As a quick review, this bash script uses sox to convert a flac file to a text ".dat" file.

Then it uses tail, head, and awk to extract the first 65536 rows of data, skipping over the header information and ignoring the first column of time data.

This script will be in the show notes.

--------------------

#!/bin/bash

# This version is for use with the GNU Octave script.

sox hsnoisemono.flac hsnoisemono.dat

tail -n +3 hsnoisemono.dat | head -n 65536 | awk '{printf "%s\n", $2}' > hsnoisemono.txt

--------------------

44

We now have a 1.1 MB file containing 65536 samples of data in text format.

Now the next thing we need to do is to create a short Octave script file.

I will just give a brief overview of the script here, the full script will be in the show notes.

45

I put the script in a file called "octavespectrum.m". I have never used Octave before now, but the convention seems to be to give the script a ".m" ending.

The "she-bang" line is "#!/usr/bin/env octave". If you make the file executable you can run it like any other script, or you can type "octave" and then the name of the script to run.

46

I won't read out the script in detail, as that would be too hard to follow along with in a podcast.

However, I pass several arguments to the script including the name of the data file, and then two integers that I use to limit the display area in the Y and X axes so I can have the chart focus on the areas of interest that I want to see.

I also pass a string containing the name of the graphic file that I want the chart exported to.

This was an arbitrary decision on my part and you can just hard code these values in if that is what you want to do.

47

The arguments are accessed by calling the "argv()" function, which returns a cell array of strings.

Next, it reads in the specified file using the "dlmread()" function. This reads all of the data into an array.

48

Next, it performs a hamming windowing function on the data. I'll explain that briefly.

It is standard practice when doing FFT signal processing to "window" the signal.

Since the signal sample is of finite length, it will stop at each end of the array.

49

Unless you were lucky enough for this to happen exactly at a zero crossing, this would produce an abrupt transition in the data which looks like "noise" to the FFT.

The solution is to taper the signal off gradually towards the ends so that when it gets cut off the signal is fairly small at that point anyway.

There are a variety of different windowing functions, but "hamming" seems to be the most commonly used.
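
The taper itself is simple to compute. The Python below is a sketch using the commonly quoted Hamming coefficients of 0.54 and 0.46. The function names and the constant test signal are made up for the illustration.

```python
import math

def hamming(count):
    """Hamming window coefficients using the common 0.54 / 0.46 form."""
    if count == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (count - 1))
            for n in range(count)]

def apply_window(samples):
    """Multiply each sample by the matching window coefficient."""
    window = hamming(len(samples))
    return [w * s for w, s in zip(window, samples)]

# A constant signal makes the taper easy to see: small at the ends,
# full size in the middle.
print([round(v, 3) for v in apply_window([1.0] * 9)])
```

Note that the ends are scaled down to a small value rather than all the way to zero.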

50

Next, it does an FFT using the "fft()" function.

51

This gives us real and imaginary outputs.

These are combined by summing the squares of each corresponding real and imaginary element and then taking the square root of each and storing that in a new array. This gives a single array of the same length as the originals, but combining the two output components.

If anyone wants to tell me that this isn't how things are done in the audio world, they're welcome to make an HPR episode telling us all the right way to do things.

52

Then it does some scaling and selection of subsets of data so we get the X axis in hertz and just the number of samples that we wish to look at.

If you are looking at the script, the thing to keep in mind is that Octave will work on entire arrays of data in a single operation. You don't need to write explicit loops for this. The looping is handled implicitly as part of the syntax.

53

It also does various other things that make the chart easier to read. The comments in the script describe these in more detail. Since this is a script it's easier to add these sorts of refinements than is the case for a spreadsheet so I have made the effort to add them.

Finally it calls the "plot()" function.

If an output graphics file name was provided, it also creates a PNG file containing the same image using the "saveas" function.

54

We now see the chart, and it looks more or less as expected.

However, this chart is interactive.

You can zoom and pan the data, something that you can't do with either Audacity or Libre Office.

The chart window doesn't have a function for exporting the resulting chart to a "png" file; it will only save to an ".ofig" file. The ofig file is not a standard graphics file, it is a serialization of the chart data that can only be looked at using the Octave chart viewer.

55

Alternatively, you can just take a screenshot of the chart after you have interactively zoomed and panned to a point of interest.

At the bottom left of the chart window is a pair of x-y coordinates which tell you the current position of the mouse pointer in chart units.

This is very handy as it can be used to get the exact (or close to exact) frequency of each noise spike.

56

The Y axis is not scaled in any particular units such as dB, as I'm not sure how to do that according to audio industry conventions.

On the other hand, I'm not sure that it's really necessary, as I don't know what dB means in tangible terms anyway.

It does show relative sizes, so it helps to determine whether you have one noise frequency or multiple frequencies to worry about.

57

If anyone is familiar with how to scale the raw data from a flac file as exported by Sox into dB units according to audio industry convention, then they are welcome to create an HPR episode telling us how to do it.

--------------------

58 Comments on GNU Octave

I had never used GNU Octave before this, although I had heard of it and it is quite a significant piece of software for a specific segment of users.

59

The syntax is a bit odd especially in how it deals with array operations, but I was able to google various examples and answers to eventually get this working.

A few other peculiarities are that it uses the percent "%" character to denote a comment, and leaving out the semi-colon at the end of the line causes it to print the answer to the console after executing the statement.

60

The GNU Octave solution was harder to get working than the Libre Office method.

However, once it is working it is easier to use repeatedly.

If I were to want to automatically generate audio files with different filtering or other options and wanted to script the creation of a large number of images showing the results, this would be the way to do it.

61

When you run the Octave script you may get a warning which says something like

"QSocketNotifier: Can only be used with threads started with QThread".

This is apparently a routine warning message from the Qt graphics system which has no real significance in this context and can be ignored for our purposes.

--------------------

62

We now have a bash script which will use sox to extract the data from a flac file, and a GNU Octave script which can be used to display the resulting frequency spectrum.

This does more or less the same thing as "Plot Spectrum" does in Audacity, but allows for zooming and panning to get a more detailed look at the data.

63

However it doesn't give you an absolute reading of the sound levels in dB, something that Audacity does provide.

What I wanted it for though was to find the frequencies of the audible noise in the signal, something that it does quite well.

--------------------

#!/usr/bin/env octave

% Perform an FFT on the data in a file and plot the results.

% ======================================================================

% The sampling frequency. This must be changed to accommodate the

% actual sampling frequency if it was something else.

samplefreq = 44100;

% Thickness of line on plot.

linewidth = 2;

% ======================================================================

% The name of the data file is passed as an argument.

args = argv();

if length(args) < 3

quit

endif

% File name.

fname = args{1};

% Clip the peak values.

peakclip = str2double(args{2});

% How much data to show, in kHz.

rbound = str2double(args{3}) * 1000;

% The optional file name to save a chart image to.

if length(args) > 3

chartfile = args{4};

else

chartfile = "";

endif

% ======================================================================

% Read the data in from the file.

sampledata = dlmread(fname);

% Number of samples.

samplecount = length(sampledata);

% ======================================================================

% Window the data. This helps deal with the discontinuity of data at

% each end of the array and the effects this has on introducing apparent

% noise into the signal.

windoweddata = (hamming(samplecount) .* sampledata);

% ======================================================================

% Do the actual FFT.

fftresults = fft(windoweddata);

% Get real component.

r = real(fftresults);

% Get the imaginary component.

i = imag(fftresults);

% Combine the real and imaginary. In order to square each element of each

% array, we must use the ".^" operator, not just "^".

rfft = sqrt(r.^2 + i.^2);

realfft = rfft(1:samplecount);

% ======================================================================

% Scale factor for frequency.

fscale = samplefreq / samplecount;

% X axis scale, scaled to frequency. Bins 0 to samplecount/2 cover

% 0 Hz up to half the sampling frequency.

f = (0:samplecount/2) * fscale;

% Take a subset of the data if specified. rbound has to be re-scaled

% from kHz to array increments.

freq = f(1:min(rbound / fscale,length(f)));

% y axis. We take the absolute value and then limit (clip) the peaks

% so that a few large peaks don't obscure the smaller ones.

mag = min(abs(realfft(1: length(freq))), peakclip);

% Plot the results.

figure;

whandle = plot(freq, mag, 'LineWidth', linewidth);

title(["Audio Spectrum of ", fname]);

xlabel("Frequency (Hz)");

ylabel("Unscaled Magnitude");

grid on;

% If the appropriate optional argument was specified, save the chart

% to a file of that name.

if length(chartfile) > 4

saveas(gcf, chartfile, "png");

endif

% Need this so the plot window stays open.

waitfor(whandle);

% ======================================================================

--------------------

This is the shell script used with the above Octave script.

The arguments are

1 - the file name for the input data file.

2 - The value to clip the peaks at.

3 - The upper frequency bound in kHz.

4 - The output graphics file name.

#!/bin/bash

octave octavespectrum.m hsnoisemono.txt 10 12 hsnoisemono.png

--------------------

64 Episode Conclusion

In this episode we covered the following topics.

What Fourier transforms are.

Extracting data from audio files using Sox.

Analyzing the data using Libre Office.

Analyzing the data using GNU Octave.

And, several alternative analysis methods.

65 Series Conclusion

This is the end of a four part series on simple podcasting.

In the first episode, we covered a simple podcast recording method. This first episode is all you really need to make a podcast.

66

In the second episode we covered basic filtering and a few other simple topics. The methods discussed in that episode provide basic improvements to your audio if you feel the need for it.

67

In the third episode we covered how to analyze audio noise problems using Audacity and additional filtering techniques to deal with specific problems that we may find.

We also covered command line recording, playback, and getting information about an audio recording.

68

In the fourth episode we engaged in a bit of gratuitous hackery for the fun of it and showed how to use alternative software methods to analyze audio signals.

69

I hope that this series has been both useful and entertaining and that you will use the knowledge gained here to create and submit your own HPR podcast episodes.

--------------------
