Interfaces for analysis of long duration audio recordings

I’m a PhD student at Queensland University of Technology. I’m approaching the end of my degree. My final seminar was in late July, and I’m revising my thesis before I submit it for external examination. My work was in the discipline of Human-Computer Interaction. My research looked into computer interface designs for enabling bird watchers to analyse long duration audio recordings from the natural terrestrial environment.

I did this because acoustic sensors are becoming more and more popular as a means of biodiversity monitoring. However, they produce large amounts of data that need to be processed. Automated processing is advancing rapidly, although raw recordings of the environment can be complex, with overlapping and distant sounds. There is also a need for example sounds to train automated algorithms. Birdcalls are among the target sounds, and dedicated bird watchers have a wealth of experience and knowledge that cannot be found anywhere else. It would therefore be beneficial to enable bird watchers to share their knowledge by analysing audio recordings.

Common interfaces for audio playback, metadata, and organisation are not suitable for long duration audio that needs to be analysed. This presents more than just technical implementation hurdles: the everyday interactions with sound for most people are around communication, ambient sound, or music.

The interface for the VLC media player is a good example:

VLC media player interface

VLC can play both audio and video, and there are many other programs that can play multimedia. They all have very similar interfaces. Some differ in the situations they are used in; for example, media players on smart devices need to be accessible from a pocket or strapped to an arm. However, their purpose is playing music or supporting communication. Most songs are between 3 and 5 minutes in duration. Voice calls via Skype or similar software typically show only the time elapsed since the call began, while radio gives only the position within the current program.

Analysis of long duration audio recordings requires not only a playback interface that can deal with recordings up to a day in duration, but also some means of seeking within a recording and a method of adding metadata to the recording. Any metadata or annotations need to remain attached to the recording. Current software is simply not appropriate for analysing long duration recordings. There is software for audio editing, such as Audacity, which provides tools for modifying and visualising audio. The visuals can be a waveform (a time-amplitude representation of sound) or a spectrogram (a time-frequency-amplitude representation of sound).
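As an aside, a spectrogram can also be rendered outside an editor. Here is a minimal sketch using SoX (assuming SoX is installed; recording.wav is a hypothetical file name):

# render a time-frequency-amplitude image of the audio
sox recording.wav -n spectrogram -o recording.png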

Here’s an audio file loaded in Audacity showing the spectrogram:

Audacity interface showing spectrogram

Audacity is an improvement, as the spectrogram visualisation and other tools make it easier to work with audio files. However, many files are more than 1 gigabyte in size, which is often more than Audacity can handle. This screenshot also shows only 16 seconds of audio; a day-long recording (86,400 seconds) contains 5,400 such 16-second segments. The problem of how to associate metadata has still not been solved. There are software programs created specifically for analysis of environmental acoustic data, such as Song Scope and Raven. While these programs do allow for annotation and automated analysis, they are not suited to many long duration audio files and concentrate on automated analysis.
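To give a sense of working at this scale, windows of interest usually have to be carved out of the file before they can be examined. A sketch with SoX (hypothetical file names):

# extract a one-hour window starting six hours in (6 x 3600 = 21600 seconds)
sox day-recording.wav morning-chorus.wav trim 21600 3600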

An interface that effectively enables birders to analyse long duration recordings needs a few things:

  • A visualisation, most likely a spectrogram, to provide visual context for the sound
  • A way to deal with the long duration (seeking over hours, days, or months rather than minutes)
  • A method for attaching information about the identity of the birds that made calls (as metadata or annotations)
  • Information about the date and time, location, surroundings, and device used to record

For my research, I created two prototype websites to investigate how bird watchers could analyse many long duration audio recordings. I looked at how bird watchers could navigate long duration recordings and attach metadata through annotations. This included information about the recordings, such as where and when they were recorded, along with photographs of the surrounding area. In a subsequent post I will go through the prototypes I created and how they were tested.

Exploring Digital Ocean and Dokku

I recently needed to host two websites: one using Node.js and one built with Ruby on Rails. I do have an account with Bluehost, but I’ve attempted to set up recent Node and Rails sites there before, and it was rather difficult.

Instead, I gave Heroku a go. It went quite well, and I was particularly pleased with the deployment process. However, I could see that the prices could quickly build up, particularly as I needed a database and potentially a number of other add-ons. I had a look around for other options and found a list of cloud services for Rails, ranging from Platform as a Service (PaaS) to Infrastructure as a Service (IaaS).

I liked the look of the Digital Ocean (DO) droplet pricing ($10/month for a 30GB SSD with 1GB RAM). I then quickly found Dokku:

The smallest PaaS implementation you’ve ever seen

Docker powered mini-Heroku in around 100 lines of Bash

I thought I’d give it a go, and I was quickly hooked. There are quite a few blog posts about setting up Dokku on Digital Ocean, and I made use of several of them.

Here’s my take, from a new DO droplet to a running server with deployed applications.


In this tutorial, local$ indicates a command to be run on the local machine, while $ means the command is to be run on the droplet.


Let’s get going! Keep the Dokku documentation handy; you’ll most likely want to consult it at some point.

  1. Launch a DO droplet with Dokku and Ubuntu 14.04.
  2. I’d suggest making the droplet name match the domain name that will be used.
  3. Go to http://<droplet-ip>.
  4. Fill in public key with your personal public key.
  5. Set the domain name. This is recommended to be a subdomain; you don’t have to, but things will work much better if you do, since an IP address makes it impossible to use sub-sub-domains (see the sketch after this list).
  6. Tick ‘Use virtualhost naming for apps’ only if you set a domain name.
  7. Click ‘Finish Setup’.
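To make the virtualhost naming concrete, here is a sketch with a hypothetical domain (substitute your own):

# with the droplet’s domain set to dokku.example.com and virtualhost
# naming enabled, each app is served on its own subdomain:
#   app "blog" -> http://blog.dokku.example.com
#   app "api"  -> http://api.dokku.example.com
# with only an IP address, Dokku instead falls back to exposing apps on ports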

If you choose not to use the Dokku application image from Digital Ocean, don’t forget to add your public key to Dokku.

local$ cat ~/.ssh/id_rsa.pub | ssh [sudouser]@[yourdomain].com "sudo sshcommand acl-add dokku [description]"


You’ll need to set some DNS records:

  • ‘A’ record named <subdomain> with the droplet IP address.
  • ‘A’ record named *.<subdomain> with the droplet IP address.
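Once the records have propagated, you can verify them from the local machine (assuming dig is available):

local$ dig +short <subdomain>
local$ dig +short anything.<subdomain>

Both should print the droplet’s IP address.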

Set up droplet

Now we can set up the droplet. These steps are a condensed version of the Digital Ocean Ubuntu set up guide.

Create a non-root user

Using the root user for everything is really not a great idea.

local$ ssh root@<droplet-ip>
$ apt-get update
$ adduser <user name>

Make sure you set a strong password for the user, and don’t lose it! You’ll need it for dokku commands.

$ gpasswd -a <user name> sudo
$ su - <user name>
$ cd ~
$ mkdir .ssh
$ chmod 700 .ssh
$ vi .ssh/authorized_keys

Add your personal public key to the authorized_keys file.

$ chmod 600 .ssh/authorized_keys
$ exit
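As an aside, if your local machine has ssh-copy-id, the manual key steps above can be replaced with a single local command while password logins are still enabled:

local$ ssh-copy-id <user name>@<droplet-ip>

It appends your public key to the remote authorized_keys file and sets the permissions for you.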

Disable password and root login by editing ssh config

It is better if root is not allowed to log in at all.

$ vi /etc/ssh/sshd_config

Disable password logins and root login by changing these lines:

PasswordAuthentication no
PermitRootLogin no
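Before restarting, it’s worth checking that the edited file parses cleanly, since a syntax error here could lock you out:

$ sshd -t

No output means the configuration is valid.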

Restart the ssh service so the changes take effect

$ service ssh restart

Check that <user name> can log in before exiting the ssh session. Keep the root session open until you confirm that you can run sudo.

Start a new console session and log in as the new user:

local$ ssh <user name>@<droplet-ip or host name>
$ sudo apt-get update
$ sudo apt-get dist-upgrade

If that works, exit the root ssh session.

Configure firewall

Ensure ssh logins are allowed

$ sudo ufw allow ssh

Also allow ports 80 (http) and 443 (https) for web traffic

$ sudo ufw allow 80/tcp
$ sudo ufw allow 443/tcp

Check the exceptions

$ sudo ufw show added

Enable the firewall

$ sudo ufw enable
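You can confirm the firewall’s final state with:

$ sudo ufw status verbose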

Configure Time

Set the time zone by following the prompts to choose your timezone

$ sudo dpkg-reconfigure tzdata

Install NTP sync so the system clock stays at the correct time

$ sudo apt-get install ntp

Create a swapfile

There may not be one in the Ubuntu droplet, which I found odd.

$ sudo fallocate -l <RAM x 2>G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ sudo sh -c 'echo "/swapfile none swap sw 0 0" >> /etc/fstab'
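Check that the swap is active, and optionally tell the kernel to prefer RAM over swap (a common companion tweak; 10 is a conservative value):

$ sudo swapon -s
$ sudo sysctl vm.swappiness=10
$ sudo sh -c 'echo "vm.swappiness=10" >> /etc/sysctl.conf'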

Set host name

If the droplet’s host name is the same as the domain name, there is nothing to do.

Otherwise, there are tutorials on changing a droplet’s host name that might be helpful.


Now to deploy an application to Dokku! For example:

local$ mkdir ~/projects
local$ cd ~/projects
local$ git clone <git url>
local$ cd ./<application name>

You’ll need to set a new git remote.

local$ git remote -v
local$ git remote add dokku dokku@<subdomain>:<application name>
local$ git push dokku master

If the deploy was successful, have a look at the website at

http://<application name>.<subdomain>
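You can also check from the command line (substituting your app name and domain):

local$ curl -I http://<application name>.<subdomain>

A 200 response means the request is being routed through to the app container.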

Application settings

You might need to set some application settings, such as a Rails secret_key_base or a database connection string. I chose to do this via environment variables. This is not my preferred method, but it was quick and simple. Set configuration vars like this:

$ dokku config:set power-outages secret_token=<long string of characters> secret_key_base=<long string of characters>
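You can confirm the values were stored with dokku config, which prints all of an app’s environment variables (so mind who is looking over your shoulder):

$ dokku config power-outages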

Add a database

Some applications need a database. I’m a fan of Postgres. Add the dokku postgres plugin and create a database:

$ sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git postgres
$ export POSTGRES_IMAGE="postgres"
$ dokku postgres:create <application name>-db
$ dokku postgres:link <application name>-db <application name>

Then set the database connection string as a configuration setting:

$ dokku config:set DATABASE_URL="postgres://<db user>:<db pass>@<dokku ip>:<db port>/<db name>"

For a Rails application, you may need to run migrations:

$ dokku run <application name> bundle exec rake db:migrate
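If a migration (or the deploy itself) misbehaves, the application logs are the first place to look; the -t flag tails them:

$ dokku logs <application name> -t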


There we go, from creating a Digital Ocean droplet to deploying an application.

The trouble with normal

This is something I wrote in February 2013. I’d like to share it.

Normal seems easy, common, correct, non-threatening. It can feel like “why would anyone want anything else?” The problem is that it leaves no room for experimentation, no space to grow and change, no leeway to explore. Because no two people are quite the same. It bundles the, for most, unimaginable range of relationships and sexual expression into a prepackaged ‘this is what people want’. So many decisions are overridden by this. Infinite possibilities left unexplored. There is very little room for personal choice when a ‘normal’ holds such sway.

We live in a society scared of itself. Seemingly unable to accept difference, disability, experimentation, varied life paths. Respect, compassion and understanding are deemed radical, as if we are not all part of the same world, as if we do not share resources, as if we do not have our common and individual triumphs and failures.

We trust our opinions of experiences and ideas that we have little knowledge of, rather than going out to observe and participate. We assume the worst of people we do not understand, expending no time to gain that understanding.

I believe we all must have the right, and ability, to make our own decisions based on experience. Real experience, real options. Not prepackaged, not filtered, but the full spectrum, without interference. Similar to the idea of freedom of speech, the responses to expressions are the way to show agreement or not, rather than pre-screening and restricting the possibilities.

Respect, not loathing. Compassion, not suspicion. Understanding, not fear.

The Avid Reader Reader

I visited Avid Reader in Brisbane’s West End for National Bookshop Day on Saturday 8 August. In addition to some peaceful colouring in while eating cupcakes, I was lucky enough to borrow one of only 40 ‘The Avid Reader Reader’ books handmade for the day by the staff. The books are rather special ‘concertina’ or ‘accordion’ style books - each page is joined to its two neighbours, rather than to the spine.

The contents of the books were also written by Avid Reader writers, with stories and illustrations. I’m currently reading it. My favourite sentence so far is from ‘Jon and Monica’ by Sally Olds:

“He gets excited and overreaches, and Monica gently collects his identity and hands it back, like a friendly neighbour returning a rowdy dog.”

I’m keen to keep reading and finding more gems in this rather special handmade book.

Humans (TV Show, 2015)

I’ve been watching more of the free content on ABC iView and SBS OnDemand. There are some really good TV series available, and it feels like more and more is being brought in quickly from the UK, the US, and various other countries. There’s also some decent Australian content appearing as well.

I just watched the first episode of Humans. It’s from the UK, and is another take on androids, or ‘synths’ - robots made to look like people, created mainly to do menial tasks. The first episode mentions ‘Asimov laws’ - the operating rules for a robot. It also discusses the singularity, the point at which AI is able to self-improve and replicate without human support. The plot follows an everyday family that buys a synth. However, this synth is one of very few that have feelings, thoughts, and something approaching consciousness. I liked this take on an alternate modern reality: the only difference is the synths; the same cars, jobs, and problems remain.

I liked the extras available as well. I’m always interested in getting some background on the people behind a movie or TV show, from the actors and writers.