Hidden within the Data: Wildlife Strike Costs in Australia

From time to time, I like to write about wildlife-strike-related research. In my wrap-up of last September’s AAWHG Forum, I hinted at a presentation I delivered on wildlife strike costs in Australia and promised that more details were coming soon. Well, today is soon! A couple of nights ago, my first ever peer-reviewed academic journal article was published*, and it has the very scientific-sounding title of “Estimating the Cost of Wildlife Strikes in Australian Aviation Using Random Forest Modeling.”

Here is the header image:

[Header image]

Looks pretty cool, right?

C’mon, Dan, An Academic Journal?

Academic writing can sometimes be impenetrable, and I accept that reading academic or scientific journals is not everyone’s cup of tea, even when they are free, as this one is. Therefore, I am happy to explain my research project to you in real people’s terms.

The Headline

My colleagues and I used a complex algorithm to estimate the cost of wildlife strikes in Australia, and for the period 2008-2017, we think they cost the industry at least $12.7 million per year.

That’s the bottom line, but of course, there are a bunch of caveats and qualifications. Firstly, these are the costs associated with repairs and “other” airline costs. They do not include the money airports and airlines spend on wildlife hazard management, i.e., strike prevention and mitigation. The figure is also likely an underestimate, because the reporting of cost figures isn’t great even where strike reporting generally is.

And it doesn’t cover mortality and morbidity, which wasn’t a big issue in this dataset but has, unfortunately for Australia, become a consideration in the last year or so.

How did we do it?

Ok, let’s get a little academic for a moment. Before embarking on research, it is important to acknowledge the research that has come before you. When talking about wildlife strikes and especially wildlife strike costs, one is culturally obliged to cite John Allan’s seminal work and the ubiquitous figure of US $1.2 billion per year worldwide. This was hard-fought research that has remained the foundation of a lot of work in this area.

Cost estimates for the United States have been published regularly, with reported costs in the National Wildlife Strike Database (NWSD) being the basis of these calculations. In Australia, some attempts were made in the mid-2000s to share cost data, but the issue of commercial sensitivity has remained. This has meant that Australia has had no broad estimate of the cost of wildlife strikes to use as a baseline.

The challenge has been devising a way to leverage the data we have to develop a robust model for estimating costs. Modern advances in machine learning and computational power have provided an opportunity to make inroads. Enter Levi Altringer and a team from the US Department of Agriculture.

Mr Altringer and co. tested several machine learning algorithms to estimate the costs of wildlife strikes within the US NWSD. This is a process called “imputing missing values”. The estimate derived from the process used by this team was more nuanced than the previous method, albeit still likely to be an underestimate.

As is the style in science, I took the work of Altringer and co. and pushed it just a little further. My idea was to think of Australia’s wildlife strike database as an extension of the US NWSD. So, if I curated the data appropriately, I could “train” my model on the US cost data and then estimate the costs of Australian wildlife strikes. And so, results:

[Graph: estimated annual wildlife strike costs alongside the number of reported strikes, 2008-2017]
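In code, the train-on-one-database, predict-on-the-other idea looks roughly like this. To be clear, this is a toy sketch, not the paper’s actual model: the column names, numbers, and model settings are all invented for illustration.

```python
# Toy sketch of the train-on-US, predict-on-Australia idea.
# All column names, values, and settings here are invented.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
features = ["aircraft_mass_t", "species_mass_kg", "speed_kt"]

# Stand-in for curated US NWSD records, where repair costs are reported.
us = pd.DataFrame(rng.uniform(0, 1, size=(500, 3)), columns=features)
us["cost_usd"] = (
    5000 * us["aircraft_mass_t"] * us["species_mass_kg"]
    + rng.normal(0, 200, size=500)
)

# Stand-in for Australian strike records, where costs are mostly missing.
au = pd.DataFrame(rng.uniform(0, 1, size=(50, 3)), columns=features)

# Train on the US strikes with known costs...
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(us[features], us["cost_usd"])

# ...then estimate costs for the Australian strikes.
au["est_cost_usd"] = model.predict(au[features])
print(f"Estimated total cost: ${au['est_cost_usd'].sum():,.0f}")
```

In the real project, of course, those US-dollar estimates still had to be adjusted for Australian conditions, which is the next part of the story.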

There was, of course, a little more work to do, as what a strike costs in the US, in US dollars, using US labour on US aircraft, isn’t necessarily what it would cost in Australia. To tackle this challenge, I enlisted the help of Jason Ryan, who is doing a PhD in economics. We still had to battle commercial sensitivities, but I think we got a reasonable outcome.

When the above graph emerged from the data, I was delighted. The key feature that told me I was doing something more than taking an average and extrapolating was the independent movement in the cost graphs versus the reported strikes graph. This tells me the algorithm is looking much deeper into the data.

Why do this?

This research project wasn’t just an academic exercise to see if we could. The motivation remains the same as John Allan explained twenty years ago:

“to gather the data necessary to evaluate the true cost-effectiveness of increasing bird control provisions”

Airport operators, especially operations teams, have finite resources. While we do plenty of things airside because a rule says we must or an identified risk demands we address it, real discretionary spending needs data to assess things like return on investment or cost-benefit ratio. For wildlife hazard management, this data goes a long way to helping make the case for additional investment.

It doesn’t necessarily address the quirk of airports paying and airlines benefiting, but it facilitates better investment decisions and effective assessments of our wildlife hazard management activities.

What Else Can It Do?

This approach should allow an airport or airline to calculate their specific costs. I believe the algorithm is sufficiently complex to make estimates based on sub-sets of the target strike data. This is where the true cost-benefit analysis opportunities mentioned above exist.

This approach could also be used in jurisdictions with strike data but no cost information. And if you are interested in doing this, I would love to work with you on that project.

How Does it Work?

If you’re still reading, I assume you’re interested in a little more technical detail. Now, I’m not going to go full nerd on you, but let’s take it up a notch.

Machine learning is a field of computer programming that uses a range of algorithms to crunch a lot of numbers, looking for relationships within the data it is given. There are a bunch of different algorithms, with the one that gets most of the press being the neural network, so called because it is loosely analogous to a brain.

The one we used is called random forest. In a nutshell, this technique builds thousands and thousands of decision trees, each grown on a random sample of the data and a random subset of the variables, and pools their votes into a single estimate. Judging good and bad decision trees comes from “training” the algorithm on data with known results - the US NWSD has this data.
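The pooling-of-votes idea can be seen directly in code. This is a toy illustration with made-up numbers, not the paper’s model: a random forest regressor’s estimate is just the average of its individual trees’ estimates.

```python
# Toy illustration: a random forest's prediction is the mean of the
# predictions of its decision trees. All data here is fake.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Fake records: two numeric features and a known "cost" with noise.
X = rng.uniform(0, 1, size=(200, 2))
y = 1000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 50, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Ask each tree for its vote on a new record, then average them.
new_record = np.array([[0.5, 0.5]])
per_tree = [tree.predict(new_record)[0] for tree in forest.estimators_]
print(np.isclose(np.mean(per_tree), forest.predict(new_record)[0]))  # True
```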

In practice, you search for the best set of parameters (the “hyperparameters”) to give the algorithm so that it trains the most accurate model. Well, technically, the least bad model 😉. You then “test” these models on some data you’ve kept hidden from the algorithm. Once you have the best parameters, you retrain the algorithm on all your available data. Then, you point this model at your data with the missing values and voila!
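Sketched in scikit-learn, that whole tune, test, retrain, impute loop looks something like this. Again, the data and parameter grid are invented for illustration, not taken from the paper:

```python
# Sketch of the tune / test / retrain / impute workflow, with fake data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))
y = 2000 * X[:, 0] + rng.normal(0, 100, size=300)

# Keep some "test" data hidden from the algorithm.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Search for the least bad combination of parameters.
search = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
)
search.fit(X_train, y_train)
print("held-out score:", round(search.score(X_test, y_test), 3))

# Retrain with the winning parameters on ALL the data, then point the
# model at the records with missing values.
final = RandomForestRegressor(random_state=1, **search.best_params_)
final.fit(X, y)
X_missing = rng.uniform(0, 1, size=(5, 3))  # strikes with no reported cost
print(final.predict(X_missing))
```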

A lot of the above steps are heavily automated in programming, either using my favourite, Python, or R for real statisticians.

For those still reading, thanks for nerding out with me. If you’re interested in learning more or working on something cool together, feel free to send me an email or a message on LinkedIn. If you’re a fellow academic researcher and publishing in this area, don’t forget to cite the article 👍.

* Technically, I have already had a journal article peer-reviewed and accepted, but it is taking a little longer to make its way through the publication process.

Dan Parsons

Dan is an airport operations manager currently working at Queenstown Airport in beautiful New Zealand. His previous roles have included airport and non-process infrastructure operation manager in the mining industry, government inspector with the Civil Aviation Safety Authority and airport trainer. Dan’s special interests include risk management, leadership and process hacks to make running airports easier. 

http://therunwaycentreline.com