In a world-leading project, Statistics NZ is developing a new way of estimating and forecasting the population that uses Bayesian methodology to draw on a wider range of data sources.

The beta code is now freely available on GitHub, the open-source software community, so others can test it, fix problems, and suggest improvements.

The software packages are part of a wider effort to figure out how to produce population statistics from administrative data, such as tax or health-system data. Population statistics are an essential input to thousands of decisions, from where to place a new maternity hospital, to how large to build a new bridge.

This Bayesian methodology is a core component of delivering a census based on administrative data, an option we are currently exploring through our Census Transformation programme. It also makes it possible to produce statistics for smaller groups of people, such as life expectancy for individual cities or districts.

The new approach is being watched by statistical agencies around the world. Statistics NZ is building a reputation as an international pioneer in new methods for population statistics, and we are particularly known for our use of Bayesian methods. Named after the Reverend Thomas Bayes, this 250-year-old approach to statistics is making big inroads in the data analysis world.

“The software packages now being made public are for demographic estimation and forecasting: for example, family structure in different parts of New Zealand or the population of the country in 20 years,” population statistics senior researcher Dr John Bryant says.

Statistics NZ staff developed the software in co-operation with others. The software package authors include Jenny Harlow from the University of Canterbury and Dr Junni Zhang from Peking University. Dr Zhang has worked with Dr Bryant on the theory behind the packages and they are writing a book together.

The software uses R, a free software environment for statistical computing and graphics that was first developed in the Statistics Department at the University of Auckland. It has become a standard tool for modern data analysis.

“We released four R packages related to population statistics on GitHub on 30 September, with another two on population statistics to follow later this year. Statistics NZ will be releasing other packages on other topics over the next few months,” Dr Bryant says.

“We will place the packages on the official R repository in a few months when they have matured.”

View the R packages on GitHub.