Census Innovation

In addition to conventional sample surveys, the Census and Statistics Department (C&SD) has been actively exploring new data sources, aiming to reduce data collection costs and the burden on respondents while ensuring the quality of the statistics compiled. Incorporating such data into our work requires linking them with existing data processed by the Department, which can be challenging because of the substantial volume of records involved and the differences in data format across the various sources. AI is an effective tool for handling these complex processes.

C&SD plans to utilise administrative data collected from various government departments more extensively and systematically starting from the 2026 Population Census, primarily in the following two areas:

First, we aim to replace some census questions (such as those on the rents of public housing and amounts of welfare payments) with administrative data, so as to reduce data collection costs and the burden on respondents. C&SD has employed self-developed AI-based record linkage tools to efficiently and accurately match census sample data with administrative records at the living quarters level.

Second, we aim to replace the “Short Form” questionnaire covering around 90% of all households in the 2031 Population Census with administrative data. Through comprehensive utilisation of anonymised immigration records, C&SD can now compile more precise population estimates without relying on the “Short Form” questionnaire, thereby significantly reducing the scale of operation and the costs involved.
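
As a rough illustration of the record linkage mentioned in the first area above, the sketch below matches census sample records to administrative records by comparing normalised living quarters addresses. It is not C&SD's tool: the source does not describe how the AI-based linkage works, so plain string similarity from Python's standard library is swapped in here, and the record identifiers, addresses, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(address: str) -> str:
    """Lower-case the address, turn punctuation into spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two addresses after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_records(census: dict, admin: dict, threshold: float = 0.85) -> dict:
    """Map each census record id to its best-matching admin record id.

    A census record is left unmatched (None) when no administrative
    address reaches the similarity threshold (an assumed cut-off).
    """
    links = {}
    for census_id, census_addr in census.items():
        best_id, best_score = None, 0.0
        for admin_id, admin_addr in admin.items():
            score = similarity(census_addr, admin_addr)
            if score > best_score:
                best_id, best_score = admin_id, score
        links[census_id] = best_id if best_score >= threshold else None
    return links


# Hypothetical records: the same living quarters written in two formats,
# plus one census record with no administrative counterpart.
census_sample = {
    "C1": "Flat 12, 3/F, Block A, Example Estate",
    "C2": "Room 5, 10/F, Sample House",
}
admin_records = {
    "A1": "FLAT 12 3/F BLOCK A EXAMPLE ESTATE",
    "A2": "Flat 7, 2/F, Other Court",
}

links = link_records(census_sample, admin_records)
```

Comparing every pair of records, as this sketch does, would be far too slow for the volumes described in this article; a production system would typically narrow the candidates first (for example by district or building) before scoring them.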

This new workflow, which incorporates more administrative data and re-engineers work processes, is expected to reduce costs significantly. C&SD estimates that the total costs of the 2026 and 2031 Population Censuses will be reduced by 40%, saving around HK$680 million at current prices.
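
The two figures above imply a rough baseline for the combined cost of the two censuses. The short calculation below works it out, assuming the saving of around HK$680 million corresponds to exactly 40% of the original total. Both inputs are rounded in the source, so the results are only indicative and are not official C&SD figures; the function names are illustrative.

```python
def implied_baseline(saving_hkd_million: float, saving_percent: float) -> float:
    """Total cost before savings, implied by a saving amount and its percentage."""
    return saving_hkd_million * 100 / saving_percent


def implied_remaining(saving_hkd_million: float, saving_percent: float) -> float:
    """Cost left after the saving has been deducted from the implied baseline."""
    return implied_baseline(saving_hkd_million, saving_percent) - saving_hkd_million


# Figures quoted in the text: around HK$680 million saved, a 40% reduction.
baseline = implied_baseline(680, 40)    # about HK$1,700 million
remaining = implied_remaining(680, 40)  # about HK$1,020 million
```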


Please watch the video (Cantonese only) below for details.
Male Statistician:

The Census and Statistics Department (C&SD) handles over a million trade declarations every month to compile the "External Merchandise Trade Statistics".

In the past, computer systems struggled to process textual data, and we were only able to manually validate a small portion of the trade declarations. This was really a significant challenge!

In recent years, we have developed two AI models using deep learning techniques to simulate the human brain's ability to recognise and analyse the information on trade declarations.

We trained these AI models using millions of records with commodity descriptions, enabling them to automatically validate the textual description on every new trade declaration, verify the commodity codes, and calculate whether the values and quantities of the commodities are reasonable.

By early 2024, we had fully implemented these AI models to process trade declarations, achieving promising results! The AI models can now validate approximately three million trade declaration records in just two and a half hours. This has significantly increased the number of trade declarations validated and improved the quality of the statistics, while reducing the manpower involved by over 40%.

With the resources saved, we have established two new branches, the Data Science Branch and the Social Data Development Branch, and expanded our big data team to focus on promoting big data applications and providing related training.

Female Statistician:

Population Censuses, conducted every ten years, traditionally required the participation of the entire population, with 10% completing a long questionnaire and the remaining 90% answering a short one. Population By-censuses, conducted between two full censuses, require only 10% of the population to complete the long questionnaire. The short questionnaire asks for basic demographic information such as age, sex, and whether the respondent is a permanent resident. These data are used to calculate Hong Kong's population base.

After analysing the 2021 Census data, the department discovered that administrative records from the Immigration Department, such as birth, death, and movement records, could already accurately reflect the demographic structure and thus fulfil the same purpose as the short questionnaire.

Starting from 2026, we will conduct a population census every five years, on a scale similar to that of a population by-census, with only 10% of the population selected to answer the long questionnaire. By also using administrative data to calculate the population base, we can achieve results as accurate as those of a full census.

Additionally, we are actively utilising administrative data from other departments with a view to trimming down the long questionnaire. For example, data on floor area can be provided by the Housing Department and the Rating and Valuation Department, and data on welfare subsidies can be obtained from the Social Welfare Department. By matching these data with our census records, we can reduce the number of questions respondents need to answer, saving both costs and respondents’ time!

The department estimates that by leveraging administrative data and re-organising workflows, we can save 40% of the combined costs of the 2026 and 2031 censuses, amounting to approximately HK$680 million!

Both Statisticians:

In the future, C&SD will continue to explore the applications of new technologies, streamline workflows, and optimise manpower to provide higher-quality statistical services to the government and the public!