The Census Bureau wants $22.3 million to create an enterprise data lake, according to its fiscal 2021 budget request.
The enterprise data lake, officials wrote in the budget documents, will help the Bureau “modernize data storage and data analysis capabilities across all of its directorates.” A new platform to manage the repository of raw data will build on the successes of the data lake the Bureau designed for the 2020 census, the documents said.
“This platform will increase the Bureau’s capability to ingest the ever-increasing volume of administrative records, improve the quality of data products and apply disclosure avoidance to protect PII data as required by ... data protection laws,” Census Bureau officials wrote in budget documents.
The requested data lake will “improve access” to large amounts of economic and demographic data, increase the amount of data the Bureau can ingest, and integrate and analyze survey and administrative data using tools like cloud computing, artificial intelligence, big data analytics and machine learning.
In a statement to Federal Times, Census Chief Information Officer Kevin Smith said the requested data lake is all about efficiency.
“The EDL will provide us the capability to support the processing of big datasets quickly and easily with large dynamically scalable compute and storage capabilities throughout the enterprise,” Smith said. “We will see some reduction in redundant data, while increasing the data that is useful to our mission. Furthermore, the EDL will support the Census Bureau’s longstanding leadership in data analytics and technology. This includes accelerating data innovation, realizing benefits through standardization and using cloud and open source technology.”
The data lake that the Bureau built to store data from the 2020 census, which starts April 1, would be integrated into the enterprise data lake to assist with policy-making and research. With the new data lake, budget officials wrote, the Bureau would be able to improve links between disparate data sources and produce research more quickly.
“It will consolidate currently decentralized data management and storage systems, dispersed security and privacy implementations, and resolve technology limitations across the survey and data lifecycle,” budget officials wrote.
In FY21, the Bureau would transition the 2020 census data over to the enterprise data lake, according to budget documents. In the following fiscal years, the data lake would expand to processing third-party data, administrative data and “at least” one new survey to create a new data product.
The enterprise data lake would require 30 full-time employees, the documents showed.
The request by the Census Bureau comes at a time when government agencies are beginning to grapple with how to handle the data they collect every day. The Office of Management and Budget released the federal data strategy and its year-one action plan last year in an effort to facilitate better data management and treat data as a strategic asset.