DHS screens roughly 80 million people a month for travel and immigration purposes, one of many “big data” challenges the agency faces. (Kevork Djansezian / Getty Images)
For a glimpse of the “big data” challenge at the Department of Homeland Security, consider this: DHS screens roughly 80 million people a month for travel and immigration purposes.
Big data is an amorphous term used to describe the skyrocketing volumes of digital information that businesses and government agencies now collect, store and analyze. Although DHS officials don’t keep public figures on that growth, Donna Roy, executive director at DHS’ Information Sharing Environment office, called it “staggering.” The department now measures the data it wields in terms of “petabytes,” equal to about 1 million gigabytes.
“It’s a technical challenge, it’s a business challenge and it’s an enormous opportunity,” said Bailey Spencer, director of federal, civilian and homeland security sales at SAS, a North Carolina maker of analytic software. Under one recently signed contract with DHS’ Office of Health Affairs, for example, SAS will be analyzing information from Twitter and other social media for early warning signs of a bioterrorist attack or epidemic, Spencer said.
While agencies have always collected data, they now have to figure out how to structure and use the information, said Jay Kalath, vice president and general manager for the national security sector at Array Information Technology, another DHS contractor headquartered in Maryland.
Although other agencies face similar data management hurdles, at DHS they are magnified by the department’s size and its role in guarding against terrorist attacks. For 2012 alone, DHS officials set aside $5.6 billion for information technology investments, according to a Government Accountability Office report released last month.
An added complication is the growing variety of data formats, such as satellite imagery and streaming video. At DHS, about one-third of the time needed to prepare big data for analysis is spent scrubbing and restructuring the information to make it more useful.
DHS has pilot projects underway to speed up analysis of “highly valuable” data — particularly for counterterrorism purposes — while at the same time protecting individuals’ privacy, Roy said.
Roy declined to discuss specific examples, but a DHS-funded Center of Excellence led by Purdue and Rutgers universities is researching ways to analyze visual information, with the goal of getting threat information to law enforcement and other emergency responders more quickly. Because the 22 DHS components run their own IT systems and consolidating them would be too expensive, the focus instead is on standardizing data as it moves between those systems, Roy said.
But if data volumes are growing, agency budgets are not. Despite progress, five of eight key DHS information-sharing projects face funding shortfalls, GAO said in a separate report last month. Among them is the department’s top priority: a searchable index to streamline access to intelligence and law enforcement information across DHS.
While DHS funding is adequate for the current mission, Roy said, “we’re trying to be more agile and come to work on solutions faster.” Over the next five years, she expects data volumes to grow “exponentially,” but she looks to leverage efforts elsewhere in government and business to better manage the load.
In March, for example, the White House Office of Science and Technology Policy announced a “Big Data Research and Development” initiative to spur work in the field.
Down the road, Roy said, her hope is that such undertakings will help address the challenges of both data “volume and velocity.”