The Recovery Board handles massive amounts of data to provide transparency in the distribution of American Recovery and Reinvestment Act and Hurricane Sandy funds. Extracting useful information from that data is vital.
How one agency does it
Step 1: Analyze business requirements
■ Who are your customers?
■ What questions need to be answered?
■ Are there privacy concerns associated with answering those questions?
Step 2: Data discovery
■ What data exists to support answering those questions?
■ Who are the data owners?
■ What is the refresh cycle for each data source?
■ What is the quality of the raw, unverified data?
■ What are the data retrieval alternatives?
Step 3: Develop the back end
■ Develop queries to answer the questions.
■ Validate the data quality.
■ Develop logic to increase the data quality.
Step 4: Determine business intelligence requirements
■ Conduct focus group analysis.
■ Evaluate user interface solution alternatives.
Step 5: Develop the user interface
■ Build user interface working with focus groups.
Source: Recovery Accountability and Transparency Board
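Step 3's tasks — querying the data, validating its quality and applying logic to improve it — can be sketched in code. This is a minimal, hypothetical illustration; the record fields, validation rules and cleanup logic are invented for the example, not taken from the board's actual systems.

```python
# Hypothetical sketch of Step 3: validate data quality, then apply
# simple logic to raise it. All field names and rules are illustrative.
from dataclasses import dataclass

@dataclass
class AwardRecord:
    recipient: str
    amount: float      # dollars awarded
    state: str         # two-letter postal code

def validate(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if not record.recipient.strip():
        problems.append("missing recipient")
    if record.amount < 0:
        problems.append("negative amount")
    if len(record.state) != 2:
        problems.append("bad state code")
    return problems

def clean(record):
    """Logic to increase data quality: normalize each field."""
    return AwardRecord(
        recipient=record.recipient.strip().title(),
        amount=round(record.amount, 2),
        state=record.state.strip().upper(),
    )

records = [AwardRecord(" acme paving ", 125000.004, "va "),
           AwardRecord("", -1.0, "Virginia")]
# Keep only records that pass validation after cleanup.
cleaned = [clean(r) for r in records if not validate(clean(r))]
```

The ordering matters: cleanup runs before validation, so records that are merely messy survive while records that are genuinely broken are flagged.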
Federal agencies collect massive amounts of data to perform a variety of tasks, including cybersecurity, fraud detection, crime prevention, medical research, weather modeling, intellectual property protection, operational efficiency and situational awareness.
A whole technology ecosystem is evolving to help organizations derive value from data that comes in different formats and from myriad sources: sensors, disparate databases, documents, websites and social media. Before investing in any technology, though, federal managers must know what problems they want to solve, and what questions they want to ask of the data, federal managers and experts said.
Step One: Determine what questions need to be asked of the data. “You can’t build anything if you don’t understand the context of what questions you are trying to ask,” said Shawn Kingsberry, chief information officer with the Recovery Accountability and Transparency Board.
The Recovery Board handles massive amounts of data in its role as a nonpartisan government agency charged with providing transparency of American Recovery and Reinvestment Act-related funds and the detection and prevention of fraud, waste and mismanagement. Best known for the Recovery.gov website, the board also provides information about the distribution and spending of Hurricane Sandy funds.
Step Two: Build a profile of the customer. After determining the right questions to ask of the data, managers can start to build that profile, Kingsberry said. For instance, Recovery.gov was not built for the board; it was built for citizens, Congress and good-governance groups. The board got feedback from focus groups about what information they wanted to see and how they wanted it displayed before even talking about technology.
Step Three: Determine what it will take, technically, for the agency to answer those questions. In some cases, the questions are so broad that technology is needed to help agency managers discover questions about the data that they had not thought to ask.
“It is a systematic process that you go through and if you go through it appropriately, you will have a higher level of success,” Kingsberry said.
From a technology perspective, the Recovery Board uses a slew of tools to process and analyze data: Microsoft, SAP and SAS Analytics, for example. For link analysis the Recovery Board uses technology from Palantir Technologies and connects it into the board’s data federation framework, which allows for the linkage of disparate entities. “If you bring in data right and organize it, you can connect a variety of tools into the same data repository to decrease the lock-in to any product,” Kingsberry said. But the process begins with being data-focused.
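Kingsberry's point about organizing data once and plugging many tools into it can be sketched as a thin repository interface. This is a hypothetical illustration of the design idea, not the board's actual data federation framework; the class, the two "tools" and the sample rows are all invented for the example.

```python
# Hypothetical sketch: one well-organized repository, many pluggable
# analysis tools. Each tool depends only on the shared query contract,
# which is what reduces lock-in to any single product.

class DataRepository:
    """Single shared store that any analysis tool can query."""
    def __init__(self):
        self._rows = []

    def load(self, rows):
        self._rows.extend(rows)

    def query(self, predicate):
        return [r for r in self._rows if predicate(r)]

def total_awarded(repo, state):
    """One 'tool': aggregate spending for a state."""
    return sum(r["amount"] for r in repo.query(lambda r: r["state"] == state))

def link_recipients(repo, recipient):
    """Another 'tool': link-analysis-style lookup of related awards."""
    return repo.query(lambda r: r["recipient"] == recipient)

repo = DataRepository()
repo.load([{"recipient": "Acme", "state": "VA", "amount": 100.0},
           {"recipient": "Acme", "state": "MD", "amount": 50.0}])
```

Swapping in a different aggregation or link-analysis product would mean writing another function against `query`, not reorganizing the data.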
“Data is being generated at an alarming rate. So we want to do better things with that data,” said Chris Westphal, director of analytics technology within Raytheon’s Cyber Products group. A federated platform that can pull data from disparate sources into a single repository coupled with comprehensive analytics capabilities should be an integral part of any organization’s big data arsenal, he said.
Raytheon provides government agencies with The Data Clarity Platform, a federated search and analytics enterprise application that speeds the detection of and response to cyber threats, fraud, criminal activity and terrorism. The platform's search capabilities rapidly access vast amounts of information located across multiple sources without moving or copying the data from its original source, and return easily digestible, relevant results in seconds. Analytics, which aids in the discovery and communication of meaningful patterns in data, is the engine that drives the platform. In the future, "big data will morph into big analytics," Westphal said.
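The core pattern Westphal describes — querying sources where they live and merging the results, rather than copying data into one place — can be sketched simply. This is a generic illustration of federated search, not the Data Clarity Platform's implementation; the source names, documents and matching rule are invented for the example.

```python
# Hypothetical sketch of federated search: fan a query out to several
# sources, each of which searches its own data in place, then merge
# the hits. No data is moved or copied from its original source.

def make_source(name, documents):
    """Each 'source' answers queries over its own data in place."""
    def search(term):
        return [(name, doc) for doc in documents if term in doc.lower()]
    return search

def federated_search(sources, term):
    """Fan the query out to every source and merge the results."""
    hits = []
    for search in sources:
        hits.extend(search(term))
    return hits

sources = [
    make_source("grants_db", ["Road repair grant", "Bridge inspection grant"]),
    make_source("contracts_db", ["Bridge repair contract"]),
]
results = federated_search(sources, "bridge")
```

A production system would add per-source adapters, relevance ranking and access controls, but the shape is the same: the query travels to the data, and only results travel back.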
“The challenge that federal IT organizations have is how to make big data small and relevant,” said Anil Karmel, a former deputy chief technology officer with the National Nuclear Security Administration and now CEO of C2 Labs, Inc., a cloud security and services company.
To address that challenge, federal agencies need a chief data officer and chief data architects or scientists. The chief data officer would keep the chief information officer and chief information security officer better informed about the value of the agency's information and how to interact with that information to make it useful. Chief data architects and scientists are needed to design the data infrastructure and quantify the value of the data at its lowest common elements, Karmel said.