By Bob Gourley | November 1, 2012 07:19 PM EDT
By Ryan Kamauff
Government Computer News (GCN) has published an in-depth examination of how text analytics is being used in the federal government. The series covers how NASA is using text analytics for airline safety, how text analytics can “read between the lines” of terabytes of data, how it can identify early signs of bio threats, and how agencies are using it for data mining. The full four-part series can be found here, but we wanted to summarize and analyze it ourselves so we could give you our cut.
Bottom line: Great work by GCN. These guys are adding value to the dialog. Here are more thoughts:
NASA applies deep-diving text analytics to airline safety
NASA has created the Aviation Safety Program, which uses text analytics to process hundreds of thousands of unstructured reports. NASA collects data ranging from pilot reports to mechanics' logs in an attempt to identify problems before they happen. Previously this database was reviewed only by human analysts, who do not have the time to process all of the data. The machine processing starts with natural language processing (NLP) and machine learning. For more, be sure to check out the full article here.
Text analytics: Reading between the lines of terabytes of data
DHS has started using text analytics to scan social media networks for signs of terrorism. Scanning social media is nothing new, but text analytics based on machine learning can find “hidden relationships” that highlight trends and public sentiment. Further details are scant because adversaries adapt quickly to the tactics, techniques and procedures (TTPs) of our governments. The article discusses capabilities that leverage Apache Hadoop, but doesn’t mention Hadoop for some reason. For the full article, check it out here.
Canary in a data mine: How analytics detects early signs of bio threats
The National Collaborative for Bio-Preparedness (NCB-Prepared) is using a system “to monitor emergency medical services reports, poison center data and a wide array of other data sets, including social media, to detect signs of biological threats.” By looking at reports, they were able to identify a gastrointestinal outbreak two months before it would have been identified by standard reporting. This system uses SAS text analytics running on North Carolina State University’s cloud-based Virtual Computing Lab. To read more, check out the full report here.
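To make the early-warning idea concrete, here is a minimal sketch in Python. It is not how NCB-Prepared or SAS text analytics works; it swaps in a much simpler technique (keyword matching plus a rolling-baseline threshold) purely for illustration. The term list `GI_TERMS`, the function names and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical symptom vocabulary; a real system would use far richer language models.
GI_TERMS = {"nausea", "vomiting", "diarrhea", "cramps"}


def count_mentions(reports):
    """Count the reports that mention at least one term in GI_TERMS."""
    hits = 0
    for text in reports:
        # Lowercase each token and strip trailing/leading punctuation.
        words = {w.strip(".,;:!?").lower() for w in text.split()}
        if words & GI_TERMS:
            hits += 1
    return hits


def flag_spikes(weekly_counts, baseline_weeks=4, threshold=3.0):
    """Return indices of weeks whose count exceeds the mean of the
    preceding baseline_weeks by more than threshold standard deviations."""
    flagged = []
    for i in range(baseline_weeks, len(weekly_counts)):
        window = weekly_counts[i - baseline_weeks:i]
        mu = mean(window)
        sigma = stdev(window) or 1.0  # fall back to 1.0 when the baseline is flat
        if weekly_counts[i] > mu + threshold * sigma:
            flagged.append(i)
    return flagged
```

Under these assumptions, each week's reports would be reduced to a single number with `count_mentions`, and `flag_spikes` would then highlight the weeks that deserve a human analyst's attention.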
Text analytics ready for the heavy lifting of agencies’ data mining
The last article revolves around the growing need for unstructured data analytics in the federal government. It features one of our heroes, Chris Biow, CTO of MarkLogic.
Chris Biow, federal CTO at MarkLogic, agrees. “Any agency in the government that deals in any respect with the public should be using text analytics now,” he told GCN. “It’s maybe only being used now in 20 percent of the cases where it should. It’s as broad as treaty compliance versus watching public sentiment toward the United States overseas to predict a riot. All of that is out there.”
MarkLogic’s Biow said the most critical thing in initial implementations of text analytics is to manage expectations because machines still are not nearly as good at analyzing text as humans are. “The machine’s advantage is that it can do all the text,” he explained. “[But] you don’t have enough human beings to read it all. The machines will make a pass-over and humans can then refine that. The machines are getting better in terms of the complexity and detail that they can extract, but not necessarily in terms of the quality. That’s why it’s important to set expectations.”
“The best practice here,” Biow said, “is setting reasonable expectations. And results can definitely be improved as your users, library scientists and text analytics vendors start working together.”
One problem is that many agencies do not talk publicly about their text analytics work, so it is hard to get data on solutions and successes. Biow further said that managing expectations can be hard because machines still have much left to learn. To continue reading, check out the report here.
There are many great points in this series, and we highly recommend it. Thanks, GCN.
We hope to see follow-on work by GCN along these lines, perhaps diving into the new realm of Model Enabled Analysis and capabilities like Savanna from Thetus, which is showing a great path to helping humans interact with information like that described in this GCN series.

Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Bob Gourley
Bob Gourley, former CTO of the Defense Intelligence Agency (DIA), is Founder and CTO of Crucial Point LLC, a technology research and advisory firm providing fact-based technology reviews in support of venture capital, private equity and emerging technology firms. He has extensive industry experience in intelligence and security and was awarded an intelligence community meritorious achievement award by AFCEA in 2008. He has also been recognized as an InfoWorld Top 25 CTO and as one of the most fascinating communicators in Government IT by GovFresh.