Extracting Maximum Value from Data
Learning Using Privileged Information (LUPI)
LUPI is a new supervised machine learning paradigm that uses data available only at training time to learn highly accurate models. This data is called “privileged” information because it is available only during the learning, or model formation, phase and not when the trained model is applied. The net effect is that LUPI can learn accurately from far fewer examples than existing analytics algorithms: if an existing algorithm needs D data records to reach a given accuracy, LUPI needs only on the order of √D records to reach the same accuracy. LUPI has been successfully applied in domains including cyber security, video analytics and healthcare.
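To make the D versus √D relationship concrete, the short Python sketch below computes the record count it implies. The function name `lupi_records_needed` is hypothetical and chosen only for illustration. The sketch applies the stated square-root relationship literally and ignores constant factors, so it is an idealized estimate rather than a guarantee for any particular data set.

```python
import math


def lupi_records_needed(baseline_records: int) -> int:
    """Return ceil(sqrt(D)) for a baseline requirement of D records.

    Assumption: the square-root relationship is applied literally,
    with no constant factors.
    """
    if baseline_records < 0:
        raise ValueError("record count must be non-negative")
    if baseline_records == 0:
        return 0
    # Integer ceiling of the square root, exact for large values.
    return math.isqrt(baseline_records - 1) + 1


# Example: a baseline of one million records.
estimate = lupi_records_needed(1_000_000)
```

Under this idealized reading, a task that needs one million records with an existing algorithm would need about one thousand with LUPI.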
Automation is critical to resolving data quality and transformation issues cost-effectively. Our Arroyo Data Transformation Tool automates data extract, transform and load (ETL) activities and supports visual exploration of data for pattern detection. Arroyo is a high-performance tool that scales to the most demanding transaction volumes, involving terabytes of data and billions of records. It reads data sources in many formats, including unstructured text documents from which it extracts named entities, events and relationships, and it can sort, classify, compare, transform and store the data in various ways.
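For readers unfamiliar with the extract, transform and load pattern, the following is a minimal generic sketch in Python using only the standard library. It does not use or represent Arroyo's interface. The sample data, the `payments` table and the function names are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical input; a real source could be a file, a database or a feed.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,BOB,7
3,carol,
"""


def extract(text):
    """Extract: parse CSV text into one dictionary per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, parse amounts, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, skipped in this sketch
        name = row["name"].strip().title()
        cleaned.append((int(row["id"]), name, float(amount)))
    return cleaned


def load(records):
    """Load: store the cleaned records in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


conn = load(transform(extract(RAW_CSV)))
stored = conn.execute(
    "SELECT name, amount FROM payments ORDER BY amount DESC"
).fetchall()
```

Keeping the three stages as separate functions means each one can be replaced independently, for example by swapping the CSV reader for a different source format.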
Arroyo can be used for one-time data manipulation in systems migration projects and for continuous data manipulation that links systems with incompatible data models. It addresses data quality challenges where rules and solutions differ from domain to domain, where flexible resolution rules are essential, and where processing and cleaning vary with the nature of the data.
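One common way to keep resolution rules flexible across domains is to register small cleaning functions per domain and apply them in order. The Python sketch below illustrates that general idea only. It is not Arroyo's rule mechanism, and the domain names, field names and rule functions are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], Record]


def trim_whitespace(record: Record) -> Record:
    """Strip leading and trailing whitespace from every field."""
    return {key: value.strip() for key, value in record.items()}


def normalize_phone(record: Record) -> Record:
    """Keep only the digits of the 'phone' field, if the field is present."""
    if "phone" not in record:
        return record
    digits = "".join(ch for ch in record["phone"] if ch.isdigit())
    return {**record, "phone": digits}


def uppercase_code(record: Record) -> Record:
    """Upper-case the 'code' field, if the field is present."""
    if "code" not in record:
        return record
    return {**record, "code": record["code"].upper()}


# Hypothetical registry: each domain gets its own ordered list of rules.
RULES_BY_DOMAIN: Dict[str, List[Rule]] = {
    "customer": [trim_whitespace, normalize_phone],
    "inventory": [trim_whitespace, uppercase_code],
}


def clean(record: Record, domain: str) -> Record:
    """Apply the rules registered for a domain, in order."""
    for rule in RULES_BY_DOMAIN.get(domain, []):
        record = rule(record)
    return record


customer = clean({"name": " Ann ", "phone": "(555) 010-1234"}, "customer")
item = clean({"code": " ab-12 "}, "inventory")
```

Adding a new domain or changing how one is cleaned then only requires editing the registry, not the code that applies the rules.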