Dahu Arc uses a number of components to provide a complete Data Discovery solution. The tools include content connectors, advanced processing engines and user interfaces. Arc is designed to work with many commercial and open-source search engine platforms and runs on Windows or Linux, on-premise or in the cloud, or of course a combination of these. The components can be used to enrich an already-existing enterprise search solution, or to build a complete solution for data discovery and GDPR processing.
Dahu Edge is our unique series of content connectors designed to find and gather content from all your unstructured content repositories. Our connectors are specifically built to support and optimise data discovery. For instance, unlike normal search indexing, we keep records of all duplicate instances of content so we get a true picture of your data estate. Even when content would normally be skipped due to size, content type or security, we always create a record with all the available metadata. Available connectors include Databases and File Systems, with cloud storage, including Google Docs and Microsoft OneDrive, coming soon.
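To illustrate the idea of recording every instance rather than de-duplicating, here is a minimal sketch. It is not the Edge implementation: the `ContentRecord` type, the `make_record` and `group_duplicates` helpers, and the use of a SHA-256 digest to spot duplicates are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContentRecord:
    """Hypothetical record: one per item found, even if the body was not read."""
    path: str
    size: int
    digest: Optional[str]                 # None when the content was skipped
    skipped_reason: Optional[str] = None


def make_record(path: str, data: Optional[bytes], size: int,
                skipped_reason: Optional[str] = None) -> ContentRecord:
    # Skipped items still get a record built from the metadata we do have.
    digest = hashlib.sha256(data).hexdigest() if data is not None else None
    return ContentRecord(path, size, digest, skipped_reason)


def group_duplicates(records: List[ContentRecord]) -> Dict[str, List[str]]:
    """Map each digest to every path holding that content (duplicates only)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        if rec.digest is not None:
            groups[rec.digest].append(rec.path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


records = [
    make_record("/hr/cv.docx", b"Jane Doe CV", 11),
    make_record("/archive/cv-copy.docx", b"Jane Doe CV", 11),
    make_record("/video/big.mp4", None, 5_000_000_000, "too large"),
]
duplicates = group_duplicates(records)
```

All three items keep their own record, the two copies of the CV are linked by a shared digest, and the oversized video is still listed with the reason it was skipped.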
To make calculations about the level of risk in your unstructured content, you need to be able to identify all the personal and sensitive data held in that content, and make it available for analysis. This is what Dahu Vector is designed to do. It has an extensive rule base that allows it to discover all the GDPR-stipulated sensitive data types and personal references in any content that flows through it. It relies on a series of complementary technologies to do the identification, including machine learning, NLP and pattern matching. It's vital to be able to understand and explain the decisions that processing systems take, so we designed the processes that Vector uses to be fully auditable.
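As a rough illustration of the pattern-matching part only, the sketch below records which rule produced each finding so that every decision can be traced. The rule identifiers, the two deliberately simple regular expressions and the `Finding` type are hypothetical and are not taken from Vector's rule base.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    data_type: str   # kind of personal data, e.g. "email"
    text: str        # the matched text
    start: int       # character offsets within the source text
    end: int
    rule_id: str     # which rule fired, kept for the audit trail


# Hypothetical rules; a real rule base would be far more extensive.
RULES = {
    "email-v1": ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    "uk-phone-v1": ("phone", re.compile(r"\b0\d{4} ?\d{6}\b")),
}


def find_personal_data(text: str) -> List[Finding]:
    """Run every rule over the text and return findings in document order."""
    findings = []
    for rule_id, (data_type, pattern) in RULES.items():
        for m in pattern.finditer(text):
            findings.append(
                Finding(data_type, m.group(), m.start(), m.end(), rule_id)
            )
    return sorted(findings, key=lambda f: f.start)


sample = "Contact jane.doe@example.com or call 01632 960123."
findings = find_personal_data(sample)
```

Because each finding carries its `rule_id` and offsets, a reviewer can see exactly why a piece of text was flagged and where it sits in the source.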
To leverage the data we discover in the content using Dahu Edge connectors and Dahu Vector, you need supporting applications that can use that data in a way that is focused on specific GDPR tasks. Dahu Surface is our User Interface platform that allows us to provide search tools and dashboards to support the SAR process and to track risk when undertaking your initial risk assessment or running a Data Protection Impact Assessment (DPIA) prior to a content processing task. Surface provides User Interface API translation so that our interfaces can work with most current search engine technologies.
How does Dahu Arc fit in your environment?
Dahu takes a very consultative approach to providing solutions to our customers. We have a long history of providing search, discovery and analytics consultancy to many customers. With that in mind, we designed Arc to work either as a complete solution for GDPR Data Discovery or as a series of components to use in existing systems. Let's discuss a few possible scenarios for using Dahu Edge connectors, the Dahu Vector processing engine and Dahu Surface UI services.
Add Personal and Sensitive Data Discovery to your existing solutions
It is likely that you already have a significant investment in Enterprise Search technologies, possibly working hard on data-discovery duties. You might also have other big-data solutions that process your unstructured or unmanaged content. Perhaps you need to scan content as you prepare for migration to the Cloud, or prepare to put it under control in a records management system. For these kinds of scenarios, you can use Dahu Vector, our content processing platform, to identify personal and sensitive data in your content as it is processed, and use the resulting identified elements to make risk-based decisions.
In this scenario, we assume you have a fairly typical Enterprise Search infrastructure with some content connectors, a pipeline processing environment and some existing search tools to make use of the indexed data.
We can augment the processing that occurs in the pipeline by calling out to Dahu Vector to identify and mark up references to personal and sensitive data. This data, including the specific type of data and its position information, can then be indexed and used in your search applications.
This approach would let you augment your existing infrastructure and investment and allow you to meet your Subject Access Request (SAR) obligations.
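A pipeline stage of this kind might look something like the sketch below. It is only an outline under stated assumptions: the `enrich_stage` function, the `pii_*` field names and the shape of the detector's response are invented for the example, and the call out to the processing service is replaced by an injected stand-in (`fake_detect`) so the code runs on its own.

```python
from typing import Callable, Dict, List

# A detector takes text and returns entities as dicts with "type", "start"
# and "end" keys. In a real pipeline this would be the call out to the
# processing service; here it is injected so the sketch runs offline.
Detector = Callable[[str], List[Dict]]


def enrich_stage(doc: Dict, detect: Detector) -> Dict:
    """Return a copy of the document with indexable personal-data fields."""
    entities = detect(doc["body"])
    enriched = dict(doc)
    enriched["pii_types"] = sorted({e["type"] for e in entities})
    enriched["pii_spans"] = [
        f'{e["type"]}:{e["start"]}-{e["end"]}' for e in entities
    ]
    enriched["pii_count"] = len(entities)
    return enriched


def fake_detect(text: str) -> List[Dict]:
    """Stand-in detector that only knows one hard-coded email address."""
    needle = "jane@example.com"
    start = text.find(needle)
    if start < 0:
        return []
    return [{"type": "email", "start": start, "end": start + len(needle)}]


doc = {"id": "doc-1", "body": "Please reply to jane@example.com today."}
indexed = enrich_stage(doc, fake_detect)
```

The enriched fields can then be indexed alongside the document's existing fields, so a search application can filter on the type of data found or highlight it using the stored positions.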
Add content for Data Discovery to your existing Search
Being able to assess the level of risk across all your content is an important part of GDPR readiness. This means you need to be able to connect to the data wherever it is. Dahu Edge connectors are designed to do this, and to record details on every file or document found.
Here, we are using Dahu Edge connectors with an existing search system to extend its reach. The content might be on premise or might be in the cloud. If it's in the cloud, we can run the Edge connectors in the cloud to avoid pulling all the content back out of the cloud. Edge maintains its own state information on every file or document it finds and automatically decides how often to revisit content areas to suit the frequency of changes. Edge works seamlessly with Vector to process the content before passing it to an indexer.
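The page does not describe how Edge chooses its revisit frequency, so the following is just one plausible heuristic, not the actual algorithm: halve the revisit interval for an area when changes are seen and double it when nothing has changed, within fixed bounds. The `AreaState` type, the bounds and the starting interval are all assumptions.

```python
from dataclasses import dataclass

# Assumed bounds for the sketch: revisit at most hourly, at least weekly.
MIN_INTERVAL_H = 1.0
MAX_INTERVAL_H = 168.0


@dataclass
class AreaState:
    """Hypothetical per-area crawl state."""
    path: str
    interval_h: float = 24.0   # hours until the next revisit


def update_interval(state: AreaState, changed: bool) -> AreaState:
    """Halve the interval after a change, double it otherwise, then clamp."""
    factor = 0.5 if changed else 2.0
    new_interval = state.interval_h * factor
    new_interval = min(MAX_INTERVAL_H, max(MIN_INTERVAL_H, new_interval))
    return AreaState(state.path, new_interval)


busy = AreaState("/shares/projects")
quiet = AreaState("/shares/archive")
for _ in range(3):
    busy = update_interval(busy, changed=True)      # 24 -> 12 -> 6 -> 3
    quiet = update_interval(quiet, changed=False)   # 24 -> 48 -> 96 -> 168
```

After three visits the busy share is revisited every three hours, while the quiet archive has backed off to the weekly maximum.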
In our scenario we are focused on integrating with search systems, but Edge and Vector are just as applicable to other data processing solutions, such as migration tools or big-data suites. Vector can be configured with a variety of 'output' plugins to allow the discovered and enriched content to be directed wherever is most appropriate.
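The general shape of an output-plugin arrangement can be sketched as a small registry that routes each processed document to one or more named targets. This is a generic illustration rather than Vector's plugin interface: the `output_plugin` decorator, the target names and the in-memory lists standing in for real destinations are all hypothetical.

```python
from typing import Callable, Dict, List

# Registry of output targets, keyed by a configuration name.
OUTPUTS: Dict[str, Callable[[dict], None]] = {}


def output_plugin(name: str):
    """Decorator that registers a function as a named output target."""
    def register(fn: Callable[[dict], None]) -> Callable[[dict], None]:
        OUTPUTS[name] = fn
        return fn
    return register


# In-memory stand-ins for real destinations.
search_index: List[dict] = []
migration_queue: List[str] = []


@output_plugin("search_index")
def to_search_index(doc: dict) -> None:
    search_index.append(doc)


@output_plugin("migration_queue")
def to_migration_queue(doc: dict) -> None:
    migration_queue.append(doc["id"])


def dispatch(doc: dict, targets: List[str]) -> None:
    """Send one enriched document to every configured target."""
    for name in targets:
        OUTPUTS[name](doc)


dispatch({"id": "doc-1", "pii_count": 2}, ["search_index", "migration_queue"])
```

Adding a new destination then means registering one more function, without touching the processing that produced the enriched document.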
Deploy a full solution for GDPR Data Discovery
Of course, it's possible to use Dahu Arc to provide a complete GDPR Data Discovery solution. This provides the necessary data connectivity and processing, the interfaces to manage your SARs, and dashboards to manage your DPIAs and initial gap-analysis.
Our interfaces (coming very soon) are built using best-of-breed web tools and techniques and are simple to install in most environments. They can run under our own Dahu Surface transformation engine, or under any application server or simple HTTP server.
Dahu Surface can host the search interfaces itself, providing in-line translation from popular search systems so that the interfaces work with existing tools and don't need to be re-coded. Surface also provides APIs to allow other systems to make use of the discovered personal and sensitive data and the associated risk profiles for the content.
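To give a feel for what this kind of translation involves, the sketch below turns one engine-neutral query into request shapes for two search engines. It is an illustration only, not Surface's API: the `UiQuery` type, the `body` and `pii_types` field names, and the choice of Apache Solr and Elasticsearch as example targets are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UiQuery:
    """Hypothetical engine-neutral query issued by the user interface."""
    text: str
    filters: Dict[str, str] = field(default_factory=dict)
    page_size: int = 10


def to_solr_params(q: UiQuery) -> dict:
    """Express the query as Solr-style request parameters (q, fq, rows)."""
    return {
        "q": q.text,
        "fq": [f"{name}:{value}" for name, value in q.filters.items()],
        "rows": q.page_size,
    }


def to_elasticsearch_body(q: UiQuery) -> dict:
    """Express the same query as an Elasticsearch-style bool query body."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"body": q.text}}],
                "filter": [
                    {"term": {name: value}}
                    for name, value in q.filters.items()
                ],
            }
        },
        "size": q.page_size,
    }


query = UiQuery("smith", {"pii_types": "email"}, page_size=5)
solr_request = to_solr_params(query)
es_request = to_elasticsearch_body(query)
```

The interface only ever builds a `UiQuery`; supporting another engine means adding another translation function rather than re-coding the interface.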