Platform Requirements

Two of Diskover’s deployment components, Elasticsearch and the web server, can be hosted on-premise or in the cloud. The third component, the Diskover file indexer, is typically deployed on the customer’s on-premise resources. Contact us for further technical specifications.

We recommend running Elasticsearch, the web server, and the indexing host(s) on separate machines. Indices should ideally be stored on SSD; NFS data stores do not usually perform well for indices.

📕 Access the Diskover Installation Guide for more information.

Multi-Platform Support

Diskover can integrate seamlessly into existing Linux, Windows, and Mac environments with native scanners, as well as with AWS (Amazon Web Services).

Best Practices

Our recommended best practice is a minimum of 3 Elasticsearch nodes. Performance, scalability, or recovery issues arising from deployments outside this recommendation are not guaranteed or supported by Diskover and will incur a support charge of $10,000. Problems associated with running fewer than 3 Elasticsearch nodes include, but are not limited to, issues with support for multiple geographic locations, high-frequency indexing, large amounts of data, and large numbers of file systems.

Architecture Diagrams

[Diagram: data storage repositories are scanned into Elasticsearch and are visible in a single pane of glass through the Diskover-Web user interface. The diagram also depicts the unlimited scalability of Diskover.]
[Diagram: how the Elasticsearch architecture interacts with Diskover, with definitions of technical terms such as node, cluster, index, shard, and replica.]

Prerequisites

Main Requirements

Python v3.5 +
Elasticsearch v7.x +
PHP v7.x + PHP-FPM
NGINX or Apache
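
The requirements above can be spot-checked on a candidate host before installation. The following is a minimal sketch, not part of the Diskover installer: the executable names in `CANDIDATE_BINARIES` are assumptions (they vary by distribution), and the script only checks the Python version and whether the other tools are on the PATH. It does not verify the PHP or Elasticsearch versions.

```python
import shutil
import sys

MIN_PYTHON = (3, 5)  # minimum from the Main Requirements list

# Assumed executable names; these are illustrative and differ between distributions.
CANDIDATE_BINARIES = {
    "PHP": ["php"],
    "PHP-FPM": ["php-fpm"],
    "Web server (NGINX or Apache)": ["nginx", "httpd", "apache2"],
}


def python_ok(version=None):
    """Return True if the given (or running) Python version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def find_binaries(candidates=CANDIDATE_BINARIES):
    """Map each requirement to the first matching executable path, or None."""
    found = {}
    for name, binaries in candidates.items():
        paths = [shutil.which(b) for b in binaries]
        found[name] = next((p for p in paths if p), None)
    return found


if __name__ == "__main__":
    print("Python >= 3.5: {}".format(python_ok()))
    for name, path in find_binaries().items():
        print("{}: {}".format(name, path or "not found on PATH"))
```

The script avoids f-strings so that it also runs on Python 3.5 itself.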

Other Notes

Disabling SELinux and using a software firewall are optional; neither is required to run Diskover.
Internet access is required during the installation to download packages with yum.

Elasticsearch Domain

The foundation of the Diskover platform consists of a series of Elasticsearch indexes. These indexes are created and stored within the Elasticsearch endpoint. Elasticsearch is a scale-out architecture using 1 to N nodes.

🖱️ Click here for more detailed Elasticsearch and AWS sizing guidelines.

🖱️ Click here for information on resilience in small clusters.

Elasticsearch Cluster

Production Deployments

Minimum 3 nodes for performance and redundancy
16 CPU cores per node
32 GB RAM per node (16 GB reserved for the Elasticsearch memory heap)
1 TB of SSD storage per node (see Elasticsearch Storage Requirements below)

Proof of Concept

Minimum of 1 node for testing
8 CPU cores
16 GB RAM per node (8 GB reserved for the Elasticsearch memory heap)
500 GB of SSD storage per node (see Elasticsearch Storage Requirements below)

Indices

Rule of Thumb Shard Size

Try to keep shard size between 10 – 50 GB
Ideal shard size approximately 20 – 40 GB

Examples

Index that is 60 GB in size: set shards to 3 and replicas* to 1 or 2, and spread the index across 3 ES nodes.
Index that is 5 GB in size: set shards to 1 and replicas* to 1 or 2, and keep the index on 1 ES node or spread it across 3 ES nodes (recommended).
* Replicas help with search performance and redundancy, and provide fault tolerance. When you change shard/replica numbers, you have to delete the index and re-index.
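
The two examples above can be reproduced with a small calculation. This is a minimal sketch, assuming a target shard size of 20 GB (the low end of the ideal range); the function name `suggest_shards` and the target constant are illustrative and not part of Diskover.

```python
import math

# Assumption: aim for roughly 20 GB per shard, the low end of the ideal 20-40 GB range.
TARGET_SHARD_GB = 20


def suggest_shards(index_size_gb, target_shard_gb=TARGET_SHARD_GB):
    """Suggest a primary shard count so that each shard is at most target_shard_gb."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


print(suggest_shards(60))  # 3 shards of about 20 GB each
print(suggest_shards(5))   # 1 shard
```

A larger target (for example 30 GB) yields fewer shards for the same index, so treat the result as a starting point rather than a fixed rule.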

Estimating Elasticsearch Storage Requirements

Individual Index Size

1 GB for every 5 million files / folders
20 GB for every 100 million files / folders
The size of the files is not relevant.
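
The rule of thumb above is a straight ratio of item count to index size. A minimal sketch of the arithmetic, where the function name `estimate_index_gb` is illustrative:

```python
# Rule of thumb from the list above: 1 GB of index per 5 million files/folders.
ITEMS_PER_GB = 5 * 10**6


def estimate_index_gb(item_count):
    """Estimate the index size in GB for a given number of files and folders."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    return item_count / ITEMS_PER_GB


print(estimate_index_gb(5 * 10**6))    # 1.0
print(estimate_index_gb(100 * 10**6))  # 20.0
```

Only the number of files and folders enters the estimate; file sizes do not.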

Rolling Indices

Each Diskover scan results in the creation of a new Elasticsearch index.
Multiple indices can be maintained to keep a history of each storage index.
Overall Elasticsearch storage requirements depend on how much index history you need to keep.
For rolling indices, multiply the amount of data generated for a storage index by the number of indices you want to keep for the retention period. For example, if a given storage index generates 2 GB per day and you want to keep 30 days of indices, 60 GB of storage is required to maintain a total of 30 indices.
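
The retention arithmetic can be written as a short helper. This is a minimal sketch with an illustrative function name. The optional `replicas` parameter is an assumption added here: in Elasticsearch each replica is a full copy of the primary shards, so if the per-index figure covers primaries only, replicas multiply the total.

```python
def rolling_storage_gb(gb_per_index, indices_retained, replicas=0):
    """Estimate total storage for a retention window of rolling indices.

    gb_per_index: size of one scan's index in GB.
    indices_retained: number of indices kept (for example, one per day).
    replicas: assumption, not from the text above; use 0 if the per-index
              size already includes replica copies.
    """
    if gb_per_index < 0 or indices_retained < 0 or replicas < 0:
        raise ValueError("arguments must not be negative")
    return gb_per_index * indices_retained * (1 + replicas)


print(rolling_storage_gb(2, 30))              # 60
print(rolling_storage_gb(2, 30, replicas=1))  # 120
```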

Diskover-Web Server

The Diskover-Web HTML5 user interface requires a web server platform. A Linux or Windows instance configured with applications to provide web-serving capabilities is required. The Diskover-Web user interface provides visibility, analysis, and actions from the indexes that reside on the Elasticsearch endpoint.

Linux

64-bit Red Hat Enterprise Linux Server v7.x, v8.x
64-bit CentOS v7.x, v8.x
Rocky v8.x
On an EC2 instance with 64-bit Amazon Linux 2.x

Windows

Windows 10
Windows Server

Recommended

4 CPU cores
8 GB RAM

Minimum

2 CPU cores 
4 GB RAM

Diskover Indexer(s)

You can install Diskover indexers on a server or virtual machine (VM). Multiple indexers can be run on a single machine or on multiple machines for parallel crawling.

Linux

64-bit Red Hat Enterprise Linux Server v7.x, v8.x 
64-bit CentOS v7.x, v8.x
Rocky v8.x

Windows

64-bit Windows v8, v10, v11
64-bit Windows Server

Mac

64-bit MacOS v10.x +

Recommended

8-16 CPU cores
8 GB RAM

Minimum

4 CPU cores
4 GB RAM

AWS Sizing Resource Requirements

Elasticsearch Domain

The foundation of the Diskover platform consists of a series of Elasticsearch indexes. These indexes are created and stored within the AWS Elasticsearch endpoint. The recommended AWS nodes are:

Minimum

i3.large
i3.xlarge

EC2 Web-Server

The Diskover-Web HTML5 user interface requires a web server platform. An EC2 instance configured with applications to provide web-serving capabilities is required. The Diskover-Web user interface provides visibility, analysis, and actions from the indexes that reside on the AWS Elasticsearch endpoint. The recommended EC2 instances are:

Minimum

t3.small
t3.medium

Indexer(s)

The recommended instances are:

Minimum

t3.large
t3.xlarge

Skills and Knowledge Requirements

Work to simplify the installation and configuration of the Diskover software is under way. For now, the installation is intended to be performed by service professionals and system administrators. The installer should have strong familiarity with:

  • The operating system on which the on-premise Diskover file indexer(s) are installed.
  • Basic knowledge of:
    • The operating system on which the Diskover-Web HTML5 user interface is installed.
    • Configuring a web server (Apache or NGINX).

Important!

⚠️ Attempting to configure the Diskover Data Curation platform without proper experience or training can affect system performance and security configuration.

⏱️ The initial installation, configuration, and deployment of Diskover Data is expected to take between 1 and 3 hours, depending on the time spent on network connectivity.
