RESOURCES | PLATFORM REQUIREMENTS

What you need to run Diskover.

Platform requirements.

Multi-platform support.

AWS (technology partner)
Linux
Microsoft Windows
Apple macOS
Docker

Best practices.

Diagrams.

Diskover Scale-Out Architecture Overview Diagram
Diskover Configuration Overview Diagram
Diskover Elasticsearch Terminology

Prerequisites.

Python v3.8+
Elasticsearch v8.x
PHP v8.x + PHP-FPM
NGINX or Apache
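
The prerequisites above can be spot-checked on a host before installing. The following is a minimal sketch and not part of Diskover: the helper names `python_ok` and `find_web_server` are hypothetical, and the binary names `nginx`, `httpd`, and `apache2` are assumptions about how the web server is packaged on the host. It checks only the Python version and whether a web server binary is on the PATH; it does not verify Elasticsearch or PHP.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version from the prerequisites list


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


def find_web_server(candidates=("nginx", "httpd", "apache2")):
    """Return the path of the first candidate binary found on PATH, or None.

    The candidate names are assumptions; adjust them for your distribution.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    print("Web server binary:", find_web_server() or "not found on PATH")
```

A web server installed outside the current user's PATH (for example under /usr/sbin) may be reported as not found, so treat a negative result as a prompt to check manually.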

Security

Disabling SELinux and using a software firewall are both optional; neither is required to run Diskover.
Internet access is required during the installation to download packages with yum.

Operating Systems.

As shown in the configuration diagram, Windows and Mac are supported only for indexers.

Linux

Rocky 8 & 9
CentOS Stream 9
RHEL 8 & 9

Windows

Windows 10 & 11
Windows Server 2022

Mac

macOS 10.11 El Capitan or later

Elasticsearch.

The foundation of the Diskover platform consists of a series of Elasticsearch indexes, which are created and stored within the Elasticsearch endpoint.

Elasticsearch cluster.

Minimum 3 nodes for performance and redundancy
8 to 32 CPU cores per node
64 GB RAM per node (16 GB reserved for the Elasticsearch memory heap)
500 GB to 1 TB of SSD storage per node (see Elasticsearch Storage Requirements below)

Proof of Concept

Minimum of 1 node for testing
8 to 32 CPU cores
8 to 16 GB RAM per node (8 GB reserved for the Elasticsearch memory heap)
250 to 500 GB of SSD storage per node (see Elasticsearch Storage Requirements below)

Indices.

Rule of Thumb Shard Size

Try to keep shard size between 10 – 50 GB
Ideal shard size approximately 20 – 40 GB

Examples

Index that is 60 GB in size: you will want to set shards to 3 and replicas* to 1 or 2 and spread across 3 ES nodes.
Index that is 5 GB in size: you will want to set shards to 1 and replicas* to 1 or 2 and be on 1 ES node or spread across 3 ES nodes (recommended).
* Replicas help with search performance and redundancy, and provide fault tolerance. When you change shard/replica numbers, you have to delete the index and re-index.
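
The shard examples above can be reproduced with a small calculation. This is a minimal sketch, not Diskover's own logic: the helper names `suggest_shards` and `index_settings` are hypothetical, and the 20 GB target shard size is an assumption taken from the low end of the ideal range. The settings body uses the standard Elasticsearch `number_of_shards` and `number_of_replicas` index settings; how Diskover itself passes these values is not shown here.

```python
import math


def suggest_shards(index_size_gb, target_shard_gb=20):
    """Return a primary shard count that keeps each shard at or below
    target_shard_gb (an assumed target, not a Diskover setting)."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def index_settings(index_size_gb, replicas=1):
    """Build a generic Elasticsearch settings body for a new index."""
    return {
        "settings": {
            "number_of_shards": suggest_shards(index_size_gb),
            "number_of_replicas": replicas,
        }
    }


print(suggest_shards(60))  # 3, matching the 60 GB example
print(suggest_shards(5))   # 1, matching the 5 GB example
```

Raising `target_shard_gb` toward 40 yields fewer, larger shards; either choice stays inside the 10 to 50 GB rule of thumb for the sizes shown.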

Estimating Elasticsearch storage requirements.

Individual Index Size

1 GB for every 5 million files / folders
20 GB for every 100 million files / folders
The size of the files is not relevant.
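
The rule of thumb above reduces to a single division. As a rough sketch (the function name `estimate_index_gb` is hypothetical, and actual index sizes will vary with the metadata collected):

```python
ITEMS_PER_GB = 5_000_000  # 1 GB for every 5 million files / folders


def estimate_index_gb(item_count):
    """Estimate index size in GB from the number of files and folders."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    return item_count / ITEMS_PER_GB


print(estimate_index_gb(5_000_000))    # 1.0
print(estimate_index_gb(100_000_000))  # 20.0
```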

Rolling Indices

Each Diskover scan results in the creation of a new Elasticsearch index.
Multiple indexes can be maintained to keep the history of storage indices.
Overall Elasticsearch storage requirements depend on how much index history you need to retain.
For rolling indices, multiply the amount of data generated per storage index by the number of indices you want to keep for the retention period. For example, if a given storage index generates 2 GB per day and you want to keep 30 days of indices, 60 GB of storage is required to maintain a total of 30 indices.
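
The retention arithmetic can be written out as follows. This is an illustrative sketch with a hypothetical function name. The `replicas` parameter is an addition not covered by the example above: in Elasticsearch each replica stores a full copy of the index, so it is included as an optional multiplier and defaults to 0 to match the figures in the text.

```python
def rolling_storage_gb(index_gb, retained_indices, replicas=0):
    """Total storage for a set of retained indices of equal size.

    replicas is an assumption beyond the source example: each replica
    holds a full copy of the index, so it multiplies the total.
    """
    if index_gb < 0 or retained_indices < 0 or replicas < 0:
        raise ValueError("arguments must not be negative")
    return index_gb * retained_indices * (1 + replicas)


print(rolling_storage_gb(2, 30))              # 60
print(rolling_storage_gb(2, 30, replicas=1))  # 120
```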

Diskover-Web server.

The Diskover-Web HTML5 user interface requires a Web server platform. It provides visibility, analysis, workflows, and file actions from the indexes that reside on the Elasticsearch endpoint.

Production Deployment

8 to 32 CPU cores 
8 to 16 GB RAM
250 to 500 GB SSD

Proof of Concept

8 to 32 CPU cores 
8 to 16 GB RAM
250 to 500 GB SSD

Diskover scanners.

Production Deployment

8 to 32 CPU cores 
8 to 16 GB RAM
250 to 500 GB SSD

Proof of Concept

8 to 32 CPU cores  
8 to 16 GB RAM  
250 to 500 GB SSD

AWS sizing resource.

Elasticsearch Domain

Minimum

t3.large
t3.xlarge

EC2 Web-Server

Minimum

t3.small
t3.medium

Indexer(s)

Minimum

t3.large
t3.xlarge

Skills and knowledge.

The installation of Diskover is intended to be performed by service professionals and system administrators. The installer should have:

Strong familiarity with the operating system on which the on-premises Diskover file indexer(s) are installed.
Basic knowledge of the operating system on which the Diskover-Web HTML5 user interface is installed.
Basic knowledge of configuring a web server (Apache or NGINX).

Important!

Attempting to configure Diskover without proper experience or training can affect system performance and security configuration.
The initial installation, configuration, and deployment of Diskover Data is expected to take 1 to 3 hours, depending on the time spent on network connectivity.
