ELIAS

ELIAS is the CC-IN2P3 Elasticsearch service. It lets you run optimized full-text searches with Elasticsearch and visualize the results as graphs organized in dashboards using the Kibana and Grafana tools. A REST API is also available.


Depending on your usage of the platform, different types of hosting are available:

  • a generic shared cluster,

  • a cluster adapted to specific needs.

To request access, please contact user support. If you have a particular need, please specify it in your request (context, reason, and resources requested), as for database requests.

You will then be assigned a namespace on ELIAS, along with the right to associate one or more access accounts (human or robot) with it.

Note

The access URL to your ELIAS cluster will have the following syntax:

elias-${CLUSTERNAME}.cc.in2p3.fr

The $CLUSTERNAME variable depends on the cluster on which the user is hosted. The value of this variable will be given to the user along with their credentials.
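For illustration, the access URL can be assembled in a shell variable; the cluster name beta below is a placeholder, use the value provided with your credentials:

```shell
# "beta" is a hypothetical cluster name; use the one given with your credentials.
CLUSTERNAME="beta"
ELIAS_URL="https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200"
echo "$ELIAS_URL"
```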

Access the platform

All the services in ELIAS rely on a centralized authentication system and support several methods according to the account type.

Account types

An account (user or robot) is associated with one or more authentication methods and has a set of privileges.

User accounts will be identified by:

  • Kerberos,

  • Keycloak,

  • login / password.

Access methods

Kerberos authentication is recommended for interactive use.

To use Kerberos authentication, all you need is a computing account, correctly configured.

After a simple request (a curl, for example):

% curl --negotiate -u: https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/

the Kerberos ticket list should look like this:

% klist
Ticket cache: FILE:/tmp/krb5cc_1234_RrLMgX8bak
Default principal: username@CC.IN2P3.FR

Valid starting       Expires              Service principal
05/12/2020 10:49:33  05/15/2020 09:07:56  krbtgt/CC.IN2P3.FR@CC.IN2P3.FR
05/12/2020 10:50:30  05/15/2020 09:07:56  HTTP/ccosvmse0007.in2p3.fr@
05/12/2020 10:50:30  05/15/2020 09:07:56  HTTP/ccosvmse0007.in2p3.fr@CC.IN2P3.FR

The machine appearing in the HTTP/... service principal corresponds to the ELIAS service entry point.

Services included in ELIAS

The REST API is an interface for exchanging data between web services. As an API client, you’ll use HTTP calls (GET, POST, PUT, …). To access the REST API, you’ll need to authenticate using one of the methods described above.

For more information, see the paragraphs below.

Data ingestion in ELIAS

The ingestion of your data into ELIAS is only possible through the REST API.

The REST API can be leveraged by high-level libraries, some of which are maintained by elastic.co, or through command line tools like curl, wget, httpie, …

To store your data in ELIAS, three approaches are possible:

  • command line tools (curl) (practical for getting started),

  • dedicated agent (Fluentbit, logstash, syslog-ng, …),

  • your own scripts using high-level libraries.

To specify the authentication method, you can use the following options:

% kinit
% curl -u: --negotiate --cacert /path/to/elias.ca -H 'Content-Type: application/json' ...

Elasticsearch stores data in an index that can be compared to a table in relational databases. The following command connects via a certificate to ELIAS and creates an index named mynamespace-myindexname.

% curl -XPUT 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/mynamespace-myindexname?pretty' --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt --cacert /path/to/elias.ca -H 'Content-Type: application/json'

Attention

The index name must respect the following syntax: <namespace>-<index>. You can create as many indexes as you want under your ELIAS space.

The name of the namespace is imposed by the ELIAS administrators and is communicated to you when creating your account.
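The naming rule above can be checked with a small shell helper before creating an index; mynamespace is a placeholder for the namespace you were assigned:

```shell
# Sketch: verify that an index name follows the <namespace>-<index> convention.
# "mynamespace" is a placeholder; use the namespace communicated by the admins.
NAMESPACE="mynamespace"

valid_index_name() {
  case "$1" in
    "${NAMESPACE}"-?*) return 0 ;;   # matches <namespace>-<index>
    *) return 1 ;;
  esac
}

valid_index_name "mynamespace-logs"  && echo "mynamespace-logs: ok"
valid_index_name "logs"              || echo "logs: rejected"
```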

An Elasticsearch index only manipulates JSON documents. A document is a collection of key/value tuples and it is the smallest unit that Elasticsearch manages. It is possible to store documents with completely heterogeneous data structures (under certain conditions).

Elasticsearch builds the schema as the documents arrive. However, it is possible to impose an explicit data structure by configuring the mapping.

The following example associates a minimal structure that the documents in the index mynamespace-myindexname must respect.

% curl -XPUT --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt --cacert /path/to/elias.ca -H 'Content-Type: application/json' 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/mynamespace-myindexname/_mapping' -d
'{
      "dynamic_templates": [
        {
          "template_stdField": {
            "path_match": "*",
            "mapping": {
              "ignore_malformed": true,
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "user_name": {
          "type": "text"
        }
      }
  }'

The following command defines an explicit mapping for the index mynamespace-myindexname.

% curl -XPUT --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt --cacert  /path/to/elias.ca -H 'Content-Type: application/json' 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/mynamespace-myindexname/_mapping' -d @"my_mapping.json"
my_mapping.json file content
{
    "dynamic_templates": [
      {
        "template_stdField": {
          "path_match": "*",
          "mapping": {
            "ignore_malformed": true,
            "type": "keyword"
          }
        }
      }
    ],
    "properties": {
      "user_name": {
        "type": "text"
      }
    }
}

Attention

With an explicit mapping, documents whose fields do not conform to the declared types are rejected unless the ignore_malformed option is enabled.

Note

For more information on mapping, please refer to the official documentation.

The following command connects to ELIAS via a certificate and creates a first document in the kafka-testoal index:

% curl -XPOST 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/kafka-testoal/_doc/1/_create' --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt --cacert /path/to/elias.ca -H 'Content-Type: application/json' -d '{ "user_name" : "John Doe" }'
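For more than a handful of documents, the _bulk endpoint is more efficient: it takes newline-delimited JSON, with an action line preceding each document. A minimal sketch follows; the certificate paths and cluster name are placeholders, so the curl call is left commented out:

```shell
# Build an NDJSON payload for the _bulk API: one action line per document.
cat > bulk.ndjson <<'EOF'
{ "index": { "_index": "mynamespace-myindexname" } }
{ "user_name": "John Doe" }
{ "index": { "_index": "mynamespace-myindexname" } }
{ "user_name": "Jane Doe" }
EOF

# Send it to ELIAS (same certificate options as in the examples above):
# curl -XPOST 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/_bulk' \
#      --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt \
#      --cacert /path/to/elias.ca \
#      -H 'Content-Type: application/x-ndjson' --data-binary @bulk.ndjson

wc -l < bulk.ndjson
```

Note that the _bulk API requires the Content-Type application/x-ndjson and a trailing newline at the end of the payload.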

To get a sample of the documents contained in an index, one can use the following command:

% curl -XGET 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/kafka-testoal/_search?pretty' --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt --cacert /path/to/elias.ca -H 'Content-Type: application/json'

Note

For advanced searches, please refer to the paragraph View your data.

View your data

To consult your data, ELIAS offers different interfaces according to your needs.

With Elasticsearch you may carry out exact searches using filters, or fuzzy searches where the criteria can be fully or partially met.

Example: the following query searches for documents that must contain both the words application and contains.

% curl -XGET --key /path/to/mycertificate.pkey --cert /path/to/mycertificate.crt --cacert /path/to/elias.ca -H 'Content-Type: application/json' 'https://elias-${CLUSTERNAME}.cc.in2p3.fr:9200/mynamespace-myindexname/_search' -d @"filter_query.json"
filter_query.json content
{
  "query": {"bool": {
      "must": [
          { "match": { "message": "application" }},
          { "match": { "message": "contains" }}
      ]
    }
  }
}

Elasticsearch implements a query language known as the Elasticsearch DSL (Domain Specific Language). This language allows you to express queries with complex conditions using logical, comparison, and set operators.
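As a further illustration of the DSL, a bool query can combine a full-text must clause with a filter clause; the field names below (message, @timestamp) are hypothetical:

```shell
# Sketch of a DSL query: full-text match plus a date-range filter.
cat > range_query.json <<'EOF'
{
  "query": {
    "bool": {
      "must":   [ { "match": { "message": "application" } } ],
      "filter": [ { "range": { "@timestamp": { "gte": "now-1d" } } } ]
    }
  }
}
EOF

# Check the file is well-formed JSON before sending it with curl -d @range_query.json
python3 -m json.tool range_query.json > /dev/null && echo "valid JSON"
```

Clauses under filter do not contribute to relevance scoring, which makes them cheaper for exact constraints such as date ranges.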

Note

The way Elasticsearch tokenizes and filters text (the analyzer) is customizable for each key. All you have to do is define the mapping appropriately.

Supervise your machine

To monitor your machines, you can collect various metrics and logs using, respectively:

  • Collectd, a tool for collecting various system metrics,

  • Fluentbit, a lightweight tool for log management.

Data can then be sent by FluentBit to Elasticsearch for visualization, as shown in the diagram below:

[Diagram: Collectd metrics and FluentBit logs flowing into Elasticsearch]

Attention

You are responsible for the data you insert into Elasticsearch. Please make sure you have defined the life cycle of your data beforehand.

It is also advisable to create several indices. For example, in the use case described in this documentation, it is necessary to create separate Collectd and FluentBit indices.

Installation

The installation procedure depends on your operating system, so you will have to consult the Collectd installation documentation and choose the appropriate method.

Configuration

For our example, we will use the collectd plugins installed by default.

In order to deploy a collectd plugin, you need to create a configuration file for it in /etc/collectd/plugins.

To use the memory plugin for example, we will create the file /etc/collectd/plugins/memory.conf with the following content:

<Plugin "memory">
ValuesAbsolute true
ValuesPercentage true
</Plugin>

and the configuration file for the network plugin /etc/collectd/plugins/network.conf with the following content:

<Plugin "network">
Server "127.0.0.1" "25826"
</Plugin>

The network plugin will redirect the collected data (from the memory plugin, in our example) to the localhost address on the default port, which is 25826.

Next we will check that the lines LoadPlugin memory and LoadPlugin network are present in the file /etc/collectd/collectd.conf.
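The presence of those LoadPlugin lines can be checked with a quick grep. The sketch below uses a temporary sample file; on a real machine, point CONF at /etc/collectd/collectd.conf:

```shell
# Sketch: verify that the required LoadPlugin lines are present.
# A temporary sample file stands in for /etc/collectd/collectd.conf here.
CONF=$(mktemp)
printf 'LoadPlugin memory\nLoadPlugin network\n' > "$CONF"

for p in memory network; do
  grep -q "^LoadPlugin $p" "$CONF" && echo "found: LoadPlugin $p"
done
```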

Then we start the service:

% sudo systemctl restart collectd.service

To check that everything went well, we can inspect the logs with the command:

% journalctl -xe

Finally, to test that the data is exposed on the chosen port we can run the following command:

% nc -ul 25826

To test your configuration files you can use the visualization tool Calyptia.

Example of a configuration with a tail input, which works like the tail -f command to read the end of a file, and a second input (included at the end of the file) that retrieves the data generated by Collectd:

# This section defines various parameters for FluentBit
[SERVICE]
    flush           1
    daemon          Off
    log_level       debug
    log_file        /var/log/fluent-bit.log
    parsers_file    parsers.conf
    plugins_file    plugins.conf
    http_server     Off
    http_listen     0.0.0.0
    http_port       2020
    storage.metrics on

# This section defines the input data
[INPUT]
    Name        tail
    Tag         zeppelin.log
    Path        /path/to/my/logfile/zeppelin.log
    Parser       zeppelin

# The FILTER section filters or enriches data using various modules; here, modify adds the name of the service to each record
[FILTER]
    Name modify
    Match zeppelin.log
    Add service zeppelin

# The OUTPUT section defines the destination of the data
[OUTPUT]
    Name   stdout
    Match  *

[OUTPUT]
    Name          es
    Match         *.log
    Host          elias-beta.cc.in2p3.fr
    Port          9200
    Index         indexname
    tls           On
    tls.verify    Off
    tls.ca_file   /path/to/ca.crt
    tls.crt_file  /path/to/client.crt
    tls.key_file  /path/to/client.key

# The @INCLUDE command allows you to include the content of another configuration file
@INCLUDE /etc/fluent-bit/collectd.conf