Backup and restore ElasticSearch data using GCS

You don’t know what you got until it’s gone. And unfortunately it’s the same with data.

The importance of backups is usually realised only after losing data. After a few data incidents, I now back up all my ElasticSearch clusters. I have also made it a habit to set up backups first, and only then start using an ES cluster to store data.

In this article I will be using Google Cloud Storage (GCS) buckets for backup. If you are looking to back up ElasticSearch data to AWS S3 instead of Google GCS, then this Medium article will help you.

Requirements

For this article, you will need the following:

  1. An ElasticSearch instance whose data you want to back up
  2. A GCP account that allows you to create service accounts, add IAM roles and create GCS buckets.

Backup ElasticSearch data to GCS

1. Create a GCS bucket. Let’s call it backup_bucket.

2. Create a GCP service account with the storage.objectAdmin role. As it will be a user-managed service account, create a JSON private key and download it. Let’s call the downloaded file service-account-es-backup.json.
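These two steps can be sketched with the gsutil and gcloud CLIs. The project ID and service account name below are placeholder assumptions; substitute your own values:

```shell
# Assumed project ID and service account name -- replace with your own
PROJECT_ID="my-gcp-project"
SA_NAME="es-backup"

# Step 1: create the bucket
gsutil mb -p "$PROJECT_ID" gs://backup_bucket

# Step 2: create the service account, grant it storage.objectAdmin,
# and download a JSON private key
gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member "serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role "roles/storage.objectAdmin"
gcloud iam service-accounts keys create service-account-es-backup.json \
  --iam-account "${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
```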

3. Add the service account credentials to the ElasticSearch keystore. The elasticsearch-keystore binary is usually located at /usr/share/elasticsearch/bin. If it isn’t, look for it in the ES installation directory:

./elasticsearch-keystore add-file gcs.client.default.credentials_file service-account-es-backup.json

After successfully adding the key, you no longer need the JSON file. Feel free to delete it from the system.
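Before deleting it, you can confirm the entry was added by listing the keystore contents:

```shell
# The output should include gcs.client.default.credentials_file
./elasticsearch-keystore list
```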

4. Install GCS backup plugin

./elasticsearch-plugin install repository-gcs

Note that older versions of ElasticSearch require a restart after installing the plugin, before you can create your first backup.

5. Create a backup repository. In ES terminology, a backup repository is a collection of all the snapshots of your backup.

curl -X PUT -H "Content-Type: application/json" -d '{
  "type": "gcs",
  "settings":
  {
    "bucket": "backup_bucket",
    "base_path": "es_backup"
  }
}' localhost:9200/_snapshot/backup

In the above command, we are creating a backup repository called backup that stores its snapshots in a GCS bucket called backup_bucket, under the directory es_backup.

You might think that we have made our first backup of the data, but we haven’t. We have only created a backup repository to hold future snapshots.
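You can verify that the repository was registered, and ask ES to check that all nodes can actually reach the bucket, before taking a snapshot:

```shell
# Show the repository configuration
curl 'localhost:9200/_snapshot/backup?pretty'

# Verify bucket access from all nodes
curl -X POST 'localhost:9200/_snapshot/backup/_verify'
```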

6. Taking your first backup:

curl -X PUT "localhost:9200/_snapshot/backup/first_snapshot"

In order to have regular backups, use an identifiable snapshot name. In my case, I append the date to the snapshot name.

curl -X PUT "localhost:9200/_snapshot/backup/snapshot_`date +'%Y_%m_%d'`"
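The backtick expression is plain shell command substitution; you can check locally what snapshot name it produces (the exact date will differ on your machine):

```shell
# Build the dated snapshot name the same way the curl command does
name="snapshot_`date +'%Y_%m_%d'`"
echo "$name"
```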

If you are using a cron job to trigger the snapshot process, don’t forget to escape each % with a backslash (\). Cron treats an unescaped % as the end of the command, so without the escapes the cron job would fail.

curl -X PUT "localhost:9200/_snapshot/backup/snapshot_`date +'\%Y_\%m_\%d'`"
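For example, a crontab entry taking a daily snapshot would look like the following (the 02:00 schedule is an assumption; adjust it to your needs):

```shell
# crontab entry: take a snapshot every day at 02:00
0 2 * * * curl -X PUT "localhost:9200/_snapshot/backup/snapshot_`date +'\%Y_\%m_\%d'`"
```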

Note: A good rule of thumb is to use a separate bucket for each ES cluster (or any other data source) you back up. Don’t put all your backup eggs in one bucket, only to lose that bucket sometime in the future. 🤦‍♂️

Restore ElasticSearch data from GCS

Taking a data backup without knowing how to restore it is dangerous. In some cases, it might be as bad as not taking backups.

There can be two situations for restoration process:

  1. Some data was deleted in the existing cluster and you want to restore it to the previous version.
  2. The entire cluster is inaccessible (due to disk corruption / any other strange reasons) and you want to restore the entire backup data to a brand new cluster.

Case 1: Restoring to the same cluster

As the backup is already set up, you could directly restore using the following command.

curl -X POST 'localhost:9200/_snapshot/backup/first_snapshot/_restore'
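Note that ElasticSearch refuses to restore over an index that is currently open, so you may first need to close the affected indices. A sketch, where my_index is a placeholder for your index name:

```shell
# Close the index so the snapshot can overwrite it
curl -X POST 'localhost:9200/my_index/_close'

# Restore, waiting for the operation to finish before returning
curl -X POST 'localhost:9200/_snapshot/backup/first_snapshot/_restore?wait_for_completion=true'
```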

If you have multiple snapshots, then select the relevant snapshot ID from the snapshot list.

curl 'localhost:9200/_cat/snapshots/backup?v&s=id'

Case 2: Restoring to a brand new cluster

1. Install the same version of ElasticSearch (to be safe from data incompatibility issues). Install the GCS backup plugin.

I will show you how to restore the data on an ES cluster running in Docker. If you want to restore on an ES cluster outside Docker, simply run the same commands that are executed inside the container.

docker pull elasticsearch:VERSION
docker network create restore-network
docker run -d --name elasticsearch --net restore-network -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:VERSION

docker exec -it elasticsearch /bin/bash

# inside the docker container

cd /usr/share/elasticsearch/bin
./elasticsearch-plugin install repository-gcs

2. Create a Google service account that has read-only access to the backup bucket. Copy its JSON key to the container (or write the contents to a file using the nano / vi editor). Let’s call the file service-account-es-restore.json. Add it to the ElasticSearch keystore as we did during the backup process.

./elasticsearch-keystore add-file gcs.client.default.credentials_file service-account-es-restore.json

3. You might need to restart the container if you are running an older ElasticSearch version.

docker restart elasticsearch

4. Create a backup repository with the same config (bucket name and base path) as before.

curl -X PUT -H "Content-Type: application/json" -d '{
  "type": "gcs",
  "settings":
  {
    "bucket": "backup_bucket",
    "base_path": "es_backup"
  }
}' localhost:9200/_snapshot/backup

5. List all the snapshots in the backup repository (which is connected to our GCS backup_bucket).

curl 'localhost:9200/_cat/snapshots/backup?v&s=id'

6. Select the latest snapshot ID and restore it.

curl -X POST 'localhost:9200/_snapshot/backup/first_snapshot/_restore'
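You can watch the restore progress and confirm the indices are back:

```shell
# Recovery status of each shard being restored
curl 'localhost:9200/_cat/recovery?v'

# Final check: the restored indices should be listed again
curl 'localhost:9200/_cat/indices?v'
```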

You should now have successfully restored the data from your backup.
