Ceph Replication, Compression, and Erasure Coding


Ceph is naturally resilient to data loss. It accomplishes this by storing
multiple copies of the same set of data, either by replicating data across
several Object Storage Daemons (OSDs) or by using erasure coding.

This guide explains how Ceph is configured for Private Cloud Core and covers
adjusting the number of OSDs that data is replicated across for each pool.


Ceph Replication

With Private Cloud Core, Ceph is distributed across every hard drive on each
hardware node, and the data in each pool is replicated across three OSDs.

The configuration is put into place using Ceph Ansible, and the repository
for this software is located on GitHub. Using Ceph Ansible assumes a basic
understanding of how Ansible works.

Ceph has a concept of pools, and there are typically several pools that store
different sets of data. Each pool can be configured with its own replication
options: some pools can use erasure coding while others can be replicated
across multiple OSDs. More on the options for configuring pools can be found
in Ceph’s documentation.
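
For example, you can check a pool’s current replication setting with the ceph
osd pool get command. The volumes pool and the value shown here are only
illustrative; your output will vary:

$ ceph osd pool get volumes size
size: 3

Running ceph osd pool get volumes all instead lists every option configured
for the pool.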

Another way to protect data in Ceph is to use erasure coding. This is a
software approach to data durability that uses less disk space than full
replication, but has more overhead in terms of CPU and RAM usage. See the
InMotion Hosting Support Center for more information on this subject.

Erasure coding must be chosen when a Ceph pool is created and cannot be
changed after the pool exists. If you have a set of data stored with erasure
coding and want to change how that data is replicated, you need to create a
new pool with the intended replication strategy and then migrate the data to
that pool.
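
If you do need an erasure-coded pool, creating one is a two-step process:
define an erasure code profile, then create the pool using that profile. The
sketch below is only illustrative; the profile name ec-profile, the pool name
ec-volumes, the k and m values, and the placement group counts are
placeholders you would choose for your own cluster:

$ ceph osd erasure-code-profile set ec-profile k=4 m=2 crush-failure-domain=host
$ ceph osd pool create ec-volumes 32 32 erasure ec-profile
pool 'ec-volumes' created

Here, k is the number of data chunks and m is the number of coding chunks each
object is split into, so a pool created with this profile can tolerate the
loss of up to two OSDs (or hosts, given the failure domain shown) per object.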

Here’s an example of the pools you may see in the Ceph cluster, listed using
the ceph osd lspools CLI command:

$ ceph osd lspools
1 device_health_metrics
2 images
3 volumes
4 vms
5 backups
6 metrics
7 manila_data
8 manila_metadata
9 .rgw.root
10 default.rgw.log
11 default.rgw.control
12 default.rgw.meta
13 default.rgw.buckets.index
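
To see how each of these pools is configured, including whether it is
replicated or erasure coded and how many copies it keeps, you can use the ceph
osd pool ls detail command. The output below is shortened and illustrative;
the values in your cluster will differ:

$ ceph osd pool ls detail
pool 2 'images' replicated size 3 min_size 2 crush_rule 0 ...
pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 ...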

See the documentation from Ceph for details on how Ceph handles data
durability through replication and erasure coding.


Set up replication across 2 OSDs

Let’s say you wanted the volumes pool to be replicated across two OSDs. The
command to set that up looks like this:

$ ceph osd pool set volumes size 2
set pool 3 size to 2
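
You can confirm the change took effect by reading the value back. The output
shown is illustrative:

$ ceph osd pool get volumes size
size: 2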

Set up replication across 3 OSDs

The recommended number of replicated OSDs is three. To set that for the
volumes pool, use this command:

$ ceph osd pool set volumes size 3
set pool 3 size to 3
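
Changing the pool size causes Ceph to create or remove replicas in the
background, and the cluster may briefly report a warning while that data
movement is in progress. Once it settles, cluster health should return to
normal; you can watch recovery progress in more detail with ceph -s. The
output shown is illustrative:

$ ceph health
HEALTH_OK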

Erasure Coding

For information on erasure coding in general, see the InMotion Hosting Support Center.
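
If you want to see which erasure code profiles already exist in the cluster,
and what parameters the default profile uses, the following commands are
available. The exact output of the get command depends on the Ceph release in
use, so the values below are only illustrative:

$ ceph osd erasure-code-profile ls
default
$ ceph osd erasure-code-profile get default
k=2
m=2
plugin=jerasure
technique=reed_sol_van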
