The goal of this post is to build a small-footprint CEPH object storage cluster with high resiliency and no SPOF. We will build it with OpenSVC, Ceph, Gandi and Puppet.
Thanks to OVH for the HA Load Balancing use cases available on ovh.com.
The CEPH daemons themselves (MON and OSD) have no SPOF, but RadosGW becomes one if you install only a single instance. We can run several RadosGW instances and balance S3 or OpenStack Swift requests with round-robin DNS (the easy way).
I will explain how to deploy a CEPH cluster on 5 small Gandi VMs:
- 1 VCPU
- 512MB of RAM
- 2x5GB disks
On each node, we will deploy and configure:
- Base Linux modules,
- SSH keys deployment,
- Gandi configuration customisation,
- OpenSVC packages and registration on the OpenSVC collector,
- Ceph packages,
- Ceph-deploy deployment tools (you can do it with Puppet),
- GitHub CEPH dashboard (https://github.com/Crapworks/ceph-dash),
- RadosGW admin tools to create S3 and OpenStack Swift keys (https://github.com/fmonthel/radosgw-admin-mng-tools).
Later on, I will have a look at other Object and/or ScaleOut storage solutions that can be installed on Gandi VMs (ScaleIO, Scality?, Hedvig?).
First of all, deploy 5 Gandi VMs on Ubuntu 14.04 64 bits LTS (HVM) and create 10 virtual disks of 5GB each (they will become the OSDs). You can use the Gandi CLI / API:
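If you prefer the command line to the Gandi web console, the Gandi CLI can do the provisioning. A rough sketch only: the image label, datacenter code and disk names below are assumptions to adapt, and the exact flags should be checked against `gandi vm create --help` for your CLI version.

# Create the 5 VMs (1 core, 512MB RAM) - adjust image label and datacenter code
for i in 01 02 03 04 05; do
    gandi vm create --hostname srvusceph${i} --cores 1 --memory 512 \
        --image "Ubuntu 14.04 64 bits LTS (HVM)" --datacenter LU
done

# Create and attach the 2 extra 5GB data disks per VM (the future OSDs)
for i in 01 02 03 04 05; do
    gandi disk create --name srvusceph${i}osd1 --size 5120 --datacenter LU
    gandi disk create --name srvusceph${i}osd2 --size 5120 --datacenter LU
    gandi disk attach srvusceph${i}osd1 srvusceph${i}
    gandi disk attach srvusceph${i}osd2 srvusceph${i}
done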
I developed a Puppet module to install and set up most of the components listed above.
Administration OpenSVC service
We will create an administration OpenSVC service configured as a failover service on 2 nodes. This service, named "cephtstadm01.flox-arts.net", will host the administration tools (Ceph-deploy, the CEPH dashboard and the RadosGW admin tools).
This administration service is configured as below, and its persistent data are stored on one Gandi disk (OpenSVC is able to manage Gandi disks as resources):
root@srvusceph01:/opt/opensvc/etc# cat cephtstadm01.flox-arts.net.env
[DEFAULT]
app = FLA
comment = FLA CEPH ADM service
mode = hosted
cluster_type = failover
service_type = TST
nodes = srvusceph01.flox-arts.net srvusceph02.flox-arts.net
autostart_node = srvusceph01.flox-arts.net

[vg#0]
type = gandi
cloud_id = cloud#0
name@nodes = cephtstadm01
node@srvusceph01.flox-arts.net = srvusceph01
node@srvusceph02.flox-arts.net = srvusceph02
I won't explain here how to install Ceph-deploy and Ceph Dash.
The service is UP and RUNNING:
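From the command line, with the OpenSVC 1.x agent each service gets a launcher under /opt/opensvc/etc; assuming that layout, the service can be checked and moved with something like:

# Overview of the OpenSVC services hosted on this node
svcmon

# Detailed status and manual actions on the administration service
/opt/opensvc/etc/cephtstadm01.flox-arts.net print status
/opt/opensvc/etc/cephtstadm01.flox-arts.net stop
/opt/opensvc/etc/cephtstadm01.flox-arts.net start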
Ceph Monitor OpenSVC Flex service
OpenSVC also provides a service type named Flex. A Flex service can be up on several nodes at the same time, which is perfect for the OSD and MON processes.
You can configure a Flex service with min/max node counts to avoid breaking a failure domain: http://www.opensvc.com/init/news/show?news_id=17
At this point the CEPH monitors are deployed on the nodes (done with ceph-deploy, see the sketch below), and we will create an OpenSVC Flex service that will be UP and RUNNING at the same time on the 5 nodes.
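For reference, the monitor bootstrap with ceph-deploy boils down to something like this, run from the administration service (short hostnames are assumed to resolve):

# Declare the 5 monitors, install the Ceph packages and bootstrap the quorum
ceph-deploy new srvusceph01 srvusceph02 srvusceph03 srvusceph04 srvusceph05
ceph-deploy install srvusceph01 srvusceph02 srvusceph03 srvusceph04 srvusceph05
ceph-deploy mon create-initial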
This monitor service is configured as below:
root@srvusceph01:/opt/opensvc/etc# cat cephtstmon01.flox-arts.net.env
[DEFAULT]
app = FLA
comment = FLA CEPH MON service
mode = hosted
cluster_type = flex
service_type = TST
nodes = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
autostart_node = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
flex_primary = srvusceph01.flox-arts.net
Do it on the srvusceph01.flox-arts.net node; the OpenSVC syncall command will then deploy the service on the whole cluster. I also created an OpenSVC startup script to start/stop/check the CEPH monitor processes:
root@srvusceph01:/opt/opensvc/etc/cephtstmon01.flox-arts.net.d# ls -l
total 4
lrwxrwxrwx 1 root root    8 Jun 21 17:01 C01ceph_mon -> ceph_mon
lrwxrwxrwx 1 root root    8 Jun 21 17:01 K01ceph_mon -> ceph_mon
lrwxrwxrwx 1 root root    8 Jun 21 17:01 S01ceph_mon -> ceph_mon
-rwxr-xr-x 1 root root 1074 Jun 21 17:00 ceph_mon
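I don't reproduce the real script here, but here is a minimal sketch of what such an OpenSVC app script can look like. The start/stop/status calling convention and the upstart job name are assumptions based on the OpenSVC app driver and the Ubuntu 14.04 / ceph-deploy defaults.

#!/bin/bash
# ceph_mon - sketch of an OpenSVC app script driving the local Ceph monitor
# The S01/K01/C01 symlinks make OpenSVC call it with start / stop / status

MON_ID=$(hostname -s)   # ceph-deploy names the monitor after the short hostname

case "$1" in
  start)
    start ceph-mon id=${MON_ID}      # upstart job shipped with the Ceph packages
    ;;
  stop)
    stop ceph-mon id=${MON_ID}
    ;;
  status)
    # exit 0 if the monitor daemon is running, non-zero otherwise
    pgrep -f "ceph-mon .*-i ${MON_ID}" >/dev/null
    ;;
  *)
    echo "usage: $0 {start|stop|status}"
    exit 1
    ;;
esac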
The service is UP and RUNNING on the 5 nodes; have a look at the OpenSVC collector:
Ceph OSD OpenSVC Flex service
I won't explain in detail how to create the OSDs (done with ceph-deploy, roughly sketched below), but this cluster is populated with 10 OSD disks (2 per node).
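Roughly, the ceph-deploy side looks like this (the xvdb/xvdc device names are an assumption for Gandi Xen VMs; adjust them to what the kernel actually exposes):

# Prepare and activate the 2 extra Gandi disks of each node as OSDs
for node in srvusceph01 srvusceph02 srvusceph03 srvusceph04 srvusceph05; do
    ceph-deploy osd create ${node}:xvdb ${node}:xvdc
done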
We will create a Flex service that will run on the 5 nodes (we can play with the flex_min_nodes keyword to stay compliant with an Erasure Coding configuration).
This OSD service is configured as below:
root@srvusceph01:/opt/opensvc/etc# cat cephtstosd01.flox-arts.net.env
[DEFAULT]
app = FLA
comment = FLA CEPH OSD service
mode = hosted
cluster_type = flex
service_type = TST
nodes = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
autostart_node = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
flex_primary = srvusceph01.flox-arts.net
Like for the monitor service, I created an OpenSVC startup script to start/stop/check the CEPH OSD processes:
root@srvusceph01:/opt/opensvc/etc/cephtstosd01.flox-arts.net.d# ls -l
total 4
lrwxrwxrwx 1 root root    8 Jun 21 17:05 C01ceph_osd -> ceph_osd
lrwxrwxrwx 1 root root    8 Jun 21 17:05 K01ceph_osd -> ceph_osd
lrwxrwxrwx 1 root root    8 Jun 21 17:05 S01ceph_osd -> ceph_osd
-rwxr-xr-x 1 root root 1074 Jun 21 17:05 ceph_osd
The service is UP and RUNNING on the 5 nodes; have a look at the OpenSVC collector:
Ceph RadosGW / Apache OpenSVC Flex service
This is the most complicated part. To do it, I deployed with a Puppet module:
- Apache configuration, modules and the HTTP/HTTPS vhosts for the RadosGW alias objtmp01.flox-arts.net,
- SSL certificates for the objtmp01.flox-arts.net vhosts (self-signed),
- RadosGW keys (1 per RadosGW) and their registration on the CEPH cluster (ceph auth, sketched further below),
- RadosGW module installation.
Here is the vhost configuration for objtmp01.flox-arts.net (it will be deployed on each node):
<VirtualHost *:80>
    ServerName srvusceph01.flox-arts.net
    ServerAlias objtmp01.flox-arts.net
    ServerAlias *.objtmp01.flox-arts.net
    DocumentRoot /var/www/html
    RewriteEngine On
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    SetEnv proxy-nokeepalive 1
    ProxyPass / fcgi://localhost:9000/
    ErrorLog /var/log/apache2/error.log
    CustomLog /var/log/apache2/access.log combined
    ServerSignature Off
</VirtualHost>

<VirtualHost *:443>
    ServerName srvusceph01.flox-arts.net
    ServerAlias objtmp01.flox-arts.net
    ServerAlias *.objtmp01.flox-arts.net
    DocumentRoot /var/www/html
    RewriteEngine On
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    SetEnv proxy-nokeepalive 1
    ProxyPass / fcgi://localhost:9000/
    ErrorLog /var/log/apache2/error.log
    CustomLog /var/log/apache2/access.log combined
    ServerSignature Off
    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/apache.crt
    SSLCertificateKeyFile /etc/apache2/ssl/apache.key
</VirtualHost>
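If you do the Apache part by hand instead of with Puppet, enabling the needed modules and the vhost boils down to something like this on Ubuntu 14.04 / Apache 2.4 (the vhost file name is hypothetical):

a2enmod rewrite env proxy proxy_fcgi ssl
a2ensite objtmp01        # the vhost file dropped in /etc/apache2/sites-available
service apache2 reload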
Let's have a look at the CEPH RadosGW configuration part (ceph.conf) of the srvusceph05 node:
[client.radosgw.srvusceph05]
host = srvusceph05
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = ""
log file = /var/log/radosgw/client.radosgw.gateway.log
rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0
rgw print continue = false
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
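The keyring referenced above is the one created during the "RadosGW keys" Puppet step; done by hand it would look roughly like this (one key per gateway, shown here for srvusceph05):

# Generate the gateway key and give it the usual RGW capabilities
ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
chmod +r /etc/ceph/ceph.client.radosgw.keyring
ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.srvusceph05 --gen-key
ceph-authtool -n client.radosgw.srvusceph05 --cap osd 'allow rwx' --cap mon 'allow rwx' \
    /etc/ceph/ceph.client.radosgw.keyring

# Register the key in the CEPH cluster (ceph auth)
ceph auth add client.radosgw.srvusceph05 -i /etc/ceph/ceph.client.radosgw.keyring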
Like for MON and OSD, the RadosGW and Apache services are managed with an OpenSVC Flex service:
root@srvusceph01:/opt/opensvc/etc# cat cephtstrgw01.flox-arts.net.env
[DEFAULT]
app = FLA
comment = FLA CEPH RGW service
mode = hosted
cluster_type = flex
service_type = TST
nodes = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
autostart_node = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
flex_primary = srvusceph01.flox-arts.net
The service is UP and RUNNING on the 5 nodes; the OpenSVC service mapping starts to get fuzzy :)
Round-robin DNS to load balance RadosGW / Apache
To do it, create CNAME records for objtmp01.flox-arts.net (here at Gandi) pointing to the 5 servers:
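In BIND-style zone syntax, the records look like this. Strictly speaking, several CNAMEs on the same name are not RFC-compliant, so A records pointing to each node's IP are the cleaner equivalent if your DNS provider complains; the wildcard entry is an assumption to match the *.objtmp01 ServerAlias used for bucket-style access.

; objtmp01 resolves to the 5 RadosGW nodes, TTL 300s (the Gandi minimum)
objtmp01     300  IN  CNAME  srvusceph01.flox-arts.net.
objtmp01     300  IN  CNAME  srvusceph02.flox-arts.net.
objtmp01     300  IN  CNAME  srvusceph03.flox-arts.net.
objtmp01     300  IN  CNAME  srvusceph04.flox-arts.net.
objtmp01     300  IN  CNAME  srvusceph05.flox-arts.net.
*.objtmp01   300  IN  CNAME  objtmp01.flox-arts.net.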
We can't have a TTL lower than 5 minutes with Gandi :(
To handle a web server failure, you can have a monitoring script (Nagios) add or remove a Ceph server from the objtmp01.flox-arts.net CNAME through the Gandi API. You can also have a startup script in the cephtstrgw01 service that registers/deregisters CEPH servers from the objtmp01.flox-arts.net CNAME.
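A minimal monitoring sketch of that idea, which only decides which nodes should stay in the round robin; the actual record add/remove through the Gandi API (for example with the gandi CLI record subcommands) is left out, as the exact calls depend on the API version:

#!/bin/bash
# Check each RadosGW/Apache vhost and report which nodes belong in objtmp01

NODES="srvusceph01 srvusceph02 srvusceph03 srvusceph04 srvusceph05"

for node in ${NODES}; do
    # any HTTP answer within 5s means Apache and RadosGW are alive on this node
    if curl -s -o /dev/null --max-time 5 "http://${node}.flox-arts.net/"; then
        echo "${node}: OK - keep in objtmp01.flox-arts.net"
    else
        echo "${node}: KO - remove from objtmp01.flox-arts.net"
    fi
done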
Like in this OVH use case for sofoot.com, you can reinforce this setup with HAProxy load balancers (open source) just behind the RR-DNS and in front of the RadosGW web servers, in order to manage the web services more closely (be careful with the network throughput of the HAProxy servers when serving object storage).
Enjoy :)
Enjoy the CEPH Object Storage cluster! Here is a view (through CyberDuck, an Amazon S3 GUI) of the EMC ScaleIO binaries (free usage) stored in the cluster; ScaleIO is the next ScaleOut tool to install:
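To use the cluster from the command line instead of CyberDuck, an S3 client like s3cmd only needs to point at the RR-DNS alias; a minimal ~/.s3cfg sketch (the keys are the ones generated with the RadosGW admin tools, the values below are placeholders):

[default]
access_key = <S3_ACCESS_KEY>
secret_key = <S3_SECRET_KEY>
host_base = objtmp01.flox-arts.net
host_bucket = %(bucket)s.objtmp01.flox-arts.net
# the vhosts use a self-signed certificate, so certificate verification may need to be relaxed
use_https = True

Then s3cmd mb s3://mybucket followed by s3cmd put myfile s3://mybucket/ is enough to validate the setup through the round-robin alias.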
Bye !
- OpenSVC services http://www.opensvc.com
- CEPH software-defined storage http://ceph.com
- Gandi VPS Cloud Hosting http://www.gandi.net/hebergement
- Puppet automation tools https://puppetlabs.com
- GitHub CEPH Dash https://github.com/Crapworks/ceph-dash
- HAProxy high performance Load Balancer http://www.haproxy.org
- OVH usecases HAProxy + RR-DNS https://www.ovh.com/ca/en/community/usercase/