
HA Ceph object cluster: OpenSVC Flex, Gandi & RR-DNS

The goal of this post is to build a small-footprint Ceph object storage cluster with high resiliency and no SPOF. We will build it with OpenSVC, Ceph, Gandi and Puppet.
Thanks to OVH for the HA load balancing use cases available on ovh.com.

[Figure: Puppet / OpenSVC / Ceph / Gandi / OVH architecture overview]

The core Ceph daemons (MON and OSD) have no SPOF, but RADOSGW can become one if you install only a single instance. We can run several instances and balance S3 or OpenStack Swift requests with round-robin DNS (the easy way).

[Figure: Ceph MON, OSD and RadosGW services managed by OpenSVC]

I will explain how to deploy the Ceph cluster on 5 small Gandi VMs:

  • 1 VCPU
  • 512MB of RAM
  • 2x5GB disks

I will also have a look at several other object and/or scale-out storage solutions that can be installed on Gandi VMs (ScaleIO, Scality?, Hedvig?).

First of all, deploy the 5 Gandi VMs with Ubuntu 14.04 64-bit LTS (HVM) and create 10 virtual disks of 5GB each (they will become the OSDs). You can use the Gandi CLI/API:

[Screenshot: the 5 Ceph nodes in the Gandi console]
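
As a rough sketch of this provisioning step (the gandi-cli sub-commands exist, but the exact flags, image label and disk names below are assumptions; check `gandi vm create --help` for your CLI version):

    # create one of the 5 VMs (repeat for srvusceph02..05)
    gandi vm create --hostname srvusceph01 --memory 512 --cores 1 \
      --image "Ubuntu 14.04 64 bits LTS (HVM)"

    # create the two extra 5GB disks per VM, later used as OSDs
    gandi disk create --name srvusceph01-osd1 --size 5G
    gandi disk create --name srvusceph01-osd2 --size 5G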

I developed a Puppet module to install and set up:

  • the base Linux configuration,
  • SSH key deployment,
  • Gandi-specific customisation,
  • the OpenSVC packages and registration on the OpenSVC collector,
  • the Ceph packages.

Administration OpenSVC service

We will create an administration OpenSVC service, configured as a failover service across 2 nodes. This service, named "cephtstadm01.flox-arts.net", will host the ceph-deploy admin tooling and Ceph Dash:

[Screenshot: Ceph Dash]

This administration service is configured as below; its persistent data is stored on one Gandi disk (OpenSVC manages attaching the Gandi disk to the active node):

    root@srvusceph01:/opt/opensvc/etc# cat cephtstadm01.flox-arts.net.env 
    [DEFAULT]
    app = FLA
    comment = FLA CEPH ADM service
    mode = hosted
    cluster_type = failover
    service_type = TST
    nodes =  srvusceph01.flox-arts.net srvusceph02.flox-arts.net
    autostart_node = srvusceph01.flox-arts.net
    
    [vg#0]
    type = gandi
    cloud_id = cloud#0
    name@nodes = cephtstadm01
    node@srvusceph01.flox-arts.net = srvusceph01
    node@srvusceph02.flox-arts.net = srvusceph02
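
With the env file in place on srvusceph01, the service can be driven with the per-service OpenSVC launcher (a sketch, assuming an OpenSVC 1.x agent installed under /opt/opensvc; action names may vary slightly between agent versions):

    # start the failover service on the current node
    /opt/opensvc/etc/cephtstadm01.flox-arts.net start

    # show the resource status of this service
    /opt/opensvc/etc/cephtstadm01.flox-arts.net print status

    # overall view of all services known to this node
    /opt/opensvc/bin/svcmon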
    

I won't detail here how to install ceph-deploy and Ceph Dash.
The service is UP and RUNNING:

[Screenshot: OpenSVC collector view of the cephtstadm01 administration service]

Ceph Monitor OpenSVC Flex service

OpenSVC provides a service type named Flex. A Flex service can be up on several nodes at the same time, which is perfect for the OSD and MON processes.

You can constrain a Flex service with minimum/maximum node counts, so that losing too many instances does not break your failure domain: http://www.opensvc.com/init/news/show?news_id=17

The Ceph monitors are now deployed on the nodes (done with ceph-deploy), and we will create an OpenSVC Flex service that will be UP and RUNNING at the same time on the 5 nodes.
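
For reference, the monitor deployment with ceph-deploy looks roughly like this (a sketch run from the admin service node; the Ceph packages are already installed by Puppet):

    # bootstrap the cluster definition and create the 5 monitors
    ceph-deploy new srvusceph01 srvusceph02 srvusceph03 srvusceph04 srvusceph05
    ceph-deploy mon create-initial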

This monitor service is configured as below:

    root@srvusceph01:/opt/opensvc/etc# cat cephtstmon01.flox-arts.net.env 
    [DEFAULT]
    app = FLA
    comment = FLA CEPH MON service
    mode = hosted
    cluster_type = flex
    service_type = TST
    nodes =  srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
    autostart_node = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
    flex_primary = srvusceph01.flox-arts.net
    

Create the service on the srvusceph01.flox-arts.net node; the OpenSVC syncall action then deploys the service configuration to the whole cluster. I also created an OpenSVC startup script to start/stop/check the Ceph monitor processes:

    root@srvusceph01:/opt/opensvc/etc/cephtstmon01.flox-arts.net.d# ls -l
    total 4
    lrwxrwxrwx 1 root root    8 Jun 21 17:01 C01ceph_mon -> ceph_mon
    lrwxrwxrwx 1 root root    8 Jun 21 17:01 K01ceph_mon -> ceph_mon
    lrwxrwxrwx 1 root root    8 Jun 21 17:01 S01ceph_mon -> ceph_mon
    -rwxr-xr-x 1 root root 1074 Jun 21 17:00 ceph_mon
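
The S01/K01/C01 symlinks follow the usual OpenSVC app launcher convention (start/kill/check ordering). The ceph_mon script itself is not reproduced in the post; below is a minimal hypothetical sketch, assuming the launcher receives start/stop/status as its first argument and that the monitors run under upstart on Ubuntu 14.04:

    #!/bin/bash
    # /opt/opensvc/etc/cephtstmon01.flox-arts.net.d/ceph_mon (hypothetical sketch)
    case "$1" in
      start)
        # start the upstart job shipped with the Ceph packages, only if not already running
        status ceph-mon id=$(hostname -s) | grep -q running || start ceph-mon id=$(hostname -s)
        ;;
      stop)
        stop ceph-mon id=$(hostname -s)
        ;;
      status)
        pgrep ceph-mon >/dev/null            # exit 0 if a monitor is running, 1 otherwise
        ;;
      *)
        echo "usage: $0 {start|stop|status}" >&2
        exit 1
        ;;
    esac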
     

The service is UP and RUNNING on the 5 nodes; have a look at the OpenSVC collector:

[Screenshot: OpenSVC collector view of the cephtstmon01 flex service on the 5 nodes]

Ceph OSD OpenSVC Flex service

I won't detail here how to create the OSDs (done with ceph-deploy), but this cluster is populated with 10 OSD disks (2 per node).
We will create a Flex service that runs on the 5 nodes (we could play with the flex minimum-nodes setting to stay compliant with an erasure coding configuration).
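
For reference, preparing the two data disks of each node with ceph-deploy would look roughly like this (the /dev/xvdb and /dev/xvdc device names are an assumption about how the extra Gandi disks appear in the guests):

    # wipe and create the two OSDs of srvusceph01 (repeat for the other nodes)
    ceph-deploy disk zap srvusceph01:/dev/xvdb srvusceph01:/dev/xvdc
    ceph-deploy osd create srvusceph01:/dev/xvdb srvusceph01:/dev/xvdc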

This OSD service is configured as below:

    root@srvusceph01:/opt/opensvc/etc# cat cephtstosd01.flox-arts.net.env 
    [DEFAULT]
    app = FLA
    comment = FLA CEPH OSD service
    mode = hosted
    cluster_type = flex
    service_type = TST
    nodes =  srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
    autostart_node = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
    flex_primary = srvusceph01.flox-arts.net
    

As for the monitor service, I created an OpenSVC startup script to start/stop/check the Ceph OSD processes:

    root@srvusceph01:/opt/opensvc/etc/cephtstosd01.flox-arts.net.d# ls -l
    total 4
    lrwxrwxrwx 1 root root    8 Jun 21 17:05 C01ceph_osd -> ceph_osd
    lrwxrwxrwx 1 root root    8 Jun 21 17:05 K01ceph_osd -> ceph_osd
    lrwxrwxrwx 1 root root    8 Jun 21 17:05 S01ceph_osd -> ceph_osd
    -rwxr-xr-x 1 root root 1074 Jun 21 17:05 ceph_osd
    

The service is UP and RUNNING on the 5 nodes; have a look at the OpenSVC collector:

[Screenshot: OpenSVC collector view of the cephtstosd01 flex service on the 5 nodes]

Ceph RadosGW / Apache OpenSVC Flex service

This is the most complicated part. To do it, I deployed the following with a Puppet module:

  • the Apache configuration, modules and the HTTP and HTTPS vhosts for the RadosGW alias objtmp01.flox-arts.net,
  • self-signed SSL certificates for the objtmp01.flox-arts.net vhosts,
  • the RadosGW keys (1 per RadosGW instance) and their registration on the Ceph cluster (ceph auth, see the sketch after this list),
  • the RadosGW package installation.
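
As a sketch of the key creation step the Puppet module performs for each gateway (standard ceph-authtool / ceph auth commands; the capabilities shown are the stock RadosGW ones and should be treated as an assumption here):

    # create the gateway keyring and a key for this node's RadosGW instance
    ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
    ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.srvusceph01 --gen-key
    ceph-authtool -n client.radosgw.srvusceph01 --cap osd 'allow rwx' --cap mon 'allow rwx' \
      /etc/ceph/ceph.client.radosgw.keyring

    # register the key in the Ceph cluster
    ceph auth add client.radosgw.srvusceph01 -i /etc/ceph/ceph.client.radosgw.keyring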

Here is the vhost configuration for objtmp01.flox-arts.net (it will be deployed on each node):

    <VirtualHost *:80>
    	ServerName srvusceph01.flox-arts.net
    	ServerAlias objtmp01.flox-arts.net
    	ServerAlias *.objtmp01.flox-arts.net
    	DocumentRoot /var/www/html
    	RewriteEngine On
    	RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    	SetEnv proxy-nokeepalive 1
    	ProxyPass / fcgi://localhost:9000/
    	ErrorLog /var/log/apache2/error.log
    	CustomLog /var/log/apache2/access.log combined
    	ServerSignature Off
    </VirtualHost>
    <VirtualHost *:443>
    	ServerName srvusceph01.flox-arts.net             
    	ServerAlias objtmp01.flox-arts.net
    	ServerAlias *.objtmp01.flox-arts.net
    	DocumentRoot /var/www/html
    	RewriteEngine On
    	RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    	SetEnv proxy-nokeepalive 1
    	ProxyPass / fcgi://localhost:9000/
    	ErrorLog /var/log/apache2/error.log
    	CustomLog /var/log/apache2/access.log combined
    	ServerSignature Off
    	SSLEngine on
    	SSLCertificateFile /etc/apache2/ssl/apache.crt
    	SSLCertificateKeyFile /etc/apache2/ssl/apache.key
    </VirtualHost>
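
The vhost above relies on a few Apache modules being enabled; on Ubuntu 14.04 this would typically be something like the following (the module list is inferred from the directives above, and the site name is hypothetical):

    a2enmod rewrite proxy proxy_fcgi ssl   # mod_proxy_fcgi backs the ProxyPass fcgi:// lines
    a2ensite objtmp01                      # hypothetical site name for this vhost file
    service apache2 restart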
    

Let's have a look at the Ceph RadosGW part of the configuration (ceph.conf) on the srvusceph05 node:

    [client.radosgw.srvusceph05]
    host = srvusceph05
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw socket path = ""
    log file = /var/log/radosgw/client.radosgw.gateway.log
    rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0
    rgw print continue = false
    rgw enable usage log = true
    rgw usage log tick interval = 30
    rgw usage log flush threshold = 1024
    rgw usage max shards = 32
    rgw usage max user shards = 1
    

Like MON and OSD, the RadosGW and Apache processes are managed with an OpenSVC Flex service:

    root@srvusceph01:/opt/opensvc/etc# cat cephtstrgw01.flox-arts.net.env
    [DEFAULT]
    app = FLA
    comment = FLA CEPH RGW service
    mode = hosted
    cluster_type = flex
    service_type = TST
    nodes =  srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
    autostart_node = srvusceph01.flox-arts.net srvusceph02.flox-arts.net srvusceph03.flox-arts.net srvusceph04.flox-arts.net srvusceph05.flox-arts.net
    flex_primary = srvusceph01.flox-arts.net
    

The service is UP and RUNNING on the 5 nodes, and the OpenSVC service map is starting to get busy :)

[Screenshot: OpenSVC collector view of all the Ceph services across the cluster]

Round-robin DNS to load balance RadosGW / Apache

To do it, create CNAME records for objtmp01.flox-arts.net (here at Gandi) pointing to the 5 servers:

[Screenshot: the objtmp01.flox-arts.net records in the Gandi DNS zone]

Note that we can't set a TTL lower than 5 minutes with Gandi :(
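
You can check the round-robin behaviour from any client; successive queries should show the records rotating (depending on resolver caching):

    # query the alias a few times and watch the answer order rotate
    for i in 1 2 3; do dig +short objtmp01.flox-arts.net; echo ---; done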

To handle a web server failure, you can have a monitoring script (Nagios, for example) add or remove a Ceph server from the objtmp01.flox-arts.net records through the Gandi API. You can also have a startup script in the cephtstrgw01 service register/deregister the Ceph servers in the objtmp01.flox-arts.net records (see the sketch below).
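
A very rough sketch of what that registration/deregistration could look like with gandi-cli (the record sub-commands and flags are assumptions; the Gandi XML-RPC API is the other option):

    # drop a failed gateway from the round-robin set (hypothetical flags)
    gandi record delete flox-arts.net --name objtmp01 --type CNAME --value srvusceph03.flox-arts.net.
    # add it back once the node is healthy again
    gandi record create flox-arts.net --name objtmp01 --type CNAME --value srvusceph03.flox-arts.net.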

Like in this OVH use case for sofoot.com, you can reinforce this setup with HAProxy load balancers (open source) just behind the RR-DNS and in front of the RadosGW web servers, in order to manage the web services more closely (but be careful with the network throughput of the HAProxy servers when they front object storage).

Enjoy :)

Enjoy your Ceph object storage cluster. Here, seen through the CyberDuck Amazon S3 GUI, are the EMC ScaleIO binaries (free usage), the next scale-out tool I plan to install:

[Screenshot: CyberDuck (Amazon S3 GUI) browsing the Ceph object store, showing the EMC ScaleIO binaries]

Bye!
