GridFTP cluster


These pages are obsolete

servers

There are two servers, f01-060-118 and f01-060-120, behind the DNS alias lsdf-pilot.gridka.de.

They mount GPFS on /export/gka6701.

There is a symbolic link to the logical root: /lsdf -> /export/gka6701/lsdf
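A small check of this layout can be useful after a reboot or a GPFS remount. This is a sketch under assumptions: the function name check_lsdf_link is hypothetical, and both paths are passed as arguments so it can be tried on scratch directories.

```shell
# Sketch: verify that LINK is a symlink to TARGET and that TARGET is a
# directory (i.e. GPFS is mounted). The function name is hypothetical.
check_lsdf_link() {
    link=$1     # e.g. /lsdf
    target=$2   # e.g. /export/gka6701/lsdf
    if [ ! -L "$link" ]; then
        echo "$link is not a symbolic link" >&2
        return 1
    fi
    actual=$(readlink "$link")
    if [ "$actual" != "$target" ]; then
        echo "$link points to $actual, expected $target" >&2
        return 1
    fi
    if [ ! -d "$target" ]; then
        echo "$target is not a directory (is GPFS mounted?)" >&2
        return 1
    fi
}

# On a server:
#   check_lsdf_link /lsdf /export/gka6701/lsdf && echo OK
# Recreate a missing link with:
#   ln -s /export/gka6701/lsdf /lsdf
```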

services

gridftp

GridFTP is installed as an rc service and should come up automatically when the machines are rebooted.

To start/stop/status manually:

service gridftp start

The package is installed in /opt/globus. If the system disk is lost, please rsync /opt/globus back from the other server.

Its config file is /opt/globus/etc/gridftp.conf:


[root@f01-060-118-e ~]# cat /opt/globus/etc/gridftp.conf
detach 1
port 2811
chdir 1
log_level info,warn,error
log_single /var/log/gridftp.log
disable_usage_stats 1
blocksize 1048576

Note the log file location: /var/log/gridftp.log (set by log_single).
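To read a single setting from this file in scripts (for example the log path or the port), here is a minimal sketch, assuming the one-directive-per-line "option value" format shown above. The helper name gridftp_conf_get is hypothetical.

```shell
# Sketch: print the value of one directive from a gridftp.conf-style file.
# Blank lines and '#' comments are skipped; if a directive appears twice,
# the last occurrence wins. Exits non-zero if the directive is absent.
gridftp_conf_get() {
    conf=$1   # path to the config file
    key=$2    # directive name, e.g. port
    awk -v k="$key" '
        /^[ \t]*#/ || NF == 0 { next }
        $1 == k {
            val = $0
            if (NF > 1) sub(/^[ \t]*[^ \t]+[ \t]+/, "", val)   # drop the name
            else val = ""
            found = 1
        }
        END { if (found) print val; else exit 1 }
    ' "$conf"
}

# Usage:
#   gridftp_conf_get /opt/globus/etc/gridftp.conf log_single
```

For the config shown above, the usage line prints /var/log/gridftp.log.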

sftp

The account xrootd is used for SFTP access. It is a passwordless account: only key-based access is allowed. Check its ~/.ssh directory for the authorized keys.

NOTE: At the request of Michael S., key-based access to the "itguser" account was enabled (same keys as for xrootd). The 180-day password validity period, which had expired, also had to be lifted. The changes were made on both lsdf-pilot nodes. Ariel - 2010-08-11
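The key part of this change can be sketched as follows. It is an illustration, not the exact commands that were run: the function name copy_authorized_keys is hypothetical, home directories are parameters, and ownership still has to be fixed when running as root. The chage lines in the comments are the standard shadow-utils way to inspect and lift a password expiry.

```shell
# Sketch: give DST_HOME the same authorized SSH keys as SRC_HOME, with the
# modes sshd's StrictModes check expects (no group/world write access).
# On the real nodes, run as root and then fix ownership, e.g.:
#   chown -R itguser: ~itguser/.ssh
# Password expiry: "chage -l itguser" shows it, "chage -M -1 itguser"
# removes the maximum password age check.
copy_authorized_keys() {
    src_home=$1   # e.g. ~xrootd
    dst_home=$2   # e.g. ~itguser
    mkdir -p "$dst_home/.ssh" || return 1
    chmod 700 "$dst_home/.ssh" || return 1
    cp "$src_home/.ssh/authorized_keys" "$dst_home/.ssh/authorized_keys" || return 1
    chmod 600 "$dst_home/.ssh/authorized_keys"
}
```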

SAMBA

Install

I installed the SOFS Samba and CTDB packages from IBM:

rpm -ivh samba-client-3.2.7-ctdb.54.2.x86_64.rpm ctdb-1.0-69.x86_64.rpm samba-common-3.2.7-ctdb.54.2.x86_64.rpm samba-3.2.7-ctdb.54.2.x86_64.rpm
[root@f01-060-118-e ~]# rpm -qa | egrep "samba|ctdb"
samba-client-3.2.7-ctdb.54.2
ctdb-1.0-69
samba-common-3.2.7-ctdb.54.2
samba-3.2.7-ctdb.54.2

Config

The config file is /etc/samba/smb.conf. Change the following settings:

workgroup = WORKGROUP
security = user
passdb backend = tdbsam

Added a Unix user and added it to the Samba database:

useradd itguser
smbpasswd -a itguser

Added a read-only user in the same group (GID 28200):

useradd -g 28200 itgread
smbpasswd -a itgread

Home directories are shared by default

Config LSDF share

[lsdfsink]
  path = /lsdf/sink
  comment = LSDF data sink
  browseable = yes
  writable = yes

Make sure the path is writable by itguser and that every directory along the path is readable and searchable (x permission) by it. Don't forget to restart Samba.
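The permission requirement above can be checked with a sketch like the following. It assumes it is run as the Samba user (itguser), for example in a shell started with su or sudo -u; as root it proves nothing, because root passes these tests regardless of mode bits. The name check_share_path is hypothetical.

```shell
# Sketch: check that each directory on an absolute path is traversable by
# the calling user and that the final directory is writable.
check_share_path() {
    path=$1   # e.g. /lsdf/sink
    case $path in
        /*) ;;
        *) echo "need an absolute path: $path" >&2; return 1 ;;
    esac
    rest=${path#/}
    prefix=""
    # Test /lsdf, then /lsdf/sink, ...; test -x on a symlink checks its
    # target, which also needs search permission on the target's parents.
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        if [ "$part" = "$rest" ]; then rest=""; else rest=${rest#*/}; fi
        if [ -z "$part" ]; then continue; fi   # tolerate '//' in the path
        prefix="$prefix/$part"
        if [ ! -d "$prefix" ] || [ ! -x "$prefix" ]; then
            echo "missing or not traversable: $prefix" >&2
            return 1
        fi
    done
    if [ ! -w "$path" ]; then
        echo "not writable: $path" >&2
        return 1
    fi
}
```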

Operate

service smb start

This launches the smbd daemon, listening on TCP ports 445 and 139.
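To confirm from a client that smbd (ports 445 and 139) or GridFTP (port 2811) is reachable, a minimal probe can be sketched with bash's /dev/tcp redirection and coreutils timeout. It assumes bash is installed; the function name port_open is hypothetical.

```shell
# Sketch: succeed if HOST accepts a TCP connection on PORT within 3 seconds.
# The connection is opened by bash's /dev/tcp pseudo-device.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   for p in 139 445 2811; do
#       if port_open f01-060-118-e "$p"; then echo "$p open"; else echo "$p closed"; fi
#   done
```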

client operations

smbclient -U itguser //f01-060-120/HOMES

Mounting the share also works:

[root@f01-060-120-e ~]# mkdir /lsdf/smtestmount
[root@f01-060-120-e ~]#  mount -t cifs //f01-060-118-e/LSDFSINK /lsdf/smtestmount --verbose -o user=itguser
parsing options: rw,user=itguser
Password:
mount.cifs kernel mount options unc=//f01-060-118-e\LSDFSINK,pass=itgitg,ver=1,rw,user=itguser


Then a transfer test through this mount point:

[root@f01-060-120-e ~]# dd if=/dev/zero of=/lsdf/smtestmount/test bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 93.5397 seconds, 224 MB/s
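The reported rate can be checked by hand. dd's "MB" is 10^6 bytes, while the monitoring script further down divides by 1024*1024, so the two figures are not directly comparable. A small helper (the name throughput is hypothetical) prints both units:

```shell
# Sketch: throughput from a byte count and elapsed seconds, in decimal MB/s
# (as dd reports it) and in MiB/s (1024*1024 bytes).
throughput() {
    bytes=$1
    seconds=$2
    awk -v b="$bytes" -v s="$seconds" \
        'BEGIN { printf "%.1f MB/s (%.1f MiB/s)\n", b / s / 1e6, b / s / 1048576 }'
}
```

For the run above, throughput 20971520000 93.5397 prints 224.2 MB/s (213.8 MiB/s).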

CTDB

  • /etc/sysconfig/ctdb
CTDB_RECOVERY_LOCK="/lsdf/ctdb/lock"
  • /etc/ctdb/nodes
10.65.60.118
10.65.60.120
  • /etc/samba/smb.conf

Add:

include /etc/samba/clustering.conf
  • /etc/samba/clustering.conf
clustering = yes
idmap backend = tdb2
fileid:mapping = fsname
vfs objects = gpfs fileid
gpfs:sharemodes = No
force unknown acl user = yes
nfs4: mode = special
nfs4: chown = yes
nfs4: acedup = merge
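CTDB expects /etc/ctdb/nodes to be identical on all nodes, so it is worth checking both its syntax and that the copies match. A sketch, assuming one IPv4 address per line as above; the function name check_ctdb_nodes is hypothetical, and skipping blank and '#' lines is a convenience of this check, not a statement about what CTDB 1.0 accepts.

```shell
# Sketch: validate a CTDB nodes file: one IPv4 address per line, no
# duplicates. Compare the copies across nodes separately, e.g. by running
# "md5sum /etc/ctdb/nodes" on each node.
check_ctdb_nodes() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        {
            n = split($1, o, ".")
            ok = (NF == 1 && n == 4)
            for (i = 1; ok && i <= 4; i++)
                ok = (o[i] ~ /^[0-9]+$/ && o[i] + 0 <= 255)
            if (!ok) { print "bad line " NR ": " $0 > "/dev/stderr"; bad = 1 }
            else if (seen[$1]++) { print "duplicate: " $1 > "/dev/stderr"; bad = 1 }
        }
        END { exit bad }
    ' "$1"
}

# Usage: check_ctdb_nodes /etc/ctdb/nodes && echo "nodes file OK"
```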

Operate CTDB

service ctdb start
# ctdb status
Number of nodes:2
pnn:0 10.65.60.118     OK
pnn:1 10.65.60.120     OK (THIS NODE)
Generation:1898793932
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0
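For monitoring (for example from cron), the status output above can be reduced to a single exit code. This sketch assumes the CTDB 1.0 output format shown above and may need adjusting for other versions; ctdb_all_ok is a hypothetical name.

```shell
# Sketch: read "ctdb status" output on stdin; succeed only if the number of
# "pnn:" lines equals "Number of nodes" and every node line says OK.
ctdb_all_ok() {
    awk -F'[: ]+' '
        /^Number of nodes:/ { want = $NF }
        /^pnn:/ {
            have++
            if ($0 !~ /[ \t]OK([ \t]|$)/) { print "not OK: " $0 > "/dev/stderr"; bad = 1 }
        }
        END { exit (bad || want == "" || have != want) }
    '
}

# Usage: ctdb status | ctdb_all_ok && echo "cluster healthy"
```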

Samba monitoring

The small-file write rate is monitored with the following program, run as a cron job on hdp0 (141.52.97.23):

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Manual equivalent of this test:
#time for i in `seq 100 105`; do date; dd if=/dev/zero of=/mnt/lsdf/test-artem/test-small-${i} bs=2516582 count=1 ; done
#    time for i in `seq 100 105`; do rm /mnt/lsdf/test-artem/test-small-${i} ; done

my $dir  = "/mnt/lsdf/test-artem";   # LSDF share mounted on hdp0
my $base = "test-small-";            # test file name prefix

my $size     = 2516582;              # bytes per test file (about 2.4 MiB)
my $attempts = 5;                    # files written per run

my $rate_sum = 0;

# Random start index; the inclusive range covers exactly $attempts files.
my $count_start = int(rand(100));
my $count_end   = $count_start + $attempts - 1;

foreach my $i ($count_start .. $count_end) {
    my $t_start = gettimeofday;      # scalar context: seconds as a float
    system("dd if=/dev/zero of=$dir/$base$i bs=$size count=1") == 0
        or warn "dd failed for $dir/$base$i\n";
    my $t_end   = gettimeofday;
    my $elapsed = $t_end - $t_start;

    my $rate = $size / $elapsed;     # bytes per second

    # "MB" in this script means 1024*1024 bytes (MiB)
    print "Transferred $i in $elapsed seconds, rate ", $rate/1024/1024, " MB/s\n";

    $rate_sum += $rate;
}

# Remove the same files that were written above.
foreach my $i ($count_start .. $count_end) {
    unlink "$dir/$base$i";
}

my $rate_av     = $rate_sum / $attempts;
my $rate_av_mbs = sprintf("%.1f", $rate_av/1024/1024);

print "Average rate ", $rate_av_mbs, "\n";

# Publish the average to Ganglia.
my $cmd = "gmetric --name=\"Small file transfer rate\" --value=\"$rate_av_mbs\" --type=\"float\" --units=\"MB/s\"";

print "CMD $cmd\n";

system($cmd);

Crontab:

0,5,10,15,20,25,30,35,40,45,50,55       *       *       *       *       /root/small-file-test.pl > /tmp/small.log 2>&1

Ganglia page: http://iwrcgmon.fzk.de/ganglia/?r=day&c=SERVER&h=null-00e0812a1429.ka.fzk.de

GPFS

An idea to be explored: add a node at ITG to the LSDF-pilot GPFS cluster and see what rate we can get. We could then install Samba on that node, which should bring latency down.

First we configure SSH on the hdp8 node (GPFS administration commands need passwordless root SSH between cluster nodes).

Then, on f01-060-118:

mmaddnode -N 141.52.97.34

Unfortunately, this cluster is in the private GridKa network, so it cannot easily be extended with an external node.

Changing cluster IP addresses

Here is how we changed IP addresses:

service smb stop
service gridftp stop
service ctdb stop
mmshutdown -a
mmchnode --admin-interface=f01-060-120-e.gridka.de --daemon-interface=f01-060-120-e.gridka.de -N f01-060-120
...

Then do the same for the other node and restart everything.
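The restart is the shutdown sequence in reverse. A sketch, under assumptions not stated in these notes: the GPFS file system mounts automatically after mmstartup, and CTDB should only start once /lsdf/ctdb (which holds the recovery lock) is visible. Setting RUN=echo gives a dry run; restart_services is a hypothetical name.

```shell
# Sketch: bring everything back up in the reverse order of the shutdown.
# RUN=echo only prints the commands (dry run).
restart_services() {
    run=${RUN:-}
    $run mmstartup -a || return 1

    # Assumption: the file system mounts automatically. Wait up to ~5 minutes
    # for the CTDB recovery lock directory before starting CTDB.
    if [ -z "$run" ]; then
        tries=0
        until [ -d /lsdf/ctdb ]; do
            tries=$((tries + 1))
            if [ "$tries" -gt 60 ]; then
                echo "/lsdf/ctdb not visible; check mmgetstate -a" >&2
                return 1
            fi
            sleep 5
        done
    fi

    $run service ctdb start || return 1
    $run service gridftp start || return 1
    $run service smb start
}
```

The service commands act only on the local node, so they have to be repeated on the other node.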

GSI auth stuff

CA

The hosts' CA certificates and grid-mapfile are managed by cfengine in the standard location.

gridmapfile

However, we are not yet using the standard grid-mapfile, because:

  • we have not yet come up with a golden distribution of GridFTP that includes VOMS mapping (work in progress);
  • not all users are registered in the VO.

Therefore we use a static grid-mapfile, selected by setting the GRIDMAP variable in /etc/init.d/gridftp:

export GRIDMAP=/etc/grid-security/grid-mapfile-static

Its content as of August 2010:

"/C=DE/O=GermanGrid/OU=FZK/CN=Artem Trunov" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=f01-031-141-e.gridka.de" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=f01-031-133-e.gridka.de" itguser
"/O=GermanGrid/OU=KIT/CN=Michael Sutter" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Michael Sutter" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Thomas Jejkal" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Volker Hartmann" itguser
"/O=GermanGrid/OU=Uni Karlsruhe/CN=Armin Scheurer" itguser
"/C=DE/O=GridGermany/OU=Universitaet Heidelberg/CN=Dr. Marc Hemberger" itguser
"/C=PL/O=GRID/O=Cyfronet/CN=Lukasz Flis - OPS" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Jens Otte" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Masanari Takamiya" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Rainer Stotzka" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Ruediger Rudolf" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Francesca Rindone" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Sven Brand" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Wolfgang Mexner" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Simon Ochsenreither" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=David Haas" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Michael Goetter" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Patrick Neuberger" itguser

user mapping

Yes, all DNs are mapped statically to a single local user (itguser in the file above). This still needs more thought.
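Since the static map is edited by hand, a quick syntax check plus a summary of the local accounts in use can catch mistakes. A sketch assuming the line format of the file above (quoted DN starting with "/", then one local account); check_gridmap is a hypothetical name, and entries that map to several comma-separated accounts are rejected by this simplified check.

```shell
# Sketch: report malformed grid-mapfile lines on stderr and print each local
# account with the number of DNs mapped to it. Exits 1 if any line is bad.
check_gridmap() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        /^"\/[^"]+"[ \t]+[A-Za-z0-9_.-]+[ \t]*$/ {
            count[$NF]++          # last field is the local account
            next
        }
        { print "malformed line " NR ": " $0 > "/dev/stderr"; bad = 1 }
        END {
            for (u in count) print u, count[u]
            exit bad
        }
    ' "$1"
}

# Usage: check_gridmap /etc/grid-security/grid-mapfile-static
```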

Troubleshooting

If you get an authentication error, most likely the CA certificates are outdated (possibly on the client side too).
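One way to spot outdated CA certificates, on the server or on a client, is openssl x509 -checkend. The sketch assumes CA certificates are stored as <hash>.0 files, as in /etc/grid-security/certificates; it does not check CRLs or missing CAs, and expiring_certs is a hypothetical name.

```shell
# Sketch: list certificates in DIR that have expired or will expire within
# SECS seconds (default 0 = already expired). Files openssl cannot parse
# are listed too, since they are just as unusable.
expiring_certs() {
    dir=$1
    secs=${2:-0}
    for cert in "$dir"/*.0; do
        [ -f "$cert" ] || continue            # glob matched nothing
        if ! openssl x509 -noout -checkend "$secs" -in "$cert" >/dev/null 2>&1; then
            echo "$cert"
        fi
    done
}

# Usage (within the next week):
#   expiring_certs /etc/grid-security/certificates 604800
```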

If a transfer hangs, either the firewall rules have changed or the client is not restricting itself to the Globus port range 20000,25000 (GLOBUS_TCP_PORT_RANGE).
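On the client side, a check like the following can rule out the second cause before a transfer. It assumes the range is given as "min,max" in GLOBUS_TCP_PORT_RANGE and only verifies that it lies within 20000,25000; check_port_range is a hypothetical name.

```shell
# Sketch: succeed only if GLOBUS_TCP_PORT_RANGE is "min,max" and lies inside
# the allowed range (default 20000,25000).
check_port_range() {
    allowed_lo=${1:-20000}
    allowed_hi=${2:-25000}
    range=${GLOBUS_TCP_PORT_RANGE:-}
    case $range in
        "" | *[!0-9,]* | *,*,* | ,* | *,)
            echo "GLOBUS_TCP_PORT_RANGE unset or malformed: '$range'" >&2
            return 1 ;;
        *,*) ;;
        *)
            echo "GLOBUS_TCP_PORT_RANGE must be min,max: '$range'" >&2
            return 1 ;;
    esac
    lo=${range%,*}
    hi=${range#*,}
    if [ "$lo" -lt "$allowed_lo" ] || [ "$hi" -gt "$allowed_hi" ] || [ "$lo" -gt "$hi" ]; then
        echo "range $range is outside $allowed_lo,$allowed_hi" >&2
        return 1
    fi
}

# Usage:
#   export GLOBUS_TCP_PORT_RANGE=20000,25000
#   check_port_range && echo "port range OK"
```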

SAMBA with domain accounts

LSDF samba with domain authorization