GridFTP cluster
servers
There are two servers, f01-060-118 and f01-060-120, behind the DNS alias lsdf-pilot.gridka.de.
They mount GPFS on /export/gka6701.
There is a symlink to the logical root: /lsdf -> /export/gka6701/lsdf
services
gridftp
gridftp is installed as an rc service and should come up when a machine is rebooted.
To start/stop/status manually:
service gridftp start
The package is installed in /opt/globus. If the system disk is lost, rsync it from the system machine.
Its config file is:
[root@f01-060-118-e ~]# cat /opt/globus/etc/gridftp.conf
detach 1
port 2811
chdir 1
log_level info,warn,error
log_single /var/log/gridftp.log
disable_usage_stats 1
blocksize 1048576
Note the log file location: /var/log/gridftp.log (the log_single setting in the config above).
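The log location can also be read straight from the config instead of being remembered. A minimal sketch, assuming the simple "key value" per line layout shown above; the function name gridftp_conf_value is made up for this page.

```shell
# Print the value of one key from a gridftp.conf-style file
# (one "key value" pair per line, as in the listing above).
# gridftp_conf_value is a hypothetical helper, not part of Globus.
gridftp_conf_value() {
    # $1 = key, $2 = config file (defaults to the path used on this page)
    awk -v key="$1" '$1 == key { print $2; exit }' \
        "${2:-/opt/globus/etc/gridftp.conf}"
}

# Example on one of the servers: follow the log named by log_single
# tail -f "$(gridftp_conf_value log_single)"
```

The same helper works for the other keys, for example port or blocksize.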
sftp
The account xrootd is used for sftp access. It is a passwordless account; only key-based access is allowed. Check its .ssh directory.
NOTE: Following a request from Michael S., key-based access to the "itguser" account was allowed (same keys as for xrootd). The 180-day password validity period, which had expired, also had to be lifted. Changes were made on both lsdf-pilot nodes. Ariel - 2010-08-11
SAMBA
Install
I installed SOFS from IBM:
rpm -ivh samba-client-3.2.7-ctdb.54.2.x86_64.rpm ctdb-1.0-69.x86_64.rpm samba-common-3.2.7-ctdb.54.2.x86_64.rpm samba-3.2.7-ctdb.54.2.x86_64.rpm
[root@f01-060-118-e ~]# rpm -qa | egrep "samba|ctdb"
samba-client-3.2.7-ctdb.54.2
ctdb-1.0-69
samba-common-3.2.7-ctdb.54.2
samba-3.2.7-ctdb.54.2
Config
The config file is /etc/samba/smb.conf. Change the following:
workgroup = WORKGROUP
security = user
passdb backend = tdbsam
Added a Unix user and added it to the Samba database:
useradd itguser
smbpasswd -a itguser
Added a read-only user in the same group:
useradd -g 28200 itgread
smbpasswd -a itgread
Home directories are shared by default. The sink share is defined separately:
[lsdfsink]
path = /lsdf/sink
comment = LSDF data sink
browseable = yes
writable = yes
Make sure the path is writable by itguser and that all segments of the path are readable by it! Don't forget to restart smb afterwards.
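The permission check on every path segment is easy to get wrong by hand, so here is a small sketch that walks from the share directory up to the root. It tests access for whoever runs it, so it would have to be run as itguser (for example through sudo -u itguser). The function name check_share_path and its optional second argument are made up for this page. It also checks the search (x) bit, on the assumption that a directory that cannot be entered is as much of a problem as one that cannot be read.

```shell
# Check that the share directory is writable and that every directory
# from there up to "top" is readable and searchable by the current user.
# check_share_path is a hypothetical helper for this page.
check_share_path() {
    p=$1            # share directory, e.g. /lsdf/sink
    top=${2:-/}     # where to stop walking upwards
    rc=0
    [ -w "$p" ] || { echo "not writable: $p"; rc=1; }
    while :; do
        if [ ! -d "$p" ] || [ ! -r "$p" ] || [ ! -x "$p" ]; then
            echo "not a readable, searchable directory: $p"
            rc=1
        fi
        case $p in
            "$top"|/|.) break ;;
        esac
        p=$(dirname "$p")
    done
    return "$rc"
}

# Example, run as itguser on one of the servers:
# check_share_path /lsdf/sink && echo "path looks fine"
```

The function prints one line per problem found and returns non-zero if there was any, so it can also be used in a script.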
Operate
service smb start
This launches the smbd daemon, listening on TCP ports 445 and 139.
client operations
smbclient -U itguser //f01-060-120/HOMES
Mounting also works:
[root@f01-060-120-e ~]# mkdir /lsdf/smtestmount
[root@f01-060-120-e ~]# mount -t cifs //f01-060-118-e/LSDFSINK /lsdf/smtestmount --verbose -o user=itguser
parsing options: rw,user=itguser
Password:
mount.cifs kernel mount options unc=//f01-060-118-e\LSDFSINK,pass=itgitg,ver=1,rw,user=itguser
Now a transfer test via this mount point:
[root@f01-060-120-e ~]# dd if=/dev/zero of=/lsdf/smtestmount/test bs=1M count=20000
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 93.5397 seconds, 224 MB/s
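For comparing with other tools it helps to know which unit dd reports. The sketch below recomputes the rate from the byte count and the elapsed time in the dd output above, both in MB/s (10^6 bytes) and in MiB/s (2^20 bytes). The function name transfer_rate is made up for this page.

```shell
# Convert a byte count and a duration in seconds into MB/s (10^6 bytes)
# and MiB/s (2^20 bytes). transfer_rate is a hypothetical helper.
transfer_rate() {
    LC_ALL=C awk -v bytes="$1" -v secs="$2" 'BEGIN {
        rate = bytes / secs
        printf "%.1f MB/s (%.1f MiB/s)\n", rate / 1e6, rate / 1048576
    }'
}

# Numbers taken from the dd run above
transfer_rate 20971520000 93.5397
# prints: 224.2 MB/s (213.8 MiB/s)
```

So the 224 MB/s reported by dd is in decimal megabytes; in binary units the same run is about 214 MiB/s.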
CTDB
- /etc/sysconfig/ctdb
CTDB_RECOVERY_LOCK="/lsdf/ctdb/lock"
- /etc/ctdb/nodes
10.65.60.118
10.65.60.120
- /etc/samba/smb.conf
Add:
include = /etc/samba/clustering.conf
- /etc/samba/clustering.conf
clustering = yes
idmap backend = tdb2
fileid:mapping = fsname
vfs objects = gpfs fileid
gpfs:sharemodes = No
force unknown acl user = yes
nfs4: mode = special
nfs4: chown = yes
nfs4: acedup = merge
Operate CTDB
service ctdb start
# ctdb status
Number of nodes:2
pnn:0 10.65.60.118 OK
pnn:1 10.65.60.120 OK (THIS NODE)
Generation:1898793932
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0
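For a quick health check from a script, the status output can be filtered for anything that is not OK. This is only a sketch: it assumes the output format is exactly as in the sample above, and the function name ctdb_status_ok is made up for this page. The DISCONNECTED state in the usage note is just an illustrative value.

```shell
# Read "ctdb status" output on stdin. Print every node whose state is not
# OK, and the recovery mode if it is not NORMAL. Return non-zero if
# anything was printed. ctdb_status_ok is a hypothetical helper.
ctdb_status_ok() {
    awk '
        /^pnn:/ && $3 != "OK" {
            print "node " $2 " is " $3
            bad = 1
        }
        /^Recovery mode:/ && $2 !~ /^mode:NORMAL/ {
            print "recovery " $2
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Example on one of the servers:
# ctdb status | ctdb_status_ok && echo "CTDB healthy"
```

With the sample output above the function prints nothing and returns 0. A line such as "pnn:0 10.65.60.118 DISCONNECTED" would be reported as "node 10.65.60.118 is DISCONNECTED".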
Samba monitoring
The small-file transfer rate is monitored with the following program, run as a cron job on hdp0 (141.52.97.23):
#!/usr/bin/perl -w
# Shell equivalent of the write and cleanup steps:
#   time for i in `seq 100 105`; do date; dd if=/dev/zero of=/mnt/lsdf/test-artem/test-small-${i} bs=2516582 count=1 ; done
#   time for i in `seq 100 105`; do rm /mnt/lsdf/test-artem/test-small-${i} ; done
use strict;
use Time::HiRes qw(gettimeofday);

my $dir      = "/mnt/lsdf/test-artem";
my $base     = "test-small-";
my $size     = 2516582;    # bytes per test file
my $rate_av  = 0;
my $attempts = 5;
my $count_start = int(rand(100));
my $count_end   = $count_start + $attempts - 1;    # exactly $attempts files

# Write each file with dd and time it.
foreach my $i ($count_start .. $count_end) {
    my $t_start = gettimeofday;
    system("dd if=/dev/zero of=$dir/$base$i bs=$size count=1");
    my $t_end   = gettimeofday;
    my $elapsed = $t_end - $t_start;
    my $rate    = $size / $elapsed;    # bytes per second
    print "Transferred $i in $elapsed seconds, rate ", $rate/1024/1024, " MB/s \n";
    $rate_av += $rate;
}

# Remove the files that were just written.
foreach my $i ($count_start .. $count_end) {
    unlink "$dir/$base$i";
}

$rate_av /= $attempts;
my $rate_av_mbs = sprintf("%.1f", $rate_av/1024/1024);
print "Average rate ", $rate_av_mbs, "\n";

# Publish the average to Ganglia.
my $cmd = "gmetric --name=\"Small file transfer rate\" --value=\"$rate_av_mbs\" --type=\"float\" --units=\"MB/s\"";
print "CMD $cmd\n";
system $cmd;
Crontab:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /root/small-file-test.pl > /tmp/small.log 2>&1
Ganglia page: http://iwrcgmon.fzk.de/ganglia/?r=day&c=SERVER&h=null-00e0812a1429.ka.fzk.de
GPFS
A trick to be explored: add a node at ITG to the LSDF-pilot GPFS cluster and see what kind of rate we can get. We can install Samba on that node, which should bring latency down.
First we configure ssh on the hdp8 node.
Then on the f01-060-118:
mmaddnode 141.52.97.34
Unfortunately this cluster is on the private GridKa network, so it cannot easily be extended with an external node.
Changing cluster IP addresses
Here is how we changed IP addresses:
service smb stop
service gridftp stop
service ctdb stop
mmshutdown -a
mmchnode --admin-interface=f01-060-120-e.gridka.de --daemon-interface=f01-060-120-e.gridka.de -N f01-060-120
...
Then do the same for the other node and restart everything.
GSI auth stuff
CA
The hosts' CA certificates and gridmap file are managed by cfengine in the standard location.
gridmapfile
However, we are not yet using the standard gridmap file, because:
- we have not yet come up with a golden distro for gridftp that contains VOMS mapping (work in progress)
- not all users are registered in the VO.
Therefore we are using a static gridmap file, for which we set the corresponding variable in /etc/init.d/gridftp:
export GRIDMAP=/etc/grid-security/grid-mapfile-static
Its content as of August 2010:
"/C=DE/O=GermanGrid/OU=FZK/CN=Artem Trunov" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=f01-031-141-e.gridka.de" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=f01-031-133-e.gridka.de" itguser
"/O=GermanGrid/OU=KIT/CN=Michael Sutter" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Michael Sutter" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Thomas Jejkal" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Volker Hartmann" itguser
"/O=GermanGrid/OU=Uni Karlsruhe/CN=Armin Scheurer" itguser
"/C=DE/O=GridGermany/OU=Universitaet Heidelberg/CN=Dr. Marc Hemberger" itguser
"/C=PL/O=GRID/O=Cyfronet/CN=Lukasz Flis - OPS" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Jens Otte" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Masanari Takamiya" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Rainer Stotzka" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Ruediger Rudolf" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Francesca Rindone" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Sven Brand" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Wolfgang Mexner" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Simon Ochsenreither" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=David Haas" itguser
"/C=DE/O=GermanGrid/OU=FZK/CN=Michael Goetter" itguser
"/C=DE/O=GermanGrid/OU=KIT/CN=Patrick Neuberger" itguser
user mapping
Yes, all static, to local user xrootd. To be thought out further.
Troubleshooting
If one gets an auth error, most likely the CA certificates are outdated (this can be on the client side too).
If a transfer hangs, either the firewall rules have suddenly changed or the client is not enforcing the Globus port range of 20000,25000.
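On the client side the port range is normally taken from the GLOBUS_TCP_PORT_RANGE environment variable, in the form min,max. A small sketch of a pre-flight check that a wrapper script could run before starting a transfer; the function name port_range_ok is made up for this page.

```shell
# Return 0 if GLOBUS_TCP_PORT_RANGE is set to the expected "min,max"
# range, otherwise print what was found and return 1.
# port_range_ok is a hypothetical helper for this page.
port_range_ok() {
    expected=${1:-20000,25000}
    if [ "${GLOBUS_TCP_PORT_RANGE:-}" = "$expected" ]; then
        return 0
    fi
    echo "GLOBUS_TCP_PORT_RANGE is '${GLOBUS_TCP_PORT_RANGE:-unset}', expected '$expected'"
    return 1
}

# Client side, before starting a transfer:
# export GLOBUS_TCP_PORT_RANGE=20000,25000
# port_range_ok || exit 1
```

If the variable is correct and transfers still hang, the firewall rules are the next thing to check.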