hidden:System Logs (HPSS, TSM): Difference between revisions

Revision as of 18:52, 10 April 2017

Format of the entries

date: service [hpss|tsm|druva]: admin name: record: link to further info in gridka or lsdf wiki
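
A minimal sketch of how entries in the format above could be split into fields. The names `LogEntry` and `parse_entry` are hypothetical and not part of any existing tool. It assumes the `dd.mm.yyyy: service: admin: record` layout; older entries that omit the colons simply do not match.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    day: date
    service: str
    admin: str
    record: str


# date: service: admin: record (the record itself may contain further colons)
ENTRY_RE = re.compile(
    r"^(\d{1,2})\.(\d{1,2})\.(\d{4}):\s*([^:]+?)\s*:\s*([^:]+?)\s*:\s*(.+)$"
)


def parse_entry(line: str) -> Optional[LogEntry]:
    """Return the parsed entry, or None if the line does not follow the format."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    d, m, y, service, admin, record = match.groups()
    return LogEntry(date(int(y), int(m), int(d)), service, admin, record.strip())


entry = parse_entry("13.02.2017: TSM: MB: TSM server sb03 database reallocate to 4 volumes")
print(entry)
```

Lines that return `None` are candidates for manual cleanup rather than errors.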

10.04.2017: Config changes for the archive-sftp-[0-6] hosts. Most importantly, all host keys were synced. sftp users had complained about wrong fingerprints after the new hosts were added to production.
06.04.2017: TSM_LSDF: MB: Drive 579004003031 1,0,1,7 replaced on 06.04.2017, new SN: 579004007136 T10000D
03.04.2017: GridKa: MB: Drive 576004002668 2,1,1,8 replaced on 03.04.2017, new SN: 576004001220 T10000C
01.04.2017: hpss prod and hpss test: jvw: cleaning yum db after switching to RH subscription
01.04.2017: hpss-prod:JvW:new hosts archive-sftp-0[456] installed with RH6.8
29.03.2017: hpss-prod:JvW: enabled syslog in hpss (HPSS logging now additionally goes to hpa via rsyslog).
29.03.2017: hpss-prod:AH: hpsscr/GUI change (Max. VVs To Write): TapeSC 204 (from 5 to 1), TapeSC 205 (from 3 to 1).
29.03.2017: hpss-prod:AH: hpsscr/GUI Migration policy (SC22) change (# Streams Per FF, Total M. Streams) from 3, 3 to 1, 1.
29.03.2017: hpss-prod:AH: hpsscr/GUI purge policy (SC21,SC22) from 60min,50%,40% to 1440min,80%,70% (Last file access, Start purge, Stop purge) 
24.03.2017: TSM_LSDF: MB: 4 LTO5 drives installed in scc-tsmlib-n01.scc.kit.edu, F4R1,F4R3,F4R6,F4R8
23.03.2017: hpss-prod:AH: archive-sftp-01/-02 set hpss-fuse option stream=1 (MB) for all project mounts /hpss/<project>
22.03.2016: hpss: MB: Drive  3,1,1,12 SN:579004000634 exchanged, new SN: 579004001831
21.03.2017: hpss-prod:AH: archive-sftp-01/-02 set hpss-fuse option stream=1 (MB) for /hpss/bwda
21.03.2017: hpss-prod:AH: archive-sftp-01/-02 edit /etc/sysctl.conf set vm.vfs_cache_pressure=100. (Default)
17.03.2017: hpss-prod:AH: archive-sftp-01/-02 edit /etc/sysctl.conf set vm.vfs_cache_pressure=1. (Test!)
15.03.2017: hpss-prod:AH: archive-sftp-01 install smem package (yum --enablerepo=epel install smem python-matplotlib).
14.03.2017: GridKa: MB: Drive 576004002628 2,1,1,0 replaced on 14.03.2017, new SN: 576004003031
14.03.2017: hpss-prod:AH: archive-sftp-01/-02 edit /etc/sysctl.conf set net.ipv4.tcp_keepalive_time=600, net.ipv4.tcp_keepalive_intvl=10, net.ipv4.tcp_keepalive_probes=6. 
13.03.2017: hpss-prod:AH: archive-sftp-01/-02 edit  EXCLUDE_MOUNTPOINTS in /etc/rear/local.conf.
10.03.2017: hpss-prod:AH: archive-sftp-01/-02 change stream=16M to default 8M for /hpss/bwda fuse mount in /etc/fstab. remount /hpss/bwda.
10.03.2017: hpss-prod:AH: archive-sftp-01/-02 disable /etc/cron.daily/mlocate.cron for updatedb. (ToDo: only exclude fuse/bind mounts from /etc/updatedb.conf)
09.03.2017: TSM: MB: CVE-2017-2636 mitigation, done on LSDF-01/02, scc-histor-n01 and GRID1: # echo "install n_hdlc /bin/true" >> /etc/modprobe.d/disable-n_hdlc.conf
08.03.2017: hpss-prod:AH: archive-sftp-01/-02 edit /etc/sysctl.conf change vm.swappiness=20 (it was =0).
03.03.2017: hpss_prod:DL: tape storage class 103  : Maximum VVs To Write: 4 (restart Migration/Purge Server and restart Core Server)
03.03.2017: hpss-prod:AH: on both archive-sftp-01/02: add /etc/fail2ban/action.d/iptables-common.local to solve a fail2ban error related to iptables-v1.4.7
03.03.2017: hpss-prod:AH: Downtime 10-12 am, on both (archive-sftp-01/02):yum --security update, vm.overcommit_memory=2, 'rvl=86400' for hpssfs mounts.  
03.03.2017: hpss-prod:AH: Downtime 10-12 am, on both (archive-sftp-01/02):fuse remounts needed to apply ulimits for max open files changed on 24.2.2017.
28.02.2017: HPSS_PROD:DL: Migration Policy 13(test GridKA) "Aggregate Files to tape" enabled, "Max Files in Aggregate" set to 1000
28.02.2017: HPSS_PROD:DL: Migration Policy 13(test GridKA) "Number of Migration Streams per File Family" and "Total Migration Streams" changed from 1 to 4. 
27.02.2017: TSM: MB: To prevent warning messages "sb03: setopt idletimeout 30"
27.02.2017: TSM: MB: Recabling SAN at TSM CN
24.02.2017: hpss-prod && hpss-test: Dorin: Maximum Open BitFiles: 10000 (it was 2000 before)
24.02.2017: hpss-prod: AH: ulimits for max open files and #procs increased on sftp-01/-02 in (/etc/security/limits.conf).
17.02.2017: GridKa: MB: Drive 576004009255 2,0,1,1 replaced on 17.02.2017, new SN: 576004004986
17.02.2017: GridKa: MB: Drive 576004009226 2,0,1,2 replaced on 17.02.2017, new SN: 576004003733
13.02.2017: TSM: MB: TSM server sb03 database reallocate to 4 volumes
10.02.2017: GridKa: MB: Drive 576004001000 2,1,1,14 replaced on 10.02.2017, new SN: 576004004932
07.02.2017: icinga:Karin Schaefer: yum update (with kernel update) on monitoring servers scc/gridka-ora-mon
28.11.2016: hpa:C Pfeiler: Further RedHat images provided at /export/hpa/RH_repos/RHEL/. 7.3 copied into /export/hpa/RH_repos/rhel_repo/7.3
28.11.2016: hpa:C Pfeiler: RedHat updates applied.
28.11.2016: hpss-prod:jvw:added allow_other option to LSDF-Archiv hpssfs mount command in /etc/fstab because the user could not chdir since the id is not known.
25.11.2016: hpss_prod.:Dorin Lobontu: added new volumes for primary copy as well as for secondary copy (125 T10KD cartridges and 258 TS1140 cartridges) 
14.11.2016: hpss-prod:jvw:Removed scp and rsync entries from /etc/rssh.conf on archive-sftp-0[12]. This will ensure the proper warning message if a user tries to log in with ssh.
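
For the TCP keepalive values set on 14.03.2017 (tcp_keepalive_time=600, tcp_keepalive_intvl=10, tcp_keepalive_probes=6), the rough time until an idle, dead connection is dropped can be worked out as below. This is a sketch of the usual Linux keepalive behaviour (idle time, then one probe per interval until the probe count is exhausted); the function name is hypothetical.

```python
def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Approximate seconds until an idle connection to a dead peer is dropped.

    The kernel waits time_s of idleness, then sends up to `probes` keepalive
    probes spaced intvl_s apart before giving up on the connection.
    """
    return time_s + intvl_s * probes


# Values from the 14.03.2017 sysctl change on archive-sftp-01/-02
print(keepalive_detection_seconds(600, 10, 6))  # 660 seconds, i.e. 11 minutes
```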

Entries below are from Ahmad and copied from text file.

4.8.2016: hpss prod. UDA-Checksum : all Fuse mounts mounted on archive-sftp-01/-02 with cksum=md5,nch=g options
4.8.2016: hpss prod. new Fuse-Bugfix upgrade hpssfs-fuse-2.0.1-0.el6.x86_64 on archive-sftp-01/-02 (rpm from Scott/IBM)
/SFTP/KIT -> /hpss/bwda/000000/<user>
Projects new directories for bwdatadiss and radar
1-4.8.2016 hpss prod.: Downtime for HPSS-FS layout change (1. phase) without username, uid and gid changes, done via scrub rename command
(!) yum update failed because of a missing x86_64 library that is not included in the RHEL repository (Bugzilla)

29.7.2016 hpss prod.+ hpss test: yum security updates on HPSS Server + Frontends (archive-sftp-01/-02 and BWDAHUB) + reboot
8.7.2016 hpss prod. Update to hpssfs-fuse-Bug-Fix (Software/HPSS/fuse-BugFix-v2001-2010/hpssfs-fuse-2.0.0-1.el6.x86_64.rpm) installed on both archive-sftp-01/-02. Mounted as before without cksum option.

4.7.2016 hpss prod. 3x tape drives (101->mvr1 and 301,302->mvr3) had "suspect" status starting on 30.6, first one drive, then the two others over the next days, and were marked repaired. Martin checked the library logs -> OK. Email sent to Waldecker on 1.7 -> no reply.

20.6.2016 hpss test: Upgrade from 743p2 to 743p3 and test read files (written in 743p2) with e2e configured system (743p3).
/db2_backup/offline/hcfg/HCFG.0.hpssdb.DBPART000.20151221154752.001
/db2_backup/offline/hsubsys1/HSUBSYS1.0.hpssdb.DBPART000.20151221154835.001
16.6.2016 hpss test: E2E Tests: Downgrade from 743p3 to 743p2 based on db2 offline backup to test read of files:
30.5.2016 hpss test: Testsystem reconfiguration with Tape SC 101, 102 -> Done by Dorin
gftp03: /root/Software/HPSS/fuse-BugFix-v2001-2010/fuse_with_BZ6036
. ver. hpssfs-fuse-2.0.1-0.el6.x86_64.rpm
24.5.2016 hpss test: fuse BugFix (got from Scott) rpm installed on test FE archive-gftp-03
24.5.2016: hpss prod. drive ID 101, IBM TS1140 on mvr01 -> Suspect status
* sftp-01/-02 # uname -r -> 2.6.32-573.26.1.el6.x86_64
* for HPSS server (cr01,cr02, mvr01,02,03,04) (!) they were already up to date?
# uname -r -> 2.6.32-573.18.1.el6.x86_64
hpss test: Update done by Dorin. for tcr03, tmvr
# uname -r -> 2.6.32-573.22.1.el6.x86_64
archive-gftp-03 done by Ahmad, uname -r -> 2.6.32-573.26.1.el6.x86_64

24.5.2016 hpss prod. Maintenance: RHEL yum update for HPSS and FE (archive-sftp-01/-02)

19.5.2016 hpss prod. E2E disk reconfiguration done. -> Email from HPSS Support/IBM Tobias
After testing access I forgot to set the right permissions -> Ahmad.
(!) - Still to decide about all projects directory permission on production

18.5.2016 hpss prod. Bareos hpss folder permissions change: frp, 777 to scrub> chmod /hpss/bareos 771
18.5.2016 hpss prod. COS 122 (GridKa SC 12) Allowed, Core Server -> restart, MPS -> restart to activate SC 12 for Disk resources to be imported/Created into HPSS after the E2E reconfiguration. Request from HPSS Support/Tobias
18.5.2016 hpss prod. Mover1/Drive 101 was marked Suspect, IBM_IU1_PVR (Major). -> Marked Repaired

11.5.2016: hpss prod. TSM team will take 50x STKT10KD volumes from STK HPSS pool. These volumes are not imported and created in HPSS, so no further work is needed. BUT we have to get new ones later so as not to run short of volumes for the first copy.
11.5.2016: hpss prod. Total broken IBM Library, IBM_IU1_01 PVR(Major). -> Martin B. => no second copy.
Refer to howtos: ghi-thresholds.howto
9.5.2016: prod. GHI. apply threshold policy as described by Scott/IBM. Done together with GPFS -Admin(Lusmilla)

6.5.2016 hpss prod. 100 new tape volumes from IBM TS1140 (SC 205 sec. copy) added (imported/created) after a warning message from the HPSS GUI about no space left on tape.

3.5.2016 hpss prod. new project bareos setup, COS 1204, Large SC 23. -> FrontEnd (bht.lsdf.kit.edu, Stephanie Boehringer) -> Done by Ahmad
April 2016: hpss prod. New project gridka-dcache set up with Fuse. COS 123. Done by

- login OK
(!) After reboot the hpss_disk_mvr05 server could be started via HPSS GUI also PVL -> OK
(!) Only hpssmvr07 resources still in Unknown status. TODO: Decide what to do with them.

hscroot@hpss-hmc-a:~> chsysstate -r lpar -m Server-8247-22L-SN212960A -o shutdown --immed --restart -n redhat7-03
- To reboot hpssmvr05 aka redhat7-03:
29.4.2016: Test. hpssmvr05 rebooted because the host was ping/ssh unreachable and HMC console login via vterm of mkvterm hangs after Open Completed message.
# Stefan Waldecker replaced the drive's broken jbec. (-> Martin)
Then disabled all Migration and Purge Policies assigned to SCs (1,2,3)
For the default SC 99 I disabled the Migration Policy "Test-Migrate-Disk-To-Tape-01" before.
Reason: The Tape SC 10 and 11 did not have any tape volumes assigned and the MPS process kept failing with a lot of log files in /var/hpss/log every minute. And PVL went "Major".
Howto disable M&P: GUI: Configure/Storage Space/Storage Classes -> mark SC1,2,3 -> Configure -> Mig/Purge Policy -> NONE
Core server Restart (Shutdown/Start)
MPS Restart (Shutdown/Start)

(!) SC 501 and SC 701 kept unchanged.

29.4.2016: Test. backup all hpss configs via /opt/hpss/bin/lshpss into /root/Software/HPSS/hpss-configs/

27.4.2016: hpss prod. IBM-1140 Drive 0000078DB20E problem Solved. -> HPSS OK
On movr3 : /dev/hpss/st.1b.1-03592E07-0000078DB20E (missing)
# sg_map -st -x -i
# lsscsi
showed only one tape device -> Martin informed.
25.4.2016 hpss prod. IBM-1140 Drive (0000078DB20E) went Suspect on 25.4, -> mover3 Suspect -> HPSS, PVL Major.
22.4.2016: hpss prod. GHI: ddf-s-005# chmod +x /var/hpss/ghi/etc/ghi_backup_migration.ksh

21.4.2016 hpss prod. COS 1205 (Medium 2x copies) Max file Size changed from 500GB to 600GB. (Jos Request to allow End user to tar 8192x68MB in one file of a simulation run see mail from <agathe.chouippe@kit.edu>)
For all SCs (13, 21, 22, 23) purge policy changed (see. 13.4.2016):
Start purge when space used exceeded: from 0% to 50%
Stop Purge when space used falls to: from 0% to 40%
21.4.2016 hpss prod. Change purge policies while reformat of first storage system still running
- Migrate, Purge and Repack is done for D0300100 - D0401800, therefore all disks on Storage Unit 2 are empty.
- Firmware Update on Storage Unit 2 is done.
- All Logical Drives are deleted and recreated with T10PI enabled.
- Formatting is expected to take up to 12 days (((36 Logical Drives / 8 in parallel) * 60 hours) / 24 hours).
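
The estimate in the last bullet can be reproduced as follows. This is only a sketch of the arithmetic: the function names are hypothetical, and the second variant rests on the assumption, not stated in the status mail, that drives are formatted in full batches of 8 so the last partial batch still takes a whole 60 hours.

```python
import math


def format_days(drives: int, parallel: int, hours_per_drive: float) -> float:
    """Formula from the log entry: ((drives / parallel) * hours) / 24."""
    return drives / parallel * hours_per_drive / 24


def format_days_whole_batches(drives: int, parallel: int, hours_per_drive: float) -> float:
    """Same estimate, but rounding up to whole batches (assumption)."""
    return math.ceil(drives / parallel) * hours_per_drive / 24


print(format_days(36, 8, 60))                # 11.25
print(format_days_whole_batches(36, 8, 60))  # 12.5
```

The plain formula gives 11.25 days, which the entry rounds to "up to 12 days"; with whole batches the figure would be 12.5 days.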

16.4 Status mail from Tobias
- dump_acct_sum created a list of all coses and #Files belonging to
- Purge Policies (SCs: 13, 21, 22, 23) changed to 0%
Start purge when space used exceeded: 0%
Stop Purge when space used falls to: 0%
- Most files on disk were only on Tape and purged from disk
15.4 - Rest files on disk Repacked, Firmware upgrade done by IBM/Tobias Elpelt
- IBM/Tobias removed from HPSS and started formatting first disk system. ()
- Older COSes (1201, 1202, 1203) were Allowed and activated again, Core_server and MPS restarted.
- Delete all files belonging to older COSes and other remaining unneeded files.

13.4.2016 hpss prod. Start reconfiguration procedure to reformat disk storage systems for E2E support:
Core Server Shutdown/Start
MPS Shutdown/Start

5.4.2016 hpss prod. UTF8 Support GUI/Global/Global Flags/Object names can contain unprintable characters (ON)
-> also site statistics report sent to Jae Kerr produced by this new script

30.3.2016 hpss prod. /etc/cron.d/hpssstat changed to /usr/local/bin/hpssstats2.py (new version downloaded from hpss wiki)
29.3.2016 hpss prod. tape mover on hpssmvr01 sent suspect. Reason drive (101) suspect status.
16.3.2016 hpss prod. tape mover on hpssmvr02 sent Suspect. Reason drive (201) suspect status.
Notice: Work done by Dorin. Reason: to provide LSDF with tape storage due to lack of space on other libs.
10.3.2016 hpss prod. STK Library (CN_STK_LIB01) has been partitioned. The 2x STK-Drives of STK_01_PVR on hpssmvr02 changed drive address to 3,1,1,12 and 3,1,1,0.
4.3.2016 test: upgrade HPSS test system from hpss-7.4.3p2 to hpss-7.4.3p3 including DB2 conversion

23.2.2016: test: FUSE and HPSS client rpms updated on archive-tgftp.lsdf.kit.edu to hpssfs-fuse-2.0.1-0.el6.x86_64.rpm. Updating the HPSS client software to 7.4.3p2-1 was also needed to be able to update the fuse package.

22.2.2016: hpss prod. SFTP-01/02 yum-autoupdate has been installed with email to root,hpss-admin@lists.kit.edu in /etc/sysconfig/yum-autoupdate -> Dorin
- Check if BWDAHUB has been updated by Frank!
(!!!)TODO: - powerMovers mvr05 and mvr07 still to be updated(!!!)
22.2.2016: hpss prod. cr01/02, mvr01, 02,03,04, SFTP-01/02, OS update due to glibc security bug (CVE-2015-7547)
19.2.2016: test: hpsstcr03 and hpsstmvr OS update due to glibc security bug (CVE-2015-7547)
rm: cannot remove `file_50MB.txt.copy.61991': Invalid argument
19.2.2016 prod. Problem: After deletion of 23 files under /home/ahmad2/* deleting them from Trash .Trash/root/* -> error:
see. code sftp-01:/root/coding_ahmad
TODO: still 10 files of cosid 1204.
19.2.2016 prod. changecos of zerofiles: rm zero file, touch zerofile, chmod user. zerofile, rm ./Trash/root/zerofile
now permission set correctly:
ddf-s-005# -rwxr--r--. 1 root root 13028 Jul 19 2014 /var/hpss/ghi/etc/ghi_backup_migration.ksh
(!!)TODO: check and keep an eye on it to see if every thing is OK!
11.2.2016 prod. GHI. Problem of missing policy backup and migration solved: /var/hpss/ghi/etc/ghi_backup_migration.ksh had no x-permission.
-> Email from Support/tobias (Re: [hpss-scc] KIT HPSS call - agenda for 2016-02-10) -> sent on 11.2.2016.
10.2.2016 Test File deletion from TrashCan not possible due to removed COS (104) that the file belongs to. log - Critical.
10.2.2016 hpss prod. - GHI policy backup and migration still not running. -> Email sent to support/telco to look at (-> tobias/scott)
(!) GHI:- OS still on old 6.4, but Scott said that when we go productive the ddf-s nodes should be upgraded to RHEL 6.7.
- for the hpss re-compilation and setup of ghi will be needed. s. howto!
4.2.2016 hpss prod. RHEL upgrade 6.4 to 6.7 Done. HPSS Upgrade 743p1-743p2 done. 8xddf-s-* GHI nodes upgrade to hpss-743p2 done. (Downtime for one week) -> ahmad, dorin
(!) RDAC migration for Testsystem was tested by Tobias IBM and continued for testCore by Dorin. in Jan/16

3.2.2016 hpss prod. Migration from RDAC to Multipath for disk storage for all HPSS Server done. -> started in mid Jan/16. -> Dorin
17.12.2015 hpss prod. API logging on FrontEnds archive-sftp-01/-02 disabled. /var/hpss/tmp/hpss.api.resource.disabled
- Ahmad adapted the changes as described by IBM/Scott (10.12) and copied both files to all nodes (ddf-s-001-008)
- Started GHI again:
ddf-s-005]# initctl start start_ghi_run
# initctl start ghi_iom_hpss
# /opt/hpss/bin/ghistartup -g

14.12.2015 hpss prod. GHI:
/var/hpss/ghi/policy/hpss/backup_migration.policy
/var/hpss/ghi/policy/hpss/migrate.policy
RULE 'scratch_exclude' EXCLUDE WHERE path_name like '%/scratch/.ghi/%'
Should be changed to:
RULE 'scratch_exclude' EXCLUDE WHERE path_name like '%/scratch/%'
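
The effect of the rule change above can be illustrated with a small matcher. This is a sketch only: it assumes the policy engine's `LIKE` follows the usual SQL convention (`%` matches any run of characters, `_` a single character), the helper names are hypothetical, and the example paths are made up.

```python
import re


def sql_like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an SQL LIKE pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def is_excluded(path: str, like_pattern: str) -> bool:
    """True if the whole path matches the LIKE pattern."""
    return sql_like_to_regex(like_pattern).fullmatch(path) is not None


OLD_RULE = "%/scratch/.ghi/%"
NEW_RULE = "%/scratch/%"

# Hypothetical paths, for illustration only
for path in ("/hpss/scratch/.ghi/tmp1", "/hpss/scratch/user/data.bin"):
    print(path, is_excluded(path, OLD_RULE), is_excluded(path, NEW_RULE))
```

Under these assumptions the old pattern excludes only files below `scratch/.ghi/`, while the new one excludes everything below `scratch/`.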

10.12.2015 hpss prod. GHI: Scott found out that not the whole scratch dir was excluded: (s. email)

10.12.2015: hpss test: RDAC-Multipath: IBM/Tobias Elpelt Tested migration from Disk-Drivers-RDAC-Multipath. hpsstmvr shutdown by Ahmad, and restarted by tobias after work done. tobias->Email (how-to RDAC to dm-multipath migration 11.12.2015)
8.12.2015 hpss prod. After adding tape volumes -> Major errors, All Disks 100% full, PVL for PVRs -> Down. GHI stopped, Ticket to IBM Support. Uwe, Jae solved -> DB2 user (hpss) did not exist.
8.12.2015 hpss prod. New Tape volumes imported and created into PVRs STK_01 and IBM_IU1_01. (TK005000-TK014900 and ?-TS009900)
7.12.2015: hpss prod. HPSS Downtime for one week for maintenance. OS/RHEL6.4-6.7 and HPSS 743p1->743p2 Upgrade. Email sent to users.

2.12.2015 hpss prod. ChangeCos for zero files issue closed. TODO: recreate zero files in new COS 1205 after deleting from old COS 1204.
25.11.2015: hpss prod. GHI: Scott checked and responded. See email and ghi-problems-scott.howto. Now GHI is OK! But still some migration errors. -> Scott.
24.11.2015: hpss prod. OK: Mover03 and drives marked repaired after cancelling jobs. -> Dorin
23.11.2015: prod. Suspect: Mover03 and suspect drives of the IBM Lib -> broken air conditioning at the weekend!
19.11.2015: hpss prod. GHI: still showed error messages. Email sent to IBM Support -> Scott
- /opt/hpss/bin/htar.ksh was not on any of GHI nodes.
- scott upgraded GHI-HTAR to the latest version for 2.4,GHI-HTAR 5.0.0.1g, for bug fixes.
- Scott: patch released for GHI 2.4.0.1 for bug fixes including a Sev 1 fix. Postponed for later.
19.11.2015 hpss prod. GHI Scott findings:
18.11.2015 hpss prod. GHI: Due to error messages, email sent to IBM Support -> Scott responded and found out on 19.11.2015:
- delete old/create new Hier and COS 1102, 1103 via GUI for ghi policies.
- on ddf-s-001 # mmchfs hpss -z Yes (see. 2.11.2015)
# mmdsh chmod +x /opt/hpss/bin/ghi_iom
- mmmount /hpss -a
# chmod +x /var/hpss/ghi/etc/start_ghi_run
- change log path in /etc/init/start_ghi_run.conf (>> /data/ghi_log/start_ghi_run.log)
- initctl start start_ghi_run
- had to start ghi_iom manually: (Update: I should have used# initctl start ghi_iom_hpss)
# ddf-s-001: # mmdsh /opt/hpss/bin/ghi_iom hpss 8012 (see. ddf-s-005: /etc/init/ghi_iom_hpss.conf)
- /opt/hpss/bin/ghistartup -g
- on ddf-s-005 # chmod +x /var/hpss/ghi/etc/ghi_backup_migration.ksh

17.11.2015 hpss prod. GHI:
16.11.2015 Test. RHEL-OS upgrade from 6.4 -> 6.7 on hpsstcr03 and hpsstmvr, and recompile RDAC (DCS3700) on both servers.
Reason: Found out (9.11.15) that after the OS upgrade (2 weeks ago) from 2.6.32-358.el6.x86_64 to 2.6.32-358.23.2.el6.x86_64
/lib/modules/2.6.32-358.23.2.el6.x86_64/kernel/drivers/scsi/
mppUpper.ko mppVhba.ko (were missing)
Solution (IBM DCS3700 redbook):
- cp mvr1:~/Software/rdac-LINUX-09.03.0C05.0652-source.tar.gz mvrt:~Software/DCS3700/
- untar
- make uninstall
- make clean (in case *.o are compiled)
- make
- make install
- Add to /boot/grub/menu.lst:
title Red Hat Enterprise Linux Server (2.6.32-358.23.2.el6.x86_64) with MPP
root (hd0,0)
kernel /vmlinuz-2.6.32-358.23.2.el6.x86_64 ro root=/dev/sda3 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet console=tty0 console=ttyS0,115200n81r
initrd /mpp-2.6.32-358.23.2.el6.x86_64.img
- reboot
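
A small check like the following could confirm, before and after the rebuild steps above, whether the two MPP modules are present for a given kernel. This is a sketch only: `missing_mpp_modules` is a hypothetical helper, and the directory layout is taken from the path quoted in the entry.

```python
from pathlib import Path
from typing import List

# Module names as listed in the log entry
REQUIRED_MODULES = ("mppUpper.ko", "mppVhba.ko")


def missing_mpp_modules(kernel_release: str, modules_root: str = "/lib/modules") -> List[str]:
    """Return the required MPP modules that are absent for this kernel release."""
    scsi_dir = Path(modules_root) / kernel_release / "kernel" / "drivers" / "scsi"
    return [name for name in REQUIRED_MODULES if not (scsi_dir / name).is_file()]


missing = missing_mpp_modules("2.6.32-358.23.2.el6.x86_64")
if missing:
    print("missing modules:", ", ".join(missing))
else:
    print("all MPP modules present")
```

An empty result after `make install` suggests the driver was built for the running kernel; it does not verify that the initrd listed in the grub entry contains the modules.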

5.11.2015: Test. After hpsstmvr reboot: got disk read error. Therefore disk-tmvr broken
- LOGs: CORE0068: Space usage in tablespace STORAGESEGTAPEABIX of database subsys1 has exceeded critical threshold of 90%; usage at 1
- CORE0069: Space usage in tablespace TAPESEGUNLINK of database subsys1 has exceeded warning threshold of 85%; usage at 89%
- Solution: (Jae IBM): Check both "global" and "subsystem" configurations.
Make sure the "Metadata Space Monitor Interval" value is set to 0 seconds.

4.11.2015 Test. Core Server GUI Broken Status:
- P8-TestCOSes (500, 700) disabled (GUI/Subsystems/Configure).
- Data purged from DSC 1,2,3. MPS should be started to see the SCs under GUI/Monitor/Storage Classes active
- Due to CoreServer Major and Critical messages, /opt/hpss/bin/rc.hpss stop/start
and then Purge/Start

4.11.2015 hpss test:
4.11.2015 prod. GHI: old Hierarchy 1401 and COS 1401 (named "GHI Metadata") deleted and new ones with same IDs created with new Tape SC 204 and 205 (DSC 31, TSC 204, 205). restart: core Server/PVL/MPS
(!) Before enable scripts and starting ghi you have to (mmchfs hpss -z Yes).
2.11.2015 prod. GHI: all daemons stopped: On ddf-s-005: disable scripts (1. chmod -x /opt/hpss/bin/ghi_iom) and (2. chmod -x /var/hpss/ghi/etc/start_ghi_run) and (mmchfs hpss -z No # to mount hpss without ghi started!!)
28.10.2015 prod. on sftp-01/02 sftp logging set up. logfile: /var/log/sftp
28.10.2015 prod. API logging activated via resource file as described by Scott (IBM). See email.
28.10.2015 test. Jonathan (IBM) put P8 Disk Storage OFF to solve PVL major status.
23.10.2015 prod. Purge started manually from GUI for Disk SC 22, due to user errors and log errors (No space left on device). -> Scott advised purging since there were no more segments because of fragmentation. HPSS Telco 21.10.2015.
20.10.2015 prod. changecos for log files in /log march-14.10.2015 from COSID 1102 into COSID 1205. And reinitialize log daemon and client daemon.
23.10.2015 prod. Disk SC 22 purge policy changed from space exceeds 90% -> 70%; Stop Purge threshold left unchanged.
17.9.2015 prod. cronjobs on core server cr2 for accounting and site statistics. /etc/cron.d/(hpssstat, hpssacct)
11.9.2015 prod. cronjobs on core server cr1 for accounting and site statistics. /etc/cron.d/(hpssstat, hpssacct)
10.9.2015 prod. Trashcans activated at 16:03. GUI/Global/Trashcans Settings (5, 86400,864000, 3600)
7.9.2015 hpss prod. IBM_IU1_01 was still in Major. reporting Null mounts. Only PVR shutdown/start helped.-> OK
2.9.2015 hpss prod. IBM_IU1_01 PVR went Major. Cause: CN_IBM_N01 broken -> Martin Beizinger Maintenance, done 4.9.
2.9.2015 prod. STK-Drive change: broken STK-drive (id 202) connected to hpssmvr02 replaced with a new one and added to devices&drives. (old: mvr02:/dev/hpss/st.1b.1-T10000D-579004000755 -> new: mvr02:/dev/hpss/st.1b.1-T10000D-579004000661)
27.8.2015 prod. Class Of Service changed from 1102 to 1205 for LogDaemon GUI/Servers/Log Daemon/specific/Archive Class of Service
26.8.2015 prod. Max file size for 1205 changed to max 500GB. Based on new calculations came from HPSS Support. -> Scott (Email)
25.8.2015 prod. 8 PM end of changecos . But 33555 missing-Files still have COSID 1204. To be reported to HPSS Support.
24.8.2015 prod. GHI manager on ddf-s-005 disk 100% full due to /var/hpss/ghi/log/start_ghi_run.log (12GB). Temporary solution: copied file to ddf-s-005:/data/ghi_log_backup/ and emptied /var/hpss/ghi/log/start_ghi_run.log
20.8.2015 prod. 10 new tape volumes for STK_01_PVR added. 30 new tape volumes for IBM_IU1_01 added
20.8.2015 prod. STK drives for PVR STK_01 PVR went suspect status during change cos for tape volume TK001500.
17.8.2015 prod. on sftp servers (sftp-01/02) unmount fuse 1204 and mount fuse with new cos 1205 but OLD PATH. /SFTP/KIT
17.8.2015 prod. 11 AM start of changecos from 1204 to 1205
13.8.2015 prod. Max. File size for COS 1204 and 1025 extended to 2TB. (OK came from Jos and HPSS Support-Scott)
Martin reported the drive. I Checked the drives in GUI as (Mark Repaired) => Green OK
# Suspect drives after firmware update
11.8.2015 hpss prod. T10:IBM|03592E07 0000078DB20E
13.8.2015 hpss prod. T10:IBM|03592E07 0000078DB1EE

GHI shutdown -g before upgrade and startup afterwards on ddf-s-005 successfully. (Ahmad)
After shutdown applypolicy jobs kept running. -> Scott said not to worry; they stop after a timeout.
GHI log files written while applypolicy jobs were running filled the local disk.
-> Scott said to disable applypolicy scripts since GHI is not used. (he didn't confirm)
iom processes kept running. -> Scott: not a problem, they are started automatically. You can only restart with -i.

3.8.2015 hpss prod. Due to GPFS-3.5-18 bug, upgrade to 3.5-25 on node ddf-s-006 done by (Ludmilla/Ursula)
29.7.2015 Test. Reformatting HPSS_T_Cache1 Disk as preparation for the End2End Protection (T10PI). -> IBM Jonathan
core Server and PVL restart.
9.4.2015 prod. GUI: Migration/Purge Server restarted. Cause: drives mounted for days. PVL job cancel had no effect, PVL (Major)
9.4.2015 prod. gui configure/Subsystems/Configure/Allow disabled for ID 1201, 1203, Error: "Disk migration Failure". (No Tape behind)
8.4.2015 Support Ticket submitted by IBM-DE Kay Jenke for broken Disk. (01V4NJL,724 for the Disk Alert)
7.4.2015 Test Drive failure Enclosure 99, Drawer 2, Slot 9

Jan/Feb. 2015 hpss prod. HPSS upgrade to 7.4.3p1

