Exercise: Combine two CEs

If you want to set up several Computing Elements (CEs) in front of a single batch system at your site, e.g. for redundancy in case one host breaks or needs a short maintenance downtime, you have to make some changes to get a reasonable setup.

Think about what changes when you run two or more CEs.

  • Normally all Worker Nodes have the same software area mounted. What do you have to adjust to reflect this on the CEs? The software tags are set by the software admins of the VOs and are stored in /opt/edg/var/info/<vo>/<vo>.list. Since the same software is installed behind all CEs, these files should be identical on all CEs. One way to achieve this is to mount the directory via NFS on all CEs.
  • You also want to ensure a consistent user mapping on your batch system and therefore on your CEs. This is important to prevent users from stealing data from each other, e.g. a proxy. The relevant directory is the gridmapdir, /etc/grid-security/gridmapdir. It can also be mounted centrally. There is also ongoing development to move this mapping to an external service called SCAS. This service can be deployed redundantly, so you don't have a single point of failure.
  • On the CREAM CE you need a blparser for each instance, with access to the batch server logs. This can be done either by running one instance per CE on the batch server, each listening on a different port, or by mounting the log files on each CE and running the blparser on each CE. The load caused by the blparser is quite low, since it mainly does a tail -f on the current log file to report the status changes of the jobs.
  • For each batch sub-cluster, only one CE should publish the number of physical and logical CPUs. Otherwise the same CPUs are counted once per CE.
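
To get a feeling for why the blparser load is low, the tail -f behaviour mentioned above can be sketched in a few lines of Python. This is only an illustration and not the blparser's actual code: the function names read_new_lines and follow are made up for this example, and the sketch ignores log rotation, which a real parser has to handle.

```python
import time


def read_new_lines(handle):
    """Return the complete lines appended to `handle` since the last call.

    `handle` is a file opened in binary mode. A trailing partial line is
    left in place so that it is returned once the writer has finished it.
    """
    lines = []
    while True:
        position = handle.tell()
        raw = handle.readline()
        if not raw:
            break  # nothing new at the moment
        if not raw.endswith(b"\n"):
            handle.seek(position)  # incomplete line, retry on the next call
            break
        lines.append(raw[:-1].decode("utf-8", errors="replace"))
    return lines


def follow(path, interval=1.0):
    """Yield lines as they are appended to `path`, roughly like `tail -f`.

    Runs until interrupted. Log rotation is NOT handled in this sketch.
    """
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the current end of the file
        while True:
            for line in read_new_lines(handle):
                yield line
            time.sleep(interval)
```

Between two polls the process only sleeps, so the cost is essentially one small read per interval, whether the log is local on the batch server or mounted on the CE.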

Now take two Computing Elements and try to implement all of these changes.
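
Once both CEs are set up, you may want to check that they really see the same data. The following Python sketch is one possible way to do that and is not part of gLite: run it on each CE and compare the printed values. The helper names file_digest, listing_digest and all_equal are made up for this example. Note that listing_digest only compares the entry names in a directory such as the gridmapdir and does not verify which entries are linked to each other, so treat it as a coarse check.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def listing_digest(directory):
    """Return a SHA-256 hex digest over the sorted entry names of a directory.

    Only the names are compared, not file contents or link relationships.
    """
    digest = hashlib.sha256()
    for name in sorted(os.listdir(directory)):
        digest.update(name.encode("utf-8") + b"\0")
    return digest.hexdigest()


def all_equal(values):
    """True if every collected value is the same (or the list is empty)."""
    return len(set(values)) <= 1
```

For example, call file_digest on the <vo>.list file of one VO on each CE, collect the results, and pass them to all_equal. If the directory is mounted via NFS on all CEs, the digests should match.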
