Thursday, July 9, 2009

Command-line scripting and utilities on Linux

A Linux systems administrator becomes more efficient by using command-line scripting with authority. This includes crafting loops and knowing how to parse data using utilities like awk, grep, and sed. There are many cases where doing so takes fewer keystrokes and lessens the likelihood of user errors.
For example, suppose you need to generate a new /etc/hosts file for a Linux cluster that you are about to install. The long way would be to add IP addresses in vi or your favorite text editor. However, it can be done by taking the already existing /etc/hosts file and appending the following to it by running this on the command line:

# P=1; for i in $(seq -w 200); do echo "192.168.99.$P n$i"; P=$(expr $P + 1);
done >>/etc/hosts

Two hundred host names, n001 through n200, will then be created with IP addresses 192.168.99.1 through 192.168.99.200. Populating a file like this by hand runs the risk of inadvertently creating duplicate IP addresses or host names, so this is a good example of using the built-in command line to eliminate user errors. Please note that this is done in the bash shell, the default in most Linux distributions.
As another example, let's suppose you want to check that the memory size is the same in each of the compute nodes in the Linux cluster. In most cases of this sort, having a distributed or parallel shell would be the best practice, but for the sake of illustration, here's a way to do this using SSH.

Assume the SSH is set up to authenticate without a password. Then run:

# for num in $(seq -w 200); do ssh n$num free -tm | grep Mem | awk '{print $2}';
done | sort | uniq


A command line like this looks pretty terse. (It can be worse if you put regular expressions in it.) Let's pick it apart and uncover the mystery.


First you're doing a loop through 001-200. This padding with 0s in the front is done with the -w option to the seq command. Then you substitute the num variable to create the host you're going to SSH to. Once you have the target host, give the command to it. In this case, it's:

free -m | grep Mem | awk '{print $2}'

That command says to:
  1. Use the free command to get the memory size in megabytes.
  2. Take the output of that command and use grep to get the line that has the string Mem in it.
  3. Take that line and use awk to print the second field, which is the total memory in the node.
Once you have performed the command on every node, the entire output of all 200 nodes is piped (|d) to the sort command so that all the memory values are sorted.

Finally, you eliminate duplicates with the uniq command. This command will result in one of the following cases:
  • If all the nodes, n001-n200, have the same memory size, then only one number will be displayed. This is the size of memory as seen by each operating system.
  • If node memory size is different, you will see several memory size values.
  • Finally, if the SSH failed on a certain node, then you may see some error messages.
This command isn't perfect. If you find that a value of memory is different than what you expect, you won't know on which node it was or how many nodes there were. Another command may need to be issued for that.
What this trick does give you, though, is a fast way to check for something and quickly learn if something is wrong. This is it's real value: Speed to do a quick-and-dirty check.

No comments:

Post a Comment