Unix Lab



Shell Scripting 2

In this module we continue looking at shell scripts. We will learn about another shell variable and how we can use it to make file names unique. We will also learn about storing files temporarily in the directory /tmp. Another important topic is how to build looping structures in scripts. At the end of this module you should see that the shell provides a full-featured language with which we can write some very useful programs.

We will put together a script that will take one parameter, the name of a directory. This script will look in that directory for any file that has a .java extension AND contains a main method.

A script for finding files with certain characteristics

In this section we will put together a script that lets us ask a question such as: "Can you tell me the names of any .java files that contain a main method in the following directory?". Of course, the question is directed at the shell. Our script will let us name a directory, and the result should be a list of the .java files in it that contain a main method.

Use a text editor and create a file called findmain containing the following lines of text (cut and paste)

#!/bin/sh

# file: findmain
# This script takes one parameter, the name of a directory,
# and displays all files in that directory that have a .java
# extension AND contain a main method.

if [ $# -ne 1 ]
then
   echo "Usage:"
   echo "  findmain dirName"
   echo "    where dirName is the name of a directory in which you "
   echo "    want to see a list of those .java files that have "
   echo "    a main method."
   exit 1
fi

This is just the beginning of the script that we will ultimately produce. All it does so far is check whether the number of parameters is something other than 1:

if [ $# -ne 1 ]

Remember that $# means the number of parameters, -ne is how we denote "not equal" and that the spaces before and after the [ and ] characters are required.
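To see how $# and -ne behave, here is a minimal sketch that can be pasted into a terminal. The function name count_args is hypothetical and is not part of findmain; inside a shell function, $# counts the function's arguments in the same way that it counts a script's parameters.

```shell
# count_args is a hypothetical helper used only for this illustration.
# Inside a function, $# is the number of arguments it was given.
count_args() {
   echo "number of parameters: $#"
   if [ $# -ne 1 ]
   then
      echo "  expected exactly one parameter"
   fi
}

count_args                      # no parameters
count_args /some/dir            # exactly one parameter
count_args /some/dir extra      # two parameters
```

Only the middle call passes the test silently; the other two print the extra line because their parameter count is not equal to 1.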

The other thing this piece of script shows is the way Unix script writers report an error to the user. A script may print an error message that identifies the specific problem (here, that the wrong number of parameters was supplied), but it will almost always print a message telling the user the correct way to use the script. That message starts with "Usage:", then gives the proper form of the command, then a short description of the parameter(s).

The last piece is the use of the
exit 1
command. This stops the execution of the script and returns a value of 1 to the shell (just as a method returns a value when it terminates). We will see later how we can use such values. A value of 0 means that everything completed normally; a value other than 0 means there was an error.

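Open a terminal window and change to the directory containing your findmain file. Make sure you assign execute permission to the file (for example, with chmod +x findmain). If the shell later reports that the command cannot be found, type ./findmain in place of findmain. Now type: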

findmain

What do you see? Now re-issue the command, this time with a parameter supplied (at this point, it doesn't matter if that parameter is the name of a directory). This time you should see nothing except the prompt telling you the shell is waiting for your next command.

A standard programming philosophy in Unix is "No news is good news". Thus, the absence of output means that everything went according to plan and there is simply nothing to report.

Modify the findmain file and append the following lines:

ls $1 | grep "\.java" > /tmp/javaFiles$$
if [ ! -s /tmp/javaFiles$$ ] 
then
   echo "No files found"
   exit 0
fi
Then type:

findmain dir

where dir is the name of a directory that contains files with a .java extension. Then do the same with the name of a directory that contains no files with a .java extension.

What you should find is that if no .java files exist in the directory you named, then you will see the message: "No files found". Otherwise you will see nothing.

Now that you have seen the result of the modifications, let's look at the file and see what each statement does.

Let's look at the statement:

ls $1 | grep "\.java" > /tmp/javaFiles$$

You should recognize the ls command which will produce a listing of the contents of the directory that was named in the first parameter of the command that we just typed.

The output of the ls command is piped directly to the grep command. From your knowledge of regular expressions with grep you should recognize that we are searching for the string denoted by "\.java". The backslash in front of the '.' character tells grep to treat it as a literal dot and not as the regular-expression metacharacter that stands for any single character. Note that this pattern matches .java anywhere in a name, so a file such as Hello.java.bak would also be listed.

Just what is grep looking at? It's the output of the ls command, which is a list of the files in the named directory. Therefore, grep is reading the list of file names that ls produced and selecting any name that contains .java. The '>' is the redirection character; it tells the shell to send grep's output to the file /tmp/javaFiles$$. But what is this file?

The /tmp directory in Unix systems is used as "scratch" storage by programs that need to create temporary files that will later be discarded. What we're doing is redirecting the output of the grep command to a temporary file in the /tmp directory whose name begins with javaFiles. The characters $$ are not literally part of the filename. $$ is another shell variable like $1 or $#: it denotes the process id of the process that is executing the script.

If findmain were a script available to everyone using the system, several users could otherwise generate a file with the same name in the /tmp directory. Every running process has a unique id, so no matter how many people are running a copy of this script at the same time, the filenames it generates will each be different. For example, if the process id of the script were 5634, then the script would create a file called javaFiles5634. Another person using a copy of the script might have a process id of 4332 and their file would be named javaFiles4332. It's a nice way to create file names that will not conflict with those of other users.
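The following sketch shows the $$ expansion and the pipeline working together outside of findmain. The directory name findmainDemo and the file names in it are hypothetical and exist only for this illustration. The line that saves the listing in a variable uses backquotes, which are explained later in this module.

```shell
# Hypothetical scratch directory; $$ keeps its name unique to this shell
demoDir=/tmp/findmainDemo$$
mkdir $demoDir
touch $demoDir/Hello.java $demoDir/Util.java $demoDir/notes.txt

echo "process id of this shell: $$"

# Same pipeline as in findmain, with the demo directory in place of $1
ls $demoDir | grep "\.java" > /tmp/javaFiles$$

echo "contents of /tmp/javaFiles$$:"
cat /tmp/javaFiles$$

# Save the listing before the files are removed
listing=`cat /tmp/javaFiles$$`

# Clean up everything this demonstration created
rm -r $demoDir /tmp/javaFiles$$
```

The listing should contain Hello.java and Util.java but not notes.txt, and the number printed as the process id should be the same number that appears at the end of the temporary file's name.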

Run the current version of findmain and make sure you specify the name of a directory that contains some files with .java extensions.

Now check the contents of the /tmp directory. You should see the file placed there by the findmain script. Look at its contents. What do you find? Are there any of these files from earlier runs of the findmain script? What do the files look like from runs in which no files with a .java extension were found?

Now look at the next statement in the findmain script file:

if [ ! -s /tmp/javaFiles$$ ]

The shell has a number of boolean expressions that we can use for testing files. These expressions are used in if statements just as we're doing here. The expression:
-s filename
will report true if the size (that's what the s stands for) of the file is positive. That is, if the file is not empty. Consequently the statement:
! -s /tmp/javaFiles$$

will tell us if the file /tmp/javaFiles$$ is empty. This would tell us that the grep utility found no files with a .java extension.
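As a small illustration of the -s test, the sketch below creates one empty file and one non-empty file and applies the same test to each. The file names emptyDemo and fullDemo are hypothetical.

```shell
# Hypothetical file names; $$ keeps them from clashing with other users
emptyFile=/tmp/emptyDemo$$
fullFile=/tmp/fullDemo$$

touch $emptyFile                  # creates a file of size zero
echo "Hello.java" > $fullFile     # creates a file with one line in it

if [ ! -s $emptyFile ]
then
   emptyResult="is empty"
else
   emptyResult="is not empty"
fi
echo "$emptyFile $emptyResult"

if [ ! -s $fullFile ]
then
   fullResult="is empty"
else
   fullResult="is not empty"
fi
echo "$fullFile $fullResult"

# Clean up the demonstration files
rm $emptyFile $fullFile
```

The first test succeeds because the file has size zero, which is the situation findmain reports as "No files found".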

Append the following lines to complete your findmain file:

for file in `cat /tmp/javaFiles$$`
do
   grep "main(" $1/$file > /dev/null
   if [ $? -eq 0 ]
   then
      echo $file
   fi
done
exit 0

Here is our first encounter with a looping structure: the for loop.
The basic syntax is:

for variable in list
do
  whatever_you_do
done
where variable is the loop variable, list is a list of strings, and whatever_you_do is a sequence of commands that you want to execute each time through the loop.

Each time through the loop, the loop variable will have the value of the next string in the list.

Generally, the sequence of commands in the body of the for loop will contain references to the loop variable as we shall see in our example.
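Before looking at the loop in findmain, it may help to try the syntax with a list typed directly into the script. This is only a sketch; the variable names name and seen and the three file names are made up for the illustration.

```shell
# seen collects every value the loop variable takes on
seen=""

for name in Hello.java Util.java Main.java
do
   # the body refers to the loop variable as $name
   echo "this time through the loop, name is $name"
   seen="$seen $name"
done

echo "values visited:$seen"
```

The body runs three times, once for each string in the list, in the order in which the strings were written.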

We know what the cat command will do. In our case it displays the named file, which contains a list of all the files that have the .java extension. If you look closely, you will see that the command is surrounded by the backquote character: ` . Backquotes are not a command; they are shell syntax (called command substitution) that tells the shell to execute the command within the backquoted string and replace the entire backquoted string with the output of that command. What this means here is that the output of the cat command will become the list of strings in the for loop. Each time through the loop, the variable named file will take on the next file name in the file /tmp/javaFiles$$.
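The effect of the backquotes can be tried on a small list file. In this sketch the file name substDemo and the variable collected are hypothetical. The shell splits the output of the backquoted command on whitespace, so a file name containing a space would be treated as two separate strings.

```shell
# Hypothetical list file containing two names, one per line
listFile=/tmp/substDemo$$
echo "Hello.java" > $listFile
echo "Util.java" >> $listFile     # >> appends to the file

collected=""

# The backquoted cat command is run first; its output becomes the list
for file in `cat $listFile`
do
   echo "next file name: $file"
   collected="$collected[$file]"
done

echo "collected: $collected"

# Clean up the demonstration file
rm $listFile
```

The loop body runs once for each name in the list file, exactly as if the two names had been typed after the word in.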

And what do we do each time through the loop? The first thing is to execute the grep command in which we look for the string "main(" in the named file. $1 contains the name of the directory and $file contains the name of the file. The two together become the full path name of one of the .java files.

All we want to know is if the .java file contains a main method. We don't want to see the output of the grep statement itself. Here's how we do that. We re-direct the output of the grep command to this strange file name: /dev/null. In all Unix systems this is the name of a file that will simply swallow anything that you place into it and make it disappear. It's the proverbial "bit bucket". Then if the command successfully found the string "main(", then we want to display the name of the file (only). Otherwise, we go check the next file in the list.

The value of $? will be 0 if the previous command (the grep command) was successful and will be non-zero if the command was unsuccessful in finding a match. Remember those values returned by Unix scripts --- 0 for a successful execution and non-zero if an error occurred?
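The two outcomes can be observed directly with a one-line source file. This is a sketch only: the file name GrepDemo and the variables found and missing are hypothetical, and $? is copied into a variable immediately because every later command replaces its value.

```shell
# Hypothetical one-line Java source file
srcFile=/tmp/GrepDemo$$.java
echo "public static void main(String[] args) { }" > $srcFile

# The string is present: grep succeeds and its output is discarded
grep "main(" $srcFile > /dev/null
found=$?
echo "exit value when the string is present: $found"

# The string is absent: grep reports failure through its exit value
grep "compareTo(" $srcFile > /dev/null
missing=$?
echo "exit value when the string is absent: $missing"

# Clean up the demonstration file
rm $srcFile
```

Neither grep command prints anything itself, because its output goes to /dev/null; only the exit value tells us whether a match was found.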

Make a copy of findmain and name it findX, where you decide what you want for X. For example, suppose you want to find all .java files that have a method named compareTo. Modify the copy so that it searches for your choice of X instead of main.

Notice that in findmain we look for the string main(. What if there were a blank between main and the ( character?

Modify the findmain script to fix this problem. Test it. Your script should be able to find a reference to a main method regardless of the number of spaces between main and the ( character.

Copy findmain into a file called findClass, then modify the findClass file as follows.

Let the findClass script take two parameters: a directory and a class name -- for example:

findClass /home/myaccount/project IOException

Then your script will report on those Java source files that reference the particular class type.



These pages were developed by John Avila SJSU CS Department