Unix Lab



The tar and jar utilities



Archiving tools

In this module we will look at two tools that are used for archiving files. Just what is "archiving" anyway? To archive something means to save it in some kind of long-term storage. In the computer world, this usually means doing something like making copies of important files and storing them on a backup device such as tape or a spare disk drive.

We will look at two archiving tools. The first is a Unix utility called tar that has been around for a long time. The tar utility is designed to pull together Unix files and directories and wrap them up in such a way that they can easily be stored.

The second is a utility called jar that is used primarily to pull together Java class files and wrap them up into a single file that can easily be stored or distributed.

Uses for archive utilities

So, what's useful about being able to wrap up files and directories?

Think about wanting to save some of the work you have been developing as part of a project for the past month. There are multiple files and directories that you want to back up. One way would be to simply copy the files, one at a time, to a backup device such as a spare disk drive. Sometimes the spare disk is attached to another host computer, so this may require moving these files across a network.

Using tar allows you to create a single file that contains everything you want to back up. This single file can be copied to the backup device as one item with a single command, even though it may contain many files and directories. These special files created by the tar utility are called tar files.

The original files are not changed in the process: tar copies the contents of each file, together with information such as its name and permissions, into the archive. Moreover, once you have created one of these archive files, you can recover the original files from it whenever you want. The tar utility takes care of all of that.

A similar utility for Java class files is also available. This utility is called jar, and its uses are similar to the uses of tar. Files that have been "archived" by the jar utility are called jar files. jar files have the nice property that the Java Virtual Machine can access the class files within them without having to "un-archive" them first. This makes these files easy to use. In particular, they can be downloaded from a server as one file (saving the time it would take to request each class file separately) and used in Web applications.

The tar utility

Creating tar files with the tar utility

In the directory you are working in, create a subdirectory called project.

Into that directory copy the files

  Patterns.java
  PatternMaker.java
from the directory: /handouts/cs46blab

Type:
cd project

Now type:
tar -cf patternBackup.tar Patterns.java PatternMaker.java
followed by:

ls

You should now see that, in addition to the two Java files in the project directory, you have a file called patternBackup.tar.

Let's analyze what happened.

The tar command takes many options. (We will only look at a few.) The c option tells tar that you want to create a tar file, and the f option names the tar file that you are about to create (we called it patternBackup.tar). The option characters are simply clumped together, with no space between them. Because f takes the next argument as the name of the archive, it should come last in the group.

Following the options, we list the files that we want to wrap up into this tar file: we chose Patterns.java and PatternMaker.java.
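The steps so far can be collected into a small script. This is a minimal sketch that assumes a Unix shell with tar on the PATH. Because /handouts/cs46blab may not be available to you, it works in a scratch directory (created with mktemp -d) and writes two hypothetical one-line stand-ins for Patterns.java and PatternMaker.java.

```shell
#!/bin/sh
set -e

# Scratch directory so nothing else is touched (assumption: stands in for your working directory).
cd "$(mktemp -d)"
mkdir project
cd project

# Hypothetical stand-ins for the handout files.
echo 'public class Patterns {}' > Patterns.java
echo 'public class PatternMaker {}' > PatternMaker.java

# c = create a new archive, f = the next argument is the archive's name.
tar -cf patternBackup.tar Patterns.java PatternMaker.java

# The originals are still here, next to the new archive.
ls
```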

You can also use wildcards in the file names as in:
tar -cf patternBackup.tar *.java

Finally, there is another option that lets you see what tar is doing while it does it: the v (verbose) option. For example, suppose we add it to the previous command and type:
tar -cvf patternBackup.tar *.java

What you will see is:

PatternMaker.java
Patterns.java
This shows you the names of the files that are being placed in the tar file.
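Note that the wildcard is expanded by the shell, not by tar: tar receives the already expanded list of file names. A minimal sketch (same scratch-directory and stand-in-file assumptions as before) that first prints the list the shell will hand to tar and then builds the archive verbosely:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Hypothetical stand-ins for the two Java files.
echo 'public class Patterns {}' > Patterns.java
echo 'public class PatternMaker {}' > PatternMaker.java

# The shell replaces *.java with the matching names before tar starts.
echo *.java

# v = print each file name as it is added to the archive.
tar -cvf patternBackup.tar *.java
```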

Extracting tar files

Keeping the tar file in the project directory, delete the two java files. We will now see how to extract these two files from the tar file.

Now type:
tar -xf patternBackup.tar Patterns.java

The x option tells tar to extract files from a tar file. The f option is as before; it names the tar file. The last parameter is the name of the file we want to extract from the tar file.

If you issue the ls command, you will see that the project directory now contains the tar file and the Patterns.java file.
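Here is the single-file extraction as a script. It is a sketch under the same assumptions (scratch directory, hypothetical stand-in files); it deletes the originals and then asks tar for just one member of the archive.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
echo 'public class Patterns {}' > Patterns.java
echo 'public class PatternMaker {}' > PatternMaker.java
tar -cf patternBackup.tar Patterns.java PatternMaker.java

# Remove the originals; only the archive is left.
rm Patterns.java PatternMaker.java

# x = extract; naming a member extracts only that member.
tar -xf patternBackup.tar Patterns.java

# Lists patternBackup.tar and Patterns.java, but no PatternMaker.java.
ls
```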

Delete the java file again and type:
tar -xf patternBackup.tar

Now use ls to see what you have. What does tar do when you don't supply a list of files to extract?

The v option can (and should) be used with the x option as well. Go back, delete the java files, and use the following command to recover them:
tar -xvf patternBackup.tar
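As a script, the verbose extraction looks like this. It is a minimal sketch that rebuilds the archive from hypothetical stand-in files in a scratch directory before extracting.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
echo 'public class Patterns {}' > Patterns.java
echo 'public class PatternMaker {}' > PatternMaker.java
tar -cf patternBackup.tar Patterns.java PatternMaker.java
rm Patterns.java PatternMaker.java

# x = extract, v = print each name as it is restored, f = archive name.
tar -xvf patternBackup.tar
```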

tar and working with directories of files

The tar utility can store entire directory trees of files. In this section we will see how that works.

Start with just the Patterns.java and the PatternMaker.java files in the project directory (delete the tar file from the exercises above). Now cd to the directory in which the project directory is contained. If you created the project directory from your home directory, then you should be in your home directory.

Now type:
tar -cvf project.tar project

As you know by now, the c option creates a new tar file, the v option lets you see what tar is doing as it does it, and the f option names the tar file being created. The last argument, project, names the directory to be archived.

When tar sees that you want to archive a directory, it will archive every file and every subdirectory in that directory.

What you should see when you type the previous command is something like this (the order of the files may differ):

project/
project/PatternMaker.java
project/Patterns.java

This shows you that tar archived the project directory including the two files within it.

Be sure you have the tar file in the same directory as the project directory. If you use the ls -F command, you should see:

...(other files or directories)   project/     project.tar 
...(other files or directories)

Delete the entire contents of the project directory and rmdir the project directory itself.

Now type:
tar -xvf project.tar

Now issue an ls command. What do you see?
Use the cd command to visit the project subdirectory. What do you see?

What you should see is that the subdirectory and its contents have been completely restored.
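The whole round trip for a directory can be sketched as one script. It assumes a scratch directory standing in for the parent of project (your home directory in the steps above) and hypothetical stand-in Java files.

```shell
#!/bin/sh
set -e

# Scratch directory standing in for the parent of project.
cd "$(mktemp -d)"
mkdir project
echo 'public class Patterns {}' > project/Patterns.java
echo 'public class PatternMaker {}' > project/PatternMaker.java

# Naming a directory archives it together with everything inside it.
tar -cvf project.tar project

# Remove the files, then the now-empty directory.
rm project/Patterns.java project/PatternMaker.java
rmdir project

# Extracting recreates the directory and its contents.
tar -xvf project.tar
ls project
```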

See if tar will archive directories within directories. For example, create a subdirectory in the project directory (call it data) and place a simple text file in it.

Now go back to the parent directory of project and delete the tar file that you created earlier. Now issue the appropriate command to archive the project subdirectory.

Delete the project directory contents starting with the data subdirectory and moving upwards.

Use the tar command to extract the project directory and check the contents of the project subdirectory. What happened?
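If you want to check your answer, here is the nested-directory exercise as a script. It is a sketch in a scratch directory; notes.txt is a hypothetical name for the simple text file, and Patterns.java is a stand-in.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir -p project/data
echo 'public class Patterns {}' > project/Patterns.java
echo 'sample text' > project/data/notes.txt   # hypothetical text file

tar -cvf project.tar project

# Delete the tree bottom-up, as in the exercise.
rm project/data/notes.txt
rmdir project/data
rm project/Patterns.java
rmdir project

tar -xvf project.tar

# Show the restored tree, subdirectory included.
ls -R project
```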

Checking the contents of tar files

Often, software for Unix systems is made available in the form of a tar file. Before you extract the contents of the tar file, you should see what's contained within it. One of the options to the tar command lets you do this.

Pick the tar file from the previous exercise (project.tar) and issue the following command:
tar -tf project.tar

What you should see is a table of contents of the tar file. You can always view the contents of the tar file before you extract any files. You may decide to extract the entire contents or just a few files.
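A minimal sketch of looking before extracting (scratch directory and stand-in files assumed). To extract just one file from a directory archive, give the member name exactly as the table of contents shows it, including the directory part; tar creates the directory if it is missing.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir project
echo 'public class Patterns {}' > project/Patterns.java
echo 'public class PatternMaker {}' > project/PatternMaker.java
tar -cf project.tar project
rm -r project

# t = list the table of contents without extracting anything.
tar -tf project.tar

# Extract a single member, using the name exactly as the listing shows it.
tar -xf project.tar project/Patterns.java
```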

The jar utility

The jar utility is similar to the tar utility except that it is used for Java class files. In fact, its primary purpose has been to pull together an applet and all its support files (including image files, for example) into a single archive file that can be downloaded all at once to a requesting Web client. Not only does the jar utility pull these files together, but it also compresses them (jar files use the ZIP format) so that there is less data to download.

Creating jar files

Assuming you have created a project directory as in the material on the tar utility, go to the project directory and make sure it contains only the two files Patterns.java and PatternMaker.java.

The first thing we will do is to compile the java files to generate the corresponding class files:
javac PatternMaker.java Patterns.java

You should now see the two class files in the project directory.

Now type:
jar -cvf Pattern.jar PatternMaker.class Patterns.class

The options for the jar command have the same meanings as in the tar command: c creates the archive, v shows what jar is doing as it carries out the command, and f names the archive file, here Pattern.jar.

What you should see is (your numbers may be slightly different):

added manifest
adding: PatternMaker.class(in = 1379) (out=913)(deflated 33%)
adding: Patterns.class(in = 853) (out=562)(deflated 34%)

The term "manifest" refers to a special file that the jar utility places in the jar file; this file contains information about the contents of the jar file. We won't worry about it in this tutorial.

The other lines report each file as the jar utility compresses it and adds it to the archive. For each file, we are told its original size (in bytes), its compressed size, and the percentage by which compression reduced its size.
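The percentage is the reduction in size relative to the original size, (in - out) * 100 / in. A quick check with shell integer arithmetic, using the sizes from the sample output (your own numbers may differ), reproduces the reported figures; ratio is a hypothetical helper name.

```shell
#!/bin/sh
# ratio NAME IN OUT: print the size reduction as a whole-number percentage.
# Shell arithmetic is integer-only, so the fraction is truncated (33.79 -> 33).
ratio() {
    echo "$1: deflated $(( ($2 - $3) * 100 / $2 ))%"
}

ratio PatternMaker.class 1379 913   # PatternMaker.class: deflated 33%
ratio Patterns.class 853 562        # Patterns.class: deflated 34%
```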

Extracting files from jar archive files

Now let's see how to recover the files from a jar archive. If you understood how to do this with the tar utility, you can do it with the jar utility.

Start by deleting the .class files in the project directory. Then type:
jar -xvf Pattern.jar

What you should see is something like:

  created: META-INF/
extracted: META-INF/MANIFEST.MF
extracted: PatternMaker.class
extracted: Patterns.class

If you use the ls command, you will see that the class files are back, along with a new directory called META-INF. This is where information about the archive's contents would be placed if we had used the manifest feature. In our case, the directory contains a single file, MANIFEST.MF, which holds only a couple of lines of default information. You can ignore this file and the directory for the work you will do in this class.

Try using the t option to view the table of contents of the jar file.

Using jar with directories of files

Just as with the tar utility, we can use the jar utility to pull together whole directories of files into a single jar file.

Start with the project directory (as above) with just the two .class files within it.

Now cd to the parent directory of the project directory and type:
jar -cf project.jar project

Use the t option of the jar command to see the contents of project.jar. What do you find?

Now delete everything in the project subdirectory and delete the directory itself (keep the project.jar file). See if you can reconstruct the project subdirectory again. What's the command?

jar files as containers of Java packages

One of the features of jar files is that they can serve as containers for Java packages. A Java package is a group of logically related classes. Think of packages as libraries of classes that you can download and use in your Java applications.

In fact, in the module on compiling, we used a jar file containing a Java package called archipelago (see the section on importing packages).

The important thing is to place these jar files in a location where the Java compiler and the Java Virtual Machine will find them. This is handled with the CLASSPATH environment variable, which lists the directories and jar files to search for classes.



These pages were developed by John Avila SJSU CS Dept.