grep utility
Suppose you know that a file you are looking for contains a
specific text string, but you don't remember the name of the
file. Or suppose that, in a Java program consisting of many files,
you would like to find the source code for a specific method.
The grep utility can help you in both of these instances.
Copy the file /handouts/cs46blab/graph.usa into your working
directory. This file contains a list of pairs of city names together
with the distance in miles between them. Examine the first few lines
with the head command.
In its simplest form, the grep command will look like:
grep string file(s)
where string is a string of characters you are looking for in the one
or more files you have named. For example, let's look for the string
Denver in the file graph.usa:
grep Denver graph.usa
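As a minimal sketch of this simplest form, the commands below build a small stand-in file and search it. The file name sample.usa, its three lines, and the distances in them are invented for illustration; the real graph.usa may be laid out differently.

```shell
# Work in a scratch directory so your working directory is left untouched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data: two city names and a distance on each line.
printf '%s\n' \
  'Denver Cheyenne 101' \
  'Omaha Denver 537' \
  'Omaha Lincoln 58' > sample.usa

# Print every line that contains the string Denver anywhere in it.
grep Denver sample.usa
# Denver Cheyenne 101
# Omaha Denver 537
```

Note that the second line matches even though Denver is not the first word on it: grep looks for the string anywhere in the line.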
Copy all the files with a .java extension from /handouts/cs46blab
into your working directory.
If you look at the Patterns.java file (using more, for example) you
will see that the update method is called in the program. Suppose we
want to find out where the update method is defined in these files.
(In this case you could easily determine that, but pretend you had
many files and didn't know.) Type:
grep update *
What is the role of the '*' character? If you are not sure, check out
the use of metacharacters. Which files contain mention of update?
How can you tell?
Knowing the file may be useful, but it would be more useful to know
the line number within the file. Type:
grep -n update *
Compare the earlier output to this output and determine the line
numbers in the files where update is mentioned. In which file and on
what line does the definition of the update method start?
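To show the shape of the output without giving away the exercise, here is a small sketch using two invented files, a.txt and b.txt (both hypothetical, not part of the lab handouts). When grep searches more than one file, it prefixes each matching line with the file name; the -n option adds the line number after it.

```shell
# Work in a scratch directory so your working directory is left untouched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Two hypothetical files, each mentioning the word "total" once.
printf '%s\n' 'count = 0' 'total = 0' > a.txt
printf '%s\n' 'print(total)' > b.txt

# Without -n: file name, a colon, then the matching line.
grep total *.txt
# a.txt:total = 0
# b.txt:print(total)

# With -n: file name, line number, then the matching line.
grep -n total *.txt
# a.txt:2:total = 0
# b.txt:1:print(total)
```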
Use the man command to determine the purpose of the -i option for
grep. What about the -c option?
Sometimes we need to be more specific about the string match that we
are looking for. As an example, consider the graph.usa file again.
Suppose we want to find all lines containing the string Denver, but
only if it appears at the beginning of the line. We can use regular
expressions for this. If you are unfamiliar with regular expressions,
please review that module.
To look for occurrences of Denver at the beginning of a line, type:
grep '^Denver' graph.usa
The single quotes surrounding the regular expression "protect" it
from the shell, so that the shell doesn't try to interpret the
metacharacter '^' before grep sees it. Compare this output to the
output from the earlier grep command.
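The same protection matters for other characters, such as '*', that both the shell and grep treat specially. The sketch below uses echo to reveal the arguments grep would receive with and without quotes. The files Denver.txt and sample.usa are invented for illustration.

```shell
# Work in a scratch directory so your working directory is left untouched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Two hypothetical files; only their names matter here.
touch Denver.txt
printf '%s\n' 'Denver Omaha 537' > sample.usa

# Unquoted: the shell replaces D* with the matching file name
# before the command ever runs.
echo grep D* sample.usa
# grep Denver.txt sample.usa

# Quoted: the pattern reaches the command exactly as typed.
echo grep 'D*' sample.usa
# grep D* sample.usa
```

Quoting every regular expression is a simple habit that avoids having to remember which characters the shell treats specially.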
The '^' character at the beginning of the regular expression forces the match to be anchored at the beginning of the line. If the rest of the expression does not match starting at the beginning of the line, the line is not a match.
There is also the '$' character. If it appears at the end of a regular expression, it forces the match to be anchored at the end of the line. If the expression does not match right up to the end of the line, the line is not a match.
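The two anchors can be seen side by side in the following sketch. It uses an invented file, sample.usa, whose city pairs and distances are made up for illustration and are not taken from graph.usa.

```shell
# Work in a scratch directory so your working directory is left untouched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data: two city names and a distance on each line.
printf '%s\n' \
  'Denver Omaha 537' \
  'Omaha Lincoln 58' \
  'Lincoln Topeka 168' > sample.usa

# No anchor: Omaha may appear anywhere in the line.
grep 'Omaha' sample.usa
# Denver Omaha 537
# Omaha Lincoln 58

# ^ anchor: Omaha must be the first thing on the line.
grep '^Omaha' sample.usa
# Omaha Lincoln 58

# $ anchor: the character 8 must be the last thing on the line.
grep '8$' sample.usa
# Omaha Lincoln 58
# Lincoln Topeka 168
```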
To look for all lines in which the distance between cities is less than 100 miles, type:
grep ' [1-9][0-9]$' graph.usa
Do you understand why this works?
How would you find out the line numbers for these lines?
Suppose we want to find all lines in graph.usa that mention a city
whose name ends in the characters City. In particular, we will accept
any alphabetic character followed by City. What is the grep command
to do this?
What if we want to find names of cities that end in an 's' character?
(Hint: look for an s character followed by a blank.)
You can find a more comprehensive guide to regular expressions using:
man -s5 regex
(scroll down until you find the material). This will give you more flexibility
in how to specify strings for which you want to search.
Finally, you can use more powerful (extended) regular expressions with a more powerful version of grep called egrep. If you find that grep doesn't understand your regular expression, try egrep.
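As a minimal sketch of what the extended syntax adds, the example below uses the '|' alternation operator to match either of two city names. POSIX specifies grep -E as the equivalent of egrep, and recent GNU grep releases print a warning when egrep is invoked, so the sketch uses grep -E. The file sample.usa and its contents are invented for illustration.

```shell
# Work in a scratch directory so your working directory is left untouched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample data: two city names and a distance on each line.
printf '%s\n' \
  'Denver Omaha 537' \
  'Omaha Lincoln 58' \
  'Lincoln Topeka 168' > sample.usa

# Match lines that begin with either Denver or Lincoln.
grep -E '^(Denver|Lincoln)' sample.usa
# Denver Omaha 537
# Lincoln Topeka 168
```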
These pages were developed by John Avila SJSU CS Dept.