First perl program:
#!/usr/bin/perl -w
print ("Hello, world!\n");
Introduction
Perl is the "Practical Extraction and Report Language". System administrators
typically spend a good amount of time writing reports based on data taken
from various utilities, so why not have a language that aids in this? The
Bourne and Korn shells are good for running processes in batch, but their main
weakness for system administration tasks is that they lack any facility
for pattern matching, string replacement and generating reports. Instead,
they must rely on external utilities to perform these tasks: grep (global
regular expression print), sed (stream editor), and awk (a text-processing
utility named after its co-creators' initials).
The problem with this is that in heterogeneous environments (environments
not running identical OSes on fairly identical hardware), the behavior
of these utilities changes from vendor to vendor, as some vendors add
proprietary changes. Also, shell scripts depend on the shell's executable
search path containing all these utilities. Since the majority of the world
is a heterogeneous environment, this is indeed a problem for most system
administrators who automate tasks in a distributed manner. Other weaknesses
include no recursive functions, a lack of dynamic data structures, and a
limit on the size of the data structures you do have!
Before you start
The very first line of our very first perl program should look fairly familiar.
It tells the UNIX loader to run the contents of the file under the named
program (the absolute path listed in the example may not reflect the path
on your system!), and perl honors command line switches given on this
line. And indeed, in our first case we have a switch, "-w", which tells the
perl interpreter to produce more warning messages about questionable
constructs in your program.
Data
The simplest data type perl works with is the scalar data type. A scalar
is either a number (integer or floating point; perl calculates all numbers
as double-precision floating point numbers), or a string of characters
(and binary characters are okay too). "Hello, world!", "Hello\nworld\n",
"H", "6", "5.45", "-2.33e-17" are all scalars, and hex (e.g. 0xef) and octal
(0644) values are valid also.
Perl provides the typical math operators (+ - * /), exponentiation (**), and
modulus (%, which gives the remainder; floating point operands are first
reduced to integers, so 5.7 becomes 5, 10000**-7 becomes 0, etc.). The numeric
comparison operators are < <= == >= > !=, and these should only be used to
compare numbers; lt le eq ge gt ne are the respective string comparison
operators. Numbers can be compared with the string operators too, so while
45 > 5 evaluates to true, 45 gt 5 evaluates to false, since "45" sorts
alphabetically before "5". Other string operators include string concatenation
with "." ("hello" . "world" gives "helloworld") and string repetition with
"x" ("foo" x 3 gives "foofoofoo", and (105 + 40) x 2 gives "145145").
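To see the numeric and string operators side by side, here is a minimal sketch. The variable names are made up for illustration, and the "my" declarations it uses are explained under Variable scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric comparison: 45 is greater than 5.
my $numeric = (45 > 5) ? "true" : "false";

# String comparison: "45" sorts before "5" because "4" comes before "5".
my $stringy = ("45" gt "5") ? "true" : "false";

# Concatenation and repetition.
my $joined   = "hello" . "world";
my $repeated = (105 + 40) x 2;     # the sum is computed first, then repeated

print "45 >  5 is $numeric\n";
print "45 gt 5 is $stringy\n";
print "$joined $repeated\n";
```

Picking the wrong family of operators is an easy mistake to make, so it is worth deciding up front whether a value is being treated as a number or as a string.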
Scalar variables are named with a leading "$"; the name is case sensitive,
can use letters, digits and underscores, and can be any length.
$a = 25;
$some_long_STRING = "hello world\n";
$Some_long_sTRING = "foobar"; # a different variable from the one above
Scalar variable operators include prefix and postfix autoincrement and
autodecrement (++$a --$a $a++ $a--) and the binary assignment operators
(*= += -= and even /= if you want, with $a += 3 equivalent to $a = $a + 3).
Finally, two useful string-related functions are chop() and chomp(). chop()
takes a string variable, removes the last character and returns that character
as a string. chomp() is similar, but only removes the last character if
it is a newline, so:
$x = "hello world\n\n";
chop($x);          # $x = "hello world\n"
chomp($x);         # $x = "hello world"
$last = chop($x);  # $last = "d"
                   # $x = "hello worl"
chomp($x);         # $x still equals "hello worl"
Different Strings
Finally, just like the shells, perl supports both interpolated and literal
strings.
Interpolated strings use double quotes ("") and all variables and escape
sequences inside them are expanded.
Literal strings use single quotes ('') and everything is printed out literally
except for "\'" which prints a single quote and "\\" which prints a "\".
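A short sketch of the difference, using a made-up variable $name:

```perl
use strict;
use warnings;

my $name = "world";

my $interpolated = "Hello, $name!\n";        # $name and \n are expanded
my $literal      = 'Hello, $name!\n';        # kept exactly as typed
my $escaped      = 'It\'s a backslash: \\';  # the only two escapes in ''

print $interpolated;
print $literal, "\n";
print $escaped, "\n";
```

Single quotes are the safer choice whenever a string contains a "$" or "@" that is not meant to be a variable.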
Other useful escape sequence characters:
newline "\n", tab "\t", backspace "\b", CTRL-D "\cD".
What happens to undefined variables?
Scalars without values are undefined, and appear as 0 when treated as a number
and as an empty string ("") when treated as a string.
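Here is a minimal sketch of that behavior. The variable names are hypothetical. Because warnings are switched on, perl would normally complain about using an undefined value, so the sketch silences that one warning inside a small block.

```perl
use strict;
use warnings;

my $nothing;    # declared but never assigned, so it is undefined

my ($as_number, $as_string);
{
    # Silence the "uninitialized value" warning for this block only.
    no warnings 'uninitialized';
    $as_number = $nothing + 10;          # undefined acts like 0
    $as_string = "[" . $nothing . "]";   # undefined acts like ""
}

print "as a number: $as_number\n";
print "as a string: $as_string\n";
print "defined? ", (defined($nothing) ? "yes" : "no"), "\n";
```

In real programs the warning is usually worth keeping, since an unexpectedly undefined value is often the sign of a typo in a variable name.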
Arrays
Array variables are ordered lists of scalars.
Array variables are named with a leading "@". Array counting begins with element 0, and negative indices count back from the end, so you can access the last element by referencing -1.
@a = ( "apples", "oranges", "pears" );
or more simply:
@a = qw(apples oranges pears);
With lists of numbers, you can use list constructors:
@a = (2 .. 4); # (2,3,4)
@a = (2.3 .. 5.1); # (2,3,4,5): perl 5 truncates the endpoints to integers
Array assignment is allowed (@a = @b) and the resulting array is an exact copy
of the original array. Individual elements can also be scalar variables
or expressions ( @a = ($a, $b, $a + $b) ).
Other array operations:
$a = @foo; # $a is the number of elements in @foo
($a) = @foo; # $a is first element of @foo
You access array elements by their integer index:
@foo = qw(high bye sigh);
$a = $foo[1]; # $a = "bye"
Some array functions include reverse(), which returns the list in reverse
order:
@foo = reverse(@foo); # @foo = ("sigh","bye","high")
sort(), which returns the list with its strings in ascending ASCII order,
and chomp(), which will trim the trailing newline of every element of the
array.
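The array operations above can be tried out together. This is only a sketch with made-up variable names; note that reverse() and sort() hand back a new list and leave the original array alone.

```perl
use strict;
use warnings;

my @foo = ("high\n", "bye\n", "sigh\n");

chomp(@foo);                     # strip the newline from every element
my @backwards = reverse(@foo);   # a new list; @foo itself is unchanged
my @sorted    = sort(@foo);      # ascending string order
my $count     = @foo;            # an array in scalar context is its length
my $last      = $foo[-1];        # negative index counts from the end

print "original:  @foo\n";
print "backwards: @backwards\n";
print "sorted:    @sorted\n";
print "count=$count last=$last\n";
```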
Control Structures
The simplest control structure is the statement block:
{
statement;
statement;
statement;
}
if/else statement
if (some condition)
statement block
else
statement block
unless (some condition)
statement block
while statement
while (some condition)
statement block
until (some condition)
statement block
for statement
for (initial expression; test expression; change expression)
statement block
but this is just a controlled while loop in disguise:
initial expression;
while(test expression)
{
statement;
statement;
change expression;
}
Perl also has the C shell's foreach statement if you like:
foreach $scalar (@array)
statement block
Boolean values
Simply put, any scalar variable, or function that returns a scalar value, can
be used in a test condition. If the value is undefined, or if converting it
to a string gives "0" or the empty string, then it is "false"; everything
else is true (so the string "0.0" is true!).
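The rule is easier to trust after trying a few edge cases. In this sketch truth() is a made-up helper, not a perl built-in; it just reports how perl treats a value used as a test condition.

```perl
use strict;
use warnings;

# truth() is a hypothetical helper for this sketch.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print 'number 0:     ', truth(0),     "\n";
print 'string "0":   ', truth("0"),   "\n";
print 'empty string: ', truth(""),    "\n";
print 'undefined:    ', truth(undef), "\n";
print 'number 0.0:   ', truth(0.0),   "\n";
print 'string "0.0": ', truth("0.0"), "\n";
print 'string "00":  ', truth("00"),  "\n";
```

The first five lines report "false" and the last two report "true": the number 0.0 turns into the string "0", while the strings "0.0" and "00" are neither empty nor exactly "0".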
Breaking out of loops
Suppose you want to have multiple conditions to break out of a loop. Use an if
statement coupled with a last statement:
while (some condition)
{
some statement;
some statement;
if (other condition)
{ last; }
}
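# if other condition is met, you'll end up here!
The last statement can also be used with a label. The label names a loop,
and "last LABEL" leaves that loop, which lets you break out of an outer loop
from inside an inner one:
OVER_HERE: while (some condition)
{
while (another condition)
{
some statement;
if (other condition)
{ last OVER_HERE; }
}
some statement;
}
some statement; # you'll end up here!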
But what if you only want to skip the rest of the current iteration? Use the
next statement. Want to rerun the current iteration without testing the loop
condition again? Use the redo statement!
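A small sketch of next and last at work, including a labeled last. The loop bounds, the label ROW and the variable names are invented for the example.

```perl
use strict;
use warnings;

# Collect the even numbers up to 8.
my @kept;
foreach my $n (1 .. 10) {
    next if $n % 2;      # odd: skip the rest of this iteration
    last if $n > 8;      # leave the loop entirely once we pass 8
    push(@kept, $n);
}
print "kept: @kept\n";

# A label names a loop, so "last ROW" leaves both loops at once.
my $found = "nothing";
ROW: foreach my $row (1 .. 3) {
    foreach my $col (1 .. 3) {
        if ($row * $col == 4) {
            $found = "$row,$col";
            last ROW;
        }
    }
}
print "first product of 4 at: $found\n";
```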
A while loop that doesn't use "while"!
{
if (some condition)
{ last; }
some statement;
some statement;
redo;
}
The simplest control statements
If you are lazy, you can do the following:
this && that; # "this" gets evaluated, but "that"
              # only gets evaluated if "this" is true.
So all three of these statements are equivalent!
if (this) { that; }
that if this;
this && that;
And unless can be replaced with ||:
unless (this) { that; } # can be replaced with...
this || that;
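To check that these forms really do the same job, here is a sketch in which note() is a made-up function that records each time it gets called.

```perl
use strict;
use warnings;

my @log;
# note() is a hypothetical helper: it remembers its argument and returns true.
sub note { push(@log, $_[0]); return 1; }

my $ready = 1;
if ($ready) { note("if block"); }
note("if modifier") if $ready;
$ready && note("&& form");

my $missing = 0;
unless ($missing) { note("unless block"); }
$missing || note("|| form");

print "$_\n" foreach @log;
print scalar(@log), " calls were made\n";
```

All five calls go through. Set $ready to 0 and the first three are skipped; set $missing to 1 and the last two are.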
Functions
Commonly used code can be put in a function:
sub function
{
statement;
statement;
return $scalar; # or @array, or nothing
}
You can call a function as a simple statement:
function();
or as part of a larger expression (if there is a return value; after a bare
"return;" the value will be undefined...):
$a = function() + 45;
if (function()) ...
Function parameters
You can send a list of values to a function and access them in the function
using the @_ array.
$a = "hello";
$b = "world!";
myprint($a,$b); # could have also said
                # myprint("hello","world!");
sub myprint {
print $_[0] . " " . $_[1] . "\n";
}
Variable scope
The special variable @_ is private to each function call: inside "myprint" it
holds the arguments that were passed to "myprint". Use this array outside of
the function, or in another function, and it will have an unexpected value
(most likely empty)! You can create variables (scalar or array) that are
local to a given scope by declaring them with my:
$name = "tom";
sub name {
my $name;
$name = "bob";
print $name; # prints "bob"
}
print $name; # prints "tom"!
Now we don't have to keep track of a bunch of global variables anymore! You
can make perl a lot more like C and C++ by including the following statement
at the beginning of all your programs:
use strict;
Now you have to declare all of your variables using my, and perl will refuse
to compile a program that uses a variable you haven't declared. Want a
variable shared by the whole program? Declare it outside any function and
its scope will be the rest of the file!
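Here is a runnable sketch of both ideas. The undeclared variable is checked inside a string eval so that the compile error can be caught and reported instead of stopping the whole program; that trick is an assumption of this example, not something you need in everyday code.

```perl
use strict;
use warnings;

my $name = "tom";          # visible from here to the end of the file

sub name {
    my $name = "bob";      # a separate variable that hides the outer one
    return $name;
}

my $inner = name();
print "inside: $inner, outside: $name\n";

# Under "use strict", using a variable that was never declared is a
# compile-time error. The string is compiled only when eval runs,
# so the failure ends up in $@ instead of killing the program.
my $ok = eval q{ $undeclared = 1; 1 };
print "undeclared variable rejected: ", ($ok ? "no" : "yes"), "\n";
```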
Basic I/O
How do you read in user input? Use <STDIN>:
$a = <STDIN>; # read next line up to and including the newline
              # $a will be undefined if no more lines.
What to do if undefined?
while (defined($a = <STDIN>))
{
# do stuff to $a
}
Because perl programmers are lazy, you can shorten this even more. Whenever
the input operator ("<...>") is the only thing in the condition of a while
loop, perl automatically copies the current line to a special scalar "$_".
And "$_" can be implied, so you don't even have to state it!
while(<STDIN>) # or "while(defined($_=<STDIN>))"
{
chomp; # or "chomp($_);"
...
}
What if you just want to read in everything until EOF into a buffer? Just
assign it to an array:
@buffer = <STDIN>;
Command line
"<>" works on files named on the command line, returning one line at a time.
perl "cat":
#!/usr/local/bin/perl
while(<>)
{ print $_; }
invoke just like /bin/cat:
./cat.pl file1 file2 file3
Output
To print out use "print", or else use "printf" for formatted output:
printf ("%15s %5d %10.2f\n",$s,$n,$r);
# 15 character string, a space, 5 character integer, a space,
# 10 character float with 2 digits after the decimal point, a newline.
Most options for perl printf are the same as those given in the printf(3)
manpage.
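To see what those field widths do, here is a sketch with made-up values. It uses sprintf, which formats exactly like printf but returns the string instead of printing it.

```perl
use strict;
use warnings;

my ($s, $n, $r) = ("widgets", 42, 3.14159);

# Same format as above; each field is right-justified in its width.
my $line = sprintf("%15s %5d %10.2f\n", $s, $n, $r);

print $line;
print "line is ", length($line), " characters long\n";
```

The line is always 33 characters long here (15 + 1 + 5 + 1 + 10 + 1), which is what makes printf handy for lining up report columns.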
File I/O
<STDIN> is just a filehandle for standard input. You can create your own
filehandles using the open() function.
Open file for output:
open (OUT, ">output");
print OUT "This is the output";
What if "output" can't be opened?
unless (open (OUT, ">output"))
{
die "Could not open file \"output\"";
}
Or just simply:
open (OUT, ">output") || die "Could not open file \"output\"";
Open file for reading:
open(IN, "<input");
while(<IN>)
{
# do something with $_
}
And just like the shells, perl has file test switches for conditionals:
-r: File or directory is readable
-w: File or directory is writable
-x: File is executable
-o: File is owned by user
-e: File exists
-f: File is a plain file
-d: File is a directory
-l: File is a symbolic link
-T: File looks like text (perl guesses from a sample of its contents)
-B: File looks like binary
An example:
if( -r "output") { open(IN, "<output"); }
else { print "you are not able to read file \"output\"\n"; }
A special switch:
-s: File exists and has nonzero length, return value is size
(if file doesn't exist or has length of zero, the test evaluates
to false in both cases)
So:
if( $size = -s $some_file_name)
{
print "File:$some_file_name has $size bytes.\n";
}
else
{ print "File:$some_file_name is missing or empty.\n"; }
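Putting the pieces together, the following sketch writes a file, checks it with the file tests, and reads it back. It is an illustration under a few assumptions: the scratch file comes from the standard File::Temp module so nothing of yours gets overwritten, and the byte count in the comment assumes UNIX-style newlines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file; UNLINK removes it when we exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

open(OUT, ">$path") || die "Could not open $path for writing: $!";
print OUT "first line\n";
print OUT "second line\n";
close(OUT);

my $readable = (-r $path) ? "yes" : "no";
my $size     = -s $path;      # 11 + 12 = 23 bytes with "\n" newlines

open(IN, "<$path") || die "Could not open $path for reading: $!";
my @lines;
while (<IN>) {
    chomp;
    push(@lines, $_);
}
close(IN);

print "readable: $readable, size: $size bytes\n";
print "read back: $_\n" foreach @lines;
```

Adding $! to the die message is a small extra: it holds the system's reason for the failure, such as "Permission denied".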
Process I/O
You can open processes for I/O similar to how you open a file.
Open process for reading:
open(WHO, "who |");
Open process for writing:
open(LP, "| lp -d myprinter");
And thus to print out everybody who is currently logged in:
while(<WHO>)
{ print LP $_; }
File and Directory manipulation
Directory access
Use the chdir() function to change the current working directory:
chdir("/var/spool") || die "cannot cd to /var/spool";
Globbing
The following will store the name of every file in "/etc" as an entry in
an array:
@files = </etc/*>;
Removing a file
Use the unlink() function (named after the UNIX system call):
unlink("output");
unlink("/var/adm/messages");
unlink <*.o>; # remove all object files in current working directory
Renaming files
Use the rename() function:
rename("output","output.old");
Creating a directory
Use mkdir():
mkdir("results", 0777);
Removing a directory
Use rmdir():
rmdir("results") || die "could not remove directory \"results\"";
Modifying permissions
Use chmod():
chmod(0666,"output"); # read and write for all!
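These calls can be tried safely inside a throwaway directory. The sketch below assumes the standard File::Temp module for the scratch directory; the file and directory names are just the ones used in the examples above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);     # scratch directory, removed at exit

mkdir("$dir/results", 0777) || die "cannot create results: $!";

open(OUT, ">$dir/results/output") || die "cannot create output: $!";
print OUT "some data\n";
close(OUT);

rename("$dir/results/output", "$dir/results/output.old")
    || die "cannot rename output: $!";
chmod(0644, "$dir/results/output.old") || die "cannot chmod: $!";

my $renamed = (-e "$dir/results/output.old") ? "yes" : "no";
my $removed = unlink("$dir/results/output.old");  # number of files removed
rmdir("$dir/results") || die "could not remove directory: $!";
my $gone    = (-d "$dir/results") ? "no" : "yes";

print "renamed: $renamed, files removed: $removed, directory gone: $gone\n";
```

Note that rmdir() only works on an empty directory, which is why the file is unlinked first.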
Regular expressions
If we were looking for every line in a file that contained the string "perl",
we would invoke grep as follows:
grep perl file
Similarly, if we wanted to check if $_ (the special variable that contains
the last line returned by the input operator ("<...>")) contained the
string "perl", we would use the following conditional:
if (/perl/)
{
print $_;
}
A program that behaves like "grep" -> grep.pl:
#!/usr/local/bin/perl -w
$PATTERN = $ARGV[0];
open(FILE,"<$ARGV[1]") || die "Could not open $ARGV[1].\n";
while(<FILE>)
{
if (/$PATTERN/)
{
print $_;
}
}
Patterns
If we wanted to match all strings containing a 4 character sequence starting
with "p" and ending with "l", use:
if (/p..l/)
"." matches any single character except for the newline ("\n").
If we wanted to match all strings that contain a sequence starting with "p"
and ending with an "l" and of undetermined size, use:
if (/p.*l/)
"." matches ANY single character (except the newline).
"a" matches a single letter "a".
".*" matches ANY character and any number of them (including no characters!).
"b*" matches zero or more consecutive "b"'s
".+" matches ANY character and any number of them, but at least one.
"c+" matches one or more consecutive "c"'s
".?" matches ANY character and either one or zero of them.
"\>?" matches nothing or a single greater than symbol.
What if you want to match 0 to 5 consecutive "%"'s?
if(/\%{0,5}/)
To match 4 to 6 consecutive "G"'s:
if(/G{4,6}/)
To match at least 3 "-"'s:
if(/\-{3,}/)
So "*" can be replaced with "{0,}", "+" with "{1,}", and "?" with "{0,1}"!
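A quick way to convince yourself that the two notations agree is to run both over the same strings. The sample strings and array names below are invented for this sketch.

```perl
use strict;
use warnings;

my @samples = ("pl", "perl", "pearl", "p", "GGG", "GGGGG");
my (@star, @braces, @g_run);

foreach (@samples) {
    push(@star,   $_) if /p.*l/;      # the "*" form
    push(@braces, $_) if /p.{0,}l/;   # the equivalent "{0,}" form
    push(@g_run,  $_) if /G{4,6}/;    # 4 to 6 consecutive "G"'s
}

print "p.*l    matched: @star\n";
print "p.{0,}l matched: @braces\n";
print "G{4,6}  matched: @g_run\n";
```

The first two lists come out identical, and "GGG" is too short for the last pattern.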
To match a string containing a word starting with "p", followed by a single
lowercase vowel and ending with "rl", use:
if(/p[aeiou]rl/)
The converse of this is a string starting with a "p", followed with anything
but a lowercase vowel and ending with "rl", and is tested for as follows:
if(/p[^aeiou]rl/) # will match "pErl", but not "perl"!
To match the string "perl" followed by a single digit, we could use:
if(/perl[0123456789]/)
Or more simply:
if(/perl[0-9]/)
To match "perl" followed by anything but a digit:
if(/perl[^0-9]/)
How would you match a hex string (as defined in perl) of indeterminate length?
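One possible answer, offered as a sketch rather than the only solution: a hex value as perl writes it is "0x" followed by one or more hex digits, so a character class plus "+" will do. The candidate strings are made up for the test.

```perl
use strict;
use warnings;

my @candidates = ("0xef", "0x1F", "size is 0xBEEF bytes", "ef", "0x", "0xzz");
my @hits;

foreach (@candidates) {
    # "0x", then one or more hex digits in either case
    push(@hits, $_) if /0x[0-9a-fA-F]+/;
}

print "$_\n" foreach @hits;
```

Only the first three candidates match: "ef" has no "0x" prefix, and "0x" and "0xzz" have no hex digit after the prefix.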
Word boundaries
What if we wanted to match "fred", but not "frederick"? Use "\b" to indicate
where a word should start or end:
if(/fred\b/) # matches "fred" or "alfred", but not "frederick"
And:
if(/\bfred/) # matches "fred" or "frederick", but not "alfred"
if(/\bfred\b/) # matches only "fred"!
Conversely, if we wanted to match "frederick" but not "fred", use "\B" to
indicate where a word shouldn't start or end:
if(/fred\B/) # matches "frederick", but not "fred" or "alfred"
if(/\Bfred/) # matches "alfred", but not "fred" or "frederick"
if(/\Bfred\B/) # matches "alfrederick", but not "fred", "alfred",
               # or "frederick"
What if we wanted to only match words ending in "fred", but not starting with
or containing "fred"?
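One way to answer that, sketched with the same four names used above: require that "fred" does not begin at a word boundary ("\B") but does end at one ("\b").

```perl
use strict;
use warnings;

my @names = ("fred", "alfred", "frederick", "alfrederick");
my @hits;

foreach (@names) {
    # \B: "fred" must not start the word; \b: "fred" must end it
    push(@hits, $_) if /\Bfred\b/;
}

print "matched: @hits\n";
```

Of the four names, only "alfred" gets through.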
Matching on a variable besides "$_"
So far we have only matched patterns implicitly on the "$_" scalar variable.
"=~" is a binding operator that works on a scalar and a regular expression
and returns a boolean value depending on whether the scalar matches the
regular expression or not. So:
if(/perl/)
Could be rewritten as:
if($_ =~ /perl/)
And if we wanted to check if scalar $a contains the string "perl":
if($a =~ /perl/)
A replacement for "sed": substitutions
So you've found your pattern, now what? If you want to replace it with
something else, use the "s" operator:
$a = "sed";$b = "awk";
$a =~ s/sed/perl/; # $a now contains "perl";
$b =~ s/sed/perl/; # $b still contains "awk";
What happens with the following code?
$a = "sed sedative malposed horsedom";
$a =~ s/sed/perl/;
Does $a now contain "perl perlative malpoperl horperldom"? No, the "s" operator
will only catch and replace the first instance of "sed". To replace all
instances of "sed", we must also use the "g" modifier:
$a =~ s/sed/perl/g; # $a now contains "perl perlative malpoperl horperldom"
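The difference is easy to see by running both forms on copies of the same string. The variable names here are invented; the one extra fact the sketch relies on is that the "s" operator returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $first_only = "sed sedative malposed horsedom";
my $all        = $first_only;                 # work on a copy

$first_only =~ s/sed/perl/;                   # first match only
my $count = ($all =~ s/sed/perl/g);           # every match

print "without g: $first_only\n";
print "with g:    $all ($count replacements)\n";
```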
A replacement for awk: split() (and join())
Split is a function that takes a regular expression and a scalar, and returns
a list. Split searches for all occurrences of the regular expression, and
returns the pieces between the matches as a list. The following code splits
up a scalar containing a phrase of words, and returns a list of each parsed
out word:
$phrase = "The rain in spain, falls mainly on the spainards";
@words = split(/ /,$phrase);
# @words is ("The","rain","in","spain,","falls","mainly"...)
The converse of split is join, which takes a scalar and a list and "glues"
together each element of the list with the scalar; the result is returned
as a scalar. Thus if we wanted to take our newly split up array @words and
glue it back together with pound signs ("#"):
$phrase = join("#",@words);
# $phrase is "The#rain#in#spain,#..."
Of course, we could have done the above five lines of code (I'm counting
comments as lines of code) with a simple global substitution:
$phrase = "The rain in spain, falls mainly on the spainards";
$phrase =~ s/ /#/g;
The following code reads in a line at a time from the password file, and
associates every user id (first field in an /etc/passwd entry) with its
corresponding GECOS string (5th field in a password entry):
open(PASSWORD, "</etc/passwd");
while(<PASSWORD>) # or "while (defined($_ = <PASSWORD>))"
{
@fields = split(/:/); # where's the scalar?!? (split uses $_ by default)
print "$fields[0] account is owned by $fields[4].\n";
}
close(PASSWORD); # Close files when they aren't needed anymore!
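If you would rather experiment without touching the real password file, the same split can be run on a string. The entries below are sample data in /etc/passwd format, and the variable names are made up.

```perl
use strict;
use warnings;

my $entry  = "mfroomin:x:1001:10:Marty Froomin:/export/home/froomin:/bin/sh";
my @fields = split(/:/, $entry);
my $count  = @fields;

print "$fields[0] account is owned by $fields[4].\n";

# join() with the same separator puts the entry back together.
my $rebuilt = join(":", @fields);
print "fields: $count, round trip ok: ",
      ($rebuilt eq $entry ? "yes" : "no"), "\n";

# By default split() drops empty fields at the END of the string, so an
# entry with nothing after the last colon yields 6 fields, not 7.
my @short = split(/:/, "wleung:x:1002:10:Winnie Leung:/export/home/wleung:");
print "entry with empty last field has ", scalar(@short), " fields\n";
```

That last behavior matters whenever the final field may be empty: the element simply isn't there, so test for it with defined() before using it.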
Perl Lab
1. Write a "grep" that straddles lines.
Normal grep takes a regular expression and a file, and returns any line that
contains the regular expression. But what to do if your editor has automatic
word wrapping (and maybe even hyphenation!)? Write a version of grep that
checks if the regular expression straddles two consecutive lines (don't forget
to discard trailing newlines and hyphens!) and returns both lines if the
pattern straddles them, but only the line the pattern is contained in if
it doesn't straddle.
2. Write a /etc/passwd user shell utility.
/etc/passwd is a file that contains, among other things, the user shell for
each user id (the shell is the seventh field). Write a script that reads
/etc/passwd, and lists each shell (minus the absolute path for it; HINT: this
will require a split() followed by another split()) for each user. If the
user doesn't have a valid shell (7th field in the password file is nothing
or is a directory), mention that. So for an /etc/passwd file that looks
like:
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:/usr/bin/true
mfroomin:x:1001:10:Marty Froomin:/export/home/froomin:/bin/sh
wleung:x:1002:10:Winnie Leung:/export/home/wleung:/bin/csh
The output should be:
User root uses "sh".
User daemon uses "true".
User mfroomin doesn't have a valid shell: directory!
User wleung doesn't have a valid shell: undefined!