Perl Primer

First perl program:
 

 

#!/usr/bin/perl -w

print ("Hello, world!\n");
 

 

Introduction

Perl is the "Practical Extraction and Report Language". System administrators typically spend a good amount of time writing reports based on data taken from various utilities, so why not have a language that aids in this? The Bourne and Korn shells are good for running processes in batch, but their main weakness for system administration tasks is that they have little or no built-in support for regular-expression matching, string substitution and report generation. Instead, they must rely on external utilities to perform these tasks: grep (named after the ed command g/re/p, "global regular expression print"), sed (stream editor), and awk (a pattern-scanning and text-processing language named after its creators' initials: Aho, Weinberger and Kernighan). The problem with this is that in heterogeneous environments (environments not running identical OSes on identical hardware), the behavior of these utilities changes from vendor to vendor as some of them add proprietary changes. Also, these scripts depend on the shell's executable search path containing all these utilities. Since most real-world sites are heterogeneous, this is indeed a problem for most system administrators who want to automate tasks across many machines. Other weaknesses include awkward support for recursive functions, no dynamic data structures, and limits on the size of the data structures you do have!
 

 

Before you start

The very first line of our very first perl program should look fairly familiar. It tells the UNIX kernel to run the contents of the file with the named program (the absolute path listed in the example may not reflect the path on your system!), and perl also honors any command-line switches given on this line. Indeed, in our first case we have a switch, "-w", which tells the perl interpreter to warn about questionable constructs in your program, such as a variable name used only once (a likely typo) or an undefined value used in an expression.
 

 

Data

The simplest data type perl works with is the scalar. A scalar is either a number (integer or floating point; perl treats numbers as double-precision floating-point values, using native integers internally where it can) or a string of characters (binary data is okay too). "Hello, world!", "Hello\nworld\n", "H", "6", "5.45" and "-2.33e-17" are all scalars, and hexadecimal (e.g. 0xef) and octal (e.g. 0644) numeric literals are valid too.
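A small sketch (the variable names are illustrative) showing that these literal forms all give ordinary numeric scalars, and that a numeric-looking string is converted automatically when used as a number. Note that hex and octal notation only applies to literals in your source code; a string such as "0xef" is not converted that way, which is what the built-in hex() function is for.

```perl
use strict;
use warnings;

my $hex    = 0xef;          # hexadecimal literal: 239
my $oct    = 0644;          # a leading zero means octal: 420
my $tiny   = -2.33e-17;     # exponent (scientific) notation
my $str    = "5.45";        # a string that looks like a number...
my $sum    = $str + 1;      # ...is converted automatically in numeric context: 6.45
my $parsed = hex("0xef");   # strings are not auto-converted from hex; hex() does it: 239

print "$hex $oct $sum $parsed\n";
```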
 

 

Perl provides the usual math operators (+ - * /), exponentiation (**), and modulus (%), which returns the remainder of a division; both operands of % are truncated to integers first (5.7 becomes 5), so 10.7 % 3 is the same as 10 % 3, i.e. 1. The numeric comparison operators are < <= == >= > !=, and they compare values as numbers; lt le eq ge gt ne are the corresponding string comparison operators. Numbers can be compared with the string operators too, but the result may surprise you: while 45 > 5 is true, 45 gt 5 is false, since the string "45" sorts before "5" character by character. Other string operators include concatenation with "." ("hello" . "world" is "helloworld") and repetition with "x" ("foo" x 3 is "foofoofoo", and (105 + 40) x 2 is "145145").
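The operators above can be checked with a short sketch; the variable names are made up for illustration, and the comments give the values perl computes.

```perl
use strict;
use warnings;

my $num_cmp = (45 > 5)  ? "true" : "false";  # numeric comparison: "true"
my $str_cmp = (45 gt 5) ? "true" : "false";  # string comparison: "45" sorts before "5", so "false"
my $joined  = "hello" . "world";             # concatenation: "helloworld"
my $repeat  = "foo" x 3;                     # repetition: "foofoofoo"
my $rep_num = (105 + 40) x 2;                # 145 is used as the string "145": "145145"
my $rem     = 10 % 3;                        # remainder: 1
my $trunc   = 10.7 % 3;                      # operands truncated first, same as 10 % 3: 1
my $power   = 2 ** 10;                       # exponentiation: 1024

print "$num_cmp $str_cmp $joined $repeat $rep_num $rem $trunc $power\n";
```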
 

 

Scalar variable names start with a "$". The name is case sensitive; it must begin with a letter or an underscore, may continue with letters, digits and underscores, and can be quite long.
 

 

$a = 25;

$some_long_STRING = "hello world\n";

$Some_long_sTRING = "foobar"; # a different variable from $some_long_STRING
 

 
 
 

Scalar variable operators include prefix and postfix autoincrement and autodecrement (++$a, --$a, $a++, $a--) and the compound assignment operators (+= -= *= and even /= if you want; $a += 3 is equivalent to $a = $a + 3). Finally, two useful string-related functions are chop() and chomp(). chop() removes the last character from a string variable and returns that character. chomp() is similar, but only removes a trailing newline, so:
 

 

$x = "hello world\n\n";

chop($x); # $x = "hello world\n"

chomp($x); # $x = "hello world"

$last = chop($x); # $last = "d"

# $x = "hello worl"

chomp($x); # $x still equals "hello worl"
 

 

Different Strings

Just like the UNIX shells, perl supports both interpolated and literal strings.
 

 

Interpolated strings use double quotes (""): variables and backslash escapes such as "\n" inside them are expanded.
 

 

Literal strings use single quotes (''): everything is taken literally except "\'", which produces a single quote, and "\\", which produces a single backslash.
 

 

Other useful escape sequence characters:
 

 

newline "\n", tab "\t", backspace "\b", CTRL-D "\cD".
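A minimal sketch contrasting the two kinds of strings; $name and the other variables are just illustrative.

```perl
use strict;
use warnings;

my $name   = "world";
my $double = "Hello, $name!\n";         # $name and \n are expanded
my $single = 'Hello, $name!\n';         # kept exactly as typed, backslash-n included
my $quote  = 'It\'s a backslash: \\';   # \' and \\ are the only escapes in single quotes
my $tabbed = "a\tb";                    # \t is a single tab character

print $double;
print $single, "\n";
print $quote, "\n";
```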
 


What happens to undefined variables?

Scalars without values are undefined; an undefined value acts as 0 when treated as a number and as an empty string ("") when treated as a string (with -w, perl warns when you use an undefined value this way).
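A short sketch of undefined values in action. The ++ and .= operators are documented as exempt from the "uninitialized value" warning, which is why counters and accumulators can start out undefined; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $count;                 # declared but never assigned: undefined
$count++;                  # ++ treats undef as 0 (no warning): now 1

my $text;
$text .= "abc";            # .= treats undef as "" (no warning): now "abc"

my $never_set;
my $is_undef = defined($never_set) ? "defined" : "undefined";   # "undefined"

print "$count $text $is_undef\n";
```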
 

 

Arrays

Array variables hold ordered lists of scalars. Array variable names start with an "@". Indexing begins at element 0, and negative indices count back from the end, so index -1 refers to the last element.
 

 

@a = ( "apples", "oranges", "pears" );
 

 

or more simply:
 

 

@a = qw(apples oranges pears);
 

 

With lists of numbers, you can use list constructors:
 

 

@a = (2 .. 4); # (2,3,4)

@a = (2.3 .. 5.1); # (2,3,4,5): non-integer endpoints are truncated to integers
 

 

Array assignment is allowed (@a = @b), and the resulting array is an independent copy of the original. The elements of a list can be any scalar expressions, e.g. @a = ($a, $b, $a + $b).
 

 
 
 

Other array operations:
 

 

$a = @foo; # $a is the number of elements in @foo

($a) = @foo; # $a is first element of @foo
 

 

You access an array element by its integer index; note the "$", since a single element is a scalar:
 

 

@foo = qw(high bye sigh);

$a = $foo[1]; # $a = "bye"
 

 

Some array functions include reverse():
 

 

@foo = reverse(@foo); # reverse() returns a new list: @foo = ("sigh","bye","high")
 

 

sort(), which returns the list sorted in ascending string (ASCII) order, and chomp(), which trims the trailing newline from every element of an array.
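The array operations above can be tried out in one place. One assumption worth flagging: the numeric sort uses a comparison block, sort { $a <=> $b }, where $a and $b are special variables that perl sets for each comparison; without it, sort compares as strings and would put 100 before 9.

```perl
use strict;
use warnings;

my @foo    = qw(high bye sigh);
my $count  = @foo;                              # array in scalar context: 3
my $last   = $foo[-1];                          # negative index counts from the end: "sigh"
my @rev    = reverse(@foo);                     # ("sigh", "bye", "high"); @foo is unchanged
my @sorted = sort(@foo);                        # string order: ("bye", "high", "sigh")
my @nums   = sort { $a <=> $b } (10, 9, 100);   # numeric order: (9, 10, 100)

my @lines  = ("one\n", "two\n");
chomp(@lines);                                  # strips the newline from every element

print "$count $last @rev | @sorted | @nums | @lines\n";
```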
 

 

Control Structures

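The simplest control structure is the statement block. Unlike in C, every block used with if, unless, while, until, for or foreach must be enclosed in braces, even if it holds only one statement: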
 

 

{

statement;

statement;

statement;

}
 

 

if/else statement
 

 

if (some condition)

statement block

else

statement block
 

 

unless (some condition)

statement block

while statement
 

 

while (some condition)

statement block
 

 

until (some condition)

statement block

for statement
 

 

for (initial expression; test expression; change expression)

statement block
 

 

but this is just a controlled while loop in disguise:
 

 

initial expression;

while(test expression)

{

statement;

statement;

change expression;

}
 

 

Perl also has a foreach statement, like the C shell's; the scalar takes on each element of the array in turn:
 

 

foreach $scalar (@array)

statement block
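A sketch of the loop forms, assuming nothing beyond the standard interpreter. One detail not mentioned above: the foreach variable is an alias for each element, so modifying it modifies the array.

```perl
use strict;
use warnings;

my @numbers = (1 .. 5);

# C-style for loop: initial expression; test expression; change expression
my $total = 0;
for (my $i = 0; $i < @numbers; $i++) {
    $total += $numbers[$i];
}

# foreach: $n is an alias for each element, so changing $n changes @numbers
foreach my $n (@numbers) {
    $n *= 10;
}

# until keeps looping while its condition is false
my $countdown = 3;
until ($countdown == 0) {
    $countdown--;
}

print "$total | @numbers | $countdown\n";
```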
 

 
 
 

Boolean values

Simply put, any scalar variable, or function that returns a scalar value, can be used in a test condition. The value is false if it is undefined, the empty string "", the string "0", or the number 0 (so the number 0.0 is false too); everything else is true, including the strings "0.0" and "00".
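A quick way to check the rule is to count how many values in each list test false or true; grep in scalar context returns the number of elements for which the block was true. The lists are illustrative, not exhaustive.

```perl
use strict;
use warnings;

my @false_values = (undef, "", "0", 0, 0.0);
my @true_values  = ("0.0", "00", " ", "a", -1);

my $false_count = grep { !$_ } @false_values;   # how many test false: all 5
my $true_count  = grep {  $_ } @true_values;    # how many test true: all 5

print "$false_count false, $true_count true\n";
```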
 

 

Breaking out of loops

Suppose you want to have multiple conditions to break out of a loop. Use an if statement coupled with a last statement:
 

 

while (some condition)

{

some statement;

some statement;

if (other condition)

{ last; }

}

# if other condition is met, you'll end up here!
 

 

The last statement can also be used with a label, which names an enclosing loop; last LABEL leaves that loop even from inside a nested one:

OUTER: while (some condition)
{
    some statement;
    while (another condition)
    {
        if (other condition)
        { last OUTER; }
    }
}
# last OUTER ends up here, after the labeled loop!
 

 
 
 

But what if you only want to skip the rest of the current iteration? Use the next statement. Want to restart the current iteration without re-testing the loop condition? Use the redo statement!
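A sketch of next, last and a labeled last. redo is left out here because a careless redo easily becomes an infinite loop; the label and data are made up for illustration.

```perl
use strict;
use warnings;

my @kept;
foreach my $n (1 .. 10) {
    next if $n % 2;      # odd number: skip the rest of this iteration
    last if $n > 6;      # past 6: leave the loop entirely
    push @kept, $n;
}
# @kept is (2, 4, 6)

# "last OUTER" leaves the labeled outer loop from inside the inner one.
my $found = "";
OUTER: foreach my $row (1 .. 3) {
    foreach my $col (1 .. 3) {
        if ($row * $col == 6) {
            $found = "$row,$col";
            last OUTER;
        }
    }
}

print "@kept | $found\n";
```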
 


A while loop that doesn't use "while"! A bare block acts as a loop that runs once, so last and redo work inside it:
 

 

{

if (some condition)

{ last; }

some statement;

some statement;

redo;

}
 


The simplest control statements

If you are lazy, you can do the following:
 

 

this && that; #"this" gets executed, but "that"

# only gets evaluated if "this" is

# true.
 


So all three of these statements are equivalent!
 

 

if (this) { that;}

that if this;

this && that;
 


And unless can be replaced with ||:
 

 

unless (this) { that; } # can be replaced with...

this || that;
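A small sketch of these short-circuit forms; @log and the flag names are hypothetical, used only so the effect of each line can be checked.

```perl
use strict;
use warnings;

my @log;
my $verbose = 1;
my $quiet   = 0;

$verbose && push(@log, "verbose is on");   # runs: $verbose is true
$quiet   && push(@log, "never logged");    # skipped: $quiet is false
$quiet   || push(@log, "quiet is off");    # runs: $quiet is false
push(@log, "statement modifier") if $verbose;

print join(", ", @log), "\n";
```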
 


Functions

Commonly used code can be put in a function:
 

 

sub function

{

statement;

statement;

return $scalar; # or @array, or nothing

}
 


You can call a function as a simple statement:
 

 

function();
 


or as part of a larger expression. If the function has no explicit return, it returns the value of the last expression evaluated, which may not be what you expect:
 

 

$a = function() + 45;

if (function())...
 


Function parameters

You can send a list of values to a function and access them in the function using the @_ array.
 

 

$a = "hello";

$b = "world!";
 

 

myprint($a,$b); # could have also said

# "myprint("hello","world");"
 

 

sub myprint {

print $_[0] . " " . $_[1] . "\n";

}
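Most perl code copies @_ into named my variables at the top of the function instead of indexing $_[0] and $_[1] directly. The sketch below shows that style, plus a detail worth knowing: the elements of @_ are aliases for the caller's variables, so assigning to $_[0] changes the caller's copy. The function names are hypothetical.

```perl
use strict;
use warnings;

# Copy the arguments into named lexical variables first.
sub greeting {
    my ($first, $second) = @_;
    return "$first $second";
}

# @_ holds aliases, so this modifies the variable the caller passed in.
sub double_in_place {
    $_[0] *= 2;
}

my $message = greeting("hello", "world!");
my $value   = 21;
double_in_place($value);   # $value is now 42

print "$message $value\n";
```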
 


Variable scope

Each call to a function gets its own @_, holding the arguments of that call; outside a function, @_ does not contain the arguments of any function you wrote, so don't rely on it there. You can create variables (scalar or array) that are local to a given block by declaring them with my:
 

 

$name = "tom";

sub name {
    my $name;        # a new $name, visible only inside this block
    $name = "bob";
    print $name;     # prints "bob"
}

name();              # call the function: prints "bob"
print $name;         # prints "tom"!
 

 

Now we don't have to keep track of a bunch of global variables anymore! You can make perl behave a lot more like C and C++ by including the following statement at the beginning of all your programs:
 

 

use strict;
 

 

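Now you have to declare all of your variables using my, and perl will stop with a compile-time error if you use a variable you haven't declared (a misspelled variable name, for example). Want a variable every function can see? Declare it with my outside any function, near the top of the file; its scope runs from there to the end of the file!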
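A sketch of a program written under use strict, assuming a file-scoped my variable is what you want for shared state; bump() and $counter are hypothetical names. shift with no argument inside a function removes and returns the first element of @_.

```perl
use strict;
use warnings;

my $counter = 0;        # file-scoped lexical: visible to everything below it

sub bump {
    my $step = shift;   # lexical to this function: the first argument
    $counter += $step;
    return $counter;
}

bump(2);
bump(3);
# A typo such as "$countr += 1;" would not compile under "use strict": perl
# reports an error beginning 'Global symbol "$countr" requires explicit package name'.

print "$counter\n";
```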
 

 

Basic I/O

How do you read in user input? Use <STDIN>:
 

 

$a = <STDIN>; # read next line up to and including the newline

# $a will be undefined if no more lines.
 

 
 
 
 
 

To keep reading until the input runs out, test for definedness:
 

 

while (defined($a = <STDIN>))
{
    # do stuff to $a
}
 

 
 
 

Because perl programmers are lazy, you can shorten this even more. When the input operator ("<...>") is the only thing in a while condition, perl automatically assigns each line to the special scalar "$_" and tests it for definedness. And many functions use "$_" when you don't name a variable, so you often don't even have to state it!
 

 

while(<STDIN>) # or "while(defined($_=<STDIN>))"

{

chomp; # or "chomp($_);"

...

}
 

 

What if you just want to read everything up to end-of-file (EOF) into a buffer? Assign the input operator to an array; in list context it returns all the remaining lines, one line per element:
 

 

@buffer = <STDIN>;
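To try the reading idioms without typing input, this sketch opens a filehandle on a string (perl 5.8 and later can open a reference to a scalar as an in-memory file) as a stand-in for STDIN. It also uses a lexical filehandle ($fh) and the three-argument form of open, a more modern style than the bareword handles used in this primer; both are substitutions, not something the text above requires.

```perl
use strict;
use warnings;

my $fake_input = "first line\nsecond line\nthird line\n";

# Line at a time: while (<$fh>) assigns each line to $_ and stops at end of file.
open(my $fh, '<', \$fake_input) or die "cannot open in-memory file: $!";
my @seen;
while (<$fh>) {
    chomp;               # strip the newline from $_
    push @seen, $_;
}
close($fh);

# All at once: in list context the input operator returns every remaining line.
open($fh, '<', \$fake_input) or die "cannot open in-memory file: $!";
my @buffer = <$fh>;
close($fh);

print scalar(@buffer), " lines read\n";
```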
 

 
 
 

Command line

"<>" reads, one line at a time, the files named on the command line, or standard input if none are given.
 

 

perl "cat":
 

 

#!/usr/local/bin/perl

while(<>) { print $_;}
 

 
 
 
 
 

invoke just like /bin/cat:

./cat.pl file1 file2 file3
 

 
 
 

Output

To print, use "print"; for formatted output, use "printf":
 

 

printf("%15s %5d %10.2f\n", $s, $n, $r);

# a string at least 15 characters wide, a space, an integer at least 5 wide, a space,
# a float at least 10 wide with 2 digits after the decimal point, a newline (all right-justified).
 

 
 
 

The formatting options for perl's printf are largely the same as those in the printf(3) manual page; "perldoc -f sprintf" lists perl's exact set.
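sprintf takes the same format string as printf but returns the result instead of printing it, which makes it easy to see exactly what a format produces. A minus sign after the % left-justifies a field; the values here are arbitrary.

```perl
use strict;
use warnings;

my $s = "perl";
my $n = 42;
my $r = 3.14159;

# "%-6s" left-justifies in 6 columns, "%5d" right-justifies in 5,
# "%8.2f" right-justifies in 8 with 2 digits after the decimal point.
my $line = sprintf("%-6s|%5d|%8.2f", $s, $n, $r);
print "$line\n";   # perl  |   42|    3.14
```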
 

 
 
 

File I/O

<STDIN> is just a filehandle for standard input. You can create your own filehandles using the open() function.
 

 

Open file for output:
 

 

open (OUT, ">output");

print OUT "This is the output";
 

 

What if "output" can't be opened?
 

 

unless (open(OUT, ">output"))
{ die "Could not open file \"output\": $!"; } # $! holds the system's reason
 

 

Or just simply:
 

 

open(OUT, ">output") || die "Could not open file \"output\": $!";
 

 
 
 

Open file for reading:
 

 

open(IN, "<input") || die "Could not open file \"input\": $!";
while (<IN>)
{
    # do something with $_
}
 

 

And just like the shell's test command, perl has file test operators for conditionals:
 

 

-r: File or directory is readable

-w: File or directory is writable

-x: File is executable

-o: File is owned by the effective user id

-e: File exists

-f: File is a plain file

-d: File is a directory

-l: File is a symbolic link

-T: File is an ASCII or UTF-8 text file (a heuristic guess)

-B: File is binary (the opposite guess to -T)
 

 
 
 

An example:
 

 

if (-r "output") { open(IN, "<output"); }
else { print "you are not able to read file \"output\"\n"; }
 

 

A special switch:
 

 

-s: File exists and has nonzero length; the return value is the size in bytes

(if the file doesn't exist or has zero length, -s returns a false value instead,
so both cases fail a test condition)
 

 

So:
 

 

if( $size = -s $some_file_name)

{

print "File:$some_file_name has $size bytes.\n";

}

else { print "$some_file_name does not exist or is empty\n"; }
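A sketch tying the pieces together: write a file, test it, and read it back. To avoid touching real files it works inside a temporary directory from the core File::Temp module, and it uses lexical filehandles with three-argument open instead of barewords; those choices are assumptions of the sketch, not requirements.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # throwaway directory, removed at exit
my $path = "$dir/output";

open(my $out, '>', $path) or die "Could not open \"$path\": $!";
print $out "This is the output\n";
close($out) or die "Could not close \"$path\": $!";

my $size    = -s $path;             # size in bytes: 19
my $exists  = -e $path ? 1 : 0;
my $is_file = -f $path ? 1 : 0;

open(my $in, '<', $path) or die "Could not open \"$path\": $!";
my $first_line = <$in>;
close($in);

print "File:$path has $size bytes.\n";
```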
 

 
 
 

Process I/O

You can open processes for I/O similar to how you open a file.
 

 

Open process for reading:
 

 

open(WHO, "who |");
 

 

Open process for writing:
 

 

open(LP, "| lp -d myprinter");
 

 

And thus to print out everybody who is currently logged in:
 

 

while(<WHO>) { print LP $_; }

close(WHO); close(LP); # closing LP lets lp finish the print job
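Perl 5.8 and later on UNIX also accept a list form of open, where "-|" means read from the command and "|-" means write to it, and the command's arguments are passed without going through the shell. The sketch uses $^X (the path of the running perl) as the command so it does not depend on who or lp being installed.

```perl
use strict;
use warnings;

# Read the output of a command, one line per element.
open(my $cmd, '-|', $^X, '-e', 'print "alpha\nbeta\n"')
    or die "cannot start command: $!";
my @lines = <$cmd>;
close($cmd) or die "command failed: exit status $?";

print "got ", scalar(@lines), " lines\n";
```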
 


File and Directory manipulation

Directory access

Use chdir() function to change current working directory:
 

 

chdir("/var/spool") || die "cannot cd to /var/spool";
 

 
 
 

Globbing

The following will store the name of every file in "/etc" (except hidden files whose names start with ".") as an entry in an array:
 

 

@files = </etc/*>;
 

 
 
 

Removing a file

Use the unlink() function (named after the UNIX system call):
 

 

unlink("output");

unlink("/var/adm/messages");

unlink <*.o>; # remove all object files in the current working directory
 

 

Renaming files

Use rename() function:
 

 

rename("output","output.old");
 

 

Creating a directory

Use mkdir(); the permissions you give are further restricted by your umask:
 

 

mkdir("results", 0777);
 

 

Removing a directory

Use rmdir(); the directory must be empty:
 

 

rmdir("results") || die "could not remove directory \"results\": $!";
 

 

Modifying permissions

Use chmod():
 

 

chmod(0666,"output"); # read and write for all!
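A sketch of the whole sequence, run inside a throwaway directory from the core File::Temp module so nothing real gets removed. It uses full paths instead of chdir so the program's working directory never changes; the directory and file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);   # scratch area, removed at exit
my $dir  = "$base/results";

mkdir($dir, 0777) or die "could not create \"$dir\": $!";   # mode is reduced by the umask

open(my $fh, '>', "$dir/output") or die "could not create file: $!";
close($fh);

rename("$dir/output", "$dir/output.old") or die "rename failed: $!";
chmod(0644, "$dir/output.old") or die "chmod failed: $!";   # returns number of files changed

my @found   = glob("$dir/*");       # the same globbing as <...>
my $removed = unlink(@found);       # returns the number of files deleted
rmdir($dir) or die "could not remove directory \"$dir\": $!";
my $dir_gone = -d $dir ? 0 : 1;

print "removed $removed file(s)\n";
```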
 


Regular expressions

If we were looking for every line in a file that contained the string "perl", we would invoke grep as follows:
 

 

grep perl file
 

 

Similarly, if we wanted to check whether $_ (the special variable that holds the current line when you loop with while (<...>)) contained the string "perl", we would use the following conditional:
 

 

if (/perl/)

{ print $_; }
 

 

A program that behaves like "grep" -> grep.pl:

#!/usr/local/bin/perl -w
 

 

$PATTERN = $ARGV[0];

open(FILE,"<$ARGV[1]") || die "Could not open $ARGV[1].\n";
 

 

while(<FILE>)

{

if (/$PATTERN/)

{

print $_;

}

}
 

 
 
 

Patterns

If we wanted to match any string containing a four-character sequence that starts with "p" and ends with "l" (such as "perl" or "pool"; note that it need not be a whole word), use:
 

 

if (/p..l/)
 

 

"." matches any single character except for the newline ("\n").
 

 

If we wanted to match a "p" followed later on the same line by an "l", with any number of characters in between, use the following (".*" is greedy, so it stretches from the first "p" to the last "l" it can reach):
 

 

if (/p.*l/)
 


"." matches any single character except newline.

"a" matches a single letter "a".

".*" matches any number of characters (other than newline), including none at all!

"b*" matches zero or more consecutive "b"'s.

".+" matches any number of characters (other than newline), but at least one.

"c+" matches one or more consecutive "c"'s.

".?" matches zero or one character (other than newline).

"\>?" matches nothing or a single greater-than symbol.
 

 

What if you want to match 0 to 5 consecutive "%"'s?
 

 

if(/\%{0,5}/) # careful: zero "%"s also counts, so this matches any string
 

 

To match 4 to 6 consecutive "G"'s:
 

 

if(/G{4,6}/)
 

 

To match at least 3 "-"'s:
 

 

if(/\-{3,}/)
 

 

So "*" can be replaced with "{0,}", "+" with "{1,}", and "?" with "{0,1}"!
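A sketch that checks several of these patterns against sample strings (1 means the pattern matched). The last line relies on the fact that a match in list context returns what the parentheses captured, and shows that .* is greedy.

```perl
use strict;
use warnings;

my $four_char = ("pool table" =~ /p..l/)     ? 1 : 0;  # "pool" fits p..l
my $too_short = ("pl"         =~ /p..l/)     ? 1 : 0;  # needs two characters in between
my $g_four    = ("GGGG"       =~ /G{4,6}/)   ? 1 : 0;
my $g_three   = ("GGG"        =~ /G{4,6}/)   ? 1 : 0;
my $zero_ok   = ("no percent" =~ /%{0,5}/)   ? 1 : 0;  # zero repeats allowed: always matches
my $anchored  = ("%%%%%%"     =~ /^%{1,5}$/) ? 1 : 0;  # six "%"s fail once anchored

# .* is greedy: the capture runs from the first "p" to the last "l".
my ($greedy) = "apple pearl" =~ /(p.*l)/;

print "$four_char $too_short $g_four $g_three $zero_ok $anchored [$greedy]\n";
```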
 

 

To match a string containing a word starting with "p", followed by a single lowercase vowel and ending with "rl", use:
 

 

if(/p[aeiou]rl/)
 

 

The converse is a string containing a "p", followed by any single character other than a lowercase vowel, followed by "rl", which is tested for as follows:
 

 

if(/p[^aeiou]rl/) # will match "pErl", but not "perl"!
 

 
 
 

To match the string "perl" followed by a single digit, we could use:
 

 

if(/perl[0123456789]/)
 

 

Or more simply:
 

 

if(/perl[0-9]/)
 

 

To match "perl" followed by any single character that is not a digit (so "perl" at the very end of a string does not match):
 

 

if(/perl[^0-9]/)
 

 

How would you match a hex number (written the way perl writes it, e.g. 0xef) of arbitrary length?
 

 

Word boundaries

What if we wanted to match "fred", but not "frederick"? Use "\b" to indicate where a word should start or end:
 

 

if(/fred\b/) # matches "fred" or "alfred", but not "frederick"
 

 

And:
 

 

if(/\bfred/) # matches "fred" or "frederick", but not "alfred"

if(/\bfred\b/) # matches only "fred"!
 

 

Conversely, if we wanted to match "frederick" but not "fred", use "\B" to indicate where a word boundary must not occur:
 

 

if(/fred\B/) # matches "frederick", but not "fred" or "alfred"

if(/\Bfred/) # matches "alfred", but not "fred" or "frederick"

if(/\Bfred\B/) # matches "alfrederick", but not "fred", "alfred",

# or "frederick"
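Running each pattern over the same four names makes the differences easy to see; grep with a block keeps the elements for which the pattern matches.

```perl
use strict;
use warnings;

my @names = qw(fred alfred frederick alfrederick);

my @fred_end   = grep { /fred\b/ }   @names;   # fred alfred
my @fred_start = grep { /\bfred/ }   @names;   # fred frederick
my @fred_only  = grep { /\bfred\b/ } @names;   # fred
my @fred_inner = grep { /\Bfred\B/ } @names;   # alfrederick

print "@fred_end | @fred_start | @fred_only | @fred_inner\n";
```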
 

 

What if we wanted to match only words that end in "fred", but not words that start with or merely contain "fred"?
 

 

Matching on a variable besides "$_"

So far we have only matched patterns implicitly against the "$_" scalar variable. "=~" is the binding operator: it applies the regular expression on its right to the scalar on its left and returns true if the pattern matches somewhere in that scalar, false otherwise. So:
 

 

if(/perl/)
 

 

Could be rewritten as:
 

 

if($_ =~ /perl/)
 

 

And if we wanted to check if scalar $a contains the string "perl":
 

 

if($a =~ /perl/)
 

 
 
 
 
 

A replacement for "sed": substitutions

So you've found a match, now what? If you want to replace the matched text with something else, use the "s" (substitute) operator:
 

 

$a = "sed";$b = "awk";

$a =~ s/sed/perl/; # $a now contains "perl";

$b =~ s/sed/perl/; # $b still contains "awk";
 

 
 
 

What happens with the following code?
 

 

$a = "sed sedative malposed horsedom";

$a =~ s/sed/perl/;
 

 

Does $a now contain "perl perlative malpoperl horperldom"? No, the "s" operator only replaces the first match, giving "perl sedative malposed horsedom". To replace every match, add the "g" (global) modifier:
 

 

$a =~ s/sed/perl/g; # $a now contains "perl perlative malpoperl horperldom"
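A sketch of both forms; in addition, s/// returns the number of substitutions it made, which is handy for counting.

```perl
use strict;
use warnings;

my $first = "sed sedative malposed horsedom";
$first =~ s/sed/perl/;                  # only the first "sed" is replaced

my $all = "sed sedative malposed horsedom";
my $count = ($all =~ s/sed/perl/g);     # s///g returns the number of replacements: 4

print "$first\n$all\n$count\n";
```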
 

 
 
 

A replacement for awk: split() (and join())

split() is a function that takes a regular expression and a scalar, and returns a list. It finds every match of the regular expression in the scalar and returns the pieces between the matches as a list (empty trailing fields are discarded). The following code splits a scalar containing a phrase into a list of words:
 

 

$phrase = "The rain in spain, falls mainly on the spaniards";

@words = split(/ /,$phrase);

# @words is ("The","rain","in","spain,","falls","mainly"...)
 

 

The converse of split is join, which takes a separator string and a list, "glues" the elements of the list together with the separator, and returns the result as a scalar. Thus if we wanted to glue our newly split-up array @words back together with pound signs ("#"):
 

 

$phrase = join("#",@words);

#$phrase is "The#rain#in#spain,#..."
 

 

Of course, we could have done the above five lines of code (I'm counting comments as lines of code) with a simple global substitution:
 

 

$phrase = "The rain in spain, falls mainly on the spaniards";

$phrase =~ s/ /#/g;
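One difference between splitting on / / and awk's field splitting: a pattern of a single space produces empty fields wherever spaces are doubled. split with the string ' ' (a single space in quotes, not a pattern) is a documented special case that splits on runs of whitespace and ignores leading whitespace, like awk. The input line is made up for illustration.

```perl
use strict;
use warnings;

my $line = "  alpha  beta gamma";

my @by_space = split(/ /, $line);   # ("", "", "alpha", "", "beta", "gamma")
my @awk_like = split(' ', $line);   # ("alpha", "beta", "gamma")
my $glued    = join("#", @awk_like);

print scalar(@by_space), " vs ", scalar(@awk_like), " fields: $glued\n";
```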
 

 

The following code reads the password file one line at a time and prints every login name (the first field in an /etc/passwd entry) with its corresponding GECOS string (the 5th field):
 

 

open(PASSWORD, "</etc/passwd") || die "Could not open /etc/passwd: $!";
while (<PASSWORD>) # or "while (defined($_ = <PASSWORD>))"
{
    @fields = split(/:/); # where's the scalar? split uses $_ by default
    print "$fields[0] account is owned by $fields[4].\n";
}
close(PASSWORD); # Close files when they aren't needed anymore!
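One gotcha when splitting password entries: split drops empty trailing fields, so an entry whose last field (the shell) is empty comes back one field short. Passing a negative limit as the third argument keeps them. The entry below is hypothetical.

```perl
use strict;
use warnings;

my $entry = "guest:x:1003:10:Guest User:/home/guest:";   # empty shell field

my @default = split(/:/, $entry);       # trailing empty field dropped: 6 fields
my @all     = split(/:/, $entry, -1);   # negative limit keeps it: 7 fields
my $shell   = length($all[6]) ? $all[6] : "(none)";

print scalar(@default), " vs ", scalar(@all), " fields, shell: $shell\n";
```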
 

 

Perl Lab


 
 

1. Write a "grep" that straddles lines.

Normal grep takes a regular expression and a file, and returns any line that contains the regular expression. But what to do if your editor has automatic word wrapping (and maybe even hyphenation!)? Write a version of grep that checks if the regular expression straddles two consecutive lines (don't forget to discard trailing newlines and hyphens!) and returns both lines if the pattern straddles them, but only the line the pattern is contained in if it doesn't straddle.
 

 
 
 

2. Write a /etc/passwd user shell utility.

/etc/passwd is a file that contains, among other things, the login shell for each user (the shell is the seventh field). Write a script that reads /etc/passwd and lists each user's shell without its directory path (HINT: this will require a split() followed by another split()). If the user doesn't have a valid shell (the 7th field in the password file is empty or names a directory), mention that. So for an /etc/passwd file that looks like:
 

 

root:x:0:1:Super-User:/:/sbin/sh

daemon:x:1:1::/:/usr/bin/true

mfroomin:x:1001:10:Marty Froomin:/export/home/froomin:/export/home/froomin

wleung:x:1002:10:Winnie Leung:/export/home/wleung:
 

 

The output should be:
 

 

User root uses "sh".

User daemon uses "true".

User mfroomin doesn't have a valid shell: directory!

User wleung doesn't have a valid shell: undefined!