Fonts--General information
Some Typesetting Terminology
To specify the appearance of text, you need to specify
· a typeface (for example, Times New Roman)
· a size (for example, 14 point)
· a style (roman, italic, bold, or bold italic)
When all three are specified, you have specified a font. That is, the printer would know which pieces of lead type he had to use (in the old days of lead type).
Among typefaces, we can distinguish those which use serifs (such as Times New Roman) and those which do not (such as Arial). A serif is a tiny cross-stroke, for instance at the bottom of an “i” or “m”, or a half cross-stroke, such as at the top of an “i”.
About font size: A “printer’s point” is 0.013837 of an inch. Following the point system devised by Pierre Simon Fournier, it is common practice to approximate a point as 1/72 inch.
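The 1/72-inch approximation makes it easy to work out how many device pixels a given point size occupies. The following is a minimal sketch in Python; the function name and the sample resolutions are chosen here for illustration and are not part of any Windows API.

```python
# Convert a font size in points to a size in device pixels,
# using the common approximation 1 point = 1/72 inch.

POINTS_PER_INCH = 72  # approximation; a printer's point is 0.013837 inch


def points_to_pixels(points, dpi):
    """Return the size in pixels of `points` on a device with `dpi` dots per inch."""
    return points * dpi / POINTS_PER_INCH


# The same 14-point font needs very different pixel sizes on a
# 96 dpi screen and on a 600 dpi printer.
for dpi in (96, 300, 600):
    print(f"14 pt at {dpi} dpi is about {points_to_pixels(14, dpi):.1f} pixels")
```

This is one reason a font designed pixel-by-pixel for the screen cannot simply be reused on a printer.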
Some Computer-specific Terminology
There are raster fonts, stroke fonts, TrueType fonts, and OpenType fonts. I will explain these one at a time.
Raster fonts are specified pixel-by-pixel. Each character is given by a two-dimensional array describing which pixels in a “character box” need to be inked in order to produce that character. In other words, each character is given by a monochrome bitmap.
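As a rough illustration of "a glyph is a monochrome bitmap", here is a sketch in Python. The 5x7 design for the letter T is made up for this example and does not come from any real font.

```python
# A hypothetical 5x7 raster glyph for the letter "T".
# '#' marks a pixel to ink, '.' marks a pixel left blank.
GLYPH_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def to_bitmap(rows):
    """Turn the picture into a two-dimensional array of 0/1 values."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def render(bitmap):
    """Return the bitmap as text, one line per row of pixels."""
    return "\n".join("".join("#" if px else " " for px in row) for row in bitmap)


bitmap = to_bitmap(GLYPH_T)
print(render(bitmap))
```

A raster font is essentially a table of such arrays, one per character, for one fixed size.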

Originally all computer fonts were raster fonts. But the problem with raster fonts is that they do not scale well. How would we make this character 1.5 times larger? However we do it, it won’t look very good. We can scale it to 2, 4, or 8 times bigger reasonably well, but not to 2.5 times bigger or 0.8 times bigger.
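The scaling problem can be seen with even a single row of pixels. The sketch below uses nearest-neighbour sampling, which is only one of several possible scaling methods and is chosen here as an assumption for illustration.

```python
def scale_nearest(bitmap, factor):
    """Scale a 0/1 bitmap by `factor` using nearest-neighbour sampling."""
    src_h, src_w = len(bitmap), len(bitmap[0])
    dst_h, dst_w = round(src_h * factor), round(src_w * factor)
    return [
        [
            bitmap[min(int(y / factor), src_h - 1)][min(int(x / factor), src_w - 1)]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


# One row of a glyph with two vertical stems, each one pixel wide.
row = [[1, 0, 0, 1]]

print(scale_nearest(row, 2)[0])    # [1, 1, 0, 0, 0, 0, 1, 1]
print(scale_nearest(row, 1.5)[0])  # [1, 1, 0, 0, 0, 1]
```

At a factor of 2 every source pixel becomes exactly two pixels, so both stems stay the same width. At 1.5 some source pixels are doubled and others are not, so two stems that were identical in the design come out with different widths.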
One of the main difficulties this causes is that printed output does not match what you see on the screen. People want WYSIWYG output, and raster fonts do not provide it.
Stroke fonts are specified as vector graphics. That is, a character is represented by a list of line segments (vectors). These fonts scale well to any size you like.
The problem with stroke fonts is that they are slow to display, since the vector graphics have to be converted into pixel information at run-time.
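To make the trade-off concrete, here is a small sketch of a stroke glyph and the run-time conversion to pixels. The glyph (an "L" drawn in a unit square) and the simple point-sampling rasterizer are both assumptions made for this example; a real system would use a proper line-drawing algorithm.

```python
# A hypothetical stroke glyph for "L", drawn in a 1x1 design square.
# Each segment is ((x0, y0), (x1, y1)); y grows downward.
STROKE_L = [
    ((0.0, 0.0), (0.0, 1.0)),  # vertical stem
    ((0.0, 1.0), (0.5, 1.0)),  # horizontal foot
]


def rasterize(segments, size):
    """Convert the segments to a size x size grid of 0/1 pixels."""
    grid = [[0] * size for _ in range(size)]
    steps = 2 * size  # sample densely enough that no pixel on a line is skipped
    for (x0, y0), (x1, y1) in segments:
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            grid[row][col] = 1
    return grid


# The same two segments can be rendered at any size, but the
# conversion loop has to run every time the glyph is drawn.
for size in (5, 10):
    for pixel_row in rasterize(STROKE_L, size):
        print("".join("#" if px else "." for px in pixel_row))
    print()
```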
TrueType fonts are meant to solve these problems. TrueType fonts are specified in vector-graphic form, like stroke fonts, but with an additional improvement: the components can be curves as well as line segments. The curves are specified by some points and some tangent lines. (They are Bezier splines, if you know what those are.) If you have used Adobe Illustrator you know about such curves. These fonts scale very well (to help them do so, the font file contains “hints” about how to adjust the curves at different sizes).
But when they are first “loaded”, they are converted to raster form at the specified size. These pixel arrays are stored in memory until the font is destroyed, and the font can then be rendered (drawn on the screen or printed) as fast as a raster font, but with the accuracy of a stroke font.
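For readers who have not met Bezier curves, the following sketch evaluates a quadratic Bezier segment, the kind of curve TrueType outlines are built from. The control points are made-up values; the lines from the first point to the middle point and from the middle point to the last point are the tangent lines at the two ends of the curve.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Return the point at parameter t (0 <= t <= 1) on the curve.

    p0 and p2 are the end points; p1 is the off-curve control point.
    """
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def flatten(p0, p1, p2, segments=8):
    """Approximate the curve by a list of points joined by straight lines."""
    return [quadratic_bezier(p0, p1, p2, i / segments) for i in range(segments + 1)]


# A hypothetical arch: starts at (0, 0), is pulled toward (1, 2), ends at (2, 0).
for point in flatten((0, 0), (1, 2), (2, 0), segments=4):
    print(point)
```

Because the curve is defined by a formula rather than by pixels, it can be evaluated at whatever resolution the target size requires.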
Note that in essence, this amounts to rendering a stroke font on a memory DC and then bit-blting it to the screen as required. But no effort on the programmer’s part is required to make this happen. Windows takes care of it, in the operating system, and no doubt takes care to maximize the efficiency of these operations.
Nowadays, enough TrueType fonts are available that a good rule of thumb is that you should use only TrueType fonts in your programs.
OpenType fonts are like TrueType fonts except that they can also use PostScript definitions of the characters. You don’t need to think about this--just treat OpenType fonts like TrueType fonts.
Selecting Available Fonts
When you write a program, you are able to select appropriate fonts to display information to the user. However, you must consider whether the font you want to use will be available on your user’s computer.
You have two choices:
1. Limit yourself to the fonts that come with every copy of Windows (in the target natural language).
2. Supply the fonts when you distribute your program. (Or, in the case of non-commercial programs, tell the user where s/he can download the fonts.)
In most cases, choice 1 is the way to go.
Glyphs
Technically, we should distinguish between a character, such as “a”, and a glyph, which is the concrete representation of a character in a specific font (with typeface, style, and size). What is stored in a raster font is a glyph as a bitmap. What is stored in a TrueType font is information from which glyphs can be constructed when the font is loaded. That is, the glyph is specified by the stored information, but the glyph itself is actually constructed only at run-time.
Fonts and Character Sets
There is often some confusion about the difference between a font and a character set.
A character set is an assignment of characters to numbers. For example, the ISO-LATIN1 character set assigns certain characters to the numbers 0-255, and is the character set in use by default in English and most Western European language versions of Windows (strictly speaking, Windows uses code page 1252, which matches ISO-LATIN1 except in the range 128-159). The numbers 0-127 are assigned according to the ASCII code, and the numbers 128-255 contain the accented and special characters required for all Western European languages except Greek. Thus all the fonts you see available in the font selection box of Microsoft Word (except the Symbol font) use the same character set. The Symbol font uses a different character set to produce mathematical symbols. Note, however, that the numbers are still in the range 0-255. So the actual character produced by rendering “character number 137” will depend on what character set the selected font is based on.
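The dependence on the character set is easy to demonstrate. The sketch below uses Python's built-in codecs for two code pages, which are an assumption of this example rather than anything the notes rely on: cp1252 is the Western European Windows code page, and cp437 is the code page of the original IBM PC, used in MS-DOS sessions.

```python
def character_for(number, charset):
    """Return the character that `charset` assigns to a number in 0-255."""
    return bytes([number]).decode(charset)


for number in (65, 137):
    windows_char = character_for(number, "cp1252")
    dos_char = character_for(number, "cp437")
    print(number, repr(windows_char), repr(dos_char))
```

Number 65 is “A” in both, since both agree with ASCII below 128. Number 137 is a per-mille sign in cp1252 but an e with a diaeresis in cp437.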
This picture shows an application in Windows Accessories, called Character Map, which allows you to inspect the character set of each installed font.
Unicode
If you want to use Chinese, Japanese, Korean, Hindi, or other Asian languages, one byte per character is not enough. Unicode is a standard which assigns a number to every character in every known human language (ancient and modern). Windows stores Unicode text in two-byte (16-bit) units, so most characters take two bytes, and a few rarer ones take four. Now if we actually want to print in Urdu, we will need a font that embodies a certain character set containing the Urdu characters. Font development lagged for some years after the definition of the Unicode standard, but now fonts do exist with appropriate character sets for many languages. Windows 95 and 98 were not fully able to use Unicode; but Windows 2000, NT 4.0, and XP are able to use Unicode. It is still not quite easy to write programs using Unicode, however.
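The storage cost of Unicode text can be checked directly. This sketch encodes a few sample characters in UTF-16, the 16-bit encoding Windows uses internally; the sample characters and the helper name are chosen here for illustration.

```python
def utf16_size(ch):
    """Return the number of bytes `ch` occupies when encoded as UTF-16."""
    return len(ch.encode("utf-16-le"))


samples = {
    "Latin capital A": "A",
    "Arabic-script alef (used in Urdu)": "\u0627",
    "Chinese character zhong": "\u4e2d",
    "emoji outside the 16-bit range": "\U0001F600",
}

for name, ch in samples.items():
    print(f"{name}: number {ord(ch):#x}, {utf16_size(ch)} bytes")
```

Most characters, including the Urdu and Chinese examples, take one 16-bit unit (two bytes). Characters whose number does not fit in 16 bits take two units.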
Character Sets Used by Fonts
All fonts use a character set. A character set contains punctuation marks, numerals, uppercase and lowercase letters, and all other printable characters. Each element of a character set is identified by a number.
Most character sets used in Windows are supersets of the U.S. ASCII character set, which defines characters for the 96 numeric values from 32 through 127. There are five major groups of character sets:
· Windows
· Unicode
· OEM (original equipment manufacturer)
· Symbol
· Vendor-specific
Windows Character Set
The Windows character set is the most commonly used character set in Win32 programming. It is essentially equivalent to the ANSI character set. The blank character is the first character in the Windows character set. It has a hexadecimal value of 0x20 (decimal 32). The last character in the Windows character set has a hexadecimal value of 0xFF (decimal 255).
Many fonts specify a default character. Whenever a request is made for a character that is not in the font, the system provides this default character. Many fonts using the Windows character set specify the period (.) as the default character. TrueType and OpenType fonts typically use an open box as the default character.
Fonts use a break character called a quad to separate words and justify text. Most fonts using the Windows character set specify that the blank character will serve as the break character.
OEM Character Set
The OEM character set is typically used in full-screen MS-DOS® sessions for screen display. Characters 32 through 127 are usually the same in the OEM, U.S. ASCII, and Windows character sets. The other characters in the OEM character set (0 through 31 and 128 through 255) correspond to the characters that can be displayed in a full-screen MS-DOS session. These characters are generally different from the Windows characters.
Symbol Character Set
The Symbol character set contains special characters typically used to represent mathematical and scientific formulas.
Vendor-Specific Character Sets
Many printers and other output devices provide fonts based on character sets that differ from the Windows and OEM sets, for example the Extended Binary Coded Decimal Interchange Code (EBCDIC) character set. To use one of these character sets, the printer driver translates from the Windows character set to the vendor-specific character set.
Kerning and String Width
In ancient days (twenty years ago), fonts were “fixed-width” or “monospaced”. This paragraph is written in a monospaced font. Notice that there is a lot of space around an “i” and that “m” takes the same amount of space as does “i”.
Now look at this sentence, which is not monospaced: a clear improvement.
But there are certain letter combinations that still would look bad if each character were just printed one after another. For example, “ij” or “ff”. Note carefully that the curly tail (“descender”) of the j goes under the i. Note that the curly top of the first f intrudes on the character box of the second f.
This adjustment of the space between particular pairs of letters is called kerning. It is accomplished by having a table of pairs of characters telling what space adjustment is required when that pair occurs as adjacent characters. Of course, this kerning table is font-dependent, and is determined by the font designer and stored with the character designs as part of the font.
Why do you as a programmer have to know this typesetting detail (that is, kerning)? Because: the width of a string is not equal to the sum of the widths of the characters, due to kerning.
That is why the FCL provides the MeasureString method, a member of the Graphics class, which can take a String and a font and return the width (and height) required to render the string in that font.
There are other functions which can get you the “average character width” of a font, but you cannot get the width of a string by multiplying the average character width by the number of characters. It won’t even be a useful approximation. This is not because of kerning, but because most fonts are not monospaced. But because of kerning, you can’t even get it by adding up the widths of the individual characters.
Thus: you can’t know how much space a string will take to display until you have a Graphics object and call MeasureString.
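The arithmetic behind this can be sketched with a toy font. All the widths and kerning adjustments below are invented for illustration; a real program would ask the system (for example through MeasureString) rather than keep its own tables.

```python
# Hypothetical metrics for one font at one size, in pixels.
ADVANCE = {"A": 13, "V": 13, "T": 12, "o": 10, "i": 5, "m": 15}

# Hypothetical kerning table: adjustment applied when the pair is adjacent.
KERNING = {("A", "V"): -2, ("V", "A"): -2, ("T", "o"): -1}


def naive_width(text):
    """Sum of the individual character widths, ignoring kerning."""
    return sum(ADVANCE[ch] for ch in text)


def string_width(text):
    """Character widths plus the kerning adjustment for each adjacent pair."""
    width = naive_width(text)
    for left, right in zip(text, text[1:]):
        width += KERNING.get((left, right), 0)
    return width


def average_estimate(text):
    """Average character width times the number of characters."""
    average = sum(ADVANCE.values()) / len(ADVANCE)
    return average * len(text)


print(naive_width("AVA"), string_width("AVA"))        # 39 35
print(string_width("iii"), string_width("mmm"))       # 15 45
print(round(average_estimate("iii"), 1))              # 34.0
```

Kerning makes "AVA" four pixels narrower than the sum of its characters, and the average-width estimate for "iii" is more than double the true width, which is the point made above about proportional fonts.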
Character Height
The concept of “font height” is fairly complex:

tmInternalLeading is for accents and diacritical marks.
tmExternalLeading is for the interline spacing.
tmHeight goes from the lowest descender to the top of the M, and a little bit beyond--it includes the internal leading space.
The total space between lines, the baseline-to-baseline distance, is equal to tmHeight + tmExternalLeading.
The maximum ascent and descent are different from the typographic ascent and descent. In TrueType and OpenType fonts, the typographic ascent and descent are typically the top of the "f" glyph and bottom of the "g" glyph.
Some manual entries mention “cell height”, which is tmHeight.
Some manual entries mention “character height”, which is tmHeight - tmInternalLeading.
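The relationships among these fields come down to two small formulas. The sketch below uses a stand-in record with the same field names and made-up pixel values; it is not the actual Windows structure, and the helper function names are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class TextMetrics:
    """Stand-in for the three fields discussed above (values in pixels)."""
    tmHeight: int
    tmInternalLeading: int
    tmExternalLeading: int


def line_spacing(tm):
    """Baseline-to-baseline distance: cell height plus external leading."""
    return tm.tmHeight + tm.tmExternalLeading


def character_height(tm):
    """Cell height without the space reserved for accents."""
    return tm.tmHeight - tm.tmInternalLeading


def line_top(tm, line_index, top_margin=0):
    """Vertical position of the top of the given line (line 0 is the first)."""
    return top_margin + line_index * line_spacing(tm)


# Hypothetical values for some font at some size.
tm = TextMetrics(tmHeight=16, tmInternalLeading=3, tmExternalLeading=1)
print(line_spacing(tm), character_height(tm))
print([line_top(tm, i, top_margin=10) for i in range(4)])
```

Placing successive lines at multiples of the line spacing, rather than of tmHeight alone, is what keeps descenders on one line from touching accents on the next.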
Here’s an example of output produced by various fonts and careful attention to the size and placement of text:

All this output is done character-by-character, switching fonts from italic to roman to symbol, and calculating the coordinates for each character to be printed.