Software 
Uniqueness: How and Why 
 
Puneet Mishra and Mark Stamp
Department of Computer Science
San Jose State University
San Jose, California
  
 
ABSTRACT: In this paper we consider some of the possible security implications of unique software. We are interested in the case where each instance of a piece of software is functionally equivalent, but the internal structure is as distinct as possible. The potential security benefits of such software include improved resistance to viruses and an increased difficulty of certain types of attacks based on reverse engineering the code. The reverse engineering problem is particularly relevant in the field of digital rights management (DRM) and, more generally, whenever we must hide a secret in software. We discuss secret hiding and other possible security applications for unique code and we briefly consider the use of this type of uniqueness as a software watermark. We then present a partial taxonomy (in summary form) of several types of code transformations and we briefly discuss the problem of automatically generating unique instances of code using these transformations. We also mention several research topics and future work.
Keywords: software uniqueness, digital rights management, DRM, virus protection, digital watermark, key hiding
1. Introduction
In this paper we discuss code transformations that can be applied to arbitrary C [1] or C++ [2] source code to yield distinct instances of the input code. We require that the resulting distinct copies are functionally equivalent to the original code. Our goal is to make instances of the transformed code as unique as possible. If instances of the code are sufficiently distinct, we expect that the work required to reverse engineer N instances will be linear in N.
To illustrate the potential usefulness of unique software, consider the following simple example. Suppose we have software that is used to authenticate a user. Regardless of the authentication method employed, the result is ultimately a 1-bit answer, say, 1 for “success” and 0 for “failure.” An attacker can reverse engineer this authentication software and determine where the software makes this 1-bit comparison. Then by simply setting this comparison bit to “1”, the attacker can gain access regardless of the authentication information supplied.
Now if each instance of this authentication software is identical, then an identical 1-bit attack can be applied to each instance (i.e., set the comparison bit to 1). On the other hand, if each instance of the software were distinct (though functionally equivalent), an attack on one instance of the software would only apply to that particular instance. An attack on another instance of the software would require another reverse engineering effort.
Perhaps after reverse engineering multiple instances of the software the attacker could gain enough insight into the transformation process to develop a general attack that would succeed against any instance of transformed software. However, this is almost certainly a far greater challenge than reverse engineering a single instance of the software. Provided that this is the case, we have increased the work required by an attacker to break an arbitrary instance of the software. Of course, the precise level of difficulty depends on the way that the uniqueness occurs.
The approach being advocated here is to make each instance of a particular piece of software unique. Note that the goal is not to prevent an individual instance of the software from being compromised. Instead, the goal is to have each instance of the software be a distinct reverse engineering problem for an attacker. In order for this approach to succeed, we propose to develop a large collection of elementary transformations and apply these in sufficiently random combinations so that each instance of code is unique with a high probability. Furthermore, the uniqueness must be variable enough so that diagnosing the uniqueness process from a number of reverse engineered instances of the code is extremely difficult. Since we are applying transformations to source code, we must also insure that the transformations will survive an optimizing compiler.
We have selected 
the C language as the language in which these transformations will be 
applied. One reason for selecting C is that it is inherently procedural. 
The use of C is not a significant restriction, since similar transformations 
could be developed for other programming languages. However, from a 
security standpoint, it would be preferable to develop these transformations 
in assembly code [3]. For simplicity and clarity, we have chosen a high-level 
language. In the future, we intend to extend this work to the assembly 
language case. 
Plagiarism 
detection tools provide one possible method for testing the robustness 
of uniqueness transformations. We could apply such a test to several 
unique instances of a particular piece of code. Another interesting 
code comparison technique compares programmer’s style and uses this 
as the basis for identifying programs written by a particular individual 
[4]. There are many other possible tests for program similarity.  
Program equivalence 
is known to be an NP-hard problem. However, we are claiming that each 
of our transformations produces the same output for any given input 
as the parent code would produce. Thus, we only need to verify that 
the individual transformations result in code that is functionally equivalent 
to the original code. If this is the case then the program equivalence 
issue can be avoided. 
2. Potential Applications
In this section 
we discuss some of the possible applications of program uniqueness. 
The listed applications are not intended to be exhaustive. In fact, 
we view uniqueness as an extremely general tool that has potential applications 
in many areas of computer security. (See, for example, [5] for a discussion 
of similar ideas in the context of operating systems.) 
2.1 Key Protection
A good candidate 
application for uniqueness is software that stores a key or other secret 
information. In a typical (non-unique) scenario if an attacker manages 
to extract the key from one instance of the software, the same attack 
can be used to extract the key from any other instance of the software. 
However, if each copy of the software derives from distinct source code─which 
results in significantly different executable code─it would make an attack on any instance 
of the software nearly as difficult as the original attack. 
2.2 Digital Rights Management
Digital Rights 
Management (DRM) is an important technology being promoted by major 
software companies such as Microsoft®. Software uniqueness has been touted 
as a useful ingredient in software-based DRM [6]. 
Recently, several 
DRM systems have failed dramatically when exposed to attacks (see [6] 
for a discussion of failed DRM products from Adobe® and Microsoft®, among others). None of these failed 
systems employed any uniqueness. Consequently, hackers who were able 
to reverse engineer a single instance of the software were assured that 
any attack they developed would apply universally to all instances of 
that particular product.  
If, on the 
other hand, a DRM system employs software uniqueness, it is highly unlikely 
that a single reverse engineering effort will be sufficient to break 
all instances of the software. Consequently, it is unlikely that an 
attacker who breaks one instance of such DRM software will have broken 
the entire system. Furthermore, it is likely that the work factor required 
to break the entire system will be dramatically higher than in the non-unique 
case. Of course, this same argument holds for many other software systems. 
But the implications in the DRM field are perhaps more readily apparent. 
Any DRM system designed to enforce copyright that is prone to systemic 
failure─as 
opposed to sporadic losses due to attack─is not likely to be trusted to protect 
valuable digital content. 
As an aside, 
we mention the Trusted Computing Platform Association (TCPA) [7], which 
is an effort by several leading IT companies to develop a hardware-based 
DRM solution. Presently, there is a rancorous debate over the privacy 
and fair use implications of such technology [8]. If DRM cannot be made 
more robust in software, it seems likely that the TCPA effort will eventually 
succeed since many powerful economic forces are pushing the technology; 
see [6] and [9] for much more information on this and related DRM issues. 
2.3 Software Watermarks
Unique code 
could also serve as a watermark to distinguish versions of the code. 
In principle, we could create a unique version of a piece of software 
based on some key. Then if a pirated binary version of the code were 
discovered, we could read the watermark to determine the source. In 
our scheme, the uniqueness is inserted at the source code level but 
we would like to read the watermark from the object code. Consequently, 
there are issues to be resolved. We are currently actively investigating 
this possible use of software uniqueness. 
Another potentially 
useful watermarking idea is to insert arbitrary values at certain points 
in unique instances of code. These unique values could serve as a runtime 
watermark. For example, at various points we could compute a hash of 
the watermark values, which would result in a unique identification 
key for a program. Of course, a similar thing can be done in the non-unique 
case. But in the unique case the identifier would be more robust, since 
each instance of the software would require a distinct watermark removal 
attack. 
2.4 Code Obfuscation
Code obfuscation 
is a technique used to make the reverse engineering of software more 
difficult [10]. For example, this technique could be used to make it 
more difficult for competitors to easily gaining an insight into an 
application. Obfuscation could also be employed in an effort to prevent 
malicious users from easily understanding or modifying an application. 
Note, however, that obfuscation does not imply uniqueness, since a non-unique 
piece of software could be highly obfuscated. On the other hand, our 
usage of uniqueness is intended to provide a degree of obfuscation and 
therefore can be viewed as yet another obfuscation technique. Also, 
when applying uniqueness transformations, it would be reasonable to 
further obfuscate each unique versions of software by applying obfuscating 
transformations, such as those discussed in [10]. 
2.5 Anti-Piracy
The software 
industry has been battling piracy for many years. It is estimated that 
in 1998 the U.S. software industry lost over $2.9 billion in sales within 
the U.S. and $11 billion internationally due to software theft [11]. 
This number has undoubtedly grown in the last few years making the stakes 
even higher in the field of anti-piracy technologies. 
Tamper resistant 
software (TRS) [12,13,14] is one idea that has emerged from research 
in anti-piracy. A TRS system employs a variety of techniques that cause 
an application to crash or otherwise misbehave if it detects that someone 
is trying to analyze the code while it is executing. Uniqueness can 
potentially improve the robustness of such systems by making attacks 
on the TRS system itself more difficult to duplicate. 
2.6 Virus Protection
A computer 
virus─analogous 
to a biological virus─is a malicious program that creates 
havoc on the host computer and replicates itself in order to attach 
itself to other programs. The goal of the virus is to infect other hosts.  
A computer 
virus can display polymorphism [15], i.e., it can evolve as it spreads. 
Typically, anti-virus software scans for viruses by using an elementary 
pattern matching approach. Polymorphic viruses attempt to avoid detection 
by changing their pattern when they replicate. Thus, as in a natural 
biological system, the “genetic diversity” of the virus population 
increases. Software uniqueness offers the potential to employ genetic 
diversity to defend against viruses. A virus will usually attach itself 
to a specific position in the code it is attacking. If the software 
under attack has no genetic diversity (as is nearly universally the 
case today) then the virus will be able to attack every instance of 
the software in precisely the same manner. However, if the software 
under attack displays genetic diversity (i.e., uniqueness), then the 
virus cannot simply attach itself to a specific point in the software 
since its desired entry point will appear at a different point─and 
in a different form─in each instance of the software. In 
this situation, the virus has a much more difficult task to infect a 
particular piece of software. It is likely that a large fraction of 
instances of unique software will have a natural immunity to any given 
virus. The analogy to biological systems is substantial and bears further 
investigation. 
3. TRANSFORMATIONS
In this section we present a brief list of transformations that can be applied to C code to obtain unique versions of code. These transformations are merely a representative sample of the possible transformations and are not intended to provide an exhaustive list. The transformations have been classified into categories based on their functional similarity and complexity. For each of these categories we list several examples of representative transformations.
III. COMPILER BASED
Compilers are 
a critical element in the implementation of source code transformations. 
Many potential source code transformations would be removed in the compilation 
process.  Such transformations would have no effect on the resulting 
binary code. This would, of course, defeat the purpose of the code transformations. 
Ideally, we would write our own compiler that only performs optimizations 
that do not affect our desired transformations. Such a compiler would 
be the logical location to insert the transformations as well. 
IV. FLOW AND STRUCTURE
V. Miscellaneous
For example, insertion of operating 
system calls at various points in the code could be used to add uniqueness. 
The negative aspect of this technique is that it would make the resulting 
code platform dependent.   
4. GENERATING UNIQUE CODE
In this section we discuss 
the issues that must be confronted when trying to automatically generate 
distinct instances of code. A complete implementation of just the limited 
set of code transformations discussed above is a non-trivial task. The 
simpler of these transformations (for example, the insertion of dummy 
code) do provide us with uniqueness. But is not clear which transformations 
(or combination of transformations) are most likely to cause the greatest 
difficulty for a motivated attacker. We discuss this further in the 
next section. 
We are currently developing a tool that will implement a substantial subset of the transformations listed above. Our tool is essentially a parser [16] that looks for particular patterns in the code and inserts a transformation when an appropriate location is found.
4. CONCLUSION
The ideas of software uniqueness discussed in this paper appear to have significant potential utility in many applications, including key hiding software, DRM systems and software watermarking. However, it appears to be difficult to quantify the strength of this approach based solely on theoretical considerations. Consequently, the automated implementation that we are developing will be valuable in making such an assessment. When this tool is completed, we expect that we will be able to produce substantial empirical data to further bolster our claims of the security benefits of unique software. Various methods can then be employed to measure the variability in the resulting unique code and realistic testing can be conducted to determine the difficulty of attacking systems based on such code.
Mark Stamp 
has spent well over a decade working in security. He can neither confirm 
nor deny that he spent seven years as a cryptanalyst at the National 
Security Agency. However, he can confirm that he recently spent two 
years developing a DRM product for MediaSnap, Inc., a small Silicon 
Valley startup company. Dr. Stamp’s current research interests include 
security, networks, algorithms and DRM. 
Puneet Mishra received his Bachelor of Engineering, Computer Technology degree from India and attended the National Center for Software Technology, Mumbai, India. His primary area of interest is in the field of computer security.