Version spaces

Version spaces are used to learn a concept from positive and negative examples; in other words, a predicate, or binary classification, is being learned. This is supervised learning, since the instances must be labeled, but the learner can generate the instances itself.

If the domain of the predicate has size n, then there are 2^n possible concepts. If every one of these concepts is to be learnable, the learner must see the labels of all n instances. This would be rote learning and would not allow generalization, so the space of allowable concepts needs to be restricted. One reasonable way to do this is linguistically, by restricting the set of possible concept descriptions. This can be done by allowing only conjunctions, and perhaps negations, of primitive features. Note that allowing disjunction as well may make every concept representable, which would defeat the purpose of the restriction. In a playing-card example, each of the numbers and suits could be a feature, along with red, black, face, and numbered. The nice thing about this set is that the suits and colors are independent of the other features; in other words, the set of concepts factorizes.

A version space is the set of concepts consistent with a collection of positive and negative instances. In Mitchell's candidate elimination algorithm, a version space is represented by its maximally general elements and its maximally specific elements; these two sets may be called G and S. Any element of the version space thus lies between some member of G and some member of S. The algorithm depends on being able to compute the immediate specializations and immediate generalizations of any concept, which is easier if the version space factorizes.

Given a positive instance: we update G by removing any members that don't cover the instance, and we update S by replacing its elements with their immediate generalizations until every element of S covers the positive instance (and is still more specific than some element of G). The treatment of a negative instance is completely symmetric.
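The update rules above can be sketched in code. This is a minimal illustration, not Mitchell's original implementation: it assumes conjunctive hypotheses represented as tuples of attribute values, with '?' as a per-attribute wildcard, and a toy card domain (colour and kind) invented for the example. Pruning of duplicate or non-maximal boundary members is omitted for brevity.

```python
# Sketch of candidate elimination over conjunctive hypotheses.
# A hypothesis is a tuple with a concrete value or '?' per attribute.

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers everything h2 covers."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Immediate generalization of s that covers positive instance x."""
    return tuple(a if a == v else '?' for a, v in zip(s, x))

def specializations(g, x, domains):
    """Immediate specializations of g that exclude negative instance x."""
    for i, a in enumerate(g):
        if a == '?':
            for v in domains[i]:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

def candidate_elimination(examples, domains):
    # Assume the first example is positive; it seeds S.
    (x0, _), rest = examples[0], examples[1:]
    S = [tuple(x0)]
    G = [tuple('?' for _ in domains)]
    for x, positive in rest:
        if positive:
            # Drop members of G that don't cover the instance;
            # generalize S until it does, staying below some g in G.
            G = [g for g in G if covers(g, x)]
            S = [s if covers(s, x) else generalize(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            # Symmetric: drop members of S that cover the instance;
            # specialize G until it doesn't, staying above some s in S.
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    for g2 in specializations(g, x, domains):
                        if any(more_general_or_equal(g2, s) for s in S):
                            new_G.append(g2)
            G = new_G
    return S, G

# Toy domain: (colour, kind); the target concept is "red card".
domains = [('red', 'black'), ('face', 'numbered')]
examples = [
    (('red', 'face'), True),
    (('black', 'face'), False),
    (('red', 'numbered'), True),
]
S, G = candidate_elimination(examples, domains)
print(S, G)  # both boundaries converge to [('red', '?')]
```

After the three examples, S and G meet at ('red', '?'), so the concept "red card" has been identified exactly.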
Observations

A partial solution can already correctly classify some test instances. It is relatively easy to generate informative test instances: try to pick one that divides the remaining version space in half. It is also easy to tell when no further instances are required (unlike with many other algorithms): the space has converged when S equals G. The algorithm may be used as a component in other learning systems. On the other hand, the algorithm does not work in the presence of noise, and it depends on being able to efficiently find immediate specializations and generalizations.
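The first observation can be made concrete: if every maximally specific hypothesis in S covers an instance, every concept in the version space does, so the instance is positive; if no maximally general hypothesis in G covers it, none does, so it is negative; otherwise the label is still ambiguous, and such an instance makes a good next query. A minimal sketch, assuming the same tuple-with-'?' hypothesis representation as a conjunctive concept language, with illustrative S and G boundaries:

```python
# Classifying with a partial (unconverged) version space.

def covers(h, x):
    """True if hypothesis h (tuple of values or '?') covers instance x."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def classify(S, G, x):
    if all(covers(s, x) for s in S):
        return True            # every concept in the space covers x
    if not any(covers(g, x) for g in G):
        return False           # no concept in the space covers x
    return None                # ambiguous: a good instance to query next

S = [('red', 'face')]          # maximally specific boundary (illustrative)
G = [('red', '?')]             # maximally general boundary (illustrative)
print(classify(S, G, ('red', 'face')))      # True
print(classify(S, G, ('black', 'face')))    # False
print(classify(S, G, ('red', 'numbered')))  # None: splits the space
```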