A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. In the optimal scenario, the algorithm correctly determines the class labels for unseen instances. The first step is to determine the type of training examples: before doing anything else, the user should decide what kind of data is to be used as a training set.
In the case of handwriting analysis, for example, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting. The training set needs to be representative of the real-world use of the function: a set of input objects is gathered, and corresponding outputs are also gathered, either from human experts or from measurements. The next step is to determine the input feature representation of the learned function, since the accuracy of the learned function depends strongly on how the input object is represented.
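The transformation into a feature vector can be sketched as follows for the handwriting example. The specific features chosen here (ink density, top-heaviness, maximum stroke width) are hypothetical illustrations, not a standard feature set:

```python
# A minimal sketch of mapping a raw input object -- a handwritten character
# stored as a binary pixel grid -- into a small descriptive feature vector.
# The three features below are contrived for illustration.

def extract_features(bitmap):
    """Map a 2-D grid of 0/1 pixels to a feature vector."""
    rows = len(bitmap)
    cols = len(bitmap[0])
    ink = sum(sum(row) for row in bitmap)           # total inked pixels
    density = ink / (rows * cols)                   # fraction of inked pixels
    top_half = sum(sum(row) for row in bitmap[: rows // 2])
    top_heaviness = top_half / ink if ink else 0.0  # share of ink in the top half
    max_width = max(sum(row) for row in bitmap)     # widest horizontal stroke
    return [density, top_heaviness, max_width]

# Example: a crude 4x4 "T" shape.
t_shape = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(extract_features(t_shape))  # [0.625, 0.6, 4]
```

The number of features should not be too large, but should contain enough information to predict the output accurately.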
Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The user must then determine the structure of the learned function and the corresponding learning algorithm, and run that algorithm on the gathered training set. Some supervised learning algorithms also require the user to set certain control parameters.
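Running a learning algorithm on a gathered training set can be sketched with k-nearest neighbours, chosen here purely as an illustration; its parameter k is exactly the kind of control parameter the user must set:

```python
# A minimal sketch of applying a learning algorithm to a training set of
# (feature_vector, label) pairs. k-nearest neighbours is used for
# illustration; k is a user-chosen control parameter.

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda ex: sq_dist(ex[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Tiny 2-D training set: class 0 near the origin, class 1 near (5, 5).
train = [([0, 0], 0), ([1, 0], 0), ([0, 1], 0),
         ([5, 5], 1), ([4, 5], 1), ([5, 4], 1)]
print(knn_predict(train, [0.5, 0.5], k=3))  # near the origin -> 0
print(knn_predict(train, [4.5, 4.5], k=3))  # near (5, 5)     -> 1
```

Control parameters such as k are typically chosen by measuring performance on a held-out validation subset of the training data.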
Finally, the accuracy of the learned function must be evaluated: after parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set. A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. Imagine that we have several different, but equally good, training data sets; a learning algorithm has high variance if it produces very different functions when trained on each of them, and high bias if it makes systematic errors regardless of which training set it is given. The prediction error of a learned classifier is related to the sum of the bias and the variance of the learning algorithm, and generally there is a tradeoff between the two.
A learning algorithm with low bias must be “flexible” so that it can fit the data well. But if the learning algorithm is too flexible, it will fit each training data set differently and hence have high variance. If the true function is simple, then an “inflexible” learning algorithm with high bias and low variance will be able to learn it from a small amount of data. Another issue is the dimensionality of the input space: if the input feature vectors have very high dimension, the learning problem can be difficult even if the true function depends on only a small number of those features.
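The flexibility/variance tradeoff can be sketched with two hand-made, "equally good" samples of the same underlying function y = x. A flexible 1-nearest-neighbour regressor changes its prediction when the training set changes, while a rigid constant (mean) predictor does not; both learners and data sets are contrived for illustration:

```python
# Low-bias/high-variance vs high-bias/low-variance learners, compared
# across two different training samples of the same function y = x.

def nn1_predict(train, x):
    """Flexible: predict the y of the nearest training x (low bias, high variance)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mean_predict(train, x):
    """Rigid: always predict the mean training y, ignoring x (high bias, low variance)."""
    return sum(y for _, y in train) / len(train)

# Two equally good samples of y = x on [0, 4].
sample_a = [(0, 0), (2, 2), (4, 4)]
sample_b = [(1, 1), (3, 3)]

x = 1.9
print(nn1_predict(sample_a, x), nn1_predict(sample_b, x))    # 2 vs 1: predictions shift
print(mean_predict(sample_a, x), mean_predict(sample_b, x))  # 2.0 vs 2.0: stable
```

The flexible learner tracks whichever points happen to be in the sample, so its predictions vary across data sets; the rigid learner is stable but systematically wrong away from the mean.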
This is because the many “extra” dimensions can confuse the learning algorithm and cause it to have high variance. Hence, high input dimensionality typically requires tuning the classifier to have low variance and high bias. In practice, if the engineer can manually remove irrelevant features from the input data, this is likely to improve the accuracy of the learned function. A further issue is noise in the desired output values, whether from human or sensor error, or from a target function that cannot be modeled exactly; when either type of noise is present, it is better to go with a higher-bias, lower-variance estimator. Yet another factor is the presence of interactions and non-linearities among the features: when the features interact in complex ways, algorithms such as decision trees or neural networks tend to perform better than methods that assume each feature contributes independently.
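The effect of an irrelevant input dimension can be sketched as follows; the data and feature values are contrived so that one irrelevant feature dominates the distance computation of a nearest-neighbour learner:

```python
# A minimal sketch of how an irrelevant input dimension can mislead a
# distance-based learner, motivating manual feature removal.

def nearest_label(train, query):
    """1-nearest-neighbour classification by squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], query))[1]

# Feature 0 separates the classes; feature 1 is pure irrelevant clutter.
train = [([0.0, 5.0], 0), ([1.0, 0.0], 1)]
query = [0.1, 0.0]  # clearly class 0 on the relevant feature

print(nearest_label(train, query))  # irrelevant feature dominates -> 1 (wrong)

keep_relevant = lambda v: v[:1]     # drop the irrelevant dimension
train_clean = [(keep_relevant(x), y) for x, y in train]
print(nearest_label(train_clean, keep_relevant(query)))  # -> 0 (correct)
```

With the clutter feature included, the distance to the wrong class is smaller; removing it restores the correct decision, mirroring the variance reduction described above.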
Linear methods can also be applied, but the engineer must manually specify the interactions when using them. Tuning the performance of a learning algorithm can be very time-consuming; given fixed resources, it is often better to spend more time collecting additional training data and more informative features than to spend extra time tuning the learning algorithms. Empirical risk minimization seeks the function that best fits the training data. If the space of candidate functions is rich enough, however, the learning algorithm may simply memorize the training examples without generalizing well; this is known as overfitting.
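The memorization risk can be sketched by comparing a lookup-table learner against empirical risk minimization over a deliberately tiny hypothesis class. The parity-labelled data, the noisy training label, and the two-function hypothesis class are all contrived for illustration:

```python
# A learner that memorizes the training set achieves zero empirical risk but
# generalizes worse than a learner restricted to a simple hypothesis class.

train = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 1), (5, 1)]  # (4, 1) is label noise
test = [(6, 0), (7, 1), (8, 0), (9, 1)]                   # noise-free labels

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

# Memorizer: perfect recall of seen inputs, majority-class fallback otherwise.
table = dict(train)
labels = [y for _, y in train]
fallback = max(set(labels), key=labels.count)
memorizer = lambda x: table.get(x, fallback)

# ERM over a two-function hypothesis class: keep whichever fits training best.
hypotheses = [lambda x: x % 2, lambda x: 1 - x % 2]
erm = max(hypotheses, key=lambda h: accuracy(h, train))

print(accuracy(memorizer, train), accuracy(memorizer, test))  # 1.0 vs 0.5
print(accuracy(erm, train), accuracy(erm, test))              # ~0.83 vs 1.0
```

The memorizer fits the training data perfectly, noise included, yet does no better than guessing on fresh inputs; the restricted learner tolerates one training error and generalizes correctly.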