AMH - g_ud.18_sup

Identify your problem

Machine learning can be used in many situations. For example, it may be used to predict housing prices based on attributes of other houses that have recently sold or to make recommendations on products to a customer based on other products they have purchased. You may take two very different approaches to solving these problems.

To predict housing prices, you can use data such as location, number of bedrooms, number of bathrooms, number of floors, age of the house, and so on to create a function. The function takes in all the information and outputs a price. By observing actual housing prices of recently sold houses, you can train the function to give particular outputs for a given set of inputs.

To make recommendations to customers, you may be using data that has a row of items that are purchased by each customer. For example, if you are make a recommendation system for a grocery store and Suzy buys apples, potato chips, and coffee, and Tim buys bananas and paper plates, then two rows would be

Suzy:
Tim:

apples, potato chips, coffee
bananas, paper plates

In this case, you may want to see which items are frequently purchased with other items. If people often buy peanut butter with apples or cream with coffee, then those items could be recommended to Suzy. If plastic forks are often purchased with paper plates, then plastic forks may be recommended to Tim.

Supervised Learning

Supervised learning is like learning with a teacher. You are given some data \(X\) and asked to make a prediction, \(y.\) The goal is to create a set of rules that allows you to make good predictions. The set of rules give you a mapping, or function, of sets of inputs to outputs. Let \(f\) be the function that you find. The goal in supervised learning is then to find a function \(f\) so that \[f(X) = y\] is accurate. To know whether \(f\) is accurate, you need to know whether \(f(X)\) is equal to \(y\) or is close to \(y.\) The way you measure accuracy depends on the application.

Predicting the prices of houses described above is an example of a problem you can solve with supervised learning. You map a set of data to an predicted price.

Unsupervised Learning

Unsupervised learning is learning without feedback. The goal is to find patterns. For example, a machine may be given data about income and hours worked for a large set of people and asked to group them based on wages earned. The machine may find its own cut offs for lower, middle, and upper classes. In this case, we cannot tell the machine it is right or wrong because the divisions are not well defined. Are the top \(10\%\) of earners considered upper class? The top \(1\%?\) The top \(0.1\%?\)

Building a recommendation system is an example of a problem you can solve with unsupervised learning. If someone comes to the store knowing what they want to buy and with only enough money to buy what they way, the recommendation system has not failed if the customer does not buy what we recommend. In fact, if the customer buys apples and we recommend peanut butter, they may think about buying peanut butter next time. The system looks for patters of what items customers usually buy together.