Hi everyone. Welcome back. In today's lecture, we'll be talking about specific machine learning algorithms. In the last lecture, we covered the theory that underpins both supervised and unsupervised classification techniques, and today we'll be doing much more practical work. We'll learn how to do dimensionality reduction, as well as how to actually implement some of these algorithms.

Some motivating questions to get us started. One of the things we have to grapple with in machine learning is that we're dealing with large, multi-dimensional data sets. Sometimes there are 10 or 20 or 30 different characteristics, or features, in our data sets that are independent input variables, and we need to understand them a little bit better. Ultimately, we need humans to be able to understand what the signal is and what is driving it. And so we do this thing called dimensionality reduction. We'll talk about techniques for it and motivate why we need it.

The next question is how we can group together records that don't contain labels and form predictions using unsupervised classification. Here, again, we'll be trying to understand the groupings inherent in a data set without knowing anything else about it. Then we'll go on to supervised machine learning algorithms: we'll look at some common examples and come to understand them better. Finally, we'll ask how we measure the performance of these algorithms in practice.

The first thing we're going to focus on in this segment is dimensionality reduction, and it's actually going to play a big role in unsupervised classification. As a quick review, unsupervised learning refers to the process of finding and extracting patterns from multi-dimensional data sets without any labels. It's learning because we're inferring previously unknown patterns from the data set, and it's unsupervised because we have no labels: no reference to a class or an outcome value, and no correct answer in mind when we're trying to infer those unknown patterns from the input data. In other words, no characteristic is designated as the output ahead of time.

And why is this important? It's one of the reasons we have to grapple with dimensionality reduction. Dimensionality reduction refers to reducing the number of features that are included in your analysis. As I said before, we might have many different features available in an input data set. If you think about an order data set for a company, there might be hundreds of possible characteristics we could use to make inferences about any given order. Even though only a few come to mind off the top of our heads, there may be many more that can be linked in through relational database tables and brought to bear to make predictions. So the focus of this section is really understanding how to whittle our input variables down to just a few important driving characteristics: the input features that are actually leading to our outcome variables.

And why does this matter so much? One of the big promises of machine learning is that we should be able to throw really large data sets at it, with lots of records and lots of different features or characteristics, and machine learning should be able to sort it all out.
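To make that concrete, here is a minimal sketch of what dimensionality reduction can look like in code. It assumes Python with NumPy and scikit-learn available and uses scikit-learn's PCA, which the lecture covers properly in the next segment, on a purely synthetic data set; the record counts, feature counts, and noise levels are made-up numbers for illustration, not examples from the lecture itself.

```python
# Sketch: reducing a wide synthetic data set with PCA (assumes scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulate 500 records with 30 features, where most of the variation
# is actually driven by just a few hidden underlying factors.
n_records, n_features, n_factors = 500, 30, 3
factors = rng.normal(size=(n_records, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_records, n_features))

# Standardize the features, then project onto a handful of components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X_scaled)

print("original shape:", X.shape)         # (500, 30)
print("reduced shape:", X_reduced.shape)  # (500, 5)
print("variance explained:", pca.explained_variance_ratio_.round(3))
```

The takeaway is just the shape change: thirty correlated inputs collapse to a handful of components that still carry most of the variance, which is what makes the result tractable for a human to interpret.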
But at the end of the day, humans need to interpret the results, and those results need to be tractable and understandable. Dimensionality reduction can really help us there. It can help us see correlations among our variables much more easily, and it can reduce things down to just the set of features that are important in driving our signal.

The other reason this is important is that the more features you include in your analysis, the larger the sample of records needs to be, and it doesn't increase linearly, it increases exponentially. So for example, if we wanted to study the height, the weight, the education level, and the income of a large group of individuals, we'd need a whole lot more records than a study of just height and weight. But you can describe the different characteristics, and the correlations among them, more easily by taking combinations of the correlated variables. We know that height and weight are correlated, and that education level and income are correlated. So we can think about how those two heavily correlated components, height-and-weight and education-and-income, can be used, rather than thinking about how all four of these variables relate to one another. That allows us to get by with a smaller sample that still captures most of the variance in a large group of people.

A quick word to the wise: trying to reduce dimensionality randomly or manually can lead to really poor results. On the last slide, we talked about height and weight being correlated and income and educational attainment being correlated. But it turns out there are correlations that cut across those two big components: income is actually positively correlated with height. Something like that can really matter. If you're reducing the number of dimensions by hand, you can easily leave out an important correlation you didn't expect. I like to think of this as being all about perspective: you might think this picture captures everything that's going on in the scene, but it's actually somebody taking a picture of the scene. So it's really important to include or exclude features in your model based on analysis of the actual correlations of the variables, not on instinct. We need to use real dimensionality reduction techniques, such as principal component analysis, also called PCA, which we'll cover in the next segment.
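As a rough illustration of that height, weight, education, and income structure, here is a small synthetic example. The two latent factors and all of the correlation strengths below are hypothetical numbers assumed for illustration; the point is only that the correlation matrix, rather than instinct, is what tells you which variables move together.

```python
# Sketch: synthetic people data showing correlated variable groups (assumes pandas).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1_000

# Two hypothetical latent factors: "body size" and "socioeconomic status".
body = rng.normal(size=n)
ses = rng.normal(size=n)

people = pd.DataFrame({
    "height":    body + 0.3 * rng.normal(size=n),
    "weight":    body + 0.5 * rng.normal(size=n),
    "education": ses + 0.4 * rng.normal(size=n),
    # Income tracks socioeconomic status strongly and height weakly,
    # mirroring the lesser height-income correlation from the lecture.
    "income":    ses + 0.2 * body + 0.5 * rng.normal(size=n),
})

# The correlation matrix shows two tightly correlated pairs plus a weaker cross-link.
print(people.corr().round(2))
```

Running this prints a four-by-four correlation matrix with two strong blocks, height with weight and education with income, plus a weaker height-income link, which is exactly the kind of structure that techniques like PCA exploit.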
Summary statistics are also a really handy means of describing a distribution of data or a set of records, and they can help us reduce dimensionality too: we can boil a large number of data points down to just a few numbers. But don't forget that summary statistics can be really misleading for anything but the most well-behaved distributions, such as the normal distribution. If you think about statistics in terms of city populations, the average city in the United States has about 8,000 people, so you might conclude from that mean that most of the country lives in really small cities. That's not really the case. Similarly, you'll hear that the "average" human is female because slightly more than 50% of people in many populations are female; you could try to build an assumption on that, but it wouldn't serve you well. Right? And crime rates in small cities can be highly variable: in any given year, the rate might be zero, or it might be 20%, depending on how small the city is, just by the chance occurrence of a few crimes.

So, some key points from this lesson. Dimensionality reduction is often needed for analyses that include many different features; what we're doing is whittling the dimensions down to the key correlated features that actually drive our signal. We need to be able to confidently reduce the number of features included in an analysis without losing information, primarily so that the number of records needed for the analysis stays manageable while still letting us find the signal with confidence. Recall the example with height, weight, education level, and income: correlations between those attributes should be leveraged. We know there is a correlation between height and weight and between education level and income, and we even know there is a lesser correlation between height and income. We should leverage those correlations among the attributes when we're making our characterizations.

With that, we'll end this segment here. Take a look at the quick questions, and we'll be back in just a bit.