Hi everyone. Welcome back. In today's lecture, we'll be talking about specific machine learning algorithms. In the last lecture, we covered the theory that underpins both supervised and unsupervised classification techniques, and today we'll be doing much more practical work. We'll learn how to do dimensionality reduction, as well as how to actually implement some of these algorithms.

Some motivating questions to get us started. One of the things we have to grapple with in machine learning is that we're dealing with large, multi-dimensional data sets. Sometimes there are 10 or 20 or 30 different characteristics, or features, in our data sets that are independent input variables, and we need to understand them a little bit better. Ultimately, we need humans to be able to understand what the signal is and what is driving it. And so we do this thing called dimensionality reduction. We'll talk about techniques for it and motivate why we need it.

The next question is how we can group together records that don't contain labels and form predictions using unsupervised classification. Here, again, we'll be trying to understand the groupings inherent in a data set without knowing anything else about it. Then we'll go on to supervised machine learning algorithms: we'll look at some common examples and come to understand them better. Finally, we'll ask how we measure the performance of these algorithms in practice.

The first thing we're going to focus on in this segment is dimensionality reduction, and it's actually going to play a big role in unsupervised classification. As a quick review, unsupervised learning refers to the process of finding and extracting patterns from multi-dimensional data sets without any labels. It's learning because we're inferring previously unknown patterns from the data set, and it's unsupervised because we have no labels: no reference to a class or an outcome value, and no correct answer in mind when we're trying to infer those unknown patterns from the input data. In other words, no characteristic is designated as the output ahead of time.

And why is this important? It's one of the reasons we have to grapple with dimensionality reduction. Dimensionality reduction refers to reducing the number of features that are included in your analysis. As I said before, we might have many different features available in an input data set. If you think about an order data set for a company, there might be hundreds of possible characteristics we could use to make inferences about any given order. Even though only a few come to mind off the top of our heads, there may be many more that can be linked in through relational database tables and brought to bear to make predictions. So the focus of this section is really understanding how to whittle our input variables down to just a few important driving characteristics: the input features that are actually leading to our outcome variables.

And why does this matter so much? One of the big promises of machine learning is that we should be able to throw really large data sets at it, with lots of records and lots of different features or characteristics, and machine learning should be able to sort it all out.
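To make that concrete, here is a minimal sketch of what dimensionality reduction can look like in code. It assumes Python with NumPy and scikit-learn available and uses scikit-learn's PCA, which the lecture covers properly in the next segment, on a purely synthetic data set; the record counts, feature counts, and noise levels are made-up numbers for illustration, not examples from the lecture itself.

```python
# Sketch: reducing a wide synthetic data set with PCA (assumes scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulate 500 records with 30 features, where most of the variation
# is actually driven by just a few hidden underlying factors.
n_records, n_features, n_factors = 500, 30, 3
factors = rng.normal(size=(n_records, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_records, n_features))

# Standardize the features, then project onto a handful of components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X_scaled)

print("original shape:", X.shape)         # (500, 30)
print("reduced shape:", X_reduced.shape)  # (500, 5)
print("variance explained:", pca.explained_variance_ratio_.round(3))
```

The takeaway is just the shape change: thirty correlated inputs collapse to a handful of components that still carry most of the variance, which is what makes the result tractable for a human to interpret.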
But at the end of the day, humans need to interpret the results, and those results need to be tractable and understandable. Dimensionality reduction can really help us there. It can help us see correlations among our variables much more easily, and it can reduce things down to just the set of features that are important in driving our signal.

The other reason this is important is that the more features you include in your analysis, the larger the sample of records needs to be, and it doesn't increase linearly, it increases exponentially. So for example, if we wanted to study the height, the weight, the education level, and the income of a large group of individuals, we'd need a whole lot more records than a study of just height and weight. But you can describe the different characteristics, and the correlations among them, more easily by taking combinations of the correlated variables. We know that height and weight are correlated, and that education level and income are correlated. So we can think about how those two heavily correlated components, height-and-weight and education-and-income, can be used, rather than thinking about how all four of these variables relate to one another. That allows us to get by with a smaller sample that still captures most of the variance in a large group of people.

A quick word to the wise: trying to reduce dimensionality randomly or manually can lead to really poor results. On the last slide, we talked about height and weight being correlated and income and educational attainment being correlated. But it turns out there are correlations that cut across those two big components: income is actually positively correlated with height. Something like that can really matter. If you're reducing the number of dimensions by hand, you can easily leave out an important correlation you didn't expect. I like to think of this as being all about perspective: you might think this picture captures everything that's going on in the scene, but it's actually somebody taking a picture of the scene. So it's really important to include or exclude features in your model based on analysis of the actual correlations of the variables, not on instinct. We need to use real dimensionality reduction techniques, such as principal component analysis, also called PCA, which we'll cover in the next segment.
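As a rough illustration of that height, weight, education, and income structure, here is a small synthetic example. The two latent factors and all of the correlation strengths below are hypothetical numbers assumed for illustration; the point is only that the correlation matrix, rather than instinct, is what tells you which variables move together.

```python
# Sketch: synthetic people data showing correlated variable groups (assumes pandas).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1_000

# Two hypothetical latent factors: "body size" and "socioeconomic status".
body = rng.normal(size=n)
ses = rng.normal(size=n)

people = pd.DataFrame({
    "height":    body + 0.3 * rng.normal(size=n),
    "weight":    body + 0.5 * rng.normal(size=n),
    "education": ses + 0.4 * rng.normal(size=n),
    # Income tracks socioeconomic status strongly and height weakly,
    # mirroring the lesser height-income correlation from the lecture.
    "income":    ses + 0.2 * body + 0.5 * rng.normal(size=n),
})

# The correlation matrix shows two tightly correlated pairs plus a weaker cross-link.
print(people.corr().round(2))
```

Running this prints a four-by-four correlation matrix with two strong blocks, height with weight and education with income, plus a weaker height-income link, which is exactly the kind of structure that techniques like PCA exploit.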
Summary statistics are also a really handy means of describing a distribution of data or a set of records, and they can help us reduce dimensionality too: we can boil a large number of data points down to just a few numbers. But don't forget that summary statistics can be really misleading for anything but the most well-behaved distributions, such as the normal distribution. If you think about statistics in terms of city populations, the average city in the United States has about 8,000 people, so you might conclude from that mean that most of the country lives in really small cities. That's not really the case. Similarly, you'll hear that the "average" human is female because slightly more than 50% of people in many populations are female; you could try to build an assumption on that, but it wouldn't serve you well. Right? And crime rates in small cities can be highly variable: in any given year, the rate might be zero, or it might be 20%, depending on how small the city is, just by the chance occurrence of a few crimes.

So, some key points from this lesson. Dimensionality reduction is often needed for analyses that include many different features; what we're doing is whittling the dimensions down to the key correlated features that actually drive our signal. We need to be able to confidently reduce the number of features included in an analysis without losing information, primarily so that the number of records needed for the analysis stays manageable while still letting us find the signal with confidence. Recall the example with height, weight, education level, and income: correlations between those attributes should be leveraged. We know there is a correlation between height and weight and between education level and income, and we even know there is a lesser correlation between height and income. We should leverage those correlations among the attributes when we're making our characterizations.

With that, we'll end this segment here. Take a look at the quick questions, and we'll be back in just a bit.