We still need them. It's very rare that you can capture all the information with just one dimension. So if you have 100 dimensions, maybe you can get it down to seven, eight, or ten, and those first ten will capture, say, 90%, and then you're good with it. PC1 will capture the maximum variance, but not 100% — sorry, not a dash — PC1 will capture the maximum variance, but not 100%; that's why you have to go to PC2, and then PC3 and PC4. So every time, you have to project onto the new PC to calculate the new values. Yes, question?
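As a rough sketch of the projection step just described — computing the PCs from the covariance matrix and projecting each sample onto them to get its "new values" — here is a minimal NumPy version. The toy data values are made up for illustration:

```python
import numpy as np

# Toy data: 6 samples, 2 correlated features (f1, f2); values are hypothetical
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
Xc = X - X.mean(axis=0)                  # center the data first

# Eigen-decomposition of the covariance matrix gives the PC directions
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The "new values": project each sample onto each PC
scores = Xc @ eigvecs
print(eigvals / eigvals.sum())           # share of variance per PC; PC1 largest
```

With data this strongly correlated, PC1 alone captures well over 90% of the variance, which is exactly why the later PCs can often be dropped.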

So, okay, that is more about feature selection. Information gain and Gini index are about feature selection, where you take the original features and say, these first five features are my most important ones, and I'm throwing away the other ten. But here is the major difference between feature selection and feature extraction: you are not throwing away any data. All data points are being captured in PC1, and all data points are being captured in PC2, whereas in feature selection you throw away some of the features themselves. So remember: in feature selection we throw away the original data points, or some part of them, but in feature extraction we try to keep as much information as possible from all the data points in the underlying data. That's the major difference. Here we have f1 and f2 both in PC1, so even if we keep only PC1, we still have a representation from both f1 and f2. Yes, yes, yes. So if you have 100 dimensions, you will have PC1 through PC100 — 100 new PCs. Then what you do is find the SSDs (sums of squared distances) for those 100 PCs, see how much variance each of them is capturing, sort them in descending order, and then you say, okay.
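To make the selection-versus-extraction contrast concrete, here is a small sketch with made-up correlated data: selection simply drops a column (losing its information), while PC1 from extraction mixes both features:

```python
import numpy as np

# Hypothetical 2-feature data where f2 is strongly correlated with f1
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
f2 = 0.8 * f1 + 0.2 * rng.normal(size=200)
X = np.column_stack([f1, f2])
Xc = X - X.mean(axis=0)

# Feature selection: just drop f2 -- whatever f2 added is gone
selected = Xc[:, [0]]

# Feature extraction: PC1 is a weighted mix of BOTH features
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]              # unit vector with nonzero weight on f1 AND f2
extracted = Xc @ pc1
print(pc1)               # both components are clearly nonzero
```

So keeping only PC1 still retains a contribution from every original feature, which is the point being made about f1 and f2 above.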

I want to capture 80% — what are the top five? I mean, if your top five PCs are capturing 80%, then you can throw away the other 95 PCs and just keep the first five, because they are capturing 80% of the variance, or whatever your threshold is. Usually we try to capture 90% of the information; that's usually a good target. [To a question:] It's about how varied the data is — how they vary, whether they are correlated. Take away the correlation, and then see how varied the data is, whether the data points move independently, considering all the data points. Yes, we will keep doing it: initially we will capture all the PCs mathematically — if there are 100 dimensions, we will capture all 100 — and then we say, okay.
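The "sort variances in descending order, then keep the smallest set that reaches your threshold" step above can be sketched like this, on synthetic data where 100 noisy features really come from only a few underlying directions (all values here are made up):

```python
import numpy as np

# Hypothetical data: 100 dimensions, but only ~5 strong latent directions
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 100))
Xc = X - X.mean(axis=0)

# Variance captured by each PC, already in descending order from SVD
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
ratio = (s ** 2) / (s ** 2).sum()

# Smallest k whose cumulative variance reaches 90%
k = int(np.searchsorted(np.cumsum(ratio), 0.90) + 1)
print(k)                 # only a handful of the 100 PCs are needed
```

Because only five latent directions generated the data, the cumulative curve shoots past 90% after just a few PCs and the other ~95 can be thrown away.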

I want to capture 90% of the variance — how many PCs do I need to keep? And the answer might be: you keep six. If you keep these six, PC1 to PC6, you capture 91%, and that's okay — go ahead, I'm good, let me throw away the other 94 PCs. Correct, exactly. Again, PC2 is also a combination of the first and second variable — again a ratio of the first and second, but a different ratio. And PC3 is again a function of all the variables, f1 and f2 in our case, with yet another ratio. All of them are linear functions of both f1 and f2. The PCs themselves are independent, although f1 and f2 might be related to each other. It should be 90 degrees — perpendicular. So the next PC has to lie on a perpendicular line. If it comes out vertical? Yes, it can be.
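Both claims in that answer — each PC is a linear mix of f1 and f2, and the PCs are mutually perpendicular — can be checked directly (toy data again, values assumed for illustration):

```python
import numpy as np

# Hypothetical correlated 2-D data
rng = np.random.default_rng(2)
f1 = rng.normal(size=300)
f2 = 0.6 * f1 + 0.4 * rng.normal(size=300)
Xc = np.column_stack([f1, f2])
Xc -= Xc.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]

# Each PC is a linear function a*f1 + b*f2, just with a different ratio a:b
print(pc1, pc2)

# And they are perpendicular: their dot product is (numerically) zero
print(np.dot(pc1, pc2))
```

The zero dot product is what makes the projected scores uncorrelated, i.e. the new features independent.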

I mean, again, that's why we convert them into unit vectors — because then you don't need to worry about whether the numbers are 2, 3, 5, or 7; you can just say 1.5, 2.0, whatever the ratio is. But here it is not perpendicular, right? Yes, 1.5 should come up. It's the same thing — if you say minus 3 by 1.5, you can go either way. We can think of it as x being 1.5 and y being 3, and then there's PC2. Is that correct? Let me draw it — it's a bit off, we can redraw. Yeah, that could be the case, but ultimately it has to be perpendicular.
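The "convert it into a unit vector" remark is just normalizing the direction so its length is 1, which can be sketched with the (hypothetical) x = 1.5, y = 3 direction from the board:

```python
import math

# Direction from the lecture example: x = 1.5, y = 3 (hypothetical eigenvector)
x, y = 1.5, 3.0
length = math.hypot(x, y)        # sqrt(1.5**2 + 3**2)
unit = (x / length, y / length)  # same direction, length exactly 1
print(unit)
```

After this, only the direction matters — the original magnitudes (2, 3, 5, 7, whatever) are irrelevant, which is why the eigenvectors are always reported as unit vectors.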

If this is the line for PC1 — so if this is 3 and this is 1.5, that's what we are saying — then we need a perpendicular line. What does perpendicular mean? It means a right angle, like the corner of a square. So you can calculate it: this should be 1.5 on the y-axis, and on the x-axis it will be minus 3; otherwise it will not meet at a right angle. Again, we don't need to do this by hand — we will see how it gets done mathematically. If the data is highly correlated, that gets captured in PC1 itself; that's why we do all the rotation, to take away the correlation. Yes, that's what we are doing with the rotation. If you look at the projections, this distance should be minimum, and at the same time the SSD — which, we said, is measured from the origin — will be maximum.
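The little calculation on the board — direction (1.5, 3) for PC1, so (-3, 1.5) for the perpendicular PC2 — is just the 90-degree rotation rule, which a few lines of plain Python can verify:

```python
# PC1 direction from the lecture example: x = 1.5, y = 3
pc1 = (1.5, 3.0)

# Rotating 90 degrees maps (x, y) -> (-y, x), giving PC2 = (-3, 1.5)
pc2 = (-pc1[1], pc1[0])

# Perpendicular means the dot product is exactly zero
dot = pc1[0] * pc2[0] + pc1[1] * pc2[1]
print(pc2, dot)      # (-3.0, 1.5) 0.0
```

So the "1.5 on the y-axis, minus 3 on the x-axis" values are not arbitrary: they are the unique (up to sign and scale) direction with zero dot product against PC1.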

It's the same thing, it's the same thing — we use different names for the same thing. Yes, that's what it does: it basically takes away the correlation and makes the features independent, and who doesn't like that? But the data should be linearly related. Even after combining, it's still a linear function — f1 multiplied by some value plus f2 multiplied by some value. It's not a nonlinear function, so if your data is nonlinear, it may struggle; remember that. So it works on linear data, okay? All right — is it clear what PCA is? Okay, let's go on. So: in supervised learning the goal is to predict something, and we are telling it what to predict when we give the label — predict there's an elephant in it, predict there's a boy in it, predict there's a background in it, whatever we want. Here, what we are saying is: can you reduce my data size? And I'm not telling you, hey, I want to use these particular features.
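The "struggles on nonlinear data" warning can be seen on a made-up example: points on a circle, where f2 depends on f1 but not linearly, so no single linear direction dominates:

```python
import numpy as np

# Nonlinear (circular) data: f2 depends on f1, but not linearly
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Xc = X - X.mean(axis=0)

_, s, _ = np.linalg.svd(Xc, full_matrices=False)
ratio = (s ** 2) / (s ** 2).sum()
print(ratio)       # roughly [0.5, 0.5]: PC1 captures only half the variance
```

Here dropping PC2 would throw away half the information, so linear PCA buys you nothing — exactly the failure mode being warned about.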

I don't know — can you just give me a reduced data set which will still work well for my training? In an unsupervised way, we're just throwing data at PCA and coming up with the reduced data set; that's why this comes under the unsupervised learning approach. And if I have reduced data, what is the advantage? First of all — a benefit we talked about — noise is thrown out, which we may not think of initially. Second, I can visualize my data, to see how the data looks and whether there is any relation. Third, if the machine needs to work with smaller data, it can run faster, and usually with smaller data and fewer dimensions, as we have seen, the machine's accuracy improves. So all those benefits you can derive from it. Yes — rather than a domain expert coming and saying, okay, just remove these data points or combine these data points, we let the machine figure it out. We will not go into detail, but I'll show you one example of how to use it, okay.
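Putting the whole "throw data at PCA, get back a reduced data set" idea together, here is a small self-contained helper; the wide toy data set and the `pca_reduce` name are assumptions for illustration, not code from the lecture:

```python
import numpy as np

def pca_reduce(X, var_target=0.90):
    """Project X onto the fewest PCs that keep var_target of the variance."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = (s ** 2) / (s ** 2).sum()
    k = int(np.searchsorted(np.cumsum(ratio), var_target) + 1)
    return Xc @ Vt[:k].T             # n_samples x k reduced data set

# Hypothetical wide data set: 300 samples, 50 noisy, correlated features
rng = np.random.default_rng(3)
base = rng.normal(size=(300, 3))
X = base @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(300, 50))

X_small = pca_reduce(X)
print(X_small.shape)                 # far fewer than 50 columns survive
```

In practice you would reach for an existing implementation rather than this sketch — for instance scikit-learn's `sklearn.decomposition.PCA`, which accepts a variance fraction like `n_components=0.90` directly.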