It's very hard, but what we will learn is that even if we start with 100 dimensions and reduce it down to three or four, we still get to keep maybe ninety to ninety-five percent of the information, which might be pretty good. And not just good: it might actually be desirable to throw away some of the information you lose by reducing dimensions. But why might it be desirable to throw away information?

Almost every data set we work with has some noise, and we are not trying to capture that noise. So by dropping several dimensions using PCA, we can actually throw away the noise. It's not a bad thing: not only do you get fewer features, you may actually get a better data set. Not every data set contains perfect information; there is a lot of noise present, and with this approach we can throw that noise away. So now we have PC1: we got a new feature.
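As a sketch of this idea (the data, the 95% target, and all variable names here are illustrative, not from the lecture), scikit-learn's `PCA` can be asked to keep just enough components to explain a chosen fraction of the variance, which is exactly the "reduce 100 dimensions to a few, keep most of the information" story:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples that really live on a 3-dimensional subspace of 100 dimensions,
# plus a small amount of noise in every dimension.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 100))

# Ask PCA to keep just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1])                   # only a handful of components survive
print(pca.explained_variance_ratio_.sum())  # at least 0.95 of the variance retained
```

The noise that lives in the remaining ~97 directions is exactly what gets thrown away.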

We need to see how much information is being captured by this new feature. We will learn how to measure that mathematically, because when we say "we are going to capture 90%," we need to understand where that 90% comes from. Hold on to that; we will come back to it. But is everybody comfortable with how we are calculating PC1? It's a combination of existing features. Yes please? Okay, suppose you have two dimensions, market capitalization and revenue. What you are doing is creating another dimension called PC1, another data point. So for, say, company ABC, suppose you had values like fifteen hundred and nine hundred.

So what you are doing is combining these and creating a new value. What exactly that number means is hard to say, but ultimately it's a combination of the existing data points: maybe 50% of the market cap, 30% of the revenue, and 20% of the number of employees, whatever it is. PCA tries to build a new dimension for you. Now your model will depend on, say, PC1, or on multiple PCs; we will talk about multiple PCs. So the question from her is: what if I have these new dimensions?
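A minimal sketch of that linear combination (the 50/30/20 weights and the company's feature values are hypothetical numbers from the discussion, not real PCA loadings):

```python
import numpy as np

# Hypothetical feature values for company ABC:
# [market_cap, revenue, num_employees]
abc = np.array([1500.0, 900.0, 650.0])

# Hypothetical PC1 recipe: 50% market cap, 30% revenue, 20% employees.
weights = np.array([0.5, 0.3, 0.2])

# The PC1 score is just the dot product of the weights with the values.
pc1_score = weights @ abc
print(pc1_score)  # 0.5*1500 + 0.3*900 + 0.2*650 = 1150.0
```

The single number 1150.0 by itself means nothing in business terms; its meaning lives entirely in the weights.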

If I train my model on these new dimensions, how do I explain it to my employer or my customer? What is my model doing, which features does it depend on? Nobody cares about "PC1" by itself. But suppose the model is highly dependent on PC1: how do we explain what PC1 is? I can explain it by saying that PC1 is 50% market cap and 50% revenue, or we can look at the ratio. The ratio tells you which feature is important for PC1; here f1 is twice as important as f2. So using the PCs you can get back down to the features you started with. You are not losing that information; it's a linear combination, after all.
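In scikit-learn those per-feature ratios live in the `components_` attribute; here is a sketch (the toy data is made up) of how you would read off which original feature dominates PC1 when explaining the model:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Toy data where f1 varies twice as strongly as f2.
f1 = rng.normal(scale=2.0, size=500)
f2 = rng.normal(scale=1.0, size=500)
X = np.column_stack([f1, f2])

pca = PCA(n_components=2).fit(X)

# Each row of components_ is a unit-length eigenvector: the recipe for one PC.
for i, vec in enumerate(pca.components_, start=1):
    print(f"PC{i} = {vec[0]:+.2f} * f1 {vec[1]:+.2f} * f2")
```

Because f1 carries more variance, its weight in PC1 comes out larger, which is exactly the story you would tell your customer.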

So this is how you can explain it. Yes please? "Is the eigenvalue signifying the ratio of the features?" We'll talk about the eigenvalue in a minute; there has to be a reason we compute it, otherwise it's just a waste of time, so we'll look at that in a minute. Yes, good question: how do I know I should use only two out of these ten? If I'm saying I'm going to reduce the number of features, how many should I keep, and what is the trade-off? Ultimately we are throwing away some information, although probably not the data itself, because we are creating new features using all our existing data. So it's not completely thrown away as-is. But what exactly are we capturing?
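One common way to answer "how many PCs should I keep" (a sketch; the 90% cutoff and the toy data are arbitrary choices here, not from the lecture) is to look at the cumulative explained variance and stop once it crosses your threshold:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Toy data: 10 features, but most of the variance hides in a few directions.
base = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 10))
X = base + 0.2 * rng.normal(size=(300, 10))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of PCs whose combined variance crosses 90%.
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(k)
```

The trade-off is explicit: everything past the k-th PC is the variance you agree to throw away.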

We will talk about that. But everybody understands what PC1 is: it's a new feature which is a combination of existing features, and that combination is given by the ratios between the features. That ratio is called the eigenvector of PC1. All right, so now we have PC1. What we will do is draw a line perpendicular to it. This time we are not going to rotate; enough rotations have been done.

Now we can just quickly draw a line perpendicular to our PC1, and that becomes PC2, our next new feature. If we started with only two features in our original data, we will have only two PCs, PC1 and PC2. If you had 100 features to start with, you can go up to 100 PCs. But wait: you started with 100 features, did all these calculations and rotations, and you come back with 100 features again. What is the use? So we have to look at what the eigenvalue brings to the table. The eigenvector, we understand, gives us the new features; but what does the eigenvalue mean? Mathematically it is the SSD, the sum of squared distances. Suppose we have two features, so we got PC1 and PC2, because we started with only two features.
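As a sketch of where the eigenvectors and eigenvalues come from (toy data; the numbers are illustrative), you can get both directly from the covariance matrix of the centered data, and the two eigenvectors come out perpendicular to each other, just like PC1 and PC2 on the board:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two correlated toy features.
x = rng.normal(size=300)
X = np.column_stack([x + 0.1 * rng.normal(size=300),
                     0.5 * x + 0.1 * rng.normal(size=300)])
X = X - X.mean(axis=0)            # PCA starts by centering the data

cov = np.cov(X, rowvar=False)     # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Columns of eigvecs are the PC directions; they are perpendicular.
print(eigvecs[:, 0] @ eigvecs[:, 1])  # ~0: the PCs are orthogonal
print(eigvals)                        # variance captured along each PC
```

The eigenvalues are the variances along each PC direction, which is what the SSD is measuring.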

We could get only two PCs, and suppose we got 18 and 3 as the SSDs. If I write those in percentage terms, PC1 is 18/(18+3), about 85.7%, and PC2 is 3/(18+3), about 14.3%. Now what does this percentage mean? Importance, yes: it means how much variance in the original data is captured by that respective PC. And because we do all the rotation at the beginning, PC1 is always the PC which captures the most variance in the data. Variance is a representation of the information in the data: how varied is my data? That information is captured by the eigenvalue, the SSD. The 14.3% in this case is the variance captured by PC2. So if you have 100 features to start with, you come up with 100 PCs, which means 100 SSDs, 100 eigenvalues, and all of them add up to one hundred percent of the variance, the information in the data. Now we have to take a call. So these are the new features, PC1 and PC2; let's hold on to this.
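Using the 18-and-3 example, the percentage for each PC is just its eigenvalue over the total, which is why all the PCs together always add up to 100%:

```python
eigenvalues = [18.0, 3.0]          # SSDs for PC1 and PC2
total = sum(eigenvalues)

ratios = [v / total for v in eigenvalues]
print([round(100 * r, 1) for r in ratios])  # [85.7, 14.3]: % variance per PC
print(abs(sum(ratios) - 1.0) < 1e-9)        # True: together they keep everything
```

Taking "the call" then just means deciding how far down this list you are willing to go.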