Correlation analysis - Some caveats
Correlation analysis is an integral part of many EDAs because it is easy to execute, visualize, and interpret by technical and non-technical audiences alike. It is an intuitive metric to compute, and it also happens to be exceedingly useful. An introductory correlation analysis can help us understand which features may be interesting and deserve a closer look, point to potential multicollinearity concerns, and help identify opportunities for dimension reduction.

BUT... there is always a but! As with any other metric, correlation comes with caveats and gotchas that are helpful to keep in mind to get the most out of it without misusing it. Here are a few pointers to note:

1. The most commonly used correlation method, Pearson correlation, is a measure of the strength of the linear relationship between two variables. The keyword 'linear' is important; just because two variables do not have a high correlation coefficient does not mean that there is no relati...
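To make the point about 'linear' concrete, here is a minimal sketch in Python using NumPy. The data is made up for illustration: `x` is evenly spaced and symmetric around zero, `y` is fully determined by `x` through a quadratic, and `y_linear` is a straight-line function of `x` for comparison. None of these names or values come from a real dataset.

```python
import numpy as np

# Hypothetical data (an assumption for illustration only):
# x is symmetric around zero, and y depends on x exactly, but not linearly.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Pearson correlation between x and the quadratic y.
# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is the x-y entry.
r_nonlinear = np.corrcoef(x, y)[0, 1]

# For comparison: a perfectly linear function of the same x.
y_linear = 3.0 * x + 2.0
r_linear = np.corrcoef(x, y_linear)[0, 1]

print(f"Pearson r, y = x^2:      {r_nonlinear:.6f}")
print(f"Pearson r, y = 3x + 2:   {r_linear:.6f}")
```

In this setup the quadratic case gives a coefficient that is essentially zero even though `y` is completely determined by `x`, while the linear case gives a coefficient of essentially one. A scatter plot of the two variables is usually the quickest way to catch a pattern like this that the coefficient alone would hide.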