
Applied Regression

  • Writer: Insignia Partners
  • Dec 16, 2024

Updated: Jan 15


Among the many sophisticated statistical methods available today, amplified by abundant data and processing power, there is one relatively simple technique that should not be overlooked: good old linear regression.


I must admit, when I first applied regressions at work (almost 20 years ago...), I didn’t fully understand all the different concepts. But I did hear some "absolute truths" from someone:


  • R² must be greater than 80%

  • You need more than 80 data points for the regression to be valid


These "truths" came from somewhere, and they’re not entirely wrong, but… that’s not exactly how things work, as I learned during my MBA (with the great Professor David Juran).


It is important to understand the meaning of each element in a regression, with the key ones being:


  • R² (Coefficient of Determination) measures how much of the variation in the dependent variable is explained by the independent variable(s). For example, an R² of 80% means that 80% of the variation in the dependent variable is accounted for by the model. A high R² doesn’t always mean the model is good, however: it can be a sign of overfitting to the data, especially when there are many independent variables.


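  • P-value shows how statistically significant the identified relationship is: it is the probability of seeing a relationship at least this strong in the data if no real relationship existed. The standard is a 5% significance level (equivalently, a 95% confidence level), meaning that a p-value of less than 5% is treated as statistically significant.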
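To make the R² definition concrete, here is a minimal sketch in plain Python that fits a one-variable regression by least squares and computes R² as one minus the ratio of unexplained to total variation. The five data points and the function names `fit_line` and `r_squared` are made up for illustration; in practice a statistics package would report these values directly.

```python
def fit_line(x, y):
    """Ordinary least squares for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    # Variation left over after the fit (residuals).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its own mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
r2 = r_squared(x, y, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R²={r2:.0%}")
```

With this toy data the line explains 60% of the variation in y; the remaining 40% is what the regression leaves unexplained.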


Therefore, there are situations where a low R² can be good (as long as the p-value is under 5%). In cases where "random" factors (such as cultural ones) have a strong influence, a variable that explains "only" 20% of another can be extremely useful.


On the other hand, having too few data points makes the estimates less precise and tends to raise the p-value, so a relationship may fail to reach the standard 5% threshold even when it is real.
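The effect of sample size can be sketched for the simplest case, a regression with a single independent variable, where the t statistic of the slope depends only on R² and the number of data points n: t = sqrt(R² (n − 2) / (1 − R²)), with n − 2 degrees of freedom. The code below holds R² fixed at 20% and varies n. The helper names are hypothetical, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule, a choice made here only to stay within the standard library.

```python
import math


def t_pdf(u, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability of 0 <= T <= |t_stat|
    return max(0.0, 1.0 - 2.0 * central_area)


def slope_p_value(r2, n):
    """p-value of the slope in a one-variable regression, given R² and n."""
    t_stat = math.sqrt(r2 * (n - 2) / (1 - r2))
    return two_sided_p_value(t_stat, n - 2)


# Same R² of 20%, different sample sizes (illustrative values).
for n in (10, 20, 50, 100):
    print(f"n={n:>3}  p-value={slope_p_value(0.20, n):.4f}")
```

Under these assumptions, an R² of 20% is not significant at the 5% level with 10 data points but is clearly significant with 100, which is why no single cutoff for R² or for sample size works on its own.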


A Practical Example: During my MBA, we conducted a study on the "theoretical value" a person has for a casino. We wanted to understand what explained this value (the method for calculating the "theo" isn’t relevant here). We analyzed a series of variables: gender, age, frequency of visits… After running a multivariable regression, many factors turned out to be irrelevant to the dependent variable. And the relevant factors (mainly gender and age) explained "only" 3% (R²).


At first, it seemed like the regression "wasn’t good enough," but even with a low R², there were statistically significant differences between men and women. This means that, although we couldn’t predict the value of a specific individual, we could make inferences with confidence about larger groups – such as the next 10,000 women or 10,000 men who enter the casino. In this case, women were "more valuable" and could be a target group for marketing efforts, for example.
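A small simulation can illustrate how a low R² and a highly significant group difference coexist. Everything below is synthetic: the group means (135 and 100), the standard deviation (100), and the sample size are assumptions picked to land near an R² of about 3%, and they are not the figures from the original study. The significance test uses a normal approximation for the difference in means, which is reasonable for samples this large.

```python
import math
import random
from statistics import NormalDist, fmean, variance

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000               # visitors per group (assumed)

# Synthetic "theoretical values": a small mean gap buried in large noise.
women = [rng.gauss(135, 100) for _ in range(n)]
men = [rng.gauss(100, 100) for _ in range(n)]

# R² of a regression on a single group indicator equals
# between-group variation divided by total variation.
all_values = women + men
grand_mean = fmean(all_values)
mean_w, mean_m = fmean(women), fmean(men)
ss_tot = sum((v - grand_mean) ** 2 for v in all_values)
ss_between = n * (mean_w - grand_mean) ** 2 + n * (mean_m - grand_mean) ** 2
r2 = ss_between / ss_tot

# Two-sided z test for the difference in group means.
std_err = math.sqrt(variance(women) / n + variance(men) / n)
z = (mean_w - mean_m) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Uncertainty for one individual vs. for the average of a large group.
sd_individual = math.sqrt(variance(women))
se_group_mean = sd_individual / math.sqrt(n)

print(f"R² = {r2:.1%}")
print(f"z = {z:.1f}, p-value = {p_value:.2g}")
print(f"spread for one woman: ±{sd_individual:.0f}")
print(f"spread for the mean of {n} women: ±{se_group_mean:.1f}")
```

In this sketch the group indicator explains only a few percent of individual variation, yet the p-value is far below 5%, and the average value of the next 10,000 visitors in a group is pinned down about a hundred times more tightly than the value of any one visitor.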


Contact us at contact@insignia.partners and discover how we can contribute to the success of your strategy.




Bruno Bullio

Associate Partner


Bruno brings 15 years of experience in strategic consulting, specializing in retail and consumer goods, with a strong track record across Brazil and Latin America.
