- In a multiple regression model, the variance inflation factor (VIF) is a measure of multicollinearity among the independent variables.
- Multicollinearity must be detected because, while it has no effect on the model’s explanatory power, it does diminish the statistical significance of the independent variables.
- A high variance inflation factor (VIF) on an independent variable indicates a highly collinear relationship with the other variables, which should be taken into account or corrected for in the model’s structure and independent variable selection.
What is a high VIF?
Variance inflation factors start at 1 and have no upper bound. The VIF value indicates how much the variance (i.e. the standard error squared) of a coefficient is inflated. A VIF of 1.9, for example, indicates that the variance of that coefficient is 90% higher than what you’d expect if there were no multicollinearity, that is, if the variable had no correlation with the other predictors.
The exact size of a VIF that causes problems is a point of contention. What is known is that as your VIF increases, your regression results become less dependable. In general, a VIF greater than 10 indicates substantial correlation and should be considered concerning. Some authors recommend a more conservative threshold of 2.5.
A high VIF is not always a cause for concern. For example, if you use products or powers from other variables in your regression, such as x and x2, you can achieve a high VIF. It is usually not a problem to have large VIFs for dummy variables representing nominal variables with three or more categories.
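The "variance inflated by 90%" interpretation can be sketched in a few lines. In the two-predictor case, VIF = 1/(1 - r2), where r is the Pearson correlation between the two predictors; the helper names below (pearson_r, vif_two_predictors) are illustrative, not from any library.

```python
def pearson_r(x, y):
    # Pearson correlation coefficient, computed from scratch.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def vif_two_predictors(x, y):
    # With only two predictors, the auxiliary R-squared is just r squared.
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)

x1 = [1.0, 2.0, 3.0]
x2 = [1.0, 3.0, 2.0]                 # moderately correlated with x1 (r = 0.5)
print(round(vif_two_predictors(x1, x2), 4))   # 1.3333
```

Here a correlation of 0.5 yields a VIF of about 1.33, i.e. the coefficient variance is inflated by roughly 33% relative to the uncorrelated case.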
What is an appropriate variance inflation factor?
The majority of research studies use a VIF (variance inflation factor) > 10 as a criterion for multicollinearity; however, some use a lower threshold of 5 or even 2.5.
When deciding on a VIF threshold, keep in mind that multicollinearity is less of an issue with a big sample size than it is with a small one.
As a result, the VIF thresholds recommended for detecting collinearity in a multivariable (linear or logistic) model vary across the literature.
What does a VIF of one indicate?
A VIF of 1 indicates that the jth predictor is uncorrelated with the remaining predictor variables, and hence the variance of its coefficient bj is not inflated at all.
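A quick numerical check of this statement, using the standard formula VIF = 1/(1 - R2); with only two predictors, the auxiliary R2 is simply the squared Pearson correlation between them. Pure Python, illustrative names:

```python
def pearson_r(x, y):
    # Pearson correlation coefficient, computed from scratch.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# x2's deviations from its mean (1, -2, 1) are orthogonal to x1's (-1, 0, 1),
# so the two predictors are exactly uncorrelated.
x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 1.0, 4.0]

r_sq = pearson_r(x1, x2) ** 2      # 0.0
vif = 1.0 / (1.0 - r_sq)
print(vif)                          # 1.0: the coefficient variance is not inflated
```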
What does it mean to have a low VIF?
Small VIF values indicate low correlation among the predictors; under ideal conditions the VIF equals 1. VIF is the reciprocal of the tolerance value. In practice, a VIF below 10 is usually considered acceptable.
Is a VIF of 1 a good number?
There are various rules we can follow to see if our VIFs are within acceptable limits. A popular rule of thumb in practice is that if the VIF is greater than ten, the multicollinearity is high. We’re in good shape in our scenario, with values around 1, and we can continue with our regression.
What is the difference between tolerance and VIF?
Multicollinearity may be present if the coefficients of the variables are not individually significant (they cannot be rejected in a t-test) yet together they explain the variance of the dependent variable, with rejection in the F-test and a high coefficient of determination (R2). This pattern is itself a technique for detecting multicollinearity.
Another often used approach for detecting multicollinearity in a regression model is VIF. It determines how much collinearity has inflated the variance (or standard error) of the predicted regression coefficient.
Use of Variance Inflation Factor
When regressing the ith independent variable on the remaining ones, Ri2 is the resulting coefficient of determination, and VIF equals 1/(1 - Ri2). Tolerance is defined as the reciprocal of VIF. Depending on personal preference, either VIF or tolerance can be employed to detect multicollinearity.
If Ri2 is equal to 0, the ith independent variable cannot be predicted from the other independent variables. As a result, when VIF or tolerance is equal to 1, the ith independent variable is unrelated to the others, implying that multicollinearity is absent from this regression model. In this case, the variance of the ith regression coefficient is not inflated.
In general, a VIF greater than 4 or a tolerance less than 0.25 suggests the presence of multicollinearity, and further analysis is required. There is severe multicollinearity that needs to be adjusted when VIF is greater than 10 or tolerance is less than 0.1.
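These rule-of-thumb cutoffs can be encoded in a small helper. The function name and return labels below are illustrative, not standard terminology:

```python
def multicollinearity_flag(vif):
    """Classify a VIF using the rule-of-thumb thresholds above."""
    tolerance = 1.0 / vif           # tolerance is the reciprocal of VIF
    if vif > 10 or tolerance < 0.1:
        return "severe"             # needs to be corrected
    if vif > 4 or tolerance < 0.25:
        return "further analysis required"
    return "no evidence of multicollinearity"

print(multicollinearity_flag(1.0))    # no evidence of multicollinearity
print(multicollinearity_flag(6.0))    # further analysis required
print(multicollinearity_flag(12.0))   # severe
```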
There are, however, cases where high VIFs can be safely ignored because the multicollinearity causes no problems. Three such scenarios are as follows:
1. High VIFs occur only in control variables, not in the variables of interest. In this case, the variables of interest are not collinear with the control variables, so their regression coefficients are not distorted.
2. Multicollinearity has no detrimental effects when large VIFs are created by the inclusion of the products or powers of other variables. A regression model, for example, comprises both x and x2 as independent variables.
3. Multicollinearity does not necessarily exist when dummy variables representing a nominal variable with three or more categories have high VIFs. Dummy variables will always show high VIFs when one category contains only a small proportion of the cases, regardless of whether the categorical variable is associated with other variables.
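Scenario 2 is easy to reproduce numerically: x and x2 are strongly correlated by construction, so the pairwise formula VIF = 1/(1 - r2) yields a large value even though nothing is wrong with the model. A pure-Python sketch with illustrative names:

```python
def pearson_r(x, y):
    # Pearson correlation coefficient, computed from scratch.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_sq = [v ** 2 for v in x]          # polynomial term added on purpose

r = pearson_r(x, x_sq)
vif = 1.0 / (1.0 - r ** 2)
print(round(vif, 1))                # 26.7, well above the usual cutoff of 10
```

The high VIF here reflects the deliberate functional relationship between x and x2, not a defect in variable selection.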
Correction of Multicollinearity
Multicollinearity inflates coefficient variances and increases the risk of type II errors, so detecting and correcting it is critical. The following are two basic and widely used methods for correcting multicollinearity:
1. Remove one (or more) of the highly correlated variables. Because these variables provide duplicate information, their removal will not have a significant impact on the coefficient of determination.
2. Instead of using OLS regression, use principal component analysis (PCA) or partial least squares (PLS) regression. PLS regression can condense a large number of variables into a smaller set with no correlation between them; PCA likewise generates new uncorrelated variables. Both approaches reduce information loss and can enhance a model’s predictability.
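For two predictors, remedy 2 has a convenient closed form: after standardizing, the principal components are the scaled sum and difference of the two variables, and their covariance is proportional to var(z1) - var(z2) = 0, so they are uncorrelated by construction. A minimal sketch in pure Python, with illustrative names:

```python
def pearson_r(x, y):
    # Pearson correlation coefficient, computed from scratch.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def standardize(x):
    # Center to mean 0 and scale to (population) standard deviation 1.
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / n) ** 0.5
    return [(v - m) / sd for v in x]

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.1, 1.9, 3.2, 3.8]            # nearly collinear with x1

z1, z2 = standardize(x1), standardize(x2)
pc1 = [(a + b) / 2 ** 0.5 for a, b in zip(z1, z2)]   # first component
pc2 = [(a - b) / 2 ** 0.5 for a, b in zip(z1, z2)]   # second component

print(round(pearson_r(x1, x2), 2))         # 0.99: severe collinearity
print(round(abs(pearson_r(pc1, pc2)), 6))  # 0.0: components are uncorrelated
```

With more than two predictors you would use the eigenvectors of the correlation matrix (e.g. sklearn.decomposition.PCA), but the two-variable case shows the idea.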
What does it signify in a linear regression model when a variable’s variance inflation factor (VIF) is high?
The variance inflation factor (VIF) is a metric for determining how much multicollinearity there is in a set of multivariate regression variables. Mathematically, the VIF for a model variable is the ratio of the variance of its coefficient in the full model to the variance it would have in a model containing that variable as the only predictor. This ratio is determined for each independent variable. A high VIF shows that the associated independent variable has a high degree of collinearity with the model’s other variables.
What can I do about a high VIF?
If multicollinearity is an issue in your model (if a factor’s VIF is at or above 5, for example), the solution may be straightforward. Consider one of the following:
- Remove highly correlated predictors from the model. If you have two or more factors with a high VIF, remove one of them. Because they provide redundant information, eliminating one of the correlated predictors seldom reduces the R-squared significantly. Consider using stepwise regression, best subsets regression, or specialized knowledge of the data set to decide which variables to remove, and choose the model with the highest R-squared value.
- Use regression approaches like Partial Least Squares Regression (PLS) or Principal Components Analysis, which reduce the number of predictors to a smaller number of uncorrelated components.
It’s simple to use the tools in the Stat > Regression menu in Minitab Statistical Software to quickly test several regression models and identify the best one. If you haven’t tried Minitab yet, we invite you to do so for free for 30 days.
Have you ever had to deal with multicollinearity issues? How did you come up with a solution to the problem?
What does r2 mean in VIF?
Each auxiliary model regresses one IV on all the others, and its R-squared value represents the proportion of that IV’s variance explained by the remaining IVs. Higher R-squared values therefore indicate greater multicollinearity. These R-squared values are used in the VIF calculation: VIF = 1/(1 - R2).
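The relationship VIF = 1/(1 - R2) makes it easy to tabulate how quickly auxiliary R-squared values translate into inflation:

```python
# Map a few auxiliary R-squared values to their implied VIFs.
for r2 in (0.0, 0.5, 0.9, 0.96):
    vif = 1.0 / (1.0 - r2)
    print(f"R^2 = {r2:4}  ->  VIF = {vif:.1f}")
# VIFs: 1.0, 2.0, 10.0, 25.0
```

Note how the mapping is nonlinear: moving from R2 = 0.9 to R2 = 0.96 more than doubles the VIF.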