Statistical Inference¶
Deals with the problem of uncertainty in estimates due to sample variability
Does not deal with
- Whether model specification is correct
- Whether \(x\) has causal effect on \(y\)
- Whether model is good for describing causal effect of \(x\) on \(y\)
Hypothesis Testing¶
- Rejection of \(H_0 \centernot \implies \beta\) is meaningfully different from 0 in magnitude. Therefore, to assess the magnitude of \(\beta\), confidence intervals are more useful than \(p\)-values
- Rejection of \(H_0\) does not mean that \(x\) has a significant causal effect on \(y\). Statistical significance \(\centernot \implies\) scientific, real-world significance. The most important variables are not those with the smallest p-values.
- The t-test can be thought of as checking whether adding \(x_j\) really improves predictions in a model that contains other specified predictors
- 95% CI \(= [\text{LL}, \text{UL}] \centernot \implies \text{Pr}(\beta \in [\text{LL}, \text{UL}]) = 0.95\)
- Correct interpretation: a 95% CI for \(\beta\) means that if we estimate our model on many independent random samples drawn from the same population and construct \(\text{CI}_m = [\text{LL}_m, \text{UL}_m]\) on each sample, then 95% of these \(\{ \text{CI}_m \}\) will contain the true \(\beta\)
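A minimal simulation of this interpretation (a sketch assuming a simple linear model with normal noise; all values are illustrative): across many repeated samples, roughly 95% of the slope CIs should contain the true \(\beta\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta_true, n, n_sims = 2.0, 100, 10_000

covered = 0
for _ in range(n_sims):
    x = rng.normal(size=n)
    y = beta_true * x + rng.normal(size=n)
    res = stats.linregress(x, y)
    # 95% CI for the slope: estimate +/- t-critical x standard error
    t_crit = stats.t.ppf(0.975, df=n - 2)
    lo, hi = res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr
    covered += (lo <= beta_true <= hi)

print(f"coverage: {covered / n_sims:.3f}")  # should be close to 0.95
```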
P-Value¶
The \(p\)-value is not the conditional probability of \(H_0\) given the data. It is the probability of observing data at least as extreme as the observed data set, assuming \(H_0\) is true (without incorporating prior knowledge) $$ \begin{aligned} p \text{-value} &\ne P(H_0 = \text{True} \vert D) \\ p \text{-value} &= P(D \vert H_0 = \text{True}) \\ &= \text{Pr}(\vert t \vert \ge \vert t(\hat \beta) \vert \mid H_0) \end{aligned} $$
where \(D\) is the data
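A minimal illustration of this tail-probability definition (the \(t\) statistic and degrees of freedom here are hypothetical):

```python
from scipy import stats

t_hat, df = 2.1, 48  # hypothetical observed t statistic and residual degrees of freedom
# two-sided p-value: Pr(|t| >= |t_hat|) under the null t distribution
p_value = 2 * stats.t.sf(abs(t_hat), df)
print(f"p-value: {p_value:.4f}")
```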
When \(P(H_1) < 0.1\), we may need the \(p\)-value to be much smaller than the conventional threshold of \(\alpha = 0.05\) in order to "confidently" reject \(H_0\)
- For example, confidently concluding that a coin is biased would require a long run of one-sided results (e.g., many consecutive heads)
Hypothesis tests are only valid for large sample sizes, as they are based on the asymptotic properties of test statistics. In small samples, bootstrapping can be used to obtain more accurate \(p\)-value estimates
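A sketch of one common variant, the pairs bootstrap of the \(t\) statistic (the data-generating process here is made up for illustration): the bootstrap distribution of the centred \(t\) statistic approximates its null distribution without relying on asymptotics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, B = 30, 5_000
x = rng.normal(size=n)
y = 0.3 * x + rng.standard_t(df=3, size=n)  # small sample, heavy-tailed errors

res = stats.linregress(x, y)
t_obs = res.slope / res.stderr  # t statistic for H0: beta = 0

# Resample (x, y) pairs; centre at the original estimate so the
# bootstrap t statistics mimic the null distribution of t
t_boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    rb = stats.linregress(x[idx], y[idx])
    t_boot[b] = (rb.slope - res.slope) / rb.stderr

p_boot = np.mean(np.abs(t_boot) >= abs(t_obs))
print(f"bootstrap p-value: {p_boot:.4f}")
```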
Information Content of Statistical (Non)Significance¶
A statistical result is informative only when it has the potential to substantially change our beliefs. The discrepancy between the prior and posterior distributions thus provides a basic measure of the informativeness of a statistical result.
Using this measure, non-significant results are often more informative than significant results in scenarios common in empirical economics.
Hence, the null need not always be \(H_0: \beta = 0\); it can be a previously known value. In ridge regression, this can be implemented by shrinking toward the known prior value instead of toward zero (see the sketch after this list)
- Beliefs on the causal effect of a policy intervention are usually better described by a continuous distribution rather than a distribution with significant probability mass at point zero.
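A minimal sketch of ridge regression that shrinks toward a nonzero prior value \(\beta_0\) rather than toward zero (the closed form follows from minimising \(\Vert y - X\beta \Vert^2 + \lambda \Vert \beta - \beta_0 \Vert^2\); the data and prior values are made up):

```python
import numpy as np

def ridge_toward_prior(X, y, beta0, lam):
    """Minimise ||y - X b||^2 + lam * ||b - beta0||^2.
    Setting the gradient to zero gives (X'X + lam*I) b = X'y + lam*beta0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * beta0)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=50)

beta0 = np.array([0.8, 0.4])  # hypothetical prior-known coefficient values
print(ridge_toward_prior(X, y, beta0, lam=10.0))  # shrunk toward beta0, not 0
```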
When \(P(H_0)\) is low, statistical significance often carries little information; non-significance is highly informative, because in this case non-significance is more "surprising" and induces a larger change in the posterior belief $$ \underbrace{1 - \dfrac{p(\beta \vert R=0)}{p(\beta)}}_{\text{INS}} = \dfrac{P(R=1)}{P(R=0)} \times \underbrace{\left( \dfrac{p(\beta \vert R=1)}{p(\beta)} - 1 \right)}_{\text{IS}} $$ where
- \(R=H_0 \text{ rejected}\) at given significance level
- \(P(R = 1)\) is the prior probability of rejection of the null
- \(P(R = 1) = \int P(R = 1 \vert \beta) \cdot p(\beta) \cdot d\beta\) (computed numerically in the sketch after this list)
- \(\text{INS}\) = Informativeness of non-significance
- \(\text{IS}\) = Informativeness of significance
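A sketch of computing \(P(R=1)\) numerically for a two-sided \(z\)-test, assuming a hypothetical standard error and a continuous normal prior on \(\beta\) with no mass at zero:

```python
import numpy as np
from scipy import stats, integrate

se, tau, z = 0.05, 0.5, 1.96  # hypothetical std. error, prior sd, critical value

def power(beta):
    """P(R=1 | beta) for a two-sided z-test with beta_hat ~ N(beta, se^2)."""
    return stats.norm.sf(z - beta / se) + stats.norm.cdf(-z - beta / se)

# P(R=1) = integral of P(R=1 | beta) * p(beta) d(beta), with beta ~ N(0, tau^2)
p_reject, _ = integrate.quad(lambda b: power(b) * stats.norm.pdf(b, scale=tau),
                             -10 * tau, 10 * tau)
print(f"P(R=1) = {p_reject:.3f}")  # well above 0.5: non-significance is the surprise
```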
Takeaways¶
- Non-significance is more informative than significance as long as \(P(R = 1) > 0.5\)
- As \(n\) increases and the prior mass \(p(\beta=0)\) decreases, \(P(R=1)\) increases
- Thus, as datasets get larger, and because there are rarely reasons to put significant prior mass on \(\beta=0\), non-significant results will be more informative in empirical studies in economics
- When \(n\) is very large, without prior probability mass at the point null, significance carries no information
Statistical Significance Filter¶
Publication Bias
Only the extreme, significant results make it through to publication; published findings are hence not a representative sample of all empirical findings.
\(E[\hat \beta \vert \text{significant}] \gg \beta\)
When the power of a test is low, the null hypothesis is false but fails to be rejected \((1 - \text{power})\) of the time
Low power leads to high exaggeration ratios (Type \(M\) errors), i.e., conditional on being statistically significant, the estimate is expected to be several times larger in magnitude than the true effect size
Type \(S\) error probability: the probability that a statistically significant estimate has the wrong sign
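A small simulation in the spirit of Gelman and Carlin's Type \(M\)/Type \(S\) analysis (effect size and standard error are made up to produce a low-power setting):

```python
import numpy as np

rng = np.random.default_rng(3)
true_effect, se, n_sims = 0.1, 0.5, 100_000  # tiny effect, noisy estimates -> low power

est = rng.normal(true_effect, se, size=n_sims)  # sampling distribution of beta_hat
signif = np.abs(est / se) > 1.96                # two-sided test at alpha = 0.05

print(f"power                 ~ {signif.mean():.3f}")
print(f"exaggeration ratio    ~ {np.abs(est[signif]).mean() / true_effect:.1f}")  # Type M
print(f"P(wrong sign | sig.)  ~ {(est[signif] < 0).mean():.3f}")                  # Type S
```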
Multiple Testing¶
Multiple comparisons
If you perform \(m\) independent hypothesis tests, all with true nulls, the probability that at least one produces a statistically significant result at significance level \(\alpha\) purely by chance is necessarily greater than \(\alpha\)
FWER¶
Joint Type 1 Error/FWER (Family-Wise Error Rate) is the probability of making at least 1 Type 1 error when simultaneously performing \(m\) independent hypothesis tests $$ P(\ge 1 \text{ false positive}) = 1 - (1-\alpha)^m $$ where \(m\) is the number of tests conducted (i.e., model specifications tried)
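Plugging numbers into this formula shows how fast the FWER grows with \(m\) at \(\alpha = 0.05\):

```python
alpha = 0.05
for m in (1, 5, 20, 100):
    print(f"m = {m:3d}: FWER = {1 - (1 - alpha) ** m:.3f}")
# m =   1: FWER = 0.050
# m =   5: FWER = 0.226
# m =  20: FWER = 0.642
# m = 100: FWER = 0.994
```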
Bonferroni Correction¶
Bounds the FWER at or below \(\alpha\) by setting the significance threshold for each individual test to \(\alpha/m\) $$ 1 - \left(1 - \dfrac{\alpha}{m} \right)^m \le \alpha $$ It is conservative: by the union bound, it controls the FWER even under arbitrary dependence between tests, so it often rejects less than necessary.
For large \(m\), it leads to a significant loss of power, i.e., a higher probability of false negatives
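A sketch using `statsmodels` (the `multipletests` helper with `method="bonferroni"`): with 100 true nulls, the uncorrected tests typically yield around five false positives, while the Bonferroni-corrected tests yield essentially none.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(4)
z = rng.normal(size=100)              # 100 test statistics, all nulls true
pvals = 2 * stats.norm.sf(np.abs(z))

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
print(f"uncorrected rejections: {(pvals < 0.05).sum()}")  # ~5 false positives expected
print(f"Bonferroni rejections:  {reject.sum()}")          # usually 0
```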
Selective Inference¶
Assessing the strength of evidence after obtaining the "best model" by searching over a large number of models
If not taken into account, the effects of selection can greatly exaggerate the apparent strengths of relationships.
Also called Post-Selection Inference
Context¶
To conduct statistical inference for procedures that involve model selection, such as forward stepwise regression or the lasso, it is tempting to look only at the final selected model. However, such inference is generally invalid
The problem is essentially the same as that of specification search and data-snooping: an observed correlation of 0.9 between \(x\) and \(y\) may be noteworthy. However, if \(x\) is found by searching over 100 variables looking for the one with the highest observed correlation with \(y\), then the finding is no longer as impressive and could well be due to chance (see the simulation below)
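A quick simulation of this selection effect (sample size and number of candidate variables are illustrative): even when every predictor is pure noise, the best of 100 candidates shows a sizeable correlation.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 100
X = rng.normal(size=(n, p))  # 100 candidate predictors, all pure noise
y = rng.normal(size=n)       # outcome unrelated to every predictor

corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
print(f"max |correlation| over {p} noise variables: {np.abs(corrs).max():.2f}")
# typically around 0.4-0.5, despite a true correlation of exactly 0
```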
Solution: Conditional Coverage¶
Make the inference conditional on the model selected
Construct a CI for \(\beta_{j, M}\) conditional on model \(M\) being selected: $$ P(\beta_{j, M} \in C_{j,M} \vert M \text{ selected}) \ge 1 - \alpha $$