Rule-Based Classifier¶
Knowledge about the dataset is stored in the form of if-then rules, in a rule database \(R\)
Rule¶
LHS | RHS | |
---|---|---|
Contains | Condition/Conjunct with attributes | Class label |
Alternate Names | Antecedent / Pre-condition | Consequent |
If the precondition of rule \(r\) matches the attributes of record \(x\):
- \(r\) covers \(x\)
- \(x\) fires/triggers \(r\)
Quality of Classification Rule¶
Consider \(r: A \to y\), where
- \(r\) is the rule
- \(A\) is the antecedent
- \(y\) is the class label (consequent)
- \(D\) is the dataset
- \(|A|\) is the number of records covered by the rule, i.e. satisfying the antecedent
- \(|D|\) is the total number of records
Quality Measure | Formula | |
---|---|---|
Coverage\((r)\) | Fraction of records covered by rule | \(\dfrac{\vert A\vert}{\vert D \vert}\) |
Accuracy\((r)\), also called Confidence Factor | Fraction of covered records whose class label the rule correctly predicts | \(\dfrac{\vert A \cap y \vert}{\vert A \vert}\) |
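A small worked example (toy data with hypothetical attribute names) computing both measures for a single rule:

```python
# Toy data (hypothetical attributes) for the rule r: Outlook = Sunny -> No
dataset = [
    {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Sunny", "Play": "Yes"},
    {"Outlook": "Rain",  "Play": "Yes"},
    {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Rain",  "Play": "No"},
]

def antecedent(rec):                      # A: Outlook = Sunny
    return rec["Outlook"] == "Sunny"

consequent = "No"                         # y

covered = [r for r in dataset if antecedent(r)]            # |A| = 3
correct = [r for r in covered if r["Play"] == consequent]  # |A ∩ y| = 2

coverage = len(covered) / len(dataset)    # 3/5 = 0.6
accuracy = len(correct) / len(covered)    # 2/3 ≈ 0.67
```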
Steps¶
- Find rule(s) whose antecedent matches the attributes of the record
- Next steps:
Number of rules triggered | Rules have same class label | Steps |
---|---|---|
0 | N/A | Fall back to the default class (i.e., add a default rule to the bottom of the rule list) |
1 | N/A | Assign consequent of rule as class label of test record |
Multiple | Yes | Assign the common consequent of the rules as the class label of the test record |
Multiple | No | Either use the highest-priority rule from an ordered rule list (computationally expensive during training), or use a majority-voting scheme over unordered rules (computationally expensive during testing) |
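A minimal sketch of this procedure with hypothetical rules, using an ordered decision list where the first matching rule wins and a default rule catches uncovered records:

```python
# Ordered decision list: (antecedent, consequent) pairs, highest priority first
rules = [
    (lambda r: r["Outlook"] == "Sunny" and r["Humidity"] == "High", "No"),
    (lambda r: r["Outlook"] == "Rain", "Yes"),
]
DEFAULT_CLASS = "Yes"   # majority class of records not covered by any rule

def classify(record):
    for antecedent, consequent in rules:   # first matching rule wins
        if antecedent(record):
            return consequent
    return DEFAULT_CLASS                   # no rule triggered -> default rule

classify({"Outlook": "Sunny", "Humidity": "High"})   # -> "No"
classify({"Outlook": "Overcast", "Humidity": "Low"}) # -> "Yes" (default)
```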
Default Rule¶
Default Class¶
The majority class among records not covered by any rule in the rulebase
Desired Properties of Rule-Based Classifier¶
Desired Property | Meaning |
---|---|
Rules are Mutually-Exclusive | Only 1 rule is triggered for any record |
Rules are Exhaustive | Every record is covered by \(\ge 1\) rule |
Types of Rules¶
Ordered | Unordered |
---|---|
Priority assigned based on coverage and accuracy | No priority |
Extraction from Decision Tree¶
One rule is created for each path from the root to a leaf
Walk down the edges of the path and join the test conditions with "and" as the conjunction
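A minimal sketch of this extraction, assuming a small hypothetical tree encoded as nested tuples (internal node = `(attribute, {edge_value: subtree})`, leaf = class label):

```python
# Hypothetical decision tree for illustration
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",          # leaf: class label
    "Rain":     "No",
})

def extract_rules(node, conjuncts=()):
    if isinstance(node, str):                 # reached a leaf: emit one rule
        lhs = " and ".join(conjuncts) or "True"
        return [f"IF {lhs} THEN class = {node}"]
    attr, branches = node
    rules = []
    for value, child in branches.items():     # follow each outgoing edge
        rules += extract_rules(child, conjuncts + (f"{attr} = {value}",))
    return rules

for rule in extract_rules(tree):
    print(rule)
```

One rule is produced per root-to-leaf path, so this tree yields four rules.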
Why?¶
Rules are easier to understand than larger trees
Sequential Covering Algorithm¶
1. Start with an empty decision list \(R\), training records \(E\), and a class \(y\)
2. Use the Learn-One-Rule function to extract the best rule for class \(y\) that covers the current set of training records
3. Remove the training records covered by the rule
4. Add the new rule to the bottom of the decision list \(R\)
5. Repeat Steps 2–4 until the stopping criterion is met
6. Proceed to generate rules for the next class
During rule extraction, all training records of class \(y\) are treated as positive examples, while records belonging to other classes are treated as negative examples
A desirable rule covers most of the positive examples and none/very few of the negative examples
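The loop above can be sketched as follows, assuming a hypothetical `learn_one_rule(records, y)` helper that returns an `(antecedent_fn, class)` pair, or `None` when the stopping criterion is met:

```python
# Sequential covering sketch: build an ordered decision list class by class
def sequential_covering(records, classes, learn_one_rule):
    decision_list = []
    for y in classes:                          # handle one class at a time
        remaining = list(records)
        while True:
            rule = learn_one_rule(remaining, y)
            if rule is None:                   # stopping criterion met
                break
            antecedent, _ = rule
            decision_list.append(rule)         # add to bottom of R
            # remove training records covered by the new rule
            remaining = [r for r in remaining if not antecedent(r)]
    return decision_list
```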
Learn-One-Rule Function¶
General-to-Specific¶
Initial Seed Rule¶
Start with the most general rule: an empty antecedent \(\{\} \to y\)
Keep refining this initial seed rule by adding more conjuncts
Specific-to-General¶
Start with a most-specific seed rule built from a positive example
Keep refining this initial seed rule by removing conjuncts
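A greedy general-to-specific sketch (hypothetical helper; records are dicts with a `"label"` field, and accuracy is used as the refinement metric, though FOIL's gain or the likelihood ratio could be used instead):

```python
# Learn-One-Rule, general-to-specific: start from the empty antecedent and
# repeatedly add the single conjunct that most improves rule accuracy.
def learn_one_rule(records, y, attributes):
    conjuncts = {}                            # empty antecedent: most general

    def covered(recs):
        return [r for r in recs
                if all(r[a] == v for a, v in conjuncts.items())]

    def accuracy(recs):
        cov = covered(recs)
        if not cov:
            return 0.0
        return sum(r["label"] == y for r in cov) / len(cov)

    improved = True
    while improved:
        improved = False
        best = accuracy(records)
        best_choice = None
        for a in attributes:
            if a in conjuncts:                # each attribute used at most once
                continue
            for v in {r[a] for r in records}:
                conjuncts[a] = v              # tentatively add conjunct a = v
                acc = accuracy(records)
                del conjuncts[a]
                if acc > best:
                    best, best_choice, improved = acc, (a, v), True
        if improved:                          # commit the best refinement
            conjuncts[best_choice[0]] = best_choice[1]
    return conjuncts, y
```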
Metrics¶
FOIL's Information Gain¶
FOIL: First-Order Inductive Learner
Higher value \(\implies\) better rule
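The commonly used form of FOIL's gain for evaluating the addition of a conjunct, where \(p_0, n_0\) are the positive/negative examples covered before the refinement and \(p_1, n_1\) after:

\(\text{FOIL Gain} = p_1 \times \left( \log_2 \dfrac{p_1}{p_1 + n_1} - \log_2 \dfrac{p_0}{p_0 + n_0} \right)\)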
Likelihood Ratio Statistic¶
Let
- \(k\) be number of classes
- \(f_i\) be observed frequency of class \(i\) examples, that are covered by rule
- \(e_i\) be the expected frequency under a rule that makes random predictions
- probability of class \(i\) \(\times\) number of records covered by the rule
Higher value \(\implies\) better rule
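With the symbols above, the statistic is commonly written as:

\(R = 2 \sum_{i=1}^{k} f_i \log \dfrac{f_i}{e_i}\)

Under random predictions, \(R\) approximately follows a \(\chi^2\) distribution with \(k - 1\) degrees of freedom.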
Types of Rule-Based Classifier Algorithms¶
Direct | Indirect | |
---|---|---|
Extract rules from | data directly | other classification models |
Example | RIPPER, CN2, 1R | C4.5rules |