Text Classification

  • Devise features by hand: Does the message contain “church”? Does the email contain an Indian organization’s domain?
  • Bag of words: Count the occurrences of each word in a pre-defined ‘vocabulary’
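
As a rough illustration of the bag-of-words idea, the sketch below counts how often each vocabulary word appears in a message. The function name `bag_of_words`, the whitespace tokenization, and the three-word vocabulary are assumptions made for this example only; they are not taken from the notes.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    # Assumption: lowercase the text and split on whitespace as a naive tokenizer.
    tokens = text.lower().split()
    counts = Counter(tokens)
    # Words outside the vocabulary are ignored; missing words count as 0.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and message, chosen only for illustration.
vocabulary = ["church", "prayer", "meeting"]
vector = bag_of_words("Church meeting after the church service", vocabulary)
print(vector)  # [2, 0, 1]
```

The resulting vector has a fixed length equal to the vocabulary size, which is what lets messages of different lengths be fed to the same classifier.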

Pre-Processing

  • Stemming: keep only the root of each word
      • Example: “slowly” and “slow” are both mapped to “slow”
  • Filtering: remove words that carry little information
      • Stopwords, such as articles
      • Filler words
      • Rare words
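
The pre-processing steps above can be sketched as follows. This is a minimal example under stated assumptions: `toy_stem` is a hypothetical suffix stripper, not a real stemming algorithm such as Porter's, and the stopword list, filler list, and `min_count` threshold for rare words are illustrative choices rather than values from the notes.

```python
from collections import Counter

# Assumed word lists, for illustration only.
STOPWORDS = {"a", "an", "the"}   # articles
FILLERS = {"um", "uh", "like"}   # filler words


def toy_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ly", "ing", "ed"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(documents, min_count=2):
    """Stem, drop stopwords and fillers, then drop rare words."""
    stemmed = [[toy_stem(w) for w in doc.lower().split()] for doc in documents]
    kept = [
        [w for w in doc if w not in STOPWORDS and w not in FILLERS]
        for doc in stemmed
    ]
    # A word is "rare" here if it occurs fewer than min_count times overall.
    counts = Counter(w for doc in kept for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in kept]


docs = ["The car moved slowly", "A slow car um stopped"]
print(preprocess(docs))
```

With these two toy documents, “slowly” and “slow” collapse to the same token, so the word survives the rare-word filter even though each surface form appears only once.
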
Last Updated: 2024-05-14 ; Contributors: AhmedThahir
