Skip to content

01 Intro

Many times very high-quality professionals are not able to produce well, as they are usually incentivized to use complex methodologies. But data science is best when you actually solve the problem at hand, and help make decisions.

Fields Overview

Analytics AI/ML Statistical Inference
Goal Descriptive Predictive Prescriptive
Decisions None Large scale repetitive
(with uncertainty)
Small scale
(with uncertainty)

Data Roles

img

Types of Analysis

Type Topic Nature Time Comment Examples
Descriptive/
Positive
What is happening? Objective Past No emotions/explanations if good or bad Increasing taxes will lower consumer spending
Increasing interest rate will lower demand for loans
Raising minimum wage will increase unemployment
Diagnostic Why is it happening? Objective/Subjective Past Helps in understanding root cause
Predictive What will happen if condition happens Subjective Future Understanding future, using history
Prescriptive/
Normative
What to do Subjective Future what actions to be taken Taxes must be increased

The complexity increases as we go down the above list, but the value obtained increases as well

Project Lifecycle

flowchart TB

subgraph Scoping
    dp[Define<br/>Project] -->
    me["Define Metrics<br/>(Accuracy, Recall)"] -->
    re[Resources<br/>Budget] -->
    ba["Establish<br />Baseline"]
end

subgraph Data
    d[(Data Source)] -->
    l[Label &<br />Organize Data]
end

subgraph Modelling
  pre[Preprocessing] -->
    s[Modelling] -->
    train[Training] -->
  pp[Post<br />Processing] -->
    vt[Validation &<br />Testing] -->
    e[Error Analysis] -->
    pre
end

subgraph Deploy
    dep[Deploy in<br />Production] -->
    m[Monitor &<br />Maintain] & dss[Decision<br />Support System]
end

Scoping --> Data --> Modelling --> Deploy

https://www.youtube.com/watch?v=UyEtTyeahus&list=PLkDaE6sCZn6GMoA0wbpJLi3t34Gd8l0aK&index=5

Data Mining

Generate Decision Support Systems

Non-trivial extraction of implicit, previously-unknown and potentially useful information from data

Automatic/Semi-automatic means of discovering meaningful patterns from large quantities of data

Predictive Tasks

Predict value of target/independent variable using values of independent variables

  • Regression - Continuous
  • Classification - Discrete

Descriptive Tasks

Goal is to find

  • Patterns
  • Associations/Relationships

Association Analysis

Find hidden assocations and patterns, using association rules

Applications

  • Gene Discovery
  • Market Baset Data Analysis Find items that are bought together

Clustering/Cluster Analysis

Grouping similar customers

Metrics

  • Similarity
  • Dissimilarity/Distance Metrics

Applications

  • Grouping similar documents

  • Clustering documents

  • Vocabulary - All terms(key words) from all docs

  • Generate document-term frequency matrix

    Document \vert Term T1 T2 … Tn
    D1
    D2
    …
    Dm

Deviation/Outlier/Anomaly Detection

Outlier is a data point that does not follow the norms.

Don’t mistake outlier for noise.

Application

  • Credit Card Fraud Detection

    • Collect user profile such as Name, Age, Location
    • Collect user behavior data
  • Network Intrusion Detection

  • Identify anomalous behavior from surveillance camera videos
Last Updated: 2024-05-12 ; Contributors: AhmedThahir

Comments