Machine Learning Concepts (Part 2)
Content
- Content
- How to Choose a Feature Selection Method For Machine Learning?
- Different Statistical tests for feature selection
- Student t-distribution
- Z-test
- Kendall’s Tau (Kendall Rank Correlation Coefficient)
- How autoML works?
- What are the parameters in training a decision tree?
- What is the philosophy behind Decision Tree?
- How to build decision tree?
- What is the formula of Gini index criteria?
- What is the formula for Entropy criteria?
- What is KL Divergence?
- Divergence
- Kolmogorov Complexity
- How do you calculate information gain mathematically?
- Pros and Cons of Decision Trees:
- Philosophy behind Bagging?
- Ensemble methods:
- What is the advantage with random forest ?
- Characteristics of Different Learning Methods
- Boosting algorithms
- Do you know about Adaboost algorithm ? How and why does it work ?
- How does gradient boosting works ?
- XGBoost: Extreme Gradient Boosting
- Logistic Regression
- Why is logistic regression considered as a linear model?
- SVM Summary:
- Algorithm
- Hard Margin
- Soft Margin
- Formulate SVM with loss function and solve by gradient decent
- What sort of optimization problem would you be solving to train a support vector machine?
- What are the kernels used in SVM ?
- What is the optimization technique of SVM?
- Why bring Lagrange Multiplier for solving the SVM problem?
- KKT Condition for SVM?
- Geometric analysis of Lagrangian, KKT, Dual
- SVM: Regularized Loss Function View
- Constrained optimization (Lagrangian)
- Talking about unsupervised learning? What are the algorithms ?
- What other clustering algorithms do you know?
- What is DB-SCAN algorithm ?
- How does HAC (Hierarchical Agglomerative clustering) work ?
- The Inductive Biases of Various Machine Learning Algorithms
- How do you deploy Machine Learning models ?
- How the model varies in KNN for $K=1$ and $K=N$?
- Generative model vs Discriminative model.
- Scenario based Question
- Why is naive bayes called “naive”? Tell me about naive bayes classifier?
- Logistic Regression loss function?
- What do you mean by mutable and immutable objects in python ?
- Difference between Multi-Class and Multi-Label classification
- How do you handle multi-class classification with unbalanced dataset ?
- How do you select between 2 models (Model Selection techniques)?
- What is precision and recall? Which one of this do you think is important in medical diagnosis?
- ROC Curve Analysis
- What is random about Random Forest?
- Metric to measure multi-class classification result?
- How is using a logistic regression different from using a random forest ?
- Which model would you use in case of unbalanced dataset: Random Forest or Boosting ? Why ?
- How to prepare for ML Interview?
- Question source:
- Exercise
How to Choose a Feature Selection Method For Machine Learning?
Feature selection is the process of reducing the number of input variables when developing a predictive model.
Statistics-based feature selection methods involve evaluating the relationship between each input variable and the target variable using statistics and selecting those input variables that have the strongest relationship with the target variable. These methods can be fast and effective, although the choice of statistical measure depends on the data type of both the input and output variables.
Feature Selection Algorithms
There are three general classes of feature selection algorithms: filter methods, wrapper methods and embedded methods.
Filter Methods: Filter feature selection methods apply a statistical measure to assign a score to each feature. The features are ranked by the score and either selected to be kept or removed from the dataset. The methods are often univariate and consider each feature independently, or with regard to the dependent variable. Example: Chi-squared test, information gain and correlation coefficient scores.
Wrapper Methods: Wrapper methods consider the selection of a set of features as a search problem, where different combinations are prepared, evaluated and compared to other combinations. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy. Example: sklearn.feature_selection.RFE
The main difference between the filter method and the wrapper method is that in the filter method we filter the features before fitting any model. This is quite helpful if training the model is costly or the dataset is very large, because in the wrapper method we need to build and train the model for each candidate combination of features before we can evaluate it.
Embedded Methods: Embedded methods learn which features best contribute to the accuracy of the model while the model is being created. The most common type of embedded feature selection methods are regularization based methods. They help to attain automatic feature selection.
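The wrapper approach mentioned above (sklearn.feature_selection.RFE) can be sketched as follows; this is a minimal illustration, and the choice of estimator and n_features_to_select=5 are assumptions for the example, not recommendations.
# wrapper feature selection with Recursive Feature Elimination (RFE)
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# toy dataset
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)
# RFE repeatedly fits the estimator and prunes the weakest features
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # rank 1 = selected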
Feature Selection Checklist
- Do you have domain knowledge?
  - If yes, construct a better set of ad hoc features.
- Are your features commensurate (i.e. comparable)?
  - If no, consider normalizing them.
- Do you suspect interdependence of features?
  - If yes, expand your feature set by constructing conjunctive features or products of features, as much as your computer resources allow you.
- Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)?
  - If no, construct disjunctive features or weighted sums of features.
- Do you need to assess features individually (e.g. to understand their influence on the system or because their number is so large that you need to do a first filtering)?
  - If yes, use a variable ranking method; else, do it anyway to get baseline results.
- Do you need a predictor? If no, stop.
- Do you suspect your data is dirty (has a few meaningless input patterns and/or noisy outputs or wrong class labels)?
  - If yes, detect the outlier examples using the top ranking variables obtained in step 5 as representation; check and/or discard them.
- Do you know what to try first?
  - If no, use a linear predictor. Use a forward selection method with the “probe” method as a stopping criterion, or use the 0-norm embedded method for comparison. Following the ranking of step 5, construct a sequence of predictors of the same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.
- Do you have new ideas, time, computational resources, and enough examples?
  - If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods. Use linear and non-linear predictors. Select the best approach with model selection.
- Do you want a stable solution (to improve performance and/or understanding)?
  - If yes, subsample your data and redo your analysis for several bootstraps.
Filter Method
Filter methods evaluate the relevance of the predictors outside of the predictive models and subsequently model only the predictors that pass some criterion.
Statistics for Filter Feature Selection Methods
The choice of statistic is driven by the input and output variable types; the four cases covered below are:
- Numerical Input, Numerical Output
- Numerical Input, Categorical Output
- Categorical Input, Numerical Output
- Categorical Input, Categorical Output
Common input variable data types:
- Numerical Variables
  - Integer Variables.
  - Floating Point Variables.
- Categorical Variables.
  - Boolean Variables (dichotomous). Example: TRUE/FALSE or 0/1
  - Ordinal Variables. Example: Economic status (“low income”, ”middle income”, ”high income”)
  - Nominal Variables: no intrinsic order. Example: Types of houses: (“regular”, “condos”, “co-ops”, “bungalows”)
Statistics for Filter-Based Feature Selection Methods
- Numerical Output: Regression predictive modeling problem.
- Categorical Output: Classification predictive modeling problem.
- Numerical Input, Numerical Output
  - Pearson’s correlation coefficient (linear)
  - Spearman’s rank coefficient (non-linear)
- Numerical Input, Categorical Output
  - ANOVA correlation coefficient (linear)
  - Kendall’s rank coefficient (non-linear)
- Categorical Input, Numerical Output
  - This is a strange example of a regression problem (you would not encounter it often). Nevertheless, you can use the same “Numerical Input, Categorical Output” methods (described above), but in reverse.
- Categorical Input, Categorical Output
  - Chi-Squared test (contingency tables).
  - Mutual Information
    - Let $(X,Y)$ be a pair of random variables with values over the space $\mathcal{X} \times \mathcal{Y}$. If their joint distribution is $P_{(X,Y)}$ and the marginal distributions are $P_X$ and $P_Y$, the mutual information is defined as $I(X;Y) = D_{KL}(P_{(X,Y)} \vert \vert P_{X} \times P_{Y})$

NOTE: In fact, mutual information is a powerful method that may prove useful for both categorical and numerical data, i.e. it is agnostic to the data types.
Correlation Statistics
The scikit-learn library provides an implementation of most of the useful statistical measures.
For example:
- i/p: numerical, o/p: numerical $\rightarrow$ Pearson’s Correlation Coefficient: f_regression()
- i/p: numerical, o/p: categorical $\rightarrow$ ANOVA: f_classif()
- i/p: categorical, o/p: categorical $\rightarrow$ Chi-Squared: chi2()
- i/p: categorical, o/p: categorical $\rightarrow$ Mutual Information: mutual_info_classif() and mutual_info_regression()
Also, the SciPy library provides an implementation of many more statistics, such as Kendall’s tau (kendalltau) and Spearman’s rank correlation (spearmanr).
- For more details check sklearn feature selection
Selection Method
Two of the more popular methods include:
- Select the top k variables: SelectKBest
- Select the top percentile variables: SelectPercentile
Other
- Transform Variables:
  - transform a categorical variable to ordinal, even if it is not, and see if any interesting results come out.
  - make a numerical variable discrete (e.g. bins); try categorical-based measures.
Assumptions:
- Pearson’s correlation assumes a Gaussian probability distribution of the observations and a linear relationship.
Worked Examples of Feature Selection
Numerical Input, Numerical Output Pearson’s Correlation Coefficient
# pearson's correlation feature selection for numeric input and numeric output
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
# generate dataset
X, y = make_regression(n_samples=100, n_features=100, n_informative=10)
# define feature selection
fs = SelectKBest(score_func=f_regression, k=10)
# apply feature selection
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
# (100, 10)
Numerical Input, Categorical Output Anova
# ANOVA feature selection for numeric input and categorical output
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
# generate dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=2)
# define feature selection
fs = SelectKBest(score_func=f_classif, k=2)
# apply feature selection
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
# (100, 2)
Categorical Input, Categorical Output Chi-Squared
- Chi-squared test
# chi-squared feature selection for categorical input and categorical output
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

def select_features(X_train, y_train, X_test):
    """
    Use chi-squared test to score and select features
    """
    fs = SelectKBest(score_func=chi2, k='all')
    fs.fit(X_train, y_train)
    X_train_fs = fs.transform(X_train)
    X_test_fs = fs.transform(X_test)
    return X_train_fs, X_test_fs, fs
- Mutual Information Feature Selection
# mutual information feature selection for categorical input and categorical output
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif

def select_features(X_train, y_train, X_test):
    """
    Use mutual information based approach
    """
    fs = SelectKBest(score_func=mutual_info_classif, k='all')
    fs.fit(X_train, y_train)
    X_train_fs = fs.transform(X_train)
    X_test_fs = fs.transform(X_test)
    return X_train_fs, X_test_fs, fs
Other methods
- Feature Selection with Filtering Method- Constant, Quasi Constant and Duplicate Feature Removal
- for more detailed coding example follow this link.
- Very good explanation video
Reference:
- feature-selection-with-real-and-categorical-data
- an-introduction-to-feature-selection
- Feature Selection with Categorical Data
Different Statistical tests for feature selection
- see 2nd part here at notion/scribe-stat
*In case the above link is broken, click here
ANOVA
An ANOVA tests the relationship between a categorical variable and a numeric variable by testing the differences between two or more means. The test produces a p-value to determine whether the relationship is significant or not.
Examples of using ANOVA
You may want to use ANOVA to help you answer questions like this:
- Do gender, race have any effect on whether someone clicks on a landing page or their salary amount?
- Do location, employment status, or education have an effect on NPS score?
How does ANOVA work?
Like other types of statistical tests, ANOVA compares the means of different groups and shows you if there are any statistical differences between the means. ANOVA is classified as an omnibus test statistic. This means that it can’t tell you which specific groups were statistically significantly different from each other, only that at least two of the groups were.
It’s important to remember that the main ANOVA research question is whether the sample means are from different populations. There are two assumptions upon which ANOVA rests:
Assumptions:
- First: Whatever the technique of data collection, the observations within each sampled population are normally distributed.
- Second: The sampled populations have a common variance $s^2$.
What is the difference between one-way and two-way ANOVA tests?
This is defined by how many independent variables are included in the ANOVA test. One-way means the analysis of variance has one independent variable. Two-way means the test has two independent variables. An example of this may be the independent variable being a brand of drink (one-way), or independent variables of brand of drink and how many calories it has or whether it’s original or diet.
ANOVA, also known as analysis of variance, is used to compare multiple (three or more) samples with a single test.
The hypothesis being tested in ANOVA is
- Null: All pairs of samples are same i.e. all sample means are equal
- Alternate: At least one pair of samples is significantly different
The statistic used to measure the significance in this case is called the F-statistic. The F value is calculated using the formula

$F = \frac{(SSE_1 - SSE_2)/m}{SSE_2/(n-k)}$

where
- $SSE_1$, $SSE_2$ = residual sum of squares of the restricted and the unrestricted model
- $m$ = number of restrictions
- $k$ = number of independent variables
- $n$ = number of observations
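As a quick illustration of how such an F-test is run in practice, here is a minimal sketch using scipy.stats.f_oneway; the three groups below are made-up numbers.
# one-way ANOVA with SciPy (illustrative data)
from scipy.stats import f_oneway
group_a = [23, 25, 27, 22, 26]
group_b = [31, 29, 33, 30, 32]
group_c = [24, 28, 26, 25, 27]
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)  # reject H0 (all means equal) if p_value is below the chosen alpha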
t-test
The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis. A t-test is the most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
The t-test and ANOVA both examine whether group means differ from one another. The t-test compares two groups, while ANOVA can handle more than two groups.
- The t-test can be used for feature selection between a numerical column and a categorical column, given that the categorical variable has $2$ levels. If there are more than $2$ levels, opt for ANOVA.
- The t-test is a special case of ANOVA that can be used when we have only two populations whose means we want to compare.
The t-test and ANOVA have three assumptions:
- Independence assumption: The elements of one sample are not related to those of the other sample.
- Normality assumption: Samples are randomly drawn from normally distributed populations with unknown population means; otherwise the means are no longer the best measures of central tendency and the test will not be valid.
- Equal variance assumption: The population variances of the two groups are equal.
The following diagram summarizes the t-test and one-way ANOVA.
*In case the above link is broken, click here
Reference:
CHI Square test
The chi-square independence test is a procedure for testing if two categorical variables are related in some population.
There are two types of chi-square tests. Both use the chi-square statistic and distribution for different purposes:
- Goodness of Fit Test: A chi-square goodness of fit test determines if a sample data matches a population.
- Test for independence: A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether distributions of categorical variables differ from one another.
  - A very small chi-square test statistic means that your observed data fits the expected data (computed under the null hypothesis) extremely well, i.e. there is no evidence of a relationship between the variables.
  - A very large chi-square test statistic means that the observed data does not fit the expected data well; the null hypothesis is rejected, which for the independence test indicates the two variables are related.

The chi-square statistic is calculated as

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

where
- $\chi^2$ = chi-squared statistic
- ${O}_i$ = observed value
- $E_{i}$ = expected value
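A minimal sketch of the independence test with scipy.stats.chi2_contingency on a made-up 2x2 contingency table (the counts are purely illustrative).
# chi-square test of independence on a contingency table
from scipy.stats import chi2_contingency
# rows: two groups, columns: clicked / did not click (illustrative counts)
table = [[30, 70],
         [45, 55]]
chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)
print(expected)  # expected counts under the independence (null) hypothesis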
Reference:
Student t-distribution
*In case the above link is broken, click here
Z-test
population parameters are known
*In case the above link is broken, click here
More on Z and t statistics
Z test
population parameters are known
In a z-test, the sample is assumed to be normally distributed.
- z-test: Here we try to compare the sample mean $\bar{x}$ with the population mean $\mu$.
A z-score is calculated with population parameters such as population mean and population standard deviation and is used to validate a hypothesis that the sample drawn belongs to the same population.
- Null: Sample mean is same as the population mean
- Alternate: Sample mean is not same as the population mean
The statistic used for this hypothesis testing is called the z-statistic, the score for which is calculated as

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

where
- $\bar{x}$ = sample mean
- $\mu$ = population mean
- $\sigma$ = population standard deviation, $n$ = sample size (so $\frac{\sigma}{\sqrt n}$ is the standard error of the mean)

If the absolute value of the test statistic is lower than the critical value, we fail to reject the Null Hypothesis; otherwise we reject the Null Hypothesis in favour of the Alternate Hypothesis.
t test
- t-test: Here we try to compare two sample means $\bar{x}_1$ and $\bar{x}_2$.

A t-test is used to compare the means of two given samples. Like a z-test, a t-test also assumes a normal distribution of the sample, but it is used when the population parameters (mean and standard deviation) are not known.

There are three versions of the t-test:
- Independent samples t-test, which compares the means of two groups
- Paired sample t-test, which compares means from the same group at different times
- One sample t-test, which tests the mean of a single group against a known mean.
The statistic for this hypothesis testing is called the t-statistic, the score for which (for the independent two-sample case) is calculated as

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

where
- $\bar{x}_1$ = mean of sample 1
- $\bar{x}_2$ = mean of sample 2
- $s_1^2$, $s_2^2$ = variances of sample 1 and sample 2
- $n_1$ = size of sample 1
- $n_2$ = size of sample 2
There are multiple variations of t-test which are explained in detail here.
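A minimal sketch of the independent two-sample t-test with scipy.stats.ttest_ind; the two samples are made-up numbers, and equal_var=False (Welch's variant) is an explicit assumption of this example.
# independent two-sample t-test with SciPy (illustrative data)
from scipy.stats import ttest_ind
sample_1 = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
sample_2 = [12.8, 13.1, 12.9, 13.4, 12.7, 13.0]
t_stat, p_value = ttest_ind(sample_1, sample_2, equal_var=False)  # Welch's t-test
print(t_stat, p_value)  # a small p_value suggests the two sample means differ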
Reference:
Kendall’s Tau (Kendall Rank Correlation Coefficient)
Kendall’s Tau is a non-parametric measure of relationships between columns of ranked data. The Tau correlation coefficient returns a value between $-1$ and $+1$, where:
- $0$ is no relationship,
- $\pm 1$ is a perfect relationship ($-1$ indicating that the two rankings are in perfect disagreement).
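A minimal sketch using scipy.stats.kendalltau (also mentioned in the feature selection section) on two illustrative rank columns.
# Kendall's tau rank correlation with SciPy (illustrative ranks)
from scipy.stats import kendalltau
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 4, 3, 6, 5]
tau, p_value = kendalltau(judge_1, judge_2)
print(tau, p_value)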
Reference:
How autoML works?
Example: Say you have a Diabetic Classification problem and different types of data, such as medical data for Europe, Africa and Asia, and you need to build a platform which, given any medical-history dataset (for any region), solves the Diabetic Classification task automatically. How would you build such a platform?
What is AutoML?
Automated Machine Learning provides methods and processes to make Machine Learning available for non-Machine Learning experts, to improve efficiency of Machine Learning and to accelerate research on Machine Learning.
Machine learning (ML) has achieved considerable successes in recent years and an ever-growing number of disciplines rely on it. However, this success crucially relies on human machine learning experts to perform the following tasks:
- Pre-process and clean the data.
- Select and construct appropriate features.
- Select an appropriate model family.
- Optimize model hyperparameters.
- Post-process machine learning models.
- Critically analyze the results obtained.
As the complexity of these tasks is often beyond non-ML-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.
Reference:
What are the parameters in training a decision tree?
- max_depth: How deep the tree can be
- min_samples_split: Min number of samples needed to split a node
- min_samples_leaf: Min number of samples needed to be at a leaf node
- max_features: Max number of features to consider when looking for the best split
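A minimal sketch showing where these parameters go in scikit-learn's DecisionTreeClassifier; the particular values are illustrative, not tuned.
# decision tree with the parameters listed above (illustrative values)
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
clf = DecisionTreeClassifier(
    criterion="gini",        # or "entropy"
    max_depth=5,             # how deep the tree can grow
    min_samples_split=10,    # min samples needed to split an internal node
    min_samples_leaf=5,      # min samples required at a leaf node
    max_features="sqrt",     # features considered when looking for the best split
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())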
Reference:
What is the philosophy behind Decision Tree?
Tree based methods involve stratifying or segmenting the Predictor space into number of region.
- A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (categorical or continuous value).
- Find the feature that best splits the target class into the purest possible children nodes (i.e. nodes that don’t contain a mix of both classes, but rather pure nodes with only one class).
- Entropy, on the other hand, is a measure of impurity. For a classification problem with $N$ classes it is defined as: Entropy $= -\sum_{i=1}^{N} c_i \log(c_i)$, where $c_i$ is the fraction of examples belonging to class $i$.
Say we have a dataset D and we are looking at a potential feature f, on which we will split the dataset w.r.t. f into 2 parts Dl and Dr (the left and right datasets respectively), such that those two datasets are at their purest. Finally we use Information Gain to decide how good that feature is, i.e. how pure the split w.r.t. f is.
Df
/ \
Dl Dr
- Information Gain: It is the difference between the entropy before the split and the (size-weighted) entropy after the split. EntropyBefore_f = Entropy(Df), the entropy after the split is EntropyAfter_f = (|Dl|/|Df|)*Entropy(Dl) + (|Dr|/|Df|)*Entropy(Dr), and finally InformationGain_f = EntropyBefore_f - EntropyAfter_f.
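A small numpy sketch of the entropy / information-gain computation just described, assuming binary class labels; the arrays Df, Dl, Dr are made up for illustration.
# entropy and information gain for a split (illustrative)
import numpy as np

def entropy(labels):
    # H = -sum_i p_i * log2(p_i) over the classes present in labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # entropy before the split minus the size-weighted entropy of the children
    n = len(parent)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - after

Df = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Dl = np.array([0, 0, 0, 1])   # left child after splitting on some feature f
Dr = np.array([0, 1, 1, 1])   # right child
print(information_gain(Df, Dl, Dr))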
Reference:
How to build decision tree?
There are a couple of algorithms out there for building a decision tree; some of the important ones are:
CART (Classification and Regression Trees) → uses Gini Index(Classification) as metric. Lower the Gini Index, higher the purity of the split.
ID3 (Iterative Dichotomiser 3) → uses Entropy function and Information gain as metrics. Higher the Information Gain, better the split is.
What are the criteria for splitting at a node in decision trees ?
Gini Index [link]
- CART uses the Gini index as the split metric. For $N$ classes, the Gini Index is defined as: $1-\sum_{i=1}^{N} p_i^2$, where $p_i=p(target=i)$ [source]
Information Gain

$Gain(S, A) = Entropy(S) - \sum_{v \in V} \frac{\vert S_v \vert}{\vert S \vert} Entropy(S_v)$

where,
- $V$ - possible values of categorical feature $A$
- $S$ - set of examples $\{X\}$
- $S_v$ - subset where $X_A = v$
Cross Entropy
- Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between $0$ and $1$.
- In binary classification, where the number of classes $M=2$, cross-entropy can be calculated as: $-{(y\log(p) + (1 - y)\log(1 - p))}$
- If $M \gt 2$ (i.e. multiclass classification), we calculate a separate loss for each class label per observation and sum the result.
Entropy
CHI Square: It is an algorithm to find out the statistical significance between the differences between sub-nodes and parent node.
Reduction of Variance
Reference:
What is the formula of Gini index criteria?
$Gini = 1 - \sum_{i=1}^{N} p_i^2$, where $p_i$ is the fraction of samples of class $i$ at the node.
How is it decided on which feature to split?
The split is made on the feature (and threshold) for which the information gain, i.e. the reduction in impurity, is maximum.
Reference:
What is the formula for Entropy criteria?
Entropy is a measure of uncertainty: how disordered a system is, or in this case, the degree of randomness in a set of data. The higher the entropy, the lower the predictability, or the chances of finding patterns in the data. (It can be used to explain the origin of the universe and also where the universe is headed.)
The concept of entropy, or the way it is characterised, has been applied in fields remote from thermodynamics. One such application is determining the limits of conveying information: the amount of information received as a result of an experiment/activity/event can be considered numerically equivalent to the amount of uncertainty concerning the event. To characterise entropy and information in a much simpler way, it was initially proposed to consider these quantities as defined on the set of generalised probability distributions.
Entropy is nothing but an expectation with a negative sign.
Expectation formula: $E[g(x)] = \sum_x p(x)g(x)$
In entropy, $g(x)$ is $\log (p(x))$; combining this with the negative sign (which makes it positive, since $\log(x) \leq 0$ for $0 \leq x \leq 1$), the entropy (negative expectation) formula becomes:

$H(X) = -\sum_x p(x) \log p(x)$
NOTE: What entropy doesn’t tell us is the optimal encoding scheme to help us achieve this compression. Optimal encoding of information is a very interesting topic, but not necessary for understanding KL divergence. The key thing with Entropy is that, simply knowing the theoretical lower bound on the number of bits we need, we have a way to quantify exactly how much information is in our data. Now that we can quantify this, we want to quantify how much information is lost when we substitute our observed distribution for a parameterized approximation.
Reference:
What is KL Divergence?
KL Divergence helps us to measure just how much information we lose when we choose an approximation.
- KL Divergence is the measure of relative entropy. It is a measure of the distance between two distributions.
- In statistics, it arises as an expected logarithm of the likelihood ratio.

The relative entropy $KL(p \vert\vert q)$ is a measure of the inefficiency of assuming that the distribution is $q$ when the true distribution is $p$.
- Kullback-Leibler Divergence is just a slight modification of our formula for entropy. Rather than just having our probability distribution $p$ we add in our approximating distribution $q$. Then we look at the difference of the log values for each.
- The KL divergence from $p$ to $q$ is simply the difference between cross entropy and entropy:

$D_{KL}(p \vert\vert q) = H(p, q) - H(p) = -\sum_i y_i \log(\hat{y}_i) - \left(-\sum_i y_i \log(y_i)\right) = \sum_i y_i \log\frac{y_i}{\hat{y}_i}$

where $y_i \sim p$ and $\hat{y}_i \sim q$, i.e. they come from two different probability distributions.
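A tiny numpy sketch of this formula on two made-up discrete distributions; it also shows the asymmetry discussed next.
# KL divergence between two discrete distributions (illustrative probabilities)
import numpy as np
p = np.array([0.36, 0.48, 0.16])   # "true" distribution
q = np.array([1/3, 1/3, 1/3])      # approximating distribution
kl_pq = np.sum(p * np.log(p / q))  # D_KL(p || q)
kl_qp = np.sum(q * np.log(q / p))  # D_KL(q || p)
print(kl_pq, kl_qp)                # the two values differ: KL is not symmetric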
It’s well-known that KL-divergence is not symmetric, but which direction is right for fitting your model?
If we’re fitting $q_{\theta}$ to $p$:
- Using $KL(p \vert \vert q_{\theta})$: mean-seeking, inclusive (more principled because it approximates the full distribution)
  - Requires normalization w.r.t. $p$ (i.e., often not computationally convenient)
- Using $KL(q_{\theta} \vert \vert p)$: mode-seeking, exclusive
  - No normalization w.r.t. $p$ (i.e., computationally convenient)

Mnemonic: “When the truth comes first, you get the whole truth.” Here the whole truth corresponds to the inclusiveness of $KL(p \vert \vert q)$.
Reference:
- Kullback-Leibler Divergence Explained
- KL-divergence as an objective function
- Understand KL Divergence
Divergence
In statistics and information geometry, a divergence or a contrast function is a function which establishes the distance of one probability distribution to another on a statistical manifold.
The divergence is a weaker notion than that of a distance; in particular the divergence need not be symmetric (that is, in general the divergence from $p$ to $q$ is not equal to the divergence from $q$ to $p$), and need not satisfy the triangle inequality.
The two most important divergences are:
- Relative entropy (Kullback–Leibler divergence, KL divergence), which is central to information theory and statistics
- The squared Euclidean distance (SED)
Minimizing these two divergences is the main way that linear inverse problems are solved, via the principle of maximum entropy and least squares, notably in logistic regression and linear regression.
Reference:
Kolmogorov Complexity
Kolmogorov complexity of an object or algorithm is the length of its optimal specification. In some sense, it could be thought of as algorithmic entropy, in the sense that it is the amount of information contained in the object.
Consider the strings
- 11111111111111111111111111111111 - it contains 32 1’s, so it is easy to describe compactly (“print 1 thirty-two times”)
- 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7 - the best way to describe this string is (probably) to just quote it verbatim

The Kolmogorov complexity $K_f(x)$ of a string $x$, relative to a Turing machine $f$, is

$K_f(x) = \min \{ \vert p \vert : f(p) = x \}$

i.e. the length of the shortest program $p$ that makes $f$ output $x$.
Reference:
How do you calculate information gain mathematically?
- If $H$ is the entropy of the original data $D$ and it has undergone $N$ splits for feature $f$, then the Information Gain is:

$IG(D, f) = H(D) - \sum_{i=1}^{N} \frac{S_i}{S} H(D_i)$

where $S$ is the size of the total dataset and $S_i$ is the size of the $i^{th}$ split $D_i$.
Reference:
Pros and Cons of Decision Trees:
- Decision Trees also suffer from high variance.
Philosophy behind Bagging?
- Say we have $N$ independent observations $Z_1, \dots, Z_N$, each with variance $\sigma^2$. Then the variance of the mean $\bar{Z}$ of the observations is given by $\sigma^2/N$. That is, averaging a set of observations reduces variance.
- Hence a natural way to reduce the variance and hence increase the prediction accuracy of a statistical learning method is to take many training sets from the population, build a separate prediction model using each training set, and average the resulting predictions.
Ensemble methods:
* image source (slide 7)
Reference:
What is the advantage with random forest ?
- Random forest is an ensemble method in which a classifier is constructed by combining several different independent base classifiers.
- The independence is theoretically enforced by training each base classifier on a training set sampled with replacement from the original training set. This technique is known as bagging, or bootstrap aggregation.
- In Random Forest, further randomness is introduced by identifying the best split feature from a random subset of the available features.
- Reduction in overfitting: by averaging several trees, there is a significantly lower risk of overfitting.
- Less variance: by using multiple trees, you reduce the chance of stumbling across a classifier that doesn’t perform well because of the relationship between the train and test data.
Why ensemble is good?
Q: Suppose we have $10$ independent classifiers, each with error rate of $0.3$. What will be the final error rate if we ensemble these $10$ independent classifiers?
$\epsilon=0.3$
In this setting, the error rate of the ensemble can be computed as below (assumption: we take a majority vote on the predictions). An ensemble makes a wrong prediction only when more than half of the base classifiers are wrong, i.e. when $6$ or more of the $10$ classifiers err:

$\epsilon_{ensemble} = \sum_{i=6}^{10} \binom{10}{i} \epsilon^i (1-\epsilon)^{10-i} \approx 0.047$

It can be seen that, with the theoretical guarantees stated above, an ensemble model performs significantly better than the individual base classifiers.
However in practice it is not possible to guarantee such classifier independence as they are trained from the same data, but still introduction of randomness helps achieve independence to a certain degree and it has been empirically observed that ensembles perform significantly well over individual base classifiers.
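The binomial sum above can be checked numerically; a minimal sketch with scipy.stats.binom (6 or more wrong votes out of 10 is what it takes to lose the majority).
# ensemble error under a majority vote of 10 independent classifiers, each with error 0.3
from scipy.stats import binom
n, eps = 10, 0.3
ensemble_error = binom.sf(5, n, eps)   # P(6 or more classifiers are wrong)
print(ensemble_error)                  # roughly 0.047, far below the individual error of 0.3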
Reference:
Ensemble Learning algorithm
Characteristics of Different Learning Methods
MARS: Multivariate Adaptive Regression Splines
Reference:
Boosting algorithms
- For more on boosting algorithm, check here at notion/scribe-ml
The term Boosting refers to a family of algorithms which convert (i.e. “boost”) weak learners into strong learners.
- AdaBoost (Adaptive Boosting): Adaptive Boosting, most commonly known as AdaBoost, is a boosting algorithm. It corrects its predecessor by paying more attention to the training instances underfitted by the previous model: at the end of every model’s predictions we boost the weights of the misclassified instances so that the next model does a better job on them, and so on.
- Gradient Tree Boosting: It works by sequentially adding the previous predictor’s underfitted predictions to the ensemble, ensuring the errors made previously are corrected.
The difference lies in what it does with the underfitted values of its predecessor.
- AdaBoost: Tweaks the instance weights at every iteration,
- Gradient Boosting: Tries to fit the new predictor to the residual errors made by the previous predictor.
- XGBoost: Extreme Gradient Boosting is an advanced implementation of Gradient Boosting. This algorithm has high predictive power and is around ten times faster than other gradient boosting techniques. Moreover, it includes a variety of regularisation options, which reduces overfitting and improves overall performance.
- LightGBM: For datasets which are extremely large, Light Gradient Boosting is the best choice compared to all of the others, since it takes less time to run.
- The motivation behind the Boosting algorithm is that there are n weak classifiers; combining them gives a powerful committee which decides the final verdict of the classifier.
- A weak classifier is one whose error rate is only slightly better than random guessing.
Resource:
- quick-introduction-boosting-algorithms-machine-learning
- Boosting-with-adaboost-and-gradient-boosting
Do you know about Adaboost algorithm ? How and why does it work ?
AdaBoost:
Also called Adaptive Boosting, where boosting is applied in a gradual way in the form of combining new learners on the misclassified data.
- First: A weak learner is applied and all the training examples, which are misclassified, are given higher weight.
- Second: While building the dataset for training the next learner, the previously misclassified training examples appear more prominently (as higher weight has been given to them). Another learner is then trained on this new dataset. This learner tends to classify the previously misclassified examples correctly, while making some new misclassifications of its own in this step.
- Repeat first and second.
- In AdaBoost you need to define a base classifier.
- A Classification Tree acts as the best off-the-shelf base classifier for AdaBoost.
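A minimal sklearn sketch; note that AdaBoostClassifier's default base learner is a depth-1 classification tree (a decision stump), and the hyperparameter values below are illustrative.
# AdaBoost with the default decision-stump base learner (illustrative settings)
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))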
Resource:
How does gradient boosting works ?
- Bagging and Boosting both are ensemble learning algorithm, where a collection of weak learner builds the strong learner.
- Bagging works by re-sampling the data with replacement to create different datasets; the weak learners are trained on them, and the final predictions are taken by averaging or majority voting, e.g. Random Forest.
Bagging: It is a simple ensembling technique in which we build many independent predictors/models/learners on sampled data with replacement from the original data (Bootstrap Aggregation) and combine them using some model averaging techniques. (e.g. weighted average, majority vote or normal average). E.g: Random Forest
Boosting: Also an ensemble learning method in which the predictors are not made independently, but sequentially. [link]
Gradient Boosting:
- Gradient Boosting is also a boosting algorithm (duh!), hence it also tries to create a strong learner from an ensemble of weak learners. The algorithm is similar to Adaptive Boosting (AdaBoost) but differs from it in certain aspects: in this method we visualize the boosting problem as an optimization problem, i.e. we take up a loss function and try to optimise it.
- We take a weak learner (in the previous case it was a decision stump) and at each step we add another weak learner to increase the performance and build a strong learner. This reduces the value of the loss function. We iteratively add each model and compute the loss. The loss represents the error residuals (the difference between the actual value and the predicted value), and using this loss value the predictions are updated to minimise the residuals.
- Important: It learns a weak learner $F(x)+\epsilon_1$ ($\epsilon$ is the noise). Then on the noise, i.e. the residual, it builds another weak learner $H(x)+\epsilon_2$ and so on. Thus it becomes $F(x)+H(x)+G(x)+\dots+\epsilon$, where $\epsilon = \sum_i \epsilon_i$. Gradient boosting is a sequential approach.
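A from-scratch sketch of this residual-fitting idea, adding a few small regression trees sequentially; the depth, learning rate and data are illustrative assumptions.
# gradient boosting by hand for squared loss: each new tree fits the current residuals
import numpy as np
from sklearn.tree import DecisionTreeRegressor
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)
lr = 0.5                                   # learning rate (shrinkage)
pred = np.zeros_like(y)                    # F_0(x) = 0
for _ in range(3):                         # add 3 weak learners sequentially
    residual = y - pred                    # for squared loss, the negative gradient is the residual
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)           # F_m(x) = F_{m-1}(x) + lr * h_m(x)
print(np.mean((y - pred) ** 2))            # training MSE shrinks as learners are added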
Reference:
Difference of AdaBoost, Gradient Boost and XGBoost
Both AdaBoost and Gradient Boosting build weak learners in a sequential fashion. Originally, AdaBoost was designed in such a way that at every step the sample distribution was adapted to put more weight on misclassified samples and less weight on correctly classified samples. The final prediction is a weighted average of all the weak learners, where more weight is placed on stronger learners.

AdaBoost can also be expressed in terms of the more general framework of additive models with a particular loss function (the exponential loss) [chapter 10 in (Hastie) ESL]. AdaBoost.M1 (Algorithm 10.1 from the book) is equivalent to forward stagewise additive modeling (Algorithm 10.2) using the loss function

$L(y, f(x)) = \exp(-y f(x))$
In Gradient Boosting, shortcomings (of existing weak learners) are identified by gradients, a.k.a. residuals. In AdaBoost, ‘shortcomings’ are identified by high-weight data points.
The main differences therefore are that Gradient Boosting is a generic algorithm to find approximate solutions to the additive modeling problem, while AdaBoost can be seen as a special case with a particular loss function. Hence, gradient boosting is much more flexible.
Second, AdaBoost can be interpreted from a much more intuitive perspective and can be implemented without reference to gradients, by reweighting the training samples based on classifications from previous learners.
Reference:
- Quora: What-is-the-difference-between-gradient-boosting-and-adaboost
- Math Explanation: Imp_link
- Stack Exchange
Bagging boosting difference:
XGBoost: Extreme Gradient Boosting
XGBoost, a scalable machine learning system for tree boosting. The most important factor behind the success of XGBoost is its scalability in all scenarios. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.
The scalability of XGBoost is due to several important systems and algorithmic optimizations. These innovations include:
- A novel tree learning algorithm for handling sparse data.
- A theoretically justified weighted quantile sketch procedure enables handling instance weights in approximate tree learning.
- Parallel and distributed computing makes learning faster which enables quicker model exploration.
Objective Function
Read section $2.1$ and $2.2$ of the original XGBoost paper.
Regularized Learning Objective
For a given data set with $n$ examples and $m$ features, $D = \{(x_i, y_i)\}$ $( \vert D \vert = n, x_i \in \mathbb{R}^m, y_i \in \mathbb{R})$, a tree ensemble model uses $K$ additive functions to predict the output:

$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i)$

where $f_k \in \mathcal{F}$. Also $\mathcal{F} = \{f(\mathbf{x}) = w_{q(\mathbf{x})}\}$ $(q : \mathbb{R}^m \rightarrow T, w \in \mathbb{R}^T)$ is the space of regression trees (also known as CART).
- $q$ represents the structure of each tree that maps an example to the corresponding leaf index.
- $T$ is the number of leaves in the tree.
- Each $f_k(.)$ corresponds to an independent tree structure $q$ and leaf weights $w$.
- Unlike decision trees, each regression tree contains a continuous score on each of the leaves; we use $w_i$ to represent the score on the $i^{th}$ leaf.
For a given example, we use the decision rules in the trees (given by $q$) to classify it into the leaves and calculate the final prediction by summing up the scores in the corresponding leaves (given by $w$). To learn the set of functions used in the model, we minimize the following regularized objective:

$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$

where $\Omega(f)= \gamma T + \frac{1}{2} \lambda \vert \vert w \vert \vert^2$, $i$ runs over all the training examples and $k$ over all the $K$ additive functions.
Here $l$ is a differentiable convex loss function that measures the difference between the prediction $\hat{y_i}$ and the target $y_i$. The second term $\Omega$ penalizes the complexity of the model (i.e., the regression tree functions). The additional regularization term helps to smooth the final learnt weights to avoid over-fitting. Intuitively, the regularized objective will tend to select a model employing simple and predictive functions.
NOTE: When the regularization parameter is set to zero, the objective falls back to the traditional gradient tree boosting.
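Assuming the xgboost Python package is installed, here is a minimal sketch of how this regularized objective surfaces as hyperparameters: gamma relates to the $\gamma T$ term and reg_lambda to the $\frac{1}{2}\lambda \vert\vert w \vert\vert^2$ term; all values below are illustrative.
# XGBoost classifier with explicit regularization knobs (illustrative values)
import xgboost as xgb
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    gamma=0.1,        # minimum loss reduction to make a split (the gamma*T penalty)
    reg_lambda=1.0,   # L2 penalty on leaf weights (the lambda*||w||^2 penalty)
    subsample=0.8,
)
model.fit(X, y)
print(model.score(X, y))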
Gradient Tree Boosting:
- Optimization Problem: The tree ensemble model in the above equation includes the functions $f_k(.)$ (each of which is an independent tree structure $q$ with leaf weights $w$) as parameters and cannot be optimized using traditional optimization methods in Euclidean space.
- Solution: Instead, the model is trained in an additive manner. Formally, let $\hat{y}_i^{(t)}$ be the prediction of the $i^{th}$ instance at the $t^{th}$ iteration; we need to add $f_t(.)$ to minimize the following objective:

$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$
This means we greedily add the $f_t(.)$ that most improves our model according to the above equation. Second-order approximation can be used to quickly optimize the objective in the general setting.
For more in-depth math, read section $2.2$ of the paper.
System design
Column Block for Parallel Learning:
The most time consuming part of tree learning is getting the data into sorted order. In order to reduce the cost of sorting, the data is stored in in-memory units called blocks. Data in each block is stored in the compressed column (CSC) format, with each column sorted by the corresponding feature value. This input data layout only needs to be computed once before training, and can be reused in later iterations.
Cache-aware Access:
While the proposed block structure helps optimize the computation complexity of split finding, the new algorithm requires indirect fetches of gradient statistics by row index, since these values are accessed in order of feature. This is a non-continuous memory access. A naive implementation of split enumeration introduces immediate read/write dependency between the accumulation and the non-continuous memory fetch operation. This slows down split finding when the gradient statistics do not fit into CPU cache and cache misses occur. For the exact greedy algorithm, we can alleviate the problem by a cache-aware prefetching algorithm. Specifically, we allocate an internal buffer in each thread, fetch the gradient statistics into it, and then perform accumulation in a mini-batch manner.
Blocks for Out-of-core Computation:
One goal of the system is to fully utilize a machine’s resources to achieve scalable learning. Besides processors and memory, it is important to utilize disk space to handle data that does not fit into main memory. To enable out-of-core computation, the data is divided into multiple blocks and each block is stored on disk. During computation, it is important to use an independent thread to pre-fetch the block into a main memory buffer, so computation can happen concurrently with disk reading.
-
Block Compression: The first technique we use is block compression. The block is compressed by columns, and decompressed on the fly by an independent thread when loading into main memory. This helps to trade some of the computation in decompression with the disk reading cost.
-
Block Sharding: The second technique is to shard the data onto multiple disks in an alternative manner. A pre-fetcher thread is assigned to each disk and fetches the data into an in-memory buffer. The training thread then alternatively reads the data from each buffer. This helps to increase the throughput of disk reading when multiple disks are available.
(SHARD: A database shard is a horizontal partition of data in a database or search engine. Each individual partition is referred to as a shard or database shard. Each shard is held on a separate database server instance, to spread load.)
Reference:
Logistic Regression
What is the loss function for logistic regression?
This is a very tricky question.
Case 1: When $y \in \{0,1\}$

$\mathcal{L}(w) = -\sum_{i} \left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right]$

This is also called the cross entropy error function.

Case 2: When $y \in \{-1,+1\}$

$\mathcal{L}(w) = \sum_{i} \log\left(1 + e^{-y_i w^T x_i}\right)$

Though these 2 equations look different, note that case 1 is written in terms of $\hat{y}_i$ while case 2 has no such term. Now what is $\hat{y}_i$? It is the sigmoid output $\hat{y}_i = \sigma(w^T x_i) = \frac{1}{1+e^{-w^T x_i}}$. If we substitute this in case 1, we recover case 2. So they are the same.
The above image shows how the 0-1 loss is approximated in SVM (by the hinge loss) and in Logistic Regression (by the logistic loss).
Resource:
- Probabilistic Perspective: Murphy - Chapter 8.3.1
- Very IMP, ML Course: Prof. Piyush Rai
- Very IMP, ML Course: Prof. Piyush Rai, slide 13
Comparison of SVM and Logistic Loss?
Resource:
Why is logistic regression considered as a linear model?
Q. Is the decision boundary always necessarily a line / plane?
The short answer is: Logistic regression is considered a generalized linear model because the outcome always depends on the sum of the inputs and parameters. Or in other words, the output cannot depend on the product (or quotient, etc.) of its parameters! $z = \sum_i w_ix_i$
The key is that our model is additive. Our outcome $z$ depends on the additivity of the weight parameter values, e.g.: $z = w_1x_1 + w_2x_2$. There’s no interaction between the weight parameter values, nothing like $w_1x_1 * w_2x_2$, which would make our model non-linear.
However, we can use non-linear features such that $z = \sum_i w_i f(x_i)$, where $f(\cdot)$ is a non-linear function of $x$. But $z$ is still linear in terms of the parameters $w_i$.
- In general the decision boundary is linear in x. To be more specific, the decision boundary in this case is given by $w^Tx=0$ (a hyperplane). But then you go on to say: but we can generate non-linear decision boundaries as well.
- Well, of course you can, but then that’ll be called a non-linear instance of logistic regression (the exact same way we have linear SVMs and non-linear SVMs). In other words, you can start with your original data x and see/decide that it’s not linearly separable. What you can do next is introduce a feature transformation h(x) and use that in place of x.
- For example, if you decide to apply a quadratic feature transformation on, say for simplicity, your 2-dimensional data, then h(x) in this case is simply given by $h(x) = [x_1, x_2, x_1^2, x_2^2, x_1x_2]$ and your logistic model is now $y=f(w^Th(x))$ with the decision boundary given by $w^Th(x)=0$ (which is now a non-linear quadratic curve in the original data space).
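A minimal sklearn sketch of exactly this idea: a quadratic feature map in front of a plain logistic regression yields a non-linear boundary in the original space (the dataset and the degree are illustrative).
# logistic regression with a quadratic feature transformation h(x)
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
linear_lr = LogisticRegression().fit(X, y)   # linear boundary: poor fit on concentric circles
quad_lr = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression()).fit(X, y)
print(linear_lr.score(X, y))   # roughly chance level
print(quad_lr.score(X, y))     # near-perfect: the boundary is a quadratic curve in x-space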
Resource
SVM Summary:
A Support Vector Machine (SVM) performs classification by finding the hyperplane that maximizes the margin between the two classes. The vectors (cases) that define the hyperplane are the support vectors.
Algorithm
- Define an optimal hyperplane: maximize margin
- Extend the above definition for non-linearly separable problems: have a penalty term for misclassifications.
- Map data to high dimensional space where it is easier to classify with linear decision surfaces: reformulate problem so that data is mapped implicitly to this space.
We find $w$ and $b$ by solving the following objective function using Quadratic Programming.
Hard Margin
$\min_{w, b} \frac{1}{2}\Vert w \Vert^2$ s.t. $y_i(w \cdot x_i+b)\ge 1, \forall x_i$
Soft Margin
The beauty of SVM is that if the data is linearly separable, there is a unique global minimum value. An ideal SVM analysis should produce a hyperplane that completely separates the vectors (cases) into two non-overlapping classes. However, perfect separation may not be possible, or it may result in a model with so many cases that the model does not classify correctly. In this situation SVM finds the hyperplane that maximizes the margin and minimizes the misclassifications.
The simplest way to separate two groups of data is with a straight line (1 dimension), flat plane (2 dimensions) or an N-dimensional hyperplane. However, there are situations where a nonlinear region can separate the groups more efficiently. SVM handles this by using a kernel function (nonlinear) to map the data into a different space where a hyperplane (linear) cannot be used to do the separation.
-
It means a non-linear function is learned by a linear learning machine in a high-dimensional feature space while the capacity of the system is controlled by a parameter that does not depend on the dimensionality of the space. This is called kernel trick which means the kernel function transform the data into a higher dimensional feature space to make it possible to perform the linear separation.
Formulate SVM with loss function and solve by gradient decent
Alternative question: How do you adjust the cost parameter for the SVM regularizer?
Regularization problems are typically formulated as optimization problems involving the desired objective(classification loss in our case) and a regularization penalty. The regularization penalty is used to help stabilize the minimization of the objective or infuse prior knowledge we might have about desirable solutions. Many machine learning methods can be viewed as regularization methods in this manner. For later utility we will cast SVM optimization problem as a regularization problem.
Rewrite the soft margin problem using the hinge loss, defined as the positive part of $1-z$ and written as $(1-z)^+$, with $z = y_i(w \cdot x_i + b)$. The relaxed optimization problem (soft margin) can be reformulated as

$\min_{w, b} \; \frac{1}{2}\Vert w \Vert^2 + C \sum_{i=1}^{n} \left(1 - y_i(w \cdot x_i + b)\right)^+$

Here $\frac{1}{2}\vert \vert w \vert \vert^2$, the inverse squared geometric margin, is viewed as a regularization penalty that helps stabilize the objective.
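A numpy sketch of minimizing this regularized hinge-loss objective directly by (sub)gradient descent on a toy linearly separable dataset; the step size, iteration count and C are illustrative assumptions.
# linear SVM via subgradient descent on hinge loss + L2 penalty (illustrative)
import numpy as np
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])           # labels in {-1, +1}
w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(1000):
    margins = y * (X @ w + b)
    active = margins < 1                              # points with non-zero hinge loss
    # subgradient of 0.5*||w||^2 + C * sum_i (1 - y_i(w.x_i + b))^+
    grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b
print(np.mean(np.sign(X @ w + b) == y))               # training accuracy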
What sort of optimization problem would you be solving to train a support vector machine?
- Maximize Margin (best answer); a quadratic program, i.e. a quadratic objective with linear constraints; mention solving the primal or the dual form.
What are the kernels used in SVM ?
Kernel $K(X_i, X_j)$ are:
- Linear Kernel: $X_i.X_j$
- Polynomial Kernel: $(\gamma X_i.X_j + C)^d$
- RBF Kernel: $\exp (-\gamma\vert X_i - X_j\vert ^2)$
- Sigmoid Kernel: $\tanh(\gamma X_i.X_j + C)$
where $K(X_i, X_j) = \phi(X_i).\phi(X_j)$
that is, the kernel function, represents a dot product of input data points mapped into the higher dimensional feature space by transformation $\phi(.)$
$\gamma$ is an adjustable parameter of certain kernel functions.
The RBF is by far the most popular choice of kernel type used in Support Vector Machines, mainly because of its localized and finite response across the entire range of the real x-axis.
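A minimal sketch comparing these kernels with scikit-learn's SVC on a non-linearly separable toy dataset; gamma, degree and C are illustrative values (SVC's gamma and coef0 play the role of the $\gamma$ and $C$ terms in the kernel formulas above).
# comparing SVM kernels on a toy dataset
from sklearn.datasets import make_moons
from sklearn.svm import SVC
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, degree=3, gamma="scale", C=1.0).fit(X, y)
    print(kernel, clf.score(X, y))   # the RBF kernel usually fits this data best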
What is the optimization technique of SVM?
Please download the PDF to view it: Download PDF
Why bring Lagrange Multiplier for solving the SVM problem?
- A constrained optimization problem is easier to solve with Lagrange multipliers.
- The existing constraints are replaced by constraints on the Lagrange multipliers, which are easier to handle.
- By this reformulation of the problem, the data appear only as dot products, which is very helpful when generalizing the SVM to non-linearly separable classes.
- Youtube: Lagrange Multiplier Intuition
KKT Condition for SVM?
Geometric analysis of Lagrangian, KKT, Dual
How does SVM learn non-linear boundaries? Explain.
- Using the kernel trick, it maps the examples from the input space to a higher-dimensional feature space. In the higher dimension, they are separated linearly.
SVM: Regularized Loss Function View
Resource:
Constrained optimization (Lagrangian)
- minimize $f(x)$ such that $g(x) \leq 0$ and $h(x)=0$.
  - Our target is to combine $f(x), g(x), h(x)$ into a single equation. We do this by introducing the Lagrange multipliers $\lambda$ and $\mu$. The new equation looks like: $L(x,\lambda,\mu)=f(x)+\lambda g(x)+ \mu h(x)$.
- maximize $f(x)$ such that $g(x) \geq 0$ and $h(x)=0$.
  - Again we combine $f(x), g(x), h(x)$ into the same Lagrangian form $L(x,\lambda,\mu)=f(x)+\lambda g(x)+ \mu h(x)$, noting the flipped inequality on $g(x)$.
NOTE: In the above formulation, pay special attention to the minimize and maximize keywords and the change in the inequality constraints. Given any minimization or maximization problem, convert its constraints to $g(x) \leq 0$ or $g(x) \geq 0$ accordingly and then formulate the Lagrangian. The $h(x)=0$ constraint may or may not be present. Finally apply the KKT conditions to find the solution.
KKT Conditions:
- Stationarity $\nabla_x L(x,\lambda,\mu)=0$
- Primal feasibility, $g(x)<=0$ (for minimization problem)
- Dual feasibility, $\lambda>=0, \mu>=0$
- Complementary slackness, $\lambda g(x) = 0$ and $\mu h(x)=0$
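A quick illustrative example of applying these conditions: minimize $f(x)=x^2$ subject to $x \geq 1$, i.e. $g(x)=1-x \leq 0$. The Lagrangian is $L(x,\lambda)=x^2+\lambda(1-x)$ with $\lambda \geq 0$. Stationarity gives $2x-\lambda=0$ and complementary slackness gives $\lambda(1-x)=0$. If $\lambda=0$ then $x=0$, which violates primal feasibility ($g(0)=1>0$), so the constraint must be active: $x=1$ and $\lambda=2 \geq 0$. All KKT conditions hold, so the constrained minimum is $x^\star=1$.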
Resource
Talking about unsupervised learning? What are the algorithms ?
- Clustering
  - K-Means
    - K-Means clustering intends to partition n objects into k clusters in which each object belongs to the cluster with the nearest mean. This method produces exactly k different clusters of greatest possible distinction. The best number of clusters k leading to the greatest separation (distance) is not known a priori and must be computed from the data. The objective of K-Means clustering is to minimize the total intra-cluster variance, or the squared error function:
    - $J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \vert\vert x_i - c_j \vert\vert^2$, where $C_j$ is the set of points assigned to cluster $j$, $c_j$ is its centroid, $N$ is the total number of observations and $K$ is the total number of clusters.
Resource:
Algorithm:
- Clusters the data into k groups where k is predefined.
- Select k points at random as cluster centers.
- Assign objects to their closest cluster center according to the Euclidean distance function.
- Calculate the centroid or mean of all objects in each cluster.
- Repeat steps b, c and d until the same points are assigned to each cluster in consecutive rounds.
- Seed K-Means: For seeding, i.e. to decide the initial set of $K$ centroids, use the K-Means++ algorithm.
- K Medoids
- Agglomerative Clustering
- Hierarchical Clustering
- Dimensionality Reduction
  - PCA
  - ICA
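A minimal sklearn sketch of K-Means (with the k-means++ seeding discussed below); the number of clusters and the blob data are illustrative.
# K-Means with k-means++ initialization on toy blob data
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
print(km.inertia_)   # total within-cluster sum of squares (the objective J above)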
How do you decide K in the K-Means clustering algorithm? Tell me at least 3 ways of deciding K in clustering.
- Elbow Method
- Average Silhouette Method
- Gap Statistics
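A minimal sketch of the first two ideas: compute the K-Means inertia (for the elbow plot) and the average silhouette score over a range of candidate k; the range 2 to 7 is an illustrative choice.
# elbow (inertia) and average silhouette score for candidate values of k
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_, silhouette_score(X, km.labels_))
# pick k at the inertia "elbow", or at the maximum average silhouette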
Reference:
How do you seed the k-means algorithm, i.e. how do you decide the first k cluster centers?
- k-means++ is an algorithm for choosing the initial values (or “seeds”) for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii as an approximation algorithm for the NP-hard k-means problem, a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm.
- The intuition behind this approach is that spreading out the k initial cluster centers is a good thing: the first cluster center is chosen uniformly at random from the data points that are being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its squared distance from the point’s closest existing cluster center.
- The exact algorithm is as follows:
Algorithm
- Choose one center uniformly at random from among the data points.
- For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
- Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2.
- Repeat Steps 2 and 3 until k centers have been chosen.
- Now that the initial centers have been chosen, proceed using standard k-means clustering.
Reference:
When K-means will fail?
- When data is non linearly separable. It works best when data clusters are discrete and spherically distributed.
What other clustering algorithms do you know?
- link
- Unsupervised linear clustering algorithm
- k-means clustering algorithm link
- Fuzzy c-means clustering algorithm
- Hierarchical clustering algorithm
- Hierarchical Agglomerative Clustering (bottom up)
- Hierarchical DIvisive Clustering (top down)
- Gaussian(EM) clustering algorithm
- Quality threshold clustering algorithm
- Unsupervised non-linear clustering algorithm
  - MST based clustering algorithm
    - Basic Idea: Apply MST on the data points, using the Euclidean distance as the weight between two data points. After building the MST, remove the longest edge, then the 2nd longest and so on; thus the clusters are formed. source
- kernel k-means clustering algorithm
- Density based clustering algorithm
- DBSCAN
- code
What is DB-SCAN algorithm ?
- It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions
- A point p is a core point if at least `minPts` points are within distance `ε` of it (including p itself), where `ε` is the maximum radius of the neighborhood around p. Those points are said to be directly reachable from p.
- A point q is directly reachable from p if q is within distance `ε` from p and p is a core point.
- A point q is reachable from p if there is a path `p1, ..., pn` with `p1 = p` and `pn = q`, where each `pi+1` is directly reachable from `pi` (all the points on the path must be core points, with the possible exception of q).
- All points not reachable from any other point are outliers.
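A minimal scikit-learn sketch (the `eps` and `min_samples` values are hypothetical and data-dependent):

```python
# A minimal sketch: DBSCAN groups densely packed points and labels sparse
# points as outliers (label -1).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # cluster ids; -1 marks noise/outlier points
```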
How does HAC (Hierarchical Agglomerative clustering) work ?
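- HAC starts with every point in its own cluster and repeatedly merges the two closest clusters, according to a linkage criterion (single, complete, average, or Ward), until a single cluster or the desired number of clusters remains. A minimal scikit-learn sketch (the data, the linkage choice, and `n_clusters=3` are hypothetical):

```python
# A minimal sketch: bottom-up (agglomerative) clustering with Ward linkage.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

hac = AgglomerativeClustering(n_clusters=3, linkage="ward")
print(hac.fit_predict(X))
```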
The Inductive Biases of Various Machine Learning Algorithms
Every learning algorithm carries an inductive bias: some fundamental assumption, or set of assumptions, that the learner makes about the target function and that enables it to generalize beyond the training data.
Linear Regression
- The relationship between the attributes x and the output y is linear. The goal is to minimize the sum of squared errors.
Decision Trees
- Shorter trees are preferred over longer trees. Trees that place high information gain attributes close to the root are preferred over those that do not.
Single-Unit Perceptron:
- Each input votes independently toward the final classification (interactions between inputs are not possible).
Neural Networks with Backpropagation:
- Smooth interpolation between data points.
K-Nearest Neighbors:
- The classification of an instance x will be most similar to the classification of other instances that are nearby in Euclidean distance.
Support Vector Machines:
- Distinct classes tend to be separated by wide margins.
Naive Bayes:
- Each input depends only on the output class or label; the inputs are independent from each other.
Reference:
How do you deploy Machine Learning models ?
- Microservice
- Docker
- Kubernetes
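A minimal Flask sketch of the microservice idea (the model file `model.pkl` and the feature payload are hypothetical); in practice this service would be packaged in a Docker image and orchestrated with Kubernetes:

```python
# A minimal sketch: expose a pre-trained model behind an HTTP endpoint.
# "model.pkl" is a hypothetical path to a pickled scikit-learn estimator.
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]        # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify(prediction=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```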
A lot of times, we may have to write ML models from scratch in C++. Will you be able to do that?
How the model varies in `KNN` for $K=1$ and $K=N$?
- When K equals 1 (or another small number), the model is prone to `overfitting (high variance)`, while when K equals the number of data points (or another large number), the model is prone to `underfitting (high bias)`.
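A minimal sketch of this behavior on a hypothetical dataset: with `K=1` the training accuracy is perfect but the test accuracy suffers, while with `K` equal to the number of training points the model just predicts the majority class:

```python
# A minimal sketch: compare train/test accuracy for small vs. large K.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, len(X_tr)):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
```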
Generative model vs Discriminative model.
- Discriminative algorithms model `P(y|x; w)`, that is, given the data and the learned parameters, the probability of y belonging to a specific class. A discriminative algorithm doesn’t care about how the data was generated; it simply categorizes a given example.
- Generative algorithms try to model `P(x|y)`, that is, the distribution of the features given that the example belongs to a certain class. A generative algorithm models how the data was generated. source
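As a small illustration (the dataset is hypothetical): Gaussian Naive Bayes is a generative model (it learns $P(x|y)$ and $P(y)$), while logistic regression is discriminative (it models $P(y|x)$ directly):

```python
# A minimal sketch: a generative classifier (GaussianNB) vs. a discriminative
# one (LogisticRegression) trained on the same data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

print("generative    :", GaussianNB().fit(X_tr, y_tr).score(X_te, y_te))
print("discriminative:", LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te))
```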
Scenario based Question
Let’s say, you are given a scenario where you have terabytes of data files consisting of pdfs, text files, images, scanned pdfs etc. What approach will you take in understanding or classifying them ?
Q. How will you read the content of scanned pdfs or written documents in image formats?
Why is naive bayes called “naive”? Tell me about naive bayes classifier?
- Because it is assumed that all the features are independent of each other given the class label. This is a very naive assumption.
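Concretely, for features $x_1, \dots, x_n$ and class $y$, the conditional independence assumption lets the likelihood factorize as $P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$, so the classifier simply predicts $\hat{y} = \arg\max_y P(y)\prod_{i=1}^{n} P(x_i \mid y)$.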
Logistic Regression loss function?
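- For labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{y}_i = \sigma(w^\top x_i)$, logistic regression minimizes the negative log-likelihood (binary cross-entropy): $L(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$, often with an added $L_1$ or $L_2$ regularization term.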
What do you mean by mutable and immutable objects in python ?
Since everything in Python is an object, every variable holds an object instance. When an object is initiated, it is assigned a unique object id. Its type is defined at runtime and, once set, can never change; however, its state can be changed if it is mutable. Simply put, a mutable object can be changed after it is created, and an immutable object can’t.
- Objects of built-in types like `int`, `float`, `complex`, `bool`, `str`, `unicode` (Python 2), `tuple`, and `frozenset` are immutable. Objects of built-in types like `list`, `set`, `dict`, and `bytearray` are mutable. Custom classes are generally mutable.
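A minimal sketch illustrating the difference:

```python
# A minimal sketch: lists are mutable, tuples and strings are not.
nums = [1, 2, 3]
nums.append(4)          # fine: the same list object is modified in place
print(id(nums), nums)   # same id, new contents

point = (1, 2, 3)
try:
    point[0] = 99       # tuples are immutable, so this raises
except TypeError as e:
    print("TypeError:", e)

name = "alice"
upper = name.upper()    # strings are immutable: a *new* object is returned
print(name, upper)
```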
What are the data structures you have used in python ?
Difference between Multi-Class and Multi-Label classification
Multi-Class Classification
Multi-Label Classification
Method:
- Not so good idea: Two independent classifiers might output combinations of labels that don’t make sense.
Limitations:
- Calico cats are almost always female
- If your classifiers predict male and calico, this is probably wrong
- There might be correlations between the classes that could help classification if you had a way to combine the two classifiers.
- Good idea: train one classifier first, use its output as a feature in the other.
Limitations:
- If the first classifier is wrong, you’ll have an incorrect feature value.
- This is a `pipeline` approach where one classifier informs the other, rather than both informing each other simultaneously.
- Better idea: treat combinations of classes as their own “classes”, then do single-label classification, i.e., combine class $y_1$ and $y_2$ into a single class and use multi-class classification (a small sketch follows the limitations below).
Limitations:
- All classes are learned independently: the classifier has no idea that `Tuxedo+Male` and `Tuxedo+Female` are both the same color and therefore probably should have similar feature weights.
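A minimal sketch of the “combine the classes” (label powerset) idea, with hypothetical color/sex labels:

```python
# A minimal sketch: turn each (color, sex) label pair into a single combined
# class and train an ordinary multi-class classifier on it.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = ["small calico cat", "large tuxedo cat", "calico kitten", "big tuxedo tom"]
colors = ["calico", "tuxedo", "calico", "tuxedo"]
sexes = ["female", "male", "female", "male"]

combined = [f"{c}+{s}" for c, s in zip(colors, sexes)]   # e.g. "calico+female"

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(descriptions, combined)
print(clf.predict(["tiny calico cat"]))
```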
Reference:
How do you handle multi-class classification with unbalanced dataset ?
- link1
- Handle imbalanced data by resampling the original data to provide balanced classes (a resampling sketch follows this list):
- Random under sampling
- Random over sampling
- Cluster based over sampling
- Informed Over Sampling: Synthetic Minority Over-sampling Technique (SMOTE)
- Modifying existing classification algorithms to make them appropriate for imbalanced data sets.
- Bagging based
- Boosting based: AdaBoost, Gradient Boost, XGBoost
- imp source
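A minimal sketch of the resampling idea, assuming the third-party `imbalanced-learn` package is available (the class ratio is hypothetical):

```python
# A minimal sketch: oversample the minority class with SMOTE before training.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print("before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after :", Counter(y_res))
```

Algorithm-level alternatives include class weighting (e.g., `class_weight="balanced"` in many scikit-learn estimators) and the boosting-based methods listed above.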
How do you select between 2 models (Model Selection techniques)?
To choose between 2 models, generally `AIC` or `BIC` is used.
Generally, the most commonly used metrics for measuring regression model quality and for comparing models are: Adjusted $R^2$, `AIC`, `BIC`, and Mallows’ `Cp`.
- AIC stands for Akaike’s Information Criterion, a metric developed by the Japanese statistician Hirotugu Akaike in the 1970s. The basic idea of AIC is to penalize the inclusion of additional variables in a model. It adds a penalty that increases the error when additional terms are included. The lower the AIC, the better the model. `AICc` is a version of AIC corrected for small sample sizes.
- BIC (Bayesian Information Criterion) is a variant of AIC with a stronger penalty for including additional variables in the model.
- Mallows Cp: A variant of AIC developed by Colin Mallows
- $R^2$ is not a good criterion: it always increases with model size, so the “optimum” would be to take the biggest model.
- Adjusted $R^2$ is better: it penalizes bigger models.
Reference:
How does it work mathematically? Explain the intuition behind BIC or AIC ?
In general, it might be best to use AIC and BIC together in model selection.
- For example, in selecting the number of latent classes in a model, if BIC points to a three-class model and AIC points to a five-class model, it makes sense to select from models with 3, 4, and 5 latent classes.
- AIC is better in situations where a false negative finding would be considered more misleading than a false positive.
- BIC is better in situations where a false positive is as misleading as, or more misleading than, a false negative.
$\mathrm{AIC} = -2\log L(\hat{\theta}) + 2k$
where,
- $\theta$= the set (vector) of model parameters
- $L(\theta)$ = the likelihood of the candidate model given the data when evaluated at the maximum likelihood estimate of $\theta$
- $k$ = the number of estimated parameters in the candidate model
The first component, $-2\log L(\hat \theta)$, is based on the value of the log-likelihood function, $\log L(\theta)$, which reflects the probability of obtaining the data given the candidate model. The second component, $2k$, is the penalty: the more parameters, the greater the amount added to the first component, increasing the value of the AIC and penalizing the model.
BIC is another model selection criterion based on information theory, but set within a Bayesian context: $\mathrm{BIC} = -2\log L(\hat{\theta}) + k\log(n)$, where $n$ is the sample size. The difference between BIC and AIC is the greater penalty that BIC imposes for the number of parameters.
Reference:
What is precision and recall? Which one of this do you think is important in medical diagnosis?
Type I and Type II Errors
One fine morning, Jack got a phone call. It was a stranger on the line. Jack, still sipping his freshly brewed morning coffee, was barely in a position to understand what was coming for him. The stranger said, “Congratulations Jack! You have won a lottery of $10$ Million USD! I just need you to provide me your bank account details, and the money will be deposited in your bank account right away…”
What are the odds of that happening? What should Jack do? What would you have done?
- Type I: False Positive
- Type II: False Negative
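In terms of these error types:
- Precision $= \frac{TP}{TP + FP}$: of everything flagged positive, how much was truly positive.
- Recall (sensitivity) $= \frac{TP}{TP + FN}$: of all actual positives, how many were caught.
- In medical diagnosis, recall is usually the more important of the two: a false negative (a missed diagnosis) is typically far more costly than a false positive, which can be ruled out by follow-up tests.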
ROC Curve Analysis
ROC curve
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
- True Positive Rate (TPR) is a synonym for recall and is therefore defined as: $TPR = \frac{TP}{TP + FN}$
- False Positive Rate (FPR) is defined as: $FPR = \frac{FP}{FP + TN}$
To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, there’s an efficient, sorting-based algorithm that can provide this information for us, called AUC.
AUC: Area Under the ROC Curve
AUC stands for “Area under the ROC Curve.” That is, AUC measures the entire two-dimensional area underneath the entire ROC curve (think integral calculus) from (0,0) to (1,1).
AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. (source)
What does AUC-ROC curve signify ?
AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with the disease and those without it.
How do you draw AUC-ROC curve ?
- `True positive` is the area of the “bad” (positive) score distribution to the right of the threshold; `False positive` is the area of the “good” (negative) score distribution to the right of the threshold.
- `Total positive` is the total area under the “bad” curve, while total negative is the total area under the “good” curve.
- We divide these values to derive TPR and FPR.
- We derive the TPR and FPR at different threshold values (by sliding the threshold) to get the ROC curve. Using this knowledge, we create the ROC plot function.
Bonus Question: Write pseudo-code to generate the data for such a curve. [Check the below blog]
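A minimal Python sketch of such pseudo-code (the labels and scores are hypothetical); for real use, `sklearn.metrics.roc_curve` computes the same thing efficiently:

```python
# A minimal sketch: sweep thresholds over the predicted scores and collect
# (FPR, TPR) pairs; plotting these points gives the ROC curve.
import numpy as np

def roc_points(y_true, scores):
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    points = []
    for t in sorted(set(scores), reverse=True):
        pred = scores >= t                        # classify at threshold t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tpr = tp / max(np.sum(y_true == 1), 1)
        fpr = fp / max(np.sum(y_true == 0), 1)
        points.append((fpr, tpr))
    return points

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(roc_points(y_true, scores))
```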
The ROC curve is plotted with `TPR` against `FPR`, where TPR is on the y-axis and FPR is on the x-axis.
- `TPR == Recall` $= \frac{TP}{TP + FN}$
- Specificity $= \frac{TN}{TN + FP}$
- `FPR == 1 - Specificity` $= \frac{FP}{FP + TN}$
How will you draw ROC for multi class classification problem
In a multi-class model, we can plot N AUC-ROC curves for the N classes using the One-vs-All methodology. For example, if you have three classes named X, Y, and Z, you will have one ROC for X classified against Y and Z, another ROC for Y classified against X and Z, and a third one for Z classified against X and Y.
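A minimal One-vs-All sketch (the three-class dataset is hypothetical):

```python
# A minimal sketch: binarize the labels and compute one ROC/AUC per class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)
y_bin = label_binarize(y_te, classes=[0, 1, 2])

for c in range(3):                       # one ROC curve per class (class c vs. rest)
    fpr, tpr, _ = roc_curve(y_bin[:, c], probs[:, c])
    print("class", c, "AUC =", auc(fpr, tpr))
```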
Reference
What is random about Random Forest?
- For each `tree`, a different dataset (built from the original data by random resampling with replacement, i.e., bootstrapping) is given as input. In addition, each split considers only a random subset of the features.
Metric to measure multi-class classification result?
We can generalize all the binary performance metrics such as precision, recall, and F1-score to multi-class settings. In the binary case, we have: $\mathrm{PRE} = \frac{TP}{TP + FP}$ and $\mathrm{REC} = \frac{TP}{TP + FN}$.
To generalize this to multi-class, assuming we have a One-vs-All (OvA) classifier, we can either go with the micro average or the macro average.
- Micro averaging: we calculate the performance, e.g., precision, from the individual (One-vs-All) true positives, true negatives, false positives, and false negatives of the k-class model: $\mathrm{PRE}_{micro} = \frac{TP_1 + \dots + TP_k}{TP_1 + \dots + TP_k + FP_1 + \dots + FP_k}$
- Macro averaging: we average the performances of the individual classes: $\mathrm{PRE}_{macro} = \frac{\mathrm{PRE}_1 + \dots + \mathrm{PRE}_k}{k}$
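A minimal sketch of micro vs. macro averaging with scikit-learn (the predictions are hypothetical):

```python
# A minimal sketch: the same multi-class predictions scored with micro and
# macro averaged precision.
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

print("micro:", precision_score(y_true, y_pred, average="micro"))
print("macro:", precision_score(y_true, y_pred, average="macro"))
```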
Reference:
How is using a logistic regression different from using a random forest ?
- If your data is linearly separable, go with logistic regression. However, in the real world, data is rarely linearly separable; most of the time it is a jumbled mess. In such scenarios, decision trees are a better fit, since a DT is essentially a non-linear classifier. As DTs are prone to overfitting, random forests are used in practice to generalize the fit better. RF provides a good balance between precision and overfitting.
- If your problem/data is linearly separable, then first try logistic regression. If you don’t know, then still start with logistic regression because that will be your baseline, followed by non-linear classifier such as random forest. Do not forget to tune the parameters of logistic regression / random forest for maximizing their performance on your data.
- If your data is categorical, then random forest should be your first choice; however, logistic regression can also handle categorical data (e.g., after one-hot encoding).
- If you want to understand results easily, logistic regression is a better choice because it leads to simple interpretation of the explanatory variables.
- If `speed` is your criterion, then `logistic regression` should be your choice.
- If your `data` is `unbalanced`, then `random forest` may be a better choice.
- If the number of data points is less than the number of features, logistic regression should not be used.
- Lastly, as noted in this paper, either of the random forest or logistic regression “models appear to perform similarly across the datasets with performance more influenced by choice of dataset rather than model selection”
Reference:
Which model would you use in case of unbalanced dataset: Random Forest or Boosting ? Why ?
- Gradient boosting is also a good choice here. You can use the gradient boosting classifier in scikit-learn, for example. Gradient boosting is a principled method of dealing with class imbalance by constructing successive training sets based on incorrectly classified examples.
- link1
- An alternative could be a cost-sensitive algorithm like C5.0 that doesn’t need balanced data. You could also think about applying Markov chains to your problem.
How to prepare for ML Interview?
In general, for an interview that you think will be machine learning focused, I would make sure you knew the following techniques, and an approach you would use for each, and how they are different from each other:
- Regression
- Classification
- Ranking
- Recommendation
- Clustering
- Unsupervised Learning
Question source:
Exercise
- What is sensitivity and specificity ?
- Name the package of scikit-learn that implements logistic regression ?
- What is mean and variance of standard normal distribution ?
- What is the central limit theorem?
- Law of Large Number?
- What are the data structures you have used in python ?
- What is naive bayes classifier ?
- What is the probability of getting heads 4 times when a coin is tossed 10 times?
- How do you get an index of an element of a list in python ?
- How do you merge two data-sets with pandas?
import pandas as pd
# pd.concat stacks the frames vertically; for key-based joins use pd.merge(df1, df2, on=...)
frames = [df1, df2, df3]
result = pd.concat(frames)
- From user behavior, you need to model fraudulent activity. How are you going to solve this? Maybe as an anomaly detection problem or a classification problem!
- What will you prefer: a decision tree or a random forest?
- Will you use a decision tree or a random forest for a classification problem? What is the advantage of using a random forest?
- What are the boosting techniques you know ?
- Which model would you like to choose if you have many classes in a supervised learning problem ? Say 40-50 classes !!
- How do you perform ensemble technique?
- How does SVM work ?
- What is Kernel ? Explain a few.
- How do you perform non-linear regression?
- What are Lasso and Ridge regression ?
- What is Gaussian Mixture model ? How does it perform clustering ?
- How is Expectation Maximization performed ? Explain both the steps ?
- How is likelihood calculated in GMM ?