An uncommon approach in tackling class imbalance

Class imbalance. Source. In supervised learning, one challenge faced by data scientists is class imbalance, where in a binary classification problem, instances of one class severely outnumber instances of the other. This poses a problem as model performance may be misleading: a naive example would be to always predict negative in a 10% positive-90% … Continue reading An uncommon approach in tackling class imbalance →

Seven tips for working on analytics delivery projects

Image not relevant to content below. Source. Following are seven tips / tricks / hacks that I came to learn (some of them the hard way) and compiled as a data scientist / delivery consultant / data science consultant. In brief, they are: You to yourself: Develop a strategy; Keep a delivery journal; Plan your daily activities; Frontload … Continue reading Seven tips for working on analytics delivery projects →

Paper review: To Tune or Not to Tune the Number of Trees in Random Forest

Plotting different performance metrics against the number of trees in random forest. Source. During my Master's coursework, I came across the following paper, which addresses a practical issue in the use of the random forest model and, more generally, any other bootstrap aggregating ensemble: Probst, P. & Boulesteix, A-L. (2018). To Tune or Not to … Continue reading Paper review: To Tune or Not to Tune the Number of Trees in Random Forest →

My Master of Science in Statistics programme in NUS

National University of Singapore. I have gotten quite a few questions regarding my current MSc Statistics programme in NUS. Here is some broad-stroke information about the programme and how I am approaching it. I'm doing the MSc by Coursework programme, which means a research thesis is not part of my curriculum. An MSc by … Continue reading My Master of Science in Statistics programme in NUS →

The Machine Learning Life Cycle – how to run a ML project

The Machine Learning Life Cycle from DataRobot. I recently came across this page in the DataRobot Artificial Intelligence Wiki. If you don't already know, DataRobot is currently one of the top automated machine learning platforms in the market, with an emphasis on supervised learning and citizen data science. I am quite a big fan of their … Continue reading The Machine Learning Life Cycle – how to run a ML project →

A repertoire of data scientist interview questions – with a twist

Machine Learning. Nothing to do with my intended topic, just a random xkcd comic that I thought was funny. Source. (This post will be continually updated so as to capture more questions and answers along the way. To be honest, I don't think I have the best answers to some of these questions either. … Continue reading A repertoire of data scientist interview questions – with a twist →

Why ensemble modelling works so well – and one often neglected principle

Putting models together in an ensemble learning fashion is a popular technique amongst Data Scientists. Ensemble learning is the simultaneous use of multiple predictive models to arrive at a single prediction, based on a collective decision made together by all models in the ensemble. It's a common and popular technique used in predictive modelling, especially … Continue reading Why ensemble modelling works so well – and one often neglected principle →

First Post – What’s this blog about?

Girth matters? Data science is slowly becoming an irrational competition of computing power. Source. Hi there - not sure how you ended up here, but this is the first post of my blog, and my second attempt at starting a blog - my first attempt around late 2016 to mid 2017 lacked a consistent theme, was … Continue reading First Post – What’s this blog about? →