Deep Dive on Attrition

Stuart Miller August 14, 2019

# import libraries
library(knitr)
library(tidyverse)
library(naniar)
library(Hmisc)
library(GGally)
library(gridExtra)
library(RColorBrewer)
library(gplots)
library(corrplot)
library(ggthemes)

# import helper functions
source('../helper/data_munging.R')
source('../helper/visual.R')

# read in data
train <- read_csv('../data/CaseStudy2-data_train.csv')

# create a vector of numeric features
features.numeric <- c('DailyRate', 'DistanceFromHome', 'Age', 'HourlyRate', 'MonthlyIncome', 'MonthlyRate',
           'NumCompaniesWorked','PercentSalaryHike', 'TotalWorkingYears', 'TrainingTimesLastYear',
           'YearsAtCompany','YearsInCurrentRole','YearsSinceLastPromotion', 'YearsWithCurrManager')

# create a vector of numeric features
features.factor <- c('BusinessTravel', 'Department', 'Education', 'EducationField',
                     'EnvironmentSatisfaction', 'Gender', 'JobInvolvement', 'JobLevel',
                     'JobRole', 'JobSatisfaction', 'MaritalStatus', 'OverTime',
                     'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel',
                     'WorkLifeBalance', 'Attrition')

# factor categorical variables
train[, features.factor] <- lapply(train[, features.factor], as.factor)

Analysis of Attrtion

The objective is to find where high attrition is correlated. Based on previous EDA, the following variables (or levels of variables) seem to be associated with attrition. The top 3 factors associated with attrition are numbered. The top 3 factors associated with attrition are examinated in association to other relative factors.

Factor

  • JobInvolvement - 1
  • WorkLifeBalance - 2
  • OverTime - 3
  • JobRole
  • JobLevel
  • JobSatisfaction

Numeric

  • MonthlyIncome
  • TotalWorkingYears
  • YearsInCurrentRole
  • YearsWithCurrmanager

Job Involvement

Job involvement appears to have a high impact on employee attrition. Nearly 50% with the lowest job involvement leave the position. It is expected that there would be an interaction with JobRole, JobLevel, and OverTime.

train %>% ggplot(aes(x = JobInvolvement, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Job Role Interaction

Almost all job roles show high attrition with low job involvement. With Human Resources and Sales Representative having the worst rates. Notibly, attrition rates of Manufacturing Director and Research Director do not seem to be affected by job involvement.

train %>% ggplot(aes(x = JobInvolvement, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ JobRole) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Job Level Interaction

Attriton is high for all job levels where job level is low. It is especially high (over 50%) in level 1 and 5.

train %>% ggplot(aes(x = JobInvolvement, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ JobLevel) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Overtime Interaction

Attrition is higher for eac level of job involvement, but it is also generally higher for Overtime workers.

train %>% ggplot(aes(x = JobInvolvement, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ OverTime) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Work-Life Balence

Work-life balence appears to have a high impact on employee attrition. Nearly 50% with the lowest job involvement leave the position. It is expected that there would be an interaction with JobRole, JobLevel, and OverTime.

train %>% ggplot(aes(x = WorkLifeBalance, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Job Role Interaction

Sales Executives and Laboratory Technitions with low work-life balence have high attrition rates.

train %>% ggplot(aes(x = WorkLifeBalance, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ JobRole) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Job Level Interaction

Low work-life balence appears to have an affect on lower job levels, especially level 1.

train %>% ggplot(aes(x = WorkLifeBalance, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ JobLevel) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Overtime Interaction

Attrition is higher for each level of job involvement, but it is also generally higher for Overtime workers.

train %>% ggplot(aes(x = WorkLifeBalance, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ OverTime) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Overtime

Job involvement appears to have a high impact on employee attrition. Nearly 50% with the lowest job involvement leave the position. It is expected that there would be an interaction with JobRole, JobLevel, and MonthlyIncome.

train %>% ggplot(aes(x = OverTime, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Job Role Interaction

Sales Executives and Laboratory Technitions with low work-life balence have high attrition rates.

train %>% ggplot(aes(x = OverTime, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ JobRole) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

Job Level Interaction

Low work-life balence appears to have an affect on lower job levels, especially level 1.

train %>% ggplot(aes(x = OverTime, fill = Attrition)) +
  geom_bar(position = 'fill') +
  coord_flip() +
  facet_wrap( ~ JobLevel) + 
  scale_fill_few(palette = 'Dark') + 
  theme_few()

MonthlyIncome Interaction

Monthly income appears to be lower for employees who are working overtime with attrition.

train %>% 
  ggplot(aes(x = Attrition,
             y = MonthlyIncome,
             fill = Attrition)) +
  geom_boxplot() + 
  facet_wrap(~ OverTime) +
  scale_fill_few(palette = 'Dark') + 
  theme_few() +
  ggtitle('MonthlyIncome by Attrition and Overtime')