Power BI: Job change of data scientists

7 min read · May 18, 2022


Which factors lead a person to leave their current job?

This is a question that companies usually ask their employees and potential candidates.

If this question could be answered, a company may be able to reduce the number of employees who leave the organization and decrease the cost of finding new employees.

In this article, I will show how I use Power BI to analyze factors that could affect job change decisions.

💽 Dataset

Data used in this article comes from

This dataset was collected by a company that is active in Big Data and Data Science and wants to hire data scientists from among people who successfully pass courses conducted by the company.

The data contains 19,158 candidates and 14 columns.

Columns Detail

  • enrollee_id: Unique ID for the candidate
  • city: City code
  • city_development_index: Development index of the city (scaled)
  • gender: Gender of the candidate
  • relevent_experience: Relevant experience of the candidate
  • enrolled_university: Type of University course enrolled if any
  • education_level: Education level of candidate
  • major_discipline: Education major discipline of the candidate
  • experience: Candidate’s total experience in years
  • company_size: No of employees in current employer’s company
  • company_type: Type of current employer
  • last_new_job: Difference in years between previous job and current job
  • training_hours: training hours completed
  • target: 0 — Not looking for a job change, 1 — Looking for a job change

📊 EDA (Exploratory Data Analysis)

EDA steps

  1. Missing data detection
  2. Univariate analysis
  3. Multivariate analysis

Step 1: Missing data detection

The objective of this step is to quantify the number of missing values in each column before further analysis. For instance, we may exclude columns with too many missing values or try to fill in missing values.

After loading data from a CSV file, click Transform data to open Power Query Editor.

In Power Query Editor, the bar below each column name shows the ratio of missing values in that column: the grey part of the bar is the share of missing values. An example for the gender column is illustrated below.

Company size and company type are the 2 columns with the most missing values. The reason may be that some candidates are not working for a company yet (they are still studying), so they don’t know how to answer these questions. My recommendation for the next questionnaire is to include the option “not working yet” in the questions about the company.

For this dataset, the number of missing values is not high, so we can use all columns for analysis.
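The ratio shown by the grey bar can also be cross-checked outside Power BI. The snippet below is a minimal Python sketch of that check, not part of the Power BI workflow itself: it counts empty cells per column. The sample rows and the helper name missing_ratio are hypothetical; only the column names come from the dataset description.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the dataset.
SAMPLE_CSV = """enrollee_id,gender,company_size,company_type
1,Male,,
2,,50-99,Pvt Ltd
3,Female,,Pvt Ltd
4,Male,100-500,Funded Startup
"""


def missing_ratio(rows, columns):
    """Return the share of empty cells per column (0.0 to 1.0)."""
    total = len(rows)
    ratios = {}
    for col in columns:
        # A cell counts as missing when it is empty or whitespace only.
        empty = sum(1 for row in rows if not (row[col] or "").strip())
        ratios[col] = empty / total if total else 0.0
    return ratios


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
ratios = missing_ratio(rows, reader.fieldnames)

# Columns with the most missing values first
for col, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col}: {ratio:.0%} missing")
```

To run it on the real file, replace io.StringIO(SAMPLE_CSV) with an open file handle for the CSV you loaded into Power BI.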

Step 2: Univariate analysis

This step is like laying out all the columns on the table to get an overview and come up with ideas for further analysis.

This is what I have done.

  1. Count the rows of the entire dataset and of the group of interest, which in this case is the people who want to change jobs. First, I create 2 new measures, Count_people and Count_Change_Job:
     Count_people = count(aug_train[enrollee_id])
     Count_Change_Job = sum(aug_train[Is_Change_Job])
     Then, I use Cards to visualize them.
  2. List every distinct value that exists in each text column. Just drag the column onto the dashboard.
  3. Plot the distribution of the numeric columns. Use a clustered column chart.

Fortunately, the text columns of this data don’t contain many distinct values, so it is not difficult to plot.
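For readers who want to verify the card values and distributions by hand, here is a rough Python sketch of the three univariate steps. It assumes Is_Change_Job is a 0/1 column derived from target, which is my assumption about how that column was built. The sample rows and the bin width of 20 hours are also hypothetical.

```python
from collections import Counter

# Hypothetical sample rows. Is_Change_Job is assumed to be a 0/1 column
# derived from target.
rows = [
    {"enrollee_id": 1, "education_level": "Graduate", "training_hours": 36, "Is_Change_Job": 1},
    {"enrollee_id": 2, "education_level": "Masters", "training_hours": 47, "Is_Change_Job": 0},
    {"enrollee_id": 3, "education_level": "Graduate", "training_hours": 8, "Is_Change_Job": 0},
    {"enrollee_id": 4, "education_level": "Phd", "training_hours": 52, "Is_Change_Job": 1},
    {"enrollee_id": 5, "education_level": "Graduate", "training_hours": 18, "Is_Change_Job": 1},
]

# Same idea as Count_people = count(aug_train[enrollee_id])
count_people = sum(1 for row in rows if row["enrollee_id"] is not None)

# Same idea as Count_Change_Job = sum(aug_train[Is_Change_Job])
count_change_job = sum(row["Is_Change_Job"] for row in rows)

# Distinct values of a text column with their frequencies
education_counts = Counter(row["education_level"] for row in rows)

# Distribution of a numeric column, grouped into bins of 20 hours
bin_width = 20
hours_bins = Counter((row["training_hours"] // bin_width) * bin_width for row in rows)

print(count_people, count_change_job)
print(education_counts.most_common())
print(sorted(hours_bins.items()))
```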

Step 3: Multivariate analysis

In this step, I would like to know which columns correlate with the decision to change jobs.

I create 2 new measures: Count_Not_Change_Job and percent_change_job.

Count_Not_Change_Job = count(aug_train[enrollee_id])-sum(aug_train[Is_Change_Job])

percent_change_job = aug_train[Count_Change_Job]*100/(aug_train[Count_Change_Job]+aug_train[Count_Not_Change_Job])
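To make the arithmetic behind percent_change_job concrete, the sketch below reproduces it per category in plain Python. It is only an illustration under assumptions: the sample rows, the "(Blank)" label for empty answers, and the function name are hypothetical, and the real measure is evaluated by Power BI in the filter context of each visual.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the dataset.
rows = [
    {"company_size": "<10", "target": 1},
    {"company_size": "<10", "target": 0},
    {"company_size": "50-99", "target": 0},
    {"company_size": "50-99", "target": 0},
    {"company_size": "50-99", "target": 1},
    {"company_size": "50-99", "target": 0},
    {"company_size": "", "target": 1},
    {"company_size": "", "target": 1},
    {"company_size": "", "target": 0},
]


def percent_change_job(rows, category):
    """Percentage of rows with target == 1 for each value of `category`."""
    change = defaultdict(int)      # plays the role of Count_Change_Job
    not_change = defaultdict(int)  # plays the role of Count_Not_Change_Job
    for row in rows:
        key = row[category] or "(Blank)"
        if row["target"] == 1:
            change[key] += 1
        else:
            not_change[key] += 1
    keys = set(change) | set(not_change)
    return {k: change[k] * 100 / (change[k] + not_change[k]) for k in keys}


result = percent_change_job(rows, "company_size")
for size, pct in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{size}: {pct:.1f}% want to change jobs")
```

The percentages computed here are the same quantity that a 100% stacked bar chart displays for each category.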

1. Company size

In the first chart, I use a stacked bar chart to plot the number of people who want to change jobs and the number who don’t. The advantage of this chart type is that you can see the total number of people in each category. However, it is difficult to compare the share of job changers across categories, so I add a 100% stacked bar chart next to it to show that dimension.

In the second chart, the 100% stacked bar chart scales every category to 100%, which makes it easier to find the company size whose employees want to change jobs the most.

From these 2 charts, we can observe that most people work in a company of 50–499 employees. People are somewhat more likely to change jobs if they work for a small company, but the difference from larger companies is not pronounced. The blank answer has the highest number of people who want to change jobs. The reason may be that they don’t work for any company yet, so naturally they want a new job. However, it is difficult to draw this conclusion, since people may also skip this question simply because they don’t want to disclose their data.

2. Company type

Most people work for a private company. People who work for an early-stage startup are more likely to change jobs than those who work for a funded startup (a funded startup seems to be more stable). In my opinion, company type is not a useful feature in detecting people who want to change jobs since most people work for private companies anyway.

3. Education level

Most respondents have graduate degrees, and they are also the group that wants to change jobs the most, followed by people who have Master’s degrees.

4. Years since the last job change

Most people just started their new jobs. The shorter the time they have worked for a company, the higher the probability that they want to change jobs. The reason may be that people who have worked longer for a company like their current jobs and don’t want to move.

5. Work experience

Most people who join training have more than 20 years of experience!!

Excluding people with more than 20 years of experience, there is a trend that less experienced people are more likely to want to change jobs. Maybe they want to gain more experience.

6. Relevant experience

Most respondents have relevant experience in data science. However, people who have no relevant experience are more interested in changing careers than those with experience.

7. Company size and education level

A matrix with conditional formatting is used to understand the relationship between more than 2 variables.

The plot above shows the percentage of people who want to change jobs based on their company size and education level. A more intense green means a higher percentage of people who want to change jobs. People with graduate degrees and no company size specified have the highest interest in changing jobs (maybe they just graduated and have no job).

I also plot histograms along the table to see a distribution of the population.
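The numbers behind such a matrix can be sketched in a few lines of Python: one percentage per (company size, education level) cell, plus the row count per cell that the histograms summarize. This is only an illustration under assumptions: the sample rows, the "(Blank)" label, and the helper name percent_matrix are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the dataset.
rows = [
    {"company_size": "", "education_level": "Graduate", "target": 1},
    {"company_size": "", "education_level": "Graduate", "target": 1},
    {"company_size": "", "education_level": "Graduate", "target": 0},
    {"company_size": "", "education_level": "Masters", "target": 0},
    {"company_size": "50-99", "education_level": "Graduate", "target": 0},
    {"company_size": "50-99", "education_level": "Graduate", "target": 1},
    {"company_size": "50-99", "education_level": "Masters", "target": 0},
    {"company_size": "50-99", "education_level": "Masters", "target": 0},
]


def percent_matrix(rows, row_key, col_key):
    """Return ({cell: percent with target == 1}, {cell: row count})."""
    totals = defaultdict(int)
    changers = defaultdict(int)
    for row in rows:
        cell = (row[row_key] or "(Blank)", row[col_key] or "(Blank)")
        totals[cell] += 1
        changers[cell] += row["target"]
    percents = {cell: changers[cell] * 100 / totals[cell] for cell in totals}
    return percents, dict(totals)


percents, totals = percent_matrix(rows, "company_size", "education_level")

# The cell with the highest percentage would get the most intense color.
hottest = max(percents, key=percents.get)

for (size, level), pct in sorted(percents.items()):
    print(f"{size:8} {level:9} {pct:5.1f}%  (n={totals[(size, level)]})")
print("highest:", hottest)
```

Printing the count next to each percentage is a reminder that a high percentage in a cell with very few people deserves less weight.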

8. Numeric columns: city development index and training hours

To find a correlation between numeric variables, a scatter plot is used.

In the above picture, I plot the percentage of job changers against the city development index. Bubble size represents the number of people.

You may observe that people who live in a city with a low development index are more likely to change jobs.

For training hours, people are clustered in the low training hour zone. There is no visible trend between training hours and the percentage of job changers.
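As a rough illustration of what each bubble in the city scatter plot encodes, the Python sketch below aggregates rows per city into an x value (development index), a y value (percentage of job changers), and a size (number of people). The city names and sample values are hypothetical, and the sketch assumes each city has a single development index.

```python
from collections import defaultdict

# Hypothetical sample rows; city names and values are made up for illustration.
rows = [
    {"city": "city_A", "city_development_index": 0.92, "target": 0},
    {"city": "city_A", "city_development_index": 0.92, "target": 0},
    {"city": "city_A", "city_development_index": 0.92, "target": 0},
    {"city": "city_A", "city_development_index": 0.92, "target": 1},
    {"city": "city_B", "city_development_index": 0.62, "target": 1},
    {"city": "city_B", "city_development_index": 0.62, "target": 1},
    {"city": "city_B", "city_development_index": 0.62, "target": 0},
    {"city": "city_C", "city_development_index": 0.80, "target": 0},
    {"city": "city_C", "city_development_index": 0.80, "target": 1},
]


def scatter_points(rows):
    """Aggregate rows into one bubble per city: x, y and bubble size."""
    totals = defaultdict(int)
    changers = defaultdict(int)
    index = {}
    for row in rows:
        city = row["city"]
        totals[city] += 1
        changers[city] += row["target"]
        index[city] = row["city_development_index"]  # assumed constant per city
    return [
        {
            "city": city,
            "x": index[city],                           # city development index
            "y": changers[city] * 100 / totals[city],   # percent change job
            "size": totals[city],                       # number of people
        }
        for city in sorted(totals)
    ]


points = scatter_points(rows)
for p in points:
    print(f"{p['city']}: index={p['x']}, change={p['y']:.1f}%, people={p['size']}")
```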

🐚 Conclusion

By using Power BI, you can find the important factors that affect job change decisions. The next time you want to hire a new data scientist, try to search for candidates who have one of the following properties.

  • Have worked for their latest company for a short period
  • Little work experience
  • No relevant experience in data science
  • Just graduated and have no job right now

Please keep in mind that EDA is a quick process to get insight from data. If you want more accurate forecasting, data modeling is a better choice.

Hope you enjoy reading this article. Please let me know if you have any recommendations.
