Data Preparation in R

Click to View Rpubs Report
Click to View Github Code and Dataset

OBJECTIVE

This mini project prepares a dataset containing income ranges and various attributes of people from a region, making it fit for further analysis and model building.

Data preparation is a vital and often the most time-consuming activity: it makes the data fit for model building and analysis according to the business requirements.

TAGS

Dummy variables, flag variables, library dplyr

PROJECT METHODOLOGY

The project involves the following steps:

  • 1. Creating dummy variables for character variables.
  • 2. Grouping similar categories and creating dummies for the groups.
  • 3. Creating flag variables for numeric variables.
  • 4. Converting the target variable (Y).
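
The four steps can be sketched on a small toy data frame. This is only an illustration under stated assumptions, not the project's actual code: every column name except Y is hypothetical, the category values are made up, and encoding Y as 0/1 is one common choice that the steps above do not spell out.

```r
library(dplyr)

# Toy data standing in for the census file.
# All column names except Y are hypothetical.
people <- data.frame(
  workclass    = c("Private", "State-gov", "Private", "Self-emp"),
  education    = c("10th", "11th", "Bachelors", "Masters"),
  capital_gain = c(0, 2174, 0, 14084),
  Y            = c("<=50K", "<=50K", ">50K", ">50K"),
  stringsAsFactors = FALSE
)

prepared <- people %>%
  mutate(
    # Step 1: one dummy per category level; "Self-emp" is left out as the baseline
    workclass_private   = as.numeric(workclass == "Private"),
    workclass_state_gov = as.numeric(workclass == "State-gov"),
    # Step 2: group similar categories first, then create a dummy for the group
    education_school    = as.numeric(education %in% c("10th", "11th")),
    # Step 3: flag variable for a numeric column (1 when any gain is recorded)
    capital_gain_flag   = as.numeric(capital_gain > 0),
    # Step 4: convert the target to 1 for ">50K" and 0 otherwise (assumed encoding)
    Y                   = as.numeric(trimws(Y) == ">50K")
  ) %>%
  # Drop the original character columns once their dummies exist
  select(-workclass, -education)

print(prepared)
```

Leaving one level out of each set of dummies avoids perfectly collinear columns in later regression-style models; trimws() guards against stray spaces around the labels in the raw file.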

DATA DICTIONARY

  • census_income.csv is a CSV file containing 32,561 observations and 15 variables.
  • It describes the income ranges of a population along with their characteristic attributes.
  • The income ranges are >50k and <=50k, stored in the target variable Y.
  • The remaining 14 variables need to be prepared so that they are useful for analysis and model building.
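
Before preparing anything, it can help to confirm that the loaded file matches this description. The snippet below is a minimal sketch: check_census is a hypothetical helper, and the file location (the working directory) and a header row naming the target Y are assumptions.

```r
# Hypothetical helper: reports the shape of the data, the counts per
# target value, and which columns are character (candidates for dummies).
check_census <- function(df, target = "Y") {
  list(
    n_obs          = nrow(df),
    n_vars         = ncol(df),
    target_counts  = table(df[[target]]),
    character_vars = names(df)[vapply(df, is.character, logical(1))]
  )
}

# Assumed location: the file sits in the working directory.
path <- "census_income.csv"

if (file.exists(path)) {
  # strip.white removes stray spaces around character values
  census <- read.csv(path, stringsAsFactors = FALSE, strip.white = TRUE)
  # The data dictionary states 32561 observations and 15 variables
  print(check_census(census))
} else {
  message("census_income.csv not found; skipping the check")
}
```

The character columns listed by the helper are the ones that need dummy variables; the numeric ones are candidates for flag variables.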
