This mini project covers the data preparation of a dataset
containing the income ranges and various attributes of people from a region,
in order to make it fit for further analysis and model building.
Data preparation is a vital and often the most time-consuming activity: it makes
the data fit for model building and analysis according to the business requirements.
TAGS
Dummy variables, flag variables, dplyr library
PROJECT METHODOLOGY
The project involves the following steps:
- 1. Creating dummy variables for character (categorical) variables.
- 2. Grouping similar categories of a variable and then making dummies.
- 3. Creating flag variables for numeric variables.
- 4. Converting the target variable (Y).
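Step 1 can be sketched as follows. The project itself works in R with dplyr; this is only a minimal Python sketch of the same idea using the standard library, assuming each observation is a dict and using hypothetical column names (`age`, `sex`). For a character variable with n categories it creates n - 1 columns of 0/1 values, treating the alphabetically first category as the baseline (an assumption; dropping the least frequent category is a common alternative).

```python
def make_dummies(rows, column, drop_first=True):
    """Replace the character column `column` with 0/1 dummy columns.

    rows: list of dicts, one per observation (hypothetical data layout).
    With drop_first=True, the alphabetically first category is the baseline
    and gets no column, giving n - 1 dummies for n categories.
    """
    categories = sorted({row[column] for row in rows})
    if drop_first:
        categories = categories[1:]
    result = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        # One 0/1 indicator per remaining category.
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        result.append(new_row)
    return result


# Usage with hypothetical toy data:
people = [
    {"age": 39, "sex": "Male"},
    {"age": 50, "sex": "Female"},
    {"age": 38, "sex": "Male"},
]
print(make_dummies(people, "sex"))
# [{'age': 39, 'sex_Male': 1}, {'age': 50, 'sex_Male': 0}, {'age': 38, 'sex_Male': 1}]
```

Dropping one category avoids a set of dummies that always sums to 1, which would be perfectly collinear with a model's intercept.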
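For step 2, one way to decide which categories are "similar" is to compare each category's share of the positive target class and merge categories whose rounded shares match. The sketch below assumes that definition, along with hypothetical names (`group_by_target_rate`, `apply_grouping`, a column `edu`); it is not necessarily how the original project grouped categories. The grouped column can then be turned into dummies as in step 1.

```python
from collections import defaultdict


def group_by_target_rate(rows, column, target, positive, digits=1):
    """Map each category of `column` to a group label.

    Categories whose share of rows with target == positive rounds to the
    same value (at `digits` decimals) are put in the same group.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for row in rows:
        counts[row[column]][0] += int(row[target] == positive)
        counts[row[column]][1] += 1
    mapping = {}
    for category, (positives, total) in counts.items():
        rate = round(positives / total, digits)
        mapping[category] = f"{column}_rate_{rate}"
    return mapping


def apply_grouping(rows, column, mapping):
    """Return new rows with `column` replaced by its group label."""
    return [{**row, column: mapping[row[column]]} for row in rows]


# Usage with hypothetical toy data:
rows = [
    {"edu": "Bachelors", "Y": ">50k"},
    {"edu": "Bachelors", "Y": "<=50k"},
    {"edu": "Masters", "Y": ">50k"},
    {"edu": "Masters", "Y": "<=50k"},
    {"edu": "HS-grad", "Y": "<=50k"},
    {"edu": "HS-grad", "Y": "<=50k"},
]
mapping = group_by_target_rate(rows, "edu", "Y", ">50k")
print(mapping)
# {'Bachelors': 'edu_rate_0.5', 'Masters': 'edu_rate_0.5', 'HS-grad': 'edu_rate_0.0'}
```

Because this grouping looks at the target, it is safer to compute the mapping on training data only, so that information from the test data does not leak into the features.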
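Step 3 is typically applied to numeric columns dominated by one special value, for example a column that is zero for most people. The minimal sketch below uses a hypothetical column name (`capital_gain`) and a hypothetical helper (`add_flag`); it adds a 0/1 flag that is 1 whenever the value differs from the special value, and keeps the original numeric column.

```python
def add_flag(rows, column, special_value=0):
    """Add a 0/1 column `<column>_flag` that is 1 when the value differs
    from `special_value`. The original numeric column is kept."""
    return [
        {**row, f"{column}_flag": int(row[column] != special_value)}
        for row in rows
    ]


# Usage with hypothetical toy data:
rows = [{"capital_gain": 0}, {"capital_gain": 2174}, {"capital_gain": 0}]
print(add_flag(rows, "capital_gain"))
# [{'capital_gain': 0, 'capital_gain_flag': 0}, {'capital_gain': 2174, 'capital_gain_flag': 1}, {'capital_gain': 0, 'capital_gain_flag': 0}]
```

Keeping both columns lets a model use the flag for "has any value at all" and the numeric column for "how much".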
DATA DICTIONARY
- census_income.csv is a CSV file containing 32,561 observations and 15 variables.
- It describes the income ranges of a population with their characteristic attributes.
- Each person's income falls into one of two ranges, >50k or <=50k, stored in the target variable Y.
- The remaining 14 variables need to be prepared so that they are useful for further model building and analysis.
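Converting the target Y (step 4) can be sketched as mapping the two income ranges to 1 and 0. This minimal sketch assumes Y holds strings such as `>50k` and `<=50k`, possibly with stray whitespace or different capitalization; it maps >50k to 1 and <=50k to 0, and fails loudly on any other value. The names `encode_target` and `TARGET_CODES` are hypothetical.

```python
# Hypothetical encoding: >50k -> 1, <=50k -> 0 (keys are upper-cased).
TARGET_CODES = {">50K": 1, "<=50K": 0}


def encode_target(rows, target="Y"):
    """Return new rows with the target replaced by 1 (>50k) or 0 (<=50k)."""
    result = []
    for row in rows:
        # Normalize stray whitespace and capitalization before lookup.
        label = str(row[target]).strip().upper()
        if label not in TARGET_CODES:
            raise ValueError(f"unexpected target value: {row[target]!r}")
        result.append({**row, target: TARGET_CODES[label]})
    return result


# Usage with hypothetical toy data:
print(encode_target([{"Y": " >50k"}, {"Y": "<=50K"}]))
# [{'Y': 1}, {'Y': 0}]
```

Raising on unknown labels surfaces data-entry problems early instead of silently coding them as 0.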