By: Karthik Janar in data-science Tutorials on 2018-05-22
While doing data analysis, it is highly recommended to use proper naming conventions for files, variables and especially column names. This is very important for two reasons:
- When merging multiple datasets, if the column names are consistent, the merge happens seamlessly
- As in any programming language, clean names without spaces or special characters are easier to type and less error-prone to reference in code
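To make the first point concrete, here is a minimal sketch of how merge() behaves with and without a consistently named key column. The data frames pay and dept and the "Emp Code" spelling are hypothetical, made up only for this illustration.

```r
# Hypothetical example data: two small tables that share an employee code
pay  <- data.frame(Code = c(20001, 20002), Salary = c(10000, 15000))
dept <- data.frame(Code = c(20001, 20002), Dept = c("Stats", "Genomics"))

# Consistent names: merge() joins on the shared column "Code"
merged <- merge(pay, dept)
print(merged)

# Inconsistent names: rename the key in one table only
dept_renamed <- dept
names(dept_renamed)[1] <- "Emp Code"

# With no column name in common, merge() falls back to a Cartesian product,
# pairing every row of pay with every row of dept_renamed
crossed <- merge(pay, dept_renamed)
print(nrow(crossed))
```

With matching names the result has one row per employee; with mismatched names every row is paired with every other row, which is rarely what was intended.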
The make.names() function in R takes care of the second point: it converts arbitrary strings into syntactically valid names. To demonstrate make.names(), let us use a simple data frame.
Create a simple employee data frame from four vectors, each holding four values.
vcode <- c(20001, 20002, 20003, 20004)
vFname <- c("Brian", "Jeff", "Roger", "Karthik")
vLname <- c("Caffo", "Leek", "Peng", "Janar")
vSal <- c(10000, 15000, 18000, 20000)
emp <- data.frame(vcode, vFname, vLname, vSal)
str(emp)
## 'data.frame':    4 obs. of  4 variables:
##  $ vcode : num  20001 20002 20003 20004
##  $ vFname: Factor w/ 4 levels "Brian","Jeff",..: 1 2 4 3
##  $ vLname: Factor w/ 4 levels "Caffo","Janar",..: 1 3 4 2
##  $ vSal  : num  10000 15000 18000 20000
As you can see, str() shows that the columns are named after the vectors we created earlier. So let us first assign some new column names, as below. We have deliberately included spaces and brackets to show how make.names() converts them.
names(emp) <- c("Code", "First Name", "Last Name", "Salary(SGD)")
str(emp)
## 'data.frame':    4 obs. of  4 variables:
##  $ Code       : num  20001 20002 20003 20004
##  $ First Name : Factor w/ 4 levels "Brian","Jeff",..: 1 2 4 3
##  $ Last Name  : Factor w/ 4 levels "Caffo","Janar",..: 1 3 4 2
##  $ Salary(SGD): num  10000 15000 18000 20000
Now let us call make.names() to clean the column names.
names(emp) <- make.names(names(emp))
emp
##    Code First.Name Last.Name Salary.SGD.
## 1 20001      Brian     Caffo       10000
## 2 20002       Jeff      Leek       15000
## 3 20003      Roger      Peng       18000
## 4 20004    Karthik     Janar       20000
The spaces and brackets have now been replaced with dots, and the names look much cleaner. Make it a habit to clean the column names of every data frame you read from a file, as the first step of data cleaning.
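Beyond spaces and brackets, make.names() also handles a few other cases that come up with real files. The sketch below is only an illustration on a hypothetical vector of raw names (raw is not part of the employee example above); it shows the default behaviour next to the unique = TRUE argument, which is useful when cleaning produces duplicate names.

```r
# Hypothetical raw column names, chosen to trigger each rule
raw <- c("First Name", "First.Name", "Salary(SGD)", "2018 total", "if")

# Default: invalid characters become dots, a name starting with a digit
# gets an "X" prefix, and reserved words get a trailing dot.
# Duplicates are NOT resolved.
clean <- make.names(raw)

# unique = TRUE additionally appends a numeric suffix to repeated names
clean_unique <- make.names(raw, unique = TRUE)

print(clean)
print(clean_unique)
```

In the default result the first two entries collapse to the same name, First.Name, which would make the columns ambiguous; with unique = TRUE the second one becomes First.Name.1.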