---
title:  'Tutorial 9: Classification'
output: html_notebook
editor_options: 
  chunk_output_type: inline
---

This file is part of the lecture Business Intelligence & Analytics (EESYS-BIA-M), Information Systems and Energy Efficient Systems, University of Bamberg.


```{r Load libraries}
library(FSelector)    # feature selection
library(party)        # decision trees (ctree)
library(class)        # k-nearest neighbors (knn)
library(e1071)        # support vector machines (svm)
library(randomForest) # random forests
```



```{r Load and prepare data}
# Load data
load("../data/classification.RData")

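# Derive and inspect the dependent variable "number of residents".
# The top category "5 oder mehr" (German for "5 or more") is recoded to 5
# so that the counts can be converted to integers.
# as.character() guards against the columns being stored as factors.
adults <- as.integer(ifelse(customers$residents.numAdult == "5 oder mehr",
                            "5", as.character(customers$residents.numAdult)))
children <- as.integer(ifelse(customers$residents.numChildren == "5 oder mehr",
                              "5", as.character(customers$residents.numChildren)))

# Distribution of household sizes; a missing number of children counts as zero
table(ifelse(is.na(children), adults, adults + children))
# Households with more than 5 residents are very rare, so the counts are
# grouped into a small number of classes below.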

# Map each household size to a class label; 0 and missing values become NA
customers$pNumResidents <- sapply(
  ifelse(is.na(children), adults, adults + children),
  function(a) {
    if (is.na(a) || a == 0) {
      return(NA)
    } else if (a == 1) {
      return("1 person")
    } else if (a == 2) {
      return("2 persons")
    } else if (a <= 5) {
      return("3-5 persons")
    } else {
      return(">5 persons")
    }
  })

# Store the classes as an ordered factor so their natural order is preserved
customers$pNumResidents <- ordered(customers$pNumResidents,
                                   levels = c("1 person", "2 persons",
                                              "3-5 persons", ">5 persons"))
table(customers$pNumResidents)
```
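
The grouping above can also be written with base R's `cut()`, which bins a numeric vector into intervals in one call. The chunk below is only a sketch of that alternative, not part of the tutorial's solution: the vector `residents` is made-up toy data rather than values from `customers`, and the break points are an assumption chosen to reproduce the four classes used above.

```{r Binning sketch}
# Hypothetical toy household sizes (not taken from the customers data)
residents <- c(0, 1, 2, 3, 5, 6, 10, NA)

# cut() assigns each value to a right-closed interval (lower, upper];
# values outside all intervals (here: 0) and missing values become NA
binned <- cut(residents,
              breaks = c(0, 1, 2, 5, Inf),
              labels = c("1 person", "2 persons", "3-5 persons", ">5 persons"),
              ordered_result = TRUE)

table(binned, useNA = "ifany")
```

With `ordered_result = TRUE` the result is already an ordered factor, so no separate call to `ordered()` is needed in this variant.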