---
title:  'Tutorial 9: Classification'
output: html_notebook
editor_options: 
  chunk_output_type: inline
---

This file is part of the lecture Business Intelligence & Analytics (EESYS-BIA-M), Information Systems and Energy Efficient Systems, University of Bamberg.


```{r Load libraries}
library(FSelector) #for feature selection / you need Java installed to load this package
library(party) #for classification algorithm decision trees
library(class) #for classification algorithm kNN
library(e1071) #for classification algorithm SVM
library(randomForest) #further random forest
```



```{r Load and prepare data}
# Load data

# Derive and investigate the dependent variable "number of residents"

```
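
The chunk above is intentionally left open for the exercise. Below is a minimal sketch of one possible solution; the file names `smd.csv` and `survey.csv` and the column `pers` are assumptions, and the later chunks only rely on `smd` holding one weekly load trace of 672 quarter-hourly readings per household (row) and on a class label derived from the number of residents.

```{r Load and prepare data sketch, eval=FALSE}
# Hypothetical file and column names -- adapt them to the material handed out
smd    <- read.csv("smd.csv")    # one row per household, 672 quarter-hourly values
survey <- read.csv("survey.csv") # household survey incl. number of residents (pers)

# Derive the dependent variable, e.g. a binary class "small" vs. "large" household
# (the cut-off at 2 residents is an assumption for this sketch)
residents <- factor(ifelse(survey$pers <= 2, "small", "large"))

# Investigate the class distribution
table(residents)
barplot(table(residents), main = "Households per class")
```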

```{r Detailed analysis of the independent variables}
# Descriptive analysis of load traces -------------------------------------
# Plot some load curves from households to get familiar with the data

household <- 8

```
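
As a starting point, here is a sketch of one possible plot, assuming `smd` is already loaded with one week of 672 quarter-hourly readings per row (see the previous chunk):

```{r Plot load curves sketch, eval=FALSE}
# Weekly load trace of the selected household
plot(as.numeric(smd[household, ]), type = "l",
     xlab = "15-min interval of the week", ylab = "Consumption",
     main = paste("Load trace of household", household))
abline(v = seq(96, 672, by = 96), lty = 2, col = "grey") # day boundaries
```

Plotting a few different households next to each other already hints at which daily patterns might help to separate small from large households.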


```{r Feature extraction}
# Define and implement 10 features from SMD (e.g. mean consumption, mean 
# consumption in the evening)

calcFeatures.smd <- function(SMD){
  #SMD: the load trace for one week (vector with 672 elements)
  
  #arrange the trace as a matrix with one column per day (96 quarter-hourly values each)
  dm15 <- matrix(as.numeric(SMD), ncol = 7)

  # define some times
  weekday <-   1:(5*4*24)
  weekend <-   (5*4*24+1):(7*4*24) #assumed: days 6-7 (Saturday/Sunday) in the flattened week
  #assumed daily time windows (quarter-hour slots within a day):
  night <-     1:(6*4)             #00:00-06:00
  morning <-   (6*4+1):(10*4)      #06:00-10:00
  noon <-      (10*4+1):(14*4)     #10:00-14:00
  afternoon <- (14*4+1):(18*4)     #14:00-18:00
  evening <-   (18*4+1):(24*4)     #18:00-24:00
  
  #data.frame for the results
  D=data.frame(c_week=mean(dm15, na.rm = T))
  
  #calculate consumption features
  D$c_night <-     mean(dm15[night,     1:7], na.rm = T)
  D$c_morning <-   mean(dm15[morning,   1:7], na.rm = T)
  D$c_noon <-      mean(dm15[noon,      1:7], na.rm = T)
  D$c_afternoon <- mean(dm15[afternoon, 1:7], na.rm = T)
  D$c_evening <-   mean(dm15[evening,   1:7], na.rm = T)
  
  #calculate statistical features
  D$s_we_max <- max(dm15[weekend], na.rm = T) #weekday/weekend index into the flattened matrix
  D$s_we_min <- min(dm15[weekend], na.rm = T)
  D$s_wd_max <- max(dm15[weekday], na.rm = T)
  D$s_wd_min <- min(dm15[weekday], na.rm = T)
  
  #calculate relations
  D$r_min_wd_we <- D$s_wd_min / D$s_we_min #division by 0 leads to NaN!
  D$r_min_wd_we <- ifelse(is.na(D$r_min_wd_we), 0, D$r_min_wd_we)
  D$r_max_wd_we <- D$s_wd_max / D$s_we_max #division by 0 leads to NaN!
  D$r_max_wd_we <- ifelse(is.na(D$r_max_wd_we), 0, D$r_max_wd_we)
  
  return(D)
}

#calculate the features for one household
calcFeatures.smd(smd[2,])

#calculate the features for all households
features <- calcFeatures.smd(smd[1,])
for(i in 2:nrow(smd)){
  features <- rbind(features, calcFeatures.smd(smd[i,]))
}
```
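
The `rbind` loop above works, but it grows the data frame row by row. An equivalent and slightly more idiomatic alternative that produces the same result is:

```{r Feature extraction alternative, eval=FALSE}
# Build the feature matrix in one pass over all households
features <- do.call(rbind, lapply(1:nrow(smd), function(i) calcFeatures.smd(smd[i, ])))
```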


```{r Feature selection}
# Feature filtering  -------------------------------------------------------
# Combine all features in one data frame and apply feature selection methods from the FSelector package. 
# a) Which features are selected? 
# b) Can you explain why those features might be selected?

#combine all datasets


#simple call of the feature selection function


#correlation based filter (2 similar ways to call the method)

#further feature filter

```
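
A sketch of how the FSelector calls could look, assuming the extracted `features` and the class label `residents` from the earlier chunks:

```{r Feature selection sketch, eval=FALSE}
# Combine features and class label in one data frame
dataset <- cbind(features, residents)

# Correlation-based feature selection (CFS) returns a subset of feature names
cfs(residents ~ ., dataset)

# Filter by information gain and keep the k best features
weights <- information.gain(residents ~ ., dataset)
cutoff.k(weights, k = 5)

# A further filter that is called the same way, e.g. chi-squared
cutoff.k(chi.squared(residents ~ ., dataset), k = 5)
```

Note that `cfs()` directly returns a feature subset, while the weight-based filters return a ranking that `cutoff.k()` turns into a selection.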


```{r Classification Basic evaluation approach}

## decision tree

#train the model

#predict test cases

#create confusion matrix and calculate accuracy

## random forest

#train the model

#predict test cases

#create confusion matrix and calculate accuracy

## kNN

# predict test cases from training data (lazy learning algorithm has no explicit training step!)

#create confusion matrix and calculate accuracy

## SVM

#train the model

#predict the test cases

#create confusion matrix and calculate accuracy
```
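
Below is a sketch of the whole evaluation loop, assuming `dataset` combines the selected features with the class label `residents`; the split ratio, `k`, and the default hyperparameters are arbitrary choices for illustration, not tuned values.

```{r Classification sketch, eval=FALSE}
# Simple hold-out split into training and test set
set.seed(1)
idx   <- sample(nrow(dataset), size = round(0.7 * nrow(dataset)))
train <- dataset[idx, ]
test  <- dataset[-idx, ]
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

## decision tree (party)
dt      <- ctree(residents ~ ., data = train)
pred.dt <- predict(dt, newdata = test)
(cm.dt  <- table(actual = test$residents, predicted = pred.dt))
accuracy(cm.dt)

## random forest
rf      <- randomForest(residents ~ ., data = train)
pred.rf <- predict(rf, newdata = test)
(cm.rf  <- table(actual = test$residents, predicted = pred.rf))
accuracy(cm.rf)

## kNN (lazy learner: predictions come directly from the training data)
feat.cols <- setdiff(names(dataset), "residents")
pred.knn  <- knn(train[, feat.cols], test[, feat.cols], cl = train$residents, k = 5)
(cm.knn   <- table(actual = test$residents, predicted = pred.knn))
accuracy(cm.knn)

## SVM (e1071)
sv       <- svm(residents ~ ., data = train)
pred.svm <- predict(sv, newdata = test)
(cm.svm  <- table(actual = test$residents, predicted = pred.svm))
accuracy(cm.svm)
```

The same confusion-matrix/accuracy pattern is reused for every classifier, so all four models can be compared on the identical hold-out split.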