Commit 95f5574f authored by Hopf, Konstantin

New directory structure for module relaunch WS2019/20

parent 1c9d281c
---
title: "Clustering with smart meter data"
output: html_notebook
editor_options:
  chunk_output_type: inline
---
```{r load and format data}
# load and prepare the data ----
SMD_Readings <- read.csv2("../data/SMD_BIA_Data.csv")
# we use the data for a single week
time_stamps <- as.numeric(as.character(unique(SMD_Readings$Timestamp)))
single_week <- as.character(time_stamps[time_stamps>25000 & time_stamps<25700])
length(single_week)
# select the measurements for one single week
SMD_Readings <- SMD_Readings[SMD_Readings$Timestamp %in% single_week, ]
# the reshape2 package helps us to transform the data into a more condensed table form
#install.packages("reshape2")
library("reshape2")
Formatted_Readings <- dcast(SMD_Readings, ID ~ Timestamp, value.var = "Consumption")
head(Formatted_Readings)
# remove the ID column from the table
IDs <- Formatted_Readings[,1]
Formatted_Readings[,1] <- NULL
```
```{r first k-Means clustering}
# simple k-means clustering ----
set.seed(1)
Cluster1 <- kmeans(Formatted_Readings, centers=3)
```
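k-means starts from randomly chosen centers, so a single run can end in a poor local optimum. The following is a minimal sketch, on synthetic data rather than the smart meter readings, of how the `nstart` argument of `kmeans()` repeats the random initialization and keeps the best run. The matrix `toy` and the choice of 25 restarts are assumptions made for illustration only.

```r
# Synthetic data (an assumption for illustration): three groups of 2-D points
set.seed(1)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2)
)

# one random initialization
set.seed(1)
single_start <- kmeans(toy, centers = 3)

# 25 random initializations; kmeans() returns the run with the lowest
# total within-cluster sum of squares
set.seed(1)
multi_start <- kmeans(toy, centers = 3, nstart = 25)

# compare the objective value of both solutions
c(single = single_start$tot.withinss, multi = multi_start$tot.withinss)
table(multi_start$cluster)
```

Whether restarts change the result depends on the data; on real load profiles with many dimensions the difference is often larger than on this toy example.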
```{r function to plot the results}
# Creating cluster visualization ----
# plot the mean load curve of each cluster for the first 48 readings
# (one day, assuming half-hourly measurements)
plotcl <- function(SMD_Readings, clusters, lim=c(0,2), ...){
  plot(colMeans(SMD_Readings[,1:48]), type="n", ylim=lim, ...)
  for(i in unique(clusters)){
    lines(colMeans(SMD_Readings[clusters==i,1:48]), col=i, lwd=3)
  }
}
# plot the mean load curve of each cluster for all readings (the whole week)
plotclWeek <- function(SMD_Readings, clusters, lim=c(0,2), ...){
  plot(colMeans(SMD_Readings[,]), type="n", ylim=lim, ...)
  for(i in unique(clusters)){
    lines(colMeans(SMD_Readings[clusters==i,]), col=i, lwd=3)
  }
}
plotcl(Formatted_Readings, Cluster1$cluster, ylab="Consumption (kWh)")
plotclWeek(Formatted_Readings, Cluster1$cluster, ylab="Consumption (kWh)")
```
```{r second k-Means clustering}
# Improve the clustering ----
# normalize the values (divide each row by its mean) and then run k-means again
set.seed(2)
Data_Norm <- Formatted_Readings/rowMeans(Formatted_Readings)
Cluster2 <- kmeans(Data_Norm, centers=3)
plotcl(Formatted_Readings, Cluster2$cluster, ylab="Consumption (kWh)", main="Mean-normalized consumption", lim=c(0,1.2))
# transform the values to a more normally distributed form and run k-means again
Datas <- sqrt(Formatted_Readings)
Datasn <- Datas/rowMeans(Datas)
set.seed(4)
Cluster3 <- kmeans(Datasn, centers=3)
plotcl(Formatted_Readings, Cluster3$cluster, ylab="Consumption (kWh)", main="Sqrt and mean-normalized consumption", lim=c(0,1.2))
```
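The chunk above divides each household's readings by the row mean (`rowMeans`). The following is a minimal sketch, on a made-up matrix, of how this mean normalization differs from dividing by the row maximum, and why either one lets k-means group households by the shape of their load curve rather than by its level. The matrix `profiles` is an assumption for illustration.

```r
# Toy load profiles (an assumption for illustration): 3 households, 4 readings
profiles <- matrix(c( 1,  2,  3,  2,
                     10, 20, 30, 20,
                      5,  5,  1,  1),
                   nrow = 3, byrow = TRUE)

# mean normalization: divide each row by its mean, so every row has mean 1
mean_norm <- profiles / rowMeans(profiles)

# max normalization: divide each row by its maximum, so every row peaks at 1
max_norm <- profiles / apply(profiles, 1, max)

rowMeans(mean_norm)      # all 1
apply(max_norm, 1, max)  # all 1

# households 1 and 2 have the same shape at different scales,
# so their normalized profiles coincide
all.equal(mean_norm[1, ], mean_norm[2, ])
```

Note that a row containing only zeros would produce `NaN` values under both normalizations, so such rows would have to be handled before clustering.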
```{r obtain the optimal number of clusters}
set.seed(2)
Clusters <- list()
Clusters[[1]] <- kmeans(Datasn, centers=2)
Clusters[[2]] <- kmeans(Datasn, centers=3)
Clusters[[3]] <- kmeans(Datasn, centers=4)
Clusters[[4]] <- kmeans(Datasn, centers=5)
Clusters[[5]] <- kmeans(Datasn, centers=6)
Clusters[[6]] <- kmeans(Datasn, centers=7)
Clusters[[7]] <- kmeans(Datasn, centers=8)
Clusters[[8]] <- kmeans(Datasn, centers=9)
# the total within-cluster sum of squares
tot.withinss <- sapply(Clusters, function(v){return(v$tot.withinss)})
plot(2:9, tot.withinss, xlab="Num. of clusters", ylab="Total within-cluster sum of squares", type="b")
#the min / max sum of squares in the clusters
min.withinss <- sapply(Clusters, function(v){return(min(v$withinss))})
max.withinss <- sapply(Clusters, function(v){return(max(v$withinss))})
plot(2:9, max.withinss, xlab="Num. of clusters",
     ylab="Within clusters sum of squares", type="b", ylim=c(min(min.withinss), max(max.withinss)))
lines(2:9, min.withinss, type="b", col=2)
legend("topright", c("Max. WSS", "Min. WSS"), col=c(1,2), lty=1)
numOneElemClusters <- sapply(Clusters, function(v){return(sum(v$size==1))})
barplot(numOneElemClusters, names.arg = 2:9, main="Single-element clusters",
        xlab="Total number of k-Means clusters")
```
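The elbow analysis above can also be sketched in a compact form. The example below uses synthetic data with three well-separated groups, so the expected bend of the curve is known in advance. The data set `toy`, the vector `ks`, and the use of `nstart = 20` are assumptions for illustration and not part of the original script.

```r
# Synthetic data (an assumption for illustration): three well-separated groups
set.seed(2)
toy <- rbind(
  matrix(rnorm(200, mean = 0, sd = 0.5), ncol = 4),
  matrix(rnorm(200, mean = 3, sd = 0.5), ncol = 4),
  matrix(rnorm(200, mean = 6, sd = 0.5), ncol = 4)
)

ks <- 2:9
# run k-means once per candidate k; nstart = 20 reduces the influence of
# the random initialization on each result
fits <- lapply(ks, function(k) kmeans(toy, centers = k, nstart = 20))

# total within-cluster sum of squares for each k
wss <- sapply(fits, function(fit) fit$tot.withinss)

# the curve typically flattens once k reaches the number of real groups
plot(ks, wss, type = "b", xlab = "Num. of clusters",
     ylab = "Total within-cluster sum of squares")
```

On real consumption data the bend is usually less pronounced, which is why the notebook also looks at the minimum and maximum within-cluster sums of squares and at single-element clusters.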
```{r hierarchical clustering}
library(cluster)
# create a correlation-based dissimilarity matrix
C <- 1-cor(t(Formatted_Readings))
Dendrogram <- agnes(C, diss=TRUE, method="complete")
plot(Dendrogram, which.plots=2) # plot the dendrogram
Cluster4 <- cutree(as.hclust(Dendrogram), k=3)
plotcl(Formatted_Readings, Cluster4, main="Hierarchical clustering results", ylab="Consumption (kWh)", lim=c(0,1.3))
```
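The same idea can be sketched without the `cluster` package, using `hclust()` and `cutree()` from base R. The example runs on synthetic load shapes, so that the recovered groups can be compared with the patterns that generated them. The objects `patterns`, `toy`, and `groups` are assumptions for illustration.

```r
# Synthetic load shapes (an assumption for illustration):
# three patterns over 24 time steps, 10 noisy households each
set.seed(3)
t_idx <- 1:24
patterns <- rbind(
  sin(2 * pi * t_idx / 24),   # one peak per day
  cos(2 * pi * t_idx / 24),   # shifted peak
  sin(4 * pi * t_idx / 24)    # two peaks per day
)
toy <- patterns[rep(1:3, each = 10), ] +
  matrix(rnorm(30 * 24, sd = 0.1), nrow = 30)

# correlation-based dissimilarity between households (rows)
d <- as.dist(1 - cor(t(toy)))

# complete-linkage agglomerative clustering, then cut into 3 groups
tree <- hclust(d, method = "complete")
groups <- cutree(tree, k = 3)

# cross-tabulate the recovered groups against the generating pattern
table(pattern = rep(1:3, each = 10), group = groups)
```

Because the dissimilarity is based on correlation, households are grouped by the shape of their load curve regardless of their absolute consumption level.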
This folder contains R files.
# 2018-WS-BIA
# Business Intelligence & Analytics (EESYS-BIA-M)
This repository contains the R code and data for the exercises of the course Business Intelligence & Analytics and is maintained by the Chair of Information Systems and Energy Efficient Systems ([Dr. Konstantin Hopf](mailto:konstantin.hopf@uni-bamberg.de) and [Andreas Weigert](mailto:andreas.weigert@uni-bamberg.de)).
## Course details
For details regarding the course, please visit the [Virtual Campus course page for the winter term 2019/20](https://vc.uni-bamberg.de/course/view.php?id=38311).
## Schedule (tentative)
**Important:** The schedule and the scripts will change during the semester. Please check for updates in this Git repository and in the Virtual Campus course.

| Lecture | Topic | Script location(s) |
| ------- | ----- | ------------------ |
| L01 | Introduction ||
| L02 | Business data sources for analytics ||
| L03 | Talking about data ||
| L04 | Data quality, outliers and public data for analytics ||
| L05 | Space and time in analytics | `Lecture_Scripts/BIA_L05_Geographic_Time_Data.Rmd` |
| L06 | Cluster analysis 1 ||
| L07 | Cluster analysis 2 | `Lecture_Scripts/BIA_L07_Clustering_SMD` |
| L08 | Predictive analytics 1 ||
| L09 | Predictive analytics 2 ||
| L10 | Predictive analytics 3 ||
| L11 | Analytics for decision support ||
| L12 | Optimization and simulation ||
| L13 | Dashboards and BI solutions ||
| L14 | Legal and ethical issues of data analytics ||
| L15 | Q&A session ||

| Tutorial | Topic | Script location(s) |
| --------- | ----- | ------------------ |
| T01 | Introduction to R 1 | `Tutorial_Scripts/RIntro/` |
| T02 | Introduction to R 2 | `Tutorial_Scripts/RIntro/` |
| T03 | Introduction to R 3 | `Tutorial_Scripts/RIntro/` |
| T04 | Business Analytics in Organizations ||
| T05 | Research perspectives on BI & A in organizations ||
| T06 | Case study: Newsletter responses 1 | `Tutorial_Scripts/Case_NewsletterResponses/` |
| T07 | Case study: Newsletter responses 2 | `Tutorial_Scripts/Case_NewsletterResponses/` |
| T08 | Case study: Electric Vehicle analysis 1 | `Tutorial_Scripts/Case_ElectricVehicles/` |
| T09 | Case study: Electric Vehicle analysis 2 | `Tutorial_Scripts/Case_ElectricVehicles/` |
| T10 | Case study: Prediction for Energy Retailing 1 | `Tutorial_Scripts/Case_EnergyRetailAnalytics/` |
| T11 | Case study: Prediction for Energy Retailing 2 | `Tutorial_Scripts/Case_EnergyRetailAnalytics/` |
| T12 | Case study: Prediction for Energy Retailing 3 | `Tutorial_Scripts/Case_EnergyRetailAnalytics/` |
| T13 | Optimization and Decision Support | `Tutorial_Scripts/Optimization_DecisionSupport/` |
## Additional resources
| Script location | Content |
| --------------- | ------- |
| `Tutorial_Scripts/Optimization_DecisionSupport/` | Additional exercises regarding data preparation, outlier detection, and data understanding |