Commit ff375191 authored by Tobias Weiss

- moved code to new repo

parent acdcc9a2
Copyright 2021 - present Jacqueline Wastensteiner, Tobias Weiss, Felix Haag, Konstantin Hopf
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# household-classification-explainable-ai
Source code for a novel feedback element for electricity consumption data, based on 30-minute smart meter readings and explainable AI (LIME, SHAP) methods.
## Structure of this project
```bash
├── classifiers
│ ├── dummy_classifier.py => Class for a dummy classifier that provides a baseline accuracy
│ ├── inception.py => Class for training a single Inception network
│ ├── nne.py => Class for creating an ensemble learner out of n Inception networks
│ ├── random_forrest.py => Class for creating a random forest, including feature extraction
├── explanations_lime => contains the explanations created with LIME for all household properties as .html files
├── explanations_shap => contains the explanations created with SHAP for all household properties as .html files
├── plots
│ ├── stability => contains plots indicating the distribution of the relevant measurements estimated by SHAP for all properties
│ ├── feedback_elements => contains plots used within the feedback elements
├── results
│ ├── inception => classification results of InceptionTime for all household properties
│ ├── perturbation => results of the perturbation analysis as raw data
│ ├── random_forrest => classification results of the random forest
│ ├── stability => results of the stability analysis as raw data
├── utils
│ ├── constants.py => file for managing folder paths, household properties, and the number of weeks used for training the classifiers
│ ├── data_preparation.py => file containing functions for the data preparation of individual household properties
│ ├── utils.py => file containing helper functions
├── xai
│ ├── xai_comparison.py => file for conducting the stability and the perturbation analysis
│ ├── xai_lime.py => class containing the lime_explainer, which generates the LIME explanations
│ ├── xai_plots.py => file for creating the four different plots used within the feedback elements
│ ├── xai_shap.py => class containing the shap_explainer, which generates the SHAP explanations
├── .gitignore => gitignore for excluding files from version control
└── main.py => entry point from which all actions of this project can be started using system-specific parameters
```
## Setup
* Python 3.6 has to be installed on your system
* Set the root directory pointing to this directory as an absolute path in utils/constants.py
* Define the desired household properties for classification and explanation in utils/constants.py
* Define the desired number of weeks used for training in utils/constants.py (see the sketch of utils/constants.py after this list)
* Ensure that a function for preparing the data for all defined household properties exists in utils/data_preparation.py (in the function prepare_data())
* Install all packages from requirements.txt with `pip install -r requirements.txt` or `pip3 install -r requirements.txt`, depending on your configuration.
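
A minimal sketch of what utils/constants.py might look like. Only `ROOT_DIRECTORY` is confirmed by the imports in this repository; the names of the other constants are illustrative assumptions:

```python
# utils/constants.py -- minimal sketch; only ROOT_DIRECTORY is confirmed by the
# imports in this repo, the other constant names are illustrative assumptions

# Absolute path to the root of this project (note the trailing slash, since
# file paths are built by string concatenation)
ROOT_DIRECTORY = "/home/user/household-classification-explainable-ai/"

# Household properties to classify and explain (placeholder names)
HOUSEHOLD_PROPERTIES = ["single", "pension", "employment"]

# Number of weeks of smart meter readings used for training
NB_WEEKS = 4
```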
## Data
The data used in this project comes from the [UCR/UEA archive](http://timeseriesclassification.com/TSC.zip).
We used the 85 datasets listed [here](https://www.cs.ucr.edu/~eamonn/time_series_data/). The data is not included within this repository.
import pandas as pd
from utils.constants import ROOT_DIRECTORY
from sklearn.dummy import DummyClassifier


class DummyClf:
    def __init__(self, property_name):
        self.property_name = property_name
        self.labels = list()
        self.data_frame = pd.DataFrame()

    def classify(self):
        # Load the prepared, normalized test split; the first column holds the labels
        data = pd.read_csv(ROOT_DIRECTORY + "data/prepared_data_by_properties/normalized/"
                           + self.property_name + '/' + self.property_name + '_test', delimiter=',')
        self.labels = data.iloc[:, 0]
        self.data_frame = data.iloc[:, 1:]
        # Stratified dummy classifier: predicts labels by sampling the class distribution
        dummy_clf = DummyClassifier(strategy="stratified")
        dummy_clf.fit(self.data_frame, self.labels)
        # Report the baseline accuracy on the same split
        print(dummy_clf.score(self.data_frame, self.labels))
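A minimal usage sketch; 'single' is a placeholder for one of the household properties configured in utils/constants.py:

```python
from classifiers.dummy_classifier import DummyClf

# Print the stratified-baseline accuracy for one household property
DummyClf('single').classify()  # 'single' is a placeholder property name
```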
# InceptionTime model
import keras
import numpy as np
import time
import tensorflow as tf
from utils.utils import save_logs
from utils.utils import calculate_metrics
from utils.utils import save_test_duration
from keras.callbacks import EarlyStopping
class Classifier_INCEPTION:
def __init__(self, output_directory, input_shape, nb_classes, verbose=False, build=True, batch_size=64,
nb_filters=32, use_residual=True, use_bottleneck=True, depth=6, kernel_size=41, nb_epochs=300):
self.output_directory = output_directory
self.nb_filters = nb_filters
self.use_residual = use_residual
self.use_bottleneck = use_bottleneck
self.depth = depth
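        # subtract 1 so that the largest kernel size is even (41 -> 40, yielding kernel sizes 40/20/10)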
self.kernel_size = kernel_size - 1
self.callbacks = None
self.batch_size = batch_size
self.bottleneck_size = 32
self.nb_epochs = nb_epochs
        if build:
            self.model = self.build_model(input_shape, nb_classes)
            if verbose:
self.model.summary()
self.verbose = verbose
self.model.save_weights(self.output_directory + 'model_init.hdf5')
def _inception_module(self, input_tensor, stride=1, activation='linear'):
        # The bottleneck reduces model complexity (and thus improves performance), though it does not by itself lead to higher accuracy
if self.use_bottleneck and int(input_tensor.shape[-1]) > 1:
input_inception = keras.layers.Conv1D(filters=self.bottleneck_size, kernel_size=1,
padding='same', activation=activation, use_bias=False)(input_tensor)
else:
input_inception = input_tensor
        # kernel_size_s = [40, 20, 10]
        # Kernel sizes for the Conv1D layers that are connected to the bottleneck
kernel_size_s = [self.kernel_size // (2 ** i) for i in range(3)]
conv_list = []
for i in range(len(kernel_size_s)):
conv_list.append(keras.layers.Conv1D(filters=self.nb_filters, kernel_size=kernel_size_s[i],
strides=stride, padding='same', activation=activation, use_bias=False)(
input_inception))
max_pool_1 = keras.layers.MaxPool1D(pool_size=3, strides=stride, padding='same')(input_tensor)
conv_6 = keras.layers.Conv1D(filters=self.nb_filters, kernel_size=1,
padding='same', activation=activation, use_bias=False)(max_pool_1)
conv_list.append(conv_6)
x = keras.layers.Concatenate(axis=2)(conv_list)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Activation(activation='relu')(x)
return x
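    # Residual shortcut: a 1x1 convolution aligns the channel count so the input can be added to the block output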
def _shortcut_layer(self, input_tensor, out_tensor):
shortcut_y = keras.layers.Conv1D(filters=int(out_tensor.shape[-1]), kernel_size=1,
padding='same', use_bias=False)(input_tensor)
        shortcut_y = keras.layers.BatchNormalization()(shortcut_y)
x = keras.layers.Add()([shortcut_y, out_tensor])
x = keras.layers.Activation('relu')(x)
return x
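    # Stack `depth` inception modules, adding a residual connection after every third module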
def build_model(self, input_shape, nb_classes):
input_layer = keras.layers.Input(input_shape)
x = input_layer
input_res = input_layer
for d in range(self.depth):
x = self._inception_module(x)
if self.use_residual and d % 3 == 2:
x = self._shortcut_layer(input_res, x)
input_res = x
gap_layer = keras.layers.GlobalAveragePooling1D()(x)
output_layer = keras.layers.Dense(nb_classes, activation='softmax')(gap_layer)
model = keras.models.Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(),
metrics=['accuracy'])
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=50,
min_lr=0.0001)
file_path = self.output_directory + 'best_model.hdf5'
model_checkpoint = keras.callbacks.ModelCheckpoint(filepath=file_path, monitor='loss',
save_best_only=True)
early_stopping = EarlyStopping(monitor='loss', patience=50)
        self.callbacks = [early_stopping, reduce_lr, model_checkpoint]
return model
def fit(self, x_train, y_train, x_val, y_val, y_true, plot_test_acc=False):
        # x_val and y_val are only used to monitor the test loss and NOT for training
        # TF1-style session config that logs on which device (CPU/GPU) each operation is placed
        sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
        # batch_size is the number of samples propagated through the network per step;
        # if none is given, it defaults to min(x_train.shape[0] / 10, 16)
if self.batch_size is None:
mini_batch_size = int(min(x_train.shape[0] / 10, 16))
else:
mini_batch_size = self.batch_size
start_time = time.time()
if plot_test_acc:
hist = self.model.fit(x_train, y_train, batch_size=mini_batch_size, epochs=self.nb_epochs,
verbose=self.verbose, validation_data=(x_val, y_val), callbacks=self.callbacks)
else:
hist = self.model.fit(x_train, y_train, batch_size=mini_batch_size, epochs=self.nb_epochs,
verbose=self.verbose, callbacks=self.callbacks)
duration = time.time() - start_time
self.model.save(self.output_directory + 'last_model.hdf5')
y_pred = self.predict(x_val, y_true, x_train, y_train, y_val,
return_df_metrics=False)
# save predictions
np.save(self.output_directory + 'y_pred.npy', y_pred)
# convert the predicted from binary to integer
y_pred = np.argmax(y_pred, axis=1)
df_metrics = save_logs(self.output_directory, hist, y_pred, y_true, duration,
plot_test_acc=plot_test_acc)
keras.backend.clear_session()
return df_metrics
def predict(self, x_test, y_true, x_train, y_train, y_test, return_df_metrics=True):
start_time = time.time()
model_path = self.output_directory + 'best_model.hdf5'
model = keras.models.load_model(model_path)
y_pred = model.predict(x_test, batch_size=self.batch_size)
if return_df_metrics:
y_pred = np.argmax(y_pred, axis=1)
df_metrics = calculate_metrics(y_true, y_pred, 0.0)
return df_metrics
else:
test_duration = time.time() - start_time
save_test_duration(self.output_directory + 'test_duration.csv', test_duration)
return y_pred
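A minimal usage sketch for Classifier_INCEPTION, assuming one-hot encoded labels and input series of shape (length, 1); the data, shapes, and output directory are illustrative assumptions:

```python
import os
import numpy as np
from keras.utils import to_categorical
from classifiers.inception import Classifier_INCEPTION

# Illustrative data: 100 series of 336 half-hourly readings (one week), 2 classes
x_train = np.random.rand(100, 336, 1)
y_int = np.random.randint(0, 2, size=100)       # integer labels, used for the metrics
y_train = to_categorical(y_int, num_classes=2)  # one-hot labels, used for training
x_val, y_val, y_true = x_train[:20], y_train[:20], y_int[:20]

output_directory = 'results/inception/demo/'    # must exist before construction
os.makedirs(output_directory, exist_ok=True)
clf = Classifier_INCEPTION(output_directory, input_shape=(336, 1), nb_classes=2,
                           nb_epochs=10)
df_metrics = clf.fit(x_train, y_train, x_val, y_val, y_true)
```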
import keras
import numpy as np
from utils.utils import calculate_metrics
from utils.utils import create_directory
from utils.utils import check_if_file_exits
import gc
import time
class Classifier_NNE:
def create_classifier(self, model_name, input_shape, nb_classes, output_directory, verbose=False,
build=True):
if self.check_if_match('inception*', model_name):
from classifiers import inception
return inception.Classifier_INCEPTION(output_directory, input_shape, nb_classes, verbose,
build=build)
def check_if_match(self, rex, name2):
import re
pattern = re.compile(rex)
return pattern.match(name2)
def __init__(self, output_directory, input_shape, nb_classes, verbose=False, nb_iterations=5,
clf_name='inception'):
self.classifiers = [clf_name]
out_add = ''
for cc in self.classifiers:
out_add = out_add + cc + '-'
self.archive_name = 'Ensemble'
self.iterations_to_take = [i for i in range(nb_iterations)]
for cc in self.iterations_to_take:
out_add = out_add + str(cc) + '-'
self.output_directory = output_directory.replace('nne',
'nne' + '/' + out_add)
create_directory(self.output_directory)
self.dataset_name = output_directory.split('/')[-2]
self.verbose = verbose
self.models_dir = output_directory.replace('nne', 'classifier')
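    # Ensemble prediction: average the softmax outputs of the pre-trained member networks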
def fit(self, x_train, y_train, x_test, y_test, y_true):
# no training since models are pre-trained
start_time = time.time()
y_pred = np.zeros(shape=y_test.shape)
ll = 0
# loop through all classifiers
for model_name in self.classifiers:
# loop through different initialization of classifiers
for itr in self.iterations_to_take:
if itr == 0:
itr_str = ''
else:
itr_str = '_itr_' + str(itr)
curr_archive_name = self.archive_name + itr_str
curr_dir = self.models_dir.replace('classifier', model_name).replace(
self.archive_name, curr_archive_name)
model = self.create_classifier(model_name, None, None,
curr_dir, build=False)
predictions_file_name = curr_dir + 'y_pred.npy'
# check if predictions already made
                if check_if_file_exits(predictions_file_name):
                    # then load only the predictions from the file
                    curr_y_pred = np.load(predictions_file_name)
                else:
                    # otherwise compute and save the predictions
curr_y_pred = model.predict(x_test, y_true, x_train, y_train, y_test,
return_df_metrics=False)
keras.backend.clear_session()
np.save(predictions_file_name, curr_y_pred)
y_pred = y_pred + curr_y_pred
ll += 1
# average predictions
y_pred = y_pred / ll
# save predictions
np.save(self.output_directory + 'y_pred.npy', y_pred)
# convert the predicted from binary to integer
y_pred = np.argmax(y_pred, axis=1)
duration = time.time() - start_time
df_metrics = calculate_metrics(y_true, y_pred, duration)
df_metrics.to_csv(self.output_directory + 'df_metrics.csv', index=False)
gc.collect()
    def get_prediction_scores(self, property_name, x_train, y_train, x_test, y_test, y_true, nb_classes, y_true_train):
        # load the best pre-trained model for this property and print its raw prediction scores
        model = keras.models.load_model('./results2/inception/' + property_name + '/best_model.hdf5', compile=False)
        curr_y_pred = model.predict(x_test)
        print(curr_y_pred)
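A minimal usage sketch for the ensemble, assuming five pre-trained InceptionTime runs already exist under the per-iteration directories derived from the output path; the path and arrays are illustrative (reusing the shapes from the sketch above):

```python
from classifiers.nne import Classifier_NNE

x_test, y_test, y_true = x_val, y_val, y_int[:20]  # reuse the arrays from the sketch above

# The output path must contain 'nne': it is rewritten to a per-configuration
# sub-folder, and replacing 'nne' with the model name locates the pre-trained
# members; the second-to-last path segment is the dataset name (illustrative path)
nne = Classifier_NNE(output_directory='results/nne/Ensemble/single/',
                     input_shape=(336, 1), nb_classes=2, nb_iterations=5)
nne.fit(x_train, y_train, x_test, y_test, y_true)
```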