Commit 4ade03c6 authored by Weigert, Andreas

added feature selection solution

parent 8423bf0e
@@ -129,3 +129,32 @@ for(i in 2:nrow(smd)){
features <- rbind(features, calcFeatures.smd(smd[i,]))
}
```
```{r Feature selection}
# Feature filtering -------------------------------------------------------
# Combine all features in one data frame and apply feature selection methods from the FSelector package.
# a) Which features are selected?
# b) Can you explain why those features might be selected?
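# Load the FSelector package (cfs(), consistency(), as.simple.formula())
library(FSelector)
# Combine all data sets column-wise (assumes the rows of customers and features are in the same order)
alldata <- cbind(customers, features)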
# Simple call of the correlation-based feature selection (CFS) function
cfs(pNumResidents ~ ., alldata)
# Problem: other dependent variables (e.g. residents.numAdult) are selected -> only use relevant input features for feature selection!
# Create a vector with all candidate feature names, excluding the ID, the target
# and the other dependent variables
all.features <- setdiff(colnames(alldata), c("VID", "residents.numAdult",
                                             "residents.numChildren", "housing.type",
                                             "pNumResidents"))
# Correlation-based filter (two equivalent ways to call the method)
selected.features <- cfs(formula = pNumResidents ~ .,
                         data = alldata[, c("pNumResidents", all.features)])
selected.features <- cfs(formula = as.simple.formula(class = "pNumResidents", attributes = all.features),
                         data = alldata)
print(selected.features)
# A further filter: consistency-based feature selection
selected.features2 <- consistency(formula = as.simple.formula(class = "pNumResidents",
                                                              attributes = all.features),
                                  data = alldata)
print(selected.features2)
```
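For question b), it may help to see what a CFS-style filter rewards. Correlation-based feature selection scores a subset of k features with a merit that grows with the average feature-class correlation and shrinks with the average feature-feature correlation, so redundant features add little. The chunk below is a minimal base-R sketch under stated assumptions: it uses absolute Pearson correlation only, a simple greedy forward search, and the built-in `mtcars` data (predicting `mpg`) as a stand-in for `alldata`. The helpers `cfs_merit()` and `cfs_forward()` are hypothetical names, not part of FSelector; FSelector's own `cfs()` also handles discrete attributes with entropy-based measures and may search the subset space differently.

```{r CFS merit sketch}
# Hypothetical helper (not FSelector API): CFS merit of a feature subset,
#   merit = k * mean|r_cf| / sqrt(k + k * (k - 1) * mean|r_ff|)
# using absolute Pearson correlations (numeric features only).
cfs_merit <- function(data, class, subset) {
  k <- length(subset)
  # Average absolute correlation between each feature and the class
  r_cf <- mean(abs(cor(data[, subset, drop = FALSE], data[[class]])))
  if (k == 1) return(r_cf)
  # Average absolute correlation over all distinct pairs of features
  r <- abs(cor(data[, subset, drop = FALSE]))
  r_ff <- mean(r[upper.tri(r)])
  k * r_cf / sqrt(k + k * (k - 1) * r_ff)
}

# Hypothetical helper: greedy forward search that keeps adding the feature
# with the best merit until no addition improves the merit any further.
cfs_forward <- function(data, class, candidates) {
  selected <- character(0)
  best <- 0
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    merits <- vapply(remaining,
                     function(f) cfs_merit(data, class, c(selected, f)),
                     numeric(1))
    if (max(merits) <= best) break
    best <- max(merits)
    selected <- c(selected, remaining[which.max(merits)])
  }
  list(features = selected, merit = best)
}

# Stand-in data: predict mpg from the other columns of the built-in mtcars set
result <- cfs_forward(mtcars, "mpg", setdiff(colnames(mtcars), "mpg"))
result$features
```

The first feature picked is simply the one most correlated with the target (`wt` here); after that, candidates that correlate strongly with features already in the subset raise the denominator and therefore gain little merit, which is why a CFS filter tends to return a small, non-redundant set.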
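The consistency filter asks a different question: does a feature subset still separate the classes without contradictions? A common way to score this is the inconsistency rate, where instances that share the same combination of feature values but do not carry that pattern's majority class count as inconsistent. The chunk below is a minimal base-R sketch of that measure on a hypothetical four-row toy data set; `inconsistency_rate()` is an illustrative name, not an FSelector function, and FSelector's `consistency()` additionally searches over subsets and may treat continuous attributes differently. The measure assumes discrete values, so numeric features would first have to be binned.

```{r Consistency measure sketch}
# Hypothetical helper (not FSelector API): inconsistency rate of a feature subset
inconsistency_rate <- function(data, class, subset) {
  # One factor level per distinct combination of values in the subset
  key <- interaction(data[subset], drop = TRUE)
  # Rows: value patterns, columns: classes, cells: instance counts
  counts <- table(key, data[[class]])
  # Instances of a pattern that do not carry its majority class are inconsistent
  sum(rowSums(counts) - apply(counts, 1, max)) / nrow(data)
}

# Hypothetical toy data: feature a alone cannot tell the last two rows apart
toy <- data.frame(a = c(1, 1, 2, 2), b = c(1, 2, 1, 2), y = c("x", "x", "x", "z"))
inconsistency_rate(toy, "y", "a")          # 0.25
inconsistency_rate(toy, "y", c("a", "b"))  # 0
```

Adding `b` to `a` removes the only contradiction (the two `a = 2` rows with different classes), so a consistency-based search looking for a contradiction-free subset would need both features.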