`mosmafs`

offers a variety of tools that make it possible to use the `ecr`

package for multi-objective optimization of mixed parameter spaces. Mixed here means spaces that both include categorical and numeric hyperparameters. The following (a little contrived) example shows how to use these tools.

`mosmafs`

crucially depends on the `ecr`

package (currently the github-version!), so make sure you have it installed and loaded. Other packages for this example should also be loaded.

To visualise what is going on, we will introduce a few deterministic operators that perform operations that can be easily understood and followed. These will all be mutation operators; recombination operators work the same but are more annoying to visualize.

The **flipflop** operator negates all bits of a binary individuum:

```
mutFlipflop <- makeMutator(
mutator = function(ind, ...) {
1 - ind
}, supported = "binary")
mutFlipflop(c(1, 0, 1))
#> [1] 0 1 0
```

The **cycle** operator cycles through the values of a discrete (`character`

) individuum:

```
mutCycle <- makeMutator(
mutator = function(ind, values) {
index <- mapply(match, ind, values)
index <- index %% viapply(values, length) + 1
mapply(`[[`, values, index)
}, supported = "custom")
mutCycle(c("a", "x", "z"), values = list(letters))
#> [1] "b" "y" "a"
```

The **plus** operator adds a parametervalue (default one) to the value of an individuum.

```
mutPlus <- makeMutator(
mutator = function(ind, summand = 1, ...) {
ind + summand
}, supported = "custom")
mutPlus(c(1, 2, 3))
#> [1] 2 3 4
```

The **reverse** operator reverses the order of an individuum:

The `mosmafs`

package comes with a few functions that generate synthetic datasets that can be used for feature selection. The following creates three columns of randomly sampled points in the `[-1, 1]^3`

cube and a target column indicating whether each point is a distance greater than 1 away from the origin. Then ten more copies of that task are added, but with permuted rows—effectively giving 30 noise columns (the piping operator `%>%`

from the `magrittr`

package is used for parsimonity).

To show the holdout prediction functionality, we will retain a part of the task as a “holdout” set.

The heart of this package is the `combine.operators()`

function that can be used to apply different operators to different parts of the search space; in particular, to different types of search space dimensions. It works by accepting a `ParamSet`

(from the `ParamHelpers`

package) that defines the search-space, and operators that define the operators to use on individual parameters.

We may have a `ParamSet`

with a logical parameter `bin1`

, a discrete (but binary) parameter `bin2`

, a discrete vector parameter `disc1`

, and a numeric parameter `num1`

. We define this `ParamSet`

using the `pSS()`

function from the excellent `mlrCPO`

package.

```
ps <- pSS(
bin1: logical,
bin2: discrete[no, yes],
disc1: discrete[letters]^3,
num1: numeric[0, 10])
```

We can now define an operator that uses the `mutFlipflop`

operator on both `bin1`

and `bin2`

, the `mutCycle`

operator on `disc1`

, and the `mutPlus`

operator on `num1`

:

```
combo <- combine.operators(ps,
bin1 = mutFlipflop,
bin2 = mutFlipflop,
disc1 = mutCycle,
num1 = mutPlus)
```

Here we are treating `bin2`

as a binary parameter, so we have to leave the `.binary.discrete.as.logical`

parameter of `combine.operators`

as `TRUE`

. If we wanted to use the `mutCycle`

operation on the `bin2`

operator, we would have to tell `combine.operators()`

to treat it as a discrete parameter:

```
combo2 <- combine.operators(ps,
bin1 = mutFlipflop,
bin2 = mutCycle,
disc1 = mutCycle,
num1 = mutPlus,
.binary.discrete.as.logical = FALSE)
```

Both these operators do essentially the same: They flip / cycle the binary and discrete parameters, and increase the numeric parameter by one.

Especially with larger parameter spaces it may be more desirable to define operations not for each parameter individually, but for groups of parameters. Groups can be declared by setting the special `.params.<GROUPNAME>`

value to a `character`

vector of parameter values to combine into a group. The special groups `numeric`

, `logical`

, `integer`

, and `discrete`

are pre-defined and group all parameters of a type that are not otherwise given an operator.

```
combo.group <- combine.operators(ps,
.params.group1 = c("bin1", "bin2"),
group1 = mutFlipflop,
discrete = mutCycle,
num1 = mutPlus)
```

```
combo.group(list(bin1 = TRUE, bin2 = "no", disc1 = c("a", "x", "z"), num1 = 3))
#> list(bin1 = FALSE, bin2 = "yes", disc1 = c("b", "y", "a"), num1 = 4)
```

When parameters are put in a group, they are passed to the underlying operators as a vector. This may make a difference if operations depend on the global state of a vector and don’t just operate component-wise. This can be shown with the `mutReverse`

operator, when given for a group vs. when given for parameters individually:

```
combo.rev.indiv <- combine.operators(ps,
bin1 = mutReverse,
bin2 = mutReverse,
discrete = mutCycle,
num1 = mutPlus)
combo.rev.group <- combine.operators(ps,
.params.group1 = c("bin1", "bin2"),
group1 = mutReverse,
discrete = mutCycle,
num1 = mutPlus)
```

When applied to `bin1`

and `bin2`

individually, the `mutReverse`

operator does not change the values:

```
combo.rev.indiv(list(bin1 = TRUE, bin2 = "no", disc1 = c("a", "x", "z"), num1 = 3))
#> list(bin1 = TRUE, bin2 = "no", disc1 = c("b", "y", "a"), num1 = 4)
```

However, when they are in a group, the values of `bin1`

and `bin2`

are combined as a vector and given to `mutReverse`

. The `TRUE`

value is given as value `1`

, the `"no"`

value is given as value `0`

. They are swapped, and the return list contains `bin1 = FALSE`

and `bin2 = "yes"`

:

```
combo.rev.group(list(bin1 = TRUE, bin2 = "no", disc1 = c("a", "x", "z"), num1 = 3))
#> list(bin1 = FALSE, bin2 = "yes", disc1 = c("b", "y", "a"), num1 = 4)
```

Because of this, it is **usually a bad idea** to combine parameters of different types into a single group. Discrete parameters (non-binary discrete parameters if `.binary.discrete.as.logical`

is `TRUE`

) can not be grouped together with other types of parameters.

A further extension offered by the `combine.operators()`

function is the ability to use “strategy parameters”. They make it possible to have the effect of an operator be dependent on the individuals involved. To do this, a `.strategy.<PARAMETER/GROUPNAME>`

value must be given, which should be a function that gets the individuum as input, and returns a named list of function parameters as output. This named list is added to the configuration of the operator associated with `<PARAMETER>`

or `<GROUPNAME>`

.

One could, for example, make the `summand`

parameter of the `mutPlus`

operation dependent on the `bin2`

value: If it is `"yes"`

, it adds a value, if it is `"no"`

it subtracts a value.

```
combo.strategy <- combine.operators(ps,
logical = mutFlipflop,
discrete = mutCycle,
numeric = mutPlus,
.strategy.numeric = function(ind) {
if (ind$bin2 == "yes") {
return(list(summand = 1))
} else {
return(list(summand = -1))
}
})
```

```
combo.strategy(list(bin1 = TRUE, bin2 = "yes", disc1 = c("a", "x", "z"), num1 = 3))
#> list(bin1 = FALSE, bin2 = "no", disc1 = c("b", "y", "a"), num1 = 4)
combo.strategy(list(bin1 = TRUE, bin2 = "no", disc1 = c("a", "x", "z"), num1 = 3))
#> list(bin1 = FALSE, bin2 = "yes", disc1 = c("b", "y", "a"), num1 = 2)
```

Note how the value of `ind$bin2`

from *before* the `mutFlipflop`

operation is used: The value that counts is always the input value given to the operator. It is therefore possible to use the values as strategy parameters for their own mutation.

Recombination operators can be combined in the same way that mutation operators can be. The input to a recombination operator is usually a list with two individuals, so the input to the strategy parameter function is also a list with two vectors. This function has to decide whether to construct a strategy depending to one of these individuals, or whether to combine their values, for example averaging them.

When using the `ecr`

package with its vanilla operators, individuals are usually vectors of numbers, for example 0-1-vectors for binary individuals, or `numeric`

values for continuous valued individuals. When using mixed operators, the individuals have to be *named lists*, with names and types corresponding to the `ParamSet`

given to `combine.operators()`

. The supported `Param`

types, and the expected entries in the individual lists, are the following:

Parameter Construction (`ParamHelpers` ) |
`pSS()` sugar |
Type Expected |
---|---|---|

`makeNumericParam` /`makeNumericVectorParam` |
`numeric[, ]` |
`numeric` |

`makeIntegerParam` /`makeIntegerVectorParam` |
`integer[, ]` |
(integer) `numeric` |

`makeLogicalParam` /`makeLogicalVectorParam` |
`logical` |
`logical` |

`makeDiscreteParam` /`makeDiscreteVectorParam` |
`discrete[]` |
`character` |

The only possible surprise here is that individuals must have “`character`

” entries for discrete parameters^{1}. That means, no `factor`

entries, and no lists of characters for `makeDiscreteVectorParam()`

types.

One can use the `sampleValue()`

function from the `ParamHelpers`

package to generate valid, randomly sampled individuals to use with combined operators. The `discrete.names`

option must be set to `TRUE`

for that:

Having a way to use mutation and recombination operators for mixed type individuals is already a big step towards using the `ecr`

package for simultaneous model and feature selection (“SMaFS”). It is still necessary to define the actual objective, for which we need a `Learner`

, a `Task`

, as well as a hyperparameter space to optimize over.

In this example, we choose to optimize the `"classif.rpart"`

learner from the `rpart`

package, on the `task`

defined above.

The parameter set we optimize over is the following

Furthermore, a resampling strategy must be chosen. It can be either a `mlr::ResampleDesc`

object, `mlr::ResampleInst`

object or any function which maps fidelity to the resampling to use. We use the resampling strategy `mlr::cv5`

.

Now we have all the information needed to define an objective function we try to optimize. If we are doing feature selection, the `makeObjective()`

function does this for us; otherwise we would need to define a “`smoof`

” function using `smoof::makeSingleObjectiveFunction()`

or `smoof::makeMultiObjectiveFunction()`

.

The `makeObjective()`

function allows us to supply a holdout dataset on which each model will be evaluated. If a holdout is given, then the resampling is performed as given to the `resampling`

parameter, and, *additionally*, the given `Learner`

is trained on the whole training data and predicted on the holdout set.

Notably the parameter `selector.selection`

was added within `makeObjective()`

to the parameter set, which controls what features are selected. The new parameter set needs to be used for the definition of the mutators and recombinators.

We are using some standard mutation and recombination operators for the parameters: Gaussian and rounded-gaussian mutation for numeric parameters, hamming weight preserving bitflip for `selector.selection`

, and generally uniform crossover mutation. We use the `ecr::setup`

method to set hyperparameters of some mutation operators.

```
mutator.simple <- combine.operators(ps.objective,
numeric = ecr::setup(mutGauss, sdev = 0.1),
integer = ecr::setup(mutGaussInt, sdev = 3),
selector.selection = mutBitflipCHW)
crossover.simple <- combine.operators(ps.objective,
numeric = recPCrossover,
integer = recPCrossover,
selector.selection = recPCrossover)
```

The initial population to sample over can be generated using the `ParamHelpers`

function `sampleValues()`

. Note that by generating an initial population of a certain size (32 here), we implicitly determine the ECR hyperparameter `mu`

of individuals in a generation.

This is all that is necessary, and it can be used to call the `ecr()`

function in the `ecr`

package. The `mosmafs`

package comes with a useful interface that sets some defaults: `slickEcr()`

. The run time, 10 generations, would probably be to short for serious applications, but should be enough for demonstration purposes.

```
run.simple <- slickEcr(
fitness.fun = fitness.fun.simple,
lambda = 16,
population = initials.simple,
mutator = mutator.simple,
recombinator = crossover.simple,
generations = 10)
```

There is a multitude of methods for analysing and plotting the result. The following plots the succession of pareto fronts for each generation:

```
plot_fronts <- function(run) {
fronts <- fitnesses(run, function(x) paretoEdges(x, c(1, 1)))
ggplot(data = fronts, aes(x = perf, y = propfeat, color = ordered(gen))) +
geom_line() +
geom_point(data = fronts[fronts$point, ], shape = "x", size = 5) +
xlim(0, 1) +
ylim(0, 1) +
coord_fixed()
}
plot_fronts(run.simple)
```

It is possible to inspect the runtimes of individuals, they are stored as attribute `"runtime"`

. The aggregated runtime for each generation can also be retrieved from the `log.newinds`

logging object. The runtime of individuals within each generation can also be queried using `popAggregate`

.

```
populations <- getPopulations(run.simple$log)
pop.gen1 <- populations[[1]]
individual.1 <- pop.gen1$population[[1]]
attr(individual.1, "runtime")
#> elapsed
#> 0.226764
individual.2 <- pop.gen1$population[[2]]
attr(individual.2, "runtime")
#> elapsed
#> 0.2080401
getStatistics(run.simple$log.newinds)
#> gen runtime.mean runtime.sum size population
#> 1 0 0.2342009 7.494428 32 init
#> 2 1 0.2428753 3.886004 16 offspring
#> 3 2 0.2245258 3.592412 16 offspring
#> 4 3 0.2196528 3.514444 16 offspring
#> 5 4 0.2620204 4.192326 16 offspring
#> 6 5 0.2373127 3.797003 16 offspring
#> 7 6 0.2400483 3.840772 16 offspring
#> 8 7 0.2393781 3.830050 16 offspring
#> 9 8 0.2106042 3.369667 16 offspring
#> 10 9 0.2444045 3.910472 16 offspring
#> 11 10 0.2334951 3.735921 16 offspring
```

Run statistics about the progress of performance can also be retrieved using the `collectResult()`

function. It creates a large `data.frame`

with many entries. The following plots the progressive average resampling performance, both of the training data and the holdout data.

```
ggplot(collectResult(run.simple), aes(x = evals)) +
geom_line(aes(y = hout.perf.mean, color = "Holdout")) +
geom_line(aes(y = eval.perf.mean, color = "Training")) +
ylim(0, 1)
```

Notice the `"true.hout.domHV"`

column which measures the dominated hypervolume on the holdout sample of those points that make up the paretofront on the training sample. It is a more realistic measure of generalization performance than the `"hout.domHV"`

column, which measures dominated hypervolume on the holdout sample of all points in a generation.

```
ggplot(collectResult(run.simple), aes(x = evals)) +
geom_line(aes(y = true.hout.domHV, color = "Holdout")) +
geom_line(aes(y = eval.domHV, color = "Training")) +
ylim(0, 1)
```

The totality of information provided by `collectResult()`

can be seen from its column names.

```
colnames(collectResult(run.simple))
#> [1] "gen" "runtime" "evals"
#> [4] "eval.perf.min" "eval.perf.mean" "eval.perf.max"
#> [7] "eval.propfeat.min" "eval.propfeat.mean" "eval.propfeat.max"
#> [10] "eval.domHV" "hout.perf.min" "hout.perf.mean"
#> [13] "hout.perf.max" "hout.propfeat.min" "hout.propfeat.mean"
#> [16] "hout.propfeat.max" "hout.domHV" "true.hout.domHV"
#> [19] "naive.hout.domHV" "cor.perf" "cor.propfeat"
```

It is possible to use “Feature Filter” values from `mlr`

feature filters to heuristically guide mutation of the `selector.selection`

bitvector. For this, the filter matrix can be generated using the `makeFilterMat()`

function, with a few features we choose (the github-version of `mlr`

needs to be installed for this: `devtools::install_github("mlr-org/mlr")`

). For demonstration purposes, this contains the useless-in-this-case `"variance"`

Filter that just measures each feature’s variance (to use the relatively good `"praznik_JMI"`

filter it is necessary to install the github-version of `mlr`

, e.g. by using `devtools::install_github("mlr-org/mlr")`

). The `"DUMMY"`

filter does not exist in `mlr`

, it instructs `makeFilterMat()`

to create a constant heuristic. It gives the strategy parameter another degree of freedom with which it can de-emphasize the heuristic in case it performs badly.

```
filters <- c("praznik_JMI", "auc", "anova.test", "variance", "DUMMY")
fima <- makeFilterMat(task, filters = filters)
```

We now create a `ParamSet`

that includes one strategy parameter “`filterweights`

”, which puts weights on the filter values and thereby controls how strong each one influences the mutation:

```
ps.strat <- pSS(
maxdepth: integer[1, 30],
minsplit: integer[2, 30],
cp: numeric[0.001, 0.999],
filterweights: numeric[0.001, 0.999]^length(filters))
```

Before mutation and recombination can be defined, the parameter set must be substracted from `makeObjective()`

```
fitness.fun.strat <- makeObjective(lrn, task, ps.strat, cv5)
ps.strat.obj <- getParamSet(fitness.fun.strat)
```

The mutation operator must now include information on how to use the strategy parameter. Instead of writing out the function, we are using the helper `makeFilterStrategy()`

that is designed out for this usecase:

```
mutator.strat <- combine.operators(ps.strat.obj,
numeric = ecr::setup(mutGauss, sdev = 0.1),
integer = ecr::setup(mutGaussInt, sdev = 3),
selector.selection = mutUniformMetaResetSHW,
.strategy.selector.selection = makeFilterStrategy(
reset.dists = fima, weight.param.name = "filterweights"))
```

We are going to use the same crossover operator as before, but we have to construct it with the new `ParamSet`

:

```
crossover.strat <- combine.operators(ps.strat.obj,
numeric = recPCrossover,
integer = recPCrossover,
selector.selection = recPCrossover)
```

We sample the initial population again using the `ParamHelpers`

function `sampleValues()`

and resample the `$selector.selection`

part using `initSelector()`

. `initSelector`

makes the number of 0s and 1s uniformly distributed, and applies the equilibrium-distribution of `mutUniformMetaResetSHW()`

:

```
initials.strat <- sampleValues(ps.strat.obj, 32, discrete.names = TRUE) %>%
initSelector(
soften.op = ecr::setup(mutUniformMetaResetSHW, p = 1),
soften.op.strategy = makeFilterStrategy(fima, "filterweights"))
```

These are all preparations needed for a heuristic-backed NSGA2 for feature-selection:

```
run.strat <- slickEcr(
fitness.fun = fitness.fun.strat,
lambda = 16,
population = initials.strat,
mutator = mutator.strat,
recombinator = crossover.strat,
generations = 10)
```

The pareto-fronts of this run look as follows:

The results returned by `slickEcr()`

contain all information to continue an optimization for more generations. To continue a run, use the `continueEcr()`

function. It is possible to change a few settings compared to the previous invocation of `slickEcr()`

(it is in fact possible to just create a result object without any generations using `initEcr()`

, and then call `continueEcr()`

at a later point on it).

```
getStatistics(run.strat.20$log.newinds)
#> gen runtime.mean runtime.sum size population
#> 1 0 0.2328167 7.450133 32 init
#> 2 1 0.2206262 3.530019 16 offspring
#> 3 2 0.2484078 3.974524 16 offspring
#> 4 3 0.2648093 4.236949 16 offspring
#> 5 4 0.2245242 3.592388 16 offspring
#> 6 5 0.2571549 4.114479 16 offspring
#> 7 6 0.2388807 3.822092 16 offspring
#> 8 7 0.2404426 3.847082 16 offspring
#> 9 8 0.2266318 3.626109 16 offspring
#> 10 9 0.2416093 3.865749 16 offspring
#> 11 10 0.2507146 4.011434 16 offspring
#> 12 11 0.2239953 1.791962 8 offspring
#> 13 12 0.2366085 1.892868 8 offspring
#> 14 13 0.2385776 1.908621 8 offspring
#> 15 14 0.2427327 1.941862 8 offspring
#> 16 15 0.2425791 1.940633 8 offspring
#> 17 16 0.2456001 1.964801 8 offspring
#> 18 17 0.2540927 2.032742 8 offspring
#> 19 18 0.2544738 2.035790 8 offspring
#> 20 19 0.2290847 1.832678 8 offspring
#> 21 20 0.2320937 1.856750 8 offspring
```

These operators are supplied by the `mosmafs`

package:

: Mutation with rounding, for integer individuals. This is achieved through trivial application of`mutGaussInt()`

,`mutPolynomialInt()`

,`mutUniformInt()`

,`recIntSBX()`

`intifyMutator()`

.: Geometric distribution mutation, for integer individuals.`mutDoubleGeom()`

: Mutation for discrete individuals.`mutRandomChoice()`

: Crossover recombination operator for individuals of any type.`recPCrossover()`

: Bitflip that approximately conserves hamming weight (sum of digits) of binary individuals in expectation. This is useful for the`mutBitflipCHW()`

`selector.selection`

vector because it does not bias mutation away from the extremes (very few features selected, or very many features selected) as strongly.: Extension of the`mutUniformReset()`

`mutBitflip()`

operator that draws new values from a distribution`reset.dist`

.: Extension of the`mutUniformMetaReset()`

`mutBitflip()`

operator that draws new values from a set of distribution`reset.dists`

, weighted by another parameter`reset.dist.weights`

.: Gaussian mutation with standard deviation scaled to the range of possible values.`mutGaussScaled()`

,`mutGaussIntScaled()`

: Multi-Objective k-tournament selection with individual rankging by nondominated sorting and secondarily either crowding distance or contribution to dominated hypervolume.`selTournamentMO()`

: Binary operators that emulate`mutBitflipCHW()`

,`mutUniformResetSHW()`

,`mutUniformMetaResetSHW()`

`mutBitflip()`

,`mutUniformReset()`

, and`mutUniformMetaReset()`

but keep the fraction of ones in the result more constant.

: Information aggregated as matrix or dataframe for each generation in a population.`popAggregate()`

,`availableAttributes()`

: Information aggregated throughout the run, on runtime and computation cost measures.`collectResult()`

Resetting of individual’s`initSelector()`

`selector.selection`

slot.

In particular, they must have entries corresponding to the

**names**of the possible values of a discrete parameter. Consider the`ParamSet`

`makeParamSet(makeDiscreteParam("fn", values = list(a = identity, b = exp)))`

. An individuum corresponding to this`ParamSet`

would have to be one of`list(fn = "a")`

or`list(fn = "b")`

.↩