2019
Roeslin, Samuel; Ma, Quincy; Wicker, Jörg; Wotherspoon, Liam
Data integration for the development of a seismic loss prediction model for residential buildings in New Zealand Proceedings Article
In: Cellier, Peggy; Driessens, Kurt (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 88-100, Springer International Publishing, Cham, 2019, ISBN: 978-3-030-43887-6.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: computational sustainability, earthquakes
@inproceedings{roeslin2019data,
  title     = {Data integration for the development of a seismic loss prediction model for residential buildings in {New Zealand}},
  author    = {Samuel Roeslin and Quincy Ma and J{\"o}rg Wicker and Liam Wotherspoon},
  editor    = {Peggy Cellier and Kurt Driessens},
  url       = {https://link.springer.com/chapter/10.1007/978-3-030-43887-6_8},
  doi       = {10.1007/978-3-030-43887-6_8},
  isbn      = {978-3-030-43887-6},
  year      = {2019},
  date      = {2019-09-19},
  booktitle = {Machine Learning and Knowledge Discovery in Databases},
  pages     = {88--100},
  publisher = {Springer International Publishing},
  address   = {Cham},
  abstract  = {In 2010--2011, New Zealand experienced the most damaging earthquakes in its history. It led to extensive damage to Christchurch buildings, infrastructure and its surroundings; affecting commercial and residential buildings. The direct economic losses represented 20\% of New Zealand's GDP in 2011. Owing to New Zealand's particular insurance structure, the insurance sector contributed to over 80\% of losses for a total of more than NZ\$31 billion. Amongst this, over NZ\$11 billion of the losses arose from residential building claims and were covered either partially or entirely from the NZ government backed Earthquake Commission (EQC) cover insurance scheme. In the process of resolving the claims, EQC collected detailed financial loss data, post-event observations and building characteristics for each of the approximately 434,000 claims lodged following the Canterbury Earthquake sequence (CES). Added to this, the active NZ earthquake engineering community treated the event as a large scale outdoor experiment and collected extensive data on the ground shaking levels, soil conditions, and liquefaction occurrence throughout wider Christchurch. This paper discusses the necessary data preparation process preceding the development of a machine learning seismic loss model. The process draws heavily upon using Geographic Information System (GIS) techniques to aggregate relevant information from multiple databases interpolating data between categories and converting data between continuous and categorical forms. Subsequently, the database is processed, and a residential seismic loss prediction model is developed using machine learning. The aim is to develop a `grey-box' model enabling human interpretability of the decision steps.},
  keywords  = {computational sustainability, earthquakes},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Williams, Jonathan; Stönner, Christof; Edtbauer, Achim; Derstorff, Bettina; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Krauter, Nicolas; Wicker, Jörg; Kramer, Stefan
What can we learn from the air chemistry of crowds? Proceedings Article
In: Hansel, Armin; Dunkl, Jürgen (Ed.): 8th International Conference on Proton Transfer Reaction Mass Spectrometry and its Applications, pp. 121-123, Innsbruck University Press, Innsbruck, 2019.
Abstract | Links | BibTeX | Tags: atmospheric chemistry, breath analysis, cheminformatics, cinema data mining, data mining, emotional response analysis, machine learning, movie analysis, smell of fear, sof, time series
@inproceedings{williams2019what,
  title     = {What can we learn from the air chemistry of crowds?},
  author    = {Jonathan Williams and Christof St{\"o}nner and Achim Edtbauer and Bettina Derstorff and Efstratios Bourtsoukidis and Thomas Kl{\"u}pfel and Nicolas Krauter and J{\"o}rg Wicker and Stefan Kramer},
  editor    = {Armin Hansel and J{\"u}rgen Dunkl},
  url       = {https://www.ionicon.com/sites/default/files/uploads/doc/Contributions_8th-PTR-MS-Conference-2019_web.pdf#page=122},
  year      = {2019},
  date      = {2019-05-10},
  booktitle = {8th International Conference on Proton Transfer Reaction Mass Spectrometry and its Applications},
  pages     = {121--123},
  publisher = {Innsbruck University Press},
  address   = {Innsbruck},
  abstract  = {Current PTR-MS technology allows hundreds of volatile trace gases in air to be measured every second at extremely low levels (parts per trillion). These instruments are often used in atmospheric research on planes and ships and even in the Amazon rainforest. Recently, we have used this technology to examine air composition changes caused by large groups of people (10,000--30,000) under real world conditions at a football match and in a movie theater. In both cases the trace gas signatures measured in ambient air are shown to reflect crowd behavior. By applying advanced data mining techniques we have shown that groups of people reproducibly respond to certain emotional stimuli (e.g. suspense and comedy) by exhaling specific trace gases. Furthermore, we explore whether this information can be used to determine the age classification of films.},
  keywords  = {atmospheric chemistry, breath analysis, cheminformatics, cinema data mining, data mining, emotional response analysis, machine learning, movie analysis, smell of fear, sof, time series},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Dabiri, Yasamin; Gama-Brambila, Rodrigo A.; Taskova, Katerina; Herold, Kristina; Reuter, Stefanie; Adjaye, James; Utikal, Jochen; Mrowka, Ralf; Wang, Jichang; Andrade-Navarro, Miguel A.; Cheng, Xinlai
Imidazopyridines as Potent KDM5 Demethylase Inhibitors Promoting Reprogramming Efficiency of Human iPSCs Journal Article
In: iScience, vol. 12, pp. 168-181, 2019, ISSN: 2589-0042.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: Biochemistry, Biological Sciences, Molecular Biology
@article{Dabiri2019Imidazopyridines,
  title     = {Imidazopyridines as Potent {KDM5} Demethylase Inhibitors Promoting Reprogramming Efficiency of Human {iPSCs}},
  author    = {Yasamin Dabiri and Rodrigo A. Gama-Brambila and Katerina Taskova and Kristina Herold and Stefanie Reuter and James Adjaye and Jochen Utikal and Ralf Mrowka and Jichang Wang and Miguel A. Andrade-Navarro and Xinlai Cheng},
  url       = {https://www.sciencedirect.com/science/article/pii/S2589004219300124},
  doi       = {10.1016/j.isci.2019.01.012},
  issn      = {2589-0042},
  year      = {2019},
  date      = {2019-01-01},
  urldate   = {2019-01-01},
  journal   = {iScience},
  volume    = {12},
  pages     = {168--181},
  abstract  = {Pioneering human induced pluripotent stem cell (iPSC)-based pre-clinical studies have raised safety concerns and pinpointed the need for safer and more efficient approaches to generate and maintain patient-specific iPSCs. One approach is searching for compounds that influence pluripotent stem cell reprogramming using functional screens of known drugs. Our high-throughput screening of drug-like hits showed that imidazopyridines\textemdash{}analogs of zolpidem, a sedative-hypnotic drug\textemdash{}are able to improve reprogramming efficiency and facilitate reprogramming of resistant human primary fibroblasts. The lead compound (O4I3) showed a remarkable OCT4 induction, which at least in part is due to the inhibition of H3K4 demethylase (KDM5, also known as JARID1). Experiments demonstrated that KDM5A, but not its homolog KDM5B, serves as a reprogramming barrier by interfering with the enrichment of H3K4Me3 at the OCT4 promoter. Thus our results introduce a new class of KDM5 chemical inhibitors and provide further insight into the pluripotency-related properties of KDM5 family members.},
  keywords  = {Biochemistry, Biological Sciences, Molecular Biology},
  pubstate  = {published},
  tppubtype = {article}
}
Pioneering human induced pluripotent stem cell (iPSC)-based pre-clinical studies have raised safety concerns and pinpointed the need for safer and more efficient approaches to generate and maintain patient-specific iPSCs. One approach is searching for compounds that influence pluripotent stem cell reprogramming using functional screens of known drugs. Our high-throughput screening of drug-like hits showed that imidazopyridines—analogs of zolpidem, a sedative-hypnotic drug—are able to improve reprogramming efficiency and facilitate reprogramming of resistant human primary fibroblasts. The lead compound (O4I3) showed a remarkable OCT4 induction, which at least in part is due to the inhibition of H3K4 demethylase (KDM5, also known as JARID1). Experiments demonstrated that KDM5A, but not its homolog KDM5B, serves as a reprogramming barrier by interfering with the enrichment of H3K4Me3 at the OCT4 promoter. Thus our results introduce a new class of KDM5 chemical inhibitors and provide further insight into the pluripotency-related properties of KDM5 family members.
Taškova, Katerina; Fontaine, Jean-Fred; Mrowka, Ralf; Andrade-Navarro, Miguel A.
Literature optimized integration of gene expression for organ-specific evaluation of toxicogenomics datasets Journal Article
In: PLOS ONE, vol. 14, no. 1, pp. 1-21, 2019.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: bioinformatics
@article{10.1371/journal.pone.0210467,
  title     = {Literature optimized integration of gene expression for organ-specific evaluation of toxicogenomics datasets},
  author    = {Katerina Ta{\v{s}}kova and Jean-Fred Fontaine and Ralf Mrowka and Miguel A. Andrade-Navarro},
  url       = {https://doi.org/10.1371/journal.pone.0210467},
  doi       = {10.1371/journal.pone.0210467},
  year      = {2019},
  date      = {2019-01-01},
  urldate   = {2019-01-01},
  journal   = {PLOS ONE},
  volume    = {14},
  number    = {1},
  pages     = {1--21},
  publisher = {Public Library of Science},
  abstract  = {The study of drug toxicity in human organs is complicated by their complex inter-relations and by the obvious difficulty to testing drug effects on biologically relevant material. Animal models and human cell cultures offer alternatives for systematic and large-scale profiling of drug effects on gene expression level, as typically found in the so-called toxicogenomics datasets. However, the complexity of these data, which includes variable drug doses, time points, and experimental setups, makes it difficult to choose and integrate the data, and to evaluate the appropriateness of one or another model system to study drug toxicity (of particular drugs) of particular human organs. Here, we define a protocol to integrate drug-wise rankings of gene expression changes in toxicogenomics data, which we apply to the TG-GATEs dataset, to prioritize genes for association to drug toxicity in liver or kidney. Contrast of the results with sets of known human genes associated to drug toxicity in the literature allows to compare different rank aggregation approaches for the task at hand. Collectively, ranks from multiple models point to genes not previously associated to toxicity, notably, the PCNA clamp associated factor (PCLAF), and genes regulated by the master regulator of the antioxidant response NFE2L2, such as NQO1 and SRXN1. In addition, comparing gene ranks from different models allowed us to evaluate striking differences in terms of toxicity-associated genes between human and rat hepatocytes or between rat liver and rat hepatocytes. We interpret these results to point to the different molecular functions associated to organ toxicity that are best described by each model. We conclude that the expected production of toxicogenomics panels with larger numbers of drugs and models, in combination with the ongoing increase of the experimental literature in organ toxicity, will lead to increasingly better associations of genes for organism toxicity.},
  keywords  = {bioinformatics},
  pubstate  = {published},
  tppubtype = {article}
}
2018
Stönner, Christof; Edtbauer, Achim; Derstorff, Bettina; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Wicker, Jörg; Williams, Jonathan
Proof of concept study: Testing human volatile organic compounds as tools for age classification of films Journal Article
In: PLOS One, vol. 13, no. 10, pp. 1-14, 2018.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: atmospheric chemistry, breath analysis, cheminformatics, cinema data mining, data mining, emotional response analysis, machine learning, movie analysis, smell of fear, sof, time series
@article{Stonner2018,
  title     = {Proof of concept study: Testing human volatile organic compounds as tools for age classification of films},
  author    = {Christof St{\"o}nner and Achim Edtbauer and Bettina Derstorff and Efstratios Bourtsoukidis and Thomas Kl{\"u}pfel and J{\"o}rg Wicker and Jonathan Williams},
  doi       = {10.1371/journal.pone.0203044},
  year      = {2018},
  date      = {2018-10-11},
  journal   = {PLOS One},
  volume    = {13},
  number    = {10},
  pages     = {1--14},
  publisher = {Public Library of Science},
  abstract  = {Humans emit numerous volatile organic compounds (VOCs) through breath and skin. The nature and rate of these emissions are affected by various factors including emotional state. Previous measurements of VOCs and CO2 in a cinema have shown that certain chemicals are reproducibly emitted by audiences reacting to events in a particular film. Using data from films with various age classifications, we have studied the relationship between the emission of multiple VOCs and CO2 and the age classifier (0, 6, 12, and 16) with a view to developing a new chemically based and objective film classification method. We apply a random forest model built with time independent features extracted from the time series of every measured compound, and test predictive capability on subsets of all data. It was found that most compounds were not able to predict all age classifiers reliably, likely reflecting the fact that current classification is based on perceived sensibilities to many factors (e.g. incidences of violence, sex, antisocial behaviour, drug use, and bad language) rather than the visceral biological responses expressed in the data. However, promising results were found for isoprene which reliably predicted 0, 6 and 12 age classifiers for a variety of film genres and audience age groups. Therefore, isoprene emission per person might in future be a valuable aid to national classification boards, or even offer an alternative, objective, metric for rating films based on the reactions of large groups of people.},
  keywords  = {atmospheric chemistry, breath analysis, cheminformatics, cinema data mining, data mining, emotional response analysis, machine learning, movie analysis, smell of fear, sof, time series},
  pubstate  = {published},
  tppubtype = {article}
}
Stönner, Christof; Edtbauer, Achim; Derstorff, Bettina; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Wicker, Jörg; Williams, Jonathan
Investigating human emissions of volatile organic compounds in a cinema, flux rates, links to scene content, and possible applications Proceedings Article
In: 15th Conference of the International Society of Indoor Air Quality and Climate, INDOOR AIR 2018, International Society of Indoor Air Quality and Climate, 2018, ISBN: 978-171382651-4.
Abstract | BibTeX | Tags: atmospheric chemistry, cheminformatics, cinema data mining, sof
@inproceedings{stonner2018investigating,
  title     = {Investigating human emissions of volatile organic compounds in a cinema, flux rates, links to scene content, and possible applications},
  author    = {Christof St{\"o}nner and Achim Edtbauer and Bettina Derstorff and Efstratios Bourtsoukidis and Thomas Kl{\"u}pfel and J{\"o}rg Wicker and Jonathan Williams},
  isbn      = {978-171382651-4},
  year      = {2018},
  date      = {2018-07-22},
  urldate   = {2018-07-22},
  booktitle = {15th Conference of the International Society of Indoor Air Quality and Climate, INDOOR AIR 2018},
  publisher = {International Society of Indoor Air Quality and Climate},
  abstract  = {Humans emit numerous volatile organic compounds (VOCs) into the air via skin and breath. These emissions can depend on various factors such as nutrition, sporting activity and also the emotional state. It is shown that the emission rates of the main endogenous breath gases like CO2, acetone and isoprene are generally lower for children than for adults. In contrast, VOCs from exogenous sources strongly vary over the course of day. Interestingly, small scale variances in emission rates were found to occur reproducibly over multiple screenings of the same film. The peaks occurring in the time series of a compound during the screening of the film were induced by the physiological response of the audience to audio-visual stimuli. Additionally, the question whether this chemical reaction of the audience can be used for the prediction of age classification of films is addressed.},
  keywords  = {atmospheric chemistry, cheminformatics, cinema data mining, sof},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
Theobald, Jannick; Ghanem, Ali; Wallisch, Patrick; Banaeiyan, Amin A.; Andrade-Navarro, Miguel A.; Taškova, Katerina; Haltmeier, Manuela; Kurtz, Andreas; Becker, Holger; Reuter, Stefanie; Mrowka, Ralf; Cheng, Xinlai; Wölfl, Stefan
Liver-Kidney-on-Chip To Study Toxicity of Drug Metabolites Journal Article
In: ACS Biomaterials Science & Engineering, vol. 4, no. 1, pp. 78-89, 2018, (PMID: 33418680).
Links | BibTeX | Altmetric | PlumX | Tags: bioinformatics
@article{Theobald2018Liver,
  title     = {Liver-Kidney-on-Chip To Study Toxicity of Drug Metabolites},
  author    = {Jannick Theobald and Ali Ghanem and Patrick Wallisch and Amin A. Banaeiyan and Miguel A. Andrade-Navarro and Katerina Ta{\v{s}}kova and Manuela Haltmeier and Andreas Kurtz and Holger Becker and Stefanie Reuter and Ralf Mrowka and Xinlai Cheng and Stefan W{\"o}lfl},
  url       = {https://doi.org/10.1021/acsbiomaterials.7b00417},
  doi       = {10.1021/acsbiomaterials.7b00417},
  year      = {2018},
  date      = {2018-01-01},
  urldate   = {2018-01-01},
  journal   = {ACS Biomaterials Science \& Engineering},
  volume    = {4},
  number    = {1},
  pages     = {78--89},
  note      = {PMID: 33418680},
  keywords  = {bioinformatics},
  pubstate  = {published},
  tppubtype = {article}
}
Mah, Nancy; Taskova, Katerina; Amrani, Khadija El; Hariharan, Krithika; Kurtz, Andreas; Andrade-Navarro, Miguel A.
Evaluating Cell Identity from Transcription Profiles Journal Article
In: bioRxiv, 2018.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: bioinformatics
@article{Mah250431,
  title     = {Evaluating Cell Identity from Transcription Profiles},
  author    = {Nancy Mah and Katerina Taskova and Khadija El Amrani and Krithika Hariharan and Andreas Kurtz and Miguel A. Andrade-Navarro},
  url       = {https://www.biorxiv.org/content/early/2018/01/19/250431},
  doi       = {10.1101/250431},
  year      = {2018},
  date      = {2018-01-01},
  urldate   = {2018-01-01},
  journal   = {bioRxiv},
  publisher = {Cold Spring Harbor Laboratory},
  abstract  = {Induced pluripotent stem cells (iPS) and direct lineage programming offer promising autologous and patient-specific sources of cells for personalized drug-testing and cell-based therapy. Before these engineered cells can be widely used, it is important to evaluate how well the engineered cell types resemble their intended target cell types. We have developed a method to generate CellScore, a cell identity score that can be used to evaluate the success of an engineered cell type in relation to both its initial and desired target cell type, which are used as references. Of 20 cell transitions tested, the most successful transitions were the iPS cells (CellScore $>$ 0.9), while other transitions (e.g. induced hepatocytes or motor neurons) indicated incomplete transitions (CellScore $<$ 0.5). In principle, the method can be applied to any engineered cell undergoing a cell transition, where transcription profiles are available for the reference cell types and the engineered cell type. Highlights: A curated standard dataset of transcription profiles from normal cell types was created. CellScore evaluates the cell identity of engineered cell types, using the curated dataset. CellScore considers the initial and desired target cell type. CellScore identifies the most successfully engineered clones for further functional testing.},
  keywords  = {bioinformatics},
  pubstate  = {published},
  tppubtype = {article}
}
Taškova, Katerina; Fontaine, Jean-Fred; Mrowka, Ralf; Andrade-Navarro, Miguel A.
Evaluation of in vivo and in vitro models of toxicity by comparison of toxicogenomics data with the literature Journal Article
In: Methods, vol. 132, pp. 57-65, 2018, ISSN: 1046-2023, (Comparison and Visualization Methods for High-Dimensional Biological Data).
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: Differential expression analysis, Functional enrichment analysis, Literature co-occurrence analysis, model systems, Toxicogenomics data integration
@article{TASKOVA201857,
  title     = {Evaluation of in vivo and in vitro models of toxicity by comparison of toxicogenomics data with the literature},
  author    = {Katerina Ta{\v{s}}kova and Jean-Fred Fontaine and Ralf Mrowka and Miguel A. Andrade-Navarro},
  url       = {https://www.sciencedirect.com/science/article/pii/S1046202317300543},
  doi       = {10.1016/j.ymeth.2017.07.010},
  issn      = {1046-2023},
  year      = {2018},
  date      = {2018-01-01},
  journal   = {Methods},
  volume    = {132},
  pages     = {57--65},
  abstract  = {Toxicity affecting humans is studied by observing the effects of chemical substances in animal organisms (in vivo) or in animal and human cultivated cell lines (in vitro). Toxicogenomics studies collect gene expression profiles and histopathology assessment data for hundreds of drugs and pollutants in standardized experimental designs using different model systems. These data are an invaluable source for analyzing genome-wide drug response in biological systems. However, a problem remains that is how to evaluate the suitability of heterogeneous in vitro and in vivo systems to model the many different aspects of human toxicity. We propose here that a given model system (cell type or animal organ) is supported to appropriately describe a particular aspect of human toxicity if the set of compounds associated in the literature with that aspect of toxicity causes a change in expression of genes with a particular function in the tested model system. This approach provides candidate genes to explain the toxicity effect (the differentially expressed genes) and the compounds whose effect could be modeled (the ones producing both the change of expression in the model system and that are associated with the human phenotype in the literature). Here we present an application of this approach using a computational pipeline that integrates compound-induced gene expression profiles (from the Open TG-GATEs database) and biomedical literature annotations (from the PubMed database) to evaluate the suitability of (human and rat) in vitro systems as well as rat in vivo systems to model human toxicity.},
  note      = {Comparison and Visualization Methods for High-Dimensional Biological Data},
  keywords  = {Differential expression analysis, Functional enrichment analysis, Literature co-occurrence analysis, model systems, Toxicogenomics data integration},
  pubstate  = {published},
  tppubtype = {article}
}
2017
Wicker, Jörg; Kramer, Stefan
The Best Privacy Defense is a Good Privacy Offense: Obfuscating a Search Engine User’s Profile Journal Article
In: Data Mining and Knowledge Discovery, vol. 31, no. 5, pp. 1419-1443, 2017, ISSN: 1573-756X.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: adversarial learning, machine learning, personalized ads, privacy, reinforcement learning, search engines
@article{wicker2017best,
  title     = {The Best Privacy Defense is a Good Privacy Offense: Obfuscating a Search Engine User's Profile},
  author    = {J{\"o}rg Wicker and Stefan Kramer},
  editor    = {Kurt Driessens and Dragi Kocev and Marko Robnik-{\v{S}}ikonja and Myra Spiliopoulou},
  url       = {http://rdcu.be/tL0U},
  doi       = {10.1007/s10618-017-0524-z},
  issn      = {1573-756X},
  year      = {2017},
  date      = {2017-09-01},
  journal   = {Data Mining and Knowledge Discovery},
  volume    = {31},
  number    = {5},
  pages     = {1419--1443},
  abstract  = {User privacy on the internet is an important and unsolved problem. So far, no sufficient and comprehensive solution has been proposed that helps a user to protect his or her privacy while using the internet. Data are collected and assembled by numerous service providers. Solutions so far focused on the side of the service providers to store encrypted or transformed data that can be still used for analysis. This has a major flaw, as it relies on the service providers to do this. The user has no chance of actively protecting his or her privacy. In this work, we suggest a new approach, empowering the user to take advantage of the same tool the other side has, namely data mining to produce data which obfuscates the user's profile. We apply this approach to search engine queries and use feedback of the search engines in terms of personalized advertisements in an algorithm similar to reinforcement learning to generate new queries potentially confusing the search engine. We evaluated the approach using a real-world data set. While evaluation is hard, we achieve results that indicate that it is possible to influence the user's profile that the search engine generates. This shows that it is feasible to defend a user's privacy from a new and more practical perspective.},
  keywords  = {adversarial learning, machine learning, personalized ads, privacy, reinforcement learning, search engines},
  pubstate  = {published},
  tppubtype = {article}
}
Latino, Diogo; Wicker, Jörg; Gütlein, Martin; Schmid, Emanuel; Kramer, Stefan; Fenner, Kathrin
Eawag-Soil in enviPath: a new resource for exploring regulatory pesticide soil biodegradation pathways and half-life data Journal Article
In: Environmental Science: Process & Impact, 2017.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: biodegradation, cheminformatics, computational sustainability, data mining, enviPath, multi-label classification, REST, web services
@article{latino2017eawag,
  title     = {Eawag-Soil in enviPath: a new resource for exploring regulatory pesticide soil biodegradation pathways and half-life data},
  author    = {Diogo Latino and J{\"o}rg Wicker and Martin G{\"u}tlein and Emanuel Schmid and Stefan Kramer and Kathrin Fenner},
  doi       = {10.1039/C6EM00697C},
  year      = {2017},
  date      = {2017-01-01},
  journal   = {Environmental Science: Process \& Impact},
  publisher = {The Royal Society of Chemistry},
  abstract  = {Developing models for the prediction of microbial biotransformation pathways and half-lives of trace organic contaminants in different environments requires as training data easily accessible and sufficiently large collections of respective biotransformation data that are annotated with metadata on study conditions. Here, we present the Eawag-Soil package, a public database that has been developed to contain all freely accessible regulatory data on pesticide degradation in laboratory soil simulation studies for pesticides registered in the EU (282 degradation pathways, 1535 reactions, 1619 compounds and 4716 biotransformation half-life values with corresponding metadata on study conditions). We provide a thorough description of this novel data resource, and discuss important features of the pesticide soil degradation data that are relevant for model development. Most notably, the variability of half-life values for individual compounds is large and only about one order of magnitude lower than the entire range of median half-life values spanned by all compounds, demonstrating the need to consider study conditions in the development of more accurate models for biotransformation prediction. We further show how the data can be used to find missing rules relevant for predicting soil biotransformation pathways. From this analysis, eight examples of reaction types were presented that should trigger the formulation of new biotransformation rules, e.g., Ar-OH methylation, or the extension of existing rules e.g., hydroxylation in aliphatic rings. The data were also used to exemplarily explore the dependence of half-lives of different amide pesticides on chemical class and experimental parameters. This analysis highlighted the value of considering initial transformation reactions for the development of meaningful quantitative-structure biotransformation relationships (QSBR), which is a novel opportunity offered by the simultaneous encoding of transformation reactions and corresponding half-lives in Eawag-Soil. Overall, Eawag-Soil provides an unprecedentedly rich collection of manually extracted and curated biotransformation data, which should be useful in a great variety of applications.},
  keywords  = {biodegradation, cheminformatics, computational sustainability, data mining, enviPath, multi-label classification, REST, web services},
  pubstate  = {published},
  tppubtype = {article}
}
for pesticides registered in the EU (282 degradation pathways, 1535 reactions, 1619 compounds and 4716 biotransformation half-life values with corresponding metadata on study conditions). We provide a thorough description of this novel data resource, and discuss important features of the pesticide soil degradation data that are relevant for model development. Most notably, the variability of half-life values for individual compounds is large and only about one order of magnitude lower than the entire range of median half-life values spanned by all compounds, demonstrating the need to consider study conditions in the development of more accurate models for biotransformation prediction. We further show how the data can be used to find missing rules relevant for predicting soil biotransformation pathways. From this analysis, eight examples of reaction types were presented that should trigger the formulation of new biotransformation rules, e.g., Ar-OH methylation, or the extension of existing rules e.g., hydroxylation in aliphatic rings. The data were also used to exemplarily explore the dependence of half-lives of different amide pesticides on chemical class and experimental parameters. This analysis highlighted the value of considering initial transformation reactions for the development of meaningful quantitative-structure biotransformation relationships (QSBR), which is a novel opportunity offered by the simultaneous encoding of transformation reactions and corresponding half-lives in Eawag-Soil. Overall, Eawag-Soil provides an unprecedentedly rich collection of manually extracted and curated biotransformation data, which should be useful in a great variety of applications.
2016
Wicker, Jörg; Fenner, Kathrin; Kramer, Stefan
A Hybrid Machine Learning and Knowledge Based Approach to Limit Combinatorial Explosion in Biodegradation Prediction Book Section
In: Lässig, Jörg; Kersting, Kristian; Morik, Katharina (Ed.): Computational Sustainability, pp. 75-97, Springer International Publishing, Cham, 2016, ISBN: 978-3-319-31858-5.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: biodegradation, cheminformatics, computational sustainability, enviPath, machine learning, metabolic pathways, multi-label classification
@incollection{wicker2016ahybrid,
  title     = {A Hybrid Machine Learning and Knowledge Based Approach to Limit Combinatorial Explosion in Biodegradation Prediction},
  author    = {J{\"o}rg Wicker and Kathrin Fenner and Stefan Kramer},
  editor    = {J{\"o}rg L{\"a}ssig and Kristian Kersting and Katharina Morik},
  url       = {http://dx.doi.org/10.1007/978-3-319-31858-5_5},
  doi       = {10.1007/978-3-319-31858-5_5},
  isbn      = {978-3-319-31858-5},
  year      = {2016},
  date      = {2016-04-21},
  booktitle = {Computational Sustainability},
  pages     = {75--97},
  publisher = {Springer International Publishing},
  address   = {Cham},
  abstract  = {One of the main tasks in chemical industry regarding the sustainability of a product is the prediction of its environmental fate, i.e., its degradation products and pathways. Current methods for the prediction of biodegradation products and pathways of organic environmental pollutants either do not take into account domain knowledge or do not provide probability estimates. In this chapter, we propose a hybrid knowledge-based and machine learning-based approach to overcome these limitations in the context of the University of Minnesota Pathway Prediction System (UM-PPS). The proposed solution performs relative reasoning in a machine learning framework, and obtains one probability estimate for each biotransformation rule of the system. Since the application of a rule then depends on a threshold for the probability estimate, the trade-off between recall (sensitivity) and precision (selectivity) can be addressed and leveraged in practice. Results from leave-one-out cross-validation show that a recall and precision of approximately 0.8 can be achieved for a subset of 13 transformation rules. The set of used rules is further extended using multi-label classification, where dependencies among the transformation rules are exploited to improve the predictions. While the results regarding recall and precision vary, the area under the ROC curve can be improved using multi-label classification. Therefore, it is possible to optimize precision without compromising recall. Recently, we integrated the presented approach into enviPath, a complete redesign and re-implementation of UM-PPS.},
  keywords  = {biodegradation, cheminformatics, computational sustainability, enviPath, machine learning, metabolic pathways, multi-label classification},
  pubstate  = {published},
  tppubtype = {incollection}
}
Wicker, Jörg; Tyukin, Andrey; Kramer, Stefan
A Nonlinear Label Compression and Transformation Method for Multi-Label Classification using Autoencoders Proceedings Article
In: Bailey, James; Khan, Latifur; Washio, Takashi; Dobbie, Gill; Huang, Zhexue Joshua; Wang, Ruili (Ed.): The 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 328-340, Springer International Publishing, Switzerland, 2016, ISBN: 978-3-319-31753-3.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: autoencoders, label compression, machine learning, multi-label classification
@inproceedings{wicker2016nonlinear,
title = {A Nonlinear Label Compression and Transformation Method for Multi-Label Classification using Autoencoders},
author = {J{\"o}rg Wicker and Andrey Tyukin and Stefan Kramer},
editor = {James Bailey and Latifur Khan and Takashi Washio and Gill Dobbie and Zhexue Joshua Huang and Ruili Wang},
url = {http://dx.doi.org/10.1007/978-3-319-31753-3_27},
doi = {10.1007/978-3-319-31753-3_27},
isbn = {978-3-319-31753-3},
year = {2016},
date = {2016-04-16},
booktitle = {The 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD)},
volume = {9651},
pages = {328--340},
publisher = {Springer International Publishing},
address = {Switzerland},
series = {Lecture Notes in Computer Science},
abstract = {Multi-label classification targets the prediction of multiple interdependent and non-exclusive binary target variables. Transformation-based algorithms transform the data set such that regular single-label algorithms can be applied to the problem. A special type of transformation-based classifiers are label compression methods, that compress the labels and then mostly use single label classifiers to predict the compressed labels. So far, there are no compression-based algorithms follow a problem transformation approach and address non-linear dependencies in the labels. In this paper, we propose a new algorithm, called Maniac (Multi-lAbel classificatioN usIng AutoenCoders), which extracts the non-linear dependencies by compressing the labels using autoencoders. We adapt the training process of autoencoders in a way to make them more suitable for a parameter optimization in the context of this algorithm. The method is evaluated on eight standard multi-label data sets. Experiments show that despite not producing a good ranking, Maniac generates a particularly good bipartition of the labels into positives and negatives. This is caused by rather strong predictions with either really high or low probability. Additionally, the algorithm seems to perform better given more labels and a higher label cardinality in the data set.},
keywords = {autoencoders, label compression, machine learning, multi-label classification},
pubstate = {published},
tppubtype = {inproceedings}
}
Wicker, Jörg; Lorsbach, Tim; Gütlein, Martin; Schmid, Emanuel; Latino, Diogo; Kramer, Stefan; Fenner, Kathrin
enviPath – The Environmental Contaminant Biotransformation Pathway Resource Journal Article
In: Nucleic Acids Research, vol. 44, no. D1, pp. D502-D508, 2016.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: biodegradation, cheminformatics, computational sustainability, data mining, enviPath, linked data, machine learning, metabolic pathways, multi-label classification
@article{wicker2016envipath,
title = {enviPath - The Environmental Contaminant Biotransformation Pathway Resource},
author = {J{\"o}rg Wicker and Tim Lorsbach and Martin G{\"u}tlein and Emanuel Schmid and Diogo Latino and Stefan Kramer and Kathrin Fenner},
editor = {Michael Galperin},
url = {http://nar.oxfordjournals.org/content/44/D1/D502.abstract},
doi = {10.1093/nar/gkv1229},
year = {2016},
date = {2016-01-01},
journal = {Nucleic Acids Research},
volume = {44},
number = {D1},
pages = {D502--D508},
abstract = {The University of Minnesota Biocatalysis/Biodegradation Database and Pathway Prediction System (UM-BBD/PPS) has been a unique resource covering microbial biotransformation pathways of primarily xenobiotic chemicals for over 15 years. This paper introduces the successor system, enviPath (The Environmental Contaminant Biotransformation Pathway Resource), which is a complete redesign and reimplementation of UM-BBD/PPS. enviPath uses the database from the UM-BBD/PPS as a basis, extends the use of this database, and allows users to include their own data to support multiple use cases. Relative reasoning is supported for the refinement of predictions and to allow its extensions in terms of previously published, but not implemented machine learning models. User access is simplified by providing a REST API that simplifies the inclusion of enviPath into existing workflows. An RDF database is used to enable simple integration with other databases. enviPath is publicly available at https://envipath.org with free and open access to its core data.},
keywords = {biodegradation, cheminformatics, computational sustainability, data mining, enviPath, linked data, machine learning, metabolic pathways, multi-label classification},
pubstate = {published},
tppubtype = {article}
}
Raza, Atif; Wicker, Jörg; Kramer, Stefan
Trading Off Accuracy for Efficiency by Randomized Greedy Warping Proceedings Article
In: Proceedings of the 31st Annual ACM Symposium on Applied Computing, pp. 883-890, ACM, New York, NY, USA, 2016, ISBN: 978-1-4503-3739-7.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: data mining, dynamic time warping, time series
@inproceedings{raza2016trading,
title = {Trading Off Accuracy for Efficiency by Randomized Greedy Warping},
author = {Atif Raza and J{\"o}rg Wicker and Stefan Kramer},
url = {https://wicker.nz/nwp-acm/authorize.php?id=N10030
http://doi.acm.org/10.1145/2851613.2851651},
doi = {10.1145/2851613.2851651},
isbn = {978-1-4503-3739-7},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 31st Annual ACM Symposium on Applied Computing},
pages = {883--890},
publisher = {ACM},
address = {New York, NY, USA},
series = {SAC '16},
abstract = {Dynamic Time Warping (DTW) is a widely used distance measure for time series data mining. Its quadratic complexity requires the application of various techniques (e.g. warping constraints, lower-bounds) for deployment in real-time scenarios. In this paper we propose a randomized greedy warping algorithm for finding similarity between time series instances. We show that the proposed algorithm outperforms the simple greedy approach and also provides very good time series similarity approximation consistently, as compared to DTW. We show that the Randomized Time Warping (RTW) can be used in place of DTW as a fast similarity approximation technique by trading some classification accuracy for very fast classification.},
keywords = {data mining, dynamic time warping, time series},
pubstate = {published},
tppubtype = {inproceedings}
}
Williams, Jonathan; Stönner, Christof; Wicker, Jörg; Krauter, Nicolas; Derstorff, Bettina; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Kramer, Stefan
Cinema audiences reproducibly vary the chemical composition of air during films, by broadcasting scene specific emissions on breath Journal Article
In: Scientific Reports, vol. 6, 2016.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: atmospheric chemistry, causality, cheminformatics, data mining, emotional response analysis, smell of fear, sof, time series
@article{williams2015element,
title = {Cinema audiences reproducibly vary the chemical composition of air during films, by broadcasting scene specific emissions on breath},
author = {Jonathan Williams and Christof St{\"o}nner and J{\"o}rg Wicker and Nicolas Krauter and Bettina Derstorff and Efstratios Bourtsoukidis and Thomas Kl{\"u}pfel and Stefan Kramer},
url = {http://www.nature.com/articles/srep25464},
doi = {10.1038/srep25464},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
journal = {Scientific Reports},
volume = {6},
publisher = {Nature Publishing Group},
abstract = {Human beings continuously emit chemicals into the air by breath and through the skin. In order to determine whether these emissions vary predictably in response to audiovisual stimuli, we have continuously monitored carbon dioxide and over one hundred volatile organic compounds in a cinema. It was found that many airborne chemicals in cinema air varied distinctively and reproducibly with time for a particular film, even in different screenings to different audiences. Application of scene labels and advanced data mining methods revealed that specific film events, namely "suspense" or "comedy" caused audiences to change their emission of specific chemicals. These event-type synchronous, broadcasted human chemosignals open the possibility for objective and non-invasive assessment of a human group response to stimuli by continuous measurement of chemicals in air. Such methods can be applied to research fields such as psychology and biology, and be valuable to industries such as film making and advertising.},
keywords = {atmospheric chemistry, causality, cheminformatics, data mining, emotional response analysis, smell of fear, sof, time series},
pubstate = {published},
tppubtype = {article}
}
2015
Wicker, Jörg; Krauter, Nicolas; Derstorff, Bettina; Stönner, Christof; Bourtsoukidis, Efstratios; Klüpfel, Thomas; Williams, Jonathan; Kramer, Stefan
Cinema Data Mining: The Smell of Fear Proceedings Article
In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1304, ACM, New York, NY, USA, 2015, ISBN: 978-1-4503-3664-2.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: atmospheric chemistry, breath analysis, causality, cheminformatics, cinema data mining, data mining, emotional response analysis, movie analysis, smell of fear, sof, time series
@inproceedings{wicker2015cinema,
title = {Cinema Data Mining: The Smell of Fear},
author = {J{\"o}rg Wicker and Nicolas Krauter and Bettina Derstorff and Christof St{\"o}nner and Efstratios Bourtsoukidis and Thomas Kl{\"u}pfel and Jonathan Williams and Stefan Kramer},
url = {https://wicker.nz/nwp-acm/authorize.php?id=N10031
http://doi.acm.org/10.1145/2783258.2783404},
doi = {10.1145/2783258.2783404},
isbn = {978-1-4503-3664-2},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
pages = {1235--1304},
publisher = {ACM},
address = {New York, NY, USA},
organization = {ACM},
series = {KDD '15},
abstract = {While the physiological response of humans to emotional events or stimuli is well-investigated for many modalities (like EEG, skin resistance, ...), surprisingly little is known about the exhalation of so-called Volatile Organic Compounds (VOCs) at quite low concentrations in response to such stimuli. VOCs are molecules of relatively small mass that quickly evaporate or sublimate and can be detected in the air that surrounds us. The paper introduces a new field of application for data mining, where trace gas responses of people reacting on-line to films shown in cinemas (or movie theaters) are related to the semantic content of the films themselves. To do so, we measured the VOCs from a movie theatre over a whole month in intervals of thirty seconds, and annotated the screened films by a controlled vocabulary compiled from multiple sources. To gain a better understanding of the data and to reveal unknown relationships, we have built prediction models for so-called forward prediction (the prediction of future VOCs from the past), backward prediction (the prediction of past scene labels from future VOCs) and for some forms of abductive reasoning and Granger causality. Experimental results show that some VOCs and some labels can be predicted with relatively low error, and that hints for causality with low p-values can be detected in the data.},
keywords = {atmospheric chemistry, breath analysis, causality, cheminformatics, cinema data mining, data mining, emotional response analysis, movie analysis, smell of fear, sof, time series},
pubstate = {published},
tppubtype = {inproceedings}
}
Tyukin, Andrey; Kramer, Stefan; Wicker, Jörg
Scavenger – A Framework for the Efficient Evaluation of Dynamic and Modular Algorithms Proceedings Article
In: Bifet, Albert; May, Michael; Zadrozny, Bianca; Gavalda, Ricard; Pedreschi, Dino; Cardoso, Jaime; Spiliopoulou, Myra (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 325-328, Springer International Publishing, 2015, ISBN: 978-3-319-23460-1.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: autoencoders, distributed processing, framework, large-scale, Scavenger
@inproceedings{tyukin2015scavenger,
title = {Scavenger - A Framework for the Efficient Evaluation of Dynamic and Modular Algorithms},
author = {Andrey Tyukin and Stefan Kramer and J{\"o}rg Wicker},
editor = {Albert Bifet and Michael May and Bianca Zadrozny and Ricard Gavalda and Dino Pedreschi and Jaime Cardoso and Myra Spiliopoulou},
url = {http://dx.doi.org/10.1007/978-3-319-23461-8_40},
doi = {10.1007/978-3-319-23461-8_40},
isbn = {978-3-319-23460-1},
year = {2015},
date = {2015-01-01},
booktitle = {Machine Learning and Knowledge Discovery in Databases},
volume = {9286},
pages = {325--328},
publisher = {Springer International Publishing},
series = {Lecture Notes in Computer Science},
abstract = {Machine Learning methods and algorithms are often highly modular in the sense that they rely on a large number of subalgorithms that are in principle interchangeable. For example, it is often possible to use various kinds of pre- and post-processing and various base classifiers or regressors as components of the same modular approach. We propose a framework, called Scavenger, that allows evaluating whole families of conceptually similar algorithms efficiently. The algorithms are represented as compositions, couplings and products of atomic subalgorithms. This allows partial results to be cached and shared between different instances of a modular algorithm, so that potentially expensive partial results need not be recomputed multiple times. Furthermore, our framework deals with issues of the parallel execution, load balancing, and with the backup of partial results for the case of implementation or runtime errors. Scavenger is licensed under the GPLv3 and can be downloaded freely at https://github.com/jorro/scavenger.},
keywords = {autoencoders, distributed processing, framework, large-scale, Scavenger},
pubstate = {published},
tppubtype = {inproceedings}
}
Benedik, Blaž; Taškova, Katerina; Tavčar, Jože; Duhovnik, Jože
Prediction of vacuum cleaner motor brush life: a regression approach Journal Article
In: IET Electric Power Applications, vol. 9, no. 9, pp. 569-577, 2015.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: brushes, carbon brush wear modelling, domestic appliances, dominant wear mechanism, electric motors, energy saving, field theory, motor design reliability, multiple regression analysis, vacuum cleaner motor brush life prediction, wear
@article{https://doi.org/10.1049/iet-epa.2014.0437,
title = {Prediction of vacuum cleaner motor brush life: a regression approach},
author = {Bla{\v{z}} Benedik and Katerina Ta{\v{s}}kova and Jo{\v{z}}e Tav{\v{c}}ar and Jo{\v{z}}e Duhovnik},
url = {https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/iet-epa.2014.0437},
doi = {10.1049/iet-epa.2014.0437},
year = {2015},
date = {2015-01-01},
urldate = {2015-01-01},
journal = {IET Electric Power Applications},
volume = {9},
number = {9},
pages = {569--577},
abstract = {The main focus of this paper is the empirical modelling of the wear of carbon brushes. Rather than determining the dominant wear mechanisms, an approach towards the prediction of wear under a range of different conditions was used. The models were obtained by multiple regression analysis using lifetime (LT) data contributed by the biggest European manufacturer of vacuum cleaner motors. This included reliability data for 607 different test populations involving 3980 motors. Exploration of the data revealed that wear-out parameters behaved in accordance with the existing field theory, giving additional confidence to the models. The numerical appreciation of the wear-out parameters and the resulting conclusions will be beneficial to motor design and reliability engineers. Learned knowledge will be used for faster selection of optimal design and operational motor parameters to meet recent EU regulation 666/2013. Along with the more rapid design of the product, a reduced number of LT tests will result in significant energy savings.},
keywords = {brushes, carbon brush wear modelling, domestic appliances, dominant wear mechanism, electric motors, energy saving, field theory, motor design reliability, multiple regression analysis, vacuum cleaner motor brush life prediction, wear},
pubstate = {published},
tppubtype = {article}
}
Dietzen, Matthias; Kalinina, Olga V.; Taškova, Katerina; Kneissl, Benny; Hildebrandt, Anna-Katharina; Jaenicke, Elmar; Decker, Heinz; Lengauer, Thomas; Hildebrandt, Andreas
Large oligomeric complex structures can be computationally assembled by efficiently combining docked interfaces Journal Article
In: Proteins: Structure, Function, and Bioinformatics, vol. 83, no. 10, pp. 1887-1899, 2015.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: 3D-MOSAIC, complex match score, macromolecular assembly, protein–protein interactions, structural modeling, transformation match score
@article{https://doi.org/10.1002/prot.24873,
title = {Large oligomeric complex structures can be computationally assembled by efficiently combining docked interfaces},
author = {Matthias Dietzen and Olga V. Kalinina and Katerina Ta{\v{s}}kova and Benny Kneissl and Anna-Katharina Hildebrandt and Elmar Jaenicke and Heinz Decker and Thomas Lengauer and Andreas Hildebrandt},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.24873},
doi = {10.1002/prot.24873},
year = {2015},
date = {2015-01-01},
journal = {Proteins: Structure, Function, and Bioinformatics},
volume = {83},
number = {10},
pages = {1887--1899},
abstract = {ABSTRACT Macromolecular oligomeric assemblies are involved in many biochemical processes of living organisms. The benefits of such assemblies in crowded cellular environments include increased reaction rates, efficient feedback regulation, cooperativity and protective functions. However, an atom-level structural determination of large assemblies is challenging due to the size of the complex and the difference in binding affinities of the involved proteins. In this study, we propose a novel combinatorial greedy algorithm for assembling large oligomeric complexes from information on the approximate position of interaction interfaces of pairs of monomers in the complex. Prior information on complex symmetry is not required but rather the symmetry is inferred during assembly. We implement an efficient geometric score, the transformation match score, that bypasses the model ranking problems of state-of-the-art scoring functions by scoring the similarity between the inferred dimers of the same monomer simultaneously with different binding partners in a (sub)complex with a set of pregenerated docking poses. We compiled a diverse benchmark set of 308 homo and heteromeric complexes containing 6 to 60 monomers. To explore the applicability of the method, we considered 48 sets of parameters and selected those three sets of parameters, for which the algorithm can correctly reconstruct the maximum number, namely 252 complexes (81.8%) in, at least one of the respective three runs. The crossvalidation coverage, that is, the mean fraction of correctly reconstructed benchmark complexes during crossvalidation, was 78.1%, which demonstrates the ability of the presented method to correctly reconstruct topology of a large variety of biological complexes. Proteins 2015; 83:1887\textendash1899. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.},
keywords = {3D-MOSAIC, complex match score, macromolecular assembly, protein\textendashprotein interactions, structural modeling, transformation match score},
pubstate = {published},
tppubtype = {article}
}
Šilc, Jurij; Taškova, Katerina; Korošec, Peter
Data mining-assisted parameter tuning of a search algorithm Journal Article
In: Informatica, vol. 39, no. 2, 2015.
Abstract | Links | BibTeX | Tags: data mining
@article{vsilc2015data,
title = {Data mining-assisted parameter tuning of a search algorithm},
author = {Jurij {\v{S}}ilc and Katerina Ta{\v{s}}kova and Peter Koro{\v{s}}ec},
url = {https://informatica.si/index.php/informatica/article/view/833},
year = {2015},
date = {2015-01-01},
urldate = {2015-01-01},
journal = {Informatica},
volume = {39},
number = {2},
abstract = {The main purpose of this paper is to show how using data-mining technique to tackle the problem of tuning the performance of a meta-heuristic search algorithm with respect to its parameters. The operational behavior of typical meta-heuristic search algorithms is determined by a set of control parameters, which have to be fine-tuned in order to obtain a best performance for a given problem. The principle challenge here is how to provide meaningful settings for an algorithm, obtained as result of better insight in its behavior. In this context, we discuss the idea of learning a model of an algorithm behavior by data mining analysis of parameter tuning results. The study was conducted using the Differential Ant-Stigmergy Algorithm as an example meta-heuristic search algorithm.},
keywords = {data mining},
pubstate = {published},
tppubtype = {article}
}
2014
Tyukin, Andrey; Kramer, Stefan; Wicker, Jörg
BMaD — A Boolean Matrix Decomposition Framework Proceedings Article
In: Calders, Toon; Esposito, Floriana; Hüllermeier, Eyke; Meo, Rosa (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 481-484, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44844-1.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: Boolean matrix decomposition, data mining, framework
@inproceedings{tyukin2014bmad,
title = {BMaD -- A Boolean Matrix Decomposition Framework},
author = {Andrey Tyukin and Stefan Kramer and J{\"o}rg Wicker},
editor = {Toon Calders and Floriana Esposito and Eyke H{\"u}llermeier and Rosa Meo},
url = {http://dx.doi.org/10.1007/978-3-662-44845-8_40},
doi = {10.1007/978-3-662-44845-8_40},
isbn = {978-3-662-44844-1},
year = {2014},
date = {2014-01-01},
booktitle = {Machine Learning and Knowledge Discovery in Databases},
volume = {8726},
pages = {481--484},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
abstract = {Boolean matrix decomposition is a method to obtain a compressed
representation of a matrix with Boolean entries. We present a modular
framework that unifies several Boolean matrix decomposition algorithms, and
provide methods to evaluate their performance. The main advantages of
the framework are its modular approach and hence the flexible
combination of the steps of a Boolean matrix decomposition and the
capability of handling missing values. The framework is licensed under
the GPLv3 and can be downloaded freely at
\url{http://projects.informatik.uni-mainz.de/bmad}.},
keywords = {Boolean matrix decomposition, data mining, framework},
pubstate = {published},
tppubtype = {inproceedings}
}
representation of a matrix with Boolean entries. We present a modular
framework that unifies several Boolean matrix decomposition algorithms, and
provide methods to evaluate their performance. The main advantages of
the framework are its modular approach and hence the flexible
combination of the steps of a Boolean matrix decomposition and the
capability of handling missing values. The framework is licensed under
the GPLv3 and can be downloaded freely at
http://projects.informatik.uni-mainz.de/bmad.
2013
Wicker, Jörg
Large Classifier Systems in Bio- and Cheminformatics PhD Thesis
Technische Universität München, 2013.
Abstract | Links | BibTeX | Tags: biodegradation, bioinformatics, cheminformatics, computational sustainability, data mining, enviPath, machine learning, multi-label classification, multi-relational learning, toxicity
@phdthesis{wicker2013large,
title = {Large Classifier Systems in Bio- and Cheminformatics},
author = {J{\"o}rg Wicker},
url = {http://mediatum.ub.tum.de/node?id=1165858},
year = {2013},
date = {2013-01-01},
school = {Technische Universit{\"a}t M{\"u}nchen},
abstract = {Large classifier systems are machine learning algorithms that use multiple
classifiers to improve the prediction of target values in advanced
classification tasks. Although learning problems in bio- and
cheminformatics commonly provide data in schemes suitable for large
classifier systems, they are rarely used in these domains. This thesis
introduces two new classifiers incorporating systems of classifiers
using Boolean matrix decomposition to handle data in a schema that
often occurs in bio- and cheminformatics.
The first approach, called MLC-BMaD (multi-label classification using
Boolean matrix decomposition), uses Boolean matrix decomposition to
decompose the labels in a multi-label classification task. The
decomposed matrices are a compact representation of the information
in the labels (first matrix) and the dependencies among the labels
(second matrix). The first matrix is used in a further multi-label
classification while the second matrix is used to generate the final
matrix from the predicted values of the first matrix.
MLC-BMaD was evaluated on six standard multi-label data sets, the
experiments showed that MLC-BMaD can perform particularly well on data
sets with a high number of labels and a small number of instances and
can outperform standard multi-label algorithms.
Subsequently, MLC-BMaD is extended to a special case of
multi-relational learning, by considering the labels not as simple
labels, but instances. The algorithm, called ClassFact
(Classification factorization), uses both matrices in a multi-label
classification. Each label represents a mapping between two
instances.
Experiments on three data sets from the domain of bioinformatics show
that ClassFact can outperform the baseline method, which merges the
relations into one, on hard classification tasks.
Furthermore, large classifier systems are used on two cheminformatics
data sets, the first one is used to predict the environmental fate of
chemicals by predicting biodegradation pathways. The second is a data
set from the domain of predictive toxicology. In biodegradation
pathway prediction, I extend a knowledge-based system and incorporate
a machine learning approach to predict a probability for
biotransformation products based on the structure- and knowledge-based
predictions of products, which are based on transformation rules. The
use of multi-label classification improves the performance of the
classifiers and extends the number of transformation rules that can be
covered.
For the prediction of toxic effects of chemicals, I applied large
classifier systems to the ToxCast\texttrademark{} data set, which maps
toxic effects to chemicals. As the given toxic effects are not easy to
predict due to missing information and a skewed class
distribution, I introduce a filtering step in the multi-label
classification, which finds labels that are usable in multi-label
prediction and does not take the others in the
prediction into account. Experiments show
that this approach can improve upon the baseline method using binary
classification, as well as multi-label approaches using no filtering.
The presented results show that large classifier systems can play a
role in future research challenges, especially in bio- and
cheminformatics, where data sets frequently consist of more complex
structures and data can be rather small in terms of the number of
instances compared to other domains.},
keywords = {biodegradation, bioinformatics, cheminformatics, computational sustainability, data mining, enviPath, machine learning, multi-label classification, multi-relational learning, toxicity},
pubstate = {published},
tppubtype = {phdthesis}
}
classifiers to improve the prediction of target values in advanced
classification tasks. Although learning problems in bio- and
cheminformatics commonly provide data in schemes suitable for large
classifier systems, they are rarely used in these domains. This thesis
introduces two new classifiers incorporating systems of classifiers
using Boolean matrix decomposition to handle data in a schema that
often occurs in bio- and cheminformatics.
The first approach, called MLC-BMaD (multi-label classification using
Boolean matrix decomposition), uses Boolean matrix decomposition to
decompose the labels in a multi-label classification task. The
decomposed matrices are a compact representation of the information
in the labels (first matrix) and the dependencies among the labels
(second matrix). The first matrix is used in a further multi-label
classification while the second matrix is used to generate the final
matrix from the predicted values of the first matrix.
MLC-BMaD was evaluated on six standard multi-label data sets, the
experiments showed that MLC-BMaD can perform particularly well on data
sets with a high number of labels and a small number of instances and
can outperform standard multi-label algorithms.
Subsequently, MLC-BMaD is extended to a special case of
multi-relational learning, by considering the labels not as simple
labels, but instances. The algorithm, called ClassFact
(Classification factorization), uses both matrices in a multi-label
classification. Each label represents a mapping between two
instances.
Experiments on three data sets from the domain of bioinformatics show
that ClassFact can outperform the baseline method, which merges the
relations into one, on hard classification tasks.
Furthermore, large classifier systems are used on two cheminformatics
data sets, the first one is used to predict the environmental fate of
chemicals by predicting biodegradation pathways. The second is a data
set from the domain of predictive toxicology. In biodegradation
pathway prediction, I extend a knowledge-based system and incorporate
a machine learning approach to predict a probability for
biotransformation products based on the structure- and knowledge-based
predictions of products, which are based on transformation rules. The
use of multi-label classification improves the performance of the
classifiers and extends the number of transformation rules that can be
covered.
For the prediction of toxic effects of chemicals, I applied large
classifier systems to the ToxCast™ data set, which maps
toxic effects to chemicals. As the given toxic effects are not easy to
predict due to missing information and a skewed class
distribution, I introduce a filtering step in the multi-label
classification, which finds labels that are usable in multi-label
prediction and does not take the others in the
prediction into account. Experiments show
that this approach can improve upon the baseline method using binary
classification, as well as multi-label approaches using no filtering.
The presented results show that large classifier systems can play a
role in future research challenges, especially in bio- and
cheminformatics, where data sets frequently consist of more complex
structures and data can be rather small in terms of the number of
instances compared to other domains.
2012
Wicker, Jörg; Pfahringer, Bernhard; Kramer, Stefan
Multi-label Classification Using Boolean Matrix Decomposition Proceedings Article
In: Proceedings of the 27th Annual ACM Symposium on Applied Computing, pp. 179–186, ACM, 2012, ISBN: 978-1-4503-0857-1.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: associations, Boolean matrix decomposition, machine learning, multi-label classification
@inproceedings{wicker2012multi,
title = {Multi-label Classification Using Boolean Matrix Decomposition},
author = {J{\"o}rg Wicker and Bernhard Pfahringer and Stefan Kramer},
url = {https://wicker.nz/nwp-acm/authorize.php?id=N10032
http://doi.acm.org/10.1145/2245276.2245311},
doi = {10.1145/2245276.2245311},
isbn = {978-1-4503-0857-1},
year = {2012},
date = {2012-01-01},
booktitle = {Proceedings of the 27th Annual ACM Symposium on Applied Computing},
pages = {179--186},
publisher = {ACM},
series = {SAC '12},
abstract = {This paper introduces a new multi-label classifier based on Boolean matrix decomposition. Boolean matrix decomposition is used to extract, from the full label matrix, latent labels representing useful Boolean combinations of the original labels. Base level models predict latent labels, which are subsequently transformed into the actual labels by Boolean matrix multiplication with the second matrix from the decomposition. The new method is tested on six publicly available datasets with varying numbers of labels. The experimental evaluation shows that the new method works particularly well on datasets with a large number of labels and strong dependencies among them.},
keywords = {associations, Boolean matrix decomposition, machine learning, multi-label classification},
pubstate = {published},
tppubtype = {inproceedings}
}
Čerepnalkoski, Darko; Taškova, Katerina; Todorovski, Ljupčo; Atanasova, Nataša; Džeroski, Sašo
The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems Journal Article
In: Ecological Modelling, vol. 245, pp. 136-165, 2012, ISSN: 0304-3800, (7th European Conference on Ecological Modelling (ECEM)).
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: Aquatic ecosystems, Dynamical systems, Equation discovery, Meta-heuristic optimization, Parameter estimation, Process-based modeling
@article{CEREPNALKOSKI2012136,
title = {The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems},
author = {Darko \v{C}erepnalkoski and Katerina Ta\v{s}kova and Ljup\v{c}o Todorovski and Nata\v{s}a Atanasova and Sa\v{s}o D\v{z}eroski},
url = {https://www.sciencedirect.com/science/article/pii/S0304380012002724},
doi = {10.1016/j.ecolmodel.2012.06.001},
issn = {0304-3800},
year = {2012},
date = {2012-01-01},
journal = {Ecological Modelling},
volume = {245},
pages = {136--165},
abstract = {Modeling dynamical systems involves two subtasks: structure identification and parameter estimation. ProBMoT is a tool for automated modeling of dynamical systems that addresses both tasks simultaneously. It takes into account domain knowledge formalized as templates for components of the process-based models: entities and processes. Taking a conceptual model of the system, the library of domain knowledge, and measurements of a particular dynamical system, it identifies both the structure and numerical parameters of the appropriate process-based model. ProBMoT has two main components corresponding to the two subtasks of modeling. The first component is concerned with generating candidate model structures that adhere to the conceptual model specified as input. The second subsystem uses the measured data to find suitable values for the constant parameters of a given model by using parameter estimation methods. ProBMoT uses model error to rank model structures and select the one that fits measured data best. In this paper, we investigate the influence of the selection of the parameter estimation methods on the structure identification. We consider one local (derivative-based) and one global (meta-heuristic) parameter estimation method. As opposed to other comparative studies of parameter estimation methods that focus on identifying parameters of a single model structure, we compare the parameter estimation methods in the context of repetitive parameter estimation for a number of candidate model structures. The results confirm the superiority of the global optimization methods over the local ones in the context of structure identification.},
note = {7th European Conference on Ecological Modelling (ECEM)},
keywords = {Aquatic ecosystems, Dynamical systems, Equation discovery, Meta-heuristic optimization, Parameter estimation, Process-based modeling},
pubstate = {published},
tppubtype = {article}
}
Taskova, Katerina; Šilc, Jurij; Atanasova, Nataša; Džeroski, Sašo
Parameter estimation in a nonlinear dynamic model of an aquatic ecosystem with meta-heuristic optimization Journal Article
In: Ecological Modelling, vol. 226, pp. 36-61, 2012, ISSN: 0304-3800.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: Aquatic ecosystems, Least-squares estimation, Meta-heuristic optimization, Ordinary differential equations, Parameter estimation
@article{TASHKOVA201236,
title = {Parameter estimation in a nonlinear dynamic model of an aquatic ecosystem with meta-heuristic optimization},
author = {Katerina Taskova and Jurij \v{S}ilc and Nata\v{s}a Atanasova and Sa\v{s}o D\v{z}eroski},
url = {https://www.sciencedirect.com/science/article/pii/S0304380011005795},
doi = {10.1016/j.ecolmodel.2011.11.029},
issn = {0304-3800},
year = {2012},
date = {2012-01-01},
urldate = {2012-01-01},
journal = {Ecological Modelling},
volume = {226},
pages = {36--61},
abstract = {Parameter estimation in dynamic models of ecosystems is essentially an optimization task. Due to the characteristics of ecosystems and typical models thereof, such as non-linearity, high dimensionality, and low quantity and quality of observed data, this optimization task can be very hard for traditional (derivative-based or local) optimization methods. This calls for the use of advanced meta-heuristic approaches, such as evolutionary or swarm-based methods. In this paper, we conduct an empirical comparison of four meta-heuristic optimization methods, and one local optimization method as a baseline, on a representative task of parameter estimation in a nonlinear dynamic model of an aquatic ecosystem. The five methods compared are the differential ant-stigmergy algorithm (DASA) and its continuous variant (CDASA), particle swarm optimization (PSO), differential evolution (DE) and algorithm 717 (A717). We use synthetic data, both without and with different levels of noise, as well as real measurements from Lake Bled. We also consider two different simulation approaches: teacher forcing, which makes supervised predictions one (small) time step ahead, and full (multistep) simulation, which makes predictions based on the history predictions for longer time periods. The meta-heuristic global optimization methods for parameter estimation are clearly superior and should be preferred over local optimization methods. While the differences in performance between the different methods within the class of meta-heuristics are not significant across all conditions, differential evolution yields the best results in terms of quality of the reconstructed system dynamics as well as speed of convergence. While the use of teacher forcing simulation makes parameter estimation much faster, the use of full simulation produces much better parameter estimates from real measured data.},
keywords = {Aquatic ecosystems, Least-squares estimation, Meta-heuristic optimization, Ordinary differential equations, Parameter estimation},
pubstate = {published},
tppubtype = {article}
}
Taškova, Katerina
Parameter Identification in Nonlinear Dynamic Systems with Meta-heuristic Approaches PhD Thesis
2012.
@phdthesis{tavskova2012parameter,
title = {Parameter Identification in Nonlinear Dynamic Systems with Meta-heuristic Approaches},
author = {Katerina Ta\v{s}kova},
school = {Jo\v{z}ef Stefan International Postgraduate School},
address = {Ljubljana, Slovenia},
year = {2012},
date = {2012-01-01},
urldate = {2012-01-01},
abstract = {The task of mathematical modeling of dynamic systems from observed system behavior, widely known under the name of system identification, breaks down into two subtasks. The first task, referred to as structure identification, is to specify the model structure, i.e., the functional form of the model. In practice, the model structure is usually given by a human domain expert and reflects prior domain knowledge: this is called knowledge-driven identification (as opposed to data-driven identification, which is based only on data). Structure identification plays an important role in modeling as it defines the choice available for the selection of the ``best model''.
The second task, referred to as parameter identification, aims to estimate the values of the model parameters that define a best possible fit of the model to the measured data. It assumes that the model structure is known and the observed system behavior is given in the form of measured data. Accurate estimation of the model parameters is important for describing and analyzing the behavior of the modeled system. Parameter identification is therefore a crucial step in almost all approaches for reconstructing system dynamics from measured data, including knowledge-driven and data-driven system identification as well as traditional (human) and automated modeling, i.e., the automated discovery of appropriate model structures and model parameter values by equation discovery tools.
In this dissertation, we address the task of parameter identification in dynamic models of real-life systems. The models are represented by ordinary differential equations (ODEs), as considered in the fields of systems biology and ecological modeling. The task is approached as a least-squares estimation problem within the frequentist framework. The latter means that the model parameters have fixed unique values and their optimal values are the ones that minimize a quadratic cost function, i.e., the sum of squared errors between the model prediction and the experimentally measured data. Least-squares estimation is essentially an optimization task. However, it can turn into a difficult problem for traditional (gradient-based) optimization methods when modeling complex system dynamics. Therefore, it should be addressed by advanced meta-heuristic approaches, such as evolutionary or swarm intelligence methods.
Typically, biological and ecosystem models are nonlinear and have many parameters, the studied systems can often be only partially observed, and their measurements are sparse and imperfect due to noise. All of these constraints can lead to identifiability problems, i.e., the inability to uniquely identify the unknown model parameters, making parameter estimation an even harder optimization task. Furthermore, the implicit definition of the cost function requires expensive numerical ODE simulations that have to be performed for every parameter solution investigated during the optimization process. As a result, parameter identification is a challenging and computationally expensive step in the process of reconstructing the structure and behavior of biological and ecological systems.
This dissertation attempts to improve the quality of reconstructed system dynamics by improving parameter identification. In this context, we perform a thorough empirical evaluation of representative meta-heuristic methods on the task of estimating parameters in two nonlinear ODE models. The considered models describe two practically relevant and representative real-life systems, i.e., endosome maturation in endocytosis and a food web of Lake Bled. The compared meta-heuristic methods are the differential ant-stigmergy algorithm, the continuous differential ant-stigmergy algorithm, particle swarm optimization, and differential evolution. As a baseline method for the experimental comparison, we use Algorithm 717, a gradient-based local search method essentially designed for nonlinear least-squares estimation. Different experimental scenarios are considered to investigate the effect of limited observability of the system dynamics, the influence of the ODE simulation method, and the impact of the noise in the data, on the complexity of the parameter identification task, as well as the applicability and performance of different optimization methods in this context.
The empirical evaluation shows that the meta-heuristic global optimization methods for parameter identification are clearly superior and should be preferred over local optimization methods. While the differences in performance between the different methods within the class of meta-heuristics are not significant across all conditions, differential evolution yields the best results in terms of the quality of the reconstructed system dynamics as well as the speed of convergence. The observability of the system shows a strong influence, where less complete observations make the optimization task much more difficult. The results clearly indicate the importance of choosing a relevant cost function when the modeled systems dynamics is only partially observed. While the use of a simple one-step trapezoidal-based integrator for supervised prediction makes parameter identification much faster, the use of a multistep variable-coefficient integrator for unsupervised prediction produces much better parameter estimates from real-measured data.
Furthermore, we consider the problem of parameter identification within the process of automated modeling of dynamic systems, where a large number of model structures is considered. One major drawback of existing automated modeling approaches is the use of local search methods for parameter identification. In this context, we investigate the influence of parameter identification (in terms of a global and a local optimization method) on the outcome of the automated modeling process, i.e., on what models are selected. We consider eight tasks of automated modeling of phytoplankton dynamics in Lake Bled from single-year data measured in eight different years. The outcome of the experiments empirically demonstrate the benefit of estimating model parameters by global optimization methods for the model (structure) selection process, opening the opportunity to model long term system dynamics.
Many challenges still remain concerning the use of optimization methods for parameter identification in dynamic systems, especially in the context of automated modeling by equation discovery methods. Besides the need to extend our study by including additional dynamic systems from different domains, several lines for further improvement of existing automated modeling methods can be followed. These include the use of more appropriate and informative cost functions, as well as more robust and faster methods for parameter identification. Finally, explicit integration of the feedback from identifiability analysis within the process of model selection is highly desirable.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
widely known under the name of system identification, breaks down into two subtasks.
The first task, referred to as structure identification, is to specify the model structure,
i.e., the functional form of the model. In practice, the model structure is usually given by
a human domain expert and reflects prior domain knowledge: this is called knowledge-
driven identification (as opposed to data-driven identification, which is based only on
data). Structure identification plays an important role in modeling as it defines the
choice available for the selection of the “best model”.
The second task, referred to as parameter identification, aims to estimate the values of
the model parameters that define a best possible fit of the model to the measured data. It
assumes that the model structure is known and the observed system behavior is given in
the form of measured data. Accurate estimation of the model parameters is important for
describing and analyzing the behavior of the modeled system. Parameter identification
is therefore a crucial step in almost all approaches for reconstructing system dynamics
from measured data, including knowledge-driven and data-driven system identification as
well as traditional (human) and automated modeling, i.e., the automated discovery of
appropriate model structures and model parameter values by equation discovery tools.
In this dissertation, we address the task of parameter identification in dynamic mod-
els of real-life systems. The models are represented by ordinary differential equations
(ODEs), as considered in the fields of systems biology and ecological modeling. The task
is approached as a least-squares estimation problem within the frequentist framework.
The latter means that the model parameters have fixed unique values and their optimal
values are the ones that minimize a quadratic cost function, i.e., the sum of squared errors
between the model prediction and the experimentally measured data. Least-squares esti-
mation is essentially an optimization task. However, it can turn into a difficult problem
for traditional (gradient-based) optimization methods when modeling complex system dy-
namics. Therefore, it should be addressed by advanced meta-heuristic approaches, such
as evolutionary or swarm intelligence methods.
Typically, biological and ecosystem models are nonlinear and have many parameters,
the studied systems can often be only partially observed, and their measurements are
sparse and imperfect due to noise. All of these constraints can lead to identifiability
problems, i.e., the inability to uniquely identify the unknown model parameters, making
parameter estimation an even harder optimization task. Furthermore, the implicit def-
inition of the cost function requires expensive numerical ODE simulations that have to
be performed for every parameter solution investigated during the optimization process.
As a result, parameter identification is a challenging and computationally expensive step
in the process of reconstructing the structure and behavior of biological and ecological
systems.
This dissertation attempts to improve the quality of reconstructed system dynamics
by improving parameter identification. In this context, we perform a thorough empirical
evaluation of representative meta-heuristic methods on the task of estimating parameters
in two nonlinear ODE models. The considered models describe two practically rele-
x Abstract
vant and representative real-life systems, i.e., endosome maturation in endocytosis and a
food web of Lake Bled. The compared meta-heuristic methods are the differential ant-
stigmergy algorithm, the continuous differential ant-stigmergy algorithm, particle swarm
optimization, and differential evolution. As a baseline method for the experimental com-
parison, we use Algorithm 717, a gradient-based local search method essentially designed
for nonlinear least-squares estimation. Different experimental scenarios are considered to
investigate the effect of limited observability of the system dynamics, the influence of the
ODE simulation method, and the impact of the noise in the data, on the complexity of
the parameter identification task, as well as the applicability and performance of different
optimization methods in this context.
The empirical evaluation shows that the meta-heuristic global optimization methods
for parameter identification are clearly superior and should be preferred over local opti-
mization methods. While the differences in performance between the different methods
within the class of meta-heuristics are not significant across all conditions, differential
evolution yields the best results in terms of the quality of the reconstructed system dy-
namics as well as the speed of convergence. The observability of the system shows a
strong influence, where less complete observations make the optimization task much more
difficult. The results clearly indicate the importance of choosing a relevant cost function
when the modeled systems dynamics is only partially observed. While the use of a simple
one-step trapezoidal-based integrator for supervised prediction makes parameter identifi-
cation much faster, the use of a multistep variable-coefficient integrator for unsupervised
prediction produces much better parameter estimates from real-measured data.
Furthermore, we consider the problem of parameter identification within the process
of automated modeling of dynamic systems, where a large number of model structures
is considered. One major drawback of existing automated modeling approaches is the
use of local search methods for parameter identification. In this context, we investigate
the influence of parameter identification (in terms of a global and a local optimization
method) on the outcome of the automated modeling process, i.e., on what models are
selected. We consider eight tasks of automated modeling of phytoplankton dynamics in
Lake Bled from single-year data measured in eight different years. The outcome of the
experiments empirically demonstrate the benefit of estimating model parameters by global
optimization methods for the model (structure) selection process, opening the opportunity
to model long term system dynamics.
Many challenges still remain concerning the use of optimization methods for parameter
identification in dynamic systems, especially in the context of automated modeling by
equation discovery methods. Besides the need to extend our study by including additional
dynamic systems from different domains, several lines for further improvement of existing
automated modeling methods can be followed. These include the use of more appropriate
and informative cost functions, as well as more robust and faster methods for parameter
identification. Finally, explicit integration of the feedback from identifiability analysis
within the process of model selection is highly desirable.
2011
Taskova, Katerina; Korošec, Peter; Šilc, Jurij; Džeroski, Sašo
Parameter estimation with bio-inspired meta-heuristic optimization: modeling the dynamics of endocytosis Journal Article
In: BMC Systems Biology, vol. 5, iss. 1, pp. 159, 2011.
Links | BibTeX | Altmetric | PlumX | Tags: machine learning, Parameter estimation
@article{Taskova2011Parameter,
title = {Parameter estimation with bio-inspired meta-heuristic optimization: modeling the dynamics of endocytosis},
author = {Katerina Taskova and Peter Koro\v{s}ec and Jurij \v{S}ilc and Sa\v{s}o D\v{z}eroski},
doi = {10.1186/1752-0509-5-159},
year = {2011},
date = {2011-10-11},
journal = {BMC Systems Biology},
volume = {5},
issue = {1},
pages = {159},
keywords = {machine learning, Parameter estimation},
pubstate = {published},
tppubtype = {article}
}
Taskova, Katerina; Korošec, Peter; Šilc, Jurij
A distributed multilevel ant-colony algorithm for the multi-way graph partitioning Journal Article
In: International Journal of Bio-Inspired Computation, vol. 3, no. 5, pp. 286-296, 2011.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags:
@article{Tashkova2011distributed,
title = {A distributed multilevel ant-colony algorithm for the multi-way graph partitioning},
author = {Katerina Taskova and Peter Koro\v{s}ec and Jurij \v{S}ilc},
url = {https://www.inderscienceonline.com/doi/abs/10.1504/IJBIC.2011.042257},
doi = {10.1504/IJBIC.2011.042257},
year = {2011},
date = {2011-01-01},
urldate = {2011-01-01},
journal = {International Journal of Bio-Inspired Computation},
volume = {3},
number = {5},
pages = {286--296},
abstract = {The graph-partitioning problem arises as a fundamental problem in many important scientific and engineering applications. A variety of optimisation methods are used for solving this problem and among them the meta-heuristics outstand for its efficiency and robustness. Here, we address the performance of the distributed multilevel ant-colony algorithm (DMACA), a meta-heuristic approach for solving the multi-way graph partitioning problem, which is based on the ant-colony optimisation paradigm and is integrated with a multilevel procedure. The basic idea of the DMACA consists of parallel, independent runs enhanced with cooperation in the form of a solution exchange among the concurrent searches. The objective of the DMACA is to reduce the overall computation time, while preserving the quality of the solutions obtained by the sequential version. The experimental evaluation on a two-way and four-way partitioning with 1% and 5% imbalance confirms that with respect to the sequential version, the DMACA obtains statistically, equally good solutions at a 99% confidence level within a reduced overall computation time.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2010
Hardy, Barry; Douglas, Nicki; Helma, Christoph; Rautenberg, Micha; Jeliazkova, Nina; Jeliazkov, Vedrin; Nikolova, Ivelina; Benigni, Romualdo; Tcheremenskaia, Olga; Kramer, Stefan; Girschick, Tobias; Buchwald, Fabian; Wicker, Jörg; Karwath, Andreas; Gütlein, Martin; Maunz, Andreas; Sarimveis, Haralambos; Melagraki, Georgia; Afantitis, Antreas; Sopasakis, Pantelis; Gallagher, David; Poroikov, Vladimir; Filimonov, Dmitry; Zakharov, Alexey; Lagunin, Alexey; Gloriozova, Tatyana; Novikov, Sergey; Skvortsova, Natalia; Druzhilovsky, Dmitry; Chawla, Sunil; Ghosh, Indira; Ray, Surajit; Patel, Hitesh; Escher, Sylvia
Collaborative development of predictive toxicology applications Journal Article
In: Journal of Cheminformatics, vol. 2, no. 1, pp. 7, 2010, ISSN: 1758-2946.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: cheminformatics, computational sustainability, data mining, machine learning, REST, toxicity
@article{hardy2010collaborative,
title = {Collaborative development of predictive toxicology applications},
author = {Barry Hardy and Nicki Douglas and Christoph Helma and Micha Rautenberg and Nina Jeliazkova and Vedrin Jeliazkov and Ivelina Nikolova and Romualdo Benigni and Olga Tcheremenskaia and Stefan Kramer and Tobias Girschick and Fabian Buchwald and J\"{o}rg Wicker and Andreas Karwath and Martin G\"{u}tlein and Andreas Maunz and Haralambos Sarimveis and Georgia Melagraki and Antreas Afantitis and Pantelis Sopasakis and David Gallagher and Vladimir Poroikov and Dmitry Filimonov and Alexey Zakharov and Alexey Lagunin and Tatyana Gloriozova and Sergey Novikov and Natalia Skvortsova and Dmitry Druzhilovsky and Sunil Chawla and Indira Ghosh and Surajit Ray and Hitesh Patel and Sylvia Escher},
url = {http://www.jcheminf.com/content/2/1/7},
doi = {10.1186/1758-2946-2-7},
issn = {1758-2946},
year = {2010},
date = {2010-01-01},
journal = {Journal of Cheminformatics},
volume = {2},
number = {1},
pages = {7},
abstract = {OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals.The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation.Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.},
keywords = {cheminformatics, computational sustainability, data mining, machine learning, REST, toxicity},
pubstate = {published},
tppubtype = {article}
}
Wicker, Jörg; Fenner, Kathrin; Ellis, Lynda; Wackett, Larry; Kramer, Stefan
Predicting biodegradation products and pathways: a hybrid knowledge- and machine learning-based approach Journal Article
In: Bioinformatics, vol. 26, no. 6, pp. 814-821, 2010.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: biodegradation, cheminformatics, computational sustainability, enviPath, machine learning, metabolic pathways
@article{wicker2010predicting,
title = {Predicting biodegradation products and pathways: a hybrid knowledge- and machine learning-based approach},
author = {J\"{o}rg Wicker and Kathrin Fenner and Lynda Ellis and Larry Wackett and Stefan Kramer},
url = {http://bioinformatics.oxfordjournals.org/content/26/6/814.full},
doi = {10.1093/bioinformatics/btq024},
year = {2010},
date = {2010-01-01},
journal = {Bioinformatics},
volume = {26},
number = {6},
pages = {814--821},
publisher = {Oxford University Press},
abstract = {Motivation: Current methods for the prediction of biodegradation products and pathways of organic environmental pollutants either do not take into account domain knowledge or do not provide probability estimates. In this article, we propose a hybrid knowledge- and machine learning-based approach to overcome these limitations in the context of the University of Minnesota Pathway Prediction System (UM-PPS). The proposed solution performs relative reasoning in a machine learning framework, and obtains one probability estimate for each biotransformation rule of the system. As the application of a rule then depends on a threshold for the probability estimate, the trade-off between recall (sensitivity) and precision (selectivity) can be addressed and leveraged in practice. Results: Results from leave-one-out cross-validation show that a recall and precision of $\sim$0.8 can be achieved for a subset of 13 transformation rules. Therefore, it is possible to optimize precision without compromising recall. We are currently integrating the results into an experimental version of the UM-PPS server. Availability: The program is freely available on the web at http://wwwkramer.in.tum.de/research/applications/biodegradation/data. Contact: kramer@in.tum.de},
keywords = {biodegradation, cheminformatics, computational sustainability, enviPath, machine learning, metabolic pathways},
pubstate = {published},
tppubtype = {article}
}
Wicker, Jörg; Richter, Lothar; Kramer, Stefan
SINDBAD and SiQL: Overview, Applications and Future Developments Book Section
In: Džeroski, Sašo; Goethals, Bart; Panov, Panče (Ed.): Inductive Databases and Constraint-Based Data Mining, pp. 289-309, Springer New York, 2010, ISBN: 978-1-4419-7737-3.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: data mining, inductive databases, machine learning, query languages
@incollection{wicker2010sindbad,
title = {SINDBAD and SiQL: Overview, Applications and Future Developments},
author = {J\"{o}rg Wicker and Lothar Richter and Stefan Kramer},
editor = {Sa\v{s}o D\v{z}eroski and Bart Goethals and Pan\v{c}e Panov},
doi = {10.1007/978-1-4419-7738-0_12},
isbn = {978-1-4419-7737-3},
year = {2010},
date = {2010-01-01},
booktitle = {Inductive Databases and Constraint-Based Data Mining},
pages = {289--309},
publisher = {Springer New York},
abstract = {The chapter gives an overview of the current state of the Sindbad system and planned extensions. Following an introduction to the system and its query language SiQL, we present application scenarios from the areas of gene expression/regulation and small molecules. Next, we describe a web service interface to Sindbad that enables new possibilities for inductive databases (distributing tasks over multiple servers, language and platform independence, \ldots). Finally, we discuss future plans for the system, in particular, to make the system more `declarative' by the use of signatures, to integrate the useful concept of mining views into the system, and to support specific pattern domains like graphs and strings.},
keywords = {data mining, inductive databases, machine learning, query languages},
pubstate = {published},
tppubtype = {incollection}
}
Korošec, Peter; Taskova, Katerina; Šilc, Jurij
The differential Ant-Stigmergy Algorithm for large-scale global optimization Proceedings Article
In: IEEE Congress on Evolutionary Computation, pp. 1-8, 2010.
Links | BibTeX | Altmetric | PlumX | Tags:
@inproceedings{5586201,
title = {The differential Ant-Stigmergy Algorithm for large-scale global optimization},
author = {Peter Koro{\v{s}}ec and Katerina Ta{\v{s}}kova and Jurij {\v{S}}ilc},
doi = {10.1109/CEC.2010.5586201},
year = {2010},
date = {2010-01-01},
urldate = {2010-01-01},
booktitle = {IEEE Congress on Evolutionary Computation},
pages = {1--8},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Taškova, Katerina; Korošec, Peter; Šilc, Jurij
A Distributed Multilevel Ant-Colony Approach for Finite Element Mesh Decomposition Proceedings Article
In: Wyrzykowski, Roman; Dongarra, Jack; Karczewski, Konrad; Wasniewski, Jerzy (Ed.): Parallel Processing and Applied Mathematics, pp. 398–407, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, ISBN: 978-3-642-14403-5.
@inproceedings{taskova2010distributed,
title = {A Distributed Multilevel Ant-Colony Approach for Finite Element Mesh Decomposition},
author = {Katerina Ta{\v{s}}kova and Peter Koro{\v{s}}ec and Jurij {\v{S}}ilc},
editor = {Roman Wyrzykowski and Jack Dongarra and Konrad Karczewski and Jerzy Wasniewski},
isbn = {978-3-642-14403-5},
year = {2010},
date = {2010-01-01},
urldate = {2010-01-01},
booktitle = {Parallel Processing and Applied Mathematics},
pages = {398--407},
publisher = {Springer Berlin Heidelberg},
address = {Berlin, Heidelberg},
abstract = {The k-way finite element mesh (FEM) decomposition problem is an NP-complete problem, which consists of finding a decomposition of a FEM into k balanced submeshes such that the number of cut edges is minimized. The multilevel ant-colony algorithm (MACA) is quite new and promising hybrid approach for solving different type of FEM-decomposition problems. The MACA is a swarm-based algorithm and therefore inherently suitable for parallel processing on many levels. Motivated by the good performance of the MACA and the possibility to improve its performance (computational cost and/or solution quality), in this paper we discuss the results of parallelizing the MACA on largest scale (on colony level). Explicitly, we present the distributed MACA (DMACA) approach, which is based on the idea of parallel independent runs enhanced with cooperation in form of a solution exchange among the concurrent searches. Experimental evaluation of the DMACA on a larger set of benchmark FEM-decomposition problems shows that the DMACA compared to the MACA can obtain solutions of equal quality in less computational time.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2008
Wicker, Jörg; Richter, Lothar; Kessler, Kristina; Kramer, Stefan
SINDBAD and SiQL: An Inductive Database and Query Language in the Relational Model Proceedings Article
In: Daelemans, Walter; Goethals, Bart; Morik, Katharina (Ed.): Machine Learning and Knowledge Discovery in Databases, pp. 690–694, Springer Berlin Heidelberg, 2008, ISBN: 978-3-540-87480-5.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: data mining, inductive databases, machine learning, query languages
@inproceedings{wicker2008sindbad,
title = {SINDBAD and SiQL: An Inductive Database and Query Language in the Relational Model},
author = {J{\"o}rg Wicker and Lothar Richter and Kristina Kessler and Stefan Kramer},
editor = {Walter Daelemans and Bart Goethals and Katharina Morik},
url = {http://dx.doi.org/10.1007/978-3-540-87481-2_48},
doi = {10.1007/978-3-540-87481-2_48},
isbn = {978-3-540-87480-5},
year = {2008},
date = {2008-01-01},
booktitle = {Machine Learning and Knowledge Discovery in Databases},
volume = {5212},
pages = {690--694},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
abstract = {In this demonstration, we will present the concepts and an implementation of an inductive database \textendash as proposed by Imielinski and Mannila \textendash in the relational model. The goal is to support all steps of the knowledge discovery process on the basis of queries to a database system. The query language SiQL (structured inductive query language), an SQL extension, offers query primitives for feature selection, discretization, pattern mining, clustering, instance-based learning and rule induction. A prototype system processing such queries was implemented as part of the SINDBAD (structured inductive database development) project. To support the analysis of multi-relational data, we incorporated multi-relational distance measures based on set distances and recursive descent. The inclusion of rule-based classification models made it necessary to extend the data model and software architecture significantly. The prototype is applied to three different data sets: gene expression analysis, gene regulation prediction and structure-activity relationships (SARs) of small molecules.},
keywords = {data mining, inductive databases, machine learning, query languages},
pubstate = {published},
tppubtype = {inproceedings}
}
Richter, Lothar; Wicker, Jörg; Kessler, Kristina; Kramer, Stefan
An Inductive Database and Query Language in the Relational Model Proceedings Article
In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, pp. 740–744, ACM, 2008, ISBN: 978-1-59593-926-5.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: data mining, inductive databases, machine learning, query languages
@inproceedings{richter2008inductive,
title = {An Inductive Database and Query Language in the Relational Model},
author = {Lothar Richter and J{\"o}rg Wicker and Kristina Kessler and Stefan Kramer},
url = {https://wicker.nz/nwp-acm/authorize.php?id=N10033
http://doi.acm.org/10.1145/1353343.1353440},
doi = {10.1145/1353343.1353440},
isbn = {978-1-59593-926-5},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology},
pages = {740--744},
publisher = {ACM},
series = {EDBT '08},
abstract = {In the demonstration, we will present the concepts and an implementation of an inductive database -- as proposed by Imielinski and Mannila -- in the relational model. The goal is to support all steps of the knowledge discovery process, from pre-processing via data mining to post-processing, on the basis of queries to a database system. The query language SIQL (structured inductive query language), an SQL extension, offers query primitives for feature selection, discretization, pattern mining, clustering, instance-based learning and rule induction. A prototype system processing such queries was implemented as part of the SINDBAD (structured inductive database development) project. Key concepts of this system, among others, are the closure of operators and distances between objects. To support the analysis of multi-relational data, we incorporated multi-relational distance measures based on set distances and recursive descent. The inclusion of rule-based classification models made it necessary to extend the data model and the software architecture significantly. The prototype is applied to three different applications: gene expression analysis, gene regulation prediction and structure-activity relationships (SARs) of small molecules.},
keywords = {data mining, inductive databases, machine learning, query languages},
pubstate = {published},
tppubtype = {inproceedings}
}
Wicker, Jörg; Brosdau, Christoph; Richter, Lothar; Kramer, Stefan
SINDBAD SAILS: A Service Architecture for Inductive Learning Schemes Proceedings Article
In: Proceedings of the First Workshop on Third Generation Data Mining: Towards Service-Oriented Knowledge Discovery, 2008.
Abstract | Links | BibTeX | Tags: data mining, inductive databases, machine learning, query languages
@inproceedings{wicker2008sindbadsails,
title = {SINDBAD SAILS: A Service Architecture for Inductive Learning Schemes},
author = {J{\"o}rg Wicker and Christoph Brosdau and Lothar Richter and Stefan Kramer},
url = {http://www.ecmlpkdd2008.org/files/pdf/workshops/sokd/2.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the First Workshop on Third Generation Data Mining: Towards Service-Oriented Knowledge Discovery},
abstract = {The paper presents SINDBAD SAILS (Service Architecture for Inductive Learning Schemes), a Web Service interface to the inductive database SINDBAD. To the best of our knowledge, it is the first time a Web Service interface is provided for an inductive database. The combination of service-oriented architectures and inductive databases is particularly useful, as it enables distributed data mining without the need to install specialized data mining or machine learning software. Moreover, inductive queries can easily be used in almost any kind of programming language. The paper discusses the underlying concepts and explains a sample program making use of SINDBAD SAILS.},
keywords = {data mining, inductive databases, machine learning, query languages},
pubstate = {published},
tppubtype = {inproceedings}
}
Wicker, Jörg; Fenner, Kathrin; Ellis, Lynda; Wackett, Larry; Kramer, Stefan
Machine Learning and Data Mining Approaches to Biodegradation Pathway Prediction Proceedings Article
In: Bridewell, Will; Calders, Toon; Medeiros, Ana Karla; Kramer, Stefan; Pechenizkiy, Mykola; Todorovski, Ljupco (Ed.): Proceedings of the Second International Workshop on the Induction of Process Models at ECML PKDD 2008, 2008.
Links | BibTeX | Tags: biodegradation, cheminformatics, computational sustainability, enviPath, machine learning, metabolic pathways
@inproceedings{wicker2008machine,
title = {Machine Learning and Data Mining Approaches to Biodegradation Pathway Prediction},
author = {J{\"o}rg Wicker and Kathrin Fenner and Lynda Ellis and Larry Wackett and Stefan Kramer},
editor = {Will Bridewell and Toon Calders and Ana Karla Medeiros and Stefan Kramer and Mykola Pechenizkiy and Ljupco Todorovski},
url = {http://www.ecmlpkdd2008.org/files/pdf/workshops/ipm/9.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the Second International Workshop on the Induction of Process Models at ECML PKDD 2008},
keywords = {biodegradation, cheminformatics, computational sustainability, enviPath, machine learning, metabolic pathways},
pubstate = {published},
tppubtype = {inproceedings}
}
2006
Kramer, Stefan; Aufschild, Volker; Hapfelmeier, Andreas; Jarasch, Alexander; Kessler, Kristina; Reckow, Stefan; Wicker, Jörg; Richter, Lothar
Inductive Databases in the Relational Model: The Data as the Bridge Proceedings Article
In: Bonchi, Francesco; Boulicaut, Jean-François (Ed.): Knowledge Discovery in Inductive Databases, pp. 124–138, Springer Berlin Heidelberg, 2006, ISBN: 978-3-540-33292-3.
Abstract | Links | BibTeX | Altmetric | PlumX | Tags: data mining, inductive databases, machine learning, query languages
@inproceedings{kramer2006inductive,
title = {Inductive Databases in the Relational Model: The Data as the Bridge},
author = {Stefan Kramer and Volker Aufschild and Andreas Hapfelmeier and Alexander Jarasch and Kristina Kessler and Stefan Reckow and J{\"o}rg Wicker and Lothar Richter},
editor = {Francesco Bonchi and Jean-Fran{\c{c}}ois Boulicaut},
url = {http://dx.doi.org/10.1007/11733492_8},
doi = {10.1007/11733492_8},
isbn = {978-3-540-33292-3},
year = {2006},
date = {2006-01-01},
booktitle = {Knowledge Discovery in Inductive Databases},
volume = {3933},
pages = {124--138},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
abstract = {We present a new and comprehensive approach to inductive databases in the relational model. The main contribution is a new inductive query language extending SQL, with the goal of supporting the whole knowledge discovery process, from pre-processing via data mining to post-processing. A prototype system supporting the query language was developed in the SINDBAD (structured inductive database development) project. Setting aside models and focusing on distance-based and instance-based methods, closure can easily be achieved. An example scenario from the area of gene expression data analysis demonstrates the power and simplicity of the concept. We hope that this preliminary work will help to bring the fundamental issues, such as the integration of various pattern domains and data mining techniques, to the attention of the inductive database community.},
keywords = {data mining, inductive databases, machine learning, query languages},
pubstate = {published},
tppubtype = {inproceedings}
}
