
# Publications

## Our published Work!

### 2020

• D. Seçilmiş, D. Morgan, T. Hillerton, A. Tjärnberg, S. Nelander, T. E. M. Nordling, and E. L. L. Sonnhammer, “Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data,” submitted, 2020.

The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights of a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where approximately 1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR) level causing them to be too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using a selection criterion based on SNR, until reaching an informative subset. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized and potential novel cancer-related regulatory interactions were identified.

@article{Secilmis2020,
abstract = {The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights of a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where approximately 1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR) level causing them to be too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using a selection criterion based on SNR, until reaching an informative subset. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized and potential novel cancer-related regulatory interactions were identified.},
author = {Se\c{c}ilmi\c{s}, Deniz and Daniel Morgan and Thomas Hillerton and Tj{\"{a}}rnberg, Andreas and Sven Nelander and Nordling, Torbj{\"{o}}rn E M and Erik L L Sonnhammer},
keywords = {Network inference},
mendeley-groups = {Unpublished},
mendeley-tags = {Network inference},
pages = {submitted},
title = {{Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data}},
year = {2020}
}

### 2019

• J. Yu, C. Wang, H. Wu, S. Zhang, and T. E. M. Nordling, Smart toys and tools for quantification of development of infants and symptoms of Parkinson’s patients, Oslo: IEEE, 2019.

Despite the revolutionary progress in understanding of human and artificial intelligence over the past five years, characterised by Hawkins’ the Thousand Brains Theory of Intelligence and AlphaGo’s victory over Go world champions Lee Sedol and Ke Jie, many questions remain. We are interested in the development of the sensorimotor function in infants and degradation of it in Parkinson’s patients. We believe that new insights could be gained through long-term longitudinal studies with daily quantification of the motor skills. Towards this aim we here report on four prototype tools for data collection from infants or Parkinson’s patients–a smart drum, a smart bongo, a smart pacifier, and an App for quantifying finger tapping. The smart drum and bongo are commercial toys updated with embedded electronics for internet access and two-way communication with the toys through the chat App Telegram. The smart pacifier is based on a Philips Soothie pacifier updated with an inertial measurement unit (IMU) with Bluetooth for tracking of movement in real-time. The App for quantifying finger tapping is based on the Movement Disorder Society-Sponsored Revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) finger tap test that is widely used to evaluate Bradykinesia (slowness and difficulty to start movement). We report on the design and first field test. These tools show potential for long-term longitudinal data collection in the home of children or Parkinson’s patients. We welcome suggestions on how to improve the design and new collaborations applying these tools.

@misc{Yu2019ICDL-EpiRob,
abstract = {Despite the revolutionary progress in understanding of human and artificial intelligence over the past five years, characterised by Hawkins' the Thousand Brains Theory of Intelligence and AlphaGo's victory over Go world champions Lee Sedol and Ke Jie, many questions remain. We are interested in the development of the sensorimotor function in infants and degradation of it in Parkinson's patients. We believe that new insights could be gained through long-term longitudinal studies with daily quantification of the motor skills. Towards this aim we here report on four prototype tools for data collection from infants or Parkinson's patients–a smart drum, a smart bongo, a smart pacifier, and an App for quantifying finger tapping. The smart drum and bongo are commercial toys updated with embedded electronics for internet access and two-way communication with the toys through the chat App Telegram. The smart pacifier is based on a Philips Soothie pacifier updated with an inertial measurement unit (IMU) with Bluetooth for tracking of movement in real-time. The App for quantifying finger tapping is based on the Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) finger tap test that is widely used to evaluate Bradykinesia (slowness and difficulty to start movement). We report on the design and first field test. These tools show potential for long-term longitudinal data collection in the home of children or Parkinson's patients. We welcome suggestions on how to improve the design and new collaborations applying these tools.},
author = {Yu, Jing-Chi Alan' and Wang, Chien-Chih Jack' and Wu, Hsi-Chih Butters' and Zhang, Shun-Jie and Nordling, Torbj{\"{o}}rn E. M.},
booktitle = {9th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, Oslo (Norway)},
howpublished = {9th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, Oslo (Norway)},
keywords = {IoT,Smart toys},
mendeley-tags = {IoT,Smart toys},
month = {aug},
publisher = {IEEE},
title = {{Smart toys and tools for quantification of development of infants and symptoms of Parkinson's patients}},
type = {misc},
url = {https://icdl-epirob2019.org},
year = {2019}
}

• T. E. M. Nordling, “Automation, Digitalisation, Industry 4.0, and Artificial Intelligence Trends and Case Studies,” in The 4th International Conference on Mechanical Engineering (ICOME 2019), Yogyakarta, Indonesia, 2019.

@inproceedings{Nordling2019ICOME,
author = {Nordling, Torbj{\"{o}}rn E. M.},
booktitle = {The 4th International Conference on Mechanical Engineering (ICOME 2019) in Yogyakarta (Indonesia)},
editor = {Mubarok, Fahmi},
keywords = {Automation,Digitalisation,Industry 4.0,artificial intelligence,deep learning},
month = {aug},
publisher = {Dept. of Mechanical Engineering, Institut Teknologi Sepuluh Nopember},
title = {{Automation, Digitalisation, Industry 4.0, and Artificial Intelligence Trends and Case Studies}},
url = {https://elib.its.ac.id/conf/icome/?page_id=147},
year = {2019}
}

• F. Hsia, D. Tang, W. Jevasuwan, N. Fukata, X. Zhou, M. Mitome, Y. Bando, T. E. M. Nordling, and D. Golberg, “Realization and direct observation of five normal and parametric modes in silicon nanowire resonators by in situ transmission electron microscopy,” Nanoscale Advances, vol. 1, iss. 5, p. 1784–1790, 2019. doi:10.1039/C8NA00373D

Mechanical resonators have wide applications in sensing motions, chemical and bio-substances, and provide an accurate method to measure intrinsic elastic properties of the oscillating material. A high resonant order with high response frequency and a small resonator mass is critical for enhancing the sensibility and precision. Here, we report on the realization and direct observation of high-order and high-frequency silicon nanowire (Si NW) resonators. By using an oscillating electric-field inducing a mechanical resonance of the single-crystalline Si NWs inside a transmission electron microscope (TEM), we observed the resonance up to the 5th order, for both normal and parametric modes at ~100 MHz frequencies. The precision of the resonant frequency was enhanced from 3.14% at the 1st order to 0.25% at the 5th order, correlating to the increase of energy dissipation. The elastic modulus of Si NWs was measured to be ~169 GPa in [110] direction, size scaling effects were found to be absent down to ~20 nm level.

@article{Hsia2019,
abstract = {Mechanical resonators have wide applications in sensing motions, chemical and bio-substances, and provide an accurate method to measure intrinsic elastic properties of the oscillating material. A high resonant order with high response frequency and a small resonator mass is critical for enhancing the sensibility and precision. Here, we report on the realization and direct observation of high-order and high-frequency silicon nanowire (Si NW) resonators. By using an oscillating electric-field inducing a mechanical resonance of the single-crystalline Si NWs inside a transmission electron microscope (TEM), we observed the resonance up to the 5th order, for both normal and parametric modes at $\sim$100 MHz frequencies. The precision of the resonant frequency was enhanced from 3.14\% at the 1st order to 0.25\% at the 5th order, correlating to the increase of energy dissipation. The elastic modulus of Si NWs was measured to be $\sim$169 GPa in [110] direction, size scaling effects were found to be absent down to $\sim$20 nm level.},
author = {Feng-Chun Hsia and Dai-Ming Tang and Wipakorn Jevasuwan and Naoki Fukata and Xin Zhou and Masanori Mitome and Yoshio Bando and Nordling, Torbj{\"{o}}rn E. M. and Dmitri Golberg},
doi = {10.1039/C8NA00373D},
issn = {2516-0230},
month = {feb},
number = {5},
pages = {1784--1790},
publisher = {RSC},
title = {{Realization and direct observation of five normal and parametric modes in silicon nanowire resonators by in situ transmission electron microscopy}},
volume = {1},
year = {2019}
}

• B. J. A. van Bueren, M. A. A. M. Leenders, and T. E. M. Nordling, “Case Study: Taiwan’s pathway into a circular future for buildings,” IOP Conference Series: Earth and Environmental Science, vol. 225, p. 012060, 2019. doi:10.1088/1755-1315/225/1/012060

The aim of this paper is to explore successful paths and potential obstacles for introducing circular buildings to a region new to the strategy of Circular Economy (CE). For this, the process of circular buildings development in Taiwan is analysed. In 2016, the government of Taiwan passed an act that put a focus on CE. Taiwan entered this field with nearly no prior experience. This paper analyses three cases: The Holland Pavilion for the World Flora Expo Taichung; the TaiSugar Circular Village Tainan; and the CE Social Housing Taipei. Interestingly, Taiwan chose the Netherlands as a country for guidance on best practices and the path to implementation. Our analysis focuses on barriers and opportunities found in the initiation, commissioning, and the ongoing development process of these projects. Data is collected through interviews with 30 stakeholders, from government, industries and academia who are involved in the projects. International collaboration is shown to have speeded up the CE building innovation process in Taiwan.

@article{vanBueren2019,
abstract = {The aim of this paper is to explore successful paths and potential obstacles for introducing circular buildings to a region new to the strategy of Circular Economy (CE). For this, the process of circular buildings development in Taiwan is analysed. In 2016, the government of Taiwan passed an act that put a focus on CE. Taiwan entered this field with nearly no prior experience. This paper analyses three cases: The Holland Pavilion for the World Flora Expo Taichung; the TaiSugar Circular Village Tainan; and the CE Social Housing Taipei. Interestingly, Taiwan chose the Netherlands as a country for guidance on best practices and the path to implementation. Our analysis focuses on barriers and opportunities found in the initiation, commissioning, and the ongoing development process of these projects. Data is collected through interviews with 30 stakeholders, from government, industries and academia who are involved in the projects. International collaboration is shown to have speeded up the CE building innovation process in Taiwan.},
author = {Bart J A van Bueren and Mark A A M Leenders and Nordling, Torbj{\"{o}}rn E M},
doi = {10.1088/1755-1315/225/1/012060},
howpublished = {SBE19 Brussels - BAMB-CIRCPATH "Buildings as Material Banks - A Pathway For A Circular Future" 5–7 February 2019, Brussels, Belgium},
issn = {1755-1315},
journal = {IOP Conference Series: Earth and Environmental Science},
keywords = {Circular buildings,Taiwan},
month = {feb},
pages = {012060},
publisher = {IOP Publishing},
title = {{Case Study: Taiwan's pathway into a circular future for buildings}},
url = {http://stacks.iop.org/1755-1315/225/i=1/a=012060?key=crossref.1fd4cbb2e2f7a49780cb4acf73093d80},
volume = {225},
year = {2019}
}

• D. Morgan, A. Tjärnberg, T. E. M. Nordling, and E. L. L. Sonnhammer, “A generalized framework for controlling FDR in gene regulatory network inference,” Bioinformatics, vol. 35, iss. 6, p. 1026–1032, 2019. doi:10.1093/bioinformatics/bty764

Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights of a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. Results: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. An improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences. Availability: https://bitbucket.org/sonnhammergrni/genespider/src/NB/%2BMethods/NestBoot.m

@article{Morgan2019,
abstract = {Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights of a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. Results: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. An improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences. Availability: https://bitbucket.org/sonnhammergrni/genespider/src/NB/%2BMethods/NestBoot.m},
author = {Daniel Morgan and Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Erik L L Sonnhammer},
doi = {10.1093/bioinformatics/bty764},
editor = {Berger, Bonnie},
issn = {1367-4803},
journal = {Bioinformatics},
month = {mar},
number = {6},
pages = {1026--1032},
title = {{A generalized framework for controlling FDR in gene regulatory network inference}},
volume = {35},
year = {2019}
}

### 2018

• Y. Liou, P. Tsai, Z. Huang, P. Chiou, H. Chu, L. Ciou, and T. E. M. Nordling, “FFANEprot: Predicting Protein Functions using a Weight-sharing Multitask Neural Network Optimized by a Firefly Algorithm with Natural Enemy Strategy,” in 17th International Conference on Bioinformatics (INCoB-2018), New Delhi, India, 2018.

Background: The prediction of multiple functions of several proteins at once from the protein sequence alone is essential, but difficult. To solve this problem, we composed a dataset of 81,267 proteins and 1,169 Gene Ontology (GO) terms of the molecular function (MF) from the Swiss-Prot database, and used weight-sharing and multi-task learning to create FFANEprot. Results: The architecture of FFANEprot was optimised by a Firefly algorithm with a natural enemy strategy (FFANE), i.e. periodic reversals. The training and test Matthews correlation coefficients (accuracies) are 0.52 (98.84%) and 0.49 (98.67%), respectively. When analysing the trained networks, we found many completely inhibitory neurons, which typically have a small kernel size and occupy approximately 30% of the CNNs. Conclusion: FFANEprot can predict GO MF terms with high accuracy from sequence alone. Our FFANEnet source code is available at http://ffanenet.nordlinglab.org.

@inproceedings{Liou2018,
abstract = {Background: The prediction of multiple functions of several proteins at once from the protein sequence alone is essential, but difficult. To solve this problem, we composed a dataset of 81,267 proteins and 1,169 Gene Ontology (GO) terms of the molecular function (MF) from the Swiss-Prot database, and used weight-sharing and multi-task learning to create FFANEprot. Results: The architecture of FFANEprot was optimised by a Firefly algorithm with a natural enemy strategy (FFANE), i.e. periodic reversals. The training and test Matthews correlation coefficients (accuracies) are 0.52 (98.84%) and 0.49 (98.67%), respectively. When analysing the trained networks, we found many completely inhibitory neurons, which typically have a small kernel size and occupy approximately 30% of the CNNs. Conclusion: FFANEprot can predict GO MF terms with high accuracy from sequence alone. Our FFANEnet source code is available at http://ffanenet.nordlinglab.org.},
annote = {On behalf of the Asia Pacific Bioinformatics Network (APBioNet) and organizing partners, it is our great honour to invite you to join the 17th International Conference On BioInformatics (InCOB-2018), which will be held on 26-28 September, 2018 at Jawaharlal Nehru University (JNU), New Delhi.},
author = {Yi-Fan Liou and Po-Jung Tsai and Zi-Yu Huang and Po-Chin Chiou and Hsiao-Wei Chu and Li-Ping Ciou and Nordling, Torbj{\"{o}}rn E. M.},
booktitle = {17th International Conference on Bioinformatics (INCoB-2018)},
keywords = {Convolutional neural network,Deep learning,Evolutionary algorithm,Firefly algorithm,Inhibitory neurons,Natural enemy strategy},
month = {sep},
publisher = {Asia Pacific Bioinformatics Network (APBioNet)},
title = {{FFANEprot: Predicting Protein Functions using a Weight-sharing Multitask Neural Network Optimized by a Firefly Algorithm with Natural Enemy Strategy}},
type = {oral presentation},
url = {http://www.incob2018.org/},
year = {2018}
}

• Y. Wu, R. Yamanaka, N. Hiroi, and T. E. M. Nordling, Image Analysis for Cell Adaptation Against Environmental Stress, Sanyo-Onoda: Sanyo-Onoda City University, 2018.

An introduction to the study of cell adaptation to environmental stress using a microfluidic device and image analysis techniques.

@misc{Rain2018NorikoPoster,
abstract = {An introduction to the study of cell adaptation to environmental stress using a microfluidic device and image analysis techniques.},
author = {Wu, Yu-Heng and Yamanaka, Ryu and Noriko Hiroi and Nordling, Torbj{\"{o}}rn E M},
booktitle = {CELLAB-SOCU Summer Symposium in Sanyo-Onoda, Japan},
editor = {Noriko Hiroi},
howpublished = {CELLAB-SOCU Summer Symposium},
keywords = {Cell segmentation,Microfluidic},
mendeley-tags = {Cell segmentation,Microfluidic},
month = {sep},
publisher = {Sanyo-Onoda City University},
title = {{Image Analysis for Cell Adaptation Against Environmental Stress}},
type = {Poster},
url = {http://pc4ls.rs.socu.ac.jp/cellab-socu_sympo.html},
year = {2018}
}

• W. Wu, H. Tu, Y. Chu, T. E. M. Nordling, Y. Tseng, and H. Liaw, “YHMI: a web tool to identify histone modifications and histone/chromatin regulators from a gene list in yeast,” Database, vol. 2018, 2018. doi:10.1093/database/bay116

Post-translational modifications of histones (e.g. acetylation, methylation, phosphorylation and ubiquitination) play crucial roles in regulating gene expression by altering chromatin structures and creating docking sites for histone/chromatin regulators. However, the combination patterns of histone modifications, regulatory proteins and their corresponding target genes remain incompletely understood. Therefore, it is advantageous to have a tool for the enrichment/depletion analysis of histone modifications and histone/chromatin regulators from a gene list. Many ChIP-chip/ChIP-seq datasets of histone modifications and histone/chromatin regulators in yeast can be found in the literature. Knowing the needs and having the data motivate us to develop a web tool, called Yeast Histone Modifications Identifier (YHMI), which can identify the enriched/depleted histone modifications and the enriched histone/chromatin regulators from a list of yeast genes. Both tables and figures are provided to visualize the identification results. Finally, the high-quality and biological insight of the identification results are demonstrated by two case studies. We believe that YHMI is a valuable tool for yeast biologists to do epigenetics research.

@article{Wu2018YHMI,
abstract = {Post-translational modifications of histones (e.g. acetylation, methylation, phosphorylation and ubiquitination) play crucial roles in regulating gene expression by altering chromatin structures and creating docking sites for histone/chromatin regulators. However, the combination patterns of histone modifications, regulatory proteins and their corresponding target genes remain incompletely understood. Therefore, it is advantageous to have a tool for the enrichment/depletion analysis of histone modifications and histone/chromatin regulators from a gene list. Many ChIP-chip/ChIP-seq datasets of histone modifications and histone/chromatin regulators in yeast can be found in the literature. Knowing the needs and having the data motivate us to develop a web tool, called Yeast Histone Modifications Identifier (YHMI), which can identify the enriched/depleted histone modifications and the enriched histone/chromatin regulators from a list of yeast genes. Both tables and figures are provided to visualize the identification results. Finally, the high-quality and biological insight of the identification results are demonstrated by two case studies. We believe that YHMI is a valuable tool for yeast biologists to do epigenetics research.},
author = {Wei-Sheng Wu and Hao-Ping Tu and Yu-Han Chu and Nordling, Torbj{\"{o}}rn E M and Yan-Yuan Tseng and Hung-Jiun Liaw},
doi = {10.1093/database/bay116},
issn = {1758-0463},
journal = {Database},
month = {jan},
pmid = {30371756},
title = {{YHMI: a web tool to identify histone modifications and histone/chromatin regulators from a gene list in yeast}},
volume = {2018},
year = {2018}
}

• S. Fourati, A. Talla, M. Mahmoudian, J. G. Burkhart, R. Klén, R. Henao, T. Yu, Z. Aydın, K. Y. Yeung, M. E. Ahsen, R. Almugbel, S. Jahandideh, X. Liang, T. E. M. Nordling, M. Shiga, A. Stanescu, R. Vogel, G. Pandey, C. Chiu, M. T. McClain, C. W. Woods, G. S. Ginsburg, L. L. Elo, E. L. Tsalik, L. M. Mangravite, and S. K. Sieberts, “A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection,” Nature Communications, vol. 9, iss. 1, p. 4418, 2018. doi:10.1038/s41467-018-06735-8

The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveal little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.

@article{Fourati2018,
abstract = {The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveal little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.},
author = {Slim Fourati and Aarthi Talla and Mehrad Mahmoudian and Joshua G Burkhart and Kl{\'{e}}n, Riku and Ricardo Henao and Thomas Yu and Zafer Aydın and Ka Yee Yeung and Mehmet Eren Ahsen and Reem Almugbel and Samad Jahandideh and Xiao Liang and Nordling, Torbj{\"{o}}rn E M and Motoki Shiga and Ana Stanescu and Robert Vogel and Gaurav Pandey and Chiu, Christopher and McClain, Micah T and Woods, Christopher W and Ginsburg, Geoffrey S and Laura L Elo and Ephraim L Tsalik and Lara M Mangravite and Solveig K Sieberts},
doi = {10.1038/s41467-018-06735-8},
issn = {2041-1723},
journal = {Nature Communications},
month = {dec},
number = {1},
pages = {4418},
pmid = {30356117},
title = {{A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection}},
url = {http://biorxiv.org/content/early/2018/04/30/311696.abstract http://www.ncbi.nlm.nih.gov/pubmed/30356117 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC6200745 http://www.nature.com/articles/s41467-018-06735-8},
volume = {9},
year = {2018}
}

• T. E. M. Nordling, From quantum uncertainty to reliable network inference with and without deep neural networks, Sanyo-Onoda: Sanyo-Onoda City University, 2018.

In this talk I will discuss when an interaction matters, from a quantum mechanical to a control theoretical perspective. I will also give an introduction to network inference and present recent advances.

@misc{Nordling2018Noriko,
abstract = {In this talk I will discuss when an interaction matters, from a quantum mechanical to a control theoretical perspective. I will also give an introduction to network inference and present recent advances.},
author = {Nordling, Torbj{\"{o}}rn E. M.},
booktitle = {CELLAB-SOCU Summer Symposium in Sanyo-Onoda, Japan},
editor = {Noriko Hiroi},
howpublished = {CELLAB-SOCU Summer Symposium},
keywords = {Network inference},
mendeley-tags = {Network inference},
month = {sep},
pages = {0},
publisher = {Sanyo-Onoda City University},
title = {{From quantum uncertainty to reliable network inference with and without deep neural networks}},
type = {Symposium},
url = {http://pc4ls.rs.socu.ac.jp/cellab-socu_sympo.html},
year = {2018}
}

• W. Wu, Y. Jiang, J. Chang, Y. Chu, Y. Chiu, Y. Tsao, T. E. M. Nordling, Y. Tseng, and J. T. Tseng, “HRPDviewer: human ribosome profiling data viewer,” Database, vol. 2018, 2018. doi:10.1093/database/bay074

Translational regulation plays an important role in protein synthesis. Dysregulation of translation causes abnormal cell physiology and leads to diseases such as inflammatory disorders and cancers. An emerging technique, called ribosome profiling (ribo-seq), was developed to capture a snapshot of translation. It is based on deep sequencing of ribosome-protected mRNA fragments. A lot of ribo-seq data have been generated in various studies, so databases are needed for depositing and visualizing the published ribo-seq data. Nowadays, GWIPS-viz, RPFdb and TranslatomeDB are the three largest databases developed for this purpose. However, two challenges remain to be addressed. First, GWIPS-viz and RPFdb databases align the published ribo-seq data to the genome. Since ribo-seq data aim to reveal the actively translated mRNA transcripts, there are advantages of aligning ribo-seq data to the transcriptome over the genome. Second, TranslatomeDB does not provide any visualization and the other two databases only provide visualization of the ribo-seq data around a specific genomic location, while simultaneous visualization of the ribo-seq data on multiple mRNA transcripts produced from the same gene or different genes is desired. To address these two challenges, we developed the Human Ribosome Profiling Data viewer (HRPDviewer). HRPDviewer (i) contains 610 published human ribo-seq datasets from Gene Expression Omnibus, (ii) aligns the ribo-seq data to the transcriptome and (iii) provides visualization of the ribo-seq data on the selected mRNA transcripts. Using HRPDviewer, researchers can compare the ribosome binding patterns of multiple mRNA transcripts from the same gene or different genes to gain an accurate understanding of protein synthesis in human cells. We believe that HRPDviewer is a useful resource for researchers to study translational regulation in human.

@article{Wu2018HRPDviewer,
abstract = {Translational regulation plays an important role in protein synthesis. Dysregulation of translation causes abnormal cell physiology and leads to diseases such as inflammatory disorders and cancers. An emerging technique, called ribosome profiling (ribo-seq), was developed to capture a snapshot of translation. It is based on deep sequencing of ribosome-protected mRNA fragments. A lot of ribo-seq data have been generated in various studies, so databases are needed for depositing and visualizing the published ribo-seq data. Nowadays, GWIPS-viz, RPFdb and TranslatomeDB are the three largest databases developed for this purpose. However, two challenges remain to be addressed. First, GWIPS-viz and RPFdb databases align the published ribo-seq data to the genome. Since ribo-seq data aim to reveal the actively translated mRNA transcripts, there are advantages of aligning ribo-seq data to the transcriptome over the genome. Second, TranslatomeDB does not provide any visualization and the other two databases only provide visualization of the ribo-seq data around a specific genomic location, while simultaneous visualization of the ribo-seq data on multiple mRNA transcripts produced from the same gene or different genes is desired. To address these two challenges, we developed the Human Ribosome Profiling Data viewer (HRPDviewer). HRPDviewer (i) contains 610 published human ribo-seq datasets from Gene Expression Omnibus, (ii) aligns the ribo-seq data to the transcriptome and (iii) provides visualization of the ribo-seq data on the selected mRNA transcripts. Using HRPDviewer, researchers can compare the ribosome binding patterns of multiple mRNA transcripts from the same gene or different genes to gain an accurate understanding of protein synthesis in human cells. We believe that HRPDviewer is a useful resource for researchers to study translational regulation in human.},
author = {Wei-Sheng Wu and Yu-Xuan Jiang and Jer-Wei Chang and Yu-Han Chu and Yi-Hao Chiu and Yi-Hong Tsao and Nordling, Torbj{\"{o}}rn E M and Yan-Yuan Tseng and Joseph T Tseng},
doi = {10.1093/database/bay074},
issn = {1758-0463},
journal = {Database},
month = {jan},
title = {{HRPDviewer: human ribosome profiling data viewer}},
volume = {2018},
year = {2018}
}

• T. E. M. Nordling, Learning from data in machine and man, Taipei, Taiwan: Ministry of Science and Technology in Taiwan, 2018.

Artificial intelligence (AI), in particular Deep learning, has since the ImageNet LSVRC-2012 contest established itself as a core technology driving the 3rd industrial revolution with many commercial applications. This rapid success of Artificial narrow intelligence (ANI) is due to four factors: big labelled data, GPU accelerated distributed computing, open source software, and algorithms. Training of artificial neural networks (ANNs) in general requires thousands, if not millions, of examples, while humans can learn from a single example. Human-like unsupervised learning has by Facebook’s Yann LeCun been called the “holy grail” of AI research. In my lab, we are currently building smart toys to collect longitudinal data on how children learn with the aim of creating more data efficient training methods. While zero-shot and one-shot learning are powerful methods for particular problems, they do not in general improve data efficiency. We are also exploring the use of evolutionary algorithms to find a good ANN architecture for a specific problem, which I will exemplify through our recent work on prediction of multiple functions of several proteins at once from the protein sequence alone. To solve this problem, we composed a dataset of 81,267 proteins and 1,169 GO terms of molecular function from Swiss-Prot and used weight-sharing and multi-task learning to create FFANEprot. The architecture of the convolutional neural network (CNN) of FFANEprot was optimised by a Firefly algorithm with natural enemy strategy, which can reduce the probability of being trapped in local optima during the evolutionary process. The training and test accuracies are 98.84% and 98.67%, which is better than all the conventional CNN architectures investigated. If time allows, then I will also show some ongoing work on functional magnetic resonance imaging (fMRI) data.

@misc{Nordling2018NICT,
abstract = {Artificial intelligence (AI), in particular Deep learning, has since the ImageNet LSVRC-2012 contest established itself as a core technology driving the 3rd industrial revolution with many commercial applications. This rapid success of Artificial narrow intelligence (ANI) is due to four factors: big labelled data, GPU accelerated distributed computing, open source software, and algorithms. Training of artificial neural networks (ANNs) in general requires thousands, if not millions, of examples, while humans can learn from a single example. Human-like unsupervised learning has by Facebook's Yann LeCun been called the “holy grail” of AI research. In my lab, we are currently building smart toys to collect longitudinal data on how children learn with the aim of creating more data efficient training methods. While zero-shot and one-shot learning are powerful methods for particular problems, they do not in general improve data efficiency. We are also exploring the use of evolutionary algorithms to find a good ANN architecture for a specific problem, which I will exemplify through our recent work on prediction of multiple functions of several proteins at once from the protein sequence alone. To solve this problem, we composed a dataset of 81,267 proteins and 1,169 GO terms of molecular function from Swiss-Prot and used weight-sharing and multi-task learning to create FFANEprot. The architecture of the convolutional neural network (CNN) of FFANEprot was optimised by a Firefly algorithm with natural enemy strategy, which can reduce the probability of being trapped in local optima during the evolutionary process. The training and test accuracies are 98.84% and 98.67%, which is better than all the conventional CNN architectures investigated. If time allows, then I will also show some ongoing work on functional magnetic resonance imaging (fMRI) data.},
author = {Nordling, Torbj{\"{o}}rn E. M.},
booktitle = {MOST-NICT Joint Workshop on AI for ICT in Taipei (Taiwan)},
howpublished = {MOST-NICT Joint Workshop on AI for ICT in Taipei (Taiwan)},
month = {jun},
publisher = {Ministry of Science and Technology in Taiwan},
title = {{Learning from data in machine and man}},
type = {Oral presentation},
year = {2018}
}

• U. Celano, F. Hsia, D. Vanhaeren, K. Paredis, T. E. M. Nordling, J. G. Buijnsters, T. Hantschel, and W. Vandervorst, “Mesoscopic physical removal of material using sliding nano-diamond contacts,” Scientific reports, vol. 8, iss. 1, p. 2994, 2018. doi:10.1038/s41598-018-21171-w

Wear mechanisms including fracture and plastic deformation at the nanoscale are central to understand sliding contacts. Recently, the combination of tip-induced material erosion with the sensing capability of secondary imaging modes of AFM, has enabled a slice-and-view tomographic technique named AFM tomography or Scalpel SPM. However, the elusive laws governing nanoscale wear and the large quantity of atoms involved in the tip-sample contact, require a dedicated mesoscale description to understand and model the tip-induced material removal. Here, we study nanosized sliding contacts made of diamond in the regime whereby thousands of nm³ are removed. We explore the fundamentals of high-pressure tip-induced material removal for various materials. Changes in the load force are systematically combined with AFM and SEM to increase the understanding and the process controllability. The nonlinear variation of the removal rate with the load force is interpreted as a combination of two contact regimes each dominating in a particular force range. By using the gradual transition between the two regimes, (1) the experimental rate of material eroded on each tip passage is modeled, (2) a controllable removal rate below 5 nm/scan for all the materials is demonstrated, thus opening to future development of 3D tomographic AFM.

@article{Celano2018,
abstract = {Wear mechanisms including fracture and plastic deformation at the nanoscale are central to understand sliding contacts. Recently, the combination of tip-induced material erosion with the sensing capability of secondary imaging modes of AFM, has enabled a slice-and-view tomographic technique named AFM tomography or Scalpel SPM. However, the elusive laws governing nanoscale wear and the large quantity of atoms involved in the tip-sample contact, require a dedicated mesoscale description to understand and model the tip-induced material removal. Here, we study nanosized sliding contacts made of diamond in the regime whereby thousands of nm$^3$ are removed. We explore the fundamentals of high-pressure tip-induced material removal for various materials. Changes in the load force are systematically combined with AFM and SEM to increase the understanding and the process controllability. The nonlinear variation of the removal rate with the load force is interpreted as a combination of two contact regimes each dominating in a particular force range. By using the gradual transition between the two regimes, (1) the experimental rate of material eroded on each tip passage is modeled, (2) a controllable removal rate below 5 nm/scan for all the materials is demonstrated, thus opening to future development of 3D tomographic AFM.},
author = {Umberto Celano and Feng-Chun Hsia and Danielle Vanhaeren and Kristof Paredis and Nordling, Torbj{\"{o}}rn E. M. and Josephus G. Buijnsters and Thomas Hantschel and Wilfried Vandervorst},
doi = {10.1038/s41598-018-21171-w},
issn = {2045-2322},
journal = {Scientific Reports},
keywords = {Humanities and Social Sciences,Science,multidisciplinary},
month = {dec},
number = {1},
pages = {2994},
publisher = {Nature Publishing Group},
title = {{Mesoscopic physical removal of material using sliding nano-diamond contacts}},
url = {http://www.nature.com/articles/s41598-018-21171-w},
volume = {8},
year = {2018}
}

• C. Hsu, W. Yu-Heng, F. Menolascina, and T. E. M. Nordling, “Modelling of the GAL1 Genetic Circuit in Yeast Using Three Equations,” in Ifac-papersonline, Shenyang, China, 2018, p. 185–190. doi:10.1016/j.ifacol.2018.09.297

Synthetic gene circuits can be used to modify and control existing biological processes and thus e.g. increase drug yields. Currently their use is hampered by the, largely, trial and error approach used to design them. Lack of reliable quantitative dynamical models of genetic circuits e.g. prevents the use of well established control design methods. We aim toward creation of a pipeline for automated closed-loop identification of dynamic models of synthetically engineered genetic circuits in microorganisms. As a step towards this aim, we here study modelling of the input-output behaviour of the yGIL337 strain of S. cerevisiae. In this strain expression of the fluorescent reporter can be turned on by growing the yeast in galactose and off by glucose. We perform parameter estimation on a system of three ordinary differential equations of Michaelis-Menten type based on in vivo data from a microfluidic experiment by Fiore et al. (2013) after redoing the data preprocessing. The parameter estimation is done using AMIGO2–a state of the art Matlab toolbox for iterative identification of dynamical models. We show that the goodness-of-fit of our model is comparable to the five models proposed by Fiore et al. and we hypothesise that the system is an adaptive feedback system.

@inproceedings{Hsu2018ADCHEM,
abstract = {Synthetic gene circuits can be used to modify and control existing biological processes and thus e.g. increase drug yields. Currently their use is hampered by the, largely, trial and error approach used to design them. Lack of reliable quantitative dynamical models of genetic circuits e.g. prevents the use of well established control design methods. We aim toward creation of a pipeline for automated closed-loop identification of dynamic models of synthetically engineered genetic circuits in microorganisms. As a step towards this aim, we here study modelling of the input-output behaviour of the yGIL337 strain of S. cerevisiae. In this strain expression of the fluorescent reporter can be turned on by growing the yeast in galactose and off by glucose. We perform parameter estimation on a system of three ordinary differential equations of Michaelis-Menten type based on in vivo data from a microfluidic experiment by Fiore et al. (2013) after redoing the data preprocessing. The parameter estimation is done using AMIGO2–a state of the art Matlab toolbox for iterative identification of dynamical models. We show that the goodness-of-fit of our model is comparable to the five models proposed by Fiore et al. and we hypothesise that the system is an adaptive feedback system.},
author = {Chi-Ching Hsu and Wu Yu-Heng and Filippo Menolascina and Nordling, Torbj{\"{o}}rn E.M.},
booktitle = {IFAC-PapersOnLine},
doi = {10.1016/j.ifacol.2018.09.297},
issn = {24058963},
keywords = {genetic circuit,parameter estimation,synthetic biology,system identification,systems biology},
month = {jul},
number = {18},
pages = {185--190},
publisher = {International Federation of Automatic Control (IFAC)},
title = {{Modelling of the GAL1 Genetic Circuit in Yeast Using Three Equations}},
volume = {51},
year = {2018}
}

### 2017

• S. Sieberts, R. Henao, S. Fourati, R. Klén, M. Mahmoudian, A. Talla, G. Pandey, T. Nordling, M. Ahsen, R. Almugbel, Z. Aydın, J. Burkhart, S. Jahandideh, X. Liang, M. Shiga, A. Stanescu, R. Vogel, K. Y. Yeung, T. Yu, L. Elo, E. Tsalik, and L. Mangravite, “Respiratory Viral DREAM Challenge: Discovering dynamic molecular signatures in response to viral exposure,” in 10th annual recomb/iscb conference on regulatory and systems genomics, with dream challenges, New York City, New York, 2017.

Acute respiratory infections (ARIs) are among the most common reasons for seeking medical attention in the United States. Prediction of susceptibility or early infection can have important impacts in terms of treatment decisions or in the prediction or control of pandemics. Following exposure to respiratory viruses, infection rates are variable among individuals, and some patients who exhibit viral shedding never develop clinical symptoms. We developed and ran a DREAM challenge to predict viral shedding and respiratory symptoms in patients exposed to one of four different respiratory viruses (H1N1, H3N2, rhinovirus, or respiratory syncytial virus (RSV)), based on pre- and early-stage post-exposure, longitudinally sampled blood gene expression profiling up to 24 hours post-e. The challenge attracted participation of 118 individuals organized into 36 teams, who were able to demonstrate signal to predict which subjects will become symptomatic following viral exposure, even prior to exposure. This challenge featured an active community phase featuring participant-led projects exploring gene pathways that contribute to prediction, heterogeneity in prediction across samples, characteristics of well performing models, reproducibility of submitted code, and the assessment of model overfitting in the absence of an independent test data set.

@inproceedings{Sieberts2017RECOMB,
abstract = {Acute respiratory infections (ARIs) are among the most common reasons for seeking medical attention in the United States. Prediction of susceptibility or early infection can have important impacts in terms of treatment decisions or in the prediction or control of pandemics. Following exposure to respiratory viruses, infection rates are variable among individuals, and some patients who exhibit viral shedding never develop clinical symptoms. We developed and ran a DREAM challenge to predict viral shedding and respiratory symptoms in patients exposed to one of four different respiratory viruses (H1N1, H3N2, rhinovirus, or respiratory syncytial virus (RSV)), based on pre- and early-stage post-exposure, longitudinally sampled blood gene expression profiling up to 24 hours post-e. The challenge attracted participation of 118 individuals organized into 36 teams, who were able to demonstrate signal to predict which subjects will become symptomatic following viral exposure, even prior to exposure. This challenge featured an active community phase featuring participant-led projects exploring gene pathways that contribute to prediction, heterogeneity in prediction across samples, characteristics of well performing models, reproducibility of submitted code, and the assessment of model overfitting in the absence of an independent test data set.},
address = {New York City, New York},
author = {Solveig Sieberts and Ricardo Henao and Slim Fourati and Kl{\'{e}}n, Riku and Mehrad Mahmoudian and Aarthi Talla and Gaurav Pandey and Nordling, Torbj{\"{o}}rn and Mehmet Ahsen and Reem Almugbel and Zafer Aydın and Joshua Burkhart and Samad Jahandideh and Xiao Liang and Motoki Shiga and Ana Stanescu and Robert Vogel and Ka Yee Yeung and Thomas Yu and Laura Elo and Ephraim Tsalik and Lara Mangravite},
booktitle = {10th Annual RECOMB/ISCB Conference on Regulatory and Systems Genomics, with DREAM Challenges},
keywords = {DREAM challenge,Gene expression,Infection susceptibility,Predictive modeling,Respiratory infection,Viral infection},
mendeley-tags = {DREAM challenge,Gene expression,Infection susceptibility,Predictive modeling,Respiratory infection,Viral infection},
month = {nov},
publisher = {International Society for Computational Biology},
title = {{Respiratory Viral DREAM Challenge: Discovering dynamic molecular signatures in response to viral exposure}},
type = {Poster},
url = {https://www.iscb.org/recomb-regsysgen2017},
year = {2017}
}

• C. Azencott, T. Aittokallio, S. Roy, A. Agrawal, T. Aittokallio, C. Azencott, E. Barillot, N. Bessonov, D. Chasman, U. Czerwinska, A. F. Siahpirani, S. Friend, A. Goldenberg, J. Greenberg, M. Huber, S. Kaski, C. Kurz, M. Mailick, M. Merzenich, N. Morozova, A. Movaghar, M. Nahum, T. E. M. Nordling, T. Norman, R. Penner, S. Roy, K. Saha, A. Salim, S. Sorooshyari, V. Soumelis, A. Stark-Inbar, A. Sterling, G. Stolovitzky, S. S. Shiju, J. Tang, A. Tosenberger, T. {Vieet Van}, K. Wennerberg, A. Zinovyev, T. Norman, S. Friend, G. Stolovitzky, and A. Goldenberg, “The inconvenience of data of convenience: computational research beyond post-mortem analyses,” Nature methods, vol. 14, iss. 10, p. 937–938, 2017. doi:10.1038/nmeth.4457
@article{Azencott2017,
annote = {DREAM Idea Challenge Consortium
Ankit Agrawal,
Tero Aittokallio,
Chlo{\'{e}}-Agathe Azencott,
Emmanuel Barillot,
Nikolai Bessonov,
Deborah Chasman,
Urszula Czerwinska,
Alireza Fotuhi Siahpirani,
Stephen Friend,
Anna Goldenberg,
Jan Greenberg,
Manuel Huber,
Christoph Kurz,
Marsha Mailick,
Michael Merzenich,
Arezoo Movaghar,
Mor Nahum,
Torbj{\"{o}}rn E M Nordling,
Thea Norman,
Robert Penner,
Sushmita Roy,
Krishanu Saha,
Asif Salim,
Siamak Sorooshyari,
Vassili Soumelis,
Alit Stark-Inbar,
Audra Sterling,
Gustavo Stolovitzky,
S S Shiju,
Jing Tang,
Alen Tosenberger,
Thomas Vieet Van,
Krister Wennerberg &
Andrey Zinovyev},
author = {Azencott, Chlo{\'{e}}-Agathe and Tero Aittokallio and Sushmita Roy and Ankit Agrawal and Tero Aittokallio and Azencott, Chlo{\'{e}}-Agathe and Emmanuel Barillot and Nikolai Bessonov and Deborah Chasman and Urszula Czerwinska and Alireza Fotuhi Siahpirani and Stephen Friend and Anna Goldenberg and Jan Greenberg and Manuel Huber and Samuel Kaski and Christoph Kurz and Marsha Mailick and Michael Merzenich and Nadya Morozova and Arezoo Movaghar and Mor Nahum and Nordling, Torbj{\"{o}}rn E M and Thea Norman and Robert Penner and Sushmita Roy and Krishanu Saha and Asif Salim and Siamak Sorooshyari and Vassili Soumelis and Alit Stark-Inbar and Audra Sterling and Gustavo Stolovitzky and S S Shiju and Jing Tang and Alen Tosenberger and Thomas {Vieet Van} and Krister Wennerberg and Andrey Zinovyev and Thea Norman and Stephen Friend and Gustavo Stolovitzky and Anna Goldenberg},
doi = {10.1038/nmeth.4457},
issn = {1548-7091},
journal = {Nature Methods},
keywords = {DREAMIdea},
month = {sep},
number = {10},
pages = {937--938},
publisher = {Nature Research},
title = {{The inconvenience of data of convenience: computational research beyond post-mortem analyses}},
url = {http://www.nature.com/doifinder/10.1038/nmeth.4457},
volume = {14},
year = {2017}
}

• D. Morgan, A. Tjärnberg, T. E. M. Nordling, and E. L. L. Sonnhammer, “Nested Bootstrapping for reliable Gene Regulatory Network Inference,” in The 25th conference on intelligent systems for molecular biology and the 16th european conference on computational biology (ismb/eccb 2017), Prague, Czech Republic, 2017.

Common Gene Regulatory Network (GRN) inference methods, such as LASSO, do not provide information about the confidence of inferred links. We address this by extending the bootstrap method, instead overlapping the analysis in iterated runs, and applying it to three inference methods. Details of the shortcomings of L1-regularization methods when operating over sufficiently informative data are known. Here, all of the referenced methods perform sub-optimally in terms of Matthew’s Correlation Coefficient (MCC) for low signal-to-noise ratio (SNR) data matrices, even when the data are informative enough for network inference by other metrics. It is thus important not only to introduce methods which are optimized for analyzing datasets of certain quality, but also to define criteria for determining which method to use to optimize analysis. When considering which gene-gene interactions are true, we seek to differentiate spurious gene-gene interactions from those that truly exist in the system. To this end we use a linear ODE model and the GeneSPIDER package to infer the regulatory network of interactions by relating the effect of single gene perturbations to the expression of the remaining unperturbed set.

@inproceedings{Morgan2017ISMB,
abstract = {Common Gene Regulatory Network (GRN) inference methods, such as LASSO, do not provide information about the confidence of inferred links. We address this by extending the bootstrap method, instead overlapping the analysis in iterated runs, and applying it to three inference methods. Details of the shortcomings of L1-regularization methods when operating over sufficiently informative data are known. Here, all of the referenced methods perform sub-optimally in terms of Matthew's Correlation Coefficient (MCC) for low signal-to-noise ratio (SNR) data matrices, even when the data are informative enough for network inference by other metrics. It is thus important not only to introduce methods which are optimized for analyzing datasets of certain quality, but also to define criteria for determining which method to use to optimize analysis. When considering which gene-gene interactions are true, we seek to differentiate spurious gene-gene interactions from those that truly exist in the system. To this end we use a linear ODE model and the GeneSPIDER package to infer the regulatory network of interactions by relating the effect of single gene perturbations to the expression of the remaining unperturbed set.},
author = {Daniel Morgan and Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E. M. and Erik L. L. Sonnhammer},
booktitle = {The 25th Conference on Intelligent Systems for Molecular Biology and the 16th European Conference on Computational Biology (ISMB/ECCB 2017)},
keywords = {Bootstrap},
month = {jul},
publisher = {International Society for Computational Biology},
title = {{Nested Bootstrapping for reliable Gene Regulatory Network Inference}},
type = {Poster},
year = {2017}
}

• R. Magnusson, G. P. Mariotti, M. Köpsen, W. Lövfors, D. R. Gawel, R. Jörnsten, J. Linde, T. E. M. Nordling, E. Nyman, S. Schulze, C. E. Nestor, H. Zhang, G. Cedersund, M. Benson, A. Tjärnberg, and M. Gustafsson, “LASSIM – A network inference toolbox for genome-wide mechanistic modeling,” Plos computational biology, vol. 13, iss. 6, p. e1005608, 2017. doi:10.1371/journal.pcbi.1005608

Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.

@article{Magnusson2017LASSIM,
abstract = {Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of na{\"{i}}ve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.},
author = {Magnusson, Rasmus and Mariotti, Guido Pio and K{\"{o}}psen, Mattias and L{\"{o}}vfors, William and Gawel, Danuta R. and J{\"{o}}rnsten, Rebecka and Linde, J{\"{o}}rg and Nordling, Torbj{\"{o}}rn E. M. and Nyman, Elin and Schulze, Sylvie and Nestor, Colm E. and Zhang, Huan and Cedersund, Gunnar and Benson, Mikael and Tj{\"{a}}rnberg, Andreas and Gustafsson, Mika},
doi = {10.1371/journal.pcbi.1005608},
editor = {Lengauer, Thomas},
issn = {1553-7358},
journal = {PLOS Computational Biology},
month = {jun},
number = {6},
pages = {e1005608},
title = {{LASSIM - A network inference toolbox for genome-wide mechanistic modeling}},
url = {http://dx.plos.org/10.1371/journal.pcbi.1005608},
volume = {13},
year = {2017}
}

• A. Tjärnberg, D. C. Morgan, M. Studham, T. E. M. Nordling, and E. L. L. Sonnhammer, “GeneSPIDER – gene regulatory network inference benchmarking with controlled network and data properties,” Molecular biosystems, vol. 13, iss. 7, p. 1304–1312, 2017. doi:10.1039/C7MB00058H

A key question in network inference, that has not been properly answered, is what accuracy can be expected for a given biological dataset and inference method.

@article{Tjarnberg2017GeneSPIDER,
abstract = {A key question in network inference, that has not been properly answered, is what accuracy can be expected for a given biological dataset and inference method.},
author = {Tj{\"{a}}rnberg, Andreas and Morgan, Daniel C. and Matthew Studham and Nordling, Torbj{\"{o}}rn E. M. and Erik L. L. Sonnhammer},
doi = {10.1039/C7MB00058H},
issn = {1742-206X},
journal = {Molecular BioSystems},
keywords = {Benchmarking,Modelling,Network inference,Software,linear systems},
mendeley-tags = {Network inference},
month = {jul},
number = {7},
pages = {1304--1312},
title = {{GeneSPIDER – gene regulatory network inference benchmarking with controlled network and data properties}},
volume = {13},
year = {2017}
}

### 2016

• E. W. Jacobsen and T. E. M. Nordling, “Robust Target Identification for Drug Discovery,” in Ifac-papersonline, 11th ifac symposium on dynamics and control of process systems, including biosystems (dycops-cab 2016), Trondheim, Norway, 2016, p. 815–820. doi:10.1016/j.ifacol.2016.07.290

A key step in the development of new pharmaceutical drugs is that of identifying direct targets of the bioactive compounds, and distinguishing these from all other gene products that respond indirectly to the drug targets. Currently dominating approaches to this problem are based on often time consuming and costly experimental methods aimed at locating physical bindings of the corresponding small molecule to proteins or DNA sequences. In this paper we consider target identification based on time-series expression data of the corresponding gene regulatory network, using perturbation with the active compound only. As we show, the problem of identifying the direct targets can then be cast as a linear regression problem and, in principle, be accomplished with a number of samples equal to the number of involved genes and bioactive compounds. However, the regression matrix will typically be highly ill-conditioned and the target identification therefore prone even to small measurement uncertainties. In order to provide a label of confidence for the target identification, we consider conditions that can be used to quantify the robustness of the identification of individual drug targets with respect to uncertainty in the expression data. For this purpose, we cast the uncertain regression problem as a robust rank problem and employ SVD or the structured singular value to compute the robust rank. The proposed method is illustrated by application to a small scale gene regulatory network synthesised in yeast to serve as a benchmark problem in network inference.

@inproceedings{Jacobsen2016DYCOPS,
abstract = {A key step in the development of new pharmaceutical drugs is that of identifying direct targets of the bioactive compounds, and distinguishing these from all other gene products that respond indirectly to the drug targets. Currently dominating approaches to this problem are based on often time consuming and costly experimental methods aimed at locating physical bindings of the corresponding small molecule to proteins or DNA sequences. In this paper we consider target identification based on time-series expression data of the corresponding gene regulatory network, using perturbation with the active compound only. As we show, the problem of identifying the direct targets can then be cast as a linear regression problem and, in principle, be accomplished with a number of samples equal to the number of involved genes and bioactive compounds. However, the regression matrix will typically be highly ill-conditioned and the target identification therefore prone even to small measurement uncertainties. In order to provide a label of confidence for the target identification, we consider conditions that can be used to quantify the robustness of the identification of individual drug targets with respect to uncertainty in the expression data. For this purpose, we cast the uncertain regression problem as a robust rank problem and employ SVD or the structured singular value to compute the robust rank. The proposed method is illustrated by application to a small scale gene regulatory network synthesised in yeast to serve as a benchmark problem in network inference.},
author = {Jacobsen, Elling W. and Nordling, Torbj{\"{o}}rn E. M.},
booktitle = {IFAC-PapersOnLine, 11th IFAC Symposium on Dynamics and Control of Process Systems, including Biosystems (DYCOPS-CAB 2016)},
doi = {10.1016/j.ifacol.2016.07.290},
keywords = {Robust network inference,drug discovery,gene regulatory networks,network inference,regression,robust,systems biology,systems medicine,target identification},
mendeley-tags = {Robust network inference},
month = {jun},
number = {7},
pages = {815--820},
publisher = {The International Federation of Automatic Control},
title = {{Robust Target Identification for Drug Discovery}},
volume = {49},
year = {2016}
}

• T. E. M. Nordling, Robust Network Inference–Testing of all possible influence hypotheses during model selection. National Cheng Kung University, Tainan, Taiwan (R.O.C.): The 6th East Asia Mechanical and Aerospace Engineering Workshop in Tainan, Taiwan, 2016.

The main objective of network inference is to identify the structure, i.e. topology, of a network based on observed changes in state variables representing the nodes. For example, to infer genetic influences among genes based on measured expression changes in so-called gene regulatory networks, one needs to test all possible influence hypotheses. Each influence hypothesis corresponds to a possible link in the network, which typically is represented by a model parameter. Network inference is thus model selection. In Robust Network Inference (RNI), the hypothesis that the link/parameter is not needed to explain the data is tested for every possible link/parameter, based on the assumed uncertainty in the observed inputs and outputs of the system. RNI reveals the links that exist, i.e. must be present in order to explain the data within the chosen class of models. These links exist in reality, i.e. are true positives, assuming that the real system can be approximated using the chosen class of models and the real measurement error is smaller than the assumed uncertainty. Contrary to methods that infer the most likely model, RNI accounts for all possible models. For each possible link/parameter, Nordling’s confidence score is calculated in RNI. Contrary to the marginal tests used so far, this score is a reliable significance measure for existence of the link, as well as a relative measure of the distance to the closest model lacking the link. By rejecting the hypothesis, RNI proves, e.g., the existence of a genetic influence or the importance of a parameter.

@misc{Nordling20166thEast,
abstract = {The main objective of network inference is to identify the structure, i.e. topology, of a network based on observed changes in state variables representing the nodes. For example, to infer genetic influences among genes based on measured expression changes in so-called gene regulatory networks, one needs to test all possible influence hypotheses. Each influence hypothesis corresponds to a possible link in the network, which typically is represented by a model parameter. Network inference is thus model selection. In Robust Network Inference (RNI), the hypothesis that the link/parameter is not needed to explain the data is tested for every possible link/parameter, based on the assumed uncertainty in the observed inputs and outputs of the system. RNI reveals the links that exist, i.e. must be present in order to explain the data within the chosen class of models. These links exist in reality, i.e. are true positives, assuming that the real system can be approximated using the chosen class of models and the real measurement error is smaller than the assumed uncertainty. Contrary to methods that infer the most likely model, RNI accounts for all possible models. For each possible link/parameter, Nordling's confidence score is calculated in RNI. Contrary to the marginal tests used so far, this score is a reliable significance measure for existence of the link, as well as a relative measure of the distance to the closest model lacking the link. By rejecting the hypothesis, RNI proves, e.g., the existence of a genetic influence or the importance of a parameter.},
address = {National Cheng Kung University, Tainan, Taiwan (R.O.C.)},
author = {Nordling, Torbj{\"{o}}rn E M},
booktitle = {The 6th East Asia Mechanical and Aerospace Engineering Workshop},
howpublished = {The 6th East Asia Mechanical and Aerospace Engineering Workshop},
keywords = {Gene regulatory networks,Hypothesis testing,Nordling's confidence score,Reverse engineering,Robust network inference},
month = {jun},
publisher = {The 6th East Asia Mechanical and Aerospace Engineering Workshop in Tainan, Taiwan},
title = {{Robust Network Inference--Testing of all possible influence hypotheses during model selection}},
type = {Oral presentation},
url = {http://news-en.secr.ncku.edu.tw/files/14-1083-155190,r614-1.php?Lang=en},
year = {2016}
}

• N. Padhan, T. E. M. Nordling, M. Sundström, P. Akerud, H. Birgisson, P. Nygren, S. Nelander, and L. Claesson-Welsh, “High sensitivity isoelectric focusing to establish a signaling biomarker for the diagnosis of human colorectal cancer,” BMC Cancer, vol. 16, iss. 683, p. 1–14, 2016. doi:10.1186/s12885-016-2725-z

Background: The progression of colorectal cancer (CRC) involves recurrent amplifications/mutations in the epidermal growth factor receptor (EGFR) and downstream signal transducers of the Ras pathway, KRAS and BRAF. Whether genetic events predicted to result in increased and constitutive signaling indeed lead to enhanced biological activity is often unclear and, due to technical challenges, unexplored. Here, we investigated proliferative signaling in CRC using a highly sensitive method for protein detection. The aim of the study was to determine whether multiple changes in proliferative signaling in CRC could be combined and exploited as a “complex biomarker” for diagnostic purposes. Methods: We used robotized capillary isoelectric focusing as well as conventional immunoblotting for the comprehensive analysis of epidermal growth factor receptor signaling pathways converging on extracellular regulated kinase 1/2 (ERK1/2), AKT, phospholipase C$\gamma$1 (PLC$\gamma$1) and c-SRC in normal mucosa compared with CRC stage II and IV. Computational analyses were used to test different activity patterns for the analyzed signal transducers. Results: Signaling pathways implicated in cell proliferation were differently dysregulated in CRC and, unexpectedly, several were downregulated in disease. Thus, levels of activated ERK1 (pERK1), but not pERK2, decreased in stage II and IV while total ERK1/2 expression remained unaffected. In addition, c-SRC expression was lower in CRC compared with normal tissues and phosphorylation on the activating residue Y418 was not detected. In contrast, PLC$\gamma$1 and AKT expression levels were elevated in disease. Immunoblotting of the different signal transducers, run in parallel to capillary isoelectric focusing, showed higher variability and lower sensitivity and resolution. 
Computational analyses showed that, while individual signaling changes lacked predictive power, combining the changes in three signaling components into a “complex biomarker” allowed tissues to be correctly diagnosed as either normal or cancerous with very high accuracy. Conclusions: We present techniques that allow rapid and sensitive determination of cancer signaling that can be used to differentiate colorectal cancer from normal tissue.

@article{Padhan2016CRC,
abstract = {Background: The progression of colorectal cancer (CRC) involves recurrent amplifications/mutations in the epidermal growth factor receptor (EGFR) and downstream signal transducers of the Ras pathway, KRAS and BRAF. Whether genetic events predicted to result in increased and constitutive signaling indeed lead to enhanced biological activity is often unclear and, due to technical challenges, unexplored. Here, we investigated proliferative signaling in CRC using a highly sensitive method for protein detection. The aim of the study was to determine whether multiple changes in proliferative signaling in CRC could be combined and exploited as a “complex biomarker” for diagnostic purposes. Methods: We used robotized capillary isoelectric focusing as well as conventional immunoblotting for the comprehensive analysis of epidermal growth factor receptor signaling pathways converging on extracellular regulated kinase 1/2 (ERK1/2), AKT, phospholipase C$\gamma$1 (PLC$\gamma$1) and c-SRC in normal mucosa compared with CRC stage II and IV. Computational analyses were used to test different activity patterns for the analyzed signal transducers. Results: Signaling pathways implicated in cell proliferation were differently dysregulated in CRC and, unexpectedly, several were downregulated in disease. Thus, levels of activated ERK1 (pERK1), but not pERK2, decreased in stage II and IV while total ERK1/2 expression remained unaffected. In addition, c-SRC expression was lower in CRC compared with normal tissues and phosphorylation on the activating residue Y418 was not detected. In contrast, PLC$\gamma$1 and AKT expression levels were elevated in disease. Immunoblotting of the different signal transducers, run in parallel to capillary isoelectric focusing, showed higher variability and lower sensitivity and resolution. 
Computational analyses showed that, while individual signaling changes lacked predictive power, combining the changes in three signaling components into a “complex biomarker” allowed tissues to be correctly diagnosed as either normal or cancerous with very high accuracy. Conclusions: We present techniques that allow rapid and sensitive determination of cancer signaling that can be used to differentiate colorectal cancer from normal tissue.},
author = {Padhan, Narendra and Nordling, Torbj{\"{o}}rn E M and Sundstr{\"{o}}m, Magnus and Akerud, Peter and Birgisson, Helgi and Nygren, Peter and Nelander, Sven and Claesson-Welsh, Lena},
doi = {10.1186/s12885-016-2725-z},
journal = {BMC Cancer},
keywords = {Journal},
mendeley-tags = {Journal},
month = {aug},
number = {683},
pages = {1--14},
title = {{High sensitivity isoelectric focusing to establish a signaling biomarker for the diagnosis of human colorectal cancer}},
volume = {16},
year = {2016}
}

### 2015

• S. Baskaran, P. Johansson, C. Hansson, T. Nordling, L. Elfineh, U. Martens, M. Häggblad, B. Westermark, L. Uhrbom, K. F. Nilsson, B. Lundgren, C. Krona, and S. Nelander, Systematic identification of gene targets in a biobank of patient derived glioblastoma-initiating cells. Campus Berlin-Buch, Max Delbrück Communications Center (MDC.C), Robert-Rössle-Str. 10, D-13125 Berlin, Germany: Brain tumor 2015, 2015.

Glioblastoma, a devastating cancer type with dismal patient prognosis, needs exploration of new therapeutic targets to complement the existing treatment strategies. Here, we report large-scale gene knockdown (KD) using a tailored short interfering RNA library to identify essential gene targets across a panel of GBM cells from the Human Glioma Cell Cultures (HGCC) biobank. The HGCC material represents functionally validated and characterized tumor initiating cells consisting of all molecular subtypes isolated from grade IV astrocytoma patients. In a primary screen, we individually knocked down 1200 genes across six HGCCs and measured cell viability after 72 hours. Of the 1200 genes studied, 30 candidate genes that produced at least 25% reduction in cell viability were chosen for further analysis. The targets, some of which are druggable, fall into three major functional classes: cell cycle regulation, DNA repair, and protein degradation. To define biomarkers of vulnerability, we are currently performing a secondary screen on the identified 30 genes across a broader panel of well-characterized HGCC lines and a control human astrocytic cell line. In the extended screen, the viability assay is complemented with cytoskeleton staining to record the phenotypic change of the cells induced by target knockdown. The candidate genes validated by the secondary screen will be further functionally studied using both in vitro studies of glioma initiating cells and in vivo modeling using zebrafish to identify their possible role as a therapeutic target.

@misc{Baskaran2015Berlin,
abstract = {Glioblastoma, a devastating cancer type with dismal patient prognosis, needs exploration of new therapeutic targets to complement the existing treatment strategies. Here, we report large-scale gene knockdown (KD) using a tailored short interfering RNA library to identify essential gene targets across a panel of GBM cells from the Human Glioma Cell Cultures (HGCC) biobank. The HGCC material represents functionally validated and characterized tumor initiating cells consisting of all molecular subtypes isolated from grade IV astrocytoma patients. In a primary screen, we individually knocked down 1200 genes across six HGCCs and measured cell viability after 72 hours. Of the 1200 genes studied, 30 candidate genes that produced at least 25\% reduction in cell viability were chosen for further analysis. The targets, some of which are druggable, fall into three major functional classes: cell cycle regulation, DNA repair, and protein degradation. To define biomarkers of vulnerability, we are currently performing a secondary screen on the identified 30 genes across a broader panel of well-characterized HGCC lines and a control human astrocytic cell line. In the extended screen, the viability assay is complemented with cytoskeleton staining to record the phenotypic change of the cells induced by target knockdown. The candidate genes validated by the secondary screen will be further functionally studied using both in vitro studies of glioma initiating cells and in vivo modeling using zebrafish to identify their possible role as a therapeutic target.},
address = {Campus Berlin-Buch, Max Delbrück Communications Center (MDC.C), Robert-Rössle-Str. 10, D-13125 Berlin, Germany},
author = {Baskaran, Sathishkumar and Johansson, Patrik and Hansson, Caroline and Nordling, Torbj{\"{o}}rn and Elfineh, Ludmila and Martens, Ulf and H{\"{a}}ggblad, Maria and Westermark, Bengt and Uhrbom, Lene and Nilsson, Karin Forsberg and Lundgren, Bo and Krona, Cecilia and Nelander, Sven},
booktitle = {Brain tumor 2015},
howpublished = {Brain tumor 2015 in Berlin (Germany)},
month = {may},
publisher = {Brain tumor 2015},
title = {{Systematic identification of gene targets in a biobank of patient derived glioblastoma-initiating cells}},
type = {Poster},
url = {http://www.braintumor-berlin.de/sites/braintumor-berlin.de/files/images/Program 2015s.pdf},
year = {2015}
}

• T. E. M. Nordling, e-Science in Cancer Research: Identification of Biomarkers and Signatures in Protein Data. Arlanda, Sweden: Swedish e-science academy 2015, 2015.

The correct diagnosis of cancer patients conventionally depends on the pathologist’s experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classification of samples as cancer or normal. This search for an optimal subset of molecules is known in statistics and machine learning as variable selection, feature selection, and subset selection. It is typically computationally intensive, and biomarker discovery benefits from an e-Science approach. In this talk, I give a brief introduction to biomarker discovery in cancer research. I discuss issues of identification of biomarkers that provide distinct signatures for prediction of tissues as cancer or normal, exemplified by a recent study of cancer signalling signatures in human colon cancer. More precisely, I discuss ranking of individual features versus combinations of features, model over-fitting, and confidence evaluation. I show that the optimal subset for separation of cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to Mann-Whitney U-test or correlation to the diagnosis. I also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance and selection of the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yield robust predictions for clinical use.

@misc{Nordling2015eSSence,
abstract = {The correct diagnosis of cancer patients conventionally depends on the pathologist's experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classification of samples as cancer or normal. This search for an optimal subset of molecules is known in statistics and machine learning as variable selection, feature selection, and subset selection. It is typically computationally intensive, and biomarker discovery benefits from an e-Science approach. In this talk, I give a brief introduction to biomarker discovery in cancer research. I discuss issues of identification of biomarkers that provide distinct signatures for prediction of tissues as cancer or normal, exemplified by a recent study of cancer signalling signatures in human colon cancer. More precisely, I discuss ranking of individual features versus combinations of features, model over-fitting, and confidence evaluation. I show that the optimal subset for separation of cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to Mann-Whitney U-test or correlation to the diagnosis. I also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance and selection of the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yield robust predictions for clinical use.},
author = {Nordling, Torbj{\"{o}}rn E M},
booktitle = {Swedish e-science academy 2015},
howpublished = {Swedish e-science academy 2015 in Arlanda (Sweden)},
month = {oct},
publisher = {Swedish e-science academy 2015},
title = {{e-Science in Cancer Research: Identification of Biomarkers and Signatures in Protein Data}},
type = {Oral presentation},
year = {2015}
}

• T. E. M. Nordling, Biomarker discovery–Issues in ranking, model selection, and p-value calculation. Barry Lam Hall, College of EECS, National Taiwan University, Taipei, Republic of China (Taiwan): The 3rd EITA-Bio Conference 2015, 2015.

The correct diagnosis of cancer patients conventionally depends on the pathologist’s experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classification of samples as cancer or normal. This search for an optimal subset of molecules is known in statistics and machine learning as variable selection, feature selection, and subset selection. It is still considered an unsolved problem, and heuristic search procedures are used when an exhaustive search is computationally unfeasible. In this talk, I give a brief introduction to biomarker discovery in cancer research. I discuss issues of identification of biomarkers that provide distinct signatures for prediction of tissues as cancer or normal, exemplified by a recent study of cancer signaling signatures in human colon cancer. More precisely, I discuss ranking of individual features versus combinations of features, model over-fitting, and confidence evaluation. I show that the optimal subset for separation of cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to Mann-Whitney U-test or correlation to the diagnosis. I also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance and selection of the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yield robust predictions for clinical use.

@misc{Nordling2015EITA,
abstract = {The correct diagnosis of cancer patients conventionally depends on the pathologist's experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classification of samples as cancer or normal. This search for an optimal subset of molecules is known in statistics and machine learning as variable selection, feature selection, and subset selection. It is still considered an unsolved problem, and heuristic search procedures are used when an exhaustive search is computationally unfeasible. In this talk, I give a brief introduction to biomarker discovery in cancer research. I discuss issues of identification of biomarkers that provide distinct signatures for prediction of tissues as cancer or normal, exemplified by a recent study of cancer signaling signatures in human colon cancer. More precisely, I discuss ranking of individual features versus combinations of features, model over-fitting, and confidence evaluation. I show that the optimal subset for separation of cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to Mann-Whitney U-test or correlation to the diagnosis. I also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance and selection of the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yield robust predictions for clinical use.},
address = {Barry Lam Hall, College of EECS, National Taiwan University, Taipei, Republic of China (Taiwan)},
author = {Nordling, Torbj{\"{o}}rn E M},
booktitle = {The 3rd EITA-Bio Conference 2015},
howpublished = {The 3rd EITA-Bio Conference 2015 in Taipei (Taiwan)},
month = {oct},
publisher = {The 3rd EITA-Bio Conference 2015},
title = {{Biomarker discovery--Issues in ranking, model selection, and p-value calculation}},
type = {Oral presentation},
url = {http://www.eitc.org/eita/eita-venture-community/eita-venture-forum/year-2015/eita-bio-2015 http://www.eitc.org/eita/eita-venture-community/eita-venture-forum/year-2015/eita-bio-2015/conference-proceedings-eita-bio-2015-at-ntu-pdf},
year = {2015}
}

• S. Baskaran, P. Johansson, C. Hansson, T. Nordling, L. Elfineh, U. Martens, M. Häggblad, B. Westermark, L. Uhrbom, K. F. Nilsson, B. Lundgren, C. Krona, and S. Nelander, Systematic identification of gene targets in a biobank of patient derived glioblastoma-initiating cells. Omni Shoreham Hotel, Washington DC, U.S.A.: AACR Special Conference: Advances in Brain Cancer Research, 2015.

Glioblastoma Multiforme (GBM) is a highly heterogeneous and devastating type of brain tumor. It is therefore important to identify novel gene or therapeutic targets essential for tumor cell survival in order to develop new small molecules against them, which will result in improved treatment strategies. Here, we report our method and data from a large-scale gene knockdown (KD) experiment using a tailored short interfering RNA library to pick out essential gene targets across a panel of GBM cells from the Human Glioma Cell Cultures (HGCC) biobank.

@misc{Baskaran2015AACR,
abstract = {Glioblastoma Multiforme (GBM) is a highly heterogeneous and devastating type of brain tumor. It is therefore important to identify novel gene or therapeutic targets essential for tumor cell survival in order to develop new small molecules against them, which will result in improved treatment strategies. Here, we report our method and data from a large-scale gene knockdown (KD) experiment using a tailored short interfering RNA library to pick out essential gene targets across a panel of GBM cells from the Human Glioma Cell Cultures (HGCC) biobank.},
address = {Omni Shoreham Hotel, Washington DC, U.S.A.},
author = {Baskaran, Sathishkumar and Johansson, Patrik and Hansson, Caroline and Nordling, Torbj{\"{o}}rn and Elfineh, Ludmila and Martens, Ulf and H{\"{a}}ggblad, Maria and Westermark, Bengt and Uhrbom, Lene and Nilsson, Karin Forsberg and Lundgren, Bo and Krona, Cecilia and Nelander, Sven},
booktitle = {AACR Special Conference: Advances in Brain Cancer Research},
howpublished = {AACR Special Conference: Advances in Brain Cancer Research in Omni Shoreham Hotel, Washington DC (U.S.A.)},
month = {may},
publisher = {AACR Special Conference: Advances in Brain Cancer Research},
title = {{Systematic identification of gene targets in a biobank of patient derived glioblastoma-initiating cells}},
type = {Poster},
url = {http://www.aacr.org/Documents/Brain15_PosterSessions.pdf},
year = {2015}
}

• T. E. M. Nordling, N. Padhan, S. Nelander, and L. Claesson-Welsh, “Identification of biomarkers and signatures in protein data,” in Proceedings of the 2015 IEEE 11th International Conference on eScience (eScience 2015), Munich, Germany, 2015, p. 411–419. doi:10.1109/eScience.2015.46

The correct diagnosis of cancer patients conventionally depends on the pathologist’s experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classification of samples as cancer or normal. We discuss issues of identification of biomarkers that provide distinct signatures for prediction of tissues as cancer or normal, exemplified by our recent study of cancer signalling signatures in human colon cancer characterised with regard to protein abundance using high sensitivity isoelectric focusing. We show that the optimal subset for separation of cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to Mann-Whitney U-test or correlation to the diagnosis. Actually, one of the proteins belongs to the tertile with the lowest significance and correlation. This highlights the weakness of the practice of only looking for significant differences in the abundance of individual proteins and raises the question of how many lifesaving discoveries have been missed because of it. We also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance and selection of the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yield robust predictions for clinical use.

@inproceedings{Nordling2015eScience,
abstract = {The correct diagnosis of cancer patients conventionally depends on the pathologist's experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classification of samples as cancer or normal. We discuss issues of identification of biomarkers that provide distinct signatures for prediction of tissues as cancer or normal, exemplified by our recent study of cancer signalling signatures in human colon cancer characterised with regard to protein abundance using high sensitivity isoelectric focusing. We show that the optimal subset for separation of cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to Mann-Whitney U-test or correlation to the diagnosis. Actually, one of the proteins belongs to the tertile with the lowest significance and correlation. This highlights the weakness of the practice of only looking for significant differences in the abundance of individual proteins and raises the question of how many lifesaving discoveries have been missed because of it. We also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance and selection of the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yield robust predictions for clinical use.},
author = {Nordling, Torbj{\"{o}}rn E. M. and Padhan, Narendra and Nelander, Sven and Claesson-Welsh, Lena},
booktitle = {Proceedings of the 2015 IEEE 11th International Conference on eScience (eScience 2015)},
doi = {10.1109/eScience.2015.46},
editor = {O'Conner, Lisa},
keywords = {Biomarkers,Cancer,Colon cancer,Feature selection,Mann-Whitney U test,Monte Carlo simulations,Protein abundance,Spearman's rank correlation,Student's t-test,Subset selection,Variable selection,p-values},
month = {aug},
pages = {411--419},
publisher = {IEEE Computer Society},
title = {{Identification of biomarkers and signatures in protein data}},
year = {2015}
}
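
The abstract above mentions Monte Carlo simulations with random class assignment for estimating how likely a given separation is to arise by chance. A minimal sketch of that idea follows. The separation measure (absolute difference in group means of a single score) and the names `separation` and `monte_carlo_p_value` are illustrative assumptions, not the measure or code used in the paper.

```python
import numpy as np


def separation(values, labels):
    """Assumed separation measure: absolute difference of the two group means."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    return abs(values[labels].mean() - values[~labels].mean())


def monte_carlo_p_value(values, labels, n_draws=2000, seed=0):
    """Estimate the chance of a separation at least as large as the observed one.

    Class labels are shuffled at random (group sizes are kept), and the
    fraction of draws reaching the observed separation is returned.
    The +1 terms count the observed assignment itself, so the estimate
    is never exactly zero.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = separation(values, labels)
    hits = 0
    for _ in range(n_draws):
        shuffled = rng.permutation(labels)  # random class assignment
        if separation(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


# Toy data: five "normal" and five "cancer" samples with clearly different scores.
scores = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
is_cancer = [False] * 5 + [True] * 5
p_value = monte_carlo_p_value(scores, is_cancer)
```

With several candidate subsets, the same procedure would have to be repeated per subset and the resulting p-values corrected for multiple testing, as the abstract stresses.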

• A. Tjärnberg, T. E. M. Nordling, M. Studham, S. Nelander, and E. L. Sonnhammer, “Avoiding pitfalls in L1-regularised inference of gene networks,” Molecular BioSystems, vol. 11, iss. 1, p. 287–296, 2015. doi:10.1039/C4MB00419A

Statistical regularisation methods such as Lasso and related L1 regularised regression methods are commonly used to construct models of gene regulatory networks. Although they theoretically can infer the correct network structure, they have been shown in practice to make errors, i.e. leave out existing links and include non-existing links. We show that L1 regularisation methods typically produce a poor network model when the analysed data is ill-conditioned, i.e. the data matrix has a high condition number, even if it contains enough information for correct network inference. However, the correct structure of network models can be obtained for informative data when these methods fail, by using least-squares regression and setting small parameters to zero, or by using robust network inference, a recent method taking the intersection of all non-rejectable models. As available experimental data sets are generally ill-conditioned, we recommend all users to check the condition number of the data matrix to avoid this pitfall of L1 regularised inference, and to also consider alternative methods.

@article{Tjarnberg2014,
abstract = {Statistical regularisation methods such as Lasso and related L1 regularised regression methods are commonly used to construct models of gene regulatory networks. Although they theoretically can infer the correct network structure, they have been shown in practice to make errors, i.e. leave out existing links and include non-existing links. We show that L1 regularisation methods typically produce a poor network model when the analysed data is ill-conditioned, i.e. the data matrix has a high condition number, even if it contains enough information for correct network inference. However, the correct structure of network models can be obtained for informative data when these methods fail, by using least-squares regression and setting small parameters to zero, or by using robust network inference, a recent method taking the intersection of all non-rejectable models. As available experimental data sets are generally ill-conditioned, we recommend all users to check the condition number of the data matrix to avoid this pitfall of L1 regularised inference, and to also consider alternative methods.},
author = {Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Matthew Studham and Sven Nelander and Erik LL Sonnhammer},
doi = {10.1039/C4MB00419A},
journal = {Molecular BioSystems},
keywords = {Corresponding author,Gene Regulatory Network Inference,Journal,LASSO,Model Selection,Network inference,Objective function,Regularization,condition number,ill-conditioned,system identification},
mendeley-tags = {Corresponding author,Journal,Network inference},
month = {jan},
number = {1},
pages = {287--296},
title = {{Avoiding pitfalls in L1-regularised inference of gene networks}},
url = {http://dx.doi.org/10.1039/c4mb00419a},
volume = {11},
year = {2015}
}
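
The abstract above recommends checking the condition number of the data matrix before relying on L1-regularised inference. A minimal sketch of such a check with NumPy follows. The function names and the threshold of 100 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def condition_number(data):
    """Ratio of the largest to the smallest singular value of a 2-D matrix."""
    singular_values = np.linalg.svd(np.asarray(data, dtype=float), compute_uv=False)
    smallest = singular_values[-1]  # singular values are returned in descending order
    if smallest == 0:
        return np.inf  # rank-deficient data
    return singular_values[0] / smallest


def is_ill_conditioned(data, threshold=100.0):
    """Flag data whose condition number exceeds a hypothetical cut-off."""
    return condition_number(data) > threshold


# Toy matrices: one perfectly conditioned, one with a very weak direction.
well_conditioned = np.eye(3)
ill_conditioned = np.diag([1.0, 1.0, 1e-4])
```

A high value signals that some directions of the data are weakly excited, which is the situation in which the abstract advises also considering alternative methods.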

### 2014

• T. E. M. Nordling, A. Tjärnberg, M. Studham, E. L. L. Sonnhammer, and S. Nelander, When and why does L1-regularisation fail in inference of gene networks? Umeå, Sweden: eSSENCE Academy 2014, 2014.

L1 regularisation methods–Lasso and Elastic Net–typically perform poorly in GRN inference when using data as ill-conditioned as typical experimental data. For both well-conditioned and ill-conditioned data, we found an SNR of 10 to be sufficient for LSCO and RNI to achieve maximum accuracy close to one. For data with an SNR below one, the accuracy of all methods was in general low.

@misc{Nordling2014eSSence,
abstract = {L1 regularisation methods--Lasso and Elastic Net--typically perform poorly in GRN inference when using data as ill-conditioned as typical experimental data. For both well-conditioned and ill-conditioned data, we found an SNR of 10 to be sufficient for LSCO and RNI to achieve maximum accuracy close to one. For data with an SNR below one, the accuracy of all methods was in general low.},
author = {Nordling, Torbj{\"{o}}rn E M and Tj{\"{a}}rnberg, Andreas and Matthew Studham and Erik L L Sonnhammer and Sven Nelander},
howpublished = {eSSENCE Academy 2014 in Ume{\aa} (Sweden)},
month = {oct},
title = {{When and why does L1-regularisation fail in inference of gene networks?}},
type = {Poster},
year = {2014}
}

• T. E. M. Nordling, Why and what you need to know about Robust Network Inference. Reykjavik, Iceland: North Atlantic Network Summit 2014, 2014.

@misc{Nordling2014NNNS,
author = {Nordling, Torbj{\"{o}}rn E M},
booktitle = {North Atlantic Network Summit 2014},
howpublished = {North Atlantic Network Summit 2014 in Reykjavik (Iceland)},
month = {apr},
publisher = {North Atlantic Network Summit 2014},
title = {{Why and what you need to know about Robust Network Inference}},
type = {Oral presentation},
url = {http://www.nnns2014.org/program},
year = {2014}
}

• M. E. Studham, A. Tjärnberg, T. E. M. Nordling, S. Nelander, and E. L. L. Sonnhammer, “Functional association networks as priors for gene regulatory network inference,” Bioinformatics (Oxford, England), vol. 30, iss. 12, p. i130–8, 2014. doi:10.1093/bioinformatics/btu285

MOTIVATION: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. RESULTS: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data.

@article{Studham2014,
abstract = {MOTIVATION: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. RESULTS: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data.},
author = {Studham, Matthew E and Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Sven Nelander and Erik L L Sonnhammer},
doi = {10.1093/bioinformatics/btu285},
issn = {1367-4811},
journal = {Bioinformatics (Oxford, England)},
keywords = {Journal,Network priors,Systems Biology},
mendeley-tags = {Journal,Network priors,Systems Biology},
month = {jun},
number = {12},
pages = {i130--8},
pmid = {24931976},
title = {{Functional association networks as priors for gene regulatory network inference}},
url = {http://bioinformatics.oxfordjournals.org/content/30/12/i130.abstract},
volume = {30},
year = {2014}
}

• M. Studham, A. Tjärnberg, T. E. M. Nordling, S. Nelander, and E. L. L. Sonnhammer, “Functional Association Networks as Priors for Gene Regulatory Network Inference,” in Proceedings of the 22nd Annual International Conference on Intelligent Systems for Molecular Biology, Boston, USA, 2014.

Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data is inadequate to fully explain the network, informative priors have been shown to improve the accuracy of inferences. This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic data sets indicates that if the prior networks have enough causal information then they can improve GRN inference accuracy, and if not then accuracy may decrease. This opens the door to the possibility that functional association databases can be used as priors to make GRN inference more reliable.

@inproceedings{Studham2014ISCB,
abstract = {Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data is inadequate to fully explain the network, informative priors have been shown to improve the accuracy of inferences. This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic data sets indicates that if the prior networks have enough causal information then they can improve GRN inference accuracy, and if not then accuracy may decrease. This opens the door to the possibility that functional association databases can be used as priors to make GRN inference more reliable.},
author = {Matthew Studham and Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Sven Nelander and Erik L L Sonnhammer},
booktitle = {Proceedings of the 22nd Annual International Conference on Intelligent Systems for Molecular Biology},
keywords = {Convex optimization,Gene regulatory networks,Informative prior,Network inference},
month = {jul},
title = {{Functional Association Networks as Priors for Gene Regulatory Network Inference}},
url = {http://www.iscb.org/ismb2014},
year = {2014}
}

### 2013

• M. Studham, A. Tjärnberg, T. E. M. Nordling, and E. L. L. Sonnhammer, “Using functional association data as a prior for gene regulatory network inference,” in The 14th International Conference on Systems Biology (ICSB-2013) in Copenhagen (Denmark): Abstract book, Copenhagen, Denmark, 2013.

Gene regulatory network (GRN) inference reveals the influences genes have on one another in dynamical regulatory systems. If the experimental data is inadequate to fully explain the network, informative priors have been shown to improve the accuracy of inferences. This study explores the potential of functional association databases, such as FunCoup, as a prior source for GRN inference. These databases aggregate heterogeneous data to determine a likelihood score for each gene pair which can easily be used in the context of network inference, and in doing so can add information from thousands of experiments to complement the input data. A convex optimization-based inference method, in which the prior is incorporated as part of the sparsity term, was used on synthetic data sets which differed in information content. Despite the fact that much of the functional association prior reflects correlation and not causation, this study indicates that these priors can improve accuracy in situations in which the input data alone is insufficient to explain the underlying network.

@inproceedings{Studham2013ICSB,
abstract = {Gene regulatory network (GRN) inference reveals the influences genes have on one another in dynamical regulatory systems. If the experimental data is inadequate to fully explain the network, informative priors have been shown to improve the accuracy of inferences. This study explores the potential of functional association databases, such as FunCoup, as a prior source for GRN inference. These databases aggregate heterogeneous data to determine a likelihood score for each gene pair which can easily be used in the context of network inference, and in doing so can add information from thousands of experiments to complement the input data. A convex optimization-based inference method, in which the prior is incorporated as part of the sparsity term, was used on synthetic data sets which differed in information content. Despite the fact that much of the functional association prior reflects correlation and not causation, this study indicates that these priors can improve accuracy in situations in which the input data alone is insufficient to explain the underlying network.},
author = {Matthew Studham and Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Erik L L Sonnhammer},
booktitle = {The 14th International Conference on Systems Biology (ICSB-2013) in Copenhagen (Denmark): Abstract book},
keywords = {Network inference},
mendeley-tags = {Network inference},
title = {{Using functional association data as a prior for gene regulatory network inference}},
url = {http://www.icsb2013.dk},
year = {2013}
}

• A. Tjärnberg, T. E. M. Nordling, M. Studham, and E. L. L. Sonnhammer, “Stratified benchmarking of methods for gene regulatory network inference,” in 21st Annual International Conference on Intelligent Systems for Molecular Biology and 12th European Conference on Computational Biology (ISMB/ECCB 2013), Berlin, Germany, 2013.

New methods for gene regulatory network (GRN) reconstruction from biological data are continuously being developed. While each method may be benchmarked against older methods upon publication, it is seldom independently benchmarked after publication on diverse data and against newer competing methods. We created diverse data sets capturing a range of network properties such as: network size, connectivity, interampatteness, as well as data properties, such as: signal to noise ratio, number of samples and the condition number of the response matrix. As a start, we compare 6 different GRN inference methods selected for their capability to use steady state perturbation data. The fact that these methods infer the same type of GRN means that we can focus on data and network properties. This gives the basis for informed decisions about under what conditions a method performs well and how to optimally utilize each method when the data is not fully informative. We compare the different inference algorithms using a wide range of performance measures and investigate under what circumstances a given inference method gives informative network models. We demonstrate that the quality of inferred gene networks in terms of mechanistic insight is highly dependent on the algorithm and properties of the network and data. The algorithm should therefore be chosen based on the expected properties of the network and data. Moreover, not only accurate measurements, but also experiments designed specifically to counteract intrinsic signal attenuation are required.

@inproceedings{Tjarnberg2013ISMB,
abstract = {New methods for gene regulatory network (GRN) reconstruction from biological data are continuously being developed. While each method may be benchmarked against older methods upon publication, it is seldom independently benchmarked after publication on diverse data and against newer competing methods. We created diverse data sets capturing a range of network properties such as: network size, connectivity, interampatteness, as well as data properties, such as: signal to noise ratio, number of samples and the condition number of the response matrix. As a start, we compare 6 different GRN inference methods selected for their capability to use steady state perturbation data. The fact that these methods infer the same type of GRN means that we can focus on data and network properties. This gives the basis for informed decisions about under what conditions a method performs well and how to optimally utilize each method when the data is not fully informative. We compare the different inference algorithms using a wide range of performance measures and investigate under what circumstances a given inference method gives informative network models. We demonstrate that the quality of inferred gene networks in terms of mechanistic insight is highly dependent on the algorithm and properties of the network and data. The algorithm should therefore be chosen based on the expected properties of the network and data. Moreover, not only accurate measurements, but also experiments designed specifically to counteract intrinsic signal attenuation are required.},
annote = {21-23 July 2013, Andreas presented},
author = {Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Matthew Studham and Erik L L Sonnhammer},
booktitle = {21st Annual International Conference on Intelligent Systems for Molecular Biology and 12th European Conference on Computational Biology (ISMB/ECCB 2013)},
keywords = {Benchmarking,Network inference},
mendeley-tags = {Network inference},
publisher = {International Society for Computational Biology},
title = {{Stratified benchmarking of methods for gene regulatory network inference}},
year = {2013}
}

• T. E. M. Nordling, Robust inference of gene regulatory networks. Paris, France: Workshop HYCON2-AD3 on Biological and Medical systems, 2013.

@misc{Nordling2013HYCON,
annote = {Presentation 5.7.2013.},
author = {Nordling, Torbj{\"{o}}rn E M},
booktitle = {Workshop HYCON2-AD3 on Biological and Medical systems},
howpublished = {Workshop HYCON2-AD3 on Biological and Medical systems in Paris (France)},
institution = {Workshop HYCON2-AD3 on Biological and Medical systems},
keywords = {Network inference},
month = {jun},
publisher = {Workshop HYCON2-AD3 on Biological and Medical systems},
title = {{Robust inference of gene regulatory networks}},
year = {2013}
}

• T. E. M. Nordling, “Robust inference of gene regulatory networks: System properties, variable selection, subnetworks, and design of experiments,” Ph.D. thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2013.

In this thesis, inference of biological networks from in vivo data generated by perturbation experiments is considered, i.e. deduction of causal interactions that exist among the observed variables. Knowledge of such regulatory influences is essential in biology. A system property–interampatteness–is introduced that explains why the variation in existing gene expression data is concentrated to a few “characteristic modes” or “eigengenes”, and why previously inferred models have a large number of false positive and false negative links. An interampatte system is characterized by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals and we show that perturbation of individual state variables, e.g. genes, typically leads to ill-conditioned data with both characteristic and weak modes. The weak modes are typically dominated by measurement noise due to poor excitation and their existence hampers network reconstruction. The excitation problem is solved by iterative design of correlated multi-gene perturbation experiments that counteract the intrinsic signal attenuation of the system. The next perturbation should be designed such that the expected response practically spans an additional dimension of the state space. The proposed design is numerically demonstrated for the Snf1 signalling pathway in S. cerevisiae. The impact of unperturbed and unobserved latent state variables, that exist in any real biological system, on the inferred network and required set-up of the experiments for network inference is analysed. Their existence implies that a subnetwork of pseudo-direct causal regulatory influences, accounting for all environmental effects, in general is inferred. In principle, the number of latent states and different paths between the nodes of the network can be estimated, but their identity cannot be determined unless they are observed or perturbed directly. 
Network inference is recognized as a variable/model selection problem and solved by considering all possible models of a specified class that can explain the data at a desired significance level, and by classifying only the links present in all of these models as existing. As shown, these links can be determined without any parameter estimation by reformulating the variable selection problem as a robust rank problem. Solving the rank problem enables assignment of confidence to individual interactions, without resorting to any approximation or asymptotic results. This is demonstrated by reverse engineering of the synthetic IRMA gene regulatory network from published data. A previously unknown activation of transcription of SWI5 by CBF1 in the IRMA strain of S. cerevisiae is proven to exist, which serves to illustrate that even the accumulated knowledge of well-studied genes is incomplete.

@phdthesis{Nordling2013,
abstract = {In this thesis, inference of biological networks from in vivo data generated by perturbation experiments is considered, i.e. deduction of causal interactions that exist among the observed variables. Knowledge of such regulatory influences is essential in biology. A system property–interampatteness–is introduced that explains why the variation in existing gene expression data is concentrated to a few “characteristic modes” or “eigengenes”, and why previously inferred models have a large number of false positive and false negative links. An interampatte system is characterized by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals and we show that perturbation of individual state variables, e.g. genes, typically leads to ill-conditioned data with both characteristic and weak modes. The weak modes are typically dominated by measurement noise due to poor excitation and their existence hampers network reconstruction. The excitation problem is solved by iterative design of correlated multi-gene perturbation experiments that counteract the intrinsic signal attenuation of the system. The next perturbation should be designed such that the expected response practically spans an additional dimension of the state space. The proposed design is numerically demonstrated for the Snf1 signalling pathway in S. cerevisiae. The impact of unperturbed and unobserved latent state variables, that exist in any real biological system, on the inferred network and required set-up of the experiments for network inference is analysed. Their existence implies that a subnetwork of pseudo-direct causal regulatory influences, accounting for all environmental effects, in general is inferred. In principle, the number of latent states and different paths between the nodes of the network can be estimated, but their identity cannot be determined unless they are observed or perturbed directly. 
Network inference is recognized as a variable/model selection problem and solved by considering all possible models of a specified class that can explain the data at a desired significance level, and by classifying only the links present in all of these models as existing. As shown, these links can be determined without any parameter estimation by reformulating the variable selection problem as a robust rank problem. Solving the rank problem enables assignment of confidence to individual interactions, without resorting to any approximation or asymptotic results. This is demonstrated by reverse engineering of the synthetic IRMA gene regulatory network from published data. A previously unknown activation of transcription of SWI5 by CBF1 in the IRMA strain of S. cerevisiae is proven to exist, which serves to illustrate that even the accumulated knowledge of well-studied genes is incomplete.},
annote = {Swedish title: Robust inferens av genreglern{\"{a}}tverk: System egenskaper, variabel selektion, subn{\"{a}}tverk och design av experiment
Summary:
This thesis addresses inference of biological networks from in vivo data generated by perturbation experiments, i.e. determination of the causal interactions that exist among the observed variables. Knowledge of these regulatory influences is essential for biological understanding.
A system property, interampatteness (Swedish: f{\"{o}}rst{\"{a}}rksvagning), is introduced. It explains why the variation in existing gene expression data is concentrated to a few “characteristic modes” or “eigengenes”, and why previously constructed models contain many false positive and false negative links. An interampatte system is characterized by strong interactions that enable simultaneous AMPlification and ATTEnuation of different signals. We demonstrate that perturbation of individual state variables, e.g. genes, typically leads to ill-conditioned data with both characteristic and weak modes. The weak modes are typically dominated by measurement noise due to poor excitation and hamper reconstruction of the network.
The excitation problem is solved by iterative design of experiments in which correlated perturbations of multiple genes counteract the system's intrinsic attenuation of signals. The next perturbation should be designed such that the expected response practically spans an additional dimension of the state space. The proposed design is demonstrated numerically for the Snf1 signalling pathway in S. cerevisiae.
The impact of unperturbed and unobserved latent state variables, which exist in every real biological system, on the constructed networks and on the planning of experiments for network inference is analysed. The existence of these state variables implies that subnetworks of pseudo-direct regulatory influences, which compensate for environmental effects, are in general determined. In principle, the number of latent states and alternative paths between nodes in the network can be determined, but their identity cannot be determined unless they are directly observed or perturbed.
Network inference is treated as a variable/model selection problem and solved by examining all models within a chosen class that can explain the data at the desired significance level, and by classifying only links that are present in all of these models as existing. These links can be determined without estimation of parameters by rewriting the variable selection problem as a robust rank problem. Solving the rank problem makes it possible to assign statistical confidence to individual links without approximations or asymptotic considerations. This is demonstrated by reconstruction of the synthetic IRMA gene regulatory network from published data. A previously unknown activation of transcription of SWI5 by CBF1 in the IRMA strain of S. cerevisiae is proven. This illustrates that even the accumulated knowledge of well-studied genes is incomplete.},
author = {Nordling, Torbj{\"{o}}rn E M},
isbn = {978-91-7501-762-4},
keywords = {Network inference,Robust network inference,Robust variable selection,Variable selection,biological networks,design of experiments,feature selection,gene regulatory networks,model selection,network inference,network theory,perturbation experiments,reverse engineering,subnetworks,subset selection,system identification,system theory,variable selection},
mendeley-tags = {Network inference,Robust network inference,Robust variable selection,Variable selection},
pages = {xi, 350},
school = {KTH Royal Institute of Technology},
title = {{Robust inference of gene regulatory networks: System properties, variable selection, subnetworks, and design of experiments}},
type = {Ph.D. thesis},
url = {http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-120830},
year = {2013}
}

• A. Tjärnberg, T. E. M. Nordling, M. Studham, and E. L. L. Sonnhammer, “Optimal sparsity criteria for network inference,” Journal of Computational Biology, vol. 20, iss. 5, p. 398–408, 2013. doi:10.1089/cmb.2012.0268

Gene regulatory network inference, i.e. determination of the regulatory interactions between a set of genes, provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call $\zeta$ (zeta), to determine the degree of sparsity of the network estimates, i.e. the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of $\zeta$. In order to avoid such poor choices, we propose a method for optimisation of $\zeta$ which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave one out cross optimisation and selection of the $\zeta$ value that minimizes the prediction error. We also illustrate the adverse effect of noise, few samples, and uninformative experiments on network inference and our method for optimisation of $\zeta$. We demonstrate that our $\zeta$ optimisation method for two widely used inference algorithms–Glmnet and NIR–gives accurate and informative estimates of the network structure, given that the data is informative enough.

@article{Tjarnberg2013,
abstract = {Gene regulatory network inference, i.e. determination of the regulatory interactions between a set of genes, provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call $\zeta$ (zeta), to determine the degree of sparsity of the network estimates, i.e. the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of $\zeta$. In order to avoid such poor choices, we propose a method for optimisation of $\zeta$ which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave one out cross optimisation and selection of the $\zeta$ value that minimizes the prediction error. We also illustrate the adverse effect of noise, few samples, and uninformative experiments on network inference and our method for optimisation of $\zeta$. We demonstrate that our $\zeta$ optimisation method for two widely used inference algorithms--Glmnet and NIR--gives accurate and informative estimates of the network structure, given that the data is informative enough.},
author = {Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Matthew Studham and Erik L L Sonnhammer},
doi = {10.1089/cmb.2012.0268},
file = {:Users/tn/Articles/Mendeley_collection/Journal of Computational Biology/Tj{\"{a}}rnberg et al._2013.pdf:pdf},
journal = {Journal of Computational Biology},
keywords = {Corresponding author,Journal,Network inference,Sparsity,Systems biology},
mendeley-tags = {Corresponding author,Journal,Network inference,Sparsity,Systems biology},
month = {may},
number = {5},
pages = {398--408},
title = {{Optimal sparsity criteria for network inference}},
url = {http://dx.doi.org/10.1089/cmb.2012.0268},
volume = {20},
year = {2013}
}
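
The leave-one-out selection of $\zeta$ described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single target gene regressed on a few regulators, uses a plain coordinate-descent LASSO as a stand-in for Glmnet or NIR, and all function names (`lasso_fit`, `loo_prediction_error`, `select_zeta`) and the toy data are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, zeta, n_sweeps=200):
    """Minimise 0.5*||y - Xw||^2 + zeta*||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho, col_norm = 0.0, 0.0
            for i in range(n_samples):
                # Prediction for sample i with regressor j left out.
                partial = sum(X[i][k] * w[k] for k in range(n_features) if k != j)
                rho += X[i][j] * (y[i] - partial)
                col_norm += X[i][j] ** 2
            w[j] = soft_threshold(rho, zeta) / col_norm if col_norm > 0 else 0.0
    return w


def loo_prediction_error(X, y, zeta):
    """Mean squared error when each sample is predicted from all the others."""
    total = 0.0
    for i in range(len(X)):
        w = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], zeta)
        prediction = sum(x * wj for x, wj in zip(X[i], w))
        total += (y[i] - prediction) ** 2
    return total / len(X)


def select_zeta(X, y, zeta_grid):
    """Return the grid value with the lowest leave-one-out error, plus all errors."""
    errors = {zeta: loo_prediction_error(X, y, zeta) for zeta in zeta_grid}
    return min(errors, key=errors.get), errors


# Hypothetical toy data: one target gene driven by regulator 0 only.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]
best_zeta, errors = select_zeta(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
print(best_zeta, lasso_fit(X, y, best_zeta))
```

With a very large $\zeta$ every coefficient is shrunk to zero, which corresponds to the empty network that the abstract uses as a baseline for poor choices of the sparsity coefficient.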

### 2012

• E. W. Jacobsen and T. E. M. Nordling, “Robust inference of gene regulatory networks,” in The 13th International Conference on Systems Biology (ICSB-2012) in Toronto (Canada): Abstract book, Toronto, Canada, 2012.

To successfully infer the biochemical network that underlies a given biological function, two problems must be resolved. First, sufficiently informative data that allow discrimination between alternative models corresponding to different network structures must be recorded. Second, the “correct” network model with a structure including only the active interactions must be selected based on the recorded data set. In this work we address both these problems within the framework of robust inference. We first address the problem in a deterministic framework and show that determination of the existence of a specific interaction, or directed network edge, can be reduced to a rank test on a matrix constructed from available perturbation and response data. To deal with uncertainty, we introduce a norm-bounded set around the nominal data points in the sample space and assume the true response of the system is within this set. A similar uncertainty description is employed to describe uncertainties in the applied perturbations. Determination of the existence of a specific interaction under uncertainty can then be formulated as a robust rank problem, which can be solved using results from robust control theory. The proposed method provides necessary and sufficient conditions for the existence of a directed network edge under the assumption that the true response is within the given uncertainty set. Similarly, network edges can be determined with a robustness margin, i.e., the size of the uncertainty set for which the edge can be identified with confidence. An important outcome of the method is determination of interactions for which the available data set does not contain sufficient information to infer existence or non-existence with confidence. Furthermore, based on well-known results from linear algebra, we show how specific perturbation experiments can be designed to generate data that enable inference of a specific edge at a given level of confidence.

@inproceedings{Jacobsen2012ICSB,
abstract = {To successfully infer the biochemical network that underlies a given biological function, two problems must be resolved. First, sufficiently informative data that allow discrimination between alternative models corresponding to different network structures must be recorded. Second, the ``correct'' network model with a structure including only the active interactions must be selected based on the recorded data set. In this work we address both these problems within the framework of robust inference. We first address the problem in a deterministic framework and show that determination of the existence of a specific interaction, or directed network edge, can be reduced to a rank test on a matrix constructed from available perturbation and response data. To deal with uncertainty, we introduce a norm-bounded set around the nominal data points in the sample space and assume the true response of the system is within this set. A similar uncertainty description is employed to describe uncertainties in the applied perturbations. Determination of the existence of a specific interaction under uncertainty can then be formulated as a robust rank problem, which can be solved using results from robust control theory. The proposed method provides necessary and sufficient conditions for the existence of a directed network edge under the assumption that the true response is within the given uncertainty set. Similarly, network edges can be determined with a robustness margin, i.e., the size of the uncertainty set for which the edge can be identified with confidence. An important outcome of the method is determination of interactions for which the available data set does not contain sufficient information to infer existence or non-existence with confidence. Furthermore, based on well-known results from linear algebra, we show how specific perturbation experiments can be designed to generate data that enable inference of a specific edge at a given level of confidence.},
annote = {19-23.8.2012},
author = {Elling W Jacobsen and Nordling, Torbj{\"{o}}rn E M},
booktitle = {The 13th International Conference on Systems Biology (ICSB-2012) in Toronto (Canada): Abstract book},
editor = {Andrews, Brenda and Boone, Charlie},
file = {:Users/tn/Articles/Mendeley_collection/The 13th International Conference on Systems Biology (ICSB-2012) in Toronto (Canada) Abstract book/Jacobsen, Nordling_2012.pdf:pdf;:Users/tn/Articles/Mendeley_collection/The 13th International Conference on Systems Biology (ICSB-2012) in Toronto (Canada) Abstract book/Jacobsen, Nordling_2012(2).pdf:pdf},
keywords = {Gene regulatory network,Network inference,robust network reconstruction},
title = {{Robust inference of gene regulatory networks}},
url = {http://www.icsb2012toronto.com/},
year = {2012}
}

• A. Tjärnberg, T. E. M. Nordling, M. Studham, and E. L. L. Sonnhammer, Optimal sparsity criteria for network inference, San Francisco, California, U.S.A., 2012.

Gene regulatory network inference, i.e. determination of the regulatory interactions between a set of genes, provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call $\zeta$ (zeta), to determine the degree of sparsity of the network estimates, i.e. the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of $\zeta$. In order to avoid such poor choices, we propose a method for optimisation of $\zeta$ which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave one out cross optimisation and selection of the $\zeta$ value that minimizes the prediction error. We also illustrate the adverse effect of noise, few samples, and uninformative experiments on network inference and our method for optimisation of $\zeta$. We demonstrate that our $\zeta$ optimisation method for two widely used inference algorithms–Glmnet and NIR–gives accurate and informative estimates of the network structure, given that the data is informative enough.

@misc{Tjarnberg2012RECOMB,
abstract = {Gene regulatory network inference, i.e. determination of the regulatory interactions between a set of genes, provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call $\zeta$ (zeta), to determine the degree of sparsity of the network estimates, i.e. the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of $\zeta$. In order to avoid such poor choices, we propose a method for optimisation of $\zeta$ which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave one out cross optimisation and selection of the $\zeta$ value that minimizes the prediction error. We also illustrate the adverse effect of noise, few samples, and uninformative experiments on network inference and our method for optimisation of $\zeta$. We demonstrate that our $\zeta$ optimisation method for two widely used inference algorithms--Glmnet and NIR--gives accurate and informative estimates of the network structure, given that the data is informative enough.},
address = {San Francisco, California, U.S.A.},
annote = {Oral presentation given by Andreas Tj{\"{a}}rnberg},
author = {Tj{\"{a}}rnberg, Andreas and Nordling, Torbj{\"{o}}rn E M and Matthew Studham and Erik L L Sonnhammer},
booktitle = {5th annual RECOMB Conference on Regulatory and Systems Genomics, with DREAM Challenges in San Francisco (USA)},
file = {:Users/tn/Articles/Mendeley_collection/5th annual RECOMB Conference on Regulatory and Systems Genomics, with DREAM Challenges in San Francisco (USA)/Tj{\"{a}}rnberg et al._2012.pdf:pdf},
title = {{Optimal sparsity criteria for network inference}},
url = {http://recomb-2012.c2b2.columbia.edu/},
year = {2012}
}

### 2011

• R. Jörnsten, T. Abenius, T. Kling, L. Schmidt, E. Johansson, T. E. M. Nordling, B. Nordlander, C. Sander, P. Gennemark, K. Funa, B. Nilsson, L. Lindahl, and S. Nelander, “Network modeling of the transcriptional effects of copy number aberrations in glioblastoma,” Molecular systems biology, vol. 7, iss. 1, p. 486, 2011. doi:10.1038/msb.2011.17

DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.

@article{Jornsten2011,
abstract = {DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.},
author = {J{\"{o}}rnsten, Rebecka and Tobias Abenius and Teresia Kling and Schmidt, Linn{\'{e}}a and Erik Johansson and Nordling, Torbj{\"{o}}rn E M and Bodil Nordlander and Chris Sander and Peter Gennemark and Keiko Funa and Nilsson, Bj{\"{o}}rn and Linda Lindahl and Sven Nelander},
doi = {10.1038/msb.2011.17},
file = {:Users/tn/Articles/Mendeley_collection/Molecular systems biology/J{\"{o}}rnsten et al._2011.pdf:pdf},
issn = {1744-4292},
journal = {Molecular systems biology},
keywords = {Journal},
language = {en},
mendeley-tags = {Journal},
month = {apr},
number = {1},
pages = {486},
pmid = {21525872},
publisher = {Nature Publishing Group},
title = {{Network modeling of the transcriptional effects of copy number aberrations in glioblastoma}},
url = {http://www.nature.com/msb/journal/v7/n1/synopsis/msb201117.html},
volume = {7},
year = {2011}
}

• T. E. M. Nordling and E. W. Jacobsen, “On Sparsity As a Criterion in Reconstructing Biochemical Networks,” in Proceedings of the 18th International Federation of Automatic Control (IFAC) World Congress, 2011, Milano, Italy, 2011, pp. 11672–11678. doi:10.3182/20110828-6-IT-1002.03499

A common problem in inference of gene regulatory networks from experimental response data is the relatively small number of samples available in relation to the number of nodes/states. In many cases the identification problem is underdetermined and prior knowledge is required for the network reconstruction. A specific prior that has gained widespread popularity is the assumption that the underlying network is sparsely connected. This has led to a flood of network reconstruction algorithms based on subset selection and regularization techniques, mainly adopted from the statistics and signal processing communities. In particular, methods based on L1 and L2-penalties on the interaction strengths, such as LASSO, have been widely proposed and applied. We briefly review some of these methods and discuss their suitability for inferring the structure of biochemical networks. A particular problem is the fact that these methods provide little or no information on the uncertainty of individual identified edges, combined with the fact that the identified networks usually have a large fraction of false positives as well as false negatives. To partly overcome these problems we consider conditions that can be used to classify edges into those that can be uniquely determined based on a given incomplete data set, those that cannot be uniquely determined due to collinearity in the data and those for which no information is available. Apart from providing a label of confidence for the individual edges in the identified network, the classification can be used to improve the reconstruction by employing standard unbiased identification methods to the identifiable edges while employing sparse approximation methods for the remaining network. The method is demonstrated through application to a synthetic network in yeast which has recently been proposed for in vivo assessment of network identification methods.

@inproceedings{Nordling:2011:IFAC,
abstract = {A common problem in inference of gene regulatory networks from experimental response data is the relatively small number of samples available in relation to the number of nodes/states. In many cases the identification problem is underdetermined and prior knowledge is required for the network reconstruction. A specific prior that has gained widespread popularity is the assumption that the underlying network is sparsely connected. This has led to a flood of network reconstruction algorithms based on subset selection and regularization techniques, mainly adopted from the statistics and signal processing communities. In particular, methods based on L1 and L2-penalties on the interaction strengths, such as LASSO, have been widely proposed and applied. We briefly review some of these methods and discuss their suitability for inferring the structure of biochemical networks. A particular problem is the fact that these methods provide little or no information on the uncertainty of individual identified edges, combined with the fact that the identified networks usually have a large fraction of false positives as well as false negatives. To partly overcome these problems we consider conditions that can be used to classify edges into those that can be uniquely determined based on a given incomplete data set, those that cannot be uniquely determined due to collinearity in the data and those for which no information is available. Apart from providing a label of confidence for the individual edges in the identified network, the classification can be used to improve the reconstruction by employing standard unbiased identification methods to the identifiable edges while employing sparse approximation methods for the remaining network. The method is demonstrated through application to a synthetic network in yeast which has recently been proposed for in vivo assessment of network identification methods.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {Proceedings of the 18th International Federation of Automatic Control (IFAC) World Congress, 2011},
doi = {10.3182/20110828-6-IT-1002.03499},
editor = {Bittanti, Sergio and Cenedese, Angelo and Zampieri, Sandro},
file = {:Users/tn/Articles/Mendeley_collection/Proceedings of the 18th International Federation of Automatic Control (IFAC) World Congress, 2011/Nordling, Jacobsen_2011.pdf:pdf},
isbn = {978-3-902661-93-7},
keywords = {Gene regulatory networks,Modelling,Network inference,Regularization,Reverse engineering,Sparse networks,System identification},
month = {aug},
pages = {11672--11678},
publisher = {The International Federation of Automatic Control},
title = {{On Sparsity As a Criterion in Reconstructing Biochemical Networks}},
url = {http://www.ifac-papersonline.net/Detailed/51305.html},
year = {2011}
}

### 2010

• T. E. M. Nordling and E. W. Jacobsen, Interampatte systems–implications for gene regulation, Lund, Sweden: Reglermöte 2010, 2010.

The degree of interampatteness is a measure of a system's ability to simultaneously amplify and attenuate different input signals [1]. Interampatteness was introduced in the context of gene regulatory networks in order to explain, from a systems perspective, why essentially all variation in published gene expression data is concentrated to significantly fewer “characteristic modes” [2] or “eigengenes” [3] than the number of both recorded assays and measured genes. An interampatte system is characterised by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals. The response of an interampatte system to perturbations will therefore typically contain characteristic modes, but also mechanistically equally important weak modes which so far largely have been neglected. In particular, the information needed to reverse engineer the structure of the underlying regulatory network is largely contained in the weak modes and we have earlier demonstrated that even small measurement errors may cause practical unidentifiability [4]. For predictive purposes, on the other hand, the characteristic modes contain all essential signals and hence a good fit of these is sufficient to obtain a good predictive model. In this sense, interampatteness is a system property that enables data compression and use of models of reduced order for predictive purposes, but that obstructs inference of the network structure. Thus, it is important to make a clear distinction between identification for prediction versus identification for mechanistic knowledge. We explore the relationship between interampatteness and control. Interampatteness is shown to result from long cascades of amplification steps or from positive and negative feedback loops. Finally, we show that any network containing subprocesses operating at different time-scales will be interampatte.

@misc{Nordling2010Regler,
abstract = {The degree of interampatteness is a measure of a system's ability to simultaneously amplify and attenuate different input signals [1]. Interampatteness was introduced in the context of gene regulatory networks in order to explain, from a systems perspective, why essentially all variation in published gene expression data is concentrated to significantly fewer “characteristic modes” [2] or “eigengenes” [3] than the number of both recorded assays and measured genes. An interampatte system is characterised by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals. The response of an interampatte system to perturbations will therefore typically contain characteristic modes, but also mechanistically equally important weak modes which so far largely have been neglected. In particular, the information needed to reverse engineer the structure of the underlying regulatory network is largely contained in the weak modes and we have earlier demonstrated that even small measurement errors may cause practical unidentifiability [4]. For predictive purposes, on the other hand, the characteristic modes contain all essential signals and hence a good fit of these is sufficient to obtain a good predictive model. In this sense, interampatteness is a system property that enables data compression and use of models of reduced order for predictive purposes, but that obstructs inference of the network structure. Thus, it is important to make a clear distinction between identification for prediction versus identification for mechanistic knowledge. We explore the relationship between interampatteness and control. Interampatteness is shown to result from long cascades of amplification steps or from positive and negative feedback loops. Finally, we show that any network containing subprocesses operating at different time-scales will be interampatte.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {Reglerm{\"{o}}te 2010 in Lund, Sweden, August 2010},
file = {:Users/tn/Articles/Mendeley_collection/Reglerm{\"{o}}te 2010 in Lund, Sweden, August 2010/Nordling, Jacobsen_2010.pdf:pdf},
howpublished = {Reglerm{\"{o}}te 2010 in Lund, Sweden},
institution = {Reglerm{\"{o}}te 2010},
month = {aug},
publisher = {Reglerm{\"{o}}te 2010},
title = {{Interampatte systems--implications for gene regulation}},
year = {2010}
}
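
The idea of a system that amplifies some inputs while attenuating others can be illustrated with a small sketch. It rests on an assumption that the abstract above does not state: that the degree of interampatteness is quantified as the ratio of the largest to the smallest singular value (the condition number) of a static gain matrix. The function names and the 2x2 example matrix are hypothetical, and the closed-form singular values apply to 2x2 matrices only.

```python
import math


def singular_values_2x2(G):
    """Largest and smallest singular value of a 2x2 matrix, in closed form."""
    (a, b), (c, d) = G
    trace = a * a + b * b + c * c + d * d   # trace of G^T G
    det = a * d - b * c                     # det(G^T G) equals det(G)**2
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    sigma_max = math.sqrt((trace + disc) / 2.0)
    sigma_min = math.sqrt(max((trace - disc) / 2.0, 0.0))
    return sigma_max, sigma_min


def interampatteness_degree(G):
    """Ratio of strongest to weakest gain (assumed measure); inf if singular."""
    sigma_max, sigma_min = singular_values_2x2(G)
    return math.inf if sigma_min == 0.0 else sigma_max / sigma_min


# Hypothetical gain matrix: one input direction is amplified tenfold,
# the other is attenuated tenfold.
G = [[10.0, 0.0], [0.0, 0.1]]
print(interampatteness_degree(G))
```

Under this assumed measure, a high ratio means that responses to generic perturbations are dominated by the strongly amplified direction, while the weak direction contributes little signal relative to measurement noise.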

• T. E. M. Nordling and E. W. Jacobsen, “Sparsity is a means and not an aim in inference of gene regulatory networks,” in The 11th International Conference on Systems Biology (ICSB-2010) in Edinburgh (UK): Abstract book, Edinburgh, UK, 2010.

Availability of high-throughput gene expression data has led to numerous attempts to infer network models of gene regulation based on expression changes. The low number of observations compared to the number of genes, the low signal-to-noise ratios, and the system being interampatte make the inference problem ill-posed and challenging. To solve the problem a majority of all published approaches resort to regularization, e.g. the LASSO penalty is used to find a sparse model. Regularization is known to introduce a bias, but its effect on inferred gene regulatory networks has hardly been investigated. In machine learning and compressed sensing, where regularization has been widely applied and studied, the objective is to reproduce a signal and the actual variable selection is of minor importance as long as the signal is reproduced well. In network inference, on the other hand, the variable selection is crucial since we want to identify the true topology of the network and a minimal number of links is not an aim per se. We first study the inference problem in a deterministic setting in order to gain insight and derive conditions on when the regularization causes false negative and positive links. By viewing the problem as a parameter identifiability problem, we establish three cases in which a subset of the parameters can be uniquely determined. Finally we devise conditions for invalidation of the inferred links using existing or additional data, resulting in an iterative procedure of inference and experiment design that significantly increases the confidence in the inferred network model.

@inproceedings{Nordling2010ICSB,
abstract = {Availability of high-throughput gene expression data has led to numerous attempts to infer network models of gene regulation based on expression changes. The low number of observations compared to the number of genes, the low signal-to-noise ratios, and the system being interampatte make the inference problem ill-posed and challenging. To solve the problem a majority of all published approaches resort to regularization, e.g. the LASSO penalty is used to find a sparse model. Regularization is known to introduce a bias, but its effect on inferred gene regulatory networks has hardly been investigated. In machine learning and compressed sensing, where regularization has been widely applied and studied, the objective is to reproduce a signal and the actual variable selection is of minor importance as long as the signal is reproduced well. In network inference, on the other hand, the variable selection is crucial since we want to identify the true topology of the network and a minimal number of links is not an aim per se. We first study the inference problem in a deterministic setting in order to gain insight and derive conditions on when the regularization causes false negative and positive links. By viewing the problem as a parameter identifiability problem, we establish three cases in which a subset of the parameters can be uniquely determined. Finally we devise conditions for invalidation of the inferred links using existing or additional data, resulting in an iterative procedure of inference and experiment design that significantly increases the confidence in the inferred network model.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {The 11th International Conference on Systems Biology (ICSB-2010) in Edinburgh (UK): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 11th International Conference on Systems Biology (ICSB-2010) in Edinburgh (UK) Abstract book/Nordling, Jacobsen_2010.pdf:pdf},
keywords = {Gene regulatory networks,LASSO,Network inference,Regularization,Sparse networks},
month = {oct},
title = {{Sparsity is a means and not an aim in inference of gene regulatory networks}},
year = {2010}
}

• T. E. M. Nordling and E. W. Jacobsen, Design of perturbations is the key to inference of tumour specific gene regulation, 2010.

Tumour development requires alteration of the normal gene regulation of involved cell types. Mapping of these alterations and inference of the resulting local disease network is therefore crucial to improve our understanding of tumour progression and develop novel cures. Based on the number of known alterations and subtypes of each form of cancer, we assume that the network inference needs to be based on subtype and cell specific expression data to obtain the necessary specific knowledge. We have identified design of perturbations as the key to successful inference of such locally altered gene regulatory networks. Analysis of published gene expression data sets reveals that the variation in expression is concentrated to significantly fewer “characteristic modes” (Holter et al. 2000) or “eigengenes” (Alter et al. 2000) than both the number of recorded assays and the number of measured genes. In other words, the responses obtained in standard experiments are typically concentrated to a subset of the gene space. This is an advantage when considering modelling for predicting gene responses to external perturbations, since the model only needs to capture the characteristic modes correctly for this purpose. However, it seriously hampers network inference, since it implies that models with widely different network structure are practically indistinguishable based on standard response data. To infer the structure we need to design specific perturbations that yield a sufficiently strong signal also for perturbations that are attenuated by the system, i.e., excite the weak modes of the network. The perturbations needed depend on the unknown system and we have therefore developed an iterative design, which we here demonstrate on two published gene expression data sets (Lorenz et al. 2009, Gardner et al. 2003). Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A. 2000 Aug 29; 97(18): 10101-6. Gardner TS, di Bernardo D, Lorenz D, Collins JJ, Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003 Jul 4; 301(5629): 102-5. Holter NS, Mitra M, Maritan A, Cieplak M, Banavar JR, Fedoroff NV. Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci U S A. 2000 Jul 18; 97(15): 8409-14. Lorenz DR, Cantor CR, Collins JJ. A network biology approach to aging in yeast. Proc Natl Acad Sci U S A. 2009 Jan 27; 106(4): 1145-50.

@misc{Nordling2010DMM,
abstract = {Tumour development requires alteration of the normal gene regulation of involved cell types. Mapping of these alterations and inference of the resulting local disease network is therefore crucial to improve our understanding of tumour progression and develop novel cures. Based on the number of known alterations and subtypes of each form of cancer, we assume that the network inference needs to be based on subtype and cell specific expression data to obtain the necessary specific knowledge. We have identified design of perturbations as the key to successful inference of such locally altered gene regulatory networks. Analysis of published gene expression data sets reveals that the variation in expression is concentrated to significantly fewer ``characteristic modes'' (Holter et al. 2000) or ``eigengenes'' (Alter et al. 2000) than both the number of recorded assays and the number of measured genes. In other words, the responses obtained in standard experiments are typically concentrated to a subset of the gene space. This is an advantage when considering modelling for predicting gene responses to external perturbations, since the model only needs to capture the characteristic modes correctly for this purpose. However, it seriously hampers network inference, since it implies that models with widely different network structure are practically indistinguishable based on standard response data. To infer the structure we need to design specific perturbations that yield a sufficiently strong signal also for perturbations that are attenuated by the system, i.e., excite the weak modes of the network. The perturbations needed depend on the unknown system and we have therefore developed an iterative design, which we here demonstrate on two published gene expression data sets (Lorenz et al. 2009, Gardner et al. 2003). Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A. 2000 Aug 29; 97(18): 10101-6. Gardner TS, di Bernardo D, Lorenz D, Collins JJ, Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003 Jul 4; 301(5629): 102-5. Holter NS, Mitra M, Maritan A, Cieplak M, Banavar JR, Fedoroff NV. Fundamental patterns underlying gene expression profiles: simplicity from complexity. Proc Natl Acad Sci U S A. 2000 Jul 18; 97(15): 8409-14. Lorenz DR, Cantor CR, Collins JJ. A network biology approach to aging in yeast. Proc Natl Acad Sci U S A. 2009 Jan 27; 106(4): 1145-50.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {MGH-KI-Cell press Days of Molecular Medicine, Stockholm (Sweden)},
file = {:Users/tn/Articles/Mendeley_collection/MGH-KI-Cell press Days of Molecular Medicine, Stockholm (Sweden)/Nordling, Jacobsen_2010.pdf:pdf},
howpublished = {MGH-KI-Cell press Days of Molecular Medicine, Stockholm (Sweden)},
keywords = {Cancer,Experiment design,Interampatteness,Network inference,System identification,Systems Biology},
month = {may},
title = {{Design of perturbations is the key to inference of tumour specific gene regulation}},
year = {2010}
}

### 2009

• T. E. M. Nordling and E. W. Jacobsen, Invalidating models of gene regulatory networks – the implications of characteristic and weak modes for network inference, 2009.

Analysis of published gene expression data sets reveals that the variation in expression is concentrated to significantly fewer ‘characteristic modes’ or ‘eigengenes’ than both the number of recorded assays and the number of measured genes. In other words, the responses obtained in standard experiments are typically concentrated to a subset of the gene space. This is an advantage when considering modelling for predicting gene responses to external perturbations, since the model only needs to capture the characteristic modes correctly for this purpose. However, it seriously hampers network inference, since it implies that models with widely different network structure are practically indistinguishable based on standard response data. Furthermore, as we show here, the presence of characteristic modes implies that it is easy to validate and hard to invalidate false model structures. The information required to invalidate a false model is hidden in the weak modes that contribute only weakly to the gene response data and therefore are largely hidden in the measurement noise. Here we use two published gene expression data sets and an in silico gene regulatory network to illustrate the principal differences between validation and invalidation of models of gene regulatory networks. All three systems have a high degree of interampatteness (see ref.), i.e. some perturbations are amplified while others are attenuated by the system. The response of an interampatte system to random perturbations can be described well based on the characteristic modes only, implying that it is easy to validate any model that predicts the characteristic modes correctly. To invalidate a model we need to design specific perturbations that yield a sufficiently strong signal also for perturbations that are attenuated by the system, i.e., that excite the weak modes of the network. 
From a biological perspective it is trivial to realize that amplification and attenuation of perturbations are equally important for biological function, and hence a proper model should be able to predict both the weak and characteristic modes correctly. We stress that the common assumption that the quality of a model can be judged based on its ability to predict response data only holds for systems with a low degree of interampatteness. Nordling TEM, Jacobsen EW Interampatteness–a generic property of biochemical networks. IET Systems Biology, 2009, in press.

@misc{Nordling2009EPBS,
abstract = {Analysis of published gene expression data sets reveals that the variation in expression is concentrated to significantly fewer `characteristic modes' or `eigengenes' than both the number of recorded assays and the number of measured genes. In other words, the responses obtained in standard experiments are typically concentrated to a subset of the gene space. This is an advantage when considering modelling for predicting gene responses to external perturbations, since the model only needs to capture the characteristic modes correctly for this purpose. However, it seriously hampers network inference, since it implies that models with widely different network structure are practically indistinguishable based on standard response data. Furthermore, as we show here, the presence of characteristic modes implies that it is easy to validate and hard to invalidate false model structures. The information required to invalidate a false model is hidden in the weak modes that contribute only weakly to the gene response data and therefore are largely hidden in the measurement noise. Here we use two published gene expression data sets and an in silico gene regulatory network to illustrate the principal differences between validation and invalidation of models of gene regulatory networks. All three systems have a high degree of interampatteness (see ref.), i.e. some perturbations are amplified while others are attenuated by the system. The response of an interampatte system to random perturbations can be described well based on the characteristic modes only, implying that it is easy to validate any model that predicts the characteristic modes correctly. To invalidate a model we need to design specific perturbations that yield a sufficiently strong signal also for perturbations that are attenuated by the system, i.e., that excite the weak modes of the network. 
From a biological perspective it is trivial to realize that amplification and attenuation of perturbations are equally important for biological function, and hence a proper model should be able to predict both the weak and characteristic modes correctly. We stress that the common assumption that the quality of a model can be judged based on its ability to predict response data only holds for systems with a low degree of interampatteness. Nordling TEM, Jacobsen EW Interampatteness--a generic property of biochemical networks. IET Systems Biology, 2009, in press.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {Engineering Principles in Biological systems conference, Hinxton (UK)},
file = {:Users/tn/Articles/Mendeley_collection/Engineering Principles in Biological systems conference, Hinxton (UK)/Nordling, Jacobsen_2009.pdf:pdf},
howpublished = {Engineering Principles in Biological systems conference, Hinxton (UK)},
keywords = {Experiment design,System identification,Systems Biology},
month = {oct},
title = {{Invalidating models of gene regulatory networks - the implications of characteristic and weak modes for network inference}},
year = {2009}
}

• T. E. M. Nordling and E. W. Jacobsen, “Interampatteness–a generic property of biochemical networks,” Iet syst biol, vol. 3, iss. 5, p. 388–403, 2009. doi:10.1049/iet-syb.2009.0008

Analysis of gene expression data sets reveals that the variation in expression is concentrated to significantly fewer ‘characteristic modes’ or ‘eigengenes’ than the number of both recorded assays and measured genes. Previous works have stressed the importance of these characteristic modes, but neglected the equally important weak modes. Herein a generic system property – interampatteness – is defined that explains the previous feature, and assigns equal weight to the characteristic and weak modes. An interampatte network is characterised by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals. It is postulated that biochemical networks are interampatte, based on published experimental data and theoretical considerations. Existence of multiple time-scales and feedback loops is shown to increase the degree of interampatteness. Interampatteness has strong implications for the dynamics and reverse engineering of the network. One consequence is highly correlated changes in gene expression in response to external perturbations, even in the absence of common transcription factors, implying that interampatte gene regulatory networks erroneously may be assumed to have co-expressed/co-regulated genes. Data compression or reduction of the system dimensionality using clustering, singular value decomposition, principal component analysis or some other data mining technique results in a loss of information that will obstruct reconstruction of the underlying network. [Includes supplementary material]

@article{Nordling:2009:Interampatte,
abstract = {Analysis of gene expression data sets reveals that the variation in expression is concentrated to significantly fewer `characteristic modes' or `eigengenes' than the number of both recorded assays and measured genes. Previous works have stressed the importance of these characteristic modes, but neglected the equally important weak modes. Herein a generic system property - interampatteness - is defined that explains the previous feature, and assigns equal weight to the characteristic and weak modes. An interampatte network is characterised by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals. It is postulated that biochemical networks are interampatte, based on published experimental data and theoretical considerations. Existence of multiple time-scales and feedback loops is shown to increase the degree of interampatteness. Interampatteness has strong implications for the dynamics and reverse engineering of the network. One consequence is highly correlated changes in gene expression in response to external perturbations, even in the absence of common transcription factors, implying that interampatte gene regulatory networks erroneously may be assumed to have co-expressed/co-regulated genes. Data compression or reduction of the system dimensionality using clustering, singular value decomposition, principal component analysis or some other data mining technique results in a loss of information that will obstruct reconstruction of the underlying network. [Includes supplementary material]},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
doi = {10.1049/iet-syb.2009.0008},
file = {:Users/tn/Articles/Mendeley_collection/IET Syst Biol/Nordling, Jacobsen_2009.pdf:pdf},
journal = {IET Syst Biol},
keywords = {Corresponding author,Experiment design,Interampatte,Journal,Properties of biosystems},
mendeley-tags = {Corresponding author,Journal},
month = {sep},
number = {5},
pages = {388--403},
title = {{Interampatteness--a generic property of biochemical networks}},
url = {http://dx.doi.org/10.1049/iet-syb.2009.0008},
volume = {3},
year = {2009}
}

### 2008

• T. E. M. Nordling and E. W. Jacobsen, On inference of gene regulatory networks – identification of MIMO systems with different time-scales, Luleå, Sweden: Reglermöte 2008, 2008.
[BibTeX] [Abstract]

Gene regulatory networks can be viewed as MIMO systems consisting of strongly interacting systems with widely differing time-scales. This poses specific problems in inferring the network structure from measurements of gene activities. We here present an iterative experiment design algorithm for identification of such systems, aimed at identifying the network structure with a specified accuracy using a minimum number of steady-state experiments and measurements. The design is based on a state-space description of the network, in which identification of the network structure corresponds to determination of the Jacobian from measurements of all state variables, i.e. gene activities. Each step of the design strives to span a new direction of the state-space with a certain variance. We also present an upper bound on the relative estimation error. We apply the proposed design to a biological example where the primary aim is to identify the intrinsic feedback within the system. A cell can be seen as a chemical plant, where all processes are controlled by the genome. Based on environmental and internal signals the gene regulatory system controls the production of the cell. Toxic substances have to be maintained at a low level, while indispensable substances are obtained in sufficient quantities. The same pathway may contain both toxic and indispensable intermediates linked through both fast and slow reactions, which all need to be properly regulated. Therefore the gene regulatory system contains numerous feedback loops operating at different time-scales in a non-separable way. Biologists are currently trying to map out the gene regulatory system of various functions based on measurements of gene expression, making our example a highly relevant problem.

@misc{Nordling2008ReglerPoster,
abstract = {Gene regulatory networks can be viewed as MIMO systems consisting of strongly interacting systems with widely differing time-scales. This poses specific problems in inferring the network structure from measurements of gene activities. We here present an iterative experiment design algorithm for identification of such systems, aimed at identifying the network structure with a specified accuracy using a minimum number of steady-state experiments and measurements. The design is based on a state-space description of the network, in which identification of the network structure corresponds to determination of the Jacobian from measurements of all state variables, i.e. gene activities. Each step of the design strives to span a new direction of the state-space with a certain variance. We also present an upper bound on the relative estimation error. We apply the proposed design to a biological example where the primary aim is to identify the intrinsic feedback within the system. A cell can be seen as a chemical plant, where all processes are controlled by the genome. Based on environmental and internal signals the gene regulatory system controls the production of the cell. Toxic substances have to be maintained at a low level, while indispensable substances are obtained in sufficient quantities. The same pathway may contain both toxic and indispensable intermediates linked through both fast and slow reactions, which all need to be properly regulated. Therefore the gene regulatory system contains numerous feedback loops operating at different time-scales in a non-separable way. Biologists are currently trying to map out the gene regulatory system of various functions based on measurements of gene expression, making our example a highly relevant problem.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {Reglerm{\"{o}}te 2008 in Lule{\aa} (Sweden)},
howpublished = {Reglerm{\"{o}}te 2008 in Lule{\aa}, Sweden},
institution = {Reglerm{\"{o}}te 2008},
month = {jun},
publisher = {Reglerm{\"{o}}te 2008},
title = {{On inference of gene regulatory networks - identification of MIMO systems with different time-scales}},
year = {2008}
}

• M. Hellgren, T. E. M. Nordling, E. W. Jacobsen, and J. O. Höög, “Multi-level modelling of the parallell metabolism of ethanol and retinol, with implications for foetal alcohol syndrome,” in The 9th international conference on systems biology (icsb-2008) in gothenburg (sweden): abstract book, Gothenburg, Sweden, 2008.
[BibTeX] [Abstract]

Objective: Models of the human metabolism are important for understanding diseases and could serve as a powerful tool in the drug discovery process. The complexity of even a unicellular organism is tremendous and most researchers have therefore limited their modelling efforts to bacteria, or single intracellular pathways. We studied the parallel metabolism of ethanol and retinol in humans, because of its suggested physiological importance for the development of foetal alcohol syndrome. Large ethanol intake will inhibit the conversion of retinol into retinoic acid, which is a crucial transcription factor during embryonic development. In this study the objective was to construct a quantitative model that connects phenotype observations at a population, organic and intracellular level with differences in genotype and ethanol metabolism, for further prediction of the influence on the foetus. Results: We constructed a multiple compartments model, which included a detailed description of the ethanol and retinol metabolism in hepatic cells for different genotypes. The model has been validated using published time-series measurements of ethanol, acetaldehyde and acetate concentrations in the blood. This model correctly accounts for differences in geno- and phenotype observed within the human population. Furthermore, the model shows that the retinol metabolism is decreased by ethanol ingestion, both via a reduced intracellular NAD+ concentration, and by an inhibition of alcohol and aldehyde dehydrogenases. Conclusions: We considered the problem of multi-level modelling with a human model for the ethanol and retinol metabolism in different compartments. This links intracellular mechanisms to macroscopic observations. The model explained the connection between geno- and phenotype differences observed at a population level. This model also shows a plausible relationship between ethanol and retinol metabolism for e.g. foetal alcohol syndrome.

@inproceedings{Hellgren2008ICSB,
abstract = {Objective: Models of the human metabolism are important for understanding diseases and could serve as a powerful tool in the drug discovery process. The complexity of even a unicellular organism is tremendous and most researchers have therefore limited their modelling efforts to bacteria, or single intracellular pathways. We studied the parallel metabolism of ethanol and retinol in humans, because of its suggested physiological importance for the development of foetal alcohol syndrome. Large ethanol intake will inhibit the conversion of retinol into retinoic acid, which is a crucial transcription factor during embryonic development. In this study the objective was to construct a quantitative model that connects phenotype observations at a population, organic and intracellular level with differences in genotype and ethanol metabolism, for further prediction of the influence on the foetus. Results: We constructed a multiple compartments model, which included a detailed description of the ethanol and retinol metabolism in hepatic cells for different genotypes. The model has been validated using published time-series measurements of ethanol, acetaldehyde and acetate concentrations in the blood. This model correctly accounts for differences in geno- and phenotype observed within the human population. Furthermore, the model shows that the retinol metabolism is decreased by ethanol ingestion, both via a reduced intracellular NAD+ concentration, and by an inhibition of alcohol and aldehyde dehydrogenases. Conclusions: We considered the problem of multi-level modelling with a human model for the ethanol and retinol metabolism in different compartments. This links intracellular mechanisms to macroscopic observations. The model explained the connection between geno- and phenotype differences observed at a population level. This model also shows a plausible relationship between ethanol and retinol metabolism for e.g. foetal alcohol syndrome.},
author = {Mikko Hellgren and Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen and H{\"{o}}{\"{o}}g, Jan Olov},
booktitle = {The 9th International Conference on Systems Biology (ICSB-2008) in Gothenburg (Sweden): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 9th International Conference on Systems Biology (ICSB-2008) in Gothenburg (Sweden) Abstract book/Hellgren et al._2008.pdf:pdf},
isbn = {9781615673322},
keywords = {Systems Biology},
month = {aug},
publisher = {University Of Gothenburg, Curran Associates, Inc.},
title = {{Multi-level modelling of the parallell metabolism of ethanol and retinol, with implications for foetal alcohol syndrome}},
year = {2008}
}

• T. E. M. Nordling and E. W. Jacobsen, Ill-conditioning – a property of bio-networks, 2008.

Analysis of large gene expression datasets shows that they all share the same feature; the variance is concentrated to significantly fewer orthogonal directions than the applied perturbations span. Given that all perturbations are of same magnitude, this shows that the underlying networks are ill-conditioned. We establish ill-conditioning as a generic property of biochemical networks, resulting from the fact that all networks need to provide both signal amplification and disturbance attenuation. One consequence of ill-conditioning is the commonly observed co-expression of genes.

@misc{Nordling2008qbio,
abstract = {Analysis of large gene expression datasets shows that they all share the same feature; the variance is concentrated to significantly fewer orthogonal directions than the applied perturbations span. Given that all perturbations are of same magnitude, this shows that the underlying networks are ill-conditioned. We establish ill-conditioning as a generic property of biochemical networks, resulting from the fact that all networks need to provide both signal amplification and disturbance attenuation. One consequence of ill-conditioning is the commonly observed co-expression of genes.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {The 2nd annual q-bio conference on cellular information, Santa Fe (USA)},
file = {:Users/tn/Articles/Mendeley_collection/The 2nd annual q-bio conference on cellular information, Santa Fe (USA)/Nordling, Jacobsen_2008.pdf:pdf},
howpublished = {The 2nd annual q-bio conference on cellular information, Santa Fe (U.S.A.)},
keywords = {Experiment design,System identification,System properties,Systems Biology},
month = {aug},
title = {{Ill-conditioning - a property of bio-networks}},
year = {2008}
}

• T. E. M. Nordling and E. W. Jacobsen, Inference of interampatte gene regulatory networks – with application to apoptosis signalling, Gothenburg, Sweden: University of Gothenburg, Curran Associates, Inc., 2008.

Objective: Inference of gene regulatory networks (GRN) from quantitative expression data has the potential to reveal all interactions existing within a selected set of genes. However, microarray data typically only contain a few characteristic modes or eigengenes, even when a large number of arrays are recorded at varying experimental conditions. The reason and implications of this inherent rank deficiency have largely been neglected, even though rank deficiency caused by fewer experiments than measured genes has been addressed. We explain why the data in the former case are rank deficient, what it implies for network inference, and how to counteract it through experiment design. Results: We define interampatte systems as systems characterised by strong interactions necessary to both amplify and attenuate different signals at multiple time-scales. GRN are interampatte with strong directional dependence. This generic network property makes microarray data rank deficient and gives rise to features observed as characteristic modes, eigengenes and co-expressed genes. While few modes imply that low order models can be used for data compression and prediction, it effectively prevents inference of causal interactions, since many sparse networks with completely different structure fit equally well to the dataset. We illustrate this problem using a previously published model of apoptosis signalling. Inference based on standard experiments, i.e. perturbing genes one-by-one, is shown to yield networks with the wrong structure although its predictive ability is validated using independent validation data. We present an iterative algorithm for experiment design that guarantees sufficient excitation of all network modes and demonstrate its effectiveness. Conclusions: Systematic design of perturbation experiments, where several genes are perturbed simultaneously in a controlled fashion, is necessary in order to infer the true structure of GRN from expression data. 
It is likely that many inferred network models with validated predictive properties have falsely identified gene interactions.

@misc{Nordling2008ICSB,
abstract = {Objective: Inference of gene regulatory networks (GRN) from quantitative expression data has the potential to reveal all interactions existing within a selected set of genes. However, microarray data typically only contain a few characteristic modes or eigengenes, even when a large number of arrays are recorded at varying experimental conditions. The reason and implications of this inherent rank deficiency have largely been neglected, even though rank deficiency caused by fewer experiments than measured genes has been addressed. We explain why the data in the former case are rank deficient, what it implies for network inference, and how to counteract it through experiment design. Results: We define interampatte systems as systems characterised by strong interactions necessary to both amplify and attenuate different signals at multiple time-scales. GRN are interampatte with strong directional dependence. This generic network property makes microarray data rank deficient and gives rise to features observed as characteristic modes, eigengenes and co-expressed genes. While few modes imply that low order models can be used for data compression and prediction, it effectively prevents inference of causal interactions, since many sparse networks with completely different structure fit equally well to the dataset. We illustrate this problem using a previously published model of apoptosis signalling. Inference based on standard experiments, i.e. perturbing genes one-by-one, is shown to yield networks with the wrong structure although its predictive ability is validated using independent validation data. We present an iterative algorithm for experiment design that guarantees sufficient excitation of all network modes and demonstrate its effectiveness. Conclusions: Systematic design of perturbation experiments, where several genes are perturbed simultaneously in a controlled fashion, is necessary in order to infer the true structure of GRN from expression data. 
It is likely that many inferred network models with validated predictive properties have falsely identified gene interactions.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {The 9th International Conference on Systems Biology (ICSB-2008) in Gothenburg (Sweden): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 9th International Conference on Systems Biology (ICSB-2008) in Gothenburg (Sweden) Abstract book/Nordling, Jacobsen_2008.pdf:pdf},
howpublished = {The 9th International Conference on Systems Biology (ICSB-2008), Gothenburg (Sweden)},
keywords = {Apoptosis,Gene regulatory networks,Network inference},
month = {aug},
publisher = {University Of Gothenburg, Curran Associates, Inc.},
title = {{Inference of interampatte gene regulatory networks - with application to apoptosis signalling}},
year = {2008}
}

• J. Alander, A. Autere, O. Kanniainen, J. Koljonen, T. E. M. Nordling, and P. Välisuo, “Near infrared wavelength relevance detection of ultraviolet radiation-induced erythema,” Journal of near infrared spectroscopy, vol. 16, iss. 3, p. 233–241, 2008. doi:10.1255/jnirs.782

The acute effects of sun-bathing on the near-infrared absorption spectra of human skin were studied by exposing the shoulders of a male test subject to bright Finnish high summer mid-day sun. The spectra were measured before, immediately after and for several days after exposure. Four different spectral processing and classification methods were applied to the data set to identify differences caused by exposure to the sun. The spectrophotometer and measuring procedure were found to cause some systematic errors, calling for further development, even though they could, to a large extent, be compensated for computationally. Spectral regions indicating ultraviolet radiation-induced erythema were located and the degree of erythema could be predicted correctly but the signal is weak. This paper discusses promising wavelength selection methods to study the dermal effects of exposure to the sun, as well as difficulties and remedies of near infrared spectroscopic measurements of the skin.

@article{Alander:2008,
abstract = {The acute effects of sun-bathing on the near-infrared absorption spectra of human skin were studied by exposing the shoulders of a male test subject to bright Finnish high summer mid-day sun. The spectra were measured before, immediately after and for several days after exposure. Four different spectral processing and classification methods were applied to the data set to identify differences caused by exposure to the sun. The spectrophotometer and measuring procedure were found to cause some systematic errors, calling for further development, even though they could, to a large extent, be compensated for computationally. Spectral regions indicating ultraviolet radiation-induced erythema were located and the degree of erythema could be predicted correctly but the signal is weak. This paper discusses promising wavelength selection methods to study the dermal effects of exposure to the sun, as well as difficulties and remedies of near infrared spectroscopic measurements of the skin.},
author = {Jarmo Alander and Antti Autere and Olli Kanniainen and Janne Koljonen and Nordling, Torbj{\"{o}}rn E M and V{\"{a}}lisuo, Petri},
doi = {10.1255/jnirs.782},
file = {:Users/tn/Articles/Mendeley_collection/Journal of Near Infrared Spectroscopy/Alander et al._2008.pdf:pdf},
issn = {0967-0335},
journal = {Journal of Near Infrared Spectroscopy},
keywords = {Erythema,Journal,Near-infrared spectroscopy},
mendeley-tags = {Journal},
month = {jul},
number = {3},
pages = {233--241},
title = {{Near infrared wavelength relevance detection of ultraviolet radiation-induced erythema}},
url = {http://www.impublications.com/nir/abstract/J16_0233},
volume = {16},
year = {2008}
}

• J. Koljonen, T. E. M. Nordling, and J. Alander, “A review of genetic algorithms in near infrared spectroscopy and chemometrics: past and future,” Journal of near infrared spectroscopy, vol. 16, iss. 3, p. 189–197, 2008. doi:10.1255/jnirs.778

Global optimisation and search problems are abundant in science and engineering, including spectroscopy and its applications. Therefore, it is hardly surprising that general optimisation and search methods such as genetic algorithms (GAs) have also found applications in the area of near infrared (NIR) spectroscopy. A brief introduction to genetic algorithms, their objectives and applications in NIR spectroscopy, as well as in chemometrics, is given. The most popular application for GAs in NIR spectroscopy is wavelength, or more generally speaking, variable selection. GAs are both frequently used and convenient in multi-criteria optimisation; for example, selection of pre-processing methods, wavelength inclusion, and selection of latent variables can be optimised simultaneously. Wavelet transform has recently been applied to pre-processing of NIR data. In particular, hybrid methods of wavelets and genetic algorithms have in a number of research papers been applied to pre-processing, wavelength selection and regression with good success. In all calibrations and, in particular, when optimising, it is essential to validate the model and to avoid over-fitting. GAs have a large potential when addressing these two major problems and we believe that many future applications will emerge. To conclude, optimisation gives good opportunities to simultaneously develop an accurate calibration model and to regulate model complexity and prediction ability within a considered validation framework.

@article{Koljonen:2008,
abstract = {Global optimisation and search problems are abundant in science and engineering, including spectroscopy and its applications. Therefore, it is hardly surprising that general optimisation and search methods such as genetic algorithms (GAs) have also found applications in the area of near infrared (NIR) spectroscopy. A brief introduction to genetic algorithms, their objectives and applications in NIR spectroscopy, as well as in chemometrics, is given. The most popular application for GAs in NIR spectroscopy is wavelength, or more generally speaking, variable selection. GAs are both frequently used and convenient in multi-criteria optimisation; for example, selection of pre-processing methods, wavelength inclusion, and selection of latent variables can be optimised simultaneously. Wavelet transform has recently been applied to pre-processing of NIR data. In particular, hybrid methods of wavelets and genetic algorithms have in a number of research papers been applied to pre-processing, wavelength selection and regression with good success. In all calibrations and, in particular, when optimising, it is essential to validate the model and to avoid over-fitting. GAs have a large potential when addressing these two major problems and we believe that many future applications will emerge. To conclude, optimisation gives good opportunities to simultaneously develop an accurate calibration model and to regulate model complexity and prediction ability within a considered validation framework.},
author = {Janne Koljonen and Nordling, Torbj{\"{o}}rn E M and Jarmo Alander},
doi = {10.1255/jnirs.778},
file = {:Users/tn/Articles/Mendeley_collection/Journal of Near Infrared Spectroscopy/Koljonen, Nordling, Alander_2008.pdf:pdf},
issn = {0967-0335},
journal = {Journal of Near Infrared Spectroscopy},
keywords = {Ch,Genetic algorithms,Journal,Near-infrared spectroscopy},
mendeley-tags = {Journal},
month = {jul},
number = {3},
pages = {189--197},
title = {{A review of genetic algorithms in near infrared spectroscopy and chemometrics: past and future}},
url = {http://www.impublications.com/nir/abstract/J16_0189},
volume = {16},
year = {2008}
}

### 2007

• J. Koljonen, J. T. Alander, T. E. M. Nordling, and P. Välisuo, “A review on evolutionary optimisation and search methods in NIR spectroscopy,” in The 13th international conference on near infrared spectroscopy (13th icnirs) in umeå-vasa (sweden and finland): abstract book, Umeå-Vasa, Sweden and Finland, 2007.

Global optimisation and search problems are abundant in science and engineering, including spectroscopy and its applications. Therefore, such general global optimisation and search methods as genetic and evolutionary algorithms have found applications also in the area of near-infrared spectroscopy. This paper gives a brief introduction and review on genetic and evolutionary algorithms and their general objectives and applications in spectroscopy, especially in near-infrared spectroscopy and chemometrics. Furthermore, some potential future applications of the optimization methods in spectroscopy are highlighted. In addition, a short comparison between the applications of genetic algorithms in spectroscopy versus their applications in other areas of science and engineering is done. The review is based on our experience on evolutionary methods in various applications during more than a decade of active research of global optimisation and search methods. The most popular applications have been in wavelength, or more generally parameter, selection and in nonlinear regression, which are emphasised in this review. In a separate paper, several wavelength selection methods, including genetic algorithms, are compared.

@inproceedings{Koljonen2008ICNIRS,
abstract = {Global optimisation and search problems are abundant in science and engineering, including spectroscopy and its applications. Therefore, such general global optimisation and search methods as genetic and evolutionary algorithms have found applications also in the area of near-infrared spectroscopy. This paper gives a brief introduction and review on genetic and evolutionary algorithms and their general objectives and applications in spectroscopy, especially in near-infrared spectroscopy and chemometrics. Furthermore, some potential future applications of the optimization methods in spectroscopy are highlighted. In addition, a short comparison between the applications of genetic algorithms in spectroscopy versus their applications in other areas of science and engineering is done. The review is based on our experience on evolutionary methods in various applications during more than a decade of active research of global optimisation and search methods. The most popular applications have been in wavelength, or more generally parameter, selection and in nonlinear regression, which are emphasised in this review. In a separate paper, several wavelength selection methods, including genetic algorithms, are compared.},
address = {Ume{\aa}-Vasa, Sweden and Finland},
annote = {Abstract of `A review of genetic algorithms in near infrared spectroscopy and chemometrics: past and future', which was later published in Journal of Near Infrared Spectroscopy, 16(3): 189--198, July 2008.},
author = {Janne Koljonen and Jarmo T Alander and Nordling, Torbj{\"{o}}rn E M and V{\"{a}}lisuo, Petri},
booktitle = {The 13th International Conference on Near Infrared Spectroscopy (13th ICNIRS) in Ume{\aa}-Vasa (Sweden and Finland): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 13th International Conference on Near Infrared Spectroscopy (13th ICNIRS) in Ume{\aa}-Vasa (Sweden and Finland) Abstract book/Koljonen et al._2007.pdf:pdf},
keywords = {Near-infrared spectroscopy,evolutionary optimisation,evolutionary strategies,genetic algorithms,genetic programming,global optimisation,search,spectroscopy},
month = {jun},
title = {{A review on evolutionary optimisation and search methods in NIR spectroscopy}},
year = {2007}
}

• T. E. M. Nordling, N. Hiroi, A. Funahashi, and H. Kitano, “Detection of functional modules in graphs depend on a linearization of the biochemical network,” in The 15th annual international conference on intelligent systems for molecular biology (ismb) and 6th european conference on computational biology (eccb) in vienna (austria): abstract book, Vienna, Austria, 2007.

Non-linear behaviour of biochemical networks, such as intracellular gene, protein or metabolic networks, is commonly represented using graphs of the underlying topology. Nodes represent abundance of molecules and edges interactions between pairs of molecules. These graphs are linear and thus based on an implicit linearization of the kinetic reactions in one or several dynamic modes of the total system. It is common to use data from different sources – experiments conducted under different conditions or even on different species – meaning that the graph will be a superposition of linearizations made in many different modes. We originally constructed and analysed a mammalian fibroblast model, including 1,343 nodes and 3,291 edges, by using data from different sources. Here we propose that the ability to distinguish different dynamic modes when composing a network graph is critical for deduction of intracellular sub-systems based on topology. The mixing of different modes makes it hard to identify functional modules, that is sub-systems that carry out a specific biological function, since the graph will contain many interactions that do not naturally occur at the same time. The ability to establish a boundary between the sub-system and its environment is critical in the definition of a module, contrary to a motif in which only internal interactions count. Identification of functional modules should therefore be done on graphs depicting the mode in which their function is carried out, i.e. graphs that only contain edges representing interactions active in the specific mode. In general, when an interaction between two molecules is established, one should always state the mode of the system in which it is active.

@inproceedings{Nordling2007ISMB,
abstract = {Non-linear behaviour of biochemical networks, such as intracellular gene, protein or metabolic networks, is commonly represented using graphs of the underlying topology. Nodes represent abundance of molecules and edges interactions between pairs of molecules. These graphs are linear and thus based on an implicit linearization of the kinetic reactions in one or several dynamic modes of the total system. It is common to use data from different sources -- experiments conducted under different conditions or even on different species -- meaning that the graph will be a superposition of linearizations made in many different modes. We originally constructed and analysed a mammalian fibroblast model, including 1,343 nodes and 3,291 edges, by using data from different sources. Here we propose that the ability to distinguish different dynamic modes when composing a network graph is critical for deduction of intracellular sub-systems based on topology. The mixing of different modes makes it hard to identify functional modules, that is sub-systems that carry out a specific biological function, since the graph will contain many interactions that do not naturally occur at the same time. The ability to establish a boundary between the sub-system and its environment is critical in the definition of a module, contrary to a motif in which only internal interactions count. Identification of functional modules should therefore be done on graphs depicting the mode in which their function is carried out, i.e. graphs that only contain edges representing interactions active in the specific mode. In general, when an interaction between two molecules is established, one should always state the mode of the system in which it is active.},
author = {Nordling, Torbj{\"{o}}rn E M and Noriko Hiroi and Akira Funahashi and Hiroaki Kitano},
booktitle = {The 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 6th European Conference on Computational Biology (ECCB) in Vienna (Austria): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 6th European Conference on Computational Bi./Nordling et al._2007.pdf:pdf;:Users/tn/Articles/Mendeley_collection/The 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 6th European Conference on Computational Bi./Nordling et al._2007(2).pdf:pdf},
keywords = {Systems Biology,biochemical network,functional module,linear graphs,network theory},
month = {jul},
publisher = {International Society for Computational Biology},
title = {{Detection of functional modules in graphs depend on a linearization of the biochemical network}},
year = {2007}
}

• J. Koljonen, J. T. Alander, T. E. M. Nordling, A. Autere, P. Välisuo, and O. Kanniainen, “The Effects of Erythema on Near-Infrared Absorption Spectra,” in The 13th international conference on near infrared spectroscopy (13th icnirs) in umeå-vasa (sweden and finland): abstract book, Umeå-Vasa, Sweden and Finland, 2007.

The acute effects of sun-bathing on the near-infrared absorption spectra of skin were studied by exposing the shoulders of a male volunteer to bright Finnish midsummer midday sun and measuring the spectra before and after the exposure and on several days after the exposure. Several spectra processing and classification methods were applied on the data set to find the differences caused by the sun exposure. The spectrophotometer and measuring procedure were found to cause some systematic errors that were reduced computationally. It is however more advisable to avoid such errors by careful handling of the spectrometer device and overall design of the skin measurement protocol. This paper discusses the effects of sun exposure to skin and its near-infrared spectra, difficulties and remedies of skin measurements, promising computational methods, and results.

@inproceedings{Alander2008ICNIRS,
abstract = {The acute effects of sun-bathing on the near-infrared absorption spectra of skin were studied by exposing the shoulders of a male volunteer to bright Finnish midsummer midday sun and measuring the spectra before and after the exposure and on several days after the exposure. Several spectra processing and classification methods were applied on the data set to find the differences caused by the sun exposure. The spectrophotometer and measuring procedure were found to cause some systematic errors that were reduced computationally. It is however more advisable to avoid such errors by careful handling of the spectrometer device and overall design of the skin measurement protocol. This paper discusses the effects of sun exposure to skin and its near-infrared spectra, difficulties and remedies of skin measurements, promising computational methods, and results.},
address = {Ume{\aa}-Vasa, Sweden and Finland},
annote = {Later published as `Near infrared wavelength relevance detection of ultraviolet radiation-induced erythema' in Journal of Near Infrared Spectroscopy, 16(3): 233--242, July 2008.},
author = {Janne Koljonen and Jarmo T Alander and Nordling, Torbj{\"{o}}rn E M and Antti Autere and V{\"{a}}lisuo, Petri and Olli Kanniainen},
booktitle = {The 13th International Conference on Near Infrared Spectroscopy (13th ICNIRS) in Ume{\aa}-Vasa (Sweden and Finland): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 13th International Conference on Near Infrared Spectroscopy (13th ICNIRS) in Ume{\aa}-Vasa (Sweden and Finland) Abstract book/Koljonen et al._2007(2).pdf:pdf;:Users/tn/Articles/Mendeley_collection/The 13th International Conference on Near Infrared Spectroscopy (13th ICNIRS) in Ume{\aa}-Vasa (Sweden and Finland) Abstract book/Koljonen et al._2007(3).pdf:pdf},
keywords = {Erythema,Near-infrared spectroscopy},
month = {jun},
title = {{The Effects of Erythema on Near-Infrared Absorption Spectra}},
url = {ftp://garbo.uwasa.fi/cs/report05-4/sun.pdf},
year = {2007}
}

• T. E. M. Nordling, N. Hiroi, A. Funahashi, and H. Kitano, “Deduction of intracellular sub-systems from a topological description of the network.,” Molecular biosystems, vol. 3, iss. 8, p. 523–9, 2007. doi:10.1039/b702142a

Non-linear behaviour of biochemical networks, such as intracellular gene, protein or metabolic networks, is commonly represented using graphs of the underlying topology. Nodes represent abundance of molecules and edges interactions between pairs of molecules. These graphs are linear and thus based on an implicit linearization of the kinetic reactions in one or several dynamic modes of the total system. It is common to use data from different sources – experiments conducted under different conditions or even on different species – meaning that the graph will be a superposition of linearizations made in many different modes. The mixing of different modes makes it hard to identify functional modules, that is sub-systems that carry out a specific biological function, since the graph will contain many interactions that do not naturally occur at the same time. The ability to establish a boundary between the sub-system and its environment is critical in the definition of a module, contrary to a motif in which only internal interactions count. Identification of functional modules should therefore be done on graphs depicting the mode in which their function is carried out, i.e. graphs that only contain edges representing interactions active in the specific mode. In general, when an interaction between two molecules is established, one should always state the mode of the system in which it is active.

@article{Nordling:2007,
abstract = {Non-linear behaviour of biochemical networks, such as intracellular gene, protein or metabolic networks, is commonly represented using graphs of the underlying topology. Nodes represent abundance of molecules and edges interactions between pairs of molecules. These graphs are linear and thus based on an implicit linearization of the kinetic reactions in one or several dynamic modes of the total system. It is common to use data from different sources -- experiments conducted under different conditions or even on different species -- meaning that the graph will be a superposition of linearizations made in many different modes. The mixing of different modes makes it hard to identify functional modules, that is sub-systems that carry out a specific biological function, since the graph will contain many interactions that do not naturally occur at the same time. The ability to establish a boundary between the sub-system and its environment is critical in the definition of a module, contrary to a motif in which only internal interactions count. Identification of functional modules should therefore be done on graphs depicting the mode in which their function is carried out, i.e. graphs that only contain edges representing interactions active in the specific mode. In general, when an interaction between two molecules is established, one should always state the mode of the system in which it is active.},
author = {Nordling, Torbj{\"{o}}rn E M and Noriko Hiroi and Akira Funahashi and Hiroaki Kitano},
doi = {10.1039/b702142a},
file = {:Users/tn/Articles/Mendeley_collection/Molecular BioSystems/Nordling et al._2007.pdf:pdf},
issn = {1742-206X},
journal = {Molecular BioSystems},
keywords = {Animals,Apoptosis,Biological,Cell Physiological Phenomena,Fibroblasts,Fibroblasts: cytology,Fibroblasts: physiology,Genes,Journal,Kinetics,Mammals,Mathematics,Models,Molecular,Proteins,Proteins: chemistry,Proteins: metabolism},
mendeley-tags = {Journal},
month = {aug},
number = {8},
pages = {523--9},
pmid = {17639126},
title = {{Deduction of intracellular sub-systems from a topological description of the network.}},
url = {http://dx.doi.org/10.1039/b702142a},
volume = {3},
year = {2007}
}

• T. E. M. Nordling, N. Hiroi, and A. Funahashi, Dynamic modes in biological systems: Why should a biologist care? Long Beach, USA: California institute of technology, 2007.

Dynamic modes are essential for approximation of biochemical networks and can be viewed as regions in which the change of gene, protein and metabolite levels remains fairly constant for some time. Every interaction between two molecules is active only in certain modes; the topology of the network hence depends on the mode and mixing of modes makes it, e.g., hard to identify functional modules. In general, one should always explicitly state the mode(s) of the system that any model describes or in which experimental data was recorded. We discuss dynamic modes, using two examples: a microarray dataset recorded on S. cerevisiae, and a mechanistic nonlinear model of receptor induced apoptosis. (Ref: Nordling et al. 2007, doi:10.1039/b702142a)

@misc{Nordling2007ICSB,
abstract = {Dynamic modes are essential for approximation of biochemical networks and can be viewed as regions in which the change of gene, protein and metabolite levels remains fairly constant for some time. Every interaction between two molecules is active only in certain modes; the topology of the network hence depends on the mode and mixing of modes makes it, e.g., hard to identify functional modules. In general, one should always explicitly state the mode(s) of the system that any model describes or in which experimental data was recorded. We discuss dynamic modes, using two examples: a microarray dataset recorded on S. cerevisiae, and a mechanistic nonlinear model of receptor induced apoptosis. (Ref: Nordling et al. 2007, doi:10.1039/b702142a)},
author = {Nordling, Torbj{\"{o}}rn E M and Noriko Hiroi and Akira Funahashi},
booktitle = {The 8th International Conference on Systems Biology (ICSB-2007) in Long Beach (USA): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 8th International Conference on Systems Biology (ICSB-2007) in Long Beach (USA) Abstract book/Nordling, Hiroi, Funahashi_2007.pdf:pdf;:Users/tn/Articles/Mendeley_collection/The 8th International Conference on Systems Biology (ICSB-2007) in Long Beach (USA) Abstract book/Nordling, Hiroi, Funahashi_2007(2).pdf:pdf},
howpublished = {The 8th International Conference on Systems Biology (ICSB-2007), Long Beach (U.S.A.)},
keywords = {Biochemical network,Dynamic mode,Network theory,Systems Biology},
month = {oct},
publisher = {California Institute of Technology},
title = {{Dynamic modes in biological systems: Why should a biologist care?}},
year = {2007}
}

• T. E. M. Nordling and E. W. Jacobsen, “Experiment Design for Proper Excitation of Gene Regulatory Networks,” in Proceedings of foundations of systems biology in engineering (fosbe), 2nd conference, Stuttgart, Germany, 2007.

Feedback is ubiquitous in gene regulatory networks, and provides, e.g., homeostasis and signal amplification. The presence of feedback has significant implications for network inference since it implies that the gene responses to perturbation experiments typically will be strongly correlated, leading to ill-conditioning of the measurement matrix. The ill-conditioning will represent a fundamental problem in network identification since it implies that some of the network interactions will be identified with gross errors. To overcome this problem, we propose herein a systematic iterative experiment design that ensures sufficient excitations of all network interactions. The method leads to combinatorial perturbation experiments, in which a number of genes are perturbed simultaneously. The effectiveness of the method is demonstrated by application to an in silico regulatory network.

@inproceedings{Nordling:2007:FOSBE,
abstract = {Feedback is ubiquitous in gene regulatory networks, and provides, e.g., homeostasis and signal amplification. The presence of feedback has significant implications for network inference since it implies that the gene responses to perturbation experiments typically will be strongly correlated, leading to ill-conditioning of the measurement matrix. The ill-conditioning will represent a fundamental problem in network identification since it implies that some of the network interactions will be identified with gross errors. To overcome this problem, we propose herein a systematic iterative experiment design that ensures sufficient excitations of all network interactions. The method leads to combinatorial perturbation experiments, in which a number of genes are perturbed simultaneously. The effectiveness of the method is demonstrated by application to an in silico regulatory network.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {Proceedings of Foundations of Systems Biology in Engineering (FOSBE), 2nd Conference},
file = {:Users/tn/Articles/Mendeley_collection/Proceedings of Foundations of Systems Biology in Engineering (FOSBE), 2nd Conference/Nordling, Jacobsen_2007.pdf:pdf;:Users/tn/Articles/Mendeley_collection/Proceedings of Foundations of Systems Biology in Engineering (FOSBE), 2nd Conference/Nordling, Jacobsen_2007.pdf:pdf},
isbn = {978-3-8167-7436-5},
keywords = {Experiment design,Gene regulatory networks,Network inference,Perturbation experiments,Reverse engineering,System identification},
month = {sep},
publisher = {Fraunhofer IRB Verlag, Postfach 800469, 70504 Stuttgart, Germany},
title = {{Experiment Design for Proper Excitation of Gene Regulatory Networks}},
url = {http://www.ee.kth.se/php/modules/publications/reports/2007/IR-EE-RT_2007_031.pdf},
year = {2007}
}

### 2006

• T. E. M. Nordling and E. W. Jacobsen, Experiment Design for Identification of Gene Regulatory Networks. Stockholm, Sweden: Reglermöte 2006, 2006.

We focus on how to design biological in vivo experiments that yield the information needed to determine both the structure and dynamics of biochemical networks in a particular physiological state. In engineering, experiment design is typically based on minimization of some covariance matrix measure. This procedure depends on the initial model and may provide slow convergence. An SVD-based procedure may provide better estimates than D-optimal design.

@misc{Nordling2006ReglerPoster,
abstract = {We focus on how to design biological in vivo experiments that yield the information needed to determine both the structure and dynamics of biochemical networks in a particular physiological state. In engineering, experiment design is typically based on minimization of some covariance matrix measure. This procedure depends on the initial model and may provide slow convergence. An SVD-based procedure may provide better estimates than D-optimal design.},
author = {Nordling, Torbj{\"{o}}rn E M and Jacobsen, Elling W},
booktitle = {Reglerm{\"{o}}te 2006 in Stockholm (Sweden)},
howpublished = {Reglerm{\"{o}}te 2006 in Stockholm, Sweden, June 2006},
month = {jun},
publisher = {Reglerm{\"{o}}te 2006},
title = {{Experiment Design for Identification of Gene Regulatory Networks}},
year = {2006}
}

• T. E. M. Nordling and E. W. Jacobsen, “Experiment design for optimal excitation of gene regulatory networks,” in The 7th international conference on systems biology (icsb-2006) in yokohama (japan): abstract book, 2006.

Identification of gene regulatory networks from quantitative data has attracted significant interest in recent years. The focus has mainly been on determining model structures and algorithms for fitting experimental data, while the problem of obtaining suitable experimental data largely has been neglected. In this work we focus on the problem of systematically designing in vivo/in vitro experiments that will yield the information needed to determine both the structure and dynamics of biochemical networks. As a first approximation we consider linear dynamic models valid in a particular physiological state. We propose an iterative design strategy, where selection of the perturbation, sampling time and number of samples in each experiment is based on available partial information about the system, i.e. an ill-conditioned or rank deficient measurement matrix. Three different sources of such deficiency exist: (i) unidirectionality intrinsic to the system, due to moiety conservation or strongly correlated variables, (ii) fast dynamic modes and (iii) incomplete excitation of the system. The former two can be identified and “lifted out” of the measurement matrix, while the latter requires additional experimental data. Our experiment design strategy endeavours in each step to provide information perpendicular to the existing one. When all directions of the state space, spanned by the gene network, are present in the measurements matrix, the design emphasizes those directions where the least information has been obtained. Existing optimum design strategies are based on maximization of some measure of the Fisher information matrix (FIM). An a priori model of the system is needed to determine the FIM and hence good prior knowledge of the system is essential. Otherwise the design will give slow convergence, corresponding to an excessive number of experiments.
Our approach requires no prior information and its effectiveness is here demonstrated through identification of in silico networks previously proposed in the literature.

@inproceedings{Nordling2006ICSB,
abstract = {Identification of gene regulatory networks from quantitative data has attracted significant interest in recent years. The focus has mainly been on determining model structures and algorithms for fitting experimental data, while the problem of obtaining suitable experimental data largely has been neglected. In this work we focus on the problem of systematically designing in vivo/in vitro experiments that will yield the information needed to determine both the structure and dynamics of biochemical networks. As a first approximation we consider linear dynamic models valid in a particular physiological state. We propose an iterative design strategy, where selection of the perturbation, sampling time and number of samples in each experiment is based on available partial information about the system, i.e. an ill-conditioned or rank deficient measurement matrix. Three different sources of such deficiency exist: (i) unidirectionality intrinsic to the system, due to moiety conservation or strongly correlated variables, (ii) fast dynamic modes and (iii) incomplete excitation of the system. The former two can be identified and ``lifted out'' of the measurement matrix, while the latter requires additional experimental data. Our experiment design strategy endeavours in each step to provide information perpendicular to the existing one. When all directions of the state space, spanned by the gene network, are present in the measurements matrix, the design emphasizes those directions where the least information has been obtained. Existing optimum design strategies are based on maximization of some measure of the Fisher information matrix (FIM). An a priori model of the system is needed to determine the FIM and hence good prior knowledge of the system is essential. Otherwise the design will give slow convergence, corresponding to an excessive number of experiments.
Our approach requires no prior information and its effectiveness is here demonstrated through identification of in silico networks previously proposed in the literature.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {The 7th International Conference on Systems Biology (ICSB-2006) in Yokohama (Japan): Abstract book},
file = {:Users/tn/Articles/Mendeley_collection/The 7th International Conference on Systems Biology (ICSB-2006) in Yokohama (Japan) Abstract book/Nordling, Jacobsen_2006.pdf:pdf},
keywords = {Experiment design,System identification,Systems Biology},
month = {oct},
publisher = {The Systems Biology Institute (SBI)},
title = {{Experiment design for optimal excitation of gene regulatory networks}},
year = {2006}
}

• T. E. M. Nordling and E. W. Jacobsen, “Iterative experimental design and identification of gene regulatory networks,” in The 13th nordic process control workshop (npcw’06) in lyngby (denmark): abstract book, Lyngby, Denmark, 2006, p. 27.

The reaction kinetics of gene and protein interaction networks are largely unknown. These networks are extensive and involve a large number of parameters, and it is in general difficult and costly to determine the parameters under in vivo conditions. In a particular state of the system, however, only some network interactions and underlying functions are active. In such situations a linear description will often be sufficient. In this work we therefore consider identification of linear dynamic models, describing the direct interactions between genes and proteins in biochemical reaction networks, based on time-series measurements of activities and concentrations. In theory, under mild controllability conditions, sufficient data for such models can be achieved in a single gene perturbation experiment for which the number of collected samples at least equals the number of network components. However, the combination of measurement uncertainty, nonlinearities and uneven excitation of the different network modes, usually calls for significantly more data in order to obtain a reasonable model of the network. In general a tradeoff will exist between a decreasing accuracy of the linear approximation and achieving a high signal-to-noise ratio. We here propose an iterative approach to experimental design and network identification. The data obtained from prior experiments are used to design the next experiment, with the aim of ensuring suitable excitation of all modes, while reducing the overall data requirement. Suitable excitation is, in this context, understood as achievement of a sufficient signal-to-noise ratio while avoiding nonlinear effects, so that a predefined accuracy can be met. We stress that, in order to keep the effects of nonlinearities at a minimum, excitations exceeding this level should in general be avoided. The estimation algorithm can be divided into steps, where one state variable is selected for perturbation of an unknown magnitude in each step.
The sampling time and the number of samples to be recorded are also selected in each step, after which an experiment is performed. The obtained information provides a first order quantitative description of the existing network interactions near the considered steady state. The proposed method is based on four assumptions: The system and the relationship between the state variables can be described as a graph. Only the measured components are part of the modelled network. The unknown initial states of the system and the perturbations are not too far from the selected steady-state at which the Jacobian is estimated. The imposed perturbation stays constant over time. Thus the initial states, the magnitude of the perturbations and the network structure are assumed unknown a priori. Our discussion focuses on how to select the next perturbation, sampling time and number of samples based on available partial information about the network structure. We also discuss possibilities to distinguish whether near singularity in the network Jacobian is caused by dynamics faster than the selected sampling time, strongly correlated variables, moiety conservation or poor excitation of some modes. The proposed approach is exemplified by identification of a small in silico gene regulatory network.

@inproceedings{Nordling2006NPCW,
abstract = {The reaction kinetics of gene and protein interaction networks are largely unknown. These networks are extensive and involve a large number of parameters, and it is in general difficult and costly to determine the parameters under in vivo conditions. In a particular state of the system, however, only some network interactions and underlying functions are active. In such situations a linear description will often be sufficient. In this work we therefore consider identification of linear dynamic models, describing the direct interactions between genes and proteins in biochemical reaction networks, based on time-series measurements of activities and concentrations. In theory, under mild controllability conditions, sufficient data for such models can be achieved in a single gene perturbation experiment for which the number of collected samples at least equals the number of network components. However, the combination of measurement uncertainty, nonlinearities and uneven excitation of the different network modes, usually calls for significantly more data in order to obtain a reasonable model of the network. In general a tradeoff will exist between a decreasing accuracy of the linear approximation and achieving a high signal-to-noise ratio. We here propose an iterative approach to experimental design and network identification. The data obtained from prior experiments are used to design the next experiment, with the aim of ensuring suitable excitation of all modes, while reducing the overall data requirement. Suitable excitation is, in this context, understood as achievement of a sufficient signal-to-noise ratio while avoiding nonlinear effects, so that a predefined accuracy can be met. We stress that, in order to keep the effects of nonlinearities at a minimum, excitations exceeding this level should in general be avoided.
The estimation algorithm can be divided into steps, where one state variable is selected for perturbation of an unknown magnitude in each step. The sampling time and the number of samples to be recorded are also selected in each step, after which an experiment is performed. The obtained information provides a first order quantitative description of the existing network interactions near the considered steady state. The proposed method is based on four assumptions: The system and the relationship between the state variables can be described as a graph. Only the measured components are part of the modelled network. The unknown initial states of the system and the perturbations are not too far from the selected steady-state at which the Jacobian is estimated. The imposed perturbation stays constant over time. Thus the initial states, the magnitude of the perturbations and the network structure are assumed unknown a priori. Our discussion focuses on how to select the next perturbation, sampling time and number of samples based on available partial information about the network structure. We also discuss possibilities to distinguish whether near singularity in the network Jacobian is caused by dynamics faster than the selected sampling time, strongly correlated variables, moiety conservation or poor excitation of some modes. The proposed approach is exemplified by identification of a small in silico gene regulatory network.},
annote = {Presentation},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {The 13th Nordic Process Control Workshop (NPCW'06) in Lyngby (Denmark): Abstract book},
keywords = {Experiment design,System identification,Systems Biology},
month = {jan},
pages = {27},
publisher = {Department of Chemical Engineering, Technical University of Denmark},
title = {{Iterative experimental design and identification of gene regulatory networks}},
year = {2006}
}

• T. E. M. Nordling and E. W. Jacobsen, Experiment design for systematic excitation of gene regulatory networks, 2006.

Excitation, signal directionality and their role in experiment design for identification of gene regulatory networks are here studied from a control theoretic perspective. We limit our consideration to linear dynamic network models for a particular state of the system, e.g. the state of a healthy or diseased cell. Rank deficiency of the measurement matrix, containing all measurement samples, constitutes a problem when inferring the ‘true’ system. We have identified three different sources of such deficiency: unidirectionality intrinsic to the system, fast dynamic modes and incomplete excitation of the system. It is conceptually important to distinguish and identify these cases, since they all show up as near singularities in the measurement matrix, but only the latter needs to be dealt with through experiment design. Unidirectionality is caused by moiety conservations, i.e. existence of a strict algebraic relationship between some of the state variables. Dynamics faster than the sampling time also appear as (almost) linearly dependent variables. The near singularity can in the former cases be removed by accounting for the algebraic relationship and lifting out one variable for each algebraic relationship. The same is true for the latter case, unless the fast dynamics are considered important, in which case the sampling frequency needs to be increased. An experimental setup, i.e. a set of perturbations and measurements, which does not excite all directions of the system will also yield a near singular measurement matrix. This is for example common in gene regulatory networks consisting of weakly connected subsystems. The experiment design must make sure that all output directions are properly excited. The problem of excitation and directionality is here illustrated and studied for small in silico gene regulatory networks, both in the time and the frequency domain, using classical methods like singular value decomposition.

@misc{Nordling2006IWSB,
abstract = {Excitation, signal directionality and their role in experiment design for identification of gene regulatory networks are here studied from a control theoretic perspective. We limit our consideration to linear dynamic network models for a particular state of the system, e.g. the state of a healthy or diseased cell. Rank deficiency of the measurement matrix, containing all measurement samples, constitutes a problem when inferring the 'true' system. We have identified three different sources of such deficiency: unidirectionality intrinsic to the system, fast dynamic modes and incomplete excitation of the system. It is conceptually important to distinguish and identify these cases, since they all show up as near singularities in the measurement matrix, but only the latter needs to be dealt with through experiment design. Unidirectionality is caused by moiety conservations, i.e. existence of a strict algebraic relationship between some of the state variables. Dynamics faster than the sampling time also appear as (almost) linearly dependent variables. The near singularity can in the former cases be removed by accounting for the algebraic relationship and lifting out one variable for each algebraic relationship. The same is true for the latter case, unless the fast dynamics are considered important, in which case the sampling frequency needs to be increased. An experimental setup, i.e. a set of perturbations and measurements, which does not excite all directions of the system will also yield a near singular measurement matrix. This is for example common in gene regulatory networks consisting of weakly connected subsystems. The experiment design must make sure that all output directions are properly excited. The problem of excitation and directionality is here illustrated and studied for small in silico gene regulatory networks, both in the time and the frequency domain, using classical methods like singular value decomposition.},
author = {Nordling, Torbj{\"{o}}rn E M and Elling W Jacobsen},
booktitle = {International Workshop on Systems Biology in Maynooth (Ireland)},
file = {:Users/tn/Articles/Mendeley_collection/International Workshop on Systems Biology in Maynooth (Ireland)/Nordling, Jacobsen_2006.pdf:pdf},
howpublished = {International Workshop on Systems Biology in Maynooth (Ireland)},
keywords = {Experiment design,System identification,Systems Biology},
month = {jul},
title = {{Experiment design for systematic excitation of gene regulatory networks}},
year = {2006}
}

• E. W. Jacobsen and T. E. M. Nordling, On identification of genetic and metabolic networks, 2006.
@misc{Jacobsen2006ERNSI,
annote = {Presentation held by me instead of Elling Jacobsen.},
author = {Elling W Jacobsen and Nordling, Torbj{\"{o}}rn E M},
booktitle = {The 15th ERNSI Workshop on System Identification in Link{\"{o}}ping (Sweden)},
howpublished = {15th ERNSI Workshop on System Identification in Link{\"{o}}ping (Sweden)},
month = {sep},
title = {{On identification of genetic and metabolic networks}},
year = {2006}
}

### 2005

• T. E. M. Nordling, J. Koljonen, J. Nyström, I. Bodén, B. Lindholm-Sethson, P. Geladi, and J. T. Alander, “Wavelength selection by genetic algorithms in near infrared spectra for melanoma diagnosis,” in IFMBE Proceedings, Volume 11, 3rd European Medical and Biological Engineering Conference (EMBEC’05) in Prague (Czech Republic), Prague, Czech Republic, 2005.

Early, reliable and fast diagnosis of melanoma is particularly important as the number of cases is increasing. In this paper, the potential of using near infrared spectroscopy for melanoma diagnosis is studied. The classification task is complicated by a low signal-to-noise ratio and the high dimensionality of the spectral data. Thus pre-selection of wavelength variables is required. Atypical naevi samples of patients were clinically classified, using the ABCD rule, and their near infrared spectra recorded. A nonlinear clustering model for spectral based classification was calibrated to the spectra and pathologist’s classification using a genetic algorithm. The genetic algorithm optimized the spectral based classification by selecting wavelengths correlated to melanoma. Some wavelength selections allowed correct classification of all samples in our dataset. The small size of the dataset and uncertainty in the clinical classification, however, limit the conclusions that can be drawn. Evidence for the existence of spectral regions that contain information needed for melanoma diagnosis is presented.

@inproceedings{Nordling2005EMBEC,
abstract = {Early, reliable and fast diagnosis of melanoma is particularly important as the number of cases is increasing. In this paper, the potential of using near infrared spectroscopy for melanoma diagnosis is studied. The classification task is complicated by a low signal-to-noise ratio and the high dimensionality of the spectral data. Thus pre-selection of wavelength variables is required. Atypical naevi samples of patients were clinically classified, using the ABCD rule, and their near infrared spectra recorded. A nonlinear clustering model for spectral based classification was calibrated to the spectra and pathologist's classification using a genetic algorithm. The genetic algorithm optimized the spectral based classification by selecting wavelengths correlated to melanoma. Some wavelength selections allowed correct classification of all samples in our dataset. The small size of the dataset and uncertainty in the clinical classification, however, limit the conclusions that can be drawn. Evidence for the existence of spectral regions that contain information needed for melanoma diagnosis is presented.},
author = {Nordling, Torbj{\"{o}}rn E M and Janne Koljonen and Nystr{\"{o}}m, Josefina and Bod{\'{e}}n, Ida and Lindholm-Sethson, Britta and Paul Geladi and Jarmo T Alander},
booktitle = {IFMBE Proceedings, Volume 11, 3rd European Medical and Biological Engineering Conference (EMBEC'05) in Prague (Czech Republic)},
file = {:Users/tn/Articles/Mendeley_collection/IFMBE Proceedings, Volume 11, 3rd European Medical and Biological Engineering Conference (EMBEC'05) in Prague (Czech Republic)/Nordling et al._2005.pdf:pdf},
issn = {1727-1983},
keywords = {Feature selection,Genetic Algorithms,Melanoma,Near-infrared spectrum,Variable selection},
month = {nov},
title = {{Wavelength selection by genetic algorithms in near infrared spectra for melanoma diagnosis}},
url = {ftp://ftp.uwasa.fi/cs/report05-4/EMBEC2005.pdf},
year = {2005}
}

• T. E. M. Nordling, “Issues on modelling of large-scale cellular regulatory networks,” Master’s degree project Master Thesis, Department of Numerical Analysis and Computer Science (NADA), Royal Institute of Technology (KTH), SE-10044 Stockholm, Sweden, 2005.

We have identified flexible exchange and storage of biochemical interaction data in databases, together with prolonged investment in different existing and future modelling formalisms as key issues in successful understanding of the regulatory network responsible for the connection between geno- and phenotype. This pilot study of modelling of large-scale regulatory networks starts with a medically motivated interesting question from molecular cell biology: Is enforced expression of Cdc6, activation of Cdk4/6 and Cdk2 sufficient for anchorage-independent entry of the S phase of the cell cycle? We try to construct a model for answering this question, in such a way that we can reveal obstacles for large-scale regulatory modelling, discuss their implications and possible solutions. Our model is based on 1447 reactions and contains 1343 different molecules. We used graph theory to study its topology and made the following findings: The network is scale-free and decays as a power-law, as could be expected based on earlier works. The network consists of one huge well-connected cluster. It cannot be modularised into strong components or blocks in a useful way, since we get one big component or block containing a majority of all molecules and more than a hundred tiny components or blocks with one or a few molecules. Our network does not agree with a hierarchical network model consisting of blocks linked by cut-vertices.

@mastersthesis{Nordling:2005,
abstract = {We have identified flexible exchange and storage of biochemical interaction data in databases, together with prolonged investment in different existing and future modelling formalisms as key issues in successful understanding of the regulatory network responsible for the connection between geno- and phenotype. This pilot study of modelling of large-scale regulatory networks starts with a medically motivated interesting question from molecular cell biology: Is enforced expression of Cdc6, activation of Cdk4/6 and Cdk2 sufficient for anchorage-independent entry of the S phase of the cell cycle? We try to construct a model for answering this question, in such a way that we can reveal obstacles for large-scale regulatory modelling, discuss their implications and possible solutions. Our model is based on 1447 reactions and contains 1343 different molecules. We used graph theory to study its topology and made the following findings: The network is scale-free and decays as a power-law, as could be expected based on earlier works. The network consists of one huge well-connected cluster. It cannot be modularised into strong components or blocks in a useful way, since we get one big component or block containing a majority of all molecules and more than a hundred tiny components or blocks with one or a few molecules. Our network does not agree with a hierarchical network model consisting of blocks linked by cut-vertices.},
address = {Department of Numerical Analysis and Computer Science (NADA), Royal Institute of Technology (KTH), SE-10044 Stockholm, Sweden},
author = {Nordling, Torbj{\"{o}}rn E M},
file = {:Users/tn/Articles/Mendeley_collection/Unknown/Nordling_2005.pdf:pdf},
keywords = {Modelling},
month = {aug},
school = {The Royal Institute of Technology (KTH)},
title = {{Issues on modelling of large-scale cellular regulatory networks}},
type = {Master's degree project},
url = {http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4182},
year = {2005}
}

### 2004

• T. E. M. Nordling, J. Koljonen, J. T. Alander, and P. Geladi, “Genetic Algorithms as a Tool for Wavelength Selection,” in Proceedings of the 11th Finnish Artificial Intelligence Conference (STeP 2004) in Vantaa (Finland), Volume 3, Vantaa, Finland, 2004, pp. 99–113.

This work is a careful implementation of a genetic algorithm (GA) for pre-selection of wavelengths, combined with partial least squares regression (PLS) for modelling of near-infrared (NIR) data. We show that NIR spectrometry can be used for concentration measurements when background noise has not been limited and no chemical properties of the substances are known. We use an alternative brute force approach working for any convergent GA. It works by generating many solutions, preferably by using different GA parameters, and then constructing the final solution by only including variables found in a majority of all solutions. The proposed method is based on three assumptions: existence of data, varying measurement noise and selection of data wavelengths that are more fruitful than noise.

@inproceedings{Nordling2004STEP,
abstract = {This work is a careful implementation of a genetic algorithm (GA) for pre-selection of wavelengths, combined with partial least squares regression (PLS) for modelling of near-infrared (NIR) data. We show that NIR spectrometry can be used for concentration measurements when background noise has not been limited and no chemical properties of the substances are known. We use an alternative brute force approach working for any convergent GA. It works by generating many solutions, preferably by using different GA parameters, and then constructing the final solution by only including variables found in a majority of all solutions. The proposed method is based on three assumptions: existence of data, varying measurement noise and selection of data wavelengths that are more fruitful than noise.},
author = {Nordling, Torbj{\"{o}}rn E M and Janne Koljonen and Jarmo T Alander and Paul Geladi},
booktitle = {Proceedings of the 11th Finnish Artificial Intelligence Conference (STeP 2004) in Vantaa (Finland), Volume 3},
editor = {Jarmo T Alander and Ala-Siuru, Pekka and Hy{\"{o}}tyniemi, Heikki},
file = {:Users/tn/Articles/Mendeley_collection/Proceedings of the 11th Finnish Artificial Intelligence Conference (STeP 2004) in Vantaa (Finland), Volume 3/Nordling et al._2004.pdf:pdf},
keywords = {Feature selection,Genetic Algorithms,Near-infrared spectrum,Partial least squares,Variable selection},
month = {sep},
pages = {99--113},
publisher = {Finnish Artificial Intelligence Society (FAIS)},
title = {{Genetic Algorithms as a Tool for Wavelength Selection}},
url = {ftp://ftp.uwasa.fi/cs/report04-2/GAWavelengthSelection.ps},
volume = {3},
year = {2004}
}

### 2003

• N. Hiroi, A. Funahashi, T. E. M. Nordling, and H. Kitano, “In silico analysis for anchorage independent cell cycle start mechanisms,” in The 4th International Conference on Systems Biology (ICSB-2003) in St. Louis (USA): Abstract book, St. Louis, USA, 2003.

Extended abstract.

@inproceedings{Hiroi2003ICSB,
abstract = {Extended abstract.},
}