Rolf Sundberg

Contact

Name and title: Rolf Sundberg

rolfs@math.su.se

Orcid

0000-0002-4453-7403

Workplace: Department of Mathematics (incl. Math. Statistics)

Visiting address

Room 317

Albano hus 1

Postal address

Matematiska institutionen

106 91 Stockholm

Research group

Mathematical statistics

We focus on applications in biostochastics and biostatistics, discrete random structures, and finance and insurance.

Links

Older personal web-page

About me

I am professor of Mathematical statistics (Statistical science) at Stockholm University. I formally retired in October 2009, but was re-employed part-time for many years to give one or two courses per year. Since 2020 my involvement has been much smaller. In recent years I have also been involved in some research with other SU scientists.

My scientific interests

My interests are in statistical modelling and statistical methods, both theory and applications.

Fields of active research or particular competence are:

Inferential principles
Exponential families, theory and applications (book published 2019)
Climate models and paleoclimate statistics
Latent factor analysis (e.g. article published 2016)
Biostatistics, in particular for molecular biology
Chemometrics (regression and multivariate calibration)
Sampling survey inference (in particular model-based inference)
Statistical methods in stereology
Use of experimental design
Applied statistics in statistical consulting (annual course for many years)

Courses given

2004, autumn, I gave a course on Statistical theory for exponential families.

2005, spring term, I taught Linear statistical models and Statistical consulting methodology (this time in English). I also led a course on Statistics for microarrays.

2006, spring term, I taught Linear statistical models and Principles of statistical inference (graduate level; based on a book manuscript by David Cox).

In the period 2007 to 2012, I gave two courses per year on Master/Ph.D. level:
Statistical models, based on my own lecture notes on parametric statistical inference and exponential families (2019 as textbook from Cambridge University Press), and
Statistical Consulting, involving clients and projects from the departmental consulting service.

2015, spring, I was responsible for a study group of Ph.D. students on the topic of sampling theory.

Feb. 2019 I gave the main part of an intensive 3-day course on statistical inference theory for Ph.D. students in statistics, with emphasis on inference in exponential family models and on more general frequentist parametric inference.

2019 was the last of more than 20 successive years that I gave the statistical consulting course. In the first pandemic year, 2020, there was no course, and since 2021 Jan-Olov Persson has taken over the responsibility, and I have only given some guest lectures.

In 2012, 2013 and 2021 I gave part of the Linear statistical models course. In 2021 I added a chapter on time series analysis to the course and its lecture notes. Each year I have also given one or two guest lectures for this course.

Some recent talks

Sept. 2015 I gave a talk at the Past Earth Network (PEN) Conference in Crewe, England, jointly with Anders Moberg (Bolin Centre): Statistical framework for evaluation of climate model simulations by use of climate proxy data.

Dec. 2015 I gave a talk for master students in Statistics, SU: Statistical and other models for palaeo-climate research.

June 2016 I gave a talk for students participating in the Research Academy for Young Scientists (RAYS) workshop, Strängnäs.

Dec. 2016 I gave most of an intensive 3-day course on statistical inference theory for Ph.D. students in statistics, with emphasis on exponential family models and on hypothesis testing.

17 May 2017 I gave a talk at my dept: "Shaved dice" inference — Two contrasting points of view of a simple situation.

23 Oct. 2017 I gave an invited talk: "Statistical inference when dimension (much) exceeds sample size", in a conference at KTH on high-dimensional data and big data.

11 Jan. 2023 I gave a talk at my dept: Harald Cramér, Willy Feller, and the 'Institute' – between two world wars

Recent Ph.D. students under my supervision,

and their areas of research:

Marie Linder (Ph.D. dissertation 15 Jan 1999):
(Bilinear regression and second order calibration)

Anders Björkström (Licentiate exam 1998, PhD dissertation 28 Sept 2007):
(Generalized ridge regression and other regression methods for near-collinear data)

Niklas Norén (Licentiate exam 2005, PhD dissertation 7 May 2007):
(Searching in databases for information on side effects of medications; co-advisor Ralph Edwards)

Anna Stoltenberg (Licentiate exam 23 Sept. 2009):
(Statistical analysis of ordered categorical data in pharmaceutical trials; co-advisor Olivier Guilbaud, AstraZeneca)

Jelena Bojarova (Licentiate exam 2004, PhD dissertation 4 June 2010):
(Toward sequential data assimilation for NWP models using Kalman filter tools)

Ekaterina Fetisova (Licentiate exam 2015, PhD dissertation 12 Dec. 2017):
(Statistical modelling in palaeo-climatology; I was co-advisor)

Contact information

E-mail address: rolfs at math.su.se, or rolfsundberg1942 at telia.com
Older webpage at http://staff.math.su.se/rolfs/ including CV and Publication list with links
I can also be found on Researchgate

Teaching

Academic year 2017–2018 I will give the course on Statistical Consulting.

Research

Research activities 2015-2017

Palaeo-climate research: Inference about palaeo-climate simulation models by comparison with instrumental and proxy climate data. In particular, I'm one of the authors for a group of three papers published in Climate of the Past, 2012 (2) and 2015 (first author of Part 1 - Theory):
Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium — Parts 1-3 Open access

Paper with Uwe Feldmann (Saarland Univ.) on factor analysis published June 2016 in Journal of Multivariate Analysis, vol. 148, pp 49–59: Exploratory factor analysis – Parameter estimation and scores prediction with high-dimensional data. Open access

Exponential families: I'm writing a monograph for Cambridge University Press: Statistical modelling by exponential families. A largely overlapping manuscript exists as lecture notes for the course Statistical models, last version Nov. 2016. An accepted journal manuscript also belongs to this area: A note on "shaved dice" inference (to appear in The American Statistician 2018).

Here is a paper where my main role was the analysis of designed experiments:
Sara Gummesson, et al.: Lithic raw material economy in the Mesolithic: An experimental test of edged tool efficiency and durability in bone tool production. Published in Lithic Technology, Vol. 42:4, 2017.

Publications

A selection from Stockholm University publication database

Statistical Modelling by Exponential Families

2019. Rolf Sundberg.

Book

This book is a readable, digestible introduction to exponential families, encompassing statistical models based on the most useful distributions in statistical theory, including the normal, gamma, binomial, Poisson, and negative binomial. Strongly motivated by applications, it presents the essential theory and then demonstrates the theory's practical potential by connecting it with developments in areas like item response analysis, social network models, conditional independence and latent variable structures, and point process models. Extensions to incomplete data models and generalized linear models are also included. In addition, the author gives a concise account of the philosophy of Per Martin-Löf in order to connect statistical modelling with ideas in statistical physics, including Boltzmann's law. Written for graduate students and researchers with a background in basic statistical inference, the book includes a vast set of examples demonstrating models for applications and exercises embedded within the text as well as at the ends of chapters.

Subjectivity (Re)visited: A Corpus Study of English Forward Causal Connectives in Different Domains of Spoken and Written Language

2021. Marta Andersson, Rolf Sundberg. Discourse processes 58 (3), 260-292

Article

Through a structured examination of four English causal discourse connectives, our article tackles a gap in the existing research, which focuses mainly on written language production, and entirely lacks attests on English spoken discourse. Given the alleged general nature of English connectives commonly emphasized in the literature, the underlying question of our investigation is the potential role of the connective phrases in marking the basic conceptual distinction between objective and subjective causal event types. To this end, our study combines a traditional corpus analysis with 'predictive' statistical modeling for subjectivity variables to investigate whether and how the tendencies found in the corpus depend on the systematic preferences of the language user to encode subjectivity via a discourse connective. Our findings suggest that while certain conceptual structures are quite fundamental to the usages of English connectives, the connectives per se do not seem to have a steady part in categorization of causal events. Rather, their role pertains to the level of intended explicitness bound to specific rhetorical purposes and contexts of use.

A Note on Shaved Dice Inference

2018. Rolf Sundberg. American Statistician 72 (2), 155-157

Article

Two dice are rolled repeatedly, only their sum is registered. Have the two dice been shaved, so two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler, it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content in ancillary statistics.
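The incomplete-data view described in this abstract can be illustrated with a minimal sketch of an EM iteration. It is not taken from the paper. It assumes, purely for illustration, that both dice are identical and that faces 1 and 6 are the shaved pair, with joint probability q per die (q = 1/3 for a fair die). The names face_probs, em_step and fit are hypothetical.

```python
from itertools import product

SHAVED = (1, 6)  # assumption: faces 1 and 6 are the favoured pair


def face_probs(q):
    """Face probabilities when faces 1 and 6 together have probability q."""
    return {f: (q / 2 if f in SHAVED else (1 - q) / 4) for f in range(1, 7)}


def em_step(q, sum_counts):
    """One EM update of q from observed counts of the sum of two dice.

    Complete data: how many of the 2n individual dice show a shaved face
    (a binomial count). Only the n sums are observed, so the E-step replaces
    that count by its conditional expectation given each observed sum.
    """
    p = face_probs(q)
    total_rolls = sum(sum_counts.values())
    expected_shaved = 0.0
    for s, count in sum_counts.items():
        weight = 0.0  # probability of sum s under the current q
        shaved = 0.0  # probability-weighted number of shaved faces for sum s
        for i, j in product(range(1, 7), repeat=2):
            if i + j == s:
                w = p[i] * p[j]
                weight += w
                shaved += w * ((i in SHAVED) + (j in SHAVED))
        expected_shaved += count * shaved / weight
    # M-step: binomial proportion among the 2n dice
    return expected_shaved / (2 * total_rolls)


def fit(sum_counts, q=1 / 3, iterations=200):
    """Iterate the EM update from a starting value (fair dice by default)."""
    for _ in range(iterations):
        q = em_step(q, sum_counts)
    return q
```

A call such as fit({2: 4, 7: 15, 12: 5}) would take a dictionary mapping each observed sum to its count; those counts are made up here. The sketch covers only the EM update and says nothing about the Fisher information or ancillary statistics discussed in the paper.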

Lithic Raw Material Economy in the Mesolithic

2017. Sara Gummesson (et al.).

Article

The foundation of this paper is lithic economy with a focus on the actual use of different lithic raw materials for tasks at hand. Our specific focus is on the production of bone tools during the Mesolithic. The lithic and osseous assemblages from Strandvägen, Motala, in east-central Sweden provide the archaeological background for the study. Based on a series of experiments we evaluate the efficiency and durability of different tool edges of five lithic raw materials: Cambrian flint, Cretaceous flint, mylonitic quartz, quartz, and porphyry, each used to whittle bone. The results show that flint is the most efficient of the raw materials assessed. Thus, a non-local raw material offers complements of functional characteristics for bone working compared to locally available quartz and mylonitic quartz. This finding provides a new insight into lithic raw material distribution in the region, specifically for bone tool production on site.

Exploratory factor analysis – Parameter estimation and scores prediction with high-dimensional data

2016. Rolf Sundberg, Uwe Feldmann. Journal of Multivariate Analysis 148, 49-59

Article

In an approach aimed at high-dimensional situations, we first introduce a distribution-free approach to parameter estimation in the standard random factor model, which is shown to lead to the same estimating equations as maximum likelihood estimation under normality. The derivation is considerably simpler, and works equally well in the case of more variables than observations (p > n). We next concentrate on the latter case and show results of the following type: Although factor loadings and specific variances cannot be precisely estimated unless n is large, this is not needed for the factor scores to be precise, but only that p is large; a classical fixed point iteration method can be expected to converge safely and rapidly, provided p is large. A microarray data set, with p = 2000 and n = 22, is used to illustrate this theoretical result.

Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 3

2015. Anders Moberg (et al.). Climate of the Past 11 (3), 425-448

Article

A statistical framework for evaluation of climate model simulations by comparison with climate observations from instrumental and proxy data (part 1 in this series) is improved by the relaxation of two assumptions. This allows autocorrelation in the statistical model for simulated internal climate variability and enables direct comparison of two alternative forced simulations to test whether one fits the observations significantly better than the other. The extended framework is applied to a set of simulations driven with forcings for the pre-industrial period 1000-1849 CE and 15 tree-ring-based temperature proxy series. Simulations run with only one external forcing (land use, volcanic, small-amplitude solar, or large-amplitude solar) do not significantly capture the variability in the tree-ring data - although the simulation with volcanic forcing does so for some experiment settings. When all forcings are combined (using either the small- or large-amplitude solar forcing), including also orbital, greenhouse-gas and non-volcanic aerosol forcing, and additionally used to produce small simulation ensembles starting from slightly different initial ocean conditions, the resulting simulations are highly capable of capturing some observed variability. Nevertheless, for some choices in the experiment design, they are not significantly closer to the observations than when unforced simulations are used, due to highly variable results between regions. It is also not possible to tell whether the small-amplitude or large-amplitude solar forcing causes the multiple-forcing simulations to be closer to the reconstructed temperature variability. Proxy data from more regions and of more types, or representing larger regions and complementary seasons, are apparently needed for more conclusive results from model-data comparisons in the last millennium.

Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium – Part 2

2012. Alistair Hind, Anders Moberg, Rolf Sundberg. Climate of the Past 8 (4), 1355-1365

Article

The statistical framework of Part 1 (Sundberg et al., 2012), for comparing ensemble simulation surface temperature output with temperature proxy and instrumental records, is implemented in a pseudo-proxy experiment. A set of previously published millennial forced simulations (Max Planck Institute – COSMOS), including both "low" and "high" solar radiative forcing histories together with other important forcings, was used to define "true" target temperatures as well as pseudo-proxy and pseudo-instrumental series. In a global land-only experiment, using annual mean temperatures at a 30-yr time resolution with realistic proxy noise levels, it was found that the low and high solar full-forcing simulations could be distinguished. In an additional experiment, where pseudo-proxies were created to reflect a current set of proxy locations and noise levels, the low and high solar forcing simulations could only be distinguished when the latter served as targets. To improve detectability of the low solar simulations, increasing the signal-to-noise ratio in local temperature proxies was more efficient than increasing the spatial coverage of the proxy network. The experiences gained here will be of guidance when these methods are applied to real proxy and instrumental data, for example when the aim is to distinguish which of the alternative solar forcing histories is most compatible with the observed/reconstructed climate.

Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium – Part 1: Theory

2012. Rolf Sundberg, Anders Moberg, Alistair Hind. Climate of the Past 8 (4), 1339-1353

Article

A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

Flat and multimodal likelihoods and model lack of fit in curved exponential families

2010. Rolf Sundberg. Scandinavian Journal of Statistics 37 (4), 632-643

Article

It is well known that curved exponential families can have multimodal likelihoods. We investigate the relationship between flat or multimodal likelihoods and model lack of fit, the latter measured by the score (Rao) test statistic of the curved model as embedded in the corresponding full model. When data yield a locally flat or convex likelihood (root of multiplicity >1, terrace point, saddle point, local minimum), we provide a formula for the score test statistic at such points, or a lower bound for it. The formula is related to the statistical curvature of the model, and it depends on the amount of Fisher information. We use three models as examples, including the Behrens-Fisher model, to see how a flat likelihood, etc. by itself can indicate a bad fit of the model. The results are related (dual) to classical results by Efron from 1978.

Non-Gaussian state space models in decomposition of ice core time series in long and short time-scales

2010. Jelena Bojarova, Rolf Sundberg. Environmetrics 21 (6), 562-587

Article

Statistical modelling of six time series of geological ice core chemical data from Greenland is discussed. We decompose the total variation into long time-scale (trend) and short time-scale variations (fluctuations around the trend), and a pure noise component. Too heavy tails of the short-term variation makes a standard time-invariant linear Gaussian model inadequate. We try non-Gaussian state space models, which can be efficiently approximated by time-dependent Gaussian models. In essence, these time-dependent Gaussian models result in a local smoothing, in contrast to the global smoothing provided by the time-invariant model. To describe the mechanism of this local smoothing, we utilise the concept of a local variance function derived from a heavy-tailed density. The time-dependent error variance expresses the uncertainty about the dynamical development of the model state, and it controls the influence of observations on the estimates of the model state components. The great advantage of the derived time-dependent Gaussian model is that the Kalman filter and the Kalman smoother can be used as efficient computational tools for performing the variation decomposition. One of the main objectives of the study is to investigate how the distributional assumption on the model error component of the short time-scale variation affects the decomposition.

A classical dataset from Williams, and its role in the study of supersaturated designs.

2008. Rolf Sundberg. Journal of Chemometrics 22, 436-440

Article

A Plackett–Burman type dataset from a paper by Williams (1968), with 28 observations and 24 two-level factors, has become a standard dataset for illustrating construction (by halving) of supersaturated designs (SSDs) and for a corresponding data analysis. The aim here is to point out that for several reasons this is an unfortunate situation. The original paper by Williams contains several errors and misprints. Some are in the design matrix, which will here be reconstructed, but worse is an outlier in the response values, which can be observed when data are plotted against the dominating factor. In addition, the data should better be analysed on log-scale than on original scale. The implications of the outlier for SSD analysis are drastic, and it will be concluded that the data should be used for this purpose only if the outlier is properly treated (omitted or modified).

A statistical methodology for drug–drug interaction surveillance

2008. G. Niklas Norén (et al.). Statistics in Medicine 27 (16), 3057-3070

Article

Interaction between drug substances may yield excessive risk of adverse drug reactions (ADRs) when two drugs are taken in combination. Collections of individual case safety reports (ICSRs) related to suspected ADR incidents in clinical practice have proven to be very useful in post-marketing surveillance for pairwise drug–ADR associations, but have yet to reach their full potential for drug–drug interaction surveillance. In this paper, we implement and evaluate a shrinkage observed-to-expected ratio for exploratory analysis of suspected drug–drug interaction in ICSR data, based on comparison with an additive risk model. We argue that the limited success of previously proposed methods for drug–drug interaction detection based on ICSR data may be due to an underlying assumption that the absence of interaction is equivalent to having multiplicative risk factors. We provide empirical examples of established drug–drug interaction highlighted with our proposed approach that go undetected with logistic regression. A database wide screen for suspected drug–drug interaction in the entire WHO database is carried out to demonstrate the feasibility of the proposed approach. As always in the analysis of ICSRs, the clinical validity of hypotheses raised with the proposed method must be further reviewed and evaluated by subject matter experts.

Multigene analysis can discriminate between ulcerative colitis, Crohn's disease and irritable bowel syndrome

2008. Petra von Stein, Jan-Olov Persson, Rolf Sundberg. Gastroenterology 134 (7), 1869-1881

Article

