Stockholm University

Rolf Sundberg, Professor Emeritus

About me

Professor emeritus of mathematical statistics at Stockholm University, where I was first employed in 1962 (as an amanuensis in mathematics). I received my doctorate at SU in 1972, and was therefore promoted to jubilee doctor in 2022 (after 50 years). After, among other things, a decade as senior lecturer at KTH, I returned to SU, where I have held a docent position and served as acting professor, senior lecturer, associate professor, and professor.

In teaching, I have over the last few decades mainly given three recurring courses, which I also designed myself: Linjära statistiska modeller (Linear Statistical Models), Statistiska modeller (Statistical Models, at master's & PhD level), and Statistisk konsultmetodik (Statistical Consulting Methodology, likewise).

In 2017 I received the award "Årets Statistikfrämjare" (Statistics Promoter of the Year) from the Swedish Statistical Society (Svenska Statistikfrämjandet). In connection with this I gave a talk that became the basis for the article "Minnen och meningar" in the society's journal Qvintensen, 2017:2.

2020: External expert assignments in a case concerning promotion to professor at Uppsala University (UU), and in a case concerning an adjunct senior lecturer position at Chalmers.

Since 2020, I have been one of the assessors of applications for accreditation as a statistician (FENStatS).

During the pandemic year 2020 I did almost no teaching, but in the autumn term of 2021 I stepped in as a substitute in Linjära statistiska modeller and contributed a little to Statistisk konsultmetodik.

2022: An external expert assignment at Karolinska Institutet, and guest lectures in Linjära statistiska modeller and in Statistisk konsultmetodik.

 

For a description of my scientific profile, see the English-language page.

Publications

Some publications in Swedish since 2013 (see DiVA):

Lineära statistiska modeller. Kompendium. Matem. inst., SU, 2021.

Willy Feller vid Stockholms högskola, 1934–1939: En gigant inom sannolikhetsteorin på svensk mark. (With Jesper Rydén.) Qvintensen, 2021.

Statistiska forskningsgruppen SFG under 70 år, 1948–2018. Matem. inst., SU, 2018.

Historik över avdelningen för matematisk statistik, SU (with English summary). Matem. inst., SU, 2013.

Minnen och meningar. Qvintensen, 2017:2.

Kan man lära till statistisk konsult på en kurs? Qvintensen, 2013.

A selection from Stockholm University's publication database

  • Statistical Modelling by Exponential Families

    2019. Rolf Sundberg.

    Book

    This book is a readable, digestible introduction to exponential families, encompassing statistical models based on the most useful distributions in statistical theory, including the normal, gamma, binomial, Poisson, and negative binomial. Strongly motivated by applications, it presents the essential theory and then demonstrates the theory's practical potential by connecting it with developments in areas like item response analysis, social network models, conditional independence and latent variable structures, and point process models. Extensions to incomplete data models and generalized linear models are also included. In addition, the author gives a concise account of the philosophy of Per Martin-Löf in order to connect statistical modelling with ideas in statistical physics, including Boltzmann's law. Written for graduate students and researchers with a background in basic statistical inference, the book includes a vast set of examples demonstrating models for applications and exercises embedded within the text as well as at the ends of chapters.

  • Subjectivity (Re)visited: A Corpus Study of English Forward Causal Connectives in Different Domains of Spoken and Written Language

    2021. Marta Andersson, Rolf Sundberg. Discourse processes 58 (3), 260-292

    Article

    Through a structured examination of four English causal discourse connectives, our article tackles a gap in the existing research, which focuses mainly on written language production and entirely lacks attestations of English spoken discourse. Given the alleged general nature of English connectives commonly emphasized in the literature, the underlying question of our investigation is the potential role of the connective phrases in marking the basic conceptual distinction between objective and subjective causal event types. To this end, our study combines a traditional corpus analysis with 'predictive' statistical modeling for subjectivity variables to investigate whether and how the tendencies found in the corpus depend on the systematic preferences of the language user to encode subjectivity via a discourse connective. Our findings suggest that while certain conceptual structures are quite fundamental to the usages of English connectives, the connectives per se do not seem to have a steady part in the categorization of causal events. Rather, their role pertains to the level of intended explicitness bound to specific rhetorical purposes and contexts of use.

  • A Note on Shaved Dice Inference

    2018. Rolf Sundberg. American Statistician 72 (2), 155-157

    Article

    Two dice are rolled repeatedly, only their sum is registered. Have the two dice been shaved, so two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler, it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content in ancillary statistics.

  • Lithic Raw Material Economy in the Mesolithic

    2017. Sara Gummesson (et al.).

    Article

    The foundation of this paper is lithic economy with a focus on the actual use of different lithic raw materials for tasks at hand. Our specific focus is on the production of bone tools during the Mesolithic. The lithic and osseous assemblages from Strandvägen, Motala, in east-central Sweden provide the archaeological background for the study. Based on a series of experiments we evaluate the efficiency and durability of different tool edges of five lithic raw materials: Cambrian flint, Cretaceous flint, mylonitic quartz, quartz, and porphyry, each used to whittle bone. The results show that flint is the most efficient of the raw materials assessed. Thus, a non-local raw material offers complements of functional characteristics for bone working compared to locally available quartz and mylonitic quartz. This finding provides a new insight into lithic raw material distribution in the region, specifically for bone tool production on site.

  • Exploratory factor analysis – Parameter estimation and scores prediction with high-dimensional data

    2016. Rolf Sundberg, Uwe Feldmann. Journal of Multivariate Analysis 148, 49-59

    Article

    In an approach aiming at high-dimensional situations, we first introduce a distribution-free approach to parameter estimation in the standard random factor model, which is shown to lead to the same estimating equations as maximum likelihood estimation under normality. The derivation is considerably simpler, and it works equally well in the case of more variables than observations (p > n). We next concentrate on the latter case and show results of the following type: although factor loadings and specific variances cannot be precisely estimated unless n is large, this is not needed for the factor scores to be precise; for that, only p needs to be large. Moreover, a classical fixed-point iteration method can be expected to converge safely and rapidly, provided p is large. A microarray data set, with p = 2000 and n = 22, is used to illustrate these theoretical results.

  • Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 3

    2015. Anders Moberg (et al.). Climate of the Past 11 (3), 425-448

    Article

    A statistical framework for evaluation of climate model simulations by comparison with climate observations from instrumental and proxy data (part 1 in this series) is improved by the relaxation of two assumptions. This allows autocorrelation in the statistical model for simulated internal climate variability and enables direct comparison of two alternative forced simulations to test whether one fits the observations significantly better than the other. The extended framework is applied to a set of simulations driven with forcings for the pre-industrial period 1000-1849 CE and 15 tree-ring-based temperature proxy series. Simulations run with only one external forcing (land use, volcanic, small-amplitude solar, or large-amplitude solar) do not significantly capture the variability in the tree-ring data - although the simulation with volcanic forcing does so for some experiment settings. When all forcings are combined (using either the small- or large-amplitude solar forcing), including also orbital, greenhouse-gas and non-volcanic aerosol forcing, and additionally used to produce small simulation ensembles starting from slightly different initial ocean conditions, the resulting simulations are highly capable of capturing some observed variability. Nevertheless, for some choices in the experiment design, they are not significantly closer to the observations than when unforced simulations are used, due to highly variable results between regions. It is also not possible to tell whether the small-amplitude or large-amplitude solar forcing causes the multiple-forcing simulations to be closer to the reconstructed temperature variability. Proxy data from more regions and of more types, or representing larger regions and complementary seasons, are apparently needed for more conclusive results from model-data comparisons in the last millennium.

  • Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium – Part 2

    2012. Alistair Hind, Anders Moberg, Rolf Sundberg. Climate of the Past 8 (4), 1355-1365

    Article

    The statistical framework of Part 1 (Sundberg et al., 2012), for comparing ensemble simulation surface temperature output with temperature proxy and instrumental records, is implemented in a pseudo-proxy experiment. A set of previously published millennial forced simulations (Max Planck Institute – COSMOS), including both "low" and "high" solar radiative forcing histories together with other important forcings, was used to define "true" target temperatures as well as pseudo-proxy and pseudo-instrumental series. In a global land-only experiment, using annual mean temperatures at a 30-yr time resolution with realistic proxy noise levels, it was found that the low and high solar full-forcing simulations could be distinguished. In an additional experiment, where pseudo-proxies were created to reflect a current set of proxy locations and noise levels, the low and high solar forcing simulations could only be distinguished when the latter served as targets. To improve detectability of the low solar simulations, increasing the signal-to-noise ratio in local temperature proxies was more efficient than increasing the spatial coverage of the proxy network. The experiences gained here will be of guidance when these methods are applied to real proxy and instrumental data, for example when the aim is to distinguish which of the alternative solar forcing histories is most compatible with the observed/reconstructed climate.

  • Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium – Part 1: Theory

    2012. Rolf Sundberg, Anders Moberg, Alistair Hind. Climate of the Past 8 (4), 1339-1353

    Article

    A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

  • Flat and multimodal likelihoods and model lack of fit in curved exponential families

    2010. Rolf Sundberg. Scandinavian Journal of Statistics 37 (4), 632-643

    Article

    It is well known that curved exponential families can have multimodal likelihoods. We investigate the relationship between flat or multimodal likelihoods and model lack of fit, the latter measured by the score (Rao) test statistic of the curved model as embedded in the corresponding full model. When data yield a locally flat or convex likelihood (root of multiplicity > 1, terrace point, saddle point, local minimum), we provide a formula for the test statistic at such points, or a lower bound for it. The formula is related to the statistical curvature of the model, and it depends on the amount of Fisher information. We use three models as examples, including the Behrens-Fisher model, to see how a flat likelihood, etc., by itself can indicate a bad fit of the model. The results are related (dual) to classical results by Efron from 1978.

  • Non-Gaussian state space models in decomposition of ice core time series in long and short time-scales

    2010. Jelena Bojarova, Rolf Sundberg. Environmetrics 21 (6), 562-587

    Article

    Statistical modelling of six time series of geological ice core chemical data from Greenland is discussed. We decompose the total variation into long time-scale (trend) and short time-scale variations (fluctuations around the trend), and a pure noise component. Too heavy tails of the short-term variation make a standard time-invariant linear Gaussian model inadequate. We try non-Gaussian state space models, which can be efficiently approximated by time-dependent Gaussian models. In essence, these time-dependent Gaussian models result in a local smoothing, in contrast to the global smoothing provided by the time-invariant model. To describe the mechanism of this local smoothing, we utilise the concept of a local variance function derived from a heavy-tailed density. The time-dependent error variance expresses the uncertainty about the dynamical development of the model state, and it controls the influence of observations on the estimates of the model state components. The great advantage of the derived time-dependent Gaussian model is that the Kalman filter and the Kalman smoother can be used as efficient computational tools for performing the variation decomposition. One of the main objectives of the study is to investigate how the distributional assumption on the model error component of the short time-scale variation affects the decomposition.

  • A classical dataset from Williams, and its role in the study of supersaturated designs.

    2008. Rolf Sundberg. Journal of Chemometrics 22, 436-440

    Article

    A Plackett–Burman type dataset from a paper by Williams (1968), with 28 observations and 24 two-level factors, has become a standard dataset for illustrating construction (by halving) of supersaturated designs (SSDs) and for a corresponding data analysis. The aim here is to point out that for several reasons this is an unfortunate situation. The original paper by Williams contains several errors and misprints. Some are in the design matrix, which will here be reconstructed, but worse is an outlier in the response values, which can be observed when data are plotted against the dominating factor. In addition, the data are better analysed on log scale than on the original scale. The implications of the outlier for SSD analysis are drastic, and it will be concluded that the data should be used for this purpose only if the outlier is properly treated (omitted or modified).

  • A statistical methodology for drug–drug interaction surveillance

    2008. G. Niklas Norén (et al.). Statistics in Medicine 27 (16), 3057-3070

    Article

    Interaction between drug substances may yield excessive risk of adverse drug reactions (ADRs) when two drugs are taken in combination. Collections of individual case safety reports (ICSRs) related to suspected ADR incidents in clinical practice have proven to be very useful in post-marketing surveillance for pairwise drug–ADR associations, but have yet to reach their full potential for drug–drug interaction surveillance. In this paper, we implement and evaluate a shrinkage observed-to-expected ratio for exploratory analysis of suspected drug–drug interaction in ICSR data, based on comparison with an additive risk model. We argue that the limited success of previously proposed methods for drug–drug interaction detection based on ICSR data may be due to an underlying assumption that the absence of interaction is equivalent to having multiplicative risk factors. We provide empirical examples of established drug–drug interaction highlighted with our proposed approach that go undetected with logistic regression. A database wide screen for suspected drug–drug interaction in the entire WHO database is carried out to demonstrate the feasibility of the proposed approach. As always in the analysis of ICSRs, the clinical validity of hypotheses raised with the proposed method must be further reviewed and evaluated by subject matter experts.


See all publications by Rolf Sundberg at Stockholm University