About me
Professor (emeritus) of mathematical statistics at Stockholm University, where I was first employed in 1962. I received my doctorate at SU in 1972, which does not feel as long ago as it actually is. After, among other things, a decade at KTH, I returned to SU.
In 2017 I received the award "Årets Statistikfrämjare" (Statistics Promoter of the Year) from Svenska Statistikfrämjandet.
In connection with this I gave a talk that became the basis for an article, "Minnen och meningar", in Statistikfrämjandet's journal Qvintensen 2017:2.
For a description of my scientific profile, see the English-language page.
Publications
A selection from Stockholm University's publication database
Article: A Note on Shaved Dice Inference. 2018. Rolf Sundberg. American Statistician 72 (2), 155–157
Two dice are rolled repeatedly, only their sum is registered. Have the two dice been shaved, so two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler, it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content in ancillary statistics.

2017. Sara Gummesson (et al.).
The foundation of this paper is lithic economy with a focus on the actual use of different lithic raw materials for tasks at hand. Our specific focus is on the production of bone tools during the Mesolithic. The lithic and osseous assemblages from Strandvägen, Motala, in east-central Sweden provide the archaeological background for the study. Based on a series of experiments we evaluate the efficiency and durability of different tool edges of five lithic raw materials: Cambrian flint, Cretaceous flint, mylonitic quartz, quartz, and porphyry, each used to whittle bone. The results show that flint is the most efficient of the raw materials assessed. Thus, a non-local raw material offers complements of functional characteristics for bone working compared to locally available quartz and mylonitic quartz. This finding provides a new insight into lithic raw material distribution in the region, specifically for bone tool production on site.

Article: Exploratory factor analysis: Parameter estimation and scores prediction with high-dimensional data. 2016. Rolf Sundberg, Uwe Feldmann. Journal of Multivariate Analysis 148, 49–59
In an approach aiming at high-dimensional situations, we first introduce a distribution-free approach to parameter estimation in the standard random factor model, that is shown to lead to the same estimating equations as maximum likelihood estimation under normality. The derivation is considerably simpler, and works equally well in the case of more variables than observations (p > n). We next concentrate on the latter case and show results of type: Albeit factor loadings and specific variances cannot be precisely estimated unless n is large, this is not needed for the factor scores to be precise, but only that p is large; A classical fixed point iteration method can be expected to converge safely and rapidly, provided p is large. A microarray data set, with p = 2000 and n = 22, is used to illustrate this theoretical result.

2015. Anders Moberg (et al.). Climate of the Past 11 (3), 425–448
A statistical framework for evaluation of climate model simulations by comparison with climate observations from instrumental and proxy data (part 1 in this series) is improved by the relaxation of two assumptions. This allows autocorrelation in the statistical model for simulated internal climate variability and enables direct comparison of two alternative forced simulations to test whether one fits the observations significantly better than the other. The extended framework is applied to a set of simulations driven with forcings for the preindustrial period 1000–1849 CE and 15 tree-ring-based temperature proxy series. Simulations run with only one external forcing (land use, volcanic, small-amplitude solar, or large-amplitude solar) do not significantly capture the variability in the tree-ring data, although the simulation with volcanic forcing does so for some experiment settings. When all forcings are combined (using either the small- or large-amplitude solar forcing), including also orbital, greenhouse-gas and non-volcanic aerosol forcing, and additionally used to produce small simulation ensembles starting from slightly different initial ocean conditions, the resulting simulations are highly capable of capturing some observed variability. Nevertheless, for some choices in the experiment design, they are not significantly closer to the observations than when unforced simulations are used, due to highly variable results between regions. It is also not possible to tell whether the small-amplitude or large-amplitude solar forcing causes the multiple-forcing simulations to be closer to the reconstructed temperature variability. Proxy data from more regions and of more types, or representing larger regions and complementary seasons, are apparently needed for more conclusive results from model-data comparisons in the last millennium.

2012. Alistair Hind, Anders Moberg, Rolf Sundberg. Climate of the Past 8 (4), 1355–1365
The statistical framework of Part 1 (Sundberg et al., 2012), for comparing ensemble simulation surface temperature output with temperature proxy and instrumental records, is implemented in a pseudoproxy experiment. A set of previously published millennial forced simulations (Max Planck Institute – COSMOS), including both "low" and "high" solar radiative forcing histories together with other important forcings, was used to define "true" target temperatures as well as pseudoproxy and pseudoinstrumental series. In a global land-only experiment, using annual mean temperatures at a 30-yr time resolution with realistic proxy noise levels, it was found that the low and high solar full-forcing simulations could be distinguished. In an additional experiment, where pseudoproxies were created to reflect a current set of proxy locations and noise levels, the low and high solar forcing simulations could only be distinguished when the latter served as targets. To improve detectability of the low solar simulations, increasing the signal-to-noise ratio in local temperature proxies was more efficient than increasing the spatial coverage of the proxy network. The experiences gained here will be of guidance when these methods are applied to real proxy and instrumental data, for example when the aim is to distinguish which of the alternative solar forcing histories is most compatible with the observed/reconstructed climate.

2012. Rolf Sundberg, Anders Moberg, Alistair Hind. Climate of the Past 8 (4), 1339–1353
A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

2010. Rolf Sundberg. Scandinavian Journal of Statistics 37 (4), 632–643
It is well known that curved exponential families can have multimodal likelihoods. We investigate the relationship between flat or multimodal likelihoods and model lack of fit, the latter measured by the score (Rao) test statistic of the curved model as embedded in the corresponding full model. When data yield a locally flat or convex likelihood (root of multiplicity >1, terrace point, saddle point, local minimum), we provide a formula for the score test statistic at such points, or a lower bound for it. The formula is related to the statistical curvature of the model, and it depends on the amount of Fisher information. We use three models as examples, including the Behrens–Fisher model, to see how a flat likelihood, etc. by itself can indicate a bad fit of the model. The results are related (dual) to classical results by Efron from 1978.

2010. Jelena Bojarova, Rolf Sundberg. Environmetrics 21 (6), 562–587
Statistical modelling of six time series of geological ice core chemical data from Greenland is discussed. We decompose the total variation into long timescale (trend) and short timescale variations (fluctuations around the trend), and a pure noise component. Too heavy tails of the short-term variation make a standard time-invariant linear Gaussian model inadequate. We try non-Gaussian state space models, which can be efficiently approximated by time-dependent Gaussian models. In essence, these time-dependent Gaussian models result in a local smoothing, in contrast to the global smoothing provided by the time-invariant model. To describe the mechanism of this local smoothing, we utilise the concept of a local variance function derived from a heavy-tailed density. The time-dependent error variance expresses the uncertainty about the dynamical development of the model state, and it controls the influence of observations on the estimates of the model state components. The great advantage of the derived time-dependent Gaussian model is that the Kalman filter and the Kalman smoother can be used as efficient computational tools for performing the variation decomposition. One of the main objectives of the study is to investigate how the distributional assumption on the model error component of the short timescale variation affects the decomposition.

2008. Rolf Sundberg. Journal of Chemometrics 22, 436–440
A Plackett–Burman type dataset from a paper by Williams (1968), with 28 observations and 24 two-level factors, has become a standard dataset for illustrating construction (by halving) of supersaturated designs (SSDs) and for a corresponding data analysis. The aim here is to point out that for several reasons this is an unfortunate situation. The original paper by Williams contains several errors and misprints. Some are in the design matrix, which will here be reconstructed, but worse is an outlier in the response values, which can be observed when data are plotted against the dominating factor. In addition, the data should better be analysed on log-scale than on original scale. The implications of the outlier for SSD analysis are drastic, and it will be concluded that the data should be used for this purpose only if the outlier is properly treated (omitted or modified).

2008. G. Niklas Norén (et al.). Statistics in Medicine 27 (16), 3057–3070
Interaction between drug substances may yield excessive risk of adverse drug reactions (ADRs) when two drugs are taken in combination. Collections of individual case safety reports (ICSRs) related to suspected ADR incidents in clinical practice have proven to be very useful in postmarketing surveillance for pairwise drug–ADR associations, but have yet to reach their full potential for drug–drug interaction surveillance. In this paper, we implement and evaluate a shrinkage observed-to-expected ratio for exploratory analysis of suspected drug–drug interaction in ICSR data, based on comparison with an additive risk model. We argue that the limited success of previously proposed methods for drug–drug interaction detection based on ICSR data may be due to an underlying assumption that the absence of interaction is equivalent to having multiplicative risk factors. We provide empirical examples of established drug–drug interaction highlighted with our proposed approach that go undetected with logistic regression. A database-wide screen for suspected drug–drug interaction in the entire WHO database is carried out to demonstrate the feasibility of the proposed approach. As always in the analysis of ICSRs, the clinical validity of hypotheses raised with the proposed method must be further reviewed and evaluated by subject matter experts.

2008. Petra von Stein, Jan-Olov Persson, Rolf Sundberg. Gastroenterology 134 (7), 1869–1881