We report on the progress of Julia for statistics
One of the important application areas for Julia lies in statistics. Dataset sizes have exploded, thus creating demands on high performance. In addition, there are elegant possibilities to harness the language foundations of Julia to create useful abstractions and better code reuse. In this mini-symposium, we report on some of this progress. The minisymposium will feature:
Handling complex survey data is a crucial task in statistics, requiring the incorporation of survey design to accurately estimate standard errors associated with survey estimates. While established software such as SAS, STATA, and SUDAAN provide this capability, the survey package in R is a popular open-source option. However, as dataset sizes grow, the need for a more efficient computing framework arises. This talk introduces the Survey package in Julia, which aims to address this need. This talk provides an overview of surveys and survey design, emphasizing the importance of accounting for survey design when estimating standard errors. It explores design-based standard errors and various methods to estimate them accurately. The presentation then delves into the implementation details and design choices of the Survey package in Julia.
GLM.jl is a fast and easy to use package, allowing its users to estimate generalized linear regression models. As a researcher in economics, Bogumil explored if the package had sufficient functionality for a standard introductory econometrics course. He implemented all examples contained in Part 1 (chapters 1 to 9) of “Introductory Econometrics: A Modern Approach”, Seventh Edition textbook by Jeffrey M. Wooldridge. In the talk, he shares his experience of the process, in particular discussing the missing functionalities. The talk is accompanied by the GitHub repository containing all the source codes for all the exercises and custom functions that fill all the gaps I found. The codes are ready to use by introductory econometrics teachers in their classes.
Here is the abstract: CRRao.jl is built as a single API for diverse statistical models. Drawing inspiration from the Zelig package in the R world, the CRRao package provides a simple and consistent API for end-users. In this talk, we will present how to implement Bayesian Analysis with the Horse-Shoe Prior using CRRao.jl. We will demonstrate how the Poisson regression model can be implemented for the English Premier League dataset, using Ridge prior, Laplace prior, Cauchy prior, and Horse-Shoe prior. Furthermore, we will show how Logistic Regression with the Horse-Shoe prior can be implemented using the Friedrich Ataxia dataset from Genome research. Additionally, we will illustrate how Gaussian Process Regression can be implemented using the CRRao API call.
In this talk, we will explore the impact of different decomposition methods on Generalized Linear Models (GLM). The presentation will be divided into the following sections:
i. Overview of GLM (5 mins): A brief introduction to Generalized Linear Models, a statistical framework used for diverse response variables. ii. Understanding Decomposition Methods in GLM (3 mins) iii. Comparison of Cholesky and QR Decompositions (5 mins): Highlighting their strengths and weaknesses in GLM estimation. iv. Improving Numerical Stability with QR Decomposition (7 mins): Exploring how QR decomposition can enhance numerical stability in GLM estimation. v. Performance Advantage of Cholesky Decomposition (3 mins) vi. Conclusions (2 mins)
This will be a simplified version of the Statistics in Julia symposium that was in Juliacon 2022 (https://www.youtube.com/watch?v=Fewunew8wU4). It reflects the areas in which good new work has happened in the past year. Of course, the material in the minisymposium will be self-contained: it will target a new viewer, it will not be produced as a diff on the previous one.