Figure 1 illustrates four graphs of similar metrics at a per-unit scale, using un-logged independent and dependent variables. To find the actual values we need to "unlog" them by applying the exponential function. Using the function that we estimated with the Exponential Growth curve, if we want to predict for day 68, two weeks after the last day of the dataset, we simply put t = 68 into the formula, and the model predicts 3355 infections on that day. In this formula, y is the number of cases and x is the time. This shows that starting from 1 person and with a growth factor of 2 per person, we obtain more than 16000 cases after 14 days. Let's return to our formula for Linear Regression. The statsmodels table gives the values for a and b under coef (in the middle): the value const is the value for a in our Linear Regression, 0.4480, and the value Time is the value for b in our Linear Regression, 0.1128. Therefore we can now fill in the Linear Regression function. But we need to do some rewriting on the Exponential Growth function, because Linear Regression can only estimate formulas of a linear form. First, we need to rewrite the formula in a form that has the shape of the Linear Regression. In this posting we will build upon that by extending Linear Regression to multiple input variables, giving rise to Multiple Regression, the workhorse of statistical learning. During the research work that I'm a part of, I found the topic of polynomial regressions to be a bit more difficult to work with in Python. Multiple Regression Using Statsmodels. If you want to follow along, you can use those example data and a short Python notebook. Real-life epidemiologists would test different types of models besides exponential growth and do extensive work on model validation, neither of which has been done for the current example.
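As a minimal sketch, the day-68 prediction quoted above can be reproduced from the two reported coefficients. This assumes the regression was fitted on the natural logarithm of the case counts, so that log(y) = a + b * t. The function name `predict_cases` is hypothetical and not part of the original notebook.

```python
import math

# Coefficients as reported in the statsmodels table (const and Time).
A_CONST = 0.4480
B_TIME = 0.1128


def predict_cases(t):
    """Assumed model: log(y) = a + b * t, "unlogged" by applying exp."""
    return math.exp(A_CONST + B_TIME * t)


prediction_day_68 = predict_cases(68)
print(int(prediction_day_68))
```

Under that assumption, truncating the result to a whole number gives the 3355 infections quoted in the text.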
Linear regression is used as a predictive model that assumes a linear relationship between the dependent variable (the variable we are trying to predict or estimate) and the independent variable(s) (the input variable(s) used in the prediction). For example, you may use linear regression to predict the price of the stock market (your dependent variable) based on the following macroeconomic input variables: 1. Interest Rate 2. statsmodels uses the same algorithm as above to find the maximum likelihood estimates. The images below show the relationship between sqft of living and price. Statsmodels is built on top of NumPy, SciPy, and matplotlib, but it contains more advanced functions for statistical testing and modeling that you won't find in numerical libraries like NumPy or SciPy. Statsmodels is a Python package that provides a complement to SciPy for statistical computations, including descriptive statistics and estimation and inference for statistical models. Linear Regression allows us to estimate the best values for a and b in the following formula, given empirical observations for y and x. The regression model based on ordinary least squares is an instance of the class statsmodels.regression.linear_model.OLS. If we want to represent this graphically, we start to see a graph that looks a lot like the very alarming curves that we see concerning the Coronavirus. Now, we know that this graph has more or less the right shape, but we need to take an additional step to make our analysis useful. With the current outbreak of the Coronavirus going on, we hear a lot about Exponential Growth. Logarithms allow us to rewrite the function in the correct form. STEP 1: The first step in the Python Notebook is to import the data and apply the log transformation. STEP 2: Then we use the statsmodels library to estimate the Linear Regression function. STEP 3: Make the prediction function based on the table.
This class represents a parametric covariance model for a Gaussian process as described in the work of Paciorek et al. Regression with (Seasonal) ARIMA errors (SARIMAX) is a time series regression model that brings together two powerful models, namely Linear Regression and ARIMA (or Seasonal ARIMA). A regression model, such as linear regression, models an output value based on a linear combination of input values. This technique can be used on time series where the input variables are observations at previous time steps, called lag variables. As its name implies, statsmodels is a Python library built specifically for statistics. When looking at the data, we only have the number of cases per day, and not the growth factor. Its density is given by, $$f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w)$$ Exponential Growth is characterized by the following formula. To make this clearer, I will use a hypothetical case. We first need to plug the values for a and b into the formula to obtain the formula for our specific epidemic. Then we can use this formula to compute the value of y for each value of t from 0 to 14. The exponential smoothing method for time series forecasting can be used as an alternative to the popular Box-Jenkins ARIMA family of methods.
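The hypothetical exponential growth case above can be sketched in a few lines. The sketch assumes the growth formula has the form y(t) = a * b**t, with a = 1 initial case and a growth factor b = 2 per day. These values are taken from the "1 person" and "growth factor of 2" figures mentioned in the text, and the variable names are illustrative.

```python
# Hypothetical case (assumption): 1 initial case, growth factor 2 per day.
a = 1
b = 2

# Assumed exponential growth form: y(t) = a * b**t, for t = 0, 1, ..., 14.
cases_by_day = [a * b ** t for t in range(15)]

for t, y in enumerate(cases_by_day):
    print(t, y)
```

Under these assumptions the value at t = 14 is 2**14 = 16384, consistent with the "more than 16000 cases after 14 days" stated in the text.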