Prof Sujit K Sahu

Welcome to my third-year project page for 2020-2021.

This project is on statistical modelling of a real-life spatio-temporal data set of your choice. Observations that vary in both space and time are called spatio-temporal data. Example data sets include air pollution, precipitation (rainfall), disease-specific (e.g. Covid-19) death rates, brain imaging, and ocean characteristics such as temperature, salinity and chlorophyll levels. Please scroll down for some examples.

Data science and analytical techniques will be used to extract the scientific information hidden in these large data sets, e.g. the long-term trend in global warming. Examples of data science techniques include regression modelling and validation methods. Intuitively, one can expect spatio-temporal regression models that exploit the spatio-temporal dependence in the data to perform better than regression models that assume independent and identically distributed (iid) errors. This is indeed true for most data sets, and you will have the opportunity to experience these results yourself.
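To give a feel for why modelling dependence helps, here is a minimal toy sketch in Python. It is not the bmstdr workflow and it is deliberately simplified to temporal dependence only: the simulated data, the AR(1) error structure, the parameter values and all function names are assumptions made purely for illustration. It compares one-step-ahead prediction from a regression that treats errors as iid with a prediction that also uses the previous residual.

```python
import math
import random


def simulate(n, rho, seed=1):
    """Simulate y_t = 2 + 0.5 * x_t + e_t with AR(1) errors e_t (illustrative values)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    # Start the error series from its stationary distribution.
    e = [rng.gauss(0, 1) / math.sqrt(1 - rho ** 2)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0, 1))
    y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, e)]
    return x, y


def ols(x, y):
    """Closed-form least squares fit of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(err * err for err in errors) / len(errors))


def compare(n=600, n_train=400, rho=0.9, seed=1):
    """Return (RMSE ignoring dependence, RMSE using the previous residual)."""
    x, y = simulate(n, rho, seed)
    b0, b1 = ols(x[:n_train], y[:n_train])
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    train = resid[:n_train]
    # Lag-1 autoregression coefficient estimated from the training residuals.
    rho_hat = (sum(a * b for a, b in zip(train[1:], train[:-1]))
               / sum(a * a for a in train[:-1]))
    # Validation on the held-out final stretch of the series.
    iid_errors = [resid[t] for t in range(n_train, n)]
    dep_errors = [resid[t] - rho_hat * resid[t - 1] for t in range(n_train, n)]
    return rmse(iid_errors), rmse(dep_errors)


rmse_iid, rmse_dep = compare()
print("RMSE, iid-error regression:    ", round(rmse_iid, 3))
print("RMSE, dependence-aware version:", round(rmse_dep, 3))
```

In this toy setting the dependence-aware prediction should give the smaller validation error, because the previous residual carries information about the next one. Spatio-temporal models apply the same idea across both space and time.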

This project will enhance regression modelling for an application of your choice from among several that will be offered. The ultimate objective in each case will be to find the best regression model for the data set and to make predictive inference by drawing a map such as the ones given below. The R package bmstdr (pronounced "BM star"), developed by Prof Sahu, will be used to accomplish all modelling and mapping tasks. The project will also benefit from an accessible textbook written by Prof Sahu on the same topic, Bayesian modelling of spatio-temporal data with R.

The project will suit students with a wide range of interests in theory and application.

  • A mathematically strong and motivated student can develop the theory behind the modelling so that new models can be fitted.
  • A student with interests in data analytics and data science can analyse a brand new spatio-temporal data set of their choice.
  • A student aiming to gain key skills in R programming can develop and enhance the bmstdr package.
  • It is possible to mix and match some of the above project concepts, i.e. theoretical development, application analytics and software development, depending on your own interest and dedication.

    A third-year BSc student worked on a very similar project in 2018-2019, and that work led to a published research paper: Spatio-temporal Bayesian modeling of precipitation using rain gauge data from the Hubbard Brook Experimental Forest, New Hampshire, USA. Please feel free to have a look.

    Further practical examples:

    1. Number of Covid-19 deaths per million people up to September 2020.

      [Figure: global trend]
    2. Annual percentage trend in ocean chlorophyll levels.

      [Figure: global trend]
    3. Annual average temperature in the north Atlantic in 2003 and average air pollution in New York.

      [Figures: annual temperature; air pollution in New York]
    4. Annual precipitation and trend map of Hubbard Brook Experimental Forest in New Hampshire, USA.

      [Figures: rolling average; trend map]
    5. Air pollution modelling maps for eastern USA.

      [Figures: rolling average; trend map]
    6. Air pollution levels and their standard deviation (sd) map for England and Wales.

      [Figures: rolling average; trend map]