A Project to Calculate COVID Moving Average: Spring Batch [Part 1]
This is the first in a series of posts about the same project; each post focuses on a specific issue.
Well, the plan is to build a project that calculates the COVID moving average and stores it for further analysis. We also need to deploy it somewhere and set up some kind of timer or scheduler that updates the data every day.
First, let’s understand what a moving average is:
“Moving average is a simple, technical analysis tool. Moving averages are usually calculated to identify the trend direction of a stock or to determine its support and resistance levels.”
As with stocks, for COVID we want to know whether the situation is trending up, holding steady, or going down. Keep in mind that the longer the moving-average period, the greater the lag (the average reacts more slowly to recent changes). To do this, we will create a Spring Batch program divided into three parts:
- Access the source that contains the “raw” COVID data;
- Calculate the moving average;
- Store the moving average data.
If you are not familiar with Spring Batch, this link is a good place to learn it. Here I’ll explain the adopted strategy at a high level. The idea is to build three jobs:
- Job 1: access this open public database that contains COVID-related information;
- Job 2: aggregate the COVID information into our database;
- Job 3: calculate the moving average and store it in our database.
1. Creation of a Basic Spring Batch Job
Go to Spring Initializr and create a minimal Spring Batch project. This means keeping all the default options and choosing the Java version that best suits your environment (I’ll use Java 8). DON’T forget to add these two dependencies: Spring Batch and Spring Web. Then press Generate and open the project in your favourite development environment (I’ll use Eclipse).
Next, let’s think about the three jobs we planned. In fact, we are not going to create three jobs, but three Tasklets. Why? In Spring Batch, a job is a collection of steps executed in a certain order, whereas a Tasklet is meant to perform a single, simple task within a step. So what we actually need is one job whose steps each run one Tasklet.
2. Tasklets Creation
Let’s create three Tasklets:
- ReadDataTasklet.java [reads the open COVID database];
- ReadFirebaseTasklet.java [reads the data already stored in our database and compares it with the newest data from the open COVID database (spoiler alert: I am going to use Firebase)];
- ProcessDataTasklet.java [processes all the information and stores the moving average].
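Each of these classes implements Spring Batch’s Tasklet interface, whose single method execute(...) returns a RepeatStatus. The skeleton below shows what ReadDataTasklet can look like, assuming Spring Batch 4.x (the line used by Spring Boot 2.x, which still supports Java 8). The body is a hypothetical placeholder: the actual download logic is not shown in this post.

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Skeleton only: the body is a hypothetical placeholder, not the author's code.
public class ReadDataTasklet implements Tasklet {

    // An override may drop the interface's "throws Exception" clause.
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // 1. Download the latest "raw" COVID data from the open database (omitted).
        // 2. Make it available to the next step, e.g. via the job's ExecutionContext (omitted).
        System.out.println("ReadDataTasklet: reading the open COVID database");
        // FINISHED tells Spring Batch that this Tasklet's work is done,
        // so execute() is not called again and the step can complete.
        return RepeatStatus.FINISHED;
    }
}
```

ReadFirebaseTasklet and ProcessDataTasklet follow the same shape; only the body of execute(...) changes.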
3. Tasklet Coordination
To coordinate the Tasklets we need a configuration file. In this file, a single job defines the order in which the Tasklets are executed. See the image below:
The entire file can be seen here. Let’s go through it piece by piece.
Lines 1 to 9: our single job is created. First, preventRestart() makes the job non-restartable, so a failed execution (and therefore its Tasklets) is not restarted. Then readSourceDataTasklet() runs to collect new COVID information (if available). Next, readLocalDataTasklet() checks whether there is any difference between our current COVID data and the new information received. Finally, saveDataTasklet() calculates the moving average and stores it in our database (Firebase).
Lines 11, 16 and 20: these create the Tasklets before they are executed.
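A configuration along these lines can be sketched as follows, assuming Spring Batch 4.x with @EnableBatchProcessing (JobBuilderFactory and StepBuilderFactory are deprecated in Spring Batch 5). The job name, the step names and the step(...) helper are hypothetical, and mapping readSourceDataTasklet(), readLocalDataTasklet() and saveDataTasklet() to ReadDataTasklet, ReadFirebaseTasklet and ProcessDataTasklet is an assumption based on their descriptions.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Sketch of the job configuration; names marked hypothetical are not from the post.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    // One job, three steps, executed in this order.
    @Bean
    public Job covidMovingAverageJob() {                      // hypothetical job name
        return jobBuilderFactory.get("covidMovingAverageJob")
                .preventRestart()                              // a failed execution cannot be restarted
                .start(step("readSourceDataStep", readSourceDataTasklet())) // new data from the open database
                .next(step("readLocalDataStep", readLocalDataTasklet()))    // compare with data in Firebase
                .next(step("saveDataStep", saveDataTasklet()))              // compute and store the average
                .build();
    }

    // Hypothetical helper: wraps a Tasklet in a step.
    private Step step(String name, Tasklet tasklet) {
        return stepBuilderFactory.get(name).tasklet(tasklet).build();
    }

    // The Tasklets are created as beans before the job runs.
    @Bean
    public Tasklet readSourceDataTasklet() {
        return new ReadDataTasklet();
    }

    @Bean
    public Tasklet readLocalDataTasklet() {
        return new ReadFirebaseTasklet();
    }

    @Bean
    public Tasklet saveDataTasklet() {
        return new ProcessDataTasklet();
    }
}
```

Because the class is annotated with @Configuration, calling readSourceDataTasklet() inside covidMovingAverageJob() returns the same singleton bean rather than a new instance.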
4. Moving Average Calculation
To explain how we calculate the moving average, let’s consider the following scenario:
“You want to calculate today’s moving average of deaths” [considering last week’s data]
You have to:
- Collect the number of deaths for each of the past 6 days;
- Collect today’s number of deaths;
- Sum the seven values and divide the total by seven;
- The result is today’s moving average.
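Written as a formula, with $d_t$ the number of deaths reported on day $t$ (today), the steps above amount to:

```latex
\mathrm{MA}_t = \frac{d_{t-6} + d_{t-5} + \dots + d_{t-1} + d_t}{7}
             = \frac{1}{7}\sum_{i=0}^{6} d_{t-i}
```

For example, with illustrative (not real) counts of 1000, 1100, 950, 1200, 1050 and 980 deaths over the past six days and 1020 today, the sum is 7300 and $\mathrm{MA}_t = 7300 / 7 \approx 1042.857$.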
Lines 3 to 9: the number of deaths for each of these seven days is loaded.
Line 10: the moving average is calculated with a scale of three (three decimal places) and a rounding mode.
Line 12: an exception may be thrown if fewer than 7 days of data are available. That’s fine; we just stop processing.
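The calculation described above can be sketched in plain Java. This is not the author’s file: the class and method names, the List&lt;Integer&gt; input (oldest day first, today last), RoundingMode.HALF_UP and IllegalArgumentException are assumptions, since the post only states that the result uses a scale of three, a rounding mode, and an exception when fewer than 7 days are available.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

public class MovingAverage {

    private static final int WINDOW = 7; // the past six days plus today

    // Returns today's 7-day moving average of deaths.
    // dailyDeaths is assumed to be ordered oldest first, with today's count last.
    public static BigDecimal sevenDayAverage(List<Integer> dailyDeaths) {
        if (dailyDeaths.size() < WINDOW) {
            // Fewer than seven days loaded: the average cannot be computed yet.
            throw new IllegalArgumentException(
                    "Need at least " + WINDOW + " days, got " + dailyDeaths.size());
        }
        // Sum the last seven entries.
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = dailyDeaths.size() - WINDOW; i < dailyDeaths.size(); i++) {
            sum = sum.add(BigDecimal.valueOf(dailyDeaths.get(i).longValue()));
        }
        // Divide by seven with scale 3; HALF_UP is an assumed rounding mode.
        return sum.divide(BigDecimal.valueOf(WINDOW), 3, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Illustrative counts only, not real COVID data.
        List<Integer> deaths = Arrays.asList(1000, 1100, 950, 1200, 1050, 980, 1020);
        System.out.println(sevenDayAverage(deaths)); // prints 1042.857
    }
}
```

Using BigDecimal with an explicit scale keeps the stored value at exactly three decimal places instead of carrying a long binary floating-point fraction into the database.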
The entire file is here.
5. Conclusion
Of course, one might be wondering: where do we store all this calculated data? And which version control tool do we use? The answer to the first question is Firebase.
In short, Firebase is Google’s mobile application development platform. Even though ours is a batch application, we can easily use the Firebase Realtime Database to store our data. This will be detailed in part 2. As for code versioning, GitHub is the right choice because it integrates easily with Heroku. This will be detailed in part 3.
Thank you.