In this directory you will find two kinds of files.  One kind is the "Base" file ("Base" is part of the filename) and the other is the "Projection" file ("Proj" is part of the filename).  There is a third kind of file, a variation on the projection file, that has a number after the "Proj" part (e.g. USA_Proj5.csv).  Files with a number after the Proj are files in which the forward projections have been clamped at the rate indicated by the number.  For example, USA_Proj5.csv is a file that assumes 5% growth in infections/day.
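As a rough sketch, a script could sort the files by kind from their names.  The helper name classify and the exact patterns are assumptions inferred from the single example name above (USA_Proj5.csv); the real naming scheme may differ.

```python
import re

# Hypothetical helper: patterns are guessed from the one example filename
# given in this README (USA_Proj5.csv), not from a documented scheme.
def classify(filename):
    """Return (kind, rate): kind is "proj", "base" or "unknown";
    rate is the clamp percentage for numbered projection files, else None."""
    match = re.search(r"Proj(\d+)?", filename)
    if match:
        rate = int(match.group(1)) if match.group(1) else None
        return ("proj", rate)
    if "Base" in filename:
        return ("base", None)
    return ("unknown", None)

print(classify("USA_Proj5.csv"))  # ('proj', 5)
```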

The base files are a compendium of the files available in the Johns Hopkins / WHO dataset.  I simply organize the data from that dataset into individual country-specific datasets (and state-specific ones, for the United States).  That data is reflected in the "Base" files.  Here is a line from one of them (note that none of these files has a header row):

"USA","2020-03-20 00:00:00",19101,391,244,147,2.047010000,62.404090000

First field is location.  United States in this case.
Second field is a date/timestamp.  Only the date is relevant.  It represents a picture of that region on that date.

Third field is the number of confirmed infected.  It was 19,101 on March 20th, 2020.

Fourth field is total number of known dead plus known recovered.  391 in this case.

Fifth field is number dead.  244 in this case.

Sixth field is number recovered.  147 here.

Seventh field is % resolved.  This is simply field #4 (391) divided by known infected (19,101).  391 / 19,101 = 2.05%

Eighth field is % dead.  This is simply the number dead (244) divided by the number dead (244) plus the number recovered (147).  244 / (244+147) = 62.40%
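A minimal sketch of reading a Base line and recomputing the last two fields, using the sample line above.  The field names (confirmed, resolved, and so on) are labels chosen here for illustration, since the files carry no header row.

```python
import csv
import io

line = '"USA","2020-03-20 00:00:00",19101,391,244,147,2.047010000,62.404090000'

# Illustrative labels only; the files themselves have no header row.
FIELDS = ["location", "timestamp", "confirmed", "resolved", "dead",
          "recovered", "pct_resolved", "pct_dead"]

def parse_base_line(text):
    """Parse one Base line into a dict with numeric fields converted."""
    row = next(csv.reader(io.StringIO(text)))
    rec = dict(zip(FIELDS, row))
    for key in ("confirmed", "resolved", "dead", "recovered"):
        rec[key] = int(rec[key])
    for key in ("pct_resolved", "pct_dead"):
        rec[key] = float(rec[key])
    return rec

rec = parse_base_line(line)
# % resolved = (dead + recovered) / confirmed
pct_resolved = 100.0 * rec["resolved"] / rec["confirmed"]
# % dead = dead / (dead + recovered)
pct_dead = 100.0 * rec["dead"] / (rec["dead"] + rec["recovered"])
print(pct_resolved, pct_dead)
```

This sketch does not guard against a zero denominator, which could occur for a region with no confirmed or no resolved cases.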

The projection files contain some of the same information as base files plus FUTURE projections based on the trends observed in the base files.  Past data is represented by a PAST tag in the data.  Let's look at two of those:

"PAST: USA","2020-03-19 00:00:00",13680,200,108,75.6999743128693,69.4915254237288,1.88679245283019
"PAST: USA","2020-03-20 00:00:00",19101,244,147,39.6271929824561,22,36.1111111111111

First field is again location, with an additional tag (PAST:) that tells you this is not computed data but represents what was actually reported.

Next field is the timestamp, same format as for a Base file.

Third field is total # confirmed.  Same as a base file.

Fourth field is # dead.  Same as base file.

Fifth field is # recovered.  Same as base file.

Sixth field is the day-to-day percentage change in infections.  On 2020-03-19 there were 13,680 infections.  On 2020-03-20 there were 19,101.  That is a 39.63% increase day to day.

Seventh field is the same as the sixth field, except for deaths, not infections.

Eighth field is the same as the seventh field, except for recoveries, not deaths.
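The three change fields can be checked against the two PAST lines above with a small sketch.  The function name pct_change is an illustrative choice, and returning None for a zero previous count is an assumption about how to handle that case.

```python
def pct_change(previous, current):
    """Day-to-day percentage change; None when the previous count is 0."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / previous

# Counts taken from the 2020-03-19 and 2020-03-20 PAST lines above.
print(pct_change(13680, 19101))  # infections, about 39.63
print(pct_change(200, 244))      # deaths, 22.0
print(pct_change(108, 147))      # recoveries, about 36.11
```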

Now let's look at the next line after the 2020-03-20 line:

"FUTURE: USA","2020-3-21 00:00:00",27174,320,321,42.2699163403454,31.4947595955729,118.963257560187

The FUTURE: tag tells you this is a PROJECTION made by my software.  Otherwise all fields are the same as for a PAST: line.

However, the number of infections, 27174, is what I project for 2020-3-21 based on the moving average of the past.  You see the calculation of that moving average in the 42.27% rate for infections.  That 42.27% rate was derived by looking at the moving average rate of the past.
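As a sketch of how a projected rate turns into a projected count, the 2020-03-20 counts can be grown by the rates on the 2020-3-21 FUTURE line.  Truncating the result to a whole number happens to reproduce the counts on that line; that the software truncates rather than rounds is an inference from this one sample, not a documented behavior.

```python
import math

def project(count, rate_pct):
    """Grow count by rate_pct percent for one day (fractional result)."""
    return count * (1.0 + rate_pct / 100.0)

# 2020-03-20 counts, grown by the rates on the 2020-3-21 FUTURE line.
infections = project(19101, 42.2699163403454)
deaths = project(244, 31.4947595955729)
recoveries = project(147, 118.963257560187)

# Truncation is an assumption inferred from the sample line.
print(math.floor(infections), math.floor(deaths), math.floor(recoveries))
# prints: 27174 320 321
```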

Same for the death and recovery numbers and percentages.  One last thing, let's look at the day after tomorrow:

"FUTURE: USA","2020-3-22 00:00:00",38869,418,699,43.0356900636275,30.477428868954,117.310990094073

You will notice that some rates, like infections, have gone up!  The infection rate is now 43.04%, up from 42.27%.  Death rates, at 30.48%, have gone down from 31.49%.  Why?

My software looks at the past numbers and not only computes a moving average, but also computes a trend.  So while the moving average may be larger than the latest single-day rate, as it is for deaths (31.49% against 22%), the trend in that average may still be downward, as it also is for deaths.
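The README does not give the window size or the trend formula, so the following is only one possible reading: a simple moving average over the daily rates, extended by the most recent step in that average.  The window of 3, the function names, and the input numbers are all made up for illustration and do not come from the dataset or from the actual software.

```python
def moving_average(values, window):
    """Simple moving average over each full window of values."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]

def next_rate(daily_rates, window=3):
    """Latest moving average plus its most recent step (a crude linear trend)."""
    averages = moving_average(daily_rates, window)
    if not averages:
        raise ValueError("need at least `window` daily rates")
    if len(averages) < 2:
        return averages[-1]
    trend = averages[-1] - averages[-2]
    return averages[-1] + trend

# Made-up daily percentage changes, for illustration only.
rates = [40.0, 36.0, 32.0, 22.0]
print(next_rate(rates))  # 24.0
```

Here the latest moving average (30.0) sits above the latest single-day rate (22.0), yet the average itself is falling, so the extrapolated rate comes out lower than the average.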

Any questions?  greg@littlebear.com