Saturday, September 15, 2018

Leap Year Days in 365-day Moving Average of a Time Series?

Suppose you have a data time series that is taken every day. Say, like Arctic sea ice extent from NSIDC.

Let's say you want to calculate its moving annual average -- over 1 yr, 12 months, 365.25 days.

How, exactly, do you account for leap year days in such a moving average?

PS: I was born on a leap year day, February 29th -- the only baby in the hospital to have been delivered on that day -- but don't worry about upsetting me, no matter what you propose. I've heard all the jokes, and I like being only a decade and a half old, more or less. It makes me feel just a bit special.

PPS: I also defended my PhD thesis on a Feb 29th. After my hour-long 4:00 pm presentation they voted me up just one hour before the hour I was born, so I can say I got my PhD when I was 27.

5 comments:

William Connolley said...

Probably, via some bodge. If you go via monthly averages, it's easy, obvs. If you want the multi-year average of a given day number, you probably just ignore the leap-day. After all, the day to day correlation is high, so you're not losing much. HadCM3 and before used to run with 360 day years, for convenience.
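For illustration, here is a minimal Python sketch of the "just ignore the leap day" approach for a multi-year average of a given calendar day. The function name `daily_normals` and the `(date, value)` record layout are assumptions made for this example; they do not come from any NSIDC tool.

```python
from collections import defaultdict
from datetime import date


def daily_normals(records):
    """Average values by calendar day across years, skipping Feb 29.

    `records` is an iterable of (date, value) pairs (hypothetical layout).
    Returns a dict mapping (month, day) to the mean value for that day.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, value in records:
        if (d.month, d.day) == (2, 29):
            continue  # drop the leap day entirely
        key = (d.month, d.day)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    sample = [
        (date(2015, 3, 1), 10.0),
        (date(2016, 2, 29), 99.0),  # ignored
        (date(2016, 3, 1), 12.0),
    ]
    print(daily_normals(sample))
```

Keying on (month, day) rather than on day number keeps March 1 aligned with March 1 in both leap and ordinary years.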

J said...

Lots of ways to do this. It sounds like you're asking for a rolling mean centered on each day, with the averaging window covering a 365.25-day period. In that case, one option is to use the day itself, plus the 182 days on either side, plus the next day out on each end of the window weighted at 1/8th (i.e., use those two days, but weight them each at 1/8th). That gets you 365 + 2/8 = 365.25 days.
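A rough Python sketch of that weighted window, assuming the series is a plain list of daily values with no gaps. The function name `centered_annual_mean` is made up for this example.

```python
def centered_annual_mean(values, i):
    """Mean over a 365.25-day window centered on index i.

    The day itself and the 182 days on each side get full weight
    (365 days); the 183rd day on each side gets weight 1/8, adding
    the remaining 0.25 day. Returns None if the window does not fit.
    """
    half = 183
    if i - half < 0 or i + half >= len(values):
        return None
    full = sum(values[i - 182 : i + 183])               # 365 full-weight days
    ends = (values[i - half] + values[i + half]) / 8.0  # two days at 1/8 each
    return (full + ends) / 365.25


if __name__ == "__main__":
    series = [float(k) for k in range(400)]
    print(centered_annual_mean(series, 200))
```

Because the weights are symmetric about the center day, a straight-line trend passes through this average unchanged.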

Another option: Rescale your 366-day years to 365 days, using any of various methods, and calculate a 365-day rolling average.

Yet another: Convert each day to decimal year (YYYY + [Julian - 0.5]/365 for a 365-day year, or divide by 366 in leap years). Then average over a window extending 0.5 year on either side of each day.
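The decimal-year conversion could look something like this in Python, reading "Julian" as the day of the year (1 to 365 or 366). The function name `decimal_year` is just a label for this sketch.

```python
import calendar
from datetime import date


def decimal_year(d):
    """Convert a date to a decimal year at the middle of that day."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    day_of_year = d.timetuple().tm_yday  # 1 on Jan 1
    return d.year + (day_of_year - 0.5) / days_in_year


if __name__ == "__main__":
    print(decimal_year(date(2018, 1, 1)))
    print(decimal_year(date(2016, 12, 31)))
```

Subtracting 0.5 places each value at the middle of its day, so no date ever maps onto the following year.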

J said...

Actually, it's probably better to rescale your data in blocks of four years, so that the solstices line up. In other words, think of your data as falling in blocks of 1461 days (=four 365.25-day years). Resample them into blocks of 1460 days (=four 365-day years). Each day's value in the 1460-day cycle is an interpolation (or weighted average) of the two adjacent days' measurements in the 1461-day cycle.
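One possible way to do that resampling is sketched below in Python, using linear interpolation between the two nearest original days. Mapping day centers onto day centers is an assumption of this sketch (other alignments are just as defensible), and `resample_block` is a hypothetical name.

```python
def resample_block(block, n_out=1460):
    """Linearly resample one block of daily values to n_out values.

    Intended for a 1461-day (four-year) block resampled to 1460 days.
    Output day k is placed at the same fractional position within the
    block as in the input, measured between day centers.
    """
    n_in = len(block)
    out = []
    for k in range(n_out):
        # Position of output day k on the input day axis.
        p = (k + 0.5) * n_in / n_out - 0.5
        p = min(max(p, 0.0), n_in - 1.0)
        lo = int(p)
        hi = min(lo + 1, n_in - 1)
        frac = p - lo
        out.append((1.0 - frac) * block[lo] + frac * block[hi])
    return out


if __name__ == "__main__":
    four_years = [float(k) for k in range(1461)]
    resampled = resample_block(four_years)
    print(len(resampled), resampled[0], resampled[-1])
```

Each output value is a weighted average of two adjacent input days, with the weight shifting gradually from the earlier day to the later one across the block.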

Or you could do this with Fourier analysis.

J said...

This matters when calculating normals and anomalies (i.e., the long-term average value on a given day of the year, and the departure from that average on a given day of a given year) ... because in some years, that day will be X days after the solstice, and in other years it will be X+1 days. Resampling from the original 1461-day (four-year) blocks into artificial 1460-day blocks will solve this problem.

In the social sciences, it's more complicated (as usual) because the day of the week matters, too. I have heard that sometimes people will perform a "seasonal adjustment" process that calculates anomalies while removing leap years and compensating for day of the week.

Would this matter for your example (Arctic sea ice)? I assume there's some kind of weekly cyclic signal in long-distance jet travel, creating a similar weekly cycle in contrails, which in turn would create a small weekly cycle in radiation balance over the Arctic. There's not a lot of jet traffic over the Central Arctic, but there's a fair bit over some particular regions, e.g. Baffin Bay:

https://phys.org/news/2015-05-proba-v-world-air-traffic-space.html

All of this probably falls in the category of "stuff you can ignore 99% of the time", though. For your case, just take a rolling 365-day average of the ice extent, and graph it based on decimal year. But note that prior to Jan 13, 1988 there are a bunch of days missing from your NSIDC data set. So if you're wanting to calculate rolling averages from the start of the data (Oct 1978) you'll still need to use some kind of interpolation/resampling, be it Fourier or LOESS or simple linear interpolation.
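A minimal Python sketch of the simple route: fill missing days by linear interpolation, then take a trailing 365-day mean. It assumes the series has already been laid out one entry per calendar day, with `None` marking missing days; both function names are hypothetical.

```python
def fill_gaps_linear(values):
    """Replace interior None entries by linear interpolation.

    Leading or trailing None entries are left as None, since there
    is no known value on one side to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            filled[i] = (1 - t) * values[a] + t * values[b]
    return filled


def trailing_mean(values, window=365):
    """Mean of the last `window` values ending at each index.

    Entries are None until a full window is available, or when the
    window still contains a None.
    """
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        if all(v is not None for v in chunk):
            out[i] = sum(chunk) / window
    return out


if __name__ == "__main__":
    daily = [1.0, None, 3.0, 4.0]
    print(trailing_mean(fill_gaps_linear(daily), window=2))
```

The slice-based mean is slow for long records but easy to check; a running sum would be the obvious speed-up.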

@whut said...

I use the value of 365.2422 days, which averages the leap day correction over time. This is necessary for long-period tidal calculations.