Dealing with Multiple Time Features

Neural networks have had a good run since their inception in 1943. No, that's not a typo; the basic building block of a neural network, the artificial neuron, was proposed by McCulloch and Pitts in 1943, before the first general-purpose electronic computers existed.

Since then, they have moved out of the realm of academia and into the hands of practitioners and hobbyists, and have been applied to every manner of classification and prediction task. For time-series data in particular, we can use any recurrent architecture: RNNs (recurrent neural networks), GRUs (gated recurrent units), LSTMs (long short-term memory networks), and now even 1-dimensional CNNs (convolutional neural networks). We can even use ResNets (residual networks) or Neural ODEs (ordinary differential equations).

However, all of these rely on having a single timestamp feature. Moreover, with the exception of Neural ODEs and possibly well-formed ResNets, they all rely on evenly spaced timestamps.

For example, stock data arrives in hourly, weekly, or monthly increments, and with cryptocurrencies we can even get data spaced in 15-second intervals.

However, in shipping, we have neither of these luxuries! Our data is entirely unevenly spaced, since trucks are subject to traffic, weather, driver fatigue, global shipping delays, fuel stops, etc.

So in actual data, we see things like an order-placed timestamp, an expected-pickup timestamp, an expected-delivery timestamp, an actual-delivery timestamp, etc. We have to keep track of timing in multiple ways: these timestamps are neither a single feature, nor are any of them evenly spaced.

That leaves us with two main options. If we want to use a recurrent network, we can impose an evenly spaced timestamp grid and encode each of the other timestamps as a time delta from the nearest grid point, which roughly gets us what we want. The problem we face here is in cross-validation: it is tricky to train an LSTM on an arbitrary slice of the data when the underlying timestamps are irregular.
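To make the first option concrete, here is a minimal sketch of the grid-plus-deltas idea using pandas. The column names and event data are purely illustrative, not from any real CDL1000 dataset: we snap an irregular event stream onto a regular hourly grid and record, at each grid point, how long ago the most recent real observation occurred.

```python
import pandas as pd

# Hypothetical shipment events with irregular timestamps (illustrative data).
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2023-03-01 06:17", "2023-03-01 09:42", "2023-03-01 15:03",
    ]),
    "miles_remaining": [480.0, 390.0, 210.0],
})

# Impose a regular hourly grid over the observation window.
grid = pd.DataFrame({
    "grid_time": pd.date_range("2023-03-01 07:00", "2023-03-01 16:00", freq="h"),
})

# For each grid point, pull in the most recent event at or before it,
# then express the gap as a time-delta feature the network can consume.
merged = pd.merge_asof(
    grid, events, left_on="grid_time", right_on="event_time", direction="backward"
)
merged["delta_hours"] = (
    (merged["grid_time"] - merged["event_time"]).dt.total_seconds() / 3600.0
)
```

The resulting frame is evenly spaced (so a recurrent network is happy), while `delta_hours` preserves how stale each carried-forward observation is.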

So shuffling the index can dramatically affect training results.

The approach we prefer at CDL1000 is to split each timestamp into a handful of features: month, day, year, hour (on a 24-hour clock), and minute if applicable. This way, we can shuffle our index freely for training and testing, and we can toss out features that turn out to be unimportant (e.g., year often has too little variability to be useful, while minute has too much).
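The timestamp-splitting approach above is a few lines of pandas. The column name `pickup_ts` and the sample dates are hypothetical placeholders; the decomposition itself uses the standard `.dt` accessors.

```python
import pandas as pd

# Hypothetical pickup timestamps (illustrative data).
df = pd.DataFrame({
    "pickup_ts": pd.to_datetime([
        "2023-03-01 06:17", "2023-07-15 14:02", "2023-11-30 23:45",
    ])
})

# Split each timestamp into independent calendar features, so rows can be
# shuffled freely for train/test splits without worrying about ordering.
df["month"] = df["pickup_ts"].dt.month
df["day"] = df["pickup_ts"].dt.day
df["year"] = df["pickup_ts"].dt.year
df["hour"] = df["pickup_ts"].dt.hour      # 24-hour clock
df["minute"] = df["pickup_ts"].dt.minute

# Drop low-value columns: here year barely varies, and minute is near-noise.
features = df.drop(columns=["pickup_ts", "year", "minute"])
```

Because each row now stands alone, ordinary shuffled cross-validation works out of the box, and feature-importance tools can tell you which calendar components to keep.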
