
40  Advanced Features

All the features we were able to extract so far were related to what day or time it was for a given observation, or were numbers of the form “how many days since the start of the month” or “how many days since the start of the week”. While this information can be useful, there will often be times when slight modifications can result in huge payoffs.

Consider merchandise sales data. An indicator for specific dates can be useful, but the sale amount is not likely to be affected only on the sale days themselves; the surrounding days are affected as well. Consider the American Black Friday. It falls on a predictable day every year, namely the day after Thanksgiving, which is the Friday following the fourth Thursday of November. Because it is close to Christmas and other gift-giving holidays, it is a common day for thrifty people to start buying presents.

With plain extraction, we only get a single indicator for the day of Black Friday.

Bar chart. Dates (Nov 19-29, 2023) on x-axis, effect on y-axis. Single bar of height 1 on Nov 24 (Black Friday), all other days are zero. Represents a simple binary indicator for the holiday.
Figure 40.1: We only see the effect of a single day.
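
As a minimal sketch of this single-day indicator, the snippet below finds Black Friday for a given year and marks it over the date range used in the figure. The function name `black_friday` and the variable names are illustrative assumptions, not part of any library; only the Python standard library is used.

```python
from datetime import date, timedelta

def black_friday(year: int) -> date:
    """Day after US Thanksgiving (the fourth Thursday of November)."""
    nov1 = date(year, 11, 1)
    # weekday(): Monday=0, ..., Thursday=3
    days_to_first_thursday = (3 - nov1.weekday()) % 7
    thanksgiving = nov1 + timedelta(days=days_to_first_thursday + 21)
    return thanksgiving + timedelta(days=1)

# Nov 19-29, 2023, the same window as the chart
days = [date(2023, 11, 19) + timedelta(days=i) for i in range(11)]

# 1 on Black Friday, 0 on every other day
indicator = [int(d == black_friday(2023)) for d in days]
```

The indicator carries no information about the days around the holiday, which is the limitation the rest of this chapter addresses.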

But since we know the day of Black Friday in advance, it makes sense that sales will see a drop on the preceding days, and we can incorporate that as well.

Bar chart. Dates (Nov 19-29, 2023) on x-axis, effect on y-axis. Showing Black Friday with "before" effects. Nov 24 has value 1. Days before have negative values that decay exponentially (Nov 23 around -0.5, Nov 22 around -0.25, etc.). Days after are zero.
Figure 40.2: Negative before effects can capture hesitancy to buy before a big sale.

On the other hand, once the sale has started, we expect sales to pick up again. Since this is the last big sale before the holidays, shoppers are free to buy their remaining presents, as they no longer have to fear the item going on sale later.

Bar chart. Dates (Nov 19-29, 2023) on x-axis, effect on y-axis. Showing Black Friday with both before and after effects. Nov 24 peaks at 1. Days before have negative values. Days after have positive but decaying values, creating an asymmetric pattern around the holiday.
Figure 40.3: Positive after effects can capture the peace of mind that no other sale will come.

The exact effects shown here are only approximations chosen to fit our story, but they provide a useful illustration. There is a lot of flexibility to be gained if we look at datetimes from a distance perspective. We can play around with “distance from” and “distance to”, the different numerical transformations we saw in Numeric Overview, and the signs and indicators we talked about in Chapter 39 to tailor our feature engineering to our problem.
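
The before and after effects described above can be sketched as a function of the signed distance to a reference date. The helper `holiday_effect` and the halving decay rates are assumptions picked to resemble the charts (about -0.5 the day before, -0.25 two days before), not values estimated from data.

```python
from datetime import date, timedelta

def holiday_effect(day, holiday, before=lambda x: 0.0, after=lambda x: 0.0):
    """Feature value for `day` relative to `holiday`.

    `before` and `after` each receive a positive number of days
    (days until the holiday, or days since it).
    """
    diff = (day - holiday).days
    if diff == 0:
        return 1.0
    if diff < 0:
        return before(-diff)
    return after(diff)

sale_day = date(2023, 11, 24)  # Black Friday 2023
days = [date(2023, 11, 19) + timedelta(days=i) for i in range(11)]

# Assumed shapes: a dip that halves with each day further before the sale,
# and a lift that halves with each day after it.
effect = [
    holiday_effect(
        d,
        sale_day,
        before=lambda x: -(0.5 ** x),
        after=lambda x: 0.5 ** x,
    )
    for d in days
]
```

Leaving `before` or `after` at its default of zero recovers the one-sided versions, and leaving both at zero recovers the plain indicator.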

What all these methods have in common is a reference point. For an extracted day feature, the reference point is “first of the month” and the after-function is x, or in other words “days since the start of the month”. We see this in the following chart. Almost all extracted features follow this formula.

Bar chart showing day-of-month extraction over ~3 months. Each month forms a sawtooth pattern: values increase from 1 to ~30, then reset to 1. The pattern represents "days since start of month" - a simple extracted datetime feature.
Figure 40.4: Repeated increasing values.

We could just as well do the inverse and look at how many days are left in the month. This would have a before-function of x.

Bar chart showing "days until end of month" over ~3 months. Inverted sawtooth pattern: values start high (varying by month length - 30, 31, etc.) and decrease to 1 at month end, then jump back up. Represents "days remaining in month" feature.
Figure 40.5: Repeated decreasing values.
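
A small sketch of these two month-based features, using only the standard library. The function names are illustrative, and the counting convention (the last day of the month has 1 day left) is an assumption chosen to match the chart description above.

```python
import calendar
from datetime import date

def days_since_month_start(d: date) -> int:
    # Reference point: first of the month, after-function x.
    # The first day of the month gets the value 1.
    return d.day

def days_left_in_month(d: date) -> int:
    # Reference point: end of the month, before-function x.
    # monthrange returns (weekday of first day, number of days in month).
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return days_in_month - d.day + 1
```

Because months differ in length, the second feature starts each month at a different height (28 to 31), while the first always starts at 1.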

We can do a two-sided formula by looking at “how many days are we away from a weekend”. This would have both the before and after functions be x and look like so. It isn’t too interesting here, as it is quite periodic, but use the same measure with “sale” instead of “weekend” and suddenly you have something different.

Bar chart showing "days from nearest weekend" over ~1 month. Repeating weekly pattern: Saturday and Sunday are 0, Monday and Friday are 1, Tuesday and Thursday are 2, Wednesday peaks at 3. Creates a symmetric tent pattern each week.
Figure 40.6: Repeated
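
The two-sided distance can be sketched as follows. Both `days_from_weekend` and the more general `days_from_nearest` are hypothetical helper names, and the sale date used in the example is only for illustration.

```python
from datetime import date, timedelta

def days_from_weekend(d: date) -> int:
    """Days to the nearest Saturday or Sunday (0 on the weekend itself)."""
    wd = d.weekday()  # Monday=0, ..., Sunday=6
    if wd >= 5:
        return 0
    # wd + 1 days back to the previous Sunday, 5 - wd days to the next Saturday
    return min(wd + 1, 5 - wd)

def days_from_nearest(d: date, events: list[date]) -> int:
    """Same idea for an arbitrary, possibly irregular, list of event dates."""
    return min(abs((d - e).days) for e in events)

# Monday Nov 20 through Sunday Nov 26, 2023
week = [date(2023, 11, 20) + timedelta(days=i) for i in range(7)]
weekend_distance = [days_from_weekend(d) for d in week]
sale_distance = [days_from_nearest(d, [date(2023, 11, 24)]) for d in week]
```

The weekend version repeats every week, while the event version only depends on whatever dates are passed in, so it is not tied to any calendar period.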

There are many other functions you can use, and they will depend entirely on your task at hand. A few examples are shown below for inspiration.

Faceted bar chart with 4 panels showing different transformations of day-of-month. "x": linear sawtooth (1-30). "log(x)": compressed, flattens after first week. "x^2": quadratic growth within each month. "pmin(x,10)": capped at 10, creating a plateau after the tenth day of each month.
Figure 40.7: Repeated
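
The four panels can be reproduced with a few lines of plain Python. This is a sketch for a single 30-day month; the dictionary keys simply mirror the panel titles, and `min(x, 10)` plays the role of R's `pmin(x, 10)`.

```python
import math

day_of_month = list(range(1, 31))  # assumes a 30-day month

features = {
    "x": [float(x) for x in day_of_month],
    "log(x)": [math.log(x) for x in day_of_month],      # compresses late days
    "x^2": [float(x ** 2) for x in day_of_month],       # emphasizes late days
    "pmin(x, 10)": [float(min(x, 10)) for x in day_of_month],  # capped at 10
}
```

Each transformation is applied to the distance from the reference point, so the same functions work unchanged on “days since the last sale” or any other distance feature.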

What makes these calculations so neat is that they can be tailored to our task at hand and that they work with irregular events such as holidays and signup dates. These methods are not circular by definition, but in many ways they behave as if they were. We will cover explicit circular methods in Chapter 41.

40.2 Pros and Cons

40.2.1 Pros

40.2.2 Cons

40.3 R Examples

40.4 Python Examples
