Targets R package for managing workflows
Workflows help us keep a clear structure for the process we are building, and they make individual steps easier to trace and maintain. They are especially useful in data science work, where heavy computations take time to run. In the R world, the first popular package for pipelines was drake. It not only builds the workflow but also skips up-to-date steps on a rerun, which speeds up overall execution. This is especially useful when you work with multiple modelling techniques and decide to update the parameters of just one model: only that model and the steps that depend on it are rerun.
Because drake has some limitations with regard to branching and data management, it was superseded by targets in 2021. The targets package provides a much more user-friendly experience and broader functionality than its predecessor. You can read about the detailed differences here.
Together with tarchetypes (pipeline archetypes for the targets package), targets lets you construct complex pipelines that are readable and reproducible. In principle, all you need to do is create a _targets.R file containing the linked targets.
Follow this walkthrough to understand how to leverage these packages. Building on that, I show here how to approach two more complex scenarios:
- dynamically creating targets with static names and combining them together for further use
- supplying package info to the target to ensure proper execution
HOW TO
1. Dynamically creating targets with static names and combining them together for further use.
When you want to create several objects out of one, you can use branching. There are two options:
- static branching: using a predefined vector or data frame of values
- dynamic branching: using the values of a previously created target
In each case you end up with several newly created targets, which you can use further down the pipeline. In this scenario I want to use those created targets all together, without explicitly typing their names (imagine having hundreds of targets…). This is how I can do that.
Example
Let’s create a data frame, split it into chunks and then combine the targets into a list.
```r
# _targets.R
countries <- c("Brazil", "Portugal", "Poland")

list(
  # Source data frame
  targets::tar_target(my_df, tibble::tibble(
    country = c("Portugal", "Brazil", "Austria", "Poland"),
    vals = c(54, 21, 78, 33)
  )),
  # Static branching: one c_val_* target per element of `countries`
  country_vals <- tarchetypes::tar_map(
    values = list(ctr = countries),
    # filter() is called directly so that no pipe operator has to be attached
    targets::tar_target(c_val, dplyr::filter(my_df, country == ctr))
  ),
  # Combine all branches into a single named list
  ctry_list <- tarchetypes::tar_combine(
    country_list,
    country_vals,
    command = setNames(list(!!!.x), paste0(countries, "_val"))
  )
)
```
Target my_df defines the data frame.
Then we have 3 targets, one per country listed in countries (tar_map() names them c_val_Brazil, c_val_Portugal and c_val_Poland).
Those 3 targets are stored as one object, country_vals, which is not itself a target.
But we can still pass it on like a target to create the next target, country_list.
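To get a feel for what the !!!.x in the command does, here is a small illustration with rlang. It only mimics the splicing step and is not how tar_combine() is implemented; the names upstream and combined_cmd are made up for this sketch, and the c_val_* symbols assume the naming pattern described above.

```r
library(rlang)

countries <- c("Brazil", "Portugal", "Poland")

# Symbols standing in for the upstream targets created by tar_map()
# (assumed to be named <target>_<value>)
upstream <- syms(paste0("c_val_", countries))

# !!! splices the symbols into the call as separate arguments of list()
combined_cmd <- expr(setNames(list(!!!upstream), paste0(countries, "_val")))
print(combined_cmd)
```

The printed call lists every branch by name, which is exactly the part we did not want to type by hand.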
As said, this is super useful when managing tens or hundreds of branches simultaneously.
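For comparison, a similar result can probably be reached with dynamic branching. The sketch below is my own variation and not part of the example above: countries becomes a target, pattern = map(countries) creates one branch per country, and iteration = "list" is assumed here so that the branches reach the downstream target as a list. The pipeline variable is just a convenience name.

```r
# _targets.R (dynamic-branching sketch)
pipeline <- list(
  targets::tar_target(countries, c("Brazil", "Portugal", "Poland")),
  targets::tar_target(my_df, tibble::tibble(
    country = c("Portugal", "Brazil", "Austria", "Poland"),
    vals = c(54, 21, 78, 33)
  )),
  # One branch per element of `countries`; inside a branch,
  # `countries` holds a single value
  targets::tar_target(
    c_val,
    dplyr::filter(my_df, country == countries),
    pattern = map(countries),
    iteration = "list"
  ),
  # Downstream, `c_val` is the aggregate of all branches
  targets::tar_target(
    country_list,
    setNames(c_val, paste0(countries, "_val"))
  )
)

pipeline
```

Here no tar_combine() is needed, because a downstream target that refers to c_val without a pattern receives all branches at once.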
2. Supplying package info to the target to ensure proper execution.
Targets can be created using custom functions. We need to make sure that those functions use building blocks from the proper packages. Let’s take a look.
Say we want to subset a time series (ts) object. For that we can use the subset() method for ts objects that the forecast package provides. However, the subset() generic itself comes from base R rather than from the forecast namespace, so we cannot call it with the standard “::” reference.
But we can still use subset() on a ts object once forecast is loaded:
```r
library(forecast)  # makes the subset() method for ts objects available

subset_func <- function(series, start_series, end_series){
  return(subset(series, start = start_series, end = end_series))
}

ts <- timetk::tk_ts(
  runif(12 * 5),
  start = c(2018, 1),
  end = c(2022, 12),
  frequency = 12
)

subset_func(ts, 20, 30)
```
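If you want to check where the pieces come from, the following sketch may help. It assumes forecast is installed; generic_home and ts_method are names I made up for the illustration.

```r
# The subset() generic is defined in base R, not in forecast
generic_home <- environmentName(environment(subset))
print(generic_home)

# Loading the forecast namespace registers its S3 method for ts objects
loadNamespace("forecast")
ts_method <- getS3method("subset", "ts", optional = TRUE)
print(is.function(ts_method))
```

So the call always goes through the base generic, and the ts-specific behaviour is only available in sessions where forecast has been loaded.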
Let’s see how it works with targets.
```r
# _targets.R
subset_func <- function(series, start_series, end_series){
  return(subset(series, start = start_series, end = end_series))
}

list(
  targets::tar_target(ts, timetk::tk_ts(
    runif(12 * 5),
    start = c(2018, 1),
    end = c(2022, 12),
    frequency = 12
  )),
  targets::tar_target(sub_ts, subset_func(ts, 20, 30))
)
```
When run, this code produces an error.
It tries to use the regular subset function from base R. To make it work, we need to provide the package info for targets.
```r
# _targets.R
subset_func <- function(series, start_series, end_series){
  return(subset(series, start = start_series, end = end_series))
}

list(
  targets::tar_target(ts, timetk::tk_ts(
    runif(12 * 5),
    start = c(2018, 1),
    end = c(2022, 12),
    frequency = 12
  )),
  # packages = "forecast" loads forecast before the target is built
  targets::tar_target(sub_ts, subset_func(ts, 20, 30), packages = "forecast")
)
```
Now we obtain the proper result.
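If many targets need the same package, repeating the packages argument gets tedious. One possible alternative, which I have not used in the example above, is to declare the package once with tar_option_set(); targets defined afterwards pick it up as their default. The pipeline variable is only a convenience name in this sketch.

```r
# _targets.R (sketch: declare the package once for all targets)
targets::tar_option_set(packages = "forecast")

subset_func <- function(series, start_series, end_series){
  return(subset(series, start = start_series, end = end_series))
}

pipeline <- list(
  targets::tar_target(ts, timetk::tk_ts(
    runif(12 * 5),
    start = c(2018, 1),
    end = c(2022, 12),
    frequency = 12
  )),
  # No packages argument here: the global option applies
  targets::tar_target(sub_ts, subset_func(ts, 20, 30))
)

pipeline
```

Note that this replaces the default package list for the whole pipeline, so any other packages your targets rely on being attached would have to be listed there as well.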