
Organizing file storage

Many pipeline components can copy files directly into a storage bucket or data lake. When multiple pipelines write to the same storage, it's essential to adopt a rigorous folder structure and naming convention so that the storage stays manageable. This article explains how to use variables to name files and folder paths consistently, which helps enforce storage best practices.

Folder and file naming with variables

When writing data to a cloud storage location such as Amazon S3, Azure Blob Storage, or Google Cloud Storage, you specify the full folder path to the storage location in the appropriate connector property. Instead of hard-coding this path into the property, you can use variables that are resolved when the pipeline runs, so the folder structure is built dynamically as new files are loaded into it.

For example, you might want to organize your data by date. You could specify the following folder location:

S3://Enterprise Data/${year}/${month}/${day}/Europe

Here, Enterprise Data and Europe are hard-coded into the S3 path, while ${year}, ${month}, and ${day} are variables that are assigned values at pipeline runtime. If the pipeline runs and loads data on December 25th 2023, and the variables are set to 2023, December, and 25, the file is stored in S3://Enterprise Data/2023/December/25/Europe.
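To make the substitution step concrete, here is a minimal JavaScript sketch of how ${name} placeholders in a path could be replaced with variable values. The resolvePath function and the variable values are hypothetical illustrations; the pipeline's own substitution logic is not described in this article and may differ.

```javascript
// Hypothetical helper: replaces each ${name} placeholder in a path template
// with the value of the matching variable. It throws if a variable is missing,
// mirroring the rule that variables must be defined before the pipeline runs.
function resolvePath(template, variables) {
  return template.replace(/\$\{(\w+)\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(variables, name)) {
      throw new Error(`Variable "${name}" is not defined`);
    }
    return String(variables[name]);
  });
}

// Double quotes keep ${...} literal here, so JavaScript does not interpolate it.
const path = resolvePath("S3://Enterprise Data/${year}/${month}/${day}/Europe", {
  year: "2023",
  month: "December",
  day: "25",
});

console.log(path); // S3://Enterprise Data/2023/December/25/Europe
```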

The variables used in the file path can be either pipeline variables or project variables, and must be defined before the pipeline runs. Values can be assigned to the variables at runtime in several ways, for example with a Python Script component.

You can also use simple JavaScript expressions in place of variables. Enclose each expression in the same ${ } delimiters that surround a variable name, as shown in the following examples. The expressions are evaluated at pipeline runtime to generate a folder name. Because a folder name must be a string, use the JavaScript .toString() method to convert a number or a date to a string.
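As a quick check of the type conversion described above, the following sketch (runnable in Node.js or any other JavaScript environment) uses a fixed date in place of new Date() so that the output is repeatable. It shows that getFullYear() returns a number and that .toString() turns it into a string.

```javascript
// A fixed date (25 December 2023, local time) so the output is repeatable.
// The Date constructor takes a zero-based month, so 11 means December.
const d = new Date(2023, 11, 25);

const year = d.getFullYear();      // a number
const yearText = year.toString();  // the same value as a string

console.log(typeof year);     // number
console.log(typeof yearText); // string
console.log(yearText);        // 2023
```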

Examples of use

The following examples are not exhaustive, but they illustrate several JavaScript expressions that are commonly useful for constructing folder names.

Use the current year as the folder name:

${(new Date().getFullYear()).toString()}

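Use the current month number (1 for January through 12 for December) as the folder name. JavaScript's getMonth() method is zero-based, so add 1 to the result:

${(new Date().getMonth()+1).toString()}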
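Because getMonth() counts from 0, the following sketch compares the raw value with the one-based month number for a fixed date. It also shows one way to obtain a month name such as December, as in the earlier S3 path example; that part assumes the JavaScript runtime includes English locale data, which is typical but not guaranteed for the pipeline's expression evaluator.

```javascript
// 25 December 2023 in local time (the constructor's month argument is also zero-based).
const christmas = new Date(2023, 11, 25);

const zeroBased = christmas.getMonth();                    // 11 for December
const monthNumber = (christmas.getMonth() + 1).toString(); // "12"

// Month name via locale formatting; depends on the runtime's locale support.
const monthName = christmas.toLocaleString("en-US", { month: "long" });

console.log(zeroBased, monthNumber, monthName); // 11 12 December
```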

Use the full date and time stamp as the folder name:

${(new Date()).toString()}

This creates a folder name in the following format: Wed Dec 06 2023 15:06:10 GMT+0000 (GMT).
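The time zone in that output comes from the environment that evaluates the expression, because toString() renders local time. The sketch below, using a fixed UTC instant, contrasts toString() with toISOString(), which always produces a UTC timestamp in a fixed format. Replacing the colons is an optional, illustrative precaution, not a documented requirement.

```javascript
// 6 December 2023, 15:06:10 UTC, fixed so the example is repeatable.
const stamp = new Date(Date.UTC(2023, 11, 6, 15, 6, 10));

// toString() uses the runtime's local time zone, so this text varies between environments.
console.log(stamp.toString());

// toISOString() always renders in UTC with a fixed layout.
const iso = stamp.toISOString();
console.log(iso); // 2023-12-06T15:06:10.000Z

// Optional: swap ":" for "-" in case files are later copied to a file system
// (such as Windows) that does not allow colons in names.
const folderName = iso.replace(/:/g, "-");
console.log(folderName); // 2023-12-06T15-06-10.000Z
```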

Use the day of the month as the folder name:

${(new Date().getDate()).toString()}

Use the time of day as the folder name, with the minutes padded to two digits (for example, 15:06 rather than 15:6):

${(new Date().getHours()).toString() + ":" + ("0" + new Date().getMinutes()).slice(-2)}
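The expression above calls new Date() twice, once for the hours and once for the minutes. The sketch below instead reads both values from a single Date object, so they always come from the same instant, and pads them with a hypothetical pad2 helper. A fixed time of 15:06 stands in for the current time so the output is repeatable.

```javascript
// Hypothetical helper: left-pads a number to two digits (6 -> "06", 15 -> "15").
function pad2(n) {
  return ("0" + n).slice(-2);
}

// Fixed time of 15:06 local time; in a pipeline this would be new Date().
const t = new Date(2023, 11, 6, 15, 6);

const unpadded = t.getHours().toString() + ":" + t.getMinutes().toString();
const padded = pad2(t.getHours()) + ":" + pad2(t.getMinutes());

console.log(unpadded); // 15:6
console.log(padded);   // 15:06
```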

Create a unique identifier (UUID) to use as the folder name:

${'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, c => { const r = Math.random() * 16 | 0; const v = c === 'x' ? r : (r & 0x3 | 0x8); return v.toString(16); })}

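This produces a random version 4 UUID with a format like this: 7e35c9eb-c31e-4f03-9d60-d33b2949e25f. Because it is generated with Math.random(), uniqueness is not strictly guaranteed, but a collision between two such folder names is extremely unlikely.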
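To check the shape of the generated identifiers, the following sketch wraps the same generator in a function and tests its output against the version 4 UUID layout. The uuidV4 function and the UUID_V4 pattern are illustrative names, not part of any pipeline API.

```javascript
// The same template-based generator as the expression above, wrapped in a function.
function uuidV4() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0;        // random integer from 0 to 15
    const v = c === "x" ? r : (r & 0x3) | 0x8; // "y" becomes 8, 9, a, or b (variant bits)
    return v.toString(16);                     // one lowercase hex digit
  });
}

// Version 4 layout: 8-4-4-4-12 hex digits, "4" as the version digit,
// and 8, 9, a, or b as the first digit of the fourth group.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

const id = uuidV4();
console.log(id, UUID_V4.test(id));
```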

Arithmetic expressions can be combined with any of these functions. For example, to set the folder name to the year ten years ago (2014 if the current year is 2024), use:

${(new Date().getFullYear()-10).toString()}
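Subtraction works directly for years, but subtracting from a month number gives month 0 in January. A minimal sketch, assuming you want a "previous month" folder such as 2023/12: passing month - 1 to the Date constructor lets JavaScript roll the year back automatically. The previousMonthFolder helper is hypothetical.

```javascript
// Hypothetical helper: returns "YYYY/M" for the month before the given date.
// Day 1 is used so that dates like 31 March don't overflow into the next month.
function previousMonthFolder(date) {
  const prev = new Date(date.getFullYear(), date.getMonth() - 1, 1);
  return prev.getFullYear().toString() + "/" + (prev.getMonth() + 1).toString();
}

const jan = new Date(2024, 0, 15); // 15 January 2024

// Naive arithmetic on the one-based month number yields month 0.
const monthNumber = jan.getMonth() + 1; // 1 for January
const naive = jan.getFullYear().toString() + "/" + (monthNumber - 1).toString();

console.log(naive);                    // 2024/0
console.log(previousMonthFolder(jan)); // 2023/12
```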

Multiple expressions can be joined with "/" to create a nested folder structure of any required depth. For example, the following produces a year/month/day structure such as 2023/12/25:

${(new Date().getFullYear()).toString() + "/" + (new Date().getMonth()+1).toString() + "/" + (new Date().getDate()).toString()}
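The expression above does not zero-pad the month or day, so 5 January 2024 becomes 2024/1/5. Because object stores such as Amazon S3 list keys in lexicographic (UTF-8 binary) order, unpadded names can sort out of date order, with 2024/10 listed before 2024/2. A minimal sketch of a padded variant, using a hypothetical datePath helper:

```javascript
// Hypothetical helper: builds a "YYYY/MM/DD" folder path from a single Date,
// zero-padding month and day so folder names sort in date order.
function datePath(date) {
  const pad2 = (n) => ("0" + n).slice(-2);
  return [
    date.getFullYear().toString(),
    pad2(date.getMonth() + 1), // getMonth() is zero-based
    pad2(date.getDate()),
  ].join("/");
}

console.log(datePath(new Date(2023, 11, 25))); // 2023/12/25
console.log(datePath(new Date(2024, 0, 5)));   // 2024/01/05
```

In a pipeline, the argument would be new Date() rather than a fixed date.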