Plugins

Meltano takes a modular approach to data engineering in general and EL(T) in particular, where your project and pipelines are composed of plugins of different types, most notably extractors (Singer taps), loaders (Singer targets), and utilities (dbt for transformation, Airflow/Dagster/etc. for orchestration, and much more on MeltanoHub).

Meltano provides the glue to make these components work together smoothly and enables consistent configuration and deployment.

To learn how to manage your project's plugins, refer to the Plugin Management guide.

Project Plugins

In order to use a given package as a plugin in a project, assuming it meets the requirements of the plugin type in question, Meltano needs to know:

  1. where to find the package, typically a pip package identified by its name on PyPI, public or private Git repository URL, or local directory path,
  2. what settings it supports,
  3. what capabilities it supports, and finally
  4. what its configuration should be when invoked.

Together, a package's location (1) and the metadata (2, 3) describing it in terms Meltano can understand make up the base plugin description. In your project, plugins extend this description with a specific configuration (4) and a unique name.

This means that different configurations of the same package (base plugin) would be represented in your project as separate plugins with their own unique names, that can be thought of as differently initialized instances of the same class. For example: extractors tap-postgres--billing and tap-postgres--events derived from base extractor tap-postgres, or tap-google-analytics--client-foo and tap-google-analytics--client-bar derived from base extractor tap-google-analytics.
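The class/instance analogy above can be made concrete with a small Python sketch. It only illustrates the mental model and is not Meltano's internal data model: the `BasePlugin` and `ProjectPlugin` classes, the `pip_url` value, and the `dbname` setting are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BasePlugin:
    """Hypothetical stand-in for a base plugin description (location + metadata)."""

    name: str
    pip_url: str
    settings: tuple = ()


@dataclass
class ProjectPlugin:
    """Hypothetical stand-in for a project plugin: base plugin + unique name + configuration."""

    name: str
    base: BasePlugin
    config: dict = field(default_factory=dict)


# Placeholder values; the real pip_url and settings depend on the chosen variant.
tap_postgres = BasePlugin(name="tap-postgres", pip_url="tap-postgres", settings=("dbname",))

# Two project plugins that share one base plugin but differ in name and configuration.
billing = ProjectPlugin("tap-postgres--billing", tap_postgres, {"dbname": "billing"})
events = ProjectPlugin("tap-postgres--events", tap_postgres, {"dbname": "events"})
```

Both project plugins point at the same base description, so only the name and configuration differ between them.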

Each plugin in a project can either:

  • inherit its base plugin description from a discoverable plugin that's supported out of the box,
  • define its base plugin description explicitly, making it a custom plugin, or
  • inherit both base plugin description and configuration from another plugin in the project.

To learn how to add a plugin to your project, refer to the Plugin Management guide.

Discoverable plugins

Base plugin descriptions for many popular extractors (Singer taps), loaders (Singer targets), and other plugins have already been collected by users and contributed to Meltano Hub, making them supported out of the box.

To find discoverable plugins, refer to the lists of Extractors, Loaders, etc., on Meltano Hub.

To learn how to add a discoverable plugin to your project using a shadowing plugin definition or inheriting plugin definition, refer to the Plugin Management guide.

Variants

In the case of various popular data sources and destinations, multiple alternative implementations of Singer taps (extractors) and targets (loaders) exist, some of which are forks of an original (canonical) version that evolved in their own direction, while others were developed independently from the start.

These different implementations and their repositories typically use the same name (tap-<source> or target-<destination>) and may on the surface appear interchangeable, but often vary significantly in terms of exact behavior, quality, and supported settings.

In its index of discoverable plugins, Meltano considers these different implementations different variants of the same plugin, that share a plugin name and other source/destination-specific details (like a logo and description), but have their own implementation-specific variant name and metadata (like capabilities and settings).

Every discoverable plugin has a default variant that is known to work well and recommended for new users, which will be added to your project unless you explicitly select a different one. Users who already have experience with a different variant (or have specific reasons to prefer it) can explicitly choose to add it to their project instead of the default, so that they get the same behavior and can use the same settings as before. If the variant in question is not discoverable yet, it can be added as a custom plugin.

To learn how to add a non-default variant of a discoverable plugin to your project, refer to the Plugin Management guide.

Custom plugins

If you'd like to use a package in your project whose base plugin description isn't discoverable yet, you'll need to collect and provide this metadata yourself.

To learn how to add a custom plugin to your project using a custom plugin definition, refer to the Plugin Management guide.

Once you've got the plugin working in your project, please consider contributing its description to Meltano Hub to make it discoverable and supported out of the box for new users!

Plugin Inheritance

If you'd like to use the same package (base plugin) in your project multiple times with different configurations, you can add a new plugin that inherits from an existing one.

The new plugin will inherit its parent's base plugin description and configuration as if they were defaults, which can then be overridden as appropriate.
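As a rough sketch of "parent values as defaults, child values as overrides", the following Python snippet merges two dictionaries. It assumes a flat configuration and is not how Meltano resolves settings internally; the `effective_config` helper and the setting names are hypothetical.

```python
def effective_config(parent_config: dict, child_overrides: dict) -> dict:
    """Return the child's configuration: parent values act as defaults, child values win."""
    return {**parent_config, **child_overrides}


# Hypothetical settings for a parent plugin and an inheriting plugin.
parent = {"host": "db.example.com", "port": 5432, "dbname": "postgres"}
child = effective_config(parent, {"dbname": "billing"})
```

The inheriting plugin keeps `host` and `port` from its parent and only replaces `dbname`; the parent's own configuration is left untouched.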

For performance reasons, inherited plugins with a pip_url identical to their parent's share the parent's underlying Python virtualenv. If you would prefer to create a separate virtualenv for an inherited plugin, change its pip_url so that it differs from its parent's.

To learn how to add an inheriting plugin to your project using an inheriting plugin definition, refer to the Plugin Management guide.

Lock artifacts

When you add a plugin to your project using meltano add, the discoverable plugin definition of the plugin will be downloaded and added to your project under plugins/<plugin_type>/<plugin_name>--<variant_name>.lock. This will ensure that the plugin's definition will be stable and version-controlled.

Later invocations of the plugin will use this file to determine the settings, installation source, etc.

Note that custom and inherited plugins do not get a lock file.
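The lock file location follows directly from the path pattern given above. A minimal Python sketch, assuming the pattern is applied literally; the `lock_file_path` helper and the example values (`extractors`, `tap-gitlab`, `meltanolabs`) are illustrative assumptions, not output from Meltano:

```python
from pathlib import PurePosixPath


def lock_file_path(plugin_type: str, plugin_name: str, variant_name: str) -> PurePosixPath:
    """Build the lock file path, relative to the project root, from the documented pattern."""
    return PurePosixPath("plugins") / plugin_type / f"{plugin_name}--{variant_name}.lock"


# Example values are assumptions chosen for illustration only.
path = lock_file_path("extractors", "tap-gitlab", "meltanolabs")
```

Because the variant name is part of the file name, two variants of the same plugin would not overwrite each other's lock files.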

Types

Meltano supports the following types of plugins:

  • Extractors pull data out of arbitrary data sources.
  • Mappers perform stream map transforms on data between extractors and loaders.
  • Loaders load extracted data into arbitrary data destinations.
  • Utilities perform arbitrary tasks provided by pip packages with executables. All plugins previously referred to as transformers and orchestrators are being transitioned to utilities.
  • File bundles bundle files you may want in your project.

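These plugin types are still supported but are transitioning to being referred to as Utilities:

  • Orchestrators orchestrate a project's scheduled pipelines.
  • Transformers run transforms as part of data transformation.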

These plugin types are deprecated:

  • Transforms transform data that has been loaded into a database (data warehouse).

Extractors

Extractors are pip packages used by meltano run or meltano invoke as part of data integration. They are responsible for pulling data out of arbitrary data sources: databases, SaaS APIs, or file formats.

Meltano supports Singer taps: executables that implement the Singer specification.

To learn which extractors are discoverable and supported out of the box, refer to the Extractors page.

Extras

Extractors support the following extras:

catalog extra

An extractor's catalog extra holds a path to a catalog file (relative to the project directory) to be provided to the extractor when it is run in sync mode using meltano elt or meltano invoke.

If a catalog path is not set, the catalog will be generated on the fly by running the extractor in discovery mode and applying the schema, selection, and metadata rules to the discovered file.

Selection filter rules are always applied to manually provided catalogs as well as discovered ones.

While this extra can be managed using meltano config or environment variables like any other setting, a catalog file is typically provided using meltano elt's --catalog option.

If the catalog does not seem to take effect, you may need to validate the capabilities of the tap.

How to use

Manage this extra:

extractors:
- name: tap-gitlab
  catalog: extract/tap-gitlab.catalog.json

load_schema extra

  • Setting: _load_schema
  • Environment variable: <EXTRACTOR>__LOAD_SCHEMA, e.g. TAP_GITLAB__LOAD_SCHEMA
  • Default: $MELTANO_EXTRACTOR_NAMESPACE, which will expand to the extractor's namespace, e.g. tap_gitlab for tap-gitlab

An extractor's load_schema extra holds the name of the database schema extracted data should be loaded into, when this extractor is used in a pipeline with a loader for a database that supports schemas, like PostgreSQL or Snowflake.

The value of this extra can be referenced from a loader's configuration using the MELTANO_EXTRACT__LOAD_SCHEMA pipeline environment variable. It is used as the default value for the target-postgres and target-snowflake schema settings.
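The environment variable names listed for extras on this page all follow the same shape: the plugin name in upper case with dashes replaced by underscores, two underscores, then the extra name in upper case. The Python sketch below encodes that pattern as inferred from the examples here; `extra_env_var` is a hypothetical helper, not part of Meltano.

```python
def extra_env_var(plugin_name: str, extra_name: str) -> str:
    """Derive the environment variable name for a plugin extra (pattern inferred from examples)."""
    prefix = plugin_name.replace("-", "_").upper()
    return f"{prefix}__{extra_name.upper()}"


load_schema_var = extra_env_var("tap-gitlab", "load_schema")
dialect_var = extra_env_var("target-postgres", "dialect")
```

With the inputs above, the helper reproduces the documented examples TAP_GITLAB__LOAD_SCHEMA and TARGET_POSTGRES__DIALECT.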

How to use

Manage this extra:

extractors:
- name: tap-gitlab
  load_schema: gitlab_data

metadata extra

  • Setting: _metadata, alias: metadata
  • Environment variable: <EXTRACTOR>__METADATA, e.g. TAP_GITLAB__METADATA
  • Default: {} (an empty object)

An extractor's metadata extra holds an object describing Singer stream and property metadata rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano run, meltano invoke, or meltano elt. These rules are not applied when a catalog is provided manually.

Stream (entity) metadata <key>: <value> pairs (e.g. {"replication-method": "INCREMENTAL"}) are nested under top-level entity identifiers that correspond to Singer stream tap_stream_id values. These nested properties can also be thought of and interacted with as settings named _metadata.<entity>.<key>.

Property (attribute) metadata <key>: <value> pairs (e.g. {"is-replication-key": true}) are nested under top-level entity identifiers and second-level attribute identifiers that correspond to Singer stream property names. These nested properties can also be thought of and interacted with as settings named _metadata.<entity>.<attribute>.<key>.

Unix shell-style wildcards can be used in entity and attribute identifiers to match multiple entities and/or attributes at once.

Entity and attribute names can be discovered using meltano select --list --all <plugin>.
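To show how wildcard identifiers could pick out stream-level and property-level metadata, here is a minimal Python sketch built on the standard library's `fnmatch` module. It is an approximation under stated assumptions, not Meltano's implementation: the helper names are hypothetical, and the choice that later rules override earlier ones is an assumption of this sketch.

```python
from fnmatch import fnmatchcase


def stream_metadata(rules: dict, stream_id: str) -> dict:
    """Collect stream-level metadata from every rule whose entity pattern matches."""
    result = {}
    for entity_pattern, entries in rules.items():
        if not fnmatchcase(stream_id, entity_pattern):
            continue
        for key, value in entries.items():
            # Nested dicts are attribute-level rules; they are handled separately.
            if not isinstance(value, dict):
                result[key] = value  # Assumption: later rules override earlier ones.
    return result


def property_metadata(rules: dict, stream_id: str, prop: str) -> dict:
    """Collect property-level metadata for one attribute of one stream."""
    result = {}
    for entity_pattern, entries in rules.items():
        if not fnmatchcase(stream_id, entity_pattern):
            continue
        for attribute_pattern, value in entries.items():
            if isinstance(value, dict) and fnmatchcase(prop, attribute_pattern):
                result.update(value)
    return result


rules = {
    "*": {"replication-method": "FULL_TABLE"},
    "some_stream_id": {
        "replication-method": "INCREMENTAL",
        "replication-key": "created_at",
        "created_at": {"is-replication-key": True},
    },
}
```

In this sketch, `some_stream_id` ends up with INCREMENTAL replication, while any other stream only receives the wildcard rule.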

How to use

Manage this extra:

extractors:
- name: tap-postgres
  metadata:
    some_stream_id:
      replication-method: INCREMENTAL
      replication-key: created_at
      created_at:
        is-replication-key: true

schema extra

  • Setting: _schema
  • Environment variable: <EXTRACTOR>__SCHEMA, e.g. TAP_GITLAB__SCHEMA
  • Default: {} (an empty object)

An extractor's schema extra holds an object describing Singer stream schema override rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano elt or meltano invoke. These rules are not applied when a catalog is provided manually.

JSON Schema descriptions for specific properties (attributes) (e.g. {"type": ["string", "null"], "format": "date-time"}) are nested under top-level entity identifiers that correspond to Singer stream tap_stream_id values, and second-level attribute identifiers that correspond to Singer stream property names. These nested properties can also be thought of and interacted with as settings named _schema.<entity>.<attribute> and _schema.<entity>.<attribute>.<key>.

Unix shell-style wildcards can be used in entity and attribute identifiers to match multiple entities and/or attributes at once.

Entity and attribute names can be discovered using meltano select --list --all <plugin>.

If a schema is specified for a property that does not yet exist in the discovered stream's schema, the property (and its schema) will be added to the catalog. This allows you to define a full schema for taps such as tap-dynamodb that do not themselves have the ability to discover the schema of their streams.
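The "override or add" behavior described above can be pictured with a small Python sketch that operates on a stream's JSON Schema as a plain dictionary. It assumes the override replaces the whole property description; whether Meltano replaces or merges individual keys is not something this sketch claims. The `apply_schema_rule` helper and the sample schema are hypothetical.

```python
import copy


def apply_schema_rule(stream_schema: dict, attribute: str, override: dict) -> dict:
    """Return a copy of a stream's JSON Schema with one property overridden or added."""
    updated = copy.deepcopy(stream_schema)
    # setdefault covers streams whose discovered schema has no properties at all.
    updated.setdefault("properties", {})[attribute] = override
    return updated


# Hypothetical discovered schema that lacks the created_at property.
discovered = {"type": "object", "properties": {"id": {"type": "integer"}}}
patched = apply_schema_rule(
    discovered, "created_at", {"type": ["string", "null"], "format": "date-time"}
)
```

The discovered schema is left unchanged, and the patched copy gains the `created_at` property alongside the existing `id`.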

How to use

Manage this extra:

extractors:
- name: tap-postgres
  schema:
    some_stream_id:
      created_at:
        type: ["string", "null"]
        format: date-time

select extra

  • Setting: _select
  • Environment variable: <EXTRACTOR>__SELECT, e.g. TAP_GITLAB__SELECT
  • Default: ["*.*"]

An extractor's select extra holds an array of entity selection rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano run, meltano invoke, or meltano elt. These rules are not applied when a catalog is provided manually.

A selection rule consists of an entity identifier that corresponds to a Singer stream's tap_stream_id value, and an attribute identifier that corresponds to a Singer stream property name, separated by a period (.). Rules indicating that an entity or attribute should be excluded are prefixed with an exclamation mark (!). Unix shell-style wildcards can be used in entity and attribute identifiers to match multiple entities and/or attributes at once.

Entity and attribute names can be discovered using meltano select --list --all <plugin>.

While this extra can be managed using meltano config or environment variables like any other setting, selection rules are typically specified using meltano select.
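The rule syntax (entity, period, attribute, optional `!` prefix, shell-style wildcards) can be approximated in Python with `fnmatch`. This is a simplified sketch rather than Meltano's selection logic: `is_selected` is a hypothetical helper, it assumes an exclusion always wins over any inclusion, and it does not handle backslash-escaped periods in stream names.

```python
from fnmatch import fnmatchcase


def is_selected(rules: list, entity: str, attribute: str) -> bool:
    """Return True if entity.attribute matches an inclusion rule and no exclusion rule."""
    included = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        # Split on the first period: entity pattern on the left, attribute pattern on the right.
        entity_pattern, _, attribute_pattern = pattern.partition(".")
        matches = fnmatchcase(entity, entity_pattern) and fnmatchcase(
            attribute, attribute_pattern or "*"
        )
        if matches and negated:
            return False  # Assumption: an exclusion always wins over inclusions.
        included = included or matches
    return included


rules = ["commits.*", "!commits.author_email", "project_members.*"]
```

With these rules, `commits.id` is selected, `commits.author_email` is excluded, and attributes of a stream that matches no rule (for example a hypothetical `issues` stream) are not selected.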

How to use

Manage this extra:

extractors:
- name: tap-gitlab
  select:
  - project_members.*
  - commits.*

If the extractor uses a dot character within stream names, you can escape it with a backslash (\). For example, if the extractor has a stream named animals.data, you can select fields using the following configuration:

extractors:
- name: tap-smoke-test
  select:
  - "animals\\.data.id" # Use double backslash to escape the dot in quoted strings
  - animals\.data.verified # Use single backslash to escape the dot in unquoted strings

select_filter extra

  • Setting: _select_filter
  • Environment variable: <EXTRACTOR>__SELECT_FILTER, e.g. TAP_GITLAB__SELECT_FILTER
  • meltano elt CLI options: --select and --exclude
  • Default: []

An extractor's select_filter extra holds an array of entity selection filter rules that are applied to the extractor's discovered or provided catalog file when the extractor is run using meltano run, meltano invoke, or meltano elt, after schema, selection, and metadata rules are applied.

It can be used to only extract records for specific matching entities, or to extract records for all entities except for those specified, by letting you apply filters on top of configured entity selection rules.

Selection filter rules use entity identifiers that correspond to Singer stream tap_stream_id values. Rules indicating that an entity should be excluded are prefixed with an exclamation mark (!). Unix shell-style wildcards can be used in entity identifiers to match multiple entities at once.
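A filter of this kind can be pictured as a second pass over the entities that the selection rules already chose. The Python sketch below is an illustration under assumptions, not Meltano's code: `filter_entities` is a hypothetical helper, and it assumes that an empty filter list keeps every entity.

```python
from fnmatch import fnmatchcase


def filter_entities(selected_entities: list, filter_rules: list) -> list:
    """Apply select_filter-style rules on top of an already selected list of entities."""
    include = [rule for rule in filter_rules if not rule.startswith("!")]
    exclude = [rule[1:] for rule in filter_rules if rule.startswith("!")]
    kept = []
    for entity in selected_entities:
        if include and not any(fnmatchcase(entity, pattern) for pattern in include):
            continue  # Inclusion filters are present, but none matches this entity.
        if any(fnmatchcase(entity, pattern) for pattern in exclude):
            continue  # The entity is explicitly excluded.
        kept.append(entity)
    return kept


selected = ["project_members", "commits"]
only_commits = filter_entities(selected, ["commits"])
without_commits = filter_entities(selected, ["!commits"])
```

The first call keeps only `commits`; the second keeps everything except `commits`.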

Entity names can be discovered using meltano select --list --all <plugin>.

While this extra can be managed using meltano config or environment variables like any other setting, selection filters are typically specified using meltano elt's --select and --exclude options.

How to use

Manage this extra:

extractors:
- name: tap-gitlab
  select:
  - project_members.*
  - commits.*
  select_filter:
  - commits

state extra

An extractor's state extra holds a path to a state file (relative to the project directory) to be provided to the extractor when it is run as part of a pipeline using meltano elt.

If a state path is not set, the state will be looked up automatically based on the ELT run's State ID.

While this extra can be managed using meltano config or environment variables like any other setting, a state file is typically provided using meltano elt's --state option.

How to use

Manage this extra:

extractors:
- name: tap-gitlab
  state: extract/tap-gitlab.state.json

use_cached_catalog extra

  • Setting: _use_cached_catalog
  • Environment variable: <EXTRACTOR>__USE_CACHED_CATALOG, e.g. TAP_GITLAB__USE_CACHED_CATALOG
  • Default: True

An extractor's use_cached_catalog extra is a boolean flag that, when set to False, disables the use of a cached catalog file during the extractor's discovery process. By default, Meltano will cache the catalog file generated by an extractor to speed up subsequent runs. However, if the extractor's schema has changed in a way that would affect discovery output, you may want to bypass the cache to ensure the latest catalog is used.

Setting this extra to False forces the extractor to perform discovery and generate a new catalog file every time it runs, which can be useful during development or when an extractor supports dynamic catalog discovery, such as in tap-salesforce.

How to use

Manage this extra:

extractors:
- name: tap-gitlab
  use_cached_catalog: false

Loaders

Loaders are pip packages used by meltano elt as part of data integration. They are responsible for loading extracted data into arbitrary data destinations: databases, SaaS APIs, or file formats.

Meltano supports Singer targets: executables that implement the Singer specification.

To learn which loaders are discoverable and supported out of the box, refer to the Loaders page.

Extras

Loaders support the following extras:

dialect extra

  • Setting: _dialect
  • Environment variable: <LOADER>__DIALECT, e.g. TARGET_POSTGRES__DIALECT
  • Default: $MELTANO_LOADER_NAMESPACE, which will expand to the loader's namespace. Note that this default has been overridden on discoverable loaders, e.g. postgres for target-postgres and snowflake for target-snowflake.

A loader's dialect extra holds the name of the dialect of the target database, so that transformers in the same pipeline can determine the type of database to connect to.

The value of this extra can be referenced from a transformer's configuration using the MELTANO_LOAD__DIALECT pipeline environment variable. It is used as the default value for dbt's target setting, and should therefore correspond to a target name in transform/profile/profiles.yml.

How to use

Manage this extra:

loaders:
- name: target-example-db
  dialect: example-db

Transforms

Transform plugins are being deprecated in favor of calling dbt packages directly.

The transform plugin type is still supported for now but will eventually be phased out.

Transforms are dbt packages containing dbt models, that are used by meltano elt as part of data transformation.

Together with the dbt transformer, they are responsible for transforming data that has been loaded into a database (data warehouse) into a different format, usually one more appropriate for analysis.

When a transform is added to your project using meltano add, the dbt package Git repository referenced by its pip_url will be added to your project's transform/packages.yml and the package will be enabled in transform/dbt_project.yml.

Extras

Transforms support the following extras:

package_name extra

  • Setting: _package_name
  • Environment variable: <TRANSFORM>__PACKAGE_NAME, e.g. TAP_GITLAB__PACKAGE_NAME
  • Default: $MELTANO_TRANSFORM_NAMESPACE, which will expand to the transform's namespace, e.g. tap_gitlab for tap-gitlab

A transform's package_name extra holds the name of the dbt package's internal dbt project: the value of name in dbt_project.yml.

When a transform is added to your project using meltano add, this name will be added to the models dictionary in transform/dbt_project.yml.

The value of this extra can be referenced from a transformer's configuration using the MELTANO_TRANSFORM__PACKAGE_NAME pipeline environment variable. It is included in the default value for dbt's models setting: $MELTANO_TRANSFORM__PACKAGE_NAME $MELTANO_EXTRACTOR_NAMESPACE my_meltano_model.

How to use

Manage this extra:

transforms:
- name: dbt-facebook-ads
  namespace: tap_facebook
  package_name: facebook_ads

vars extra

  • Setting: _vars
  • Environment variable: <TRANSFORM>__VARS, e.g. TAP_GITLAB__VARS
  • Default: {} (an empty object)

A transform's vars extra holds an object representing dbt model variables that can be referenced from a model using the var function.

When a transform is added to your project using meltano add, this object will be used as the dbt model's vars object in transform/dbt_project.yml.

Because these variables are handled by dbt rather than Meltano, environment variables can be referenced using the env_var function instead of $VAR or ${VAR}.

How to use

Manage this extra:

transforms:
- name: tap-gitlab
  vars:
    schema: '{{ env_var(''DBT_SOURCE_SCHEMA'') }}'

Orchestrators

Orchestrator plugins are transitioning over to being called Utilities. The new approach is to group all non-EL plugins under the utility plugin type.

The orchestrator plugin type is still supported for now but will eventually be phased out as utilities take over.

Orchestrators are pip packages responsible for orchestrating a project's scheduled pipelines.

Meltano supports Apache Airflow out of the box, but can be used with any tool capable of reading the output of meltano schedule list --format=json and executing each pipeline's meltano run command on a schedule.

When the airflow utility is added to your project using meltano add, its related file bundle will automatically be added as well.

Transformers

Transformer plugins are transitioning over to being called Utilities. The new approach is to group all non-EL plugins under the utility plugin type.

The transformer plugin type is still supported for now but will eventually be phased out as utilities take over.

Transformers are pip packages used by meltano run as part of data transformation. They are responsible for running transforms.

Meltano supports dbt and its dbt models out of the box.

When the dbt transformer is added to your project using meltano add, its related file bundle will automatically be added as well.

File bundles

File bundles are pip packages bundling files you may want in your project.

When a file bundle is added to your project using meltano add, the bundled files will automatically be added as well. The file bundle itself will not be added to your meltano.yml project file unless it contains files that are managed by the file bundle and to be updated automatically when meltano upgrade is run.

update extra

  • Setting: _update
  • Environment variable: <BUNDLE>__UPDATE, e.g. DBT__UPDATE
  • Default: {} (an empty object)

A file bundle's update extra holds an object mapping file paths (of files inside the bundle, relative to the project root) to booleans. Glob patterns are supported to allow matching of multiple files with a single path.

When a file path's value is True, the matching files are considered to be managed by the file bundle and updated automatically when meltano upgrade is run.
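To illustrate how a mapping of glob patterns to booleans could decide which bundled files are managed, here is a minimal Python sketch using `fnmatch`. It is an approximation, not Meltano's implementation: `managed_files` is a hypothetical helper, the file list is made up, and the rule that the last matching pattern wins is an assumption of this sketch.

```python
from fnmatch import fnmatchcase


def managed_files(bundle_files: list, update: dict) -> list:
    """Return the bundled files that an `update` mapping marks as managed."""
    managed = []
    for path in bundle_files:
        decision = False  # Files matched by no pattern are treated as unmanaged.
        for pattern, value in update.items():
            if fnmatchcase(path, pattern):
                decision = value  # Assumption: the last matching pattern wins.
        if decision:
            managed.append(path)
    return managed


update = {"transform/dbt_project.yml": False, "profiles/*.yml": True}
files = ["transform/dbt_project.yml", "profiles/profiles.yml", "README.md"]
result = managed_files(files, update)
```

Only `profiles/profiles.yml` is reported as managed here, since `transform/dbt_project.yml` is explicitly set to false and `README.md` matches no pattern.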

How to use

Manage this extra:

files:
- name: dbt
  update:
    transform/dbt_project.yml: false
    profiles/*.yml: true

If a file path starts with a *, it must be wrapped in quotes to be considered valid YAML. For example, using *.yml to match all .yml files:

files:
- name: dbt
  update:
    '*.yml': true

Utilities

The utility plugin type represents all non-EL plugins. Plugins that were under the transformer (e.g. dbt) and orchestrator (e.g. Airflow, Dagster) plugin types are now included as utilities.

Any additional pip package that exposes an executable can also be added to your project as a utility. Meltano has a selection of available utilities listed on MeltanoHub, or you can easily add your own custom utility.

Meltano also has an Extension Developer Kit (EDK) that can be used to integrate existing data tools with Meltano.

Custom Utilities

Any pip package that exposes an executable can be added to your project as a custom utility.

meltano add --custom utility <plugin>

# For example:
meltano add --custom utility yoyo
(namespace): yoyo
(pip_url): yoyo-migrations
(executable): yoyo

You can then invoke the executable using meltano invoke:

meltano invoke <plugin> [<executable arguments>...]

# For example:
meltano invoke yoyo new ./migrations -m "Add column to foo"

The benefit of doing this as opposed to adding the package to requirements.txt or running pip install <package> directly is that any packages installed this way benefit from Meltano's virtual environment isolation. This avoids dependency conflicts between packages.

Mappers

Mappers allow you to transform or manipulate data after extraction and before loading. Common applications include:

  • Streams/properties can be aliased to provide custom naming downstream.
  • Stream records can be filtered based on any user-defined logic.
  • Properties can be transformed inline (i.e. converting types, sanitizing PII data).
  • Properties can be removed from the stream.
  • New properties can be added to the stream.

Note that mappers are currently only available when using meltano run.

How to use

You can install mappers like any other plugin using meltano add:

$ meltano add mapper transform-field
2024-01-01T00:25:40.604941Z [info ] Installing mapper 'transform-field'
2024-01-01T00:25:53.152127Z [info ] Installed mapper 'transform-field'

To learn more about mapper 'transform-field', visit https://github.com/transferwise/pipelinewise-transform-field

Mappers are unique in that after install you don't invoke them directly. Instead you define mappings by name and add a config object for each mapping. This config object is passed to the mapper when the mapping name is called as part of a meltano run invocation. Note that this differs from other plugins, as you're not invoking a plugin name - but referencing the mapping name instead. Additionally, the requirements for the config object itself will vary by plugin.

So given a mapper with mappings configured like so:

mappers:
- name: transform-field
  variant: transferwise
  pip_url: pipelinewise-transform-field
  executable: transform-field
  mappings:
  - name: hide-gitlab-secrets
    config:
      transformations:
      - field_id: "author_email"
        tap_stream_name: "commits"
        type: "MASK-HIDDEN"
      - field_id: "committer_email"
        tap_stream_name: "commits"
        type: "MASK-HIDDEN"
  - name: null-created-at
    config:
      transformations:
      - field_id: "created_at"
        tap_stream_name: "accounts"
        type: "SET-NULL"

You can then invoke the mappings by name:


# hide-gitlab-secrets will resolve to mapping with the same name. In this case, that mapping will perform two actions.
# Transform the "author_email" field in the "commits" stream and hide the email address.
# Transform the "committer_email" field in the "commits" stream and hide the email address.
$ meltano run tap-gitlab hide-gitlab-secrets target-jsonl

# null-created-at will resolve to mapping with the same name. In this case, that mapping will perform one action.
# Transform the "created_at" field in the "accounts" stream and set it to null.
$ meltano run tap-someapi null-created-at target-jsonl
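The way a mapping name in a run invocation leads back to a mapper plugin and a config object can be sketched in Python as a simple lookup. This is a conceptual illustration only, not Meltano's resolution code: `resolve_mapping` is a hypothetical helper, and the `mappers` structure is a trimmed-down, hand-written copy of the configuration shown above.

```python
# Trimmed-down stand-in for the `mappers` section of a project file.
mappers = [
    {
        "name": "transform-field",
        "mappings": [
            {
                "name": "hide-gitlab-secrets",
                "config": {
                    "transformations": [
                        {"field_id": "author_email", "tap_stream_name": "commits", "type": "MASK-HIDDEN"},
                        {"field_id": "committer_email", "tap_stream_name": "commits", "type": "MASK-HIDDEN"},
                    ]
                },
            },
            {
                "name": "null-created-at",
                "config": {
                    "transformations": [
                        {"field_id": "created_at", "tap_stream_name": "accounts", "type": "SET-NULL"},
                    ]
                },
            },
        ],
    }
]


def resolve_mapping(mappers: list, mapping_name: str) -> tuple:
    """Return (mapper plugin name, mapping config) for a mapping name."""
    for mapper in mappers:
        for mapping in mapper.get("mappings", []):
            if mapping["name"] == mapping_name:
                return mapper["name"], mapping["config"]
    raise KeyError(f"No mapping named {mapping_name!r}")


plugin_name, config = resolve_mapping(mappers, "hide-gitlab-secrets")
```

The lookup key is the mapping name, not the plugin name, which mirrors why `hide-gitlab-secrets` rather than `transform-field` appears on the command line.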

You can also invoke multiple mappings at once in series:

$ meltano run tap-someapi fix-null-id fix-country-code target-jsonl

Each mapping will execute in a unique process instance of the mapper plugin. That means that you can also call mappings that leverage the same plugin at multiple locations numerous times within the run invocation:


# Fix any null country codes using the transform-field mapper.
# Set the customer's region based on their country code using your own mapper.
# Mask the id if the customer is in the EU region using the transform-field mapper.
$ meltano run tap-someapi fix-null-country set-region-from-country mask-id-if-eu target-jsonl