Most of Meltano’s features are available without installing any additional packages. However, some niche or environment-specific features require the installation of Python extras. The following extras are currently supported:

- `mssql` - Support for Microsoft SQL Server
- `s3` - Support for using S3 as a state backend
- `gcs` - Support for using Google Cloud Storage as a state backend
- `azure` - Support for using Azure Blob Storage as a state backend
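An extra is enabled by including it in the installation command. A minimal sketch, assuming a standard `pip`-based installation (adjust for `pipx` or however you install Meltano):

```bash
# Install Meltano with the S3 state backend extra
pip install "meltano[s3]"

# Multiple extras can be combined in one install
pip install "meltano[s3,mssql]"
```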
Meltano extensions are lightweight executables that allow you to integrate existing data tools with Meltano. Extensions allow a developer to add features like pre/post-hooks that run before/after Meltano executes the application, as well as project scaffolding that can be customized for each plugin. This scaffolding was previously accomplished via file bundles.
The Meltano Extension Developer Kit (EDK) was created to make it easier for developers to build Meltano extensions. For more information on how to build your own EDK-based Meltano extension, see the EDK docs.
Meltano has traditionally assigned a plugin type to each plugin based on its functionality. These plugin types were, and still are, used in the Meltano codebase to activate plugin-type-specific features (e.g. piping Singer taps and targets together, running dbt deps before each run, or compiling and removing the Airflow airflow.cfg configuration file to avoid storing sensitive credentials). This approach made it challenging to get new features implemented and accepted by the entire Meltano user base, because only one implementation was allowed.
The new approach is to group all non-EL plugins under the utility plugin type, leaving only the following plugin types: extractors, loaders, and utilities.
As part of this new approach, the logic for non-EL plugin-specific features has been extracted out of the Meltano codebase and into a Meltano extension. This allows the community to iterate on plugin features more quickly without having to merge features into the Meltano core repository. It also allows the community to develop many variants of a plugin's wrapper logic, each with different features.
The transformer and orchestrator plugin types are still supported for now but will eventually be phased out as utilities take over.
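As an illustration, a non-EL tool is now declared under the `utilities` key of meltano.yml. A minimal sketch, where the plugin name, pip_url, and executable are hypothetical placeholders (see MeltanoHub for real utility definitions):

```yaml
plugins:
  utilities:
    - name: my-tool          # hypothetical utility plugin
      namespace: my_tool
      pip_url: my-tool-ext   # hypothetical EDK-based extension package
      executable: my-tool
```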
This FAQ section is for tap-airbyte-wrapper, a Singer tap that enables any Airbyte source to be used as a Meltano extractor.
The Singer specification was started in 2016 by Stitch Data. It defines a data transfer format that allows any number of data systems, called taps, to send data to any data destination, called a target. Airbyte was incorporated in 2020 and created its own specification, heavily inspired by Singer. There are differences, but the core of each specification is sending newline-delimited JSON data from the STDOUT of a tap to the STDIN of a target.
A community member used the Meltano Singer SDK to write tap-airbyte-wrapper. This wrapper connector calls the Docker image for a given Airbyte source and translates its messages into a Singer-compatible format. The tap's output conforms to the Singer standard and can then be sent to any Singer target, many of which are listed on MeltanoHub.
We first recommend reading through this FAQ to see if your question has been answered. Next, we recommend searching the Meltano docs to see if your answer is there.
If you still need help, we recommend either filing an issue in the Meltano GitHub repository or joining our Slack community to ask there.
We also have weekly Office Hours where you can talk to the Meltano team and ask any questions you have!
No. This integration makes it possible to directly run Airbyte Source Connectors within your Meltano project. There’s no need to run the Airbyte UI or API to use this feature.
Airbyte destinations are not supported with tap-airbyte-wrapper. This is a feature that could be added in the future.
Any connector that is listed on MeltanoHub as being maintained by `airbyte` can be installed and run as you would any other connector. Use `meltano add extractor <tap>` to add and install.
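For example, to add the PokeAPI source used later in this FAQ (assuming it is listed on MeltanoHub with an `airbyte` variant):

```bash
# Add and install an Airbyte-maintained source from MeltanoHub
meltano add extractor tap-pokeapi --variant airbyte
```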
Based on the current implementation and testing, the main challenge is putting these connectors into production with Meltano. See the next question for more details on this.
The short answer is: it depends on where it’s deployed!
The main potential challenge with putting this into production arises if you use Docker to package and deploy your Meltano project. Since each Airbyte connector is itself a Docker image, running one inside a Dockerized Meltano project requires “Docker-in-Docker”, which has some possible challenges around privileges and permissions on certain systems. If you’re able to set up Docker-in-Docker, a Dockerized Meltano project with tap-airbyte-wrapper will work as expected.
For example, AWS ECS does not support Docker-in-Docker, and Meltano Cloud does not currently have plans to support container-based connectors either. A simple EC2 instance with Docker available, however, would work as expected.
Within GitHub Codespaces, adding support for Docker-in-Docker was as easy as adding a few lines to the devcontainer.json file.
It is also possible to run Meltano on GitHub Actions. It’s likely possible to update this action to support Docker-in-Docker as well.
There are several articles on the web which discuss docker-in-docker in more detail:
We are interested in exploring this challenge with the community. Please share in this discussion or connect with us in Slack about your experiences with attempting to put this into production.
This integration will work with any dockerized Airbyte source, whether it was made with their CDK or not.
To configure your Airbyte connector as a custom plugin in Meltano, copy the configuration shown below into your meltano.yml file, then replace `name` and the value of `airbyte_spec.image` to reference your Docker image. You can configure your connector without defining all of the settings and their metadata.
```yaml
- name: tap-pokeapi # REPLACE THIS WITH YOUR CONNECTOR NAME
  variant: airbyte
  executable: tap-airbyte
  namespace: tap_airbyte
  pip_url: git+https://github.com/MeltanoLabs/tap-airbyte.git
  capabilities:
    - catalog
    - state
    - discover
    - about
    - stream-maps
    - schema-flattening
  settings:
    - description: Airbyte image to run
      kind: string
      label: Airbyte Spec Image
      name: airbyte_spec.image
      value: airbyte/source-pokeapi # REPLACE THIS WITH YOUR IMAGE NAME
    - description: Airbyte image tag
      kind: string
      label: Airbyte Spec Tag
      name: airbyte_spec.tag
      value: latest
    # [INSERT OTHER SETTINGS HERE]
```
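Once the definition is in place, you can install and sanity-check the connector. A sketch, reusing the tap-pokeapi name from the example above:

```bash
# Install the newly defined custom extractor
meltano install extractor tap-pokeapi

# Print the connector's metadata to verify the Docker image runs
meltano invoke tap-pokeapi --about
```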
Running Airbyte sources with Meltano brings a number of benefits to those connectors. With Meltano you get the ability to run stream maps to adjust data on the fly, you can define multiple environments to override configuration depending on where the EL pipeline is run, and you get the benefits of version control since everything is defined in your meltano.yml file.
We also recently added support for alternative state backends to Meltano. This means if you want to store the incremental state between runs in a place other than your system database or local filesystem, you can! Currently Meltano supports AWS S3, Azure Blob Storage, and Google Cloud Storage as alternative state backends.
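For example, pointing Meltano at an S3 state backend is a top-level setting in meltano.yml. A minimal sketch, where the bucket and prefix are hypothetical (this also requires the `s3` extra mentioned earlier):

```yaml
# meltano.yml (top level)
state_backend:
  uri: s3://my-meltano-bucket/state   # hypothetical bucket and prefix
```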
Since tap-airbyte-wrapper was written with the SDK, this also unlocks the `BATCH` message format, which can help with overall pipeline throughput to compatible targets.
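Batching can be enabled through the SDK's `batch_config` setting in the tap's config. A sketch, assuming local filesystem staging (the path and prefix are placeholders):

```yaml
config:
  batch_config:
    encoding:
      format: jsonl      # newline-delimited JSON batch files
      compression: gzip
    storage:
      root: file:///tmp/meltano-batches   # hypothetical staging location
      prefix: batch-
```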
Nope! Meltano manages state for you and you can use any command you normally would with Singer-based connectors.
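If you ever want to inspect or reset state manually, the usual `meltano state` commands apply (the state ID below is illustrative):

```bash
# List the state IDs tracked by Meltano
meltano state list

# Show the stored state for a given pipeline
meltano state get dev:tap-pokeapi-to-target-jsonl
```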
Transferring a pipeline to Meltano requires determining your strategy for handling incremental state and the overall table structure.
Because with Meltano you would be using a different loader (aka destination or target) than in your Airbyte pipeline, the table structure that appears in your destination would likely be different. Due to this difference, we have two recommendations for migrating a pipeline.
The first may be possible only if your data source allows you to quickly backfill data. You can do a full sync with your Meltano pipeline to get all of the data into the new format and then switch any downstream process to point at this new table. However, for many sources this may not be possible.
Another option would be to stop your Airbyte pipeline and have the Meltano pipeline start at that point. For example, if the Airbyte pipeline stops on 2022-12-31 you could have the Meltano pipeline start at 2023-01-01 and write to a different table. You could then use a tool like dbt to transform both tables into a common format and then union them together into a single source.
There are tradeoffs for both of these methods and you would have to determine where to best invest your time depending on your needs.
If you have other ways to make this transition easier, we’d love to discuss them in Slack or Office Hours with you!
Does this work with `meltano run` and `meltano elt`?
Yes! Since the wrapper is based on the Meltano Singer SDK, these connectors work just as any Python-based Singer connector would. You can run, elt, invoke, test, and configure them as you normally would.
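For example, with the tap-pokeapi definition from earlier and a hypothetical target-jsonl loader already added:

```bash
# Run an EL pipeline
meltano run tap-pokeapi target-jsonl

# Or use the elt command
meltano elt tap-pokeapi target-jsonl

# Invoke the connector directly, or inspect its configuration
meltano invoke tap-pokeapi
meltano config tap-pokeapi list
```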
Yes! This is a unique feature of Meltano that is not found in a comprehensive manner in other modern EL tools. With stream maps, users are able to hash, filter, duplicate, and alias any column or table during their extract. Because this Airbyte integration was built using the SDK, stream maps can be defined on the connector itself (SDK docs), or they can be added separately as a mapper between a tap and target executed using `meltano run`. See our tutorial for an example.
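A minimal sketch of a stream map defined on the connector itself, where the `users` stream and `email` field are hypothetical and `md5()` is one of the SDK's built-in mapping functions:

```yaml
config:
  stream_maps:
    users:                     # hypothetical stream name
      email: null              # drop the raw email column
      email_hash: md5(email)   # add a hashed replacement column
```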
Yes! For each connector added to the Hub we have mapped the configuration output from the connector to our metadata spec on MeltanoHub. This means interactive configuration will work normally. Note that a bug with nested config, which could affect configuration behavior, was addressed in release 2.13.0.
Errors from the Airbyte connector are output via LOG and TRACE messages. If a container exits with a non-zero code, the wrapper will write the error to stderr along with the failing Airbyte command. Errors from the wrapper itself are unlikely, but any that do occur will not have the preceding characteristics.
Submit a bug report in the airbytehq/airbyte repo.
Airbyte connectors run inside Docker containers, which means they don’t automatically have access to your local file system. To access local files you can use the `docker_mounts` setting. An example of using Airbyte’s tap-file to access a CSV file in a “data” directory within your local Meltano project would look like this:
```yaml
config:
  docker_mounts:
    - source: /<YOUR_FULL_LOCAL_PATH>/
      target: /local/
      type: bind
  airbyte_spec:
    image: airbyte/source-file
  airbyte_config:
    dataset_name: test_file
    format: csv
    url: /local/data/test.csv
```
Based on testing by the original community contributor, we see less than a 5% drop in overall throughput when the same source is run natively via Docker versus via Meltano. For most sources this is an acceptable cost for the workflow gains: running these connectors in a tool that supports truly custom and decentralized sources and enables software engineering best practices like version control, isolated run environments, and continuous integration.
After seeing these plugins in the wild for several weeks, with hundreds of successful invocations from users in the community and successful production use cases (e.g. Harness), we’re comfortable taking these plugins out of the experimental phase.
We hope the community continues to test these out and push the integration further. Please open any issues or feature requests related to the wrapper on MeltanoLabs/tap-airbyte-wrapper.
We don’t currently have plans to support Airbyte, or other container-based connectors, in Meltano Cloud.
If this is something you’re interested in, please reach out to us via the Meltano Cloud Waitlist or on Slack.