Set up a platform for data pipelines
Give your data team the ability to deploy and run workflows.
In this tutorial, you’ll learn how to set up a platform for data pipelines with Prefect Cloud. We’ll show you how to create workspaces, work pools, and workers, and then deploy flows to the infrastructure you’ve provisioned.
Prerequisites
To complete this tutorial, you’ll need:
- Git
- Docker
- Python 3.9 or newer
- A paid Prefect Cloud account (free accounts do not support multiple workspaces)
You’ll also need to clone the Prefect demo repository and install the prefect library. Here’s a minimal sketch, assuming the demo repository lives at https://github.com/PrefectHQ/demos (substitute your own URL if it differs):
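```bash
# Clone the demo repository (URL assumed; use your own fork if it differs)
git clone https://github.com/PrefectHQ/demos.git
cd demos

# Install the latest Prefect library
pip install -U prefect
```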
Create workspaces
We recommend one workspace per development environment. In this tutorial, we’ll create a production workspace and a staging workspace. To create multiple workspaces on Prefect Cloud, you’ll need a paid account.
- Head to https://app.prefect.cloud/ and sign in to your paid account.
- If you haven’t created a workspace yet, you’ll be prompted to choose a name for your workspace. Name your first workspace `production`.
Next, create a staging workspace:
- Click the workspace switcher in the sidebar.
- Go to Switch workspace / Create a new workspace.
- Name this workspace `staging`.
You should now have two workspaces named `production` and `staging`.
Create work pools
Work pools let you dynamically provision infrastructure for your flows. First, create a work pool for your production workspace.
- Switch to the `production` workspace using the workspace switcher in the sidebar.
- Click Work pools in the sidebar.
- Create a new Docker hybrid work pool named `default-work-pool`, using the default configuration.

Next, repeat these steps to create a second work pool named `default-work-pool` in the `staging` workspace.
You should now have one work pool in each workspace.
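If you prefer the CLI, here’s a sketch of an equivalent command (after authenticating with `prefect cloud login`, run it once while pointed at each workspace):

```bash
# Create a Docker work pool with the default configuration
prefect work-pool create default-work-pool --type docker
```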
Start workers
Because you’re using hybrid work pools, you need to start at least one worker for each work pool to process flow runs.
First, you need to authenticate to Prefect Cloud using the CLI.
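For example:

```bash
prefect cloud login
```

Follow the prompts to log in through your browser or paste an API key.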
Once you’ve authenticated to Prefect Cloud, open two terminal windows and run one worker in each. Here’s a sketch of the two sessions, with `<account-handle>` standing in for your Prefect Cloud account handle:
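```bash
# Terminal 1: point the CLI at the production workspace, then start a worker
prefect cloud workspace set --workspace "<account-handle>/production"
prefect worker start --pool default-work-pool
```

```bash
# Terminal 2: point the CLI at the staging workspace, then start a worker
prefect cloud workspace set --workspace "<account-handle>/staging"
prefect worker start --pool default-work-pool
```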
You can use `prefect cloud workspace ls` to see the fully qualified names of the workspaces in your account.
Deploy and run flows
Run the following script from the demo repository to create flow runs in each workspace. The sketch below assumes the script is named `simulate_failures.py` and accepts a `--fail-at-run` flag, as in the PrefectHQ demos repository; adjust if your copy differs:
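```bash
# Create deployments and flow runs in the production workspace
prefect cloud workspace set --workspace "<account-handle>/production"
python simulate_failures.py

# Repeat in the staging workspace, instructing the script to fail some runs
# (script name and --fail-at-run flag are assumptions; see the demos repository)
prefect cloud workspace set --workspace "<account-handle>/staging"
python simulate_failures.py --fail-at-run 3
```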
Each script invocation takes approximately one minute to finish. If you check the logs for the worker running in the staging workspace, you will see errors; this is expected.
View flow run activity
Now that you’ve deployed and run your flows, you can view the run details in the UI. If you go to the Home page for each workspace, you can see run activity.
Here’s what happened:
- The script created two deployments of the flow, one deployment per workspace.
- The script then generated multiple flow runs for each deployment.
- The activity graph on the Home page shows successful runs as green and failed runs as red.
- Notice that the flow runs in the `production` workspace succeeded, but some of the flow runs in the `staging` workspace failed.
This output is expected; you’ll learn how to debug these failures in the next tutorial.
Optional: Manage team access
If you’re a Prefect Enterprise customer, you can use Teams to manage access to your workspaces and work pools.
Next steps
In this tutorial, you successfully set up a platform for data pipelines, and then deployed and ran flows on the infrastructure you provisioned. If this doesn’t perfectly match your use case, here are some variations you can explore:
- You can set up a self-managed instance with Prefect server.
- You can provision users with SSO or just use service accounts.
- You can deploy flows with Kubernetes, serverless compute, or Prefect Managed infrastructure.
- You can write flows from scratch.
- You can automate deployments with GitHub Actions.
Next, learn how to debug a flow run when things go wrong.
Need help? Book a meeting with a Prefect Product Advocate to get your questions answered.