Data Engineering

dbt Practice Quiz

dbt Practice Quiz — Study Guide

dbt Practice Quiz: Complete Study Guide

dbt (data build tool) has become the industry standard for transforming data inside modern data warehouses. Whether you're a data analyst writing your first SQL model or a data engineer building production pipelines, understanding dbt's core concepts will make you dramatically more productive. This guide covers everything you need to ace the quiz — from basic project structure to advanced Jinja macros.

dbt Basics & Project Structure

dbt projects follow a predictable folder layout:

my_project/
├── dbt_project.yml       # Project configuration
├── models/               # SQL transformation files
│   ├── staging/
│   └── marts/
├── seeds/                # CSV files loaded as tables
├── macros/               # Reusable Jinja functions
├── tests/                # Singular (custom) tests
└── snapshots/            # Slowly changing dimension logic

The dbt_project.yml file is the heart of your project — it defines the project name, model paths, and default materializations.

The dbt CLI

You interact with dbt through the command line:

Command	What it does
`dbt run`	Execute all models
`dbt test`	Run all tests
`dbt seed`	Load CSV seed files
`dbt compile`	Compile SQL without running
`dbt docs generate`	Build documentation site

Model selection lets you target specific models using the --select flag:

dbt run --select stg_orders — run one model

dbt run --select staging.* — run all models in the staging folder

dbt run --select +stg_orders — run stg_orders and all its upstream dependencies

The `ref()` and `source()` Functions

These two Jinja functions are the backbone of dbt's dependency graph.

`ref()` — Reference Other Models

-- models/marts/orders.sql
SELECT *
FROM {{ ref('stg_orders') }}

{{ ref('stg_orders') }} does two important things:

Resolves the correct schema/database path at compile time (so you never hardcode schema names)

Tells dbt about the dependency so models run in the right order

Think of ref() as an import statement — it connects your models together into a DAG (Directed Acyclic Graph).

`source()` — Reference Raw Data

-- models/staging/stg_orders.sql
SELECT *
FROM {{ source('raw', 'orders') }}

{{ source('raw', 'orders') }} points to a raw table that dbt *didn't* create — data loaded by your EL tool (Fivetran, Airbyte, etc.). Sources are declared in a schema.yml file and unlock freshness checks.

# models/staging/schema.yml
sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

Run dbt source freshness to check whether your raw data is stale.

Materializations

A materialization controls how dbt physically stores your model in the warehouse.

Materialization	What it creates	Best for
`view`	A database view	Lightweight, frequently changing logic
`table`	A full table (rebuilt each run)	Medium-sized, stable datasets
`incremental`	Appends/merges only new rows	Large tables where full rebuilds are too slow
`ephemeral`	No physical object (CTE only)	Intermediate logic you don't need to query directly

Incremental Models

Incremental models are the solution when your table has millions of rows and you only need to process *new* data each run:

-- models/marts/fct_events.sql
{{ config(materialization='incremental', unique_key='event_id') }}
SELECT event_id, user_id, event_at
FROM {{ ref('stg_events') }}{% if is_incremental() %}
  WHERE event_at > (SELECT MAX(event_at) FROM {{ this }})
{% endif %}

{{ this }} refers to the existing table in the warehouse

unique_key tells dbt how to handle duplicates (upsert logic)

On the first run, dbt builds the full table; subsequent runs only process new rows

Ephemeral Models

An ephemeral model is never materialized in the database. Instead, dbt inlines it as a Common Table Expression (CTE) wherever it's referenced. Use ephemeral models for intermediate transformation steps that don't need to be queried independently.

# dbt_project.yml
models:
  my_project:
    intermediate:
      +materialized: ephemeral

Testing

dbt has two types of tests to validate your data.

Generic Tests (Schema Tests)

Defined in schema.yml, these are reusable tests applied to columns:

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered']

unique and not_null are generic tests — dbt ships with four built-in ones: unique, not_null, accepted_values, and relationships.

Singular Tests

Singular tests are custom SQL files in the tests/ folder. They return rows that *fail* the test — if zero rows are returned, the test passes:

-- tests/assert_orders_total_positive.sql
SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE total_amount < 0

Seeds

Seeds are CSV files that dbt loads into your warehouse as tables. They're perfect for small reference datasets like country codes or status mappings:

seeds/
└── country_codes.csv

Run dbt seed to load them. Reference seeds just like models: {{ ref('country_codes') }}.

Jinja & Macros

dbt uses Jinja templating to add logic to SQL. You've already seen {{ ref() }} and {{ source() }} — those are Jinja expressions.

Macros are reusable Jinja functions defined in the macros/ folder:

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
  ({{ column_name }} / 100.0)::NUMERIC(10,2)
{% endmacro %}

Use it in any model:

SELECT
  order_id,
  {{ cents_to_dollars('amount_cents') }} AS amount_dollars
FROM {{ ref('stg_orders') }}

Macros eliminate repetition — write once, reuse everywhere.

Key Takeaways

ref() connects dbt models together and builds the dependency graph; source() connects to raw, externally-loaded tables and enables freshness monitoring.

Choose your materialization based on data size and query patterns: view for lightweight logic, table for medium datasets, incremental for large tables (use unique_key for upsert behavior), and ephemeral for intermediate CTEs.

Generic tests (unique, not_null, etc.) are declared in schema.yml; singular tests are custom SQL files in the tests/ folder that return failing rows.

Seeds are CSV files loaded with dbt seed — great for small lookup tables referenced with {{ ref() }}.

Jinja and macros make your SQL DRY and dynamic — master {{ ref() }}, {{ source() }}, {{ this }}, and {% if is_incremental() %} and you'll handle most real-world dbt patterns.