Data Engineering

dbt Practice Quiz

dbt Practice Quiz — Study Guide

dbt Practice Quiz: Complete Study Guide

dbt (data build tool) has become the industry standard for transforming data inside modern data warehouses. Whether you're a data analyst writing your first SQL model or a data engineer building production pipelines, understanding dbt's core concepts will make you dramatically more productive. This guide covers everything you need to ace the quiz — from basic project structure to advanced Jinja macros.


dbt Basics & Project Structure

dbt projects follow a predictable folder layout:

my_project/
├── dbt_project.yml       # Project configuration
├── models/               # SQL transformation files
│   ├── staging/
│   └── marts/
├── seeds/                # CSV files loaded as tables
├── macros/               # Reusable Jinja functions
├── tests/                # Singular (custom) tests
└── snapshots/            # Slowly changing dimension logic

The dbt_project.yml file is the heart of your project — it defines the project name, model paths, and default materializations.

The dbt CLI

You interact with dbt through the command line:

CommandWhat it does
dbt runExecute all models
dbt testRun all tests
dbt seedLoad CSV seed files
dbt compileCompile SQL without running
dbt docs generateBuild documentation site
Model selection lets you target specific models using the --select flag:
  • dbt run --select stg_orders — run one model
  • dbt run --select staging.* — run all models in the staging folder
  • dbt run --select +stg_orders — run stg_orders and all its upstream dependencies

  • The ref() and source() Functions

    These two Jinja functions are the backbone of dbt's dependency graph.

    ref() — Reference Other Models

    -- models/marts/orders.sql
    SELECT *
    FROM {{ ref('stg_orders') }}

    {{ ref('stg_orders') }} does two important things:

  • Resolves the correct schema/database path at compile time (so you never hardcode schema names)
  • Tells dbt about the dependency so models run in the right order
  • Think of ref() as an import statement — it connects your models together into a DAG (Directed Acyclic Graph).

    source() — Reference Raw Data

    -- models/staging/stg_orders.sql
    SELECT *
    FROM {{ source('raw', 'orders') }}

    {{ source('raw', 'orders') }} points to a raw table that dbt *didn't* create — data loaded by your EL tool (Fivetran, Airbyte, etc.). Sources are declared in a schema.yml file and unlock freshness checks.

    # models/staging/schema.yml
    sources:
      - name: raw
        tables:
          - name: orders
            loaded_at_field: _loaded_at
            freshness:
              warn_after: {count: 12, period: hour}
              error_after: {count: 24, period: hour}

    Run dbt source freshness to check whether your raw data is stale.


    Materializations

    A materialization controls how dbt physically stores your model in the warehouse.

    MaterializationWhat it createsBest for
    viewA database viewLightweight, frequently changing logic
    tableA full table (rebuilt each run)Medium-sized, stable datasets
    incrementalAppends/merges only new rowsLarge tables where full rebuilds are too slow
    ephemeralNo physical object (CTE only)Intermediate logic you don't need to query directly

    Incremental Models

    Incremental models are the solution when your table has millions of rows and you only need to process *new* data each run:

    -- models/marts/fct_events.sql
    {{ config(materialization='incremental', unique_key='event_id') }}

    SELECT event_id, user_id, event_at FROM {{ ref('stg_events') }}

    {% if is_incremental() %} WHERE event_at > (SELECT MAX(event_at) FROM {{ this }}) {% endif %}

  • {{ this }} refers to the existing table in the warehouse
  • unique_key tells dbt how to handle duplicates (upsert logic)
  • On the first run, dbt builds the full table; subsequent runs only process new rows
  • Ephemeral Models

    An ephemeral model is never materialized in the database. Instead, dbt inlines it as a Common Table Expression (CTE) wherever it's referenced. Use ephemeral models for intermediate transformation steps that don't need to be queried independently.

    # dbt_project.yml
    models:
      my_project:
        intermediate:
          +materialized: ephemeral


    Testing

    dbt has two types of tests to validate your data.

    Generic Tests (Schema Tests)

    Defined in schema.yml, these are reusable tests applied to columns:

    models:
      - name: stg_orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null
          - name: status
            tests:
              - accepted_values:
                  values: ['placed', 'shipped', 'delivered']

    unique and not_null are generic tests — dbt ships with four built-in ones: unique, not_null, accepted_values, and relationships.

    Singular Tests

    Singular tests are custom SQL files in the tests/ folder. They return rows that *fail* the test — if zero rows are returned, the test passes:

    -- tests/assert_orders_total_positive.sql
    SELECT order_id
    FROM {{ ref('fct_orders') }}
    WHERE total_amount < 0


    Seeds

    Seeds are CSV files that dbt loads into your warehouse as tables. They're perfect for small reference datasets like country codes or status mappings:

    seeds/
    └── country_codes.csv

    Run dbt seed to load them. Reference seeds just like models: {{ ref('country_codes') }}.


    Jinja & Macros

    dbt uses Jinja templating to add logic to SQL. You've already seen {{ ref() }} and {{ source() }} — those are Jinja expressions.

    Macros are reusable Jinja functions defined in the macros/ folder:

    -- macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name) %}
      ({{ column_name }} / 100.0)::NUMERIC(10,2)
    {% endmacro %}

    Use it in any model:

    SELECT
      order_id,
      {{ cents_to_dollars('amount_cents') }} AS amount_dollars
    FROM {{ ref('stg_orders') }}

    Macros eliminate repetition — write once, reuse everywhere.


    Key Takeaways

  • ref() connects dbt models together and builds the dependency graph; source() connects to raw, externally-loaded tables and enables freshness monitoring.
  • Choose your materialization based on data size and query patterns: view for lightweight logic, table for medium datasets, incremental for large tables (use unique_key for upsert behavior), and ephemeral for intermediate CTEs.
  • Generic tests (unique, not_null, etc.) are declared in schema.yml; singular tests are custom SQL files in the tests/ folder that return failing rows.
  • Seeds are CSV files loaded with dbt seed — great for small lookup tables referenced with {{ ref() }}.
  • Jinja and macros make your SQL DRY and dynamic — master {{ ref() }}, {{ source() }}, {{ this }}, and {% if is_incremental() %} and you'll handle most real-world dbt patterns.