dbt Practice Quiz
dbt Practice Quiz — Study Guide
dbt Practice Quiz: Complete Study Guide
dbt (data build tool) has become the industry standard for transforming data inside modern data warehouses. Whether you're a data analyst writing your first SQL model or a data engineer building production pipelines, understanding dbt's core concepts will make you dramatically more productive. This guide covers everything you need to ace the quiz — from basic project structure to advanced Jinja macros.
dbt Basics & Project Structure
dbt projects follow a predictable folder layout:
my_project/
├── dbt_project.yml # Project configuration
├── models/ # SQL transformation files
│ ├── staging/
│ └── marts/
├── seeds/ # CSV files loaded as tables
├── macros/ # Reusable Jinja functions
├── tests/ # Singular (custom) tests
└── snapshots/ # Slowly changing dimension logicThe dbt_project.yml file is the heart of your project — it defines the project name, model paths, and default materializations.
The dbt CLI
You interact with dbt through the command line:
| Command | What it does |
|---|---|
dbt run | Execute all models |
dbt test | Run all tests |
dbt seed | Load CSV seed files |
dbt compile | Compile SQL without running |
dbt docs generate | Build documentation site |
--select flag:
dbt run --select stg_orders — run one modeldbt run --select staging.* — run all models in the staging folderdbt run --select +stg_orders — run stg_orders and all its upstream dependenciesThe ref() and source() Functions
These two Jinja functions are the backbone of dbt's dependency graph.
ref() — Reference Other Models
-- models/marts/orders.sql
SELECT *
FROM {{ ref('stg_orders') }}{{ ref('stg_orders') }} does two important things:
Think of ref() as an import statement — it connects your models together into a DAG (Directed Acyclic Graph).
source() — Reference Raw Data
-- models/staging/stg_orders.sql
SELECT *
FROM {{ source('raw', 'orders') }}{{ source('raw', 'orders') }} points to a raw table that dbt *didn't* create — data loaded by your EL tool (Fivetran, Airbyte, etc.). Sources are declared in a schema.yml file and unlock freshness checks.
# models/staging/schema.yml
sources:
- name: raw
tables:
- name: orders
loaded_at_field: _loaded_at
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}Run dbt source freshness to check whether your raw data is stale.
Materializations
A materialization controls how dbt physically stores your model in the warehouse.
| Materialization | What it creates | Best for |
|---|---|---|
view | A database view | Lightweight, frequently changing logic |
table | A full table (rebuilt each run) | Medium-sized, stable datasets |
incremental | Appends/merges only new rows | Large tables where full rebuilds are too slow |
ephemeral | No physical object (CTE only) | Intermediate logic you don't need to query directly |
Incremental Models
Incremental models are the solution when your table has millions of rows and you only need to process *new* data each run:
-- models/marts/fct_events.sql
{{ config(materialization='incremental', unique_key='event_id') }}SELECT event_id, user_id, event_at
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
WHERE event_at > (SELECT MAX(event_at) FROM {{ this }})
{% endif %}
{{ this }} refers to the existing table in the warehouseunique_key tells dbt how to handle duplicates (upsert logic)Ephemeral Models
An ephemeral model is never materialized in the database. Instead, dbt inlines it as a Common Table Expression (CTE) wherever it's referenced. Use ephemeral models for intermediate transformation steps that don't need to be queried independently.
# dbt_project.yml
models:
my_project:
intermediate:
+materialized: ephemeralTesting
dbt has two types of tests to validate your data.
Generic Tests (Schema Tests)
Defined in schema.yml, these are reusable tests applied to columns:
models:
- name: stg_orders
columns:
- name: order_id
tests:
- unique
- not_null
- name: status
tests:
- accepted_values:
values: ['placed', 'shipped', 'delivered']unique and not_null are generic tests — dbt ships with four built-in ones: unique, not_null, accepted_values, and relationships.
Singular Tests
Singular tests are custom SQL files in the tests/ folder. They return rows that *fail* the test — if zero rows are returned, the test passes:
-- tests/assert_orders_total_positive.sql
SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE total_amount < 0Seeds
Seeds are CSV files that dbt loads into your warehouse as tables. They're perfect for small reference datasets like country codes or status mappings:
seeds/
└── country_codes.csvRun dbt seed to load them. Reference seeds just like models: {{ ref('country_codes') }}.
Jinja & Macros
dbt uses Jinja templating to add logic to SQL. You've already seen {{ ref() }} and {{ source() }} — those are Jinja expressions.
Macros are reusable Jinja functions defined in the macros/ folder:
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
({{ column_name }} / 100.0)::NUMERIC(10,2)
{% endmacro %}Use it in any model:
SELECT
order_id,
{{ cents_to_dollars('amount_cents') }} AS amount_dollars
FROM {{ ref('stg_orders') }}Macros eliminate repetition — write once, reuse everywhere.
Key Takeaways
ref() connects dbt models together and builds the dependency graph; source() connects to raw, externally-loaded tables and enables freshness monitoring.view for lightweight logic, table for medium datasets, incremental for large tables (use unique_key for upsert behavior), and ephemeral for intermediate CTEs.unique, not_null, etc.) are declared in schema.yml; singular tests are custom SQL files in the tests/ folder that return failing rows.dbt seed — great for small lookup tables referenced with {{ ref() }}.{{ ref() }}, {{ source() }}, {{ this }}, and {% if is_incremental() %} and you'll handle most real-world dbt patterns.