Getting Started with DWH
DWH is a lightweight library that provides a unified interface to connect, introspect, and query popular databases. This guide will help you get up and running quickly.
Installation
Add this line to your application's Gemfile:
gem 'dwh'
And then execute:
bundle install
Or install it yourself as:
gem install dwh
Basic Usage
Creating Your First Connection
require 'dwh'
# Connect to PostgreSQL
postgres = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password'
})
# Connect to SQLite (lightweight, embedded)
sqlite = DWH.create(:sqlite, {
file: '/path/to/analytics.db'
})
# Connect to DuckDB (in-memory)
duckdb = DWH.create(:duckdb, {
file: ':memory:'
})
Your First Query
# Execute a simple query
results = postgres.execute("SELECT * FROM users LIMIT 10")
# Results are returned as arrays by default
results.each do |row|
puts row.inspect
end
Exploring Your Database
# List all tables
tables = postgres.tables
puts "Available tables: #{tables.map(&:physical_name)}"
# Get detailed information about a table
table_info = postgres.('users')
puts "Table: #{table_info.physical_name}"
puts "Schema: #{table_info.schema}"
puts "Columns:"
table_info.columns.each do |column|
puts " #{column.name} (#{column.normalized_data_type})"
end
# Get table statistics
stats = postgres.stats('users', date_column: 'created_at')
puts "Row count: #{stats.row_count}"
puts "Date range: #{stats.date_start} to #{stats.date_end}"
Different Output Formats
# Get results as arrays (default)
array_results = postgres.execute("SELECT id, name FROM users LIMIT 5")
# Get results as hashes/objects
hash_results = postgres.execute("SELECT id, name FROM users LIMIT 5", format: :object)
# Get results as CSV string
csv_results = postgres.execute("SELECT id, name FROM users LIMIT 5", format: :csv)
# Stream large results to a file
postgres.execute_stream("SELECT * FROM large_table", File.open('output.csv', 'w'))
Streaming Large Datasets
# stream data while tracting stats and previewing data in a separate thread
stats = DWH::StreamingStats.new(10000) # num of rows to keep in memory for previewing
exec_thread = Thread.new {
postgres.execute_stream("SELECT * FROM large_table", File.open('output.csv', 'w'), stats: stats)
}
mon_thread = Thread.new{
loop do
break if exec_thread.alive?
puts stats.data.last
end
}
[exec_thread, mon_thread].each(&:join)
# Stream with block processing
postgres.stream("SELECT * FROM large_table") do |chunk|
process_chunk(chunk)
end
Advanced Usage
Connection Pooling
# Create a connection pool
pool = DWH.pool('my_postgres_pool', :postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password'
}, size: 10, timeout: 5)
# Use the pool
pool.with do |connection|
results = connection.execute("SELECT COUNT(*) FROM users")
end
# Shutdown the pool when done
DWH.shutdown('my_postgres_pool')
Using Extra Connection Params
DWH uses an existing Ruby gem where possible to connect to each target database. When that is not possible and the db supports a REST endpoint, we will use Faraday.
Using extra_connection_params
key you can pass in a Hash of options that the target connector supports but DWH doesn't make first class. The main config options in DWH are based on required and common needs.
Sending Postgres 'connect_timeout' property supported by the PG gem
pg = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password',
extra_connection_params: {
connect_timeout: 5
}
})
Database Functions
DWH provides a function translation layer that converts common SQL functions to database-specific syntax:
# Date truncation
postgres.truncate_date('week', 'created_at') # => DATE_TRUNC('week', created_at)
sqlserver.truncate_date('week', 'created_at') # => DATETRUNC(week, created_at)
# Date literals
postgres.date_literal('2025-01-01') # => '2025-01-01'::DATE
sqlserver.date_literal('2025-01-01') # => '2025-01-01'
# Null handling
adapter.coalesce('column1', 'column2', "'default'") # => COALESCE(column1, column2, 'default')
adapter.null_if('column1', "'empty'") # => NULLIF(column1, 'empty')
# String functions
adapter.trim('column_name') # => TRIM(column_name)
adapter.upper_case('column_name') # => UPPER(column_name)
adapter.lower_case('column_name') # => LOWER(column_name)
Core API
Standardized API across adapters:
connection : creates a reusuable connection based on config hash passed in
tables(schema: nil, catalog: nil) : returns a list of tables from the default connection or from the specified schema and catalog
metadata(table_name, schema: nil, catalog: nil) : provides metadata about a table
stats(table_name, date_column: nil) : provides table row count and date range
execute(sql, format: :array, retries: 0) : runs a query and returns in given format
execute_stream(sql, io, stats: nil) : runs a query and streams it as csv into the given io
Error Handling
begin
results = adapter.execute("SELECT * FROM non_existent_table")
rescue DWH::ExecutionError => e
puts "Query failed: #{e.}"
rescue DWH::ConnectionError => e
puts "Connection failed: #{e.}"
rescue DWH::ConfigError => e
puts "Configuration error: #{e.}"
end