Adapter Configuration
This guide covers all the database adapters supported by DWH and their specific configuration options. Each adapter is designed to work with specific database clients and provides database-specific optimizations.
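Every adapter is created the same way: DWH.create takes an adapter symbol and an options hash and returns an adapter object. Below is a minimal sketch of the common pattern, using the execute method that appears in the examples later in this guide (connection details and the query are placeholders):
# Create an adapter and run an ad-hoc query.
adapter = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password'
})
rows = adapter.execute('SELECT 1')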
PostgreSQL Adapter
The PostgreSQL adapter uses the pg gem and provides full-featured RDBMS support.
Basic Configuration
postgres = DWH.create(:postgres, {
host: 'localhost',
port: 5432, # Default: 5432
database: 'mydb',
schema: 'public', # Default: 'public'
username: 'user',
password: 'password',
client_name: 'My Application' # Default: 'DWH Ruby Gem'
})
SSL Configuration
# Basic SSL
postgres = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password',
ssl: true,
extra_connection_params: {
sslmode: 'require' # disable, prefer, require, verify-ca, verify-full
}
})
# Certificate-based SSL
postgres = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
ssl: true,
extra_connection_params: {
sslmode: 'verify-full',
sslrootcert: '/path/to/ca-cert.pem',
sslcert: '/path/to/client-cert.pem',
sslkey: '/path/to/client-key.pem'
}
})
Advanced Configuration
postgres = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password',
query_timeout: 3600, # seconds, default: 3600
extra_connection_params: {
application_name: 'Data Analysis Tool',
connect_timeout: 10,
options: '-c maintenance_work_mem=256MB'
}
})
Redshift Adapter
The Redshift adapter uses the pg gem and provides full-featured RDBMS support.
Basic Configuration
redshift = DWH.create(:redshift, {
host: 'localhost',
port: 5432, # Default: 5432
database: 'mydb',
schema: 'public', # Default: 'public'
username: 'user',
password: 'password',
client_name: 'My Application' # Default: 'DWH Ruby Gem'
})
SSL Configuration
# Basic SSL
redshift = DWH.create(:redshift, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password',
ssl: true,
extra_connection_params: {
sslmode: 'require' # disable, prefer, require, verify-ca, verify-full
}
})
# Certificate-based SSL
redshift = DWH.create(:redshift, {
host: 'localhost',
database: 'mydb',
username: 'user',
ssl: true,
extra_connection_params: {
sslmode: 'verify-full',
sslrootcert: '/path/to/ca-cert.pem',
sslcert: '/path/to/client-cert.pem',
sslkey: '/path/to/client-key.pem'
}
})
Advanced Configuration
redshift = DWH.create(:redshift, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password',
query_timeout: 3600, # seconds, default: 3600
extra_connection_params: {
application_name: 'Data Analysis Tool',
connect_timeout: 10,
options: '-c maintenance_work_mem=256MB'
}
})
Snowflake Adapter
The Snowflake adapter uses REST APIs over HTTPS to connect and run queries. It supports multiple authentication methods: Personal Access Token, Key Pair, and OAuth.
Basic connection with Personal Access Token
DWH.create(:snowflake, {
auth_mode: 'pat',
account_identifier: 'myorg-myaccount',
personal_access_token: 'your-token-here',
warehouse: 'COMPUTE_WH',
database: 'ANALYTICS',
schema: 'PUBLIC'
})
Connection with Key Pair Authentication
DWH.create(:snowflake, {
auth_mode: 'kp',
account_identifier: 'myorg-myaccount.us-east-1',
username: 'john_doe',
private_key: '/path/to/private_key.pem',
warehouse: 'COMPUTE_WH',
database: 'ANALYTICS'
})
Connecting with OAuth
This is Snowflake's built-in OAuth mechanism, not External OAuth. You must first create an OAuth security integration and apply it to the relevant roles; see Snowflake's OAuth documentation for details.
adapter = DWH.create(:snowflake, {
auth_mode: 'oauth',
account_identifier: 'myorg-myaccount.us-east-1',
oauth_client_id: '<YOUR_CLIENT_ID>',
oauth_client_secret: '<YOUR_CLIENT_SECRET>',
oauth_redirect_url: 'https://localhost:3030/some/path',
database: 'ANALYTICS',
client_name: 'myapp' # sent as user agent header value
})
To successfully use OAuth you must pass the adapter valid access and refresh tokens, or let it generate them from a valid authorization code.
The typical flow, sketched in code below, is:
- Generate an authorization code by visiting the URL returned by adapter.authorization_url. This redirects to the configured oauth_redirect_url, where you must retrieve the code parameter.
- Exchange that code for tokens with adapter.generate_oauth_tokens(code). This returns a Hash with access_token and refresh_token, which you can cache and reuse until the refresh_token expires. The method also applies the tokens to the current adapter instance.
- Apply an existing set of tokens with adapter.apply_oauth_tokens(access_token: token, refresh_token: token, expires_at: Time.now).
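A rough sketch of that flow follows. How you capture the code from the redirect is application-specific, and the symbol keys used on the returned token Hash are an assumption:
adapter = DWH.create(:snowflake, {
auth_mode: 'oauth',
account_identifier: 'myorg-myaccount.us-east-1',
oauth_client_id: '<YOUR_CLIENT_ID>',
oauth_client_secret: '<YOUR_CLIENT_SECRET>',
oauth_redirect_url: 'https://localhost:3030/some/path',
database: 'ANALYTICS'
})
# 1. Send the user to the authorization URL and capture the code
#    parameter from the redirect back to oauth_redirect_url.
puts adapter.authorization_url
code = '<CODE_FROM_REDIRECT>' # placeholder: read this in your redirect handler
# 2. Exchange the code for tokens; they are also applied to this adapter.
tokens = adapter.generate_oauth_tokens(code)
# 3. Later, reuse cached tokens on a fresh adapter instance.
adapter.apply_oauth_tokens(
access_token: tokens[:access_token],
refresh_token: tokens[:refresh_token],
expires_at: Time.now + 3600 # assumption: when the cached access token expires
)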
MySQL Adapter
The MySQL adapter uses the mysql2 gem. Note that MySQL's concept of "database" maps to "schema" in DWH.
Basic Configuration
mysql = DWH.create(:mysql, {
host: '127.0.0.1', # Use 127.0.0.1 for local Docker instances
port: 3306, # Default: 3306
database: 'mydb',
username: 'user',
password: 'password',
client_name: 'My Application' # Default: 'DWH Ruby Gem'
})
SSL Configuration
# Basic SSL
mysql = DWH.create(:mysql, {
host: '127.0.0.1',
database: 'mydb',
username: 'user',
password: 'password',
ssl: true, # Defaults ssl_mode to 'required'
extra_connection_params: {
ssl_mode: 'verify_identity', # disabled, preferred, required, verify_ca, verify_identity
sslca: '/path/to/ca-cert.pem',
sslcert: '/path/to/client-cert.pem',
sslkey: '/path/to/client-key.pem'
}
})
Advanced Configuration
mysql = DWH.create(:mysql, {
host: 'mysql.example.com',
database: 'analytics',
username: 'analyst',
password: 'password',
client_name: "My App", # defaults to 'DWH Ruby Gem'
query_timeout: 1800, # seconds, default: 3600
extra_connection_params: {
encoding: 'utf8mb4',
read_timeout: 60,
write_timeout: 60,
connect_timeout: 10
}
})
SQL Server Adapter
The SQL Server adapter uses the tiny_tds gem and supports both on-premises and Azure SQL Server.
Basic Configuration
sqlserver = DWH.create(:sqlserver, {
host: 'localhost',
port: 1433, # Default: 1433
database: 'mydb',
username: 'sa',
password: 'password',
client_name: 'My Application' # Default: 'DWH Ruby Gem'
})
Azure SQL Server
azure_sql = DWH.create(:sqlserver, {
host: 'myserver.database.windows.net',
database: 'mydb',
username: 'myuser@myserver',
password: 'password',
azure: true,
client_name: 'My Application'
})
Advanced Configuration
sqlserver = DWH.create(:sqlserver, {
host: 'sql.example.com',
database: 'analytics',
username: 'analyst',
password: 'password',
query_timeout: 1800, # seconds, default: 3600
extra_connection_params: {
container: true, # For SQL Server running in containers
use_utf16: false, # Character encoding options
timeout: 60, # Connection timeout
login_timeout: 60 # Login timeout
}
})
Multi-Database Operations
# List tables in another database
tables = sqlserver.tables(catalog: 'other_database')
# Get metadata for a table in another database
metadata = sqlserver.metadata('other_database.dbo.my_table')
# OR
metadata = sqlserver.metadata('my_table', catalog: 'other_database')
DuckDB Adapter
The DuckDB adapter uses the ruby-duckdb gem for in-process analytical queries. This requires the DuckDB header files and library to already be installed.
Basic Configuration
# File-based database
duckdb = DWH.create(:duckdb, {
file: '/path/to/my/database.duckdb',
schema: 'main' # Default: 'main'
})
# In-memory database
duckdb = DWH.create(:duckdb, {
file: ':memory:'
})
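An in-memory database is convenient for quick local analysis. A small sketch, assuming execute accepts arbitrary SQL statements as in the SQLite example later in this guide (the table and query are placeholders):
duckdb = DWH.create(:duckdb, { file: ':memory:' })
# Build a throwaway table and query it in-process.
duckdb.execute("CREATE TABLE sales AS SELECT 1 AS id, 100.0 AS amount")
rows = duckdb.execute('SELECT SUM(amount) FROM sales')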
Read-Only Mode
duckdb = DWH.create(:duckdb, {
file: '/path/to/readonly/database.duckdb',
duck_config: {
access_mode: 'READ_ONLY'
}
})
Advanced Configuration
duckdb = DWH.create(:duckdb, {
file: '/path/to/my/database.duckdb',
duck_config: {
access_mode: 'READ_WRITE',
max_memory: '2GB',
threads: 4,
temp_directory: '/tmp/duckdb'
}
})
SQLite Adapter
The SQLite adapter uses the sqlite3 gem for lightweight embedded database analytics. It's optimized for analytical workloads, with WAL mode enabled by default for better concurrent read performance.
Basic Configuration
# File-based database
sqlite = DWH.create(:sqlite, {
file: '/path/to/my/database.sqlite'
})
# In-memory database
sqlite = DWH.create(:sqlite, {
file: ':memory:'
})
Read-Only Mode
sqlite = DWH.create(:sqlite, {
file: '/path/to/readonly/database.sqlite',
readonly: true
})
Performance Optimization
The adapter includes default optimizations for analytical workloads:
- WAL mode enabled by default for concurrent reads
- 64MB cache size
- Memory-mapped I/O (128MB)
- Temp tables stored in memory
# Customize performance settings
sqlite = DWH.create(:sqlite, {
file: '/path/to/my/database.sqlite',
timeout: 5000, # busy timeout in milliseconds, default: 5000
pragmas: {
cache_size: -128000, # 128MB cache (negative means KB)
mmap_size: 268435456, # 256MB memory-mapped I/O
temp_store: 'MEMORY', # Store temp tables in memory
synchronous: 'NORMAL' # Faster than FULL, safe with WAL
}
})
Disable WAL Mode
# Disable WAL mode if needed (e.g., for NFS or network filesystems)
sqlite = DWH.create(:sqlite, {
file: '/path/to/my/database.sqlite',
enable_wal: false
})
Advanced Configuration
sqlite = DWH.create(:sqlite, {
file: '/path/to/analytics.sqlite',
readonly: false,
enable_wal: true, # Default: true
timeout: 10000, # 10 second busy timeout
pragmas: {
journal_mode: 'WAL', # Explicitly set WAL (done by default)
cache_size: -256000, # 256MB cache
page_size: 8192, # Larger page size for analytics
mmap_size: 536870912, # 512MB memory-mapped I/O
temp_store: 'MEMORY', # Keep temp data in memory
synchronous: 'NORMAL', # Balance between safety and speed
locking_mode: 'NORMAL' # Allow multiple connections
}
})
Multiple Connections
Unlike DuckDB, SQLite allows multiple independent connections to the same database file:
# Multiple readers/writers to the same file
reader = DWH.create(:sqlite, { file: '/path/to/data.sqlite', readonly: true })
writer = DWH.create(:sqlite, { file: '/path/to/data.sqlite' })
# Both can operate concurrently with WAL mode enabled
data = reader.execute('SELECT * FROM sales')
writer.execute('INSERT INTO sales VALUES (...)')
Trino Adapter
The Trino adapter requires the trino-client-ruby gem and works with both Trino and Presto.
Basic Configuration
trino = DWH.create(:trino, {
host: 'localhost',
port: 8080, # Default: 8080
catalog: 'hive', # Required
schema: 'default', # Optional
username: 'analyst',
password: 'password', # Optional
client_name: 'My Application' # Default: 'DWH Ruby Gem'
})
SSL Configuration
trino = DWH.create(:trino, {
host: 'trino.example.com',
port: 443,
ssl: true, # will set {ssl: {verify: false}}
catalog: 'hive',
username: 'analyst',
password: 'password',
client_name: "My App"
})
Advanced Configuration with Headers
trino = DWH.create(:trino, {
host: 'trino.example.com',
port: 8080,
catalog: 'delta_lake',
schema: 'analytics',
username: 'analyst',
query_timeout: 1800, # seconds, default: 3600
extra_connection_params: {
http_headers: {
'X-Trino-User' => 'Real User Name',
'X-Trino-Source' => 'Analytics Dashboard',
'X-Forwarded-Request' => 'client-request-id'
},
ssl: {
verify: true,
}
}
})
Apache Druid Adapter
The Druid adapter uses HTTP API calls via the faraday gem for real-time analytics.
Basic Configuration
druid = DWH.create(:druid, {
protocol: 'http', # 'http' or 'https'
host: 'localhost',
port: 8080, # Default: 8081
client_name: 'My Application' # Default: 'DWH Ruby Gem'
})
HTTPS with Basic Authentication
druid = DWH.create(:druid, {
protocol: 'https',
host: 'druid.example.com',
port: 443,
basic_auth: 'base64_encoded_credentials', # Base64 encoded username:password
query_timeout: 600, # seconds, default: 600
open_timeout: 30 # connection timeout, default: nil
})
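The basic_auth value is the Base64 encoding of username:password. A small sketch using Ruby's standard library (the credentials are placeholders):
require 'base64'
# Encode 'username:password' for the basic_auth option.
credentials = Base64.strict_encode64('user:pass') # => "dXNlcjpwYXNz"
druid = DWH.create(:druid, {
protocol: 'https',
host: 'druid.example.com',
port: 443,
basic_auth: credentials
})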
Advanced Configuration with Context
druid = DWH.create(:druid, {
protocol: 'https',
host: 'druid.example.com',
port: 8080,
basic_auth: 'dXNlcjpwYXNz', # base64 for 'user:pass'
extra_connection_params: {
context: {
user: 'analyst_name',
team: 'data_engineering',
priority: 10,
useCache: true
}
}
})
AWS Athena Adapter
The Athena adapter requires the aws-sdk-athena gem and runs queries on AWS Athena, whose engine is based on Trino/Presto.
Basic Configuration
athena = DWH.create(:athena, {
region: 'us-east-1',
database: 'default', # Optional. Db or schema
s3_output_location: 's3://my-athena-results-bucket/queries/',
access_key_id: 'AKIAIOSFODNN7EXAMPLE',
secret_access_key: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
catalog: 'hive', # Optional. Defaults to awsdatacatalog
workgroup: 'my-dept-strata' # Optional workgroup
})
SSL Configuration
athena = DWH.create(:athena, {
region: 'us-east-1',
database: 'default', # Optional. Db or schema
s3_output_location: 's3://my-athena-results-bucket/queries/',
access_key_id: 'AKIAIOSFODNN7EXAMPLE',
secret_access_key: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
catalog: 'hive', # Optional. Defaults to awsdatacatalog
workgroup: 'my-dept-strata', # Optional workgroup
extra_connection_params: {
ssl_ca_directory: 'path/to/certs/'
}
})
Advanced Configuration
For the full list of configuration options, see the aws-sdk-athena client documentation.
Configuration Validation
DWH validates configuration parameters at creation time:
begin
adapter = DWH.create(:postgres, { host: 'localhost' }) # Missing required database
rescue DWH::ConfigError => e
puts "Configuration error: #{e.}"
end
Each adapter defines required and optional parameters with validation rules. Check the adapter-specific sections above for the complete list of supported parameters.