Migrating from MSSQL to Postgres: A Practical Guide
1) Summary (what, why)
- Move schema, data, and application SQL from Microsoft SQL Server (T-SQL / MSSQL) to PostgreSQL (Postgres) to reduce licensing cost, gain open‑source flexibility, or standardize on PostgreSQL features.
- Main challenges: data type differences, T-SQL → PL/pgSQL conversion, identity/sequence handling, index/constraint semantics, transaction/locking behavior, and minimizing downtime.
2) High-level steps
- Inventory and assessment
  - Catalog databases, tables, views, stored procedures, triggers, jobs, ETL pipelines, external dependencies, replication, and client apps.
  - Capture data sizes, row counts, growth rates, peak load, and query performance baselines.
- Schema conversion
  - Convert DDL: tables, columns, types, constraints, indexes, sequences (IDENTITY → GENERATED/SEQUENCE).
  - Map data types (common mappings below) and remove SQL Server-specific keywords (e.g., GO, FILEGROUP, WITH NOCHECK).
  - Convert clustered indexes conceptually (Postgres has no clustered indexes; consider the CLUSTER command for one-time physical ordering, or BRIN/GiST indexes where appropriate).
- Code conversion
  - Translate T-SQL stored procedures, functions, triggers, and batch scripts to PL/pgSQL or other supported languages.
  - Replace unsupported constructs (MERGE → INSERT … ON CONFLICT or UPDATE … FROM on Postgres versions before 15; native MERGE is available from PostgreSQL 15 on; TRY/CATCH → EXCEPTION blocks).
  - Rework system/catalog queries and metadata access (sys.* views → information_schema or pg_catalog).
- Data migration
  - Choose a method: pgloader, AWS DMS, bulk CSV export + COPY, or commercial ETL/replication tools. Use parallel loading for large data sets.
  - Handle encoding, newline differences (CRLF → LF), and escape sequences. Clean embedded newlines in text fields if using COPY.
  - Preserve transactional consistency; for near-zero downtime, use CDC (Debezium, AWS DMS, or vendor tools) to replicate changes.
- Testing and validation
  - Schema validation, data checksum comparison (row counts, column checksums), and functional tests for queries and apps.
  - Performance testing: EXPLAIN plans, indexes, VACUUM/ANALYZE, and query adjustments (the two optimizers behave differently).
- Cutover and rollback plan
  - Plan a downtime window or a phased cutover using replication/dual-write strategies.
  - Test rollback steps and backups; snapshot the source before the final migration.
- Post-migration
  - Run ANALYZE, tune autovacuum, configure monitoring and alerts, and review security roles and permissions.
  - Retrain ops/dev on Postgres features (extensions, backup/restore, WAL, replicas).
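To illustrate the code-conversion step, here is a sketch of a common T-SQL error-handling pattern and a PL/pgSQL equivalent. The table names (accounts, error_log) and function name are hypothetical, chosen only for the example:

```sql
-- T-SQL (SQL Server) original:
-- BEGIN TRY
--     UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- END TRY
-- BEGIN CATCH
--     INSERT INTO error_log(message) VALUES (ERROR_MESSAGE());
-- END CATCH

-- PL/pgSQL equivalent: TRY/CATCH becomes an EXCEPTION block,
-- and ERROR_MESSAGE() becomes the SQLERRM variable.
CREATE OR REPLACE FUNCTION debit_account(p_id int, p_amount numeric)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE accounts SET balance = balance - p_amount WHERE id = p_id;
EXCEPTION
    WHEN OTHERS THEN
        INSERT INTO error_log(message) VALUES (SQLERRM);
END;
$$;
```

Note that an EXCEPTION block in PL/pgSQL rolls back the statements inside its own block before running the handler, which is stricter than T-SQL's CATCH semantics; procedures that rely on partial work surviving an error need restructuring.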
3) Common data type mappings (quick reference)
- INT / BIGINT → integer / bigint
- SMALLINT / TINYINT → smallint / smallint (Postgres has no tinyint; smallint covers tinyint's 0–255 range)
- BIT → boolean
- VARCHAR(n) / NVARCHAR(n) → varchar(n) (n counts characters in both; Postgres stores text in the database encoding, typically UTF-8, so a separate "national" type is unnecessary; use text for unlimited length)
- TEXT / NTEXT → text
- DATETIME / DATETIME2 → timestamp (or timestamptz if time zone handling is needed; note timestamp has microsecond precision, while DATETIME is only accurate to about 3.33 ms)
- SMALLDATETIME → timestamp
- DATE → date
- TIME → time
- UNIQUEIDENTIFIER → uuid
- MONEY / SMALLMONEY → numeric(19,4) or decimal
- BINARY / VARBINARY / IMAGE → bytea
- XML → xml (Postgres has xml type)
- SQL_VARIANT → often requires redesign (no direct equivalent)
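The mappings above, applied to a small hypothetical table for illustration:

```sql
-- SQL Server original:
-- CREATE TABLE dbo.orders (
--     OrderId  uniqueidentifier PRIMARY KEY,
--     IsPaid   bit NOT NULL,
--     Amount   money,
--     Note     nvarchar(max),
--     PlacedAt datetime2
-- );

-- Postgres translation (nvarchar(max) maps to text):
CREATE TABLE orders (
    orderid  uuid PRIMARY KEY,
    ispaid   boolean NOT NULL,
    amount   numeric(19,4),
    note     text,
    placedat timestamp
);
```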
4) Tools (when to use)
- Schema conversion: AWS SCT, sqlserver2pgsql (Dalibo), Ispirer, SQLWays. (Note: Microsoft SSMA migrates into SQL Server, not out of it, and ora2pg targets Oracle sources.)
- Data copy / initial load: pgloader, bulk CSV export (e.g., bcp) + COPY; pg_dump/psql applies only when the source is already Postgres.
- CDC / minimal downtime: AWS DMS, Debezium (Kafka), Striim, commercial CDC vendors.
- All-in-one migrations: DBConvert, Full Convert, ESF Database Migration Toolkit for GUI-driven processes.
- Babelfish for PostgreSQL: run some T-SQL apps with less rewrite (supports TDS protocol and many T-SQL constructs) — useful when minimizing application changes.
- Testing & validation: custom checksum scripts, pgbench for load testing, EXPLAIN/EXPLAIN ANALYZE.
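One way to implement the custom checksum scripts mentioned above: compare a per-table row count and an aggregate hash computed the same way on both sides. This is a sketch; the table and key column names are illustrative, the ORDER BY key must be a stable unique column, and the SQL Server side needs an equivalent built with HASHBYTES:

```sql
-- Postgres side: hash each row's text form, concatenate in key order,
-- and hash the result. Compare row_count and table_checksum across systems.
SELECT count(*)                                                AS row_count,
       md5(string_agg(md5(c::text), '' ORDER BY c.customerid)) AS table_checksum
FROM customers c;
```

Because the checksum depends on the textual rendering of each row, make sure datetime and numeric formatting match on both sides before trusting a mismatch.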
5) Practical tips & gotchas
- Identity and sequences: convert IDENTITY columns to GENERATED AS IDENTITY or separate sequences; ensure sequence values are set to max(id)+1.
- Collations and case sensitivity: SQL Server collations are typically case-insensitive, while Postgres comparisons are case-sensitive by default; test string sorting and comparisons (citext or ICU collations can help).
- NULL and sentinel dates: SQL Server has no "zero date", but legacy data often uses sentinel values (e.g., 1900-01-01 or DATETIME's minimum, 1753-01-01); define casts and cleansing rules during load.
- Indexes: PostgreSQL has no clustered indexes; evaluate index types (B-tree, GIN, GiST, BRIN) and consider expression indexes and index-only-scan strategies.
- Transactions & locking: long-running transactions prevent VACUUM from reclaiming dead rows and cause table bloat; migrate and backfill in short batches.
- Stored procedures: many T-SQL constructs and system functions must be rewritten; automation will cover ~70–80% but plan manual fixes for the rest.
- Permissions: convert roles/users to Postgres roles and map privileges carefully.
- Extensions: consider Postgres extensions (pgcrypto, citext, postgis, pg_partman) to replace or improve SQL Server functionality.
- Monitoring: configure logging, pg_stat_statements, autovacuum tuning, and regular backups.
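For the identity/sequence gotcha above, a minimal sequence reset after loading data with explicit ids (table and column names are illustrative):

```sql
-- Point the identity sequence past the highest loaded id, so the next
-- generated value does not collide with migrated rows.
SELECT setval(
    pg_get_serial_sequence('customers', 'customerid'),
    COALESCE((SELECT max(customerid) FROM customers), 0) + 1,
    false  -- is_called = false: the next nextval() returns exactly this value
);
```

Run this once per identity column after the bulk load, before applications start inserting.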
6) Minimal example: convert simple table + data (conceptual)
MSSQL:

```sql
CREATE TABLE dbo.customers (
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    Name       nvarchar(100),
    CreatedAt  datetime
);
```

Postgres:

```sql
CREATE TABLE customers (
    customerid integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name       varchar(100),
    createdat  timestamp
);
```

Data load: export CSV from SQL Server, then:

```sql
COPY customers (customerid, name, createdat)
FROM '/path/customers.csv'
WITH (FORMAT csv, HEADER true);
```
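As a companion, a common code-conversion case against the same customers table: a T-SQL MERGE upsert rewritten for Postgres versions before 15 (from 15 on, Postgres also supports MERGE natively). The staging_customers table name is hypothetical:

```sql
-- T-SQL (SQL Server) original:
-- MERGE customers AS t
-- USING staging_customers AS s ON t.CustomerID = s.CustomerID
-- WHEN MATCHED THEN UPDATE SET Name = s.Name
-- WHEN NOT MATCHED THEN
--     INSERT (CustomerID, Name, CreatedAt)
--     VALUES (s.CustomerID, s.Name, s.CreatedAt);

-- Postgres equivalent (works on any supported version):
INSERT INTO customers (customerid, name, createdat)
SELECT customerid, name, createdat
FROM staging_customers
ON CONFLICT (customerid)
DO UPDATE SET name = EXCLUDED.name;
```

ON CONFLICT requires a unique or primary-key constraint on the conflict target column, which the primary key on customerid provides here.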
7) Recommended checklist before cutover
- Inventory complete and stakeholders notified
- Automated schema conversion run + manual fixes applied
- Data type and index review complete
- Initial full data load verified via checksums
- CDC tested and lag acceptable (if used)
- All application queries and stored procs validated
- Performance baselines established and tuned
- Backup and rollback validated
If you want, I can generate a tailored migration checklist or a one-week migration plan based on an assumed medium-sized (200 GB) OLTP database.