Migrating from MSSQL to Postgres: A Practical Guide

1) Summary (what, why)

  • Move schema, data, and application SQL from Microsoft SQL Server (T-SQL / MSSQL) to PostgreSQL (Postgres) to reduce licensing cost, gain open‑source flexibility, or standardize on PostgreSQL features.
  • Main challenges: data type differences, T-SQL → PL/pgSQL conversion, identity/sequence handling, index/constraint semantics, transaction/locking behavior, and minimizing downtime.

2) High-level steps

  1. Inventory and assessment
    • Catalog databases, tables, views, stored procs, triggers, jobs, ETL pipelines, external dependencies, replication, and client apps.
    • Capture data sizes, row counts, growth rates, peak load and query performance baselines.
  2. Schema conversion
    • Convert DDL: tables, columns, types, constraints, indexes, sequences (IDENTITY → GENERATED/SEQUENCE).
    • Map data types (common mappings below) and remove SQL Server-specific keywords (e.g., GO, FILEGROUP, WITH NOCHECK).
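    • Rethink clustered indexes: Postgres does not maintain a clustered index. CLUSTER reorders a table once by an index and later writes do not preserve that order; BRIN indexes can help on large tables whose physical order already follows the key (for example, append-only timestamps).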
  3. Code conversion
    • Translate T-SQL stored procedures, functions, triggers, and batch scripts to PL/pgSQL or other supported languages.
    • Replace unsupported constructs (MERGE → native MERGE on PostgreSQL 15+, or INSERT … ON CONFLICT / UPDATE … FROM on older versions; TRY/CATCH → BEGIN … EXCEPTION blocks).
    • Rework system/catalog queries and metadata access.
  4. Data migration
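    • Choose method: pgLoader, AWS DMS, bcp/CSV export + COPY, or commercial ETL/replication tools. Use parallel load for large data.
    • Handle encoding (SQL Server Unicode exports are typically UTF-16, while Postgres usually expects UTF-8), newline differences (CRLF → LF), and escape sequences. COPY in CSV format accepts embedded newlines inside quoted fields; text format requires them to be escaped.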
    • Preserve transactional consistency; for zero-downtime, use CDC (Debezium, AWS DMS, or vendor tools) to replicate changes.
  5. Testing and validation
    • Schema validation, data checksum comparison (row counts, column checksums), and functional tests for queries and apps.
    • Performance testing: compare EXPLAIN ANALYZE plans against the SQL Server baselines, review indexes, run VACUUM/ANALYZE before measuring, and adjust queries where the Postgres planner behaves differently.
  6. Cutover and rollback plan
    • Plan downtime window or phased cutover using replication/dual‑write strategies.
    • Test rollback steps and backups; snapshot source before final migration.
  7. Post-migration
    • Run ANALYZE, tune autovacuum, configure monitoring and alerts, and review security roles and permissions.
    • Retrain ops/dev on Postgres features (extensions, backup/restore, WAL, replicas).
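
For the data-migration step (step 4), the CSV cleanup can be sketched in Python. This is a minimal sketch, not a complete export pipeline: it assumes the SQL Server export has already been opened as a text stream with the correct encoding, and `normalize_csv` is a hypothetical helper name. It rewrites CRLF row terminators as LF and keeps embedded newlines inside quoted fields so that COPY in CSV format can read them.

```python
import csv
import io


def normalize_csv(src, dst):
    """Copy CSV rows from text stream `src` to `dst` using LF row terminators.

    Embedded CRLF sequences inside a field become LF; the csv writer quotes
    any field that contains a newline, a comma, or a quote character.
    Returns the number of rows written (including a header row, if any).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
    rows = 0
    for row in reader:
        writer.writerow([field.replace("\r\n", "\n") for field in row])
        rows += 1
    return rows


if __name__ == "__main__":
    # newline="" keeps the original line endings intact for the csv module.
    source = io.StringIO('id,note\r\n1,"line one\r\nline two"\r\n', newline="")
    target = io.StringIO(newline="")
    print(normalize_csv(source, target), "rows")
    print(repr(target.getvalue()))
```

With real files, open the source with `open(path, encoding=..., newline="")` and the target with `encoding="utf-8", newline=""`. One caveat: in CSV format COPY treats an unquoted empty field as NULL by default, so empty strings and NULLs are indistinguishable after this rewrite unless the export marks them differently.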

3) Common data type mappings (quick reference)

  • INT / BIGINT → integer / bigint
  • SMALLINT / TINYINT → smallint / smallint (no tinyint in Postgres; add a CHECK constraint if the 0–255 range must be enforced)
  • BIT → boolean
  • VARCHAR(n) / NVARCHAR(n) → varchar(n) / varchar(n) (a UTF-8 Postgres database stores Unicode in both; map VARCHAR(MAX) / NVARCHAR(MAX) to text)
  • TEXT / NTEXT → text
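  • DATETIME / DATETIME2 → timestamp (or timestamptz if the values are absolute points in time; timestamptz normalizes to UTC and does not store the original zone). DATETIME2(7) keeps 100 ns precision, Postgres keeps microseconds.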
  • SMALLDATETIME → timestamp
  • DATE → date
  • TIME → time
  • UNIQUEIDENTIFIER → uuid
  • MONEY / SMALLMONEY → numeric(19,4) or decimal
  • BINARY / VARBINARY / IMAGE → bytea
  • XML → xml (Postgres has xml type)
  • SQL_VARIANT → often requires redesign (no direct equivalent)
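
The quick-reference list above can be turned into a small lookup for scripted DDL conversion. The following Python is a minimal sketch that covers only the types listed here; `TYPE_MAP` and `map_type` are hypothetical names, and anything outside the list (including SQL_VARIANT) raises an error so that it gets reviewed by hand.

```python
import re

# Base-type mapping taken from the quick-reference list above.
TYPE_MAP = {
    "int": "integer",
    "bigint": "bigint",
    "smallint": "smallint",
    "tinyint": "smallint",
    "bit": "boolean",
    "varchar": "varchar",
    "nvarchar": "varchar",
    "text": "text",
    "ntext": "text",
    "datetime": "timestamp",
    "datetime2": "timestamp",
    "smalldatetime": "timestamp",
    "date": "date",
    "time": "time",
    "uniqueidentifier": "uuid",
    "money": "numeric(19,4)",
    "smallmoney": "numeric(19,4)",
    "binary": "bytea",
    "varbinary": "bytea",
    "image": "bytea",
    "xml": "xml",
}

# Matches e.g. "NVARCHAR(100)", "[int]", "varbinary(max)".
_TYPE_RE = re.compile(r"^\s*\[?(\w+)\]?\s*(?:\(\s*([^)]*?)\s*\))?\s*$")


def map_type(mssql_type):
    """Return a Postgres type for a SQL Server column type string."""
    match = _TYPE_RE.match(mssql_type)
    if not match:
        raise ValueError(f"cannot parse type: {mssql_type!r}")
    name, args = match.group(1).lower(), match.group(2)
    if name not in TYPE_MAP:
        raise ValueError(f"no automatic mapping for {mssql_type!r}; review manually")
    if name in ("varchar", "nvarchar"):
        if args is None:
            # SQL Server defaults to length 1 when n is omitted in a column definition.
            return "varchar(1)"
        if args.lower() == "max":
            return "text"
        return f"varchar({args})"
    # Other length/precision arguments are dropped (bytea takes none,
    # timestamp defaults to microsecond precision).
    return TYPE_MAP[name]


if __name__ == "__main__":
    for t in ("NVARCHAR(100)", "nvarchar(max)", "TINYINT", "[uniqueidentifier]"):
        print(t, "->", map_type(t))
```

Raising on unknown types rather than guessing is a deliberate choice in this sketch: a silent fallback to text would hide the columns that need a design decision.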

4) Tools (when to use)

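  • Schema conversion: AWS SCT, ora2pg (built for Oracle/MySQL sources; check whether your version supports SQL Server), Ispirer SQLWays. Microsoft SSMA is not an option here: it converts other databases to SQL Server, not away from it.
  • Data copy / initial load: pgLoader, or bcp/bulk CSV export + COPY (pg_dump/psql only apply when the source is already Postgres).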
  • CDC / minimal downtime: AWS DMS, Debezium (Kafka), Striim, commercial CDC vendors.
  • All-in-one migrations: DBConvert, Full Convert, ESF Toolkit for GUI-driven processes.
  • Babelfish for PostgreSQL: run some T-SQL apps with less rewrite (supports the TDS protocol and many T-SQL constructs); useful when minimizing application changes.
  • Testing & validation: custom checksum scripts, pgbench for load testing, EXPLAIN/EXPLAIN ANALYZE.
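
A custom checksum script of the kind mentioned above might look like the following Python sketch. It assumes the rows of a table have already been fetched from each database as tuples through whatever drivers you use (the fetching code is not shown), and `table_fingerprint` and `_normalize` are hypothetical helper names. Row digests are summed rather than chained, so the result does not depend on the order in which each database returns rows.

```python
import hashlib
from datetime import date, datetime
from decimal import Decimal


def _normalize(value):
    """Render a value identically regardless of which driver returned it."""
    if value is None:
        return "N"  # NULL marker; real values are prefixed with "V" below
    if isinstance(value, bool):
        text = "1" if value else "0"  # BIT 1/0 and boolean true/false compare equal
    elif isinstance(value, Decimal):
        text = format(value.normalize(), "f")  # 10.50 and 10.5 compare equal
    elif isinstance(value, (datetime, date)):
        text = value.isoformat()
    elif isinstance(value, (bytes, bytearray)):
        text = bytes(value).hex()
    else:
        text = str(value)
    return "V" + text


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Each row is hashed with SHA-256 and the digests are summed modulo 2**256,
    so row order does not matter and duplicate rows still count.
    """
    count = 0
    total = 0
    for row in rows:
        encoded = "\x1f".join(_normalize(v) for v in row).encode("utf-8")
        digest = int.from_bytes(hashlib.sha256(encoded).digest(), "big")
        total = (total + digest) % (2 ** 256)
        count += 1
    return count, format(total, "064x")


if __name__ == "__main__":
    source_rows = [(1, "Ann", Decimal("10.50")), (2, "Bob", None)]
    target_rows = [(2, "Bob", None), (1, "Ann", Decimal("10.5"))]
    print(table_fingerprint(source_rows) == table_fingerprint(target_rows))
```

For large tables, compute fingerprints per key range rather than per table, so a mismatch points at a small slice of rows instead of the whole table.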

5) Practical tips & gotchas

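  • Identity and sequences: convert IDENTITY columns to GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY or to separate sequences; after loading explicit key values, advance each sequence so the next generated value is max(id)+1.
  • Collations and case sensitivity: SQL Server's default collations are usually case-insensitive, while Postgres string comparisons are case-sensitive by default; test string sorting and comparisons, and consider citext or a nondeterministic ICU collation where case-insensitive matching is required.
  • NULL and sentinel-date handling: SQL Server has no zero-date, but placeholder values such as 1900-01-01 (what an empty string or 0 becomes when cast to datetime) often stand in for missing dates; decide whether to load them as NULL, and define casts and cleansers accordingly.
  • Indexes: PostgreSQL has no clustered index; evaluate index types (btree, gin, gist) and consider expression indexes, partial indexes, and covering indexes that allow index-only scans.
  • Transactions & locking: a long-running load transaction holds locks and prevents VACUUM from removing dead rows, which leads to table bloat; load and backfill in short batches.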
  • Stored procedures: many T-SQL constructs and system functions must be rewritten; automation will cover ~70–80% but plan manual fixes for the rest.
  • Permissions: convert roles/users to Postgres roles and map privileges carefully.
  • Extensions: consider Postgres extensions (pgcrypto, citext, postgis, pg_partman) to replace or improve SQL Server functionality.
  • Monitoring: configure logging, pg_stat_statements, autovacuum tuning, and regular backups.
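
The advice to load in short batches can be sketched as a key-range generator in Python. This is a minimal sketch that assumes the table has an integer key with a known minimum and maximum; `key_batches` is a hypothetical helper name, and the actual copy statement for each range is left to whatever driver or tool you use.

```python
def key_batches(min_id, max_id, batch_size):
    """Yield inclusive (low, high) key ranges that cover min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


if __name__ == "__main__":
    for low, high in key_batches(1, 10, 4):
        # Each range would be copied and committed in its own transaction,
        # e.g. with a filter such as: WHERE id BETWEEN low AND high
        print(low, high)
```

Committing after each range keeps transactions short, and a failed batch can be retried without restarting the whole load. Sparse key spaces produce uneven batch sizes, which is usually acceptable for a one-off migration.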

6) Minimal example: convert simple table + data (conceptual)

  • MSSQL: CREATE TABLE dbo.customers (CustomerID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(100), CreatedAt datetime);
  • Postgres: CREATE TABLE customers (customerid integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, name varchar(100), createdat timestamp);
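  • Data load: export CSV from SQL Server, then: COPY customers(customerid,name,createdat) FROM '/path/customers.csv' WITH (FORMAT csv, HEADER true); (COPY reads the file on the database server; use psql's \copy to load a file from the client machine.)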
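
Loading explicit customerid values does not advance the identity sequence, so the first application insert after the load can collide with an existing key. The Python sketch below builds a statement that resets the sequence; `reset_identity_sql` is a hypothetical helper name, while `setval` and `pg_get_serial_sequence` are standard Postgres functions. Run the generated statement with psql or your driver after the load finishes.

```python
import re

_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"


def reset_identity_sql(table, column):
    """Build a statement that moves a column's sequence past the loaded keys.

    On a non-empty table the next generated value becomes MAX(column) + 1;
    on an empty table the sequence restarts at 1.
    """
    # Only plain (optionally schema-qualified) identifiers are accepted,
    # because the names are interpolated directly into the SQL text.
    if not re.fullmatch(rf"{_IDENT}(\.{_IDENT})?", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(_IDENT, column):
        raise ValueError(f"unexpected column name: {column!r}")
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"COALESCE((SELECT MAX({column}) FROM {table}), 1), "
        f"EXISTS (SELECT 1 FROM {table}));"
    )


if __name__ == "__main__":
    print(reset_identity_sql("customers", "customerid"))
```

The third argument to `setval` controls whether the stored value counts as already used, which is how the same statement handles both the empty and the non-empty case.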

7) Recommended checklist before cutover

  • Inventory complete and stakeholders notified
  • Automated schema conversion run + manual fixes applied
  • Data type and index review complete
  • Initial full data load verified via checksums
  • CDC tested and lag acceptable (if used)
  • All application queries and stored procs validated
  • Performance baselines established and tuned
  • Backup and rollback validated
