Master MySQL Replication on Linux — A Quick, Step‑by‑Step Guide
Ready to scale reads, enable failover, and simplify backups? This quick, step‑by‑step guide to MySQL replication on Linux walks you through core concepts, asynchronous setup, GTIDs, and practical tips to get reliable replicas running fast.
Setting up MySQL replication on a Linux VPS is a powerful way to improve read scalability, enable failover strategies, and create reliable backups. This guide walks you through the essential concepts, a practical step‑by‑step setup for asynchronous replication, real‑world application scenarios, tradeoffs compared to alternatives, and hardware/software selection advice. It assumes you have root or sudo access to modern Linux distributions (Ubuntu/CentOS) and MySQL 5.7+ or MariaDB 10.x.
Why use MySQL replication?
MySQL replication copies data from one server (the primary) to one or more secondaries. It is commonly used for:
- Distributing read traffic across multiple nodes to reduce load on the primary.
- Offloading analytical queries and reporting to avoid affecting transactional performance.
- Implementing geographical read replicas for lower latency in remote regions.
- Providing warm standbys for disaster recovery and facilitating point‑in‑time restores.
The simplest and most widely used mode is asynchronous replication, where the primary does not wait for replicas to acknowledge writes. This yields better performance at the cost of possible replication lag. Newer features like semi‑sync and Group Replication address consistency and failover but require more configuration and resource planning.
How replication works: core components and flow
At a high level, replication relies on the following:
- Binary Log (binlog) on the primary: a sequential record of data changes (events).
- IO Thread on the replica: connects to the primary and fetches binlog events to a local relay log.
- SQL Thread on the replica: reads the relay log and applies events to the replica’s data files.
- GTID (Global Transaction ID): optional but recommended for easier failover and automated recovery.
Replication can be configured using either position‑based (filename + offset) or GTID‑based tracking. GTIDs simplify reconfiguration because each transaction has a unique ID, eliminating many manual steps during failover.
Prerequisites and planning
Before starting, ensure:
- Both servers run compatible MySQL/MariaDB versions.
- Network connectivity between primary and replica (port 3306 by default) with proper firewall rules.
- Sufficient disk I/O and storage on the replica; continuous applying of events may require similar performance to the primary for data‑heavy workloads.
- A strategy for backups and binlog retention: retain binlogs long enough to allow new replicas to catch up or to perform PITR (point‑in‑time recovery).
Decide on GTID vs. file/position replication. For most modern deployments, GTID replication is recommended because it streamlines promotion and resynchronization.
Step‑by‑step: Configure asynchronous replication (GTID approach)
1. Prepare the primary
Edit the MySQL configuration (e.g., /etc/mysql/my.cnf or /etc/my.cnf) and set these key options:
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
expire_logs_days = 7
Notes:
- server‑id must be unique per instance.
- binlog_format ROW ensures minimal ambiguity when applying changes on replicas; it’s preferred for data integrity.
- enforce_gtid_consistency prevents non‑GTID‑safe statements.
Restart MySQL: sudo systemctl restart mysql (or mysqld). Create a replication user:
CREATE USER 'repl'@'%' IDENTIFIED BY 'strongpassword';
GRANT REPLICATION SLAVE ON . TO 'repl'@'%';
FLUSH PRIVILEGES;
2. Take a consistent snapshot
You must start the replica from a snapshot aligned with the primary’s GTID state. Two options:
- Use logical dumps:
mysqldump --single-transaction --master-data=2 --set-gtid-purged=ON --all-databases > dump.sql - Use filesystem snapshot (LVM/ZFS) for large datasets to avoid dump time.
If using mysqldump, transfer dump.sql to the replica and import: mysql < dump.sql.
3. Configure the replica
On the replica, set unique server id and enable GTID in MySQL config:
server-id = 2
gtid_mode = ON
enforce_gtid_consistency = ON
relay_log = /var/log/mysql/mysql-relay-bin
Restart MySQL. Then configure replication source (note newer MySQL uses CHANGE MASTER TO … FOR CHANNEL):
CHANGE MASTER TO
MASTER_HOST='primary_ip', MASTER_USER='repl', MASTER_PASSWORD='strongpassword', MASTER_AUTO_POSITION=1;
Start replication threads:
START SLAVE; or on modern MySQL START REPLICA;
Verify with:
SHOW SLAVE STATUSG
Key fields:
- Slave_IO_Running and Slave_SQL_Running should be YES.
- Seconds_Behind_Master indicates lag (NULL may mean idle SQL thread).
- GTID_EXECUTED / GTID_PURGED show transaction sets applied.
Operational best practices
Monitoring and metrics
- Monitor replication state: use
SHOW SLAVE STATUSand build alerts on IO/SQL thread failures and high Seconds_Behind_Master. - Collect performance metrics via Prometheus exporters (mysqld_exporter) to track binlog size, replica lag, and disk I/O.
- Log and alert when relay logs grow unexpectedly—could indicate halted SQL thread or heavy rollback operations.
Handling failover
For manual failover with GTIDs:
- Promote a replica by stopping replication (
STOP SLAVE), ensuring it has applied all GTIDs, and then reconfiguring application endpoints to point to the new primary. - When reintroducing the old primary, it must be reinitialized as a replica or have its GTIDs reconciled (often easiest to reclone from the current primary).
Automated failover tools (MHA, Orchestrator) simplify this workflow by detecting primary failure, promoting an optimal replica, and replaying lost transactions as needed. Consider these for production HA.
Security and networking
- Encrypt replication traffic (enable SSL/TLS for replication user) or run replication over private networks/VPCs.
- Use strong passwords and limit the replication user’s host to known replica IPs when possible.
- Apply OS hardening: firewall rules, fail2ban, and least privilege sudo for management users.
Common issues and troubleshooting
Replica stuck or IO thread not running
Check connectivity (telnet primary 3306), verify replication user credentials, and inspect error logs (/var/log/mysql/error.log). If IO thread shows access denied, recheck GRANTs and host field for the replication user.
SQL thread errors (duplicate key, foreign key violations)
These often come from inconsistent data between primary snapshot and replica. The options are:
- Fix the offending row then
START SLAVE. - Skip the transaction with
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;(dangerous—can cause data divergence). - Recreate the replica from a new consistent snapshot.
High replication lag
Investigate heavy writes on primary, slow queries on replica, or insufficient IO. Solutions include moving replicas to faster disks (NVMe), tuning MySQL buffer pools and disk write parameters, or deploying read‑write splitting on the application side to reduce contention.
Comparisons: replication vs clustering vs logical replication
Understand tradeoffs:
- Asynchronous replication: simple, efficient for read scaling, but potential for data loss on failover.
- Semi‑synchronous replication: primary waits for at least one replica to acknowledge, reducing data loss risk at the expense of write latency.
- Group Replication / Galera / InnoDB Cluster: provide multi‑master or virtually synchronous replication and automated conflict resolution, but are more complex and sensitive to network partitioning and latency.
- Logical replication (e.g., MySQL binlog→Kafka or Debezium): useful for CDC (change data capture) and integrating with other systems; complementary rather than replacing row‑based replication for HA/read scaling.
For most web applications that need straightforward read scaling and backups, asynchronous GTID replication on reliable VPS instances is a pragmatic and proven choice.
Hardware and hosting considerations
When selecting VPS instances for primary and replicas, consider:
- CPU: MySQL benefits from single‑threaded performance for the SQL thread; choose modern cores with good clock speeds.
- Memory: allocate adequate InnoDB buffer pool (typically 60–80% of RAM on dedicated DB hosts).
- Storage: prioritize low latency and high IOPS (SSD/NVMe) for both primary and replicas; separate data and log volumes when possible.
- Network: use private networking between instances for replication traffic; ensure consistent bandwidth and low jitter for minimal lag.
For projects on VPS, choose providers that offer predictable CPU and fast NVMe storage. If you need US‑based low‑latency replicas, consider hosting options in data centers located near your user base.
Summary
MySQL replication remains a versatile tool for scaling reads, supporting disaster recovery, and enabling reporting/offloading. For most deployments, enabling GTID‑based asynchronous replication with row‑based binlog format provides a robust balance of performance and manageability. Pay attention to consistent snapshots, monitoring, and secure network configuration. For production, plan for failover (manual or automated), ensure your VPS selection supports low latency I/O, and keep an eye on replica lag and error logs.
For developers and site operators who want reliable VPS instances to host their MySQL primary and replicas, consider USA VPS from VPS.DO — fast NVMe storage, private networking options, and scalable plans suitable for database workloads. Learn more: USA VPS by VPS.DO.