2026-03-07 23:10:41 +02:00
# Rolling Update Deployment Guide
This guide explains how to perform zero-downtime deployments using the rolling update strategy.
## Overview
The rolling update approach allows you to deploy new backend code without any downtime for users. Here's how it works:
1. **Build** new backend image while old container is still running
2. **Start** new container on port 8082 (old one stays on 8080)
3. **Health check** new container to ensure it's ready
4. **Switch** Nginx to point to new container (zero downtime)
5. **Stop** old container after grace period
## Architecture
```
┌─────────────┐
│    Nginx    │  (Port 80/443)
│   (Host)    │
└──────┬──────┘
       │
       ├───> Backend (Port 8080) - Primary
       └───> Backend-New (Port 8082) - Standby (during deployment)
```
## Prerequisites
1. **Nginx running on host** (not in Docker)
2. **Backend containers** managed by Docker Compose
3. **Health check endpoint** available at `/actuator/health/readiness`
4. **Sufficient memory** for two backend containers during deployment (~24GB)
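
The memory prerequisite can be checked before a deployment starts. The following is a minimal sketch, assuming a Linux host where `/proc/meminfo` reports `MemAvailable` in kB. The helper names `mem_available_gib` and `has_memory_headroom` are hypothetical and are not part of `rolling-update.sh`.

```bash
# Print available memory in whole GiB, read from a meminfo-style file.
# Hypothetical helper; reads /proc/meminfo unless another file is given.
mem_available_gib() {
  local meminfo="${1:-/proc/meminfo}"
  # MemAvailable is reported in kB; 1 GiB = 1048576 kB.
  awk '/^MemAvailable:/ { print int($2 / 1048576); found = 1 }
       END { exit !found }' "$meminfo"
}

# Return 0 if at least REQUIRED_GIB are available, non-zero otherwise.
# Usage: has_memory_headroom REQUIRED_GIB [MEMINFO_FILE]
has_memory_headroom() {
  local required_gib="$1" meminfo="${2:-/proc/meminfo}" available
  available="$(mem_available_gib "$meminfo")" || return 2
  [ "$available" -ge "$required_gib" ]
}
```

A call such as `has_memory_headroom 12 || echo "Not enough free memory for a second container"` could run before the build step; the threshold of 12 is only an example and should match the heap size configured for one backend container.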
## Quick Start
### 1. Make Script Executable
```bash
cd /opt/app/backend/lottery-be
chmod +x scripts/rolling-update.sh
```
### 2. Run Deployment
```bash
# Load database password (if not already set)
source scripts/load-db-password.sh
# Run rolling update
sudo ./scripts/rolling-update.sh
```
That's it! The script handles everything automatically.
## What the Script Does
1. **Checks prerequisites**:
   - Verifies Docker and Nginx are available
   - Ensures primary backend is running
   - Loads database password
2. **Builds new image**:
   - Builds the `backend-new` service
   - Uses Docker Compose build cache for speed
3. **Starts new container**:
   - Starts `lottery-backend-new` on port 8082
   - Waits for container initialization
4. **Health checks**:
   - Checks the `/actuator/health/readiness` endpoint
   - Retries up to 30 times, 2 seconds apart (60 seconds total)
   - Fails the deployment if the health check doesn't pass
5. **Updates Nginx**:
   - Backs up current Nginx config
   - Updates upstream to point to port 8082
   - Sets old backend (8080) as backup
   - Tests Nginx configuration
6. **Reloads Nginx**:
   - Uses `systemctl reload nginx` (zero downtime)
   - New connections go to the new backend; requests already in flight finish on the old Nginx workers
7. **Stops old container**:
   - Waits for a 10-second grace period
   - Stops old backend container
   - The stopped container can be removed or kept for rollback
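
The prerequisite checks in step 1 could look roughly like the sketch below. It is an illustration, not the script's actual code: the function names `require_cmds` and `container_running` are hypothetical, and the container name `lottery-backend` is taken from the commands elsewhere in this guide.

```bash
# Return 0 only if every named command is on PATH; report the missing ones.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 if a container with exactly this name is currently running.
# The name filter is a regex, so it is anchored to avoid matching
# lottery-backend-new when asking for lottery-backend.
container_running() {
  [ -n "$(docker ps --quiet --filter "name=^$1\$" --filter "status=running")" ]
}
```

For example, `require_cmds docker nginx curl && container_running lottery-backend` would succeed only when the tools are installed and the primary backend is up.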
## Manual Steps (If Needed)
If you prefer to do it manually or need to troubleshoot:
### Step 1: Build New Image
```bash
cd /opt/app/backend/lottery-be
source scripts/load-db-password.sh
docker-compose -f docker-compose.prod.yml --profile rolling-update build backend-new
```
### Step 2: Start New Container
```bash
docker-compose -f docker-compose.prod.yml --profile rolling-update up -d backend-new
```
### Step 3: Health Check
```bash
# Wait for container to be ready
sleep 10
# Check health
curl http://127.0.0.1:8082/actuator/health/readiness
# Check logs
docker logs lottery-backend-new
```
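
Instead of a fixed `sleep 10`, the readiness endpoint can be polled until it answers. The sketch below shows one way to do that with the retry count and interval described in this guide; `wait_for_ready` is a hypothetical helper name, and the 5-second per-request timeout is an assumption.

```bash
# Poll a readiness URL until it returns a successful HTTP status
# or the retries run out.
# Usage: wait_for_ready URL [RETRIES] [INTERVAL_SECONDS]
wait_for_ready() {
  local url="$1" retries="${2:-30}" interval="${3:-2}" attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl --fail --silent --max-time 5 "$url" >/dev/null; then
      echo "Ready after $attempt attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "Not ready after $retries attempt(s)" >&2
  return 1
}
```

Usage: `wait_for_ready http://127.0.0.1:8082/actuator/health/readiness 30 2`. A non-zero exit status means the new container should not receive traffic.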
### Step 4: Update Nginx
```bash
# Backup config
sudo cp /etc/nginx/conf.d/lottery.conf /etc/nginx/conf.d/lottery.conf.backup
# Edit config
sudo nano /etc/nginx/conf.d/lottery.conf
```
Change upstream from:
```nginx
upstream lottery_backend {
server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
}
```
To:
```nginx
upstream lottery_backend {
server 127.0.0.1:8082 max_fails=3 fail_timeout=30s;
server 127.0.0.1:8080 backup;
}
```
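
The same edit can be scripted rather than made in an editor. The following sketch uses `awk` and assumes the upstream contains the single-server form shown above; `switch_upstream` is a hypothetical helper, not necessarily how `rolling-update.sh` performs the change.

```bash
# Rewrite the upstream so NEW_PORT is primary and OLD_PORT becomes the backup.
# Fails without touching the file if no matching server line is found.
# Usage: switch_upstream CONF_FILE OLD_PORT NEW_PORT
switch_upstream() {
  local conf="$1" old_port="$2" new_port="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v old="127.0.0.1:$old_port" -v new="127.0.0.1:$new_port" '
    # Match only the primary server line; all other lines pass through.
    $1 == "server" && $2 == old && $0 !~ /backup/ {
      match($0, /^[ \t]*/)
      indent = substr($0, 1, RLENGTH)
      print indent "server " new " max_fails=3 fail_timeout=30s;"
      print indent "server " old " backup;"
      changed = 1
      next
    }
    { print }
    END { exit !changed }
  ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Run it as a user that can write the config (for example from a root shell), as `switch_upstream /etc/nginx/conf.d/lottery.conf 8080 8082`, and always follow it with `nginx -t` before reloading.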
### Step 5: Reload Nginx
```bash
# Test config
sudo nginx -t
# Reload (zero downtime)
sudo systemctl reload nginx
```
### Step 6: Stop Old Container
```bash
# Wait for active connections to finish
sleep 10
# Stop old container
docker-compose -f docker-compose.prod.yml stop backend
```
## Rollback Procedure
If something goes wrong, you can quickly roll back:
### Automatic Rollback
The script automatically rolls back if:
- Health check fails
- Nginx config test fails
- Nginx reload fails
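
A common way to get this behavior in a shell script is to wrap each step so that a failure triggers the rollback. The sketch below illustrates the pattern only; `rollback` and `run_step` are hypothetical names, and the backup path is assumed to match the one used in the manual steps of this guide.

```bash
# Assumed paths; override from the environment if yours differ.
NGINX_CONF="${NGINX_CONF:-/etc/nginx/conf.d/lottery.conf}"
NGINX_BACKUP="${NGINX_BACKUP:-$NGINX_CONF.backup}"

# Restore the previous Nginx config and reload so traffic
# returns to the old backend.
rollback() {
  echo "Rolling back Nginx config" >&2
  cp "$NGINX_BACKUP" "$NGINX_CONF" && systemctl reload nginx
}

# Run one deployment step; on failure, roll back and propagate the error.
# Usage: run_step DESCRIPTION COMMAND [ARGS...]
run_step() {
  local name="$1"
  shift
  if ! "$@"; then
    echo "Step failed: $name" >&2
    rollback
    return 1
  fi
}
```

Steps can then be chained, for example `run_step "config test" nginx -t && run_step "reload" systemctl reload nginx`, so the first failure stops the chain after restoring the backup.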
### Manual Rollback
```bash
# 1. Restore Nginx config
sudo cp /etc/nginx/conf.d/lottery.conf.backup /etc/nginx/conf.d/lottery.conf
sudo systemctl reload nginx
# 2. Start old backend (if stopped)
cd /opt/app/backend/lottery-be
docker-compose -f docker-compose.prod.yml start backend
# 3. Stop new backend
docker-compose -f docker-compose.prod.yml --profile rolling-update stop backend-new
docker-compose -f docker-compose.prod.yml --profile rolling-update rm -f backend-new
```
## Configuration
### Health Check Settings
Edit `scripts/rolling-update.sh` to adjust:
```bash
HEALTH_CHECK_RETRIES=30 # Number of retries
HEALTH_CHECK_INTERVAL=2 # Seconds between retries
GRACE_PERIOD=10 # Seconds to wait before stopping old container
```
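
The first two settings together determine how long the script waits before giving up: 30 retries × 2 seconds = 60 seconds. The sketch below shows that arithmetic and one way to make the values overridable from the environment. Whether `rolling-update.sh` reads these from the environment is an assumption, and `max_health_wait` is a hypothetical helper.

```bash
# Defaults from this guide; each keeps any value already set in the environment.
: "${HEALTH_CHECK_RETRIES:=30}"
: "${HEALTH_CHECK_INTERVAL:=2}"
: "${GRACE_PERIOD:=10}"

# Longest time the health check can wait before giving up, in seconds.
max_health_wait() {
  echo $(( HEALTH_CHECK_RETRIES * HEALTH_CHECK_INTERVAL ))
}
```

If the application needs longer than this to start, raise the retry count rather than the interval so that a fast start is still detected quickly.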
### Nginx Upstream Settings
Edit `/etc/nginx/conf.d/lottery.conf`:
```nginx
upstream lottery_backend {
server 127.0.0.1:8082 max_fails=3 fail_timeout=30s;
server 127.0.0.1:8080 backup; # Old backend as backup
keepalive 32;
}
```
## Monitoring
### During Deployment
```bash
# Watch container status
watch -n 1 'docker ps | grep lottery-backend'
# Monitor new backend logs
docker logs -f lottery-backend-new
# Check Nginx access logs
sudo tail -f /var/log/nginx/access.log
# Monitor memory usage
free -h
docker stats --no-stream
```
### After Deployment
```bash
# Verify new backend is serving traffic
curl http://localhost/api/health
# Check container status
docker ps | grep lottery-backend
# Verify the actuator health endpoint is reachable through Nginx
curl http://localhost/actuator/health
```
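
These checks can be made scriptable by testing the response body instead of reading it by eye. The sketch below assumes the endpoint returns Spring Boot Actuator's compact JSON with `status` as the first key; `health_is_up` is a hypothetical helper, and a JSON parser such as `jq` would be more robust if it is installed.

```bash
# Return 0 if the JSON on stdin reports a top-level status of UP.
# The match is anchored to the start of the body so that a nested
# component reporting UP does not hide an overall DOWN.
health_is_up() {
  grep -q '^{"status":"UP"'
}
```

Usage: `curl -fsS http://localhost/actuator/health | health_is_up && echo "Backend is UP"`.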
## Troubleshooting
### Health Check Fails
```bash
# Check new container logs
docker logs lottery-backend-new
# Check if container is running
docker ps | grep lottery-backend-new
# Test health endpoint directly
curl -v http://127.0.0.1:8082/actuator/health/readiness
# Check overall health from inside the container (a failed database connection shows up as status DOWN)
docker exec lottery-backend-new wget -q -O- http://localhost:8080/actuator/health
```
### Nginx Reload Fails
```bash
# Test Nginx config
sudo nginx -t
# Check Nginx error logs
sudo tail -f /var/log/nginx/error.log
# Verify upstream syntax
sudo nginx -T | grep -A 5 upstream
```
### Memory Issues
If you run out of memory during deployment:
```bash
# Check memory usage
free -h
docker stats --no-stream
# Option 1: Reduce heap size temporarily
# Edit docker-compose.prod.yml, change JAVA_OPTS to use 8GB heap
# Option 2: Stop other services temporarily
docker stop lottery-phpmyadmin # If not needed
```
### Old Container Won't Stop
```bash
# Stop the container (SIGTERM, then SIGKILL after the default 10-second timeout)
docker stop lottery-backend
# If still running, kill it
docker kill lottery-backend
# Remove container
docker rm lottery-backend
```
## Best Practices
1. **Test in staging first** - Always test the deployment process in a staging environment
2. **Monitor during deployment** - Watch logs and metrics during the first few deployments
3. **Keep backups** - The script automatically backs up Nginx config, but keep your own backups too
4. **Database migrations** - Ensure migrations are backward compatible or run them separately
5. **Low-traffic windows** - For major changes, deploy during low-traffic periods
6. **Health checks** - Ensure your health check endpoint properly validates all dependencies
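7. **Graceful shutdown** - Spring Boot graceful shutdown (30s) lets active requests finish; the container stop timeout must be at least as long, since Docker sends SIGKILL after 10 seconds by default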
## Performance Considerations
- **Build time**: The first build takes longer; subsequent builds use the cache
- **Memory**: Two containers use ~24 GB in total for the brief period when both are running
- **Network**: No network interruption; Nginx switches upstreams on reload without dropping connections
- **Database**: No impact; both containers share the same database
## Security Notes
- New container uses same secrets and configuration as old one
- No exposure of new port to internet (only localhost)
- Nginx handles all external traffic
- Health checks are internal only
## Next Steps
After successful deployment:
1. ✅ Monitor new backend for errors
2. ✅ Verify all endpoints are working
3. ✅ Check application logs
4. ✅ Remove old, untagged images (optional): `docker image prune`
## Support
If you encounter issues:
1. Check logs: `docker logs lottery-backend-new`
2. Check Nginx: `sudo nginx -t && sudo tail -f /var/log/nginx/error.log`
3. Rollback if needed (see Rollback Procedure above)
4. Review this guide's Troubleshooting section