Automated Cross‑Region Backup for Azure Storage (vsftpstor)

Overview:
This project shows how I designed and implemented a production‑grade, automated cross‑region backup for the Azure Storage account vsftpstor. Using Azure Data Factory (ADF), I built a recursive, fault‑tolerant copy pipeline that replicates every container, folder, and blob to a disaster‑recovery storage account (vsftpstorbackuphns) in a different region. The solution strengthens operational resilience, supports disaster recovery requirements, and relies on Azure‑native security features: Managed Identity authentication and RBAC.


Architecture Diagram

[Diagram image not reproduced here; a text version appears near the end of this page.]

Technologies Used

  • Azure Data Factory (ADF) Pipelines
  • Azure Storage Accounts (ADLS Gen2)
  • Managed Identity (MSI) Authentication
  • RBAC (Storage Blob Data Contributor)
  • Binary Datasets & Recursive File Copy
  • Blob Soft Delete & Container Soft Delete

Implementation Steps

1. Disaster Recovery Storage Account Creation

  • Created vsftpstorbackuphns in a fully ADLS Gen2‑compatible region (West US 3).
  • Enabled Hierarchical Namespace (HNS) for ADLS Gen2 filesystem operations.
  • Enabled blob and container soft delete for additional safety.
  • Assigned Storage Blob Data Contributor to the ADF managed identity.
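The account settings above can be sketched as ARM‑style property fragments built in Python. This is only an illustration, not the deployment I actually ran: the SKU, the 7‑day soft‑delete window, and the principal ID placeholder are assumptions, since the write‑up does not state them. The property names (isHnsEnabled, deleteRetentionPolicy, containerDeleteRetentionPolicy) and the built‑in role ID follow the public Azure resource schemas.

```python
import json

# Built-in role definition ID for "Storage Blob Data Contributor".
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-e96b0029c9fe"


def dr_account_fragment(name: str, location: str) -> dict:
    """Property fragment for an ADLS Gen2 account (Hierarchical Namespace on)."""
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "name": name,
        "location": location,
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},  # assumption: redundancy is not stated
        "properties": {"isHnsEnabled": True},
    }


def soft_delete_fragment(days: int) -> dict:
    """Blob service properties that enable blob and container soft delete."""
    return {
        "deleteRetentionPolicy": {"enabled": True, "days": days},
        "containerDeleteRetentionPolicy": {"enabled": True, "days": days},
    }


def role_assignment_fragment(principal_id: str) -> dict:
    """Role assignment properties for the ADF managed identity."""
    return {
        "roleDefinitionId": (
            "/providers/Microsoft.Authorization/roleDefinitions/"
            + STORAGE_BLOB_DATA_CONTRIBUTOR
        ),
        "principalId": principal_id,
        "principalType": "ServicePrincipal",
    }


account = dr_account_fragment("vsftpstorbackuphns", "westus3")
soft_delete = soft_delete_fragment(days=7)  # assumption: window is not stated
# Placeholder only: the real object ID of the ADF identity is not in this write-up.
assignment = role_assignment_fragment("<adf-managed-identity-object-id>")

print(json.dumps({"account": account, "softDelete": soft_delete,
                  "roleAssignment": assignment}, indent=2))
```

These fragments would still need an apiVersion and a scope before they could go into a real template.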

2. Data Factory Linked Services & Datasets

  • Created Managed Identity–based linked services:
    • LS_vsftpstor_source pointing to production account
    • LS_vsftpstorbackup_sink pointing to DR account
  • Created Binary datasets for:
    • DS_vsftpstor_source
    • DS_vsftpstorbackup_sink
  • Left file system and directory fields blank to allow recursive root-level copying.
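For reference, the linked services and datasets above correspond roughly to the following ADF JSON definitions, generated here from Python. This is a sketch based on the ADLS Gen2 (AzureBlobFS) connector format, not an export of my factory. With the system‑assigned managed identity, the linked service carries only the DFS endpoint URL and no secret. Leaving fileSystem and folderPath out of the dataset location mirrors the blank fields described above.

```python
import json


def linked_service(name: str, account: str) -> dict:
    """ADLS Gen2 linked service that authenticates with the factory's managed identity."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            # No key or service principal: the system-assigned identity is used.
            "typeProperties": {"url": f"https://{account}.dfs.core.windows.net"},
        },
    }


def binary_dataset(name: str, linked_service_name: str) -> dict:
    """Binary dataset with no fileSystem/folderPath, so the copy starts at the root."""
    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"location": {"type": "AzureBlobFSLocation"}},
        },
    }


definitions = [
    linked_service("LS_vsftpstor_source", "vsftpstor"),
    linked_service("LS_vsftpstorbackup_sink", "vsftpstorbackuphns"),
    binary_dataset("DS_vsftpstor_source", "LS_vsftpstor_source"),
    binary_dataset("DS_vsftpstorbackup_sink", "LS_vsftpstorbackup_sink"),
]

for definition in definitions:
    print(json.dumps(definition, indent=2))
```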

3. Pipeline Construction

  • Built PL_vsftpstor_backup with a Copy Data activity.
  • Enabled:
    • Recursive copy for all containers and folders
    • Fault tolerance to skip locked, forbidden, or invalid files
    • Default (auto) settings for parallel copies and Data Integration Units (DIU)
  • Published pipeline and validated configuration.
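The pipeline settings above map approximately onto the Copy activity JSON below, again built from Python as a sketch. The activity name CopyAllToDR and the PreserveHierarchy copy behavior are assumptions added for illustration. The skipErrorFile flags are the documented fault‑tolerance switches; there is no dedicated "locked file" flag, so fileMissing and fileForbidden are the closest matches. parallelCopies and dataIntegrationUnits are deliberately omitted so the service defaults apply.

```python
import json


def dataset_ref(name: str) -> dict:
    """Reference to an existing dataset by name."""
    return {"referenceName": name, "type": "DatasetReference"}


def build_backup_pipeline() -> dict:
    copy_activity = {
        "name": "CopyAllToDR",  # hypothetical activity name
        "type": "Copy",
        "inputs": [dataset_ref("DS_vsftpstor_source")],
        "outputs": [dataset_ref("DS_vsftpstorbackup_sink")],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobFSReadSettings",
                    "recursive": True,  # walk every container and folder
                },
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": {
                    "type": "AzureBlobFSWriteSettings",
                    # Assumption: chosen to keep the source folder layout.
                    "copyBehavior": "PreserveHierarchy",
                },
            },
            # Fault tolerance: skip problem files instead of failing the run.
            "skipErrorFile": {
                "fileMissing": True,
                "fileForbidden": True,
                "invalidFileName": True,
            },
        },
    }
    return {
        "name": "PL_vsftpstor_backup",
        "properties": {"activities": [copy_activity]},
    }


pipeline = build_backup_pipeline()
print(json.dumps(pipeline, indent=2))
```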

4. Successful Execution

  • Pipeline executed successfully using Debug mode.
  • All containers and blob objects were replicated to vsftpstorbackuphns.
  • Folder structure preserved exactly.
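One way to double‑check a run like this is to compare a listing of the source against a listing of the backup. The helper below is a minimal sketch that works on plain dictionaries mapping blob path to size in bytes. How those listings are produced (portal export, CLI, SDK) is left open, and the sample paths are made up for illustration.

```python
def compare_listings(source: dict[str, int], backup: dict[str, int]) -> dict[str, list[str]]:
    """Report paths that are missing, differ in size, or exist only in the backup."""
    missing = sorted(path for path in source if path not in backup)
    size_mismatch = sorted(
        path for path in source if path in backup and source[path] != backup[path]
    )
    extra = sorted(path for path in backup if path not in source)
    return {"missing": missing, "size_mismatch": size_mismatch, "extra": extra}


# Hypothetical listings: container/folder/blob -> size in bytes.
source_listing = {
    "ftp/inbound/a.csv": 120,
    "ftp/inbound/b.csv": 300,
    "logs/2024/app.log": 4096,
}
backup_listing = {
    "ftp/inbound/a.csv": 120,
    "ftp/inbound/b.csv": 250,
}

report = compare_listings(source_listing, backup_listing)
print(report)
# {'missing': ['logs/2024/app.log'], 'size_mismatch': ['ftp/inbound/b.csv'], 'extra': []}
```

An empty report for all three keys would indicate that every source path exists in the backup with the same size; it does not verify content checksums.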

Issues Encountered & Solutions

1. Unsupported Region (ADF DFS Write Failure)

Issue: First backup account was created in South Central US, which caused EndpointUnsupportedAccountFeatures errors.
Solution: Recreated the DR account in West US 3, which fully supports ADLS Gen2 + Data Factory DFS operations.

2. Backup Account Not Visible in Linked Service Dropdown

Issue: HNS was disabled on the initial storage account.
Solution: Recreated the account with Hierarchical Namespace enabled.

3. "Test Connection" Failing in ADF

Issue: A known ADF UI limitation when Soft Delete + HNS are enabled on ADLS Gen2.
Solution: Ignored the UI test and validated through actual pipeline Debug execution, which succeeded.

4. Pipeline Publish Error (Logging)

Issue: Logging was enabled without a logging linked service.
Solution: Disabled optional logging in the Settings tab.

5. Accidental Dataset Deletion During Refresh

Issue: Both datasets were accidentally deleted during a refresh.
Solution: Recreated both datasets with correct settings and republished.


Final Outcome

  • A fully automated, secure, cross‑region backup workflow
  • No impact to production storage operations
  • Resilient against accidental deletion (soft delete)
  • Managed Identity provides secure, keyless authentication
  • Ready for scheduled executions (daily, hourly, etc.)

Next Step: Add a scheduled trigger to automate backup execution (since implemented; see Backup Schedule & 30‑Day Retention Policy below).

Architecture Diagram (Text Version)

┌─────────────────────────────┐
│   Source Storage Account    │
│         (vsftpstor)         │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│     Azure Data Factory      │
│   (ADF Daily Backup Flow)   │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Disaster Recovery Storage  │
│    (vsftpstorbackuphns)     │
└─────────────────────────────┘

Backup Schedule & 30‑Day Retention Policy

Daily Backup Schedule:
A daily scheduled trigger was added to the Azure Data Factory pipeline PL_vsftpstor_backup. The pipeline now runs automatically once per day, ensuring continuous replication of all blob data from the production storage account (vsftpstor) into the disaster recovery storage account (vsftpstorbackuphns). This process keeps the DR copy up to date without impacting production workloads.
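A daily schedule trigger of this kind can be expressed as the ADF JSON below, generated from Python as a sketch. The trigger name TR_daily_backup and the start time are assumptions; the write‑up does not record the actual name or run time.

```python
import json


def daily_trigger(pipeline_name: str, start_time_utc: str) -> dict:
    """Schedule trigger that runs the given pipeline once per day."""
    return {
        "name": "TR_daily_backup",  # hypothetical trigger name
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time_utc,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# Assumption: 02:00 UTC is an illustrative run time, not the configured one.
trigger = daily_trigger("PL_vsftpstor_backup", "2024-01-01T02:00:00Z")
print(json.dumps(trigger, indent=2))
```

A trigger defined this way only fires after it has been published and started.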

30‑Day Retention Policy:
A Lifecycle Management rule was configured directly on the backup storage account to maintain a rolling 30‑day retention window. The rule automatically deletes any blob whose last modified date is older than 30 days. This cleanup applies strictly to the disaster recovery storage account and has no effect on the primary production storage environment. This ensures controlled storage usage and avoids excessive accumulation of outdated backup data.
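A rule with this behavior can be written as the lifecycle policy below, shown as a Python sketch. The rule name and the blockBlob filter are assumptions. The small helper only approximates the age check locally, for reasoning about which blobs fall outside the window; the real evaluation happens on the Azure side.

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30

lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "delete-after-30-days",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},  # assumption
                "actions": {
                    "baseBlob": {
                        "delete": {"daysAfterModificationGreaterThan": RETENTION_DAYS}
                    }
                },
            },
        }
    ]
}


def is_past_retention(last_modified: datetime, now: datetime,
                      days: int = RETENTION_DAYS) -> bool:
    """Local approximation: True when the blob was modified more than `days` ago."""
    return now - last_modified > timedelta(days=days)


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
print(json.dumps(lifecycle_policy, indent=2))
print(is_past_retention(now - timedelta(days=31), now))  # True
print(is_past_retention(now - timedelta(days=10), now))  # False
```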

Together, the automated daily backup schedule and 30‑day retention policy create a reliable, self‑maintaining disaster recovery solution with zero manual intervention required.
