Platform Infrastructure Design for Security, Availability, and Scalability
This page covers the design of the interworks.cloud Platform infrastructure with regard to security, availability, and scalability.
Deployment
The platform is deployed according to the distributed deployment model. This model spreads platform roles across separate physical or virtual resources and deploys multiple resources for each role, avoiding single points of failure among the deployed components. Each platform component is installed on multiple, separate Azure resources, and components of the same role share a dedicated availability set. All web-based components are configured in highly available web farms behind Azure load balancers, with the option to enable on-demand scaling to distribute load and handle increasing traffic. The platform relational databases reside in highly available, geo-replicated Azure SQL Managed Instances. All component resources can be scaled up to meet increased traffic and storage demand, and additional resources can be provisioned on demand to handle the extra load.
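To illustrate how a load balancer can spread requests across a web farm, the following minimal Python sketch mimics the idea behind Azure Load Balancer's default five-tuple hash distribution. Each flow (source IP, source port, destination IP, destination port, protocol) is hashed and mapped to one of the healthy back-end instances, so the packets of a given flow keep reaching the same instance. The `Flow` type, the `pick_backend` helper, the instance names, and the SHA-256-modulo mapping are assumptions made for illustration. Azure's actual hashing algorithm is not reproduced here.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """The five-tuple identifying a network flow (illustrative type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def pick_backend(flow: Flow, healthy_backends: Sequence[str]) -> str:
    """Map a flow to one healthy back-end instance (hypothetical helper).

    The same five-tuple always maps to the same instance as long as the
    list of healthy instances does not change.
    """
    if not healthy_backends:
        raise RuntimeError("no healthy back-end instances available")
    # Serialize the five-tuple deterministically and hash it.
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and select an instance.
    index = int.from_bytes(digest[:8], "big") % len(healthy_backends)
    return healthy_backends[index]


if __name__ == "__main__":
    farm = ["bss-web-1", "bss-web-2", "bss-web-3"]  # hypothetical instance names
    flow = Flow("203.0.113.10", 51514, "198.51.100.5", 443, "TCP")
    print(pick_backend(flow, farm))
```

Because unhealthy instances are left out of the list, a failed web server simply stops receiving new flows. Note that with plain modulo hashing, changing the pool size remaps many existing flows to different instances.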
The following is a summary of the high availability features of the interworks.cloud Platform:
Use of Azure availability sets and zones for VMs and AKS clusters
Separate server farms for front-end (Storefront) and back-office (BSS) web applications
Multi-node cluster for message broker (RabbitMQ) implementation
Built-in high availability of Azure SQL Managed Instance, integrated with the Azure platform
Use of Azure Load Balancer and Azure Application Gateway
Use of the Horizontal and Vertical Pod Autoscalers for AKS clusters
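The Kubernetes documentation describes how the Horizontal Pod Autoscaler computes the desired number of replicas from the ratio between the observed and the target metric value: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Scaling is skipped when that ratio falls within a tolerance, which is 0.1 by default. The Python sketch below reproduces only this core rule. The function name and the `min_replicas`/`max_replicas` defaults are illustrative assumptions, and the real controller applies further behavior, such as stabilization windows, that is not shown here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core Horizontal Pod Autoscaler scaling rule (simplified sketch).

    min_replicas/max_replicas are hypothetical bounds standing in for the
    minReplicas/maxReplicas fields of an HPA object.
    """
    ratio = current_metric / target_metric
    # Close enough to the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 80% CPU against a 50% target scale to ceil(4 × 1.6) = 7 replicas, which is within the bounds assumed above.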
Deployment Environment
The interworks.cloud Platform is mainly offered as a SaaS service; however, it can also be hosted on premises or in the cloud, either by the client or by interworks.cloud on the client’s behalf. The deployment environment must be selected with regard to scalability, flexibility, security, and ongoing maintenance of the infrastructure. Cloud platforms provide a range of resources (virtual machines, storage, networking, load balancers) and deployment options (managed or unmanaged) that cover most business requirements.
Microsoft Azure is one of the fastest-growing cloud infrastructure platforms. It provides flexibility, on-demand scalability, and secure, stable, reliable, high-performance services and resources. Additionally, Azure is generally available in 36 regions around the world, supported by a growing network of Microsoft-managed datacenters. On-premises platforms may offer more control and customization, but they may also require more maintenance and resources. If you already have a mature IT network with datacenters in appropriate locations, this may be a suitable option.
Security
Access to cloud resources and the management portal is granted only to a specific group of authorized personnel. Two-factor authentication (2FA) is configured to further protect against unauthorized access. Remote access to platform VMs is also restricted to specific users, and conditional access policies allow access only from specific locations. An additional layer of two-factor authentication is applied at the operating-system level for all RDP/SSH connections to the platform VMs.
All critical data is stored in secure locations that provide encryption at rest, safeguarding the data to meet organizational security and compliance commitments. Data exchanged locally between the database and the platform components may not be encrypted, with the exception of specific sensitive data such as user passwords. All other communication (between client and server, or between server and third-party endpoints) takes place over secure channels that enforce SSL/TLS encryption.
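When communicating over the secure channels described above, a client calling a platform or third-party endpoint should verify the server certificate and host name and refuse outdated protocol versions. The following Python sketch builds such a client context with the standard `ssl` module. Using TLS 1.2 as the minimum version is an assumption for illustration, not a documented platform setting, and `make_client_context` and `fetch` are hypothetical helper names.

```python
import ssl
import urllib.request


def make_client_context(minimum: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Create a client-side TLS context (illustrative helper).

    create_default_context() already enables certificate verification
    (CERT_REQUIRED) and host name checking for server authentication.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse protocol versions older than the chosen minimum (assumed TLS 1.2).
    context.minimum_version = minimum
    return context


def fetch(url: str) -> bytes:
    """Fetch a URL over HTTPS using the hardened context."""
    with urllib.request.urlopen(url, context=make_client_context(), timeout=10) as response:
        return response.read()
```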
Please note the following features pertaining to database security:
Latest SQL Server Enterprise Edition database engine
Automatic patching and version updates, automated backups, and built-in high availability
Fast service and resource scaling
Isolated environment
Advanced threat protection
Transparent data encryption (TDE)
For more information about platform security, please see the following page: Platform Security & Privacy | 2. Cloud Security
For certificates and policy documents pertaining to platform and information security, please see the following page: Platform Security & Privacy | 5.5 Security Audits
System Monitoring
Any architectural element of an interworks.cloud Platform deployment can fail or suffer performance degradation. Monitoring processes must therefore be established for all applications and services to maximize availability, performance, and reliability, and to keep resource consumption under control. The interworks.cloud Platform online deployment actively uses the following monitoring features:
Azure Monitor:
Application Insights - Monitors the availability, performance, and usage of the platform web applications
VM insights - Monitors and analyzes the performance and health of platform servers
Container Insights - Monitors the health and performance of managed Kubernetes clusters hosted on AKS
Additional open-source monitoring solution deployed in an on-premises datacenter - Monitors the availability and performance of platform infrastructure servers
External monitoring service - Monitors uptime and availability of all platform public endpoints (including custom URLs used by resellers/end-customers)
Monitoring automations - Alert on unhandled application exceptions and expiring SSL certificates, and automatically initialize web applications to improve startup response times
Publicly accessible status page - Displays the current status of all platform services and regions, information on potential incidents, in-progress and scheduled maintenance operations, and historical uptime and SLA metrics
Integration with a third-party service - Real-time alerts and 24x7 escalations to on-call teams
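One of the monitoring automations listed above alerts on expiring SSL certificates. The following minimal Python sketch shows such a check: it reads a certificate's `notAfter` date and raises an alert when fewer than a threshold number of days remain. The 30-day threshold and the helper names (`fetch_not_after`, `days_until_expiry`, `needs_alert`) are assumptions. The network-facing `fetch_not_after` helper performs a real TLS handshake and is included only for completeness.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the notAfter field of the certificate served by host:port.

    An already-expired certificate fails verification during the handshake
    and raises ssl.SSLCertVerificationError, itself an alert condition.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left until a certificate's notAfter time (negative if expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def needs_alert(not_after: str, threshold_days: int = 30,
                now: Optional[float] = None) -> bool:
    """True when the certificate expires within threshold_days (assumed 30)."""
    return days_until_expiry(not_after, now) <= threshold_days
```

In a scheduled job, this could be called as `needs_alert(fetch_not_after(host))` for each public endpoint, including custom reseller URLs.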
Maintenance and Upgrades
Regular maintenance keeps all resources healthy and ensures the long-term stability and functionality of the software system. Specifically:
Servers are kept up to date with the latest OS and security updates
Servers may be replaced when a new major OS version or a new VM generation becomes available
Appropriate retention policies are in place for database, file, and event viewer logs to retain historical data for an adequate period
Scheduled database maintenance is performed systematically to keep the databases operating at peak performance
A new version of the interworks.cloud Platform is released weekly, with new features, improvements, and bug fixes that are in most cases critical to your business's evolution.
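The retention policies mentioned above reduce to an age-based rule: items older than their category's retention window are purged, and everything else is kept. The Python sketch below illustrates that rule. The retention periods per log category, the category keys, and the `items_to_purge` function are hypothetical values chosen for illustration, not the platform's actual settings.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical retention windows per log category, in days.
RETENTION_DAYS: Dict[str, int] = {
    "database": 35,
    "file": 90,
    "event_viewer": 30,
}


def items_to_purge(items: Iterable[Tuple[str, str, datetime]],
                   now: datetime) -> List[str]:
    """Return the names of items whose age exceeds their category's window.

    Each item is (name, category, created_at). Items from unknown categories
    are kept, to avoid deleting data by mistake.
    """
    purge = []
    for name, category, created_at in items:
        days = RETENTION_DAYS.get(category)
        if days is None:
            continue
        if now - created_at > timedelta(days=days):
            purge.append(name)
    return purge


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [("db-backup-001", "database", now - timedelta(days=40)),
              ("app-log-117", "file", now - timedelta(days=10))]
    print(items_to_purge(sample, now))  # prints ['db-backup-001']
```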
Disaster Recovery
To meet typical disaster recovery requirements, any deployment should use geo-replication technologies. All critical data, including the platform SQL databases and other plain storage data, should be backed up or replicated between separate regions with hot disaster recovery support. This way, in the unlikely event of a disaster, the platform can be reactivated in another region and resume normal operations. Geo-replication of all infrastructure virtual machines (BSS, Storefront, Administration, RabbitMQ, PostgreSQL) is also necessary to restore business continuity as quickly as possible. Rather than keeping a standby Kubernetes cluster running, a fully automated process for creating a new cluster in any region should be ready to start whenever failover is required, keeping the cost of related resources as low as possible.
Please note the following Disaster Recovery features of the interworks.cloud Platform online deployment:
Azure Site Recovery implemented on all platform servers
Azure SQL Managed Instance configured with geo-replication and failover groups
All platform related data residing in storage accounts are replicated to secondary storage accounts in another (paired) Azure region
Recovery plan with appropriate automations, enabling a complete failover to another (paired) Azure region within minutes in case of a disaster
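Azure Site Recovery recovery plans let failover steps be organized into groups that run in sequence. The following Python sketch shows one way to derive such an ordering from component dependencies, using the standard library's `graphlib` module (Python 3.9+). Components whose prerequisites have completed are grouped into the same wave and can fail over in parallel. The dependency map and component keys below are a hypothetical example built from the components named in this section. They do not represent the platform's actual recovery plan.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: each step lists the steps that must complete before it.
RECOVERY_DEPENDENCIES: Dict[str, Set[str]] = {
    "sql_mi_failover_group": set(),
    "storage_accounts": set(),
    "rabbitmq_vms": set(),
    "postgresql_vm": set(),
    "administration_vm": {"sql_mi_failover_group"},
    "bss_vms": {"sql_mi_failover_group", "storage_accounts", "rabbitmq_vms"},
    "storefront_vms": {"sql_mi_failover_group", "storage_accounts", "rabbitmq_vms"},
    "aks_cluster": {"sql_mi_failover_group", "rabbitmq_vms", "postgresql_vm"},
}


def recovery_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group failover steps into sequential waves of parallel steps.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # in a real plan: mark done after failover succeeds
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(recovery_waves(RECOVERY_DEPENDENCIES), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

For the assumed map, the first wave contains the SQL failover group, the storage accounts, RabbitMQ, and PostgreSQL. The second wave contains the Administration, BSS, and Storefront servers and the newly created AKS cluster.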