Azure Cloud Adoption Framework: Effective Cloud Management with the Manage Phase
In the final article of the Azure Governance Blog Series, we cover the Manage Phase of the Azure Cloud Adoption Framework. With the RAMP approach (Ready, Administer, Monitor, Protect), you structure responsibilities, automate processes, and minimize risks.

Introduction
Over the past weeks, within the Azure Governance Starter Blog Series, we have explored the different phases of the Azure Cloud Adoption Framework and outlined their respective areas of responsibility, recommendations, and goals. This final blog post in the series focuses on the content of the Manage Phase of the Cloud Adoption Framework.
Understanding the Manage Phase
In the previous phases of the Cloud Adoption Framework, we dealt with planning, preparation, implementation, and securing your cloud environment. Now we turn to the scenario where your cloud environment has been successfully implemented and must continue to be managed.
At this point, the Manage Phase of the Cloud Adoption Framework comes into play to maintain the efficiency, resilience, and security of your cloud environment.
As a central methodology of the Manage Phase, Microsoft recommends using the RAMP approach. The RAMP approach helps organizations structure and strengthen their operational foundation:
- Ready: Set up teams, processes, and tools to effectively start cloud operations.
- Administer: Define and consistently execute central and workload-specific responsibilities.
- Monitor: Track and actively manage cloud health, costs, and compliance.
- Protect: Minimize risks, establish security standards, and ensure disaster recovery.
This methodology is similar to the content of the previous phases but primarily focuses on managing existing cloud resources.
Ready – Preparing Cloud Operations
Identifying Management Tasks
Managing Azure can be divided into central (platform-level) and workload-level responsibilities.
- Central responsibilities cover the entire cloud environment, such as security policies, cost management, or overarching network topologies.
- Workload responsibilities focus on individual applications or services, such as architecture decisions, cost optimizations, or performance tuning.
While the platform team defines Azure policies, identity governance, and subscription boundaries, the workload team ensures that each application complies with existing cloud requirements, is securely designed, and uses its resources efficiently.
Building an Effective Cloud Operating Model
To make these responsibilities tangible, the CAF recommends a structured approach:
- Choose a cloud operating model: Depending on your organization’s size and maturity, your cloud teams may be centralized, decentralized, or hybrid.
- Assign platform responsibilities: A dedicated team manages central tasks such as governance, Azure subscription management, or network topologies.
- Define workload responsibilities: Independent teams take care of applications, their lifecycles, and workload-specific security measures.
- Establish roles and ownership: Every task—from CI/CD pipelines to cost optimization—needs a clearly assigned owner.
Documentation as the Backbone of Operations
To manage processes, responsibilities, and procedures effectively, the Azure Cloud Adoption Framework recommends documentation on two levels:
- Operational procedures such as change management, disaster recovery, or release processes.
- Runbooks and playbooks for recurring tasks such as scaling actions during high CPU load or failover scenarios in Azure Site Recovery.
Stored centrally and updated regularly, they ensure that every incident is handled consistently and efficiently.
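A runbook for a recurring task can also be captured as a small, reviewable decision function. The following Python sketch illustrates a scaling rule for sustained high CPU load. The thresholds, the instance limits, and the names ScalePolicy and decide_instance_count are illustrative assumptions, not values from the Cloud Adoption Framework or calls to an Azure API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    """Hypothetical runbook parameters; tune them per workload."""
    scale_out_cpu: float = 80.0   # average CPU % that triggers scale-out
    scale_in_cpu: float = 30.0    # average CPU % that allows scale-in
    min_instances: int = 2
    max_instances: int = 10


def decide_instance_count(cpu_samples: list[float], current: int,
                          policy: ScalePolicy = ScalePolicy()) -> int:
    """Return the target instance count for the observed CPU samples."""
    if not cpu_samples:
        return current  # no data: change nothing and let an operator investigate
    average = sum(cpu_samples) / len(cpu_samples)
    if average >= policy.scale_out_cpu:
        return min(current + 1, policy.max_instances)
    if average <= policy.scale_in_cpu:
        return max(current - 1, policy.min_instances)
    return current


print(decide_instance_count([85.0, 90.0, 88.0], current=3))  # prints 4
```

Keeping such rules in version control next to the written runbook makes it easier to review changes and to feed learnings from incidents back into the procedure.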
Automation as a Productivity Lever
Modern cloud operations rely heavily on automation. Azure offers a variety of tools for monitoring and enforcing security and governance measures, such as:
- Azure Policy for compliance and governance.
- Microsoft Defender for Cloud and Microsoft Sentinel for automated threat detection and response.
- Azure Monitor + Service Health for identifying and handling incidents or outages.
- Logic Apps and Azure Automation for workflow and change management.
Automating repeatable tasks helps your team focus on the strategic advancement of your Azure cloud.
Continuous Improvement in Operations
Management also includes optimizing and improving the existing cloud architecture. The Cloud Adoption Framework lists the following examples:
- Regular operational reviews to evaluate metrics, incidents, and risks.
- Training and upskilling teams in areas such as cost control, security operations, or business continuity and disaster recovery.
- Feedback loops to feed learnings from incidents directly back into runbooks, automations, and architectures.
Administer – Defining Responsibilities
Determining the Scope of Management
Depending on the resource types used, responsibility may shift toward Microsoft or your teams. While you retain full responsibility on-premises, you can increasingly delegate core responsibilities to Azure services when choosing IaaS, PaaS, or SaaS. Therefore, you should categorize the services you use, determine the level of management effort, and delegate accordingly to the responsible teams.
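To make this categorization concrete, the shift in responsibility can be written down as a simple lookup. The sketch below is a deliberately simplified model: the five layers and the cut-off per service model are assumptions for illustration, and in practice some layers (for example application configuration or identity) are shared between you and Microsoft.

```python
# Simplified stack, ordered from the bottom (provider side) to the top.
LAYERS = ["hardware", "virtualization", "operating_system", "application", "data"]

# Number of layers, counted from the bottom of LAYERS, that the provider operates.
# These cut-offs are an illustrative assumption, not an official matrix.
PROVIDER_MANAGED_LAYERS = {"on-premises": 0, "iaas": 2, "paas": 3, "saas": 4}


def customer_managed(model: str) -> list[str]:
    """Return the layers your own teams still have to manage."""
    return LAYERS[PROVIDER_MANAGED_LAYERS[model.lower()]:]


for model in PROVIDER_MANAGED_LAYERS:
    print(f"{model}: {', '.join(customer_managed(model))}")
```

Whatever model you choose, the data layer stays on your side of the list, which is why data management remains a task for your teams even with SaaS.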
Managing Changes
The most common source of disruption in cloud environments is uncoordinated changes and adjustments. That’s why you need a robust change management process that formally records changes, evaluates their risks, assigns the appropriate approvals, and deploys them in a standardized way.
Critical changes to highly available systems require multi-stage reviews, progressive deployment models like Canary or Blue/Green, and active monitoring.
Minor adjustments to non-critical services, on the other hand, can be rolled out automatically via CI/CD, provided that tests and validations are in place. After deployment, health checks and logs verify stability, while a clearly defined rollback path ensures quick recovery in case of issues. Unauthorized changes can be prevented or detected with Azure Policy, change history analysis, and dedicated merge requests.
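The distinction between critical and minor changes can be expressed as a small decision rule that a pipeline evaluates before deployment. This is a minimal sketch under stated assumptions: the ChangeRequest fields, the approval names (including the change advisory board), and the returned dictionary are hypothetical and only mirror the process described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical change record; the fields are assumptions for this sketch."""
    description: str
    system_is_critical: bool
    has_automated_tests: bool
    has_rollback_plan: bool


def plan_rollout(change: ChangeRequest) -> dict:
    """Map a change to approvals, a deployment strategy, and monitoring."""
    if not change.has_rollback_plan:
        return {"approved": False, "reason": "rollback path missing"}
    if change.system_is_critical:
        return {"approved": True,
                "approvals": ["peer review", "change advisory board"],
                "strategy": "canary or blue/green",
                "monitoring": "active"}
    if change.has_automated_tests:
        return {"approved": True,
                "approvals": ["peer review"],
                "strategy": "automatic via CI/CD",
                "monitoring": "post-deployment health checks"}
    return {"approved": False, "reason": "tests and validations missing"}


print(plan_rollout(ChangeRequest("update tag values", False, True, True)))
```

The rule rejects every change without a rollback path first, so the recovery option is checked before any discussion about deployment strategy starts.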
Managing Security Measures
Microsoft Entra ID serves as a unified platform for identity and access management. Role-based (RBAC) and attribute-based (ABAC) access controls ensure that only the minimum necessary privileges are granted. Just-in-Time permissions with Privileged Identity Management prevent permanently privileged accounts.
Secure resource configurations are created exclusively from code, supplemented by Azure Policy and Defender for Cloud. On the authentication side, multi-factor authentication (MFA) and Conditional Access are mandatory. In addition, Microsoft Sentinel serves as a SIEM/SOAR solution for security monitoring and incident response.
Managing Compliance
Compliance means not only defining governance rules but also enforcing them. Azure Policy ensures that organizational and regulatory requirements are effectively applied in operations. This includes general policies such as allowed regions or resource types as well as industry-specific standards.
A continuous feedback loop is crucial, where resources and processes are defined, measured, deviations identified, and consistently corrected.
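An allowed-regions rule is a typical starting point for this loop. The sketch below shows a simplified policy rule in the structure that Azure Policy definitions use (an "if" condition on the location field and a "deny" effect), followed by a local function that mimics the measuring step. The region list, the inventory, and find_deviations are assumptions for illustration; the function is a stand-in and not the Azure Policy evaluation engine.

```python
import json

ALLOWED_LOCATIONS = ["westeurope", "germanywestcentral"]  # example regions, an assumption

# Simplified policy rule following the Azure Policy definition structure.
policy_rule = {
    "if": {"field": "location", "notIn": ALLOWED_LOCATIONS},
    "then": {"effect": "deny"},
}


def find_deviations(resources: list[dict], allowed: list[str]) -> list[str]:
    """Local stand-in for a compliance scan: resources outside the allowed regions."""
    return [r["name"] for r in resources if r["location"].lower() not in allowed]


inventory = [  # hypothetical inventory
    {"name": "kv-shop-prod-001", "location": "westeurope"},
    {"name": "vm-test-007", "location": "eastus"},
]

print(json.dumps(policy_rule, indent=2))
print(find_deviations(inventory, ALLOWED_LOCATIONS))  # prints ['vm-test-007']
```

The deny effect prevents new deviations, while the scan covers resources that existed before the rule was assigned; both halves are needed to close the loop.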
Managing Data
Data management includes classification, protection, and residency. Microsoft Purview supports the identification and categorization of sensitive data. Regions must be chosen deliberately and reviewed regularly to meet data residency requirements. Internal and publicly accessible workloads should be strictly separated into management groups.
Access controls such as RBAC and ABAC are tied to data classification. Soft Delete, versioning, and immutable policies protect against data loss, while access policies block unauthorized changes. Resource locks should only be used in exceptional cases, as they can interfere with IaC processes.
Managing Costs
Cost control in Azure primarily means transparency. Microsoft Cost Management provides budgets, alerts, and reports, while tagging makes cost drivers visible. Workload teams need access to their cost reports and should implement Well-Architected recommendations for optimization. Azure does not enforce fixed subscription cost limits, so budgets, alerts, and responsible usage are indispensable.
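How tagging and budget alerts work together can be illustrated with a few lines of Python. This is a sketch with made-up numbers: the line items, the costCenter tag, the budget, and the threshold fractions are assumptions, and the functions only imitate locally what budgets and alerts in Microsoft Cost Management do for you.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag: str) -> dict[str, float]:
    """Sum costs per tag value; untagged resources are grouped under 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        totals[item.get("tags", {}).get(tag, "untagged")] += item["cost"]
    return dict(totals)


def triggered_thresholds(spend: float, budget: float,
                         thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[float]:
    """Return every budget threshold (as a fraction) that the spend has reached."""
    return [t for t in thresholds if spend >= budget * t]


items = [  # hypothetical monthly cost export
    {"cost": 420.0, "tags": {"costCenter": "shop"}},
    {"cost": 130.0, "tags": {"costCenter": "shop"}},
    {"cost": 75.5, "tags": {}},
]
totals = cost_by_tag(items, "costCenter")
print(totals)
print(triggered_thresholds(totals["shop"], budget=600.0))
```

The "untagged" bucket is worth watching: costs that land there cannot be assigned to a workload team, which is exactly the transparency gap that tagging standards are meant to close.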
Managing Code
Code and runtime management are the responsibility of workload teams. Operational excellence is achieved through CI/CD automation, feature flags, resilient design patterns, telemetry from the first commit, and regular testing under real-world conditions.
To keep applications robust and manageable, it is recommended to consistently align with the Operational Excellence pillar of the Well-Architected Framework.
Managing Cloud Resources
Changes in the Azure portal are practical for prototyping but dangerous in production because they create drift. Production changes should only be made from version-controlled templates (Bicep, Terraform, or ARM).
CI/CD pipelines build, validate, and deploy. This applies to both application code and infrastructure components. Configuration drift can be detected by regularly comparing the live environment with the desired state, while a last known good state serves as a rollback reference.
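The comparison between desired and live state can be sketched in a few lines. The example assumes that both states are available as flat dictionaries of resource properties; the property names and values are hypothetical, and a real implementation would read the desired state from your templates and the live state from Azure.

```python
def detect_drift(desired: dict, live: dict) -> dict:
    """Compare two flat property dictionaries and report every difference."""
    drift = {}
    for key in sorted(desired.keys() | live.keys()):
        want, have = desired.get(key), live.get(key)
        if want != have:
            drift[key] = {"desired": want, "live": have}
    return drift


desired_state = {"sku": "Standard_LRS", "httpsOnly": True, "minTlsVersion": "TLS1_2"}
live_state = {"sku": "Standard_LRS", "httpsOnly": False,
              "minTlsVersion": "TLS1_2", "publicAccess": "Enabled"}

for prop, diff in detect_drift(desired_state, live_state).items():
    print(f"{prop}: desired={diff['desired']!r}, live={diff['live']!r}")
```

Note that this sketch treats a missing property and a property set to None as the same thing, which is acceptable for an illustration but may need refinement in practice.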
Resource sprawl is ideally contained through naming and tagging standards, Azure Policy, and RBAC, and further reduced by cleanup cycles with Azure Advisor and Resource Graph.
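Naming and tagging standards are easiest to follow when they can be checked automatically, for example in a pull request. The following sketch validates a resource against a hypothetical convention; the name pattern, the resource type prefixes, and the required tags are assumptions that you would replace with your own standard.

```python
import re

# Hypothetical convention: <type>-<workload>-<environment>-<3-digit instance>
NAME_PATTERN = re.compile(r"(vm|kv|app|vnet)-[a-z0-9]+-(dev|test|prod)-\d{3}")
REQUIRED_TAGS = {"owner", "costCenter", "environment"}


def check_resource(name: str, tags: dict[str, str]) -> list[str]:
    """Return a list of findings; an empty list means the resource is compliant."""
    findings = []
    if not NAME_PATTERN.fullmatch(name):
        findings.append(f"name '{name}' violates the naming convention")
    missing = sorted(REQUIRED_TAGS - set(tags))
    if missing:
        findings.append(f"missing tags: {', '.join(missing)}")
    return findings


print(check_resource("vm-shop-prod-001",
                     {"owner": "team-shop", "costCenter": "4711", "environment": "prod"}))
print(check_resource("MyTestVM", {"owner": "unknown"}))
```

The same rules can later be enforced with Azure Policy; a local check like this simply gives teams feedback before a deployment is attempted.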
Managing Relocation
Region moves may be necessary to meet regulatory requirements, improve performance, or optimize costs. However, each relocation involves clear risks. Therefore, acceptable downtime windows must be defined, stakeholders informed, and duplicate environments decommissioned consistently after migration. Smaller teams migrate workloads sequentially, while larger organizations orchestrate multiple workloads in parallel to leverage synergies.
Managing Operating Systems
Anyone using virtual machines must also manage operating systems. Automation with Azure Machine Configuration and Azure Automation ensures consistent provisioning and maintenance.
Patching is mandatory on both guest and host levels. With Change Tracking and Inventory using the Azure Monitor Agent, in-guest operations can be tracked, whether VMs run in Azure, on-premises, or in another cloud.
Monitor: Actively Overseeing Cloud Health, Costs, and Compliance
Identifying the Monitoring Scope
Which components need to be monitored depends heavily on the deployment model. In on-premises environments, you are responsible for all layers from hardware through virtualization and operating systems to data, security, and costs. This scope decreases with IaaS and PaaS.
Here, Azure takes care of essential platform services, while you must still keep service health, security, compliance, data, and log outputs under control. Even with SaaS offerings, monitoring security, costs, and governance remains your responsibility. A complete inventory is crucial.
With Azure Resource Graph, you can systematically capture your resources. Using Azure Arc, you can extend this visibility by seamlessly including on-premises systems, edge deployments, and other clouds.
Planning Your Monitoring Strategy
A solid strategy begins with choosing the right operating model. Smaller organizations often benefit from centralized monitoring, which simplifies costs and governance but may create bottlenecks. Larger enterprises frequently opt for a shared model.
While a central team monitors cross-cutting topics like security, compliance, or costs, workload teams are responsible for their own applications. Next, you define which metrics and logs are truly relevant. Initially, it may seem useful to collect as much data as possible, but as maturity grows you should optimize both the scope of collection and retention periods to keep costs under control.
At the same time, regulatory requirements must be met, and forensic analysis must remain possible. Finally, you develop an escalation and alerting concept to reliably route critical events to the right teams. This strategy should be reviewed regularly and adapted iteratively to respond to new business and security requirements.
Designing a Monitoring Solution
The architecture of your monitoring solution determines how efficiently you can collect, store, and analyze data. The goal is a consolidated system that covers Azure, on-premises, other clouds, and edge environments equally.
Azure Monitor serves here as the central platform, complemented by Azure Arc for external data sources. Consolidating into fewer storage locations simplifies correlation and governance, although requirements like data residency or tenant separation may sometimes require distributed storage.
Typical destinations include Log Analytics for interactive analysis, Storage for long-term archiving, Event Hubs for integration with external SIEM systems, and Data Explorer for advanced queries. Data Collection Rules or diagnostic settings control what data is collected and where it is sent. Azure Policy helps enforce standards such as agent installation and data collection.
For larger environments, Infrastructure as Code should be used to roll out configurations consistently. Cost control is equally important: regularly review what data you collect and how long you keep it, and delete unused logs to avoid budget waste.
Configuring Monitoring
This involves collecting service health information, security events, compliance data, cost metrics, and resource utilization.
Azure Service Health provides notifications about platform-side outages and maintenance, while Resource Health documents the state of individual resources. Security-relevant data comes from Entra ID, Defender for Cloud, Sentinel, or Network Watcher. Compliance is supported by automated checks via Azure Policy and Microsoft Purview. Cost metrics are monitored with Cost Management, while Application Insights provides telemetry from within your code. Additionally, Azure Monitor Metrics continuously supplies performance data. All these signals come together to form a holistic view of your environment.
Configuring Alerts
Alerts translate monitoring data into actionable items. You define thresholds for performance and security metrics and ensure that relevant people are immediately informed. It is important to distinguish alerts by severity. Critical alerts affect core services and must be escalated immediately, while less urgent notifications can go to operational teams.
Azure Monitor Alerts offers dynamic thresholds that automatically adapt to normal operating conditions. Each alert should be assigned to an action group that includes at least an email notification. In addition, SMS, Logic Apps, or ITSM system integration can improve response times. The goal is to strike a balance between sensitivity and noise suppression: too few alerts create blind spots, while too many can cause alert fatigue.
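Severity-based routing can be sketched as a small function. The severity scale from 0 (critical) to 4 (verbose) follows Azure Monitor; the action group names, their receivers, and the rule that severities 0 and 1 go to the on-call group are hypothetical choices for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    severity: int  # 0 = critical ... 4 = verbose, as in Azure Monitor


# Hypothetical action groups; every group includes at least an email receiver.
ACTION_GROUPS = {
    "ag-oncall": ["email", "sms", "itsm"],
    "ag-operations": ["email"],
}


def route(alert: Alert) -> tuple[str, list[str]]:
    """Pick the action group for an alert based on its severity."""
    if not 0 <= alert.severity <= 4:
        raise ValueError(f"unknown severity: {alert.severity}")
    group = "ag-oncall" if alert.severity <= 1 else "ag-operations"
    return group, ACTION_GROUPS[group]


print(route(Alert("database unreachable", severity=0)))
print(route(Alert("disk usage above 70 percent", severity=3)))
```

Keeping the routing table this small is intentional: every additional receiver on low-severity alerts increases noise and with it the risk of alert fatigue.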
Visualizing Monitoring Data
Data only becomes valuable when it is clearly presented. Dashboards and Workbooks in the Azure portal provide quick insights, while detailed analyses are possible through Log Analytics or Grafana. Workbooks are ideal for in-depth root cause analysis and complex correlations, while dashboards provide a clear overview for day-to-day operations.
It is important to tailor visualizations to the audience: technical teams need detailed queries, while management requires an aggregated view of costs, availability, and compliance. This way, monitoring becomes not only an operational tool but also a strategic source of information.
Protect: Minimizing Risks and Establishing Security Standards
Managing Reliability
Reliability comes from clear objectives, resilient architecture, and well-practiced recovery procedures.
The first step is to prioritize your workloads based on business impact. From this prioritization, you derive uptime SLOs, maximum tolerable downtimes, recovery objectives, and the required redundancy. A business-critical service with 99.99% uptime can only afford a few minutes of downtime per month. Active deployment across multiple regions, cross-zone redundancy, active load balancing, and a replication strategy that ensures real-time data consistency are examples of how to achieve this.
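The downtime budget behind an uptime target is simple arithmetic, and it helps to calculate it before discussing architecture. The sketch below converts an availability SLO into minutes of allowed downtime; the 30-day month and the 365-day year are simplifying assumptions.

```python
def allowed_downtime_minutes(slo_percent: float, period_days: float) -> float:
    """Downtime budget in minutes for an availability SLO over a period."""
    return period_days * 24 * 60 * (1 - slo_percent / 100)


for slo in (99.0, 99.9, 99.99):
    monthly = allowed_downtime_minutes(slo, 30)
    yearly = allowed_downtime_minutes(slo, 365)
    print(f"{slo}%: {monthly:.1f} min/month, {yearly:.1f} min/year")
```

For 99.99% this results in roughly 4.3 minutes per month, which is hardly enough for a manual failover and explains why such targets usually require active multi-region designs.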
Workloads with moderate criticality often rely on an active-passive multi-region architecture with asynchronous replication, while low-priority workloads are often stable enough within a single region using zone redundancy.
Service level indicators (SLIs) such as error rates, latency, or health signals quantify the current state and are measured against the target state. The Recovery Time Objective (RTO) defines how long an outage may last. The Recovery Point Objective (RPO) limits the potential data loss. Neither value should be based on guesswork; both are derived from prior analysis, historical incident data, and testing.
A pragmatic approach is to divide the annually allowed downtime of the SLO by the expected number of incidents. This results in an RTO that you can test against reality through failover drills. For architecture planning, calculate the expected availability along the critical path.
In a single region, you multiply the SLAs of all services on the critical path, because every one of them has to work. If independent, redundant paths exist, the overall outage risk decreases, because an outage only occurs when all paths fail at the same time. In multi-region setups, effective availability increases further, since a regional outage can be absorbed by the other region. These calculations should be repeated with every design change until the estimated uptime reliably exceeds the SLO.
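Both calculations, the RTO rule of thumb and the availability along the critical path, can be sketched in a few lines. All SLA values and the number of incidents below are hypothetical, and the formula for redundant paths assumes that failures are statistically independent, which real regions only approximate.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def estimate_rto_minutes(slo_percent: float, incidents_per_year: int) -> float:
    """Rule of thumb: annual downtime budget divided by the expected incidents."""
    if incidents_per_year <= 0:
        raise ValueError("incidents_per_year must be positive")
    return MINUTES_PER_YEAR * (1 - slo_percent / 100) / incidents_per_year


def serial(availabilities: list[float]) -> float:
    """Services that all have to work: multiply their availabilities."""
    return prod(availabilities)


def redundant(availabilities: list[float]) -> float:
    """Independent redundant paths: an outage needs every path to fail at once."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical SLA values for illustration only.
region = serial([0.9995, 0.9999])            # app tier and database in one region
two_regions = redundant([region, region])    # same stack in a second region
end_to_end = serial([0.9999, two_regions])   # global load balancer in front

print(f"RTO estimate:  {estimate_rto_minutes(99.9, 4):.1f} minutes per incident")
print(f"single region: {region:.6f}")
print(f"two regions:   {two_regions:.6f}")
print(f"end to end:    {end_to_end:.6f}")
```

The last line illustrates a common pitfall: a single shared component in front of redundant regions caps the end-to-end availability at its own SLA.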
Azure provides a wide range of tools to help achieve these goals efficiently. For data platforms, services such as Azure Cosmos DB, Azure SQL Database, Azure Blob Storage, and Azure Files offer native replication and backup capabilities. If this is not sufficient or external systems are involved, Azure Backup and Azure Site Recovery can also be used.
For global load balancing across regions, Azure Front Door supports HTTP/HTTPS workloads, while Azure Traffic Manager provides DNS-based routing that also covers non-HTTP endpoints. Within a single region, Application Gateway (L7) and Azure Load Balancer (L4) distribute traffic across availability zones and instances.
Service Health and Resource Health deliver the operational pulse, while Infrastructure as Code ensures consistency across all active and passive paths. Together, these components form the Reliability Rails that allow you to achieve SLOs in a cost-effective way.
Summary of the Phases of the Cloud Adoption Framework
To properly position the Manage Phase, I would like to recap the insights from the previous blog posts. These insights are illustrated in the following diagram.
Strategy: Preparing Within Your Organization
Define motivation & cloud mission: Before discussing concrete steps for implementing the cloud journey, your organization must clearly define the reasons for adopting cloud providers.
Define business cases: At this stage, clear advantages for using cloud providers must exist in order to secure long-term organizational support for migrating workloads and data centers.
Identify relevant stakeholders: To assess the scope of your initiative, identify all relevant stakeholders of your systems. The full extent of a cloud migration often only becomes clear afterward, so identifying stakeholders early allows you to gain an initial realistic picture.
Analyze system & application landscape: To assess the current modernization needs of your application landscape, document your existing systems and applications.
Evaluate risks & SLAs: How critical are your applications and systems? Based on existing SLAs, you can better define your migration strategy and understand the risks involved in the subsequent phases.
Establish a Cloud Strategy Team / CCoE: A cloud migration may take place over an extended period, so forming a dedicated Cloud Strategy Team can help standardize, implement, and review your cloud objectives.
Conduct an initial assessment of your organization’s cloud readiness: To avoid challenges in later phases of cloud adoption, assess whether your organization has the necessary resources and skills for the cloud journey. Based on this, you may need to hire new staff, retrain existing personnel, or bring in external expertise.
The result is a Cloud Strategy Paper that consolidates all initial insights, goals, and framework information and serves as a foundation for the subsequent phases.
Plan: Selecting the Right Cloud Provider
Define your non-negotiables for cloud usage: Switching to a specific cloud provider requires a long-term investment in hosting your application landscape. For this reason, before starting your migration initiative, you should define your non-negotiables based on strategic direction, policies, and budget.
Determine the lifecycle of your systems: Not every workload needs to be migrated to the cloud. Legacy software or mission-critical systems may, in some scenarios, remain on-premises. Therefore, identify systems that will continue to be used long-term and that support your business goals through modernization.
Define the migration strategy for individual systems: After identifying relevant systems for your cloud journey, determine the modernization measures for each. These may include adapting the hosting environment, modernizing application software, or adopting cloud-native technologies.
Define the operational model: As with your on-premises data center, you must establish dedicated teams, clear responsibilities, and defined processes for the cloud journey.
Communicate responsibilities: Your operational model should then be communicated, discussed, and iteratively refined within your organization so that every team member is aligned on the next steps, cloud mission, and goals.
Map organizational structure into the cloud environment: Management groups, subscriptions, and tagging strategies allow you to represent responsibilities, organizational structures, and ownership. A plan for mapping these dependencies should be created before starting your migration.
Define architectural design principles: Development teams need guidelines and standards for connecting to cloud services. Consistent architectural implementation makes reuse, adaptation, and monitoring of software changes easier.
Create the cloud roadmap: As in any project, milestones, migration waves, dependencies, and budget cycles should be documented in a cloud roadmap. The roadmap also serves as a foundation for planning the implementation steps.
By the end of the Plan Phase, you will have expanded your Cloud Strategy Paper into a detailed Cloud Journey that can be used as the basis for the following steps.
Ready: Technical and Provider-Specific Preparation
Select provider-specific solutions and services: In recent years, the services of the major hyperscalers have become very similar. Nevertheless, there are still differences in the functionality of individual services, which means you should address solution design early on.
Implement organizational structure: After outlining the organizational structure in the Plan Phase, it must now be transferred into the Azure environment.
Define identity and access management policies: The use of Entra ID as an identity provider (IdP), managed identities, or role-based access control must also be clearly defined and communicated.
Implement network topology: Depending on your company’s locations, customer base, or future scaling goals, your network topology may need to be implemented differently. Ideally, this preparation should already be carried out in the Ready Phase.
Implement security measures: Before migrating your workloads, you should implement the recommended protection for your identities, workloads, and networks.
Define monitoring process: To ensure the availability of your cloud infrastructure and application systems, you should define and implement a comprehensive monitoring process from the start.
Define business continuity and disaster recovery: The worst-case scenario can occur sooner than expected, so you should establish a robust strategy for resolving critical issues as early as possible.
At the end of the Ready Phase, your organization is cloud-ready, paving the way for the actual migration.
Adopt: Migrating Your Workloads
Define technologies for data migration and connectivity: Before workload migration can begin, you should establish the services and methods for providing on-premises connectivity and data migration.
Prioritize workloads: It is best to start with workloads that are not mission-critical. With these workloads, you can gain initial experience and refine your approach for more critical systems.
Create a timeline: Migrating individual workloads can create dependencies and downtime. Therefore, you should always keep your timeline in mind.
Prepare workloads: If you already know that applications will require modifications before migration, you should prepare them in advance.
Migrate workloads: Migration, modernization, and the development of new workloads can then begin.
Decommission legacy systems: Once your workloads have been migrated, you can decommission the legacy systems in your on-premises data center and transfer responsibility to your cloud team.
Your systems are now migrated to the cloud, which also changes responsibilities and the way your team operates. These changes should therefore be reflected in your existing operational documentation.
Govern: Steering and Optimizing Your Cloud Environments
Establish a governance team: To ensure your cloud strategy is successfully implemented in the long term, a dedicated governance team should monitor compliance with policies, strategies, and measures.
Assess cloud risks: The security of your cloud environment cannot be neglected after completing a particular phase. Therefore, risk assessments and audits should be carried out continuously to uncover existing vulnerabilities.
Document cloud policies: Team members change departments or organizations. For this reason, cloud policies should be easily accessible to both current and new team members.
Enforce governance policies: While documenting and communicating governance policies is essential, strict enforcement through Azure Policy and third-party solutions is equally important.
Continuous monitoring: Information about your cloud environments forms the foundation for improvements and optimizations. Therefore, effective monitoring is indispensable.
The Govern Phase is therefore not a single step but should be understood as an ongoing process that applies across all phases. The outcome of this phase is ideally a solid governance operating model.
Secure: Securing Cloud Resources, Infrastructure, and Workloads
Establish a security team: Similar to implementing the governance strategy, it makes sense to have a dedicated security team to conduct penetration testing, manage vulnerabilities, and implement improvements.
Cloud security as a criterion for cloud adoption: Only when your workloads are protected against unauthorized access should you consider your cloud journey provisionally complete.
Landing Zones as an essential foundation: With landing zones, you can standardize environments and reuse them for future expansions. If these building blocks also meet your security requirements, reuse becomes even easier.
Automation and control: People make mistakes. This human factor can be mitigated through automation and specific control mechanisms.
Like the Govern Phase, the Secure Phase should be understood as a continuous process. The outcome is a secure and scalable cloud infrastructure.
Manage: Operating Cloud Resources and Operating Models
The Manage Phase is based on the RAMP model, which consists of the following components:
Ready: Establish a suitable operating model, clear roles, and decision-making paths. Standardize the tools used as well as the basic policies for governance, identity, and networking.
Administer: Clearly divide responsibilities between platform and workload teams and enforce them with processes for change, deployment, and configuration management. Standards such as naming/tagging, RBAC/PIM, and policy ensure compliance and smooth operations.
Monitor: Collect and correlate metrics, logs, and traces from Azure, hybrid, and multicloud environments centrally. Actively manage via alerts, dashboards, and cost reports to continuously ensure availability, security, and compliance.
Protect: Harden identities, platforms, and workloads with baselines, MFA/conditional access, Defender/Sentinel, and policy-driven compliance. Reinforce this with redundancy, backups, and regularly practiced DR processes to quickly detect, absorb, and resolve outages.
Conclusion
With the Manage Phase, we conclude the Azure Governance Starter Blog Series. In the previous posts, we covered all the key aspects of the Microsoft Cloud Adoption Framework (CAF) — from strategic foundations, planning, and implementation to governance, security, and operational management.
The Manage Phase ties these threads together. With RAMP (Ready, Administer, Monitor, Protect), you establish a resilient operating model, clearly separate responsibilities between platform and workload teams, automate recurring tasks, keep costs and compliance under control, and strengthen resilience through practiced recovery processes. This provides you with a practical guide to running Azure environments efficiently, securely, and at scale — from the first idea through to implementation.
In the following overview, you will find all previous posts of the Azure Governance Starter Blog Series — from the initial strategy phase to operational management.
- Part 1 - Governance Starter: Introduction to the Azure Cloud Adoption Framework
- Part 2 - Governance Starter: Strategy, Plan & Ready Phase
- Part 3 - Governance Starter: Adopt Phase
- Part 4 - Governance Starter: Govern Phase
- Part 5 - Governance Starter: Secure Phase
- Part 6 - Governance Starter: Manage Phase
- Outlook - Governance Audit