Azure Governance with CAF Govern: Mastering Security, Costs & Compliance

The practical guide to the Govern phase of the Microsoft Cloud Adoption Framework: team, risks, policies, Azure Policy & automation, FinOps, data and AI governance with best practices and examples.


Introduction

The Govern phase in the Microsoft Cloud Adoption Framework (CAF) ensures that Azure usage is controlled, traceable, and compliant with regulations.

In this context, governance means balancing organizational and technical measures: clear responsibilities, fixed rules, automated checks, and continuous monitoring of the environment. Guardrails are established so that security, compliance, and cost control stay aligned.

Positioning of the Govern Phase

The Govern phase runs alongside all other CAF phases (Strategy, Plan, Ready, Adopt, Manage) and therefore serves as a permanent steering instrument. Its essential process can be divided into five steps:

  1. A governance team is assembled, responsible for the rules and their enforcement.
  2. Risks in the cloud are assessed, for example regarding costs, security, or data loss.
  3. Based on this assessment, policies are defined and documented.
  4. These policies are continuously checked and enforced through automation.
  5. Cloud governance is monitored and adapted when necessary.

At the beginning, a simple foundation is sufficient, which is gradually improved based on clear metrics. To get started, you can follow these basic principles:

  • First observe when the risk is low.
  • Block when something is clearly not allowed.
  • Implement rules as Policy as Code so that policies remain reproducible and traceable.

In the previous posts of this series, we worked out the areas of action for each CAF phase. The following diagram shows the areas of action of the Govern phase.

[Figure: Overview of the Govern Phase in the Cloud Adoption Framework]

Establishing a Governance Team

An effective governance team is small, cross-functional, and equipped with a clear mandate. It brings together experts from cloud architecture, security, operations, FinOps, data and AI governance, as well as compliance and legal. The leadership role is taken by an executive sponsor at the C-level (e.g., CIO, CTO, or CISO), who provides the mandate, clarifies escalation paths, and ensures visibility within the organization.

The governance team is responsible for the policy catalog, maintains the risk register, prioritizes measures, and reports regularly on progress. This ensures that governance does not just exist on paper, but is actively managed and improved.

Typical roles in the governance team include:

  • Governance Lead: Overall responsibility, framework and policy catalog, reporting, leadership of the governance board.
  • Cloud Platform Owner: Management of management groups, setup of landing zones, assignment of policies, automation through pipelines.
  • Security Lead: Implementation of security policies, use of Defender for Cloud, identity and access controls, threat modeling, mapping of regulatory requirements.
  • FinOps Lead: Management of budgets and forecasts, detection of cost anomalies, use of reservations and savings plans, allocation via showback/chargeback.
  • Data Governance Lead: Classification and protection of data, data lifecycle management, data loss prevention.
  • AI Governance Lead: Ensuring responsible AI, control of content safety, securing AI-powered workloads, conducting red teaming.

An important organizational principle is that platform-wide policies are set centrally at the management group or subscription level. Workload teams, on the other hand, are responsible for implementing and adhering to policies within their specific domain. This separation helps avoid duplicate responsibilities and policy drift.

Assessing Cloud Risks

The assessment of risks in the cloud begins with a complete inventory. It covers all resources visible in the Azure portal, queries via Azure Resource Graph, and artifacts from CI/CD pipelines or scripts. On this basis, a workload-independent risk catalog is built that describes typical threats and vulnerabilities.

Several Azure tools are available for identifying concrete findings:

  • Defender for Cloud provides insights into security configurations and threats.
  • Azure Advisor recommends optimizations in cost, security, performance, and availability.
  • Identity Secure Score in Microsoft Entra ID evaluates identity and access policies.
  • Purview supports the classification and control of sensitive data.
  • Cost Management highlights cost risks and anomalies.

Prioritization is usually carried out by multiplying Probability × Impact. Optionally, the impact can be calculated monetarily, for example using the Annualized Loss Expectancy (ALE). This creates a more objective basis for classifying risks.
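The prioritization described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the 1 to 5 scales, the sample risks, and the monetary figures are assumptions. ALE is computed with the common definition of single loss expectancy multiplied by the annualized rate of occurrence.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 5 (severe)
    owner: str

    @property
    def score(self) -> int:
        # Probability x Impact, as described in the text
        return self.probability * self.impact


def annualized_loss_expectancy(single_loss: float, occurrences_per_year: float) -> float:
    """ALE = single loss expectancy (SLE) x annualized rate of occurrence (ARO)."""
    return single_loss * occurrences_per_year


# Hypothetical register entries for illustration only
risks = [
    Risk("Untagged resources block chargeback", probability=4, impact=2, owner="FinOps Lead"),
    Risk("Public storage account exposes data", probability=3, impact=5, owner="Security Lead"),
    Risk("Orphaned disks accumulate cost", probability=5, impact=1, owner="Cloud Platform Owner"),
]

ranked = sorted(risks, key=lambda r: r.score, reverse=True)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.name} (owner: {risk.owner})")

# Assumed figures: a 20,000 loss expected once every two years
ale = annualized_loss_expectancy(single_loss=20_000, occurrences_per_year=0.5)
print(f"ALE: {ale:.2f}")
```

A simple integer score keeps the ranking easy to discuss in a governance board; the monetary ALE is useful where loss figures can be estimated with some confidence.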

Each risk should also be linked to an owner who is responsible for addressing it. To reduce risks in a structured way, the following methodology can be applied:

  • Avoid by changing the design,
  • Reduce through additional controls,
  • Transfer via insurance or external contracts,
  • Accept with documented approval.

Documenting Governance Policies

Governance policies are precise target specifications that describe what is allowed, required, or prohibited in the cloud environment. They form the basis for technical controls but are not JSON expressions or implementations themselves.

Each policy should be clearly and transparently structured. Typical components include:

  • ID: Unique identifier of the policy
  • Category: Classification, e.g., security, cost, or data
  • Risk Reference: Link to an entry in the risk register
  • Statement: Precise directive, formulated as must, should, or must not
  • Purpose: The reason the policy exists
  • Scope: Area of application, e.g., management group, subscription, or resource type
  • Monitoring: Definition of how compliance will be verified
  • Remediation: Procedure in case of violations – whether only audited, automatically remediated, or blocked
  • Change Process: Regulated changes, e.g., via RFC procedures, regular reviews, or event-based adjustments

Policies are generally formulated to be workload-independent, so they remain transferable to different scenarios. Exceptions must be clearly documented and approved to prevent uncontrolled growth and policy drift.
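One way to keep such policy documents consistent is to capture the components listed above in a small record type. The following Python sketch is an assumption about how a team might do this; the class name, field names, and sample values are hypothetical and not part of CAF.

```python
from dataclasses import dataclass
from enum import Enum


class Remediation(Enum):
    AUDIT = "audit"
    AUTO_REMEDIATE = "auto-remediate"
    BLOCK = "block"


@dataclass(frozen=True)
class GovernancePolicy:
    policy_id: str
    category: str
    risk_reference: str
    statement: str
    purpose: str
    scope: str
    monitoring: str
    remediation: Remediation
    change_process: str

    def missing_fields(self) -> list[str]:
        # A policy document is only complete when every component is filled in
        return [name for name, value in vars(self).items() if value in ("", None)]


# Hypothetical catalog entry; IDs and wording are illustrative
policy = GovernancePolicy(
    policy_id="SEC-001",
    category="security",
    risk_reference="R-012",
    statement="Storage accounts must not allow public network access.",
    purpose="Reduce the exposed attack surface.",
    scope="Management group: landing zones",
    monitoring="Azure Policy compliance state",
    remediation=Remediation.BLOCK,
    change_process="",
)

print(policy.missing_fields())
```

Here the empty change process is reported, which is the kind of gap a review of the policy catalog should surface before a policy is approved.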

A central example is the tagging standard. To ensure metadata is maintained consistently, the policy defines mandatory fields and permitted value ranges, for example:

Tag           Purpose
Owner         Accountability
CostCenter    Cost allocation
Environment   Lifecycle/Environment
DataClass     Data protection & controls
BusinessUnit  Reporting/Showback
AppName       Workload assignment

The value ranges are clearly defined, and any deviation or exception must be explicitly approved. This ensures responsibilities, cost centers, and data classifications can be properly tracked and technically evaluated.
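A tagging standard like this can be checked mechanically. The sketch below validates a resource's tags against the mandatory fields from the table; the permitted values for Environment and DataClass are assumptions chosen for illustration, since the actual value ranges are defined by each organization.

```python
# Mandatory tags from the tagging standard; None means any non-empty value is accepted.
# The permitted value sets are hypothetical examples.
REQUIRED_TAGS = {
    "Owner": None,
    "CostCenter": None,
    "Environment": {"dev", "test", "prod"},
    "DataClass": {"public", "internal", "confidential"},
    "BusinessUnit": None,
    "AppName": None,
}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems for one resource's tags."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


# Hypothetical resource tags
resource_tags = {
    "Owner": "team-shop@example.com",
    "CostCenter": "CC-1001",
    "Environment": "staging",
    "DataClass": "internal",
    "AppName": "shop",
}

for problem in tag_violations(resource_tags):
    print(problem)
```

In Azure itself, the same rules would typically be enforced through Azure Policy; a script like this is mainly useful for reports or pre-deployment checks.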

Enforcing Policies through Automation

The technical core of the Govern phase is Azure Policy. Policies are usually assigned at the management group level, above the subscriptions, so that all subscriptions below inherit them automatically. To keep policies consistent and manageable, they are bundled into initiatives (policy sets).

The most commonly used policy effects are:

  • Deny: blocks violations (e.g., use of prohibited regions or resource types)
  • Audit / AuditIfNotExists: reports deviations and is ideal for starting with the monitor-first principle
  • DeployIfNotExists (DINE): automatically enforces requirements (e.g., diagnostic settings, private endpoints)
  • Modify / Append: adds or normalizes fields (e.g., tags, minimum TLS version)

Best practices for policy definition include:

  • Definition Location: Place policies as high as possible in the platform hierarchy (e.g., platform management group).
  • Parameterization instead of duplicates: Set values flexibly via parameters rather than defining policies multiple times.
  • Use initiatives: Manage parameters centrally and bundle policies consistently to avoid duplicate assignments.
  • Consider lifecycle: Start in audit mode and measure drift, then gradually switch to enforce (Deny/Modify) with a clearly planned transition phase.
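To illustrate parameterization and the audit-first lifecycle, the following Python sketch builds a policy definition in the JSON structure Azure Policy uses (`parameters`, `policyRule` with `if` and `then`). The display name, the default effect, and the helper function are assumptions for illustration; this is a simplified definition, not a copy of a built-in policy.

```python
import json


def allowed_locations_policy() -> dict:
    """Build a simplified policy definition with a parameterized effect."""
    return {
        "properties": {
            "displayName": "Allowed locations (sketch)",  # illustrative name
            "mode": "Indexed",
            "parameters": {
                "allowedLocations": {
                    "type": "Array",
                    "metadata": {"displayName": "Allowed locations"},
                },
                # Starting with Audit supports the monitor-first principle;
                # the assignment can later switch the parameter to Deny.
                "effect": {
                    "type": "String",
                    "allowedValues": ["Audit", "Deny"],
                    "defaultValue": "Audit",
                },
            },
            "policyRule": {
                "if": {
                    "not": {
                        "field": "location",
                        "in": "[parameters('allowedLocations')]",
                    }
                },
                "then": {"effect": "[parameters('effect')]"},
            },
        }
    }


definition = allowed_locations_policy()
print(json.dumps(definition, indent=2))
```

Because both the region list and the effect are parameters, one definition can serve several assignments, for example Audit in a sandbox scope and Deny in production.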

Policies, initiatives, and assignments should also be version-controlled in a repository and deployed via CI/CD pipelines across stages (Dev, Test, Prod). Pull requests and code reviews ensure that changes are made in a controlled and traceable way. CI/CD pipelines can also be extended with linters, dry runs, and reporting to ensure proper configuration.
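A linter step in such a pipeline can be very small. The sketch below checks a policy definition for a few conventions; the rules themselves (for example, that the effect must be a parameter) are a hypothetical team convention, not a requirement of Azure Policy.

```python
def lint_policy(definition: dict) -> list[str]:
    """Return findings for one policy definition; an empty list means it passes."""
    props = definition.get("properties", {})
    rule = props.get("policyRule", {})
    findings = []
    if not props.get("displayName"):
        findings.append("displayName is missing")
    if "if" not in rule or "then" not in rule:
        findings.append("policyRule needs both 'if' and 'then'")
    effect = str(rule.get("then", {}).get("effect", ""))
    if not effect.startswith("[parameters("):
        findings.append("effect is hard-coded; expose it as a parameter")
    return findings


# Hypothetical definition as it might be stored in the repository
sample = {
    "properties": {
        "displayName": "Deny public storage (sketch)",
        "policyRule": {
            "if": {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
            "then": {"effect": "deny"},
        },
    }
}

findings = lint_policy(sample)
for finding in findings:
    print(finding)
```

In a real pipeline, a non-empty findings list would fail the pull request check so that the issue is fixed before deployment to any stage.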

Automating Cost Control

Cost management is a central part of cloud governance, as uncontrolled spending can quickly become a risk to budgets and financial viability. The foundation is the setup of budgets per subscription or resource group, linked with automatic notifications to FinOps teams and responsible product owners. This ensures that overruns are detected early.
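The logic behind budget notifications can be sketched as follows. The thresholds of 50, 80, and 100 percent and the sample amounts are assumptions; in Azure, budgets and their alert conditions are configured in Cost Management rather than in custom code.

```python
def budget_alerts(
    budget: float,
    actual: float,
    forecast: float,
    thresholds: tuple[float, ...] = (0.5, 0.8, 1.0),  # assumed alert levels
) -> list[str]:
    """Return the notifications that apply for the current month."""
    alerts = []
    for threshold in thresholds:
        if actual >= budget * threshold:
            alerts.append(f"actual spend reached {threshold:.0%} of budget")
    # Warn early when the forecast predicts an overrun that has not happened yet
    if forecast > budget and actual < budget:
        alerts.append("forecast exceeds budget")
    return alerts


# Hypothetical subscription: 10,000 budget, 8,200 spent, 11,500 forecast
alerts = budget_alerts(budget=10_000, actual=8_200, forecast=11_500)
for alert in alerts:
    print(alert)
```

Combining actual and forecast conditions is what allows overruns to be detected before they occur rather than after month end.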

Equally important is continuous right-sizing of resources. This involves checking whether the instance sizes and services used really match actual usage. Oversized resources can be downsized, and unused workloads can be shut down.

For predictable and recurring workloads, reservations and savings plans are a good option. They allow resources to be used at a lower cost over longer periods if utilization remains stable. Regular monitoring of coverage (how many workloads are covered) and utilization (how effectively reservations are used) is essential to maximize financial benefits.
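The two metrics can be made concrete with a short calculation. The definitions below are a common simplification based on usage hours, and the numbers are invented for illustration.

```python
def coverage(covered_hours: float, eligible_hours: float) -> float:
    """Share of eligible usage that ran under a reservation or savings plan."""
    return covered_hours / eligible_hours if eligible_hours else 0.0


def utilization(used_reserved_hours: float, purchased_reserved_hours: float) -> float:
    """Share of purchased reservation capacity that was actually consumed."""
    return used_reserved_hours / purchased_reserved_hours if purchased_reserved_hours else 0.0


# Hypothetical month: 1,000 eligible hours, 800 hours purchased, 700 of them used
cov = coverage(covered_hours=700, eligible_hours=1_000)
util = utilization(used_reserved_hours=700, purchased_reserved_hours=800)

print(f"coverage:    {cov:.1%}")
print(f"utilization: {util:.1%}")
```

Low coverage with high utilization suggests room for additional commitments, while low utilization indicates that existing reservations are oversized for the actual workload.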

Another important building block is anomaly detection. Unexpected cost increases caused by misconfigurations, forgotten resources, or misuse can thus be detected and addressed early.
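To show the basic idea, the following sketch flags days whose cost lies far above the trailing average. This is a deliberately simple z-score approach and not the algorithm used by Azure Cost Management; the window size, the threshold, and the cost series are assumptions.

```python
from statistics import mean, stdev


def cost_anomalies(daily_costs: list[float], window: int = 7, threshold: float = 3.0) -> list[int]:
    """Return indexes of days whose cost exceeds the trailing mean by more
    than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (daily_costs[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily costs with one spike on day index 8
costs = [100, 102, 98, 101, 99, 103, 100, 101, 240, 102]
anomalous_days = cost_anomalies(costs)
print(anomalous_days)
```

Only upward deviations are flagged here, since unexpected cost increases are the governance concern; a production setup would also account for seasonality such as weekday patterns.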

Finally, tag-based cost allocation provides transparency. When resources are consistently tagged with metadata such as Owner, Business Unit, or Cost Center, expenses can be clearly assigned to specific projects or departments. This enables showback (cost reporting) or chargeback (direct cost allocation), which in turn strengthens cost awareness across business units.
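A showback report is essentially a group-by over tagged cost records. The sketch below assumes a simple list of records as it might come from a cost export; the record layout and values are hypothetical.

```python
from collections import defaultdict


def showback(cost_records: list[dict], tag: str = "CostCenter") -> dict[str, float]:
    """Sum costs per tag value; resources without the tag are grouped as 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for record in cost_records:
        key = record.get("tags", {}).get(tag, "untagged")
        totals[key] += record["cost"]
    return dict(totals)


# Hypothetical cost export
records = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"CostCenter": "CC-1001"}},
    {"resource": "sql-1", "cost": 80.0, "tags": {"CostCenter": "CC-1001"}},
    {"resource": "disk-9", "cost": 15.5, "tags": {}},
    {"resource": "app-2", "cost": 40.0, "tags": {"CostCenter": "CC-2002"}},
]

report = showback(records)
for cost_center, total in report.items():
    print(f"{cost_center}: {total:.2f}")
```

The size of the "untagged" bucket is itself a useful governance metric, because it shows how much spend cannot yet be allocated.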

Continuous Monitoring and Optimization

Governance is only as effective as its transparency and responsiveness. Policies only realize their value when they are continuously monitored, assessed, and adapted. For this purpose, it is advisable to introduce a governance workbook that consolidates key perspectives such as compliance, security, identity, and cost.

Based on this, metrics and targets can be defined that allow objective evaluation. These include, for example, policy compliance rate, average time to remediate deviations, budget variances, or indicators for the quality of identity and access controls.
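Two of these metrics can be computed as shown in the following sketch. The input data is invented, and the way states and timestamps are collected is an assumption; in practice they would come from Azure Policy compliance data and the ticketing system.

```python
from datetime import datetime, timedelta


def compliance_rate(states: list[str]) -> float:
    """Fraction of evaluated resources that are compliant."""
    if not states:
        return 1.0
    return sum(state == "Compliant" for state in states) / len(states)


def mean_time_to_remediate(findings: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration between detection and remediation of a deviation."""
    durations = [fixed - detected for detected, fixed in findings]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical evaluation results: 18 compliant, 2 non-compliant resources
states = ["Compliant"] * 18 + ["NonCompliant"] * 2

# Hypothetical findings as (detected, remediated) pairs
remediated = [
    (datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 13, 0)),
    (datetime(2025, 1, 2, 8, 0), datetime(2025, 1, 3, 8, 0)),
]

print(f"compliance rate: {compliance_rate(states):.0%}")
print(f"mean time to remediate: {mean_time_to_remediate(remediated)}")
```

Tracking both values over time shows whether governance is improving: the compliance rate should rise while the remediation time falls.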

Numerous platforms and signals are available for monitoring. Azure Policy provides insight into the status of policies and initiatives, shows drift trends over time, and makes exceptions transparent. Defender for Cloud supports with Secure Score, regulatory mappings, and recommendations, while Microsoft Entra ID provides insights into identity policies, e.g., via Secure Score, access reviews, or effectiveness of conditional access.

Cost control can also be tightly integrated by continuously comparing budgets, forecasts, and actuals, and by detecting anomalies early. In addition, Azure Advisor as well as Service and Resource Health help make optimization opportunities and operational events visible. With Azure Monitor, metrics, logs, and traces can be collected, visualized in workbooks, and forwarded via alerts to responsible teams. Integration into existing ITSM systems such as ServiceNow or Jira ensures that deviations flow directly into operational processes.

For advanced security requirements, Microsoft Sentinel can be used. As a SIEM/SOAR solution, it correlates signals, detects threats, automates incident handling, and enables proactive hunting.

Optimization follows a clear logic. Risks with high relevance are strictly enforced with automated remediation measures and well-defined escalation paths. Risks with lower impact start in audit mode, so that deviations are first made visible and evaluated together with workload teams. Only in a second step are stricter measures such as deny or modify policies activated. A continuous feedback loop ensures that lessons learned are fed back into governance, for example through adjustments to policy statements, parameters, or scopes. Microsoft-provided built-in policies should also be reviewed and updated regularly to incorporate new best practices or regulatory requirements.
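The escalation logic described above can be expressed as a small decision function. The risk levels, the 95 percent target, and the function name are assumptions for illustration; the actual criteria for switching from audit to enforcement are a decision for the governance team.

```python
def next_effect(
    current_effect: str,
    risk_level: str,
    compliance_rate: float,
    target: float = 0.95,  # assumed threshold for tightening a policy
) -> str:
    """Suggest the effect a policy assignment should use in the next review cycle."""
    if risk_level == "high":
        # High-relevance risks are enforced strictly from the start
        return "Deny"
    if current_effect == "Audit" and compliance_rate >= target:
        # Drift is low enough that enforcement will not surprise workload teams
        return "Deny"
    # Otherwise keep observing and work through deviations with the teams
    return current_effect


print(next_effect("Audit", "low", 0.97))
print(next_effect("Audit", "low", 0.80))
print(next_effect("Audit", "high", 0.50))
```

Tying the switch to a measured compliance rate makes the transition phase predictable for workload teams instead of depending on ad hoc decisions.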

Governance thus becomes a living process that not only sets static guardrails but also enables continuous improvement. Clear metrics such as the policy compliance rate, response times for security-relevant deviations, budget variances, and multi-factor authentication coverage make its effectiveness transparent and verifiable. The result is a governance system that remains adaptable over time and steers cloud usage both securely and efficiently.

Core Governance Areas in Practice

Once governance objectives, policies, and technical enforcement via Azure Policy are defined, their effectiveness becomes visible in seven tightly integrated domains. Each domain connects Risk → Policy → Technical Control → Operations and relies on policies/initiatives, landing zone structures, and continuous evaluation.

Security (Access & Exposure)

Identities, permissions, and the exposed attack surface form the first line of defense in the cloud. In Microsoft Entra ID, Conditional Access enforces sign-in conditions such as multi-factor authentication, while Privileged Identity Management (PIM) ensures that privileged roles are activated only just in time. At the Azure resource level, role- and attribute-based access controls (RBAC/ABAC) restrict users' scope of action and thereby minimize the risk of misuse or mistakes.

Landing zones also enforce clear guardrails at the network level. Sensitive platform services (PaaS) are connected exclusively via private endpoints, with public network access disabled by default. In addition, up-to-date protocol and cryptographic standards such as TLS 1.2+ are enforced as mandatory, and cryptographic keys are rotated regularly.

These requirements are technically implemented via policies. They prevent public exposure of resources, enforce private endpoints or firewalls, mandate diagnostic settings, allow only approved regions or SKUs, and protect Key Vaults with purge protection and firewall rules.

As a result, the attack surface is significantly reduced without blocking development teams in their work. Instead, secure standards are automatically enforced so that innovation and protection remain equally ensured.

Compliance

Compliance requirements such as GDPR or ISO 27001 are not only documented in the cloud but directly enforced through policies. The starting point is regulatory initiatives, i.e., predefined policy sets such as the Microsoft Cloud Security Benchmark, assigned to management groups or subscriptions. Defender for Cloud consolidates the status of these initiatives, calculates the Secure Score, shows the level of compliance with regulatory standards, and highlights concrete deviations.

When requirements are higher, standard policies alone are not sufficient. In such cases, additional attestation paths are established, for example the use of confidential computing or customer-managed keys (CMKs). This makes it possible to operate workloads in demonstrably compliant modes and to provide evidence for each regulatory requirement.

Cost Management

Cost management is not an optional reporting add-on but a central governance duty. Budgets and cost alerts are configured at the subscription or resource group level and linked with forecasts and clear responsibilities. Equally important is the quality of the resource tags used. Fields such as Owner, CostCenter, or Environment ensure that costs can be transparently assigned and that showback or chargeback models function reliably.

An effective FinOps model also relies on systematically eliminating idle or orphaned resources and continuously right-sizing so that instances match actual needs. For predictable and recurring workloads, reservations and savings plans are used. Governance regularly monitors both coverage and utilization to maximize financial benefits. Unexpected cost increases or anomalies are proactively addressed through anomaly detection.

Operations

Stable cloud operations are achieved through standardized diagnostics and clearly defined responsibilities. All logs and metrics are centrally collected in workspaces via diagnostic settings, enabling consistent evaluation. Alerts are not handled in isolation but forwarded via action groups or connected ITSM systems directly to the responsible parties. Service and resource health data is also monitored proactively so that potential issues can be detected and addressed early.

Resource Management

Resource management forms the organizational backbone of cloud governance. At its core is the management group hierarchy, extending from the tenant root through platform and landing zone levels to business units, regions, or individual environments. This structure ensures that policies are consistently inherited and responsibilities are clearly traceable.
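Policy inheritance along the hierarchy can be pictured as collecting assignments on the path from a scope up to the tenant root. The hierarchy, the scope names, and the assignment names in this sketch are hypothetical; Azure performs this resolution itself.

```python
# Hypothetical management group hierarchy: child -> parent
PARENT = {
    "Tenant Root": None,
    "Platform": "Tenant Root",
    "Landing Zones": "Tenant Root",
    "Corp": "Landing Zones",
    "sub-shop-prod": "Corp",
}

# Hypothetical policy assignments per scope
ASSIGNMENTS = {
    "Tenant Root": ["Allowed locations"],
    "Landing Zones": ["Require tags"],
    "Corp": ["Deny public network access"],
}


def effective_assignments(scope: str) -> list[str]:
    """Collect assignments from the tenant root down to the given scope."""
    inherited: list[str] = []
    current = scope
    while current is not None:
        # Prepend so that higher levels appear first in the result
        inherited = ASSIGNMENTS.get(current, []) + inherited
        current = PARENT[current]
    return inherited


print(effective_assignments("sub-shop-prod"))
print(effective_assignments("Platform"))
```

The subscription itself has no direct assignment here, yet three policies apply to it, which is why assigning as high as sensible in the hierarchy keeps the number of assignments small.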

A well-designed subscription strategy separates production from non-production environments, accounts for organizational units, and reflects regulatory requirements. Within subscriptions, resource groups are structured around workloads and their lifecycles.

Guardrails are defined through policies. They restrict allowed regions, resource types, and SKUs, enforce strict naming and tagging standards, and mandate diagnostic settings. This creates a consistent framework within which workloads can be operated securely and efficiently.

Transparency is ensured by the Azure Resource Graph, which provides centralized inventory of all resources and highlights deviations or drift across the environment. These insights are indispensable for audits as well as targeted corrections, making resource management a central governance control instrument.

Data Control

Data control is a core part of cloud governance, ensuring the secure and compliant handling of information. With Microsoft Purview, data sources can be inventoried and classified. This not only shows which datasets exist but also how they are connected. Sensitivity labels and data loss prevention (DLP) rules complement this classification and enable uniform handling of sensitive information.

Another key focus is the data lifecycle. Policies enforce retention periods, archiving, and the use of immutable storage. Encryption is also a foundational principle. Customer-managed keys (CMKs), optionally stored in a managed hardware security module (Managed HSM), keep control over the keys with the organization. Transport encryption is considered a non-negotiable baseline standard.

In addition, egress controls ensure that data leaves the network only through approved paths. Mechanisms such as Private Link or dedicated approved endpoints prevent uncontrolled data exfiltration.

The result is an evidence base that meets data protection requirements and gives regulators and auditors verifiable proof of compliant operations.

AI Governance

The general principles of cloud governance also apply to AI workloads, complemented by specific responsible AI guidelines. A central element is securing content through content safety mechanisms and filters that prevent harmful or inappropriate outputs.

When using retrieval-augmented generation (RAG) architectures, the data sources used are documented so that data flows remain transparent and verifiable at all times. In addition, telemetry on prompts and outputs ensures that audits are possible and that issues can be detected early.

Controlling data flows also plays a decisive role here. Egress controls ensure that AI systems only communicate with approved endpoints and that no uncontrolled data exfiltration occurs. To keep risks under continuous review, the models and processes used are regularly tested and evaluated through red teaming.

Governance is not applied afterwards but integrated directly into the software development lifecycle (SDLC). Gates in build and release pipelines ensure that policies and approvals are strictly enforced. This creates a framework in which AI workloads can be operated responsibly, transparently, and in compliance with regulations.
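A release gate of this kind can be as simple as a checklist that must pass completely. The check names below are hypothetical and derived from the controls described in this section; how each check is evaluated depends on the pipeline and tooling in use.

```python
def release_gate(checks: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return whether the release may proceed and which checks failed."""
    failed = [name for name, passed in checks.items() if not passed]
    return (not failed, failed)


# Hypothetical gate results for an AI workload release
checks = {
    "content_safety_filter_enabled": True,
    "rag_sources_documented": True,
    "prompt_telemetry_enabled": False,
    "red_team_review_approved": True,
}

approved, failed = release_gate(checks)
print("release approved" if approved else f"release blocked: {failed}")
```

Returning the failed checks alongside the decision gives the workload team a concrete list to act on instead of a bare rejection.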

Conclusion

The Govern phase of the Microsoft Cloud Adoption Framework combines organizational and technical measures into a clear framework that aligns security, cost control, compliance, and innovation.

Key success factors include a governance team with clear roles, a transparent risk register, a consistent policy catalog, and rigorous enforcement through Azure Policy and policy-as-code pipelines. Continuous monitoring and feedback loops ensure that governance does not remain static but evolves in line with business needs, regulatory requirements, and the technical capabilities of the platform.

By following these principles, organizations not only establish reliable guardrails for cloud usage but also build trust with business units, management, and regulators. Governance thus becomes an enabler of secure and efficient innovation in Azure rather than a perceived brake.

