Ultimate Guide to Fail at Least Privilege in Cloud (and the Hard Lessons I Learned)

Least privilege is a defense-in-depth strategy that everyone talks about. When I first heard of it a few years back, it seemed like a magical solution to a good number of the security issues I faced.

Wikipedia defines least privilege as “every module must be able to access only the information and resources that are necessary for its legitimate purpose”.

This term pops up in any conversation involving authorization or its related buzzwords (Zero Trust, Microsegmentation, etc).

Despite everyone agreeing on its importance and the need for it, I haven’t found any articles or blog posts that talk about the hurdles along that path (especially the non-technical ones).

Whenever I search for this term, I end up going down a rabbit hole of articles that just say “enable some cloud service and you’ll have least privilege”, or “buy our product and you’ll have least privilege”.

Over the last few years of working in the cloud and cloud-native space, I have made multiple mistakes while trying to achieve least privilege in the cloud. Unlike other blog posts I’ve written, this one shows all the ways I failed to implement least privilege.

Why publish all mistakes and failures?

Well, my understanding of what works comes from my experience of what didn’t work. I’m publishing it so my future self and others interested in least privilege can avoid making the same mistakes again.

Let’s get started with the mistakes I made and the lessons I learned from them.

Everyone talks about least privilege in the cloud. Most of what gets called “least privilege” is really just less privilege.

Let’s take an example of the IAM policy for an application to upload and download images on an S3 bucket.

Different least privileged IAM policies

The majority of security professionals I talked to are fine with policy 1 or 2 and consider them least privilege, since they restrict the application’s access to a single bucket.

Some might consider the 3rd policy to be the actual least privilege, since it restricts access to a single bucket and to specific file extensions.

However, a minority of professionals will argue that the 3rd policy is still not least privilege, as there’s no restriction on the size of uploaded objects and no check on their actual content type.
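The three policies are not spelled out in the text, so the sketch below is only one guess at what three points on this spectrum might look like. The bucket name `example-images-bucket`, the helper `make_policy`, and the specific actions and extensions are all assumptions made for illustration.

```python
import json

BUCKET = "example-images-bucket"  # hypothetical bucket name


def make_policy(actions, resources):
    """Return a minimal single-statement IAM policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resources}
        ],
    }


# Assumed "policy 1": any S3 action, but only on one bucket.
policy_1 = make_policy(
    ["s3:*"],
    [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
)

# Assumed "policy 2": only upload and download, on one bucket.
policy_2 = make_policy(
    ["s3:GetObject", "s3:PutObject"],
    [f"arn:aws:s3:::{BUCKET}/*"],
)

# Assumed "policy 3": upload and download, limited to image extensions.
policy_3 = make_policy(
    ["s3:GetObject", "s3:PutObject"],
    [f"arn:aws:s3:::{BUCKET}/*.jpg", f"arn:aws:s3:::{BUCKET}/*.png"],
)

if __name__ == "__main__":
    print(json.dumps(policy_3, indent=2))
```

Even the narrowest of these says nothing about object size or what the bytes actually contain, which is exactly the gap the minority view points at.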

There’s no one correct answer that suits all. You will need to find the best position in the least privileged “spectrum”. You need to find what works for your organization.

Interestingly, in this least privilege spectrum, you are screwed at either end.

On one end, there are a lot of things an attacker can do if the application is compromised. On the other end, developers will curse you (from the bottom of their hearts?) for the number of requests they need to raise to make minor updates to IAM policies.

The Least Privilege Spectrum

Let’s say you read some fancy article about how someone achieved least privilege with AWS Access Advisor, and you want to try the same.

Sounds simple.

If your engineering people (especially the ones making decisions) think least privilege is hurting developer productivity, your least privilege initiative will be scrapped.

If no one at the top level (CISO, CTO, HOE, etc.) buys in to least privilege, your initiative will end before it’s fully rolled out.

There are other common ways to not achieve the least privilege.

If your devs are into bad engineering practices - say, reusing the same IAM role for all their applications just to avoid raising permission requests - your initiative fails.

If your devs don’t use IaC to create and manage resources, your initiative fails.

If your engineering processes just give production access to anyone who asks for it, your initiative fails.

Before building any technical automation on the journey to least privilege, getting alignment is mandatory.

For developers, non-production environments are virtual playgrounds. They create resources there. They test and tweak the performance. They make mistakes and learn from the mistakes. They experiment with multiple configurations before moving their features to prod.

Starting development with least privilege in mind can be counterproductive. There’s a high chance that devs don’t yet know the most granular IAM permissions the application or the new feature needs in order to work.

Enabling guardrails is a much better approach in non-prod environments than aiming to achieve less/least privilege. Guardrails such as region restriction, allowed EC2 instance sizes, etc can reduce the blast radius when your non-prod resources are compromised.
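As a rough sketch of the two guardrails named above, the function below builds a deny-based policy document of the kind that could be attached as a service control policy to non-prod accounts. The function name, the region and instance-type values, and the short list of exempted global services are assumptions; a real policy would likely need a longer exemption list.

```python
import json


def build_nonprod_guardrail(regions, instance_types):
    """Build a deny-only guardrail: region restriction plus EC2 size limit."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                # Global services are exempted so they keep working
                # (this list is illustrative, not exhaustive).
                "NotAction": ["iam:*", "sts:*", "organizations:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": regions}
                },
            },
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {"ec2:InstanceType": instance_types}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for a non-prod organizational unit.
    guardrail = build_nonprod_guardrail(["ap-south-1"], ["t3.micro", "t3.small"])
    print(json.dumps(guardrail, indent=2))
```

The point of the design is that nothing here grants anything: devs keep their broad non-prod permissions, and the guardrail only caps where and how big the mistakes can be.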

A clear segregation and isolation of your prod and non-prod workloads will do wonders.

It’s intuitive to think that if you achieve least privilege, you are immune to major breaches. Do you think the same? Don’t worry, I was there too, until I realized otherwise.

Least privilege is not a guarantee against high-impact breaches.

AWS IAM keys with privileges only to upload and download KYC documents can be leaked on GitHub. Your Cognito identity pool can grant you the least privileged access but your Cognito user pool could be misconfigured.

SQL injection or IDOR in your application with “least privileges” might still allow attackers to read/write to critical data in your databases. You’ll have to pentest your applications and protect them.

Least privilege is a defense in depth strategy at best - not the primary defense for your applications.

A few AWS folks I’ve talked to have enabled least privilege automation to fetch recommendations from AWS Access Advisor and apply Permission Boundaries or update the IAM policies altogether.

This does a great job in certain places. When developers deprecate features, the automation can use Access Advisor data to remove the IAM permissions that are no longer needed.

But it will have a fair share of exceptions.

Just because a bucket deletion functionality was not used in the last few months, it doesn’t mean an application no longer needs it. Just because your DevOps/Infra team member didn’t use a bunch of functionalities, it doesn’t mean they will never use them in the future.

When these exceptions occur, the Dev/DevOps team must have a way to roll back to previous policies. Or maybe handle such situations by temporarily allowing extra permissions to be added through a portal with less friction.
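One way to leave room for such exceptions is to treat last-accessed data as a list of candidates for review rather than something to apply blindly. The sketch below assumes input shaped like the `ServicesLastAccessed` entries returned by IAM’s `get_service_last_accessed_details` (with `ServiceNamespace` and an optional `LastAuthenticated`). The function name, the 90-day threshold, and the `keep` list are assumptions, and the sketch works at service granularity only.

```python
from datetime import datetime, timedelta, timezone


def removal_candidates(services_last_accessed, now, max_idle_days=90, keep=()):
    """Return service namespaces that look unused and are not explicitly kept."""
    cutoff = now - timedelta(days=max_idle_days)
    candidates = []
    for entry in services_last_accessed:
        namespace = entry["ServiceNamespace"]
        if namespace in keep:
            continue  # someone decided this access stays, used or not
        last_used = entry.get("LastAuthenticated")  # missing if never used
        if last_used is None or last_used < cutoff:
            candidates.append(namespace)
    return sorted(candidates)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sample = [
        {"ServiceNamespace": "s3", "LastAuthenticated": now - timedelta(days=3)},
        {"ServiceNamespace": "sqs", "LastAuthenticated": now - timedelta(days=200)},
        {"ServiceNamespace": "kms"},  # never used, but deliberately kept
    ]
    print(removal_candidates(sample, now, keep={"kms"}))  # ['sqs']
```

The `keep` set is where rarely used but legitimate access lives, so the automation never has to guess about it.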

Also, have you thought about this question: What’s the fallback if your CI/CD system goes down?

And probably this one: Should you try to reduce the privileges of your intentionally created “Admin” IAM group containing just a handful of people?


Have you ever wondered about the least privilege for your CI/CD systems? Especially those systems where you execute terraform to create other cloud resources.

CI/CD systems are designed to have high privileges. So do your in-house SSO applications that grant access to others.

I still haven’t found an answer to what can be the least privilege here.

In such scenarios where least privilege is not possible, I do recommend compensating controls: Just-In-Time elevated access for resolving P0/P1 incidents, and denying dangerous actions (deleting VPC flow logs, disabling GuardDuty, etc.) even if a person is an admin.
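A minimal sketch of the “deny even for admins” idea, assuming it is enforced with an explicit deny (for example in a service control policy). The function name and the optional break-glass role ARN are hypothetical, and the action list covers only the two examples mentioned above.

```python
import json


def build_admin_deny_policy(exempt_role_arn=None):
    """Build a deny statement for actions that blind security monitoring."""
    statement = {
        "Sid": "DenyDisablingSecurityTelemetry",
        "Effect": "Deny",
        "Action": [
            "ec2:DeleteFlowLogs",        # deleting VPC flow logs
            "guardduty:DeleteDetector",  # removing GuardDuty entirely
            "guardduty:UpdateDetector",  # the call used to disable a detector
        ],
        "Resource": "*",
    }
    if exempt_role_arn:
        # Optional escape hatch: one tightly controlled role is not denied.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Hypothetical break-glass role name.
    policy = build_admin_deny_policy("arn:aws:iam::*:role/break-glass")
    print(json.dumps(policy, indent=2))
```

Because an explicit deny wins over any allow, this holds even for principals with full administrator permissions.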

If you consider VPS providers as cloud providers, a good number of them still offer only primitive privileges today - ReadOnly, Manage, Admin, Owner - applied across all services on their platform. Your possible least privilege on these platforms is still not the least.

You can’t achieve granular privileges if your cloud provider doesn’t support them in the first place.

Let’s say you want to focus on major cloud providers only - like AWS, GCP, and Azure.

There are known unknowns. For example, AWS Access Advisor doesn’t find the least privilege with resource-based policies. Access Advisor also doesn’t support services like Amazon SNS, Amazon API Gateway, etc.

To overcome these known unknowns, you will have to set up additional tooling to find the least privileges in these cases.

Then, there are unknown unknowns. These are things like undocumented CloudTrail APIs, protocol mutation in AWS APIs, etc. We don’t know about them, and even when we do, fixing them falls on the cloud provider’s side of the shared responsibility model.

If you have read this far, I’ll end with this one: true least privilege does not stop at the cloud.

Remember the 4Cs: Cloud, Cluster, Container, and Code? Least privilege applies to almost all of them.

Let’s say you achieved the least privilege in the cloud. 😊

What about service accounts and their permissions in your Kubernetes cluster?

What if your applications are running as privileged containers in pods? Or what if the applications are running as the “root” user inside the container?
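The two container questions above can be checked mechanically. The sketch below is a hypothetical helper (not part of any Kubernetes client) that takes a pod spec as a plain dict, as parsed from a manifest, and reports containers that are privileged or that may run as root, using the standard `securityContext` fields `privileged`, `runAsUser`, and `runAsNonRoot`.

```python
def container_privilege_findings(pod_spec):
    """Return human-readable findings for risky container settings."""
    findings = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        if ctx.get("privileged") is True:
            findings.append(f"{name}: runs as a privileged container")

        # Container-level settings override pod-level ones.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))

        if run_as_user == 0:
            findings.append(f"{name}: runs as root (runAsUser: 0)")
        elif run_as_user is None and not run_as_non_root:
            # Nothing stops the image's default user from being root.
            findings.append(f"{name}: may run as root (no runAsUser or runAsNonRoot)")
    return findings


if __name__ == "__main__":
    spec = {
        "containers": [
            {"name": "app", "securityContext": {"privileged": True, "runAsUser": 0}},
            {"name": "sidecar", "securityContext": {"runAsUser": 1000}},
            {"name": "job"},
        ]
    }
    for finding in container_privilege_findings(spec):
        print(finding)
```

It only looks at regular containers and ignores init containers, capabilities, and host namespaces, so it is a starting point rather than a full audit.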

What about the privileges of DB credentials configured with your application?

What about the access level to Kafka topics it publishes to or consumes from?

The list of questions goes on.

Applying least privilege to all of these at once is yet another recipe for disaster. Instead, take one at a time, solve it for your organization, learn from the mistakes, and pick the next one to solve.


That’s all in this post. I hope I helped you avoid a few mistakes I made myself. See you in the next one. 👋

If you have any doubts/ideas/suggestions, feel free to reach out on LinkedIn.

PS: A few reached out to me about #100DaysOfAzureSecurity and asked why they aren’t receiving any emails. Sorry. I had paused it due to personal work. Will continue publishing it from next week.