Did you completely remove secrets from git repository? Really?

Removing secrets from a git repo is straightforward. With the help of BFG Cleaner and the privileges to force-push the modified history, it’s a piece of cake.

I believed this until I found I was partially wrong: removing something from git history doesn’t necessarily remove it from the GitHub repository.


TL;DR:

To delete secrets from a git repository, modifying the history and force-pushing it might not be sufficient if you use the GitHub Pull Requests feature. The PRs save a read-only copy of the code modifications. To remove the secret, you must de-reference/delete the GitHub PRs and trigger GitHub repo garbage collection to delete commits with secrets.


Have you ever committed anything sensitive in a git repository?

Sensitive like AWS credentials, SaaS login credentials, prod DB passwords, or even a CSV file containing customers’ SSNs.

If yes, that’s okay. All that matters to you (most probably) before committing a piece of code is that the code works. Unless we have something configured (like pre-commit hooks) to detect secrets, we can’t be aware of everything we commit into git repositories every time.

Pre-commit hooks that detect secrets are not a foolproof solution. Most (if not all) tools search for specific regexes or high-entropy strings. If you add a regex for a new secret type in the future and find that such secrets are already hardcoded, you will have to revoke/rotate them or perhaps remove them from git history.
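To make the regex-versus-entropy point concrete, here is a minimal Python sketch of the two checks such tools typically combine. The AWS access key ID pattern, the function names, and the 4.0-bit threshold are assumptions chosen for illustration; they are not taken from any particular scanner.

```python
import math
import re
from collections import Counter

# Assumed pattern: AWS access key IDs are commonly matched as
# "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_suspects(line: str, threshold: float = 4.0) -> list[str]:
    """Return tokens that match the key-ID regex or look random enough."""
    hits = AWS_KEY_ID.findall(line)
    # Candidate tokens: runs of 20+ base64-like characters.
    for token in re.findall(r"[A-Za-z0-9+/=]{20,}", line):
        if token not in hits and shannon_entropy(token) >= threshold:
            hits.append(token)
    return hits
```

With these assumptions, the placeholder access key ID used later in this post is caught by the regex, but the placeholder secret key (a 40-character string built from only seven distinct digits) has an entropy of roughly 2.8 bits per character and slips past the entropy check. That is the kind of gap described above.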

The Dilemma: revoke secrets or remove from git history

Getting back to the topic: if you have committed secrets into code and you can regenerate them, take the easiest route and revoke (or rotate) them.

From experience, I’ve seen folks occasionally take the red pill (removing from git history) if they:

  • Committed something that cannot be rotated, like SSNs or clients’ details and email IDs
  • Don’t know where the secrets have been configured in prod/dev environments (just bad secrets management) and fear those envs going down because of revocation/rotation

This blog post is for those taking the red pill, especially for repos on GitHub Enterprise Server (Self-Hosted)!

The Story

It’s storytime.

You might be very familiar with this.

It’s been a busy month. I have been working through sprints, adding new and optimizing existing code based on product requirements. The code that I was writing for the last few days finally works on the local system. 🎉🎉🎉

I raised a PR with the code. My colleague who’s been busy debugging some issues peeks into my code, adds a few comments to improve code quality, and gives a 👍 for my PR.

Finally, PR gets merged. Work well done!

But wait, I notice that I have hardcoded AWS credentials in PR #2. 🤦‍♂️


Hardcoded AWS credentials in PR #2

I don’t want to revoke it because a few other developers are using the same AWS credential for some other project(s). So I add another commit to remove it.


PR #3 to remove AWS creds

We both know git remembers everything that’s committed - the hardcoded secret, the commit which introduced AWS keys, and the commit that removed it.

I will take the extra step of removing it from git history and force-push it.

BFG Cleaner is a great tool for removing almost anything from git history: sensitive strings, credentials, or even complete files containing sensitive data. If you are on a Mac, you can install it with brew install bfg.

NOTE: A precondition to delete any sensitive string/file using BFG Cleaner is that the string/file should not be present in the HEAD commit. In this story, I’ve already deleted the AWS credential in the last commit.
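Since BFG protects the HEAD commit by default, it can help to confirm that the current checkout is clean before rewriting history. The following is a minimal Python sketch of such a check; the function name and return shape are hypothetical, and it assumes a regular (non-mirror) clone, because a mirror clone has no working tree to scan.

```python
from pathlib import Path


def files_containing(root: str, secrets: list[str]) -> dict[str, list[str]]:
    """Map each file under root (skipping .git) to the secrets found in it."""
    found: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*")):
        relative = path.relative_to(root)
        # Only the checked-out files matter here, not git's own object store.
        if not path.is_file() or ".git" in relative.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        present = [secret for secret in secrets if secret in text]
        if present:
            found[relative.as_posix()] = present
    return found
```

An empty result suggests the literal strings are gone from the latest checkout; a non-empty result lists the files that still need a cleanup commit first.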

The steps to remove sensitive strings using BFG Cleaner look like this:

  1. Clone the repository using the --mirror flag: git clone --mirror git@github.enterprise.domain.com:badshah/secret-repo.git
  2. Save the strings we need to remove - AKIAEXAMPLE123456KEY and 1111112222223333334444445555556666667777 to secrets.txt.
  3. Execute the command: bfg --replace-text secrets.txt secret-repo.git
  4. BFG will update all commits in all branches and tags but doesn’t delete any unwanted files (if any). If I wanted to delete a sensitive file in this repository, I need to execute: cd secret-repo.git && git reflog expire --expire=now --all && git gc --prune=now --aggressive. (In this example, as I’m just replacing sensitive text, there’s no need to execute this).
  5. Finally, force-push the changes: git push --force
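The steps above can be sketched as a small Python helper that writes the replacement file and lists the commands in order without running them. The function names, the prune flag, and the explicit mirror directory argument are hypothetical conveniences for this sketch; the commands themselves are the ones from the list.

```python
from pathlib import Path


def write_replacements(path: str, secrets: list[str]) -> None:
    """Write one literal secret per line, the format bfg --replace-text reads."""
    Path(path).write_text("\n".join(secrets) + "\n", encoding="utf-8")


def cleanup_commands(remote: str, mirror_dir: str, secrets_file: str,
                     prune: bool = True) -> list[list[str]]:
    """Return the commands from the steps above, in order, without running them."""
    commands = [
        ["git", "clone", "--mirror", remote, mirror_dir],
        ["bfg", "--replace-text", secrets_file, mirror_dir],
    ]
    if prune:
        # Step 4: expire reflogs and garbage-collect the rewritten mirror.
        commands += [
            ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
            ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
        ]
    commands.append(["git", "-C", mirror_dir, "push", "--force"])
    return commands
```

Each entry could then be passed to subprocess.run(command, check=True) once you have reviewed the list. Keeping the commands as data makes it easy to print them for a dry run first.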

Force pushing modified git history

There’s some error when pushing. But let’s see if the push has removed the AWS credentials from the code.


Checking git history from GitHub shows REMOVED

Hurray!!

I have successfully removed it from the git history.

How about the PRs?

I think the force push must have updated them as well.


Commits in PR
Code changes in PR

Ah… I’m wrong. The rewritten git history no longer contains the AWS credentials: when I click on the commit ID in the commit history, I see ***REMOVED*** in the code. But the GitHub repository still remembers them, because the original commit is still present in the commit list of PR #2.

Anyone visiting the PR or the commit URL directly can still see the secrets.

Visiting the commit from PR shows the credentials along with an error message.


The original commit

GitHub allows you to create, update and close PRs. It also allows you to comment (even debate 😜) and approve PRs. But it doesn’t allow you to delete PRs even if you are the repo owner.

The error message in the git push output pointed to this earlier: the force push was unable to update the “read-only” branches that belong to GitHub PRs.


Forced push output
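If you want to spot this situation automatically, a rough Python sketch like the one below can pick the rejected PR refs out of captured push output. The line shape it expects (a "! [remote rejected]" prefix, the ref pair, and a reason in parentheses) is an assumption based on how git usually reports rejected refs; the exact text in your output may differ, and the function names are hypothetical.

```python
import re

# Assumed line shape, for example:
#  ! [remote rejected] refs/pull/2/head -> refs/pull/2/head (deny updating a hidden ref)
REJECTED = re.compile(
    r"^\s*!\s*\[remote rejected\]\s+(\S+)\s+->\s+(\S+)\s+\((.*)\)\s*$"
)
PULL_REF = re.compile(r"refs/pull/(\d+)/(?:head|merge)")


def rejected_refs(push_output: str) -> list[tuple[str, str]]:
    """Return (remote ref, reason) for every ref the server refused to update."""
    results = []
    for line in push_output.splitlines():
        match = REJECTED.match(line)
        if match:
            results.append((match.group(2), match.group(3)))
    return results


def rejected_pr_numbers(push_output: str) -> list[int]:
    """Return the sorted PR numbers whose refs were rejected."""
    numbers = set()
    for ref, _reason in rejected_refs(push_output):
        match = PULL_REF.fullmatch(ref)
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)
```

Any PR number returned here is a hint that the old commits are still reachable through that PR, even though the branches were rewritten.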

There’s no option to remove the “read-only” behavior of PR branches. The two PR options that do help are de-referencing and deleting.

De-referencing PRs deletes the relationship between the PR and the commits it introduced. After de-referencing PRs, you can know who created the PR, at what time, and the comments made on PR but not the commits or code changes.

Deleting PRs is more straightforward - it just deletes the PR and any associated information.

De-referencing and deleting PRs are actions only accessible to GitHub Enterprise Admins. If you want to remove secrets added through PRs in public/private repositories on GitHub.com, you need to contact GitHub Support.

Getting back to the story, the next steps are:

  1. Find which PRs reference the particular git commit that introduced AWS creds
  2. De-reference or delete those PRs and make the commit URLs inaccessible.

The following steps work if you have GitHub Enterprise Admin access and have access to the GitHub Enterprise Server via SSH. If you are following this blog post to delete a secret in your own git repository, replace github.enterprise.domain.com and badshah/secret-repo with your GH hostname and repo name.

To find PRs referencing the particular git commit that introduced the AWS secret:

  1. Copy the commit ID. In my case, it’s b851075285678c5234ec7416528108f148a216e5
  2. Log in to GitHub Enterprise Server and execute the command ghe-repo badshah/secret-repo -c 'git for-each-ref --contains b851075285678c5234ec7416528108f148a216e5'

PRs referencing commit with secrets

You see two PRs (#2 and #3) referencing the commit with AWS creds. In a repo with many active contributors and PRs, the number of referencing PRs can be higher.
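When the list of refs is long, the PR numbers can be pulled out programmatically. Here is a minimal Python sketch, assuming the default git for-each-ref output format (object ID, object type, then the ref name as the last field) and GitHub's refs/pull/<number>/head and refs/pull/<number>/merge naming; the function name is hypothetical.

```python
import re

PULL_REF = re.compile(r"^refs/pull/(\d+)/(?:head|merge)$")


def prs_containing_commit(for_each_ref_output: str) -> list[int]:
    """Return sorted PR numbers found in `git for-each-ref --contains` output."""
    numbers = set()
    for line in for_each_ref_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # In the default format the ref name is the last field on the line.
        match = PULL_REF.match(fields[-1])
        if match:
            numbers.add(int(match.group(1)))
    return sorted(numbers)
```

The resulting list is what goes into pr_numbers in the console scripts that follow. Refs outside refs/pull/ (branches, tags) are ignored, so if any of those still contain the commit they need to be handled separately.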

Steps to de-reference PRs:

  1. SSH into the GitHub server and start the GHE console using ghe-console -y
  2. Execute the following script:
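    # Look up the repository and the PRs found in the previous step
    repo = that "badshah/secret-repo"
    pr_numbers = [2, 3]
    prs = repo.pull_requests.select { |pr| pr_numbers.include?(pr.number) }
    # De-reference each PR so it no longer points at its commits
    prs.each(&:destroy_tracking_refs)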
  3. Quit the GHE console using quit
  4. Start the GitHub garbage collection: ghe-repo-gc -v --prune badshah/secret-repo
  5. On the GitHub web app, head over to https://github.enterprise.domain.com/stafftools/repositories/badshah/secret-repo, select “Network”, and then click “Invalidate Git cache”.

Once de-referenced, the PRs appear as follows:


De-referenced PR

De-referenced PR code changes

Also, as the original commit that introduced the AWS secret is no longer referenced by any branch or PR, garbage collection deletes it. Accessing the commit URL now gives a 404 error.


404 when visiting the original commit URL

If you don’t want to preserve the PR information (like comments, etc.), you can delete the PR instead of de-referencing it.

  1. Log in to the GitHub server and start the GHE Console: ghe-console -y
  2. Execute the following script:
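    # Look up the repository and the PRs that reference the commit
    repo = that "badshah/secret-repo"
    pr_numbers = [2, 3]
    prs = repo.pull_requests.select { |pr| pr_numbers.include?(pr.number) }
    prs.each do |pr|
      # De-reference the PR first, then delete it along with its comments
      pr.destroy_tracking_refs
      pr.issue.destroy
    end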
  3. Quit the GHE console using quit
  4. Start the GitHub garbage collection: ghe-repo-gc -v --prune badshah/secret-repo
  5. On the GitHub web app, head over to https://github.enterprise.domain.com/stafftools/repositories/badshah/secret-repo, select “Network”, and then click “Invalidate Git cache”.

As this blog post shows, removing something from a git repo on GitHub takes more than rewriting git history. You need to modify the history, force-push it, and then take the extra steps to de-reference or delete the PRs.

While this blog post shows how to “completely” remove secrets from GitHub repositories, treat it as a last resort. Do everything possible to prevent secrets from getting into git repos in the first place, for example through developer awareness, pre-commit hooks, and pre-receive hooks.


PS: If you have a way (other than static regex checks on pre-receive hooks) to detect or prevent secrets/sensitive data from getting into git repos, I’m very interested. Please send a DM to @bnchandrapal or email to badshah -at- badshah.io.