Did you completely remove secrets from git repository? Really?
Removing secrets from a git repo is straightforward. With the help of BFG Cleaner and the privileges to force-push the modified history, it’s a piece of cake.
I believed this until I found I was partially wrong: removing something from git history doesn’t necessarily remove it from the GitHub repository.
TL;DR:
To delete secrets from a git repository, modifying the history and force-pushing it might not be sufficient if you use the GitHub Pull Requests feature. The PRs save a read-only copy of the code modifications. To remove the secret, you must de-reference/delete the GitHub PRs and trigger GitHub repo garbage collection to delete commits with secrets.
Have you ever committed anything sensitive in a git repository?
Sensitive like AWS credentials, SaaS login credentials, prod DB passwords, or even a CSV file containing customers’ SSNs.
If yes, that’s okay. All that matters to you (most probably) before committing a piece of code is that the code works. Unless we have something configured (like pre-commit hooks) to detect secrets, we can’t be aware of everything we commit into git repositories every time.
Pre-commit hooks to detect secrets are not a foolproof solution. Most (if not all) tools search for specific regexes or high-entropy strings. If you add a new regex for a particular secret type in the future and find that such secrets are already hardcoded, you will have to revoke/rotate them or maybe remove them from git history.
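As a rough illustration of the regex-based approach, here is a minimal shell sketch that flags strings shaped like AWS access key IDs in the lines a commit is about to add. The helper name find_aws_key_ids, the single pattern, and the hook wiring in the comments are assumptions for illustration; real scanners ship many more patterns plus entropy checks.

```shell
#!/bin/sh
# Sketch only: flag strings that look like AWS access key IDs
# ("AKIA" followed by 16 uppercase letters or digits) in added diff lines.
# find_aws_key_ids is a hypothetical helper, not part of git or any scanner.

find_aws_key_ids() {
  # Read a unified diff on stdin, keep only added lines (dropping the
  # "+++ b/file" headers), and print every match on its own line.
  # The exit status is 0 only when at least one match was printed.
  grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -oE 'AKIA[0-9A-Z]{16}'
}

# Possible use inside .git/hooks/pre-commit (assumed wiring):
#   if git diff --cached -U0 | find_aws_key_ids >/dev/null; then
#     echo "Possible AWS access key ID in staged changes" >&2
#     exit 1
#   fi
```

A check like this only knows the patterns it was given, which is exactly why a secret type added to the list later can already be sitting in history.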
Getting back to the topic, if you have committed secrets into code and you can regenerate these secrets, take the easiest route and revoke (or rotate) them.
From experience, I’ve seen folks occasionally take the red pill (removing from git history) if they:
- Committed something like SSNs, details and email IDs of clients, etc that cannot be rotated
- Don’t know where the secrets have been configured in prod/dev environments (just bad secrets management) and fear those envs going down because of revocation/rotation
This blog post is for those taking the red pill, especially for repos on GitHub Enterprise Server (Self-Hosted)!
The Story
It’s storytime.
You might be very familiar with this.
It’s been a busy month. I have been working through sprints, adding new and optimizing existing code based on product requirements. The code that I was writing for the last few days finally works on the local system. 🎉🎉🎉
I raised a PR with the code. My colleague who’s been busy debugging some issues peeks into my code, adds a few comments to improve code quality, and gives a 👍 for my PR.
Finally, the PR gets merged. Work well done!
But wait, I notice that I have hardcoded AWS credentials in PR #2. 🤦♂️
I don’t want to revoke it because a few other developers are using the same AWS credential for some other project(s). So I add another commit to remove it.
We both know git remembers everything that’s committed - the hardcoded secret, the commit which introduced AWS keys, and the commit that removed it.
I will take the extra step of removing it from git history and force-push it.
BFG Cleaner
BFG Cleaner is a great tool to help remove almost anything from git history: sensitive strings, credentials, or even complete files containing sensitive data. If you are on a Mac, you can easily install it using brew install bfg.
NOTE: A precondition to delete any sensitive string/file using BFG Cleaner is that the string/file should not be present in the HEAD commit. In this story, I’ve already deleted the AWS credential in the last commit.
Steps to remove sensitive strings using BFG Cleaner look like this:
- Clone the repository using the --mirror flag:
  git clone --mirror [email protected]:badshah/secret-repo.git
- Save the strings we need to remove (AKIAEXAMPLE123456KEY and 1111112222223333334444445555556666667777) to secrets.txt, one per line.
- Execute the command:
  bfg --replace-text secrets.txt secret-repo.git
- BFG will update all commits in all branches and tags but doesn’t delete any unwanted files (if any). If I wanted to delete a sensitive file in this repository, I would need to execute:
  cd secret-repo.git && git reflog expire --expire=now --all && git gc --prune=now --aggressive
  (In this example, as I’m just replacing sensitive text, there’s no need to execute this.)
- Finally, force push the changes from inside secret-repo.git:
  git push --force
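Before trusting the rewrite, it can help to confirm that the strings are really gone from everything the local mirror can reach. The sketch below assumes secrets.txt holds one literal string per line (no BFG regex: or ==> syntax); the helper name leaked_strings_in_history is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report which strings from a secrets file still appear
# in the history reachable from any ref of a repository.
# leaked_strings_in_history is a hypothetical helper name.

leaked_strings_in_history() {
  repo_dir=$1
  secrets_file=$2
  status=0
  while IFS= read -r secret || [ -n "$secret" ]; do
    [ -n "$secret" ] || continue   # skip blank lines
    # -S (pickaxe) lists commits that add or remove an occurrence of the string.
    hits=$(git -C "$repo_dir" log --all --format=%H -S"$secret")
    if [ -n "$hits" ]; then
      echo "still present: $secret"
      status=1
    fi
  done < "$secrets_file"
  return "$status"
}

# Example, using the names from the steps above:
#   leaked_strings_in_history secret-repo.git secrets.txt && echo "history looks clean"
```

Note that this only inspects the local mirror clone; it says nothing about what the server still stores.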
There’s some error when pushing. But let’s see if the push has removed the AWS credentials from the code.
Hurray!!
I have successfully removed it from the git history.
How about the PRs?
I think the force push must have updated them as well.
Ah… I’m wrong. The git history itself no longer remembers the AWS credentials: when I click on the commit ID from the commit history, I see ***REMOVED*** in the code. But the GitHub repository still remembers them, because the original commit is still present in the PR #2 commit list.
Anyone visiting the PR or the commit URL directly can still see the secrets.
Visiting the commit from PR shows the credentials along with an error message.
GitHub PRs
GitHub allows you to create, update and close PRs. It also allows you to comment (even debate 😜) and approve PRs. But it doesn’t allow you to delete PRs even if you are the repo owner.
The error message in the git push output already hinted at this. The force push was unable to update the “read-only” branches that belong to GitHub PRs.
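These “read-only” branches are visible in the mirror clone itself, because GitHub keeps each PR’s commits under refs such as refs/pull/<number>/head and a mirror clone fetches every ref. A minimal sketch for listing them follows; list_pr_refs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list the pull request refs that a mirror clone carries.
# GitHub stores PR commits under refs/pull/<number>/head (and, for some
# PRs, refs/pull/<number>/merge). list_pr_refs is a hypothetical helper.

list_pr_refs() {
  repo_dir=$1
  # Print "<refname> <commit id>" for every ref below refs/pull/.
  git -C "$repo_dir" for-each-ref --format='%(refname) %(objectname)' refs/pull/
}

# Example: list_pr_refs secret-repo.git
```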
There’s no option to remove the “read-only” behavior of PR branches. The two helpful options we have for PRs are de-referencing and deleting.
De-referencing or Deleting PRs
De-referencing PRs deletes the relationship between the PR and the commits it introduced. After de-referencing a PR, you can still see who created it, when, and the comments made on it, but not the commits or code changes.
Deleting PRs is more straightforward - it just deletes the PR and any associated information.
De-referencing and deleting PRs are actions only accessible to GitHub Enterprise Admins. If you want to remove secrets added through PRs in public/private repositories on GitHub.com, you need to contact GitHub Support.
Getting back to the story, the next steps are:
- Find which PRs reference the particular git commit that introduced AWS creds
- De-reference or delete those PRs and make the commit URLs inaccessible.
The following steps work if you have GitHub Enterprise Admin access and have access to the GitHub Enterprise Server via SSH. If you are following this blog post to delete a secret in your own git repository, replace github.enterprise.domain.com and badshah/secret-repo with your GH hostname and repo name.
Finding the PRs
To find PRs referencing the particular git commit that introduced the AWS secret:
- Copy the commit ID. In my case, it’s b851075285678c5234ec7416528108f148a216e5
- Log in to GitHub Enterprise Server and execute the command:
  ghe-repo badshah/secret-repo -c 'git for-each-ref --contains b851075285678c5234ec7416528108f148a216e5'
In my case, two PRs (#2 and #3) reference the commit with AWS creds. In a repo with many active contributors and many PRs, the number of PRs can be higher.
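When many refs come back, picking the PR numbers out by eye gets tedious. The sketch below assumes the command prints git’s default for-each-ref format (commit ID, object type, then the ref name) and that PR refs look like refs/pull/<number>/head; pr_numbers_from_refs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: turn `git for-each-ref --contains <commit>` output into a
# sorted list of PR numbers. Assumes PR refs look like refs/pull/<number>/...
# pr_numbers_from_refs is a hypothetical helper name.

pr_numbers_from_refs() {
  # Keep only refs/pull/<number>/... lines, print the number, de-duplicate.
  sed -n 's#.*refs/pull/\([0-9][0-9]*\)/.*#\1#p' | sort -n -u
}

# Example (on the GitHub Enterprise Server, assuming ghe-repo passes the
# command output through unchanged):
#   ghe-repo badshah/secret-repo -c 'git for-each-ref --contains <commit-id>' | pr_numbers_from_refs
```

The resulting numbers are the PRs that need to be de-referenced or deleted.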
De-referencing the PR
Steps to de-reference PRs:
- SSH into the GitHub server and start the GHE console:
  ghe-console -y
- Execute the following script:
  repo = that "badshah/secret-repo"
  pr_numbers = [2, 3]
  prs = repo.pull_requests.select { |pr| pr_numbers.include?(pr.number) }
  prs.each(&:destroy_tracking_refs)
- Quit the GHE console using quit
- Start the GitHub garbage collection:
  ghe-repo-gc -v --prune badshah/secret-repo
- On the GitHub web app, head over to https://github.enterprise.domain.com/stafftools/repositories/badshah/secret-repo, select “Network”, and then click “Invalidate Git cache”.
Once de-referenced, the PRs appear as follows:
Also, as the original commit that introduced the AWS secret is no longer referenced by any branch or PR, the garbage collection deletes it. Accessing the commit URL now gives a 404 error.
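Besides opening the commit URL, the result can be checked with git itself. This is a minimal sketch, assuming you can run git against the repository in question (for example through ghe-repo ... -c on the server); commit_exists is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check whether a commit object still exists in a repository.
# commit_exists is a hypothetical helper name.

commit_exists() {
  repo_dir=$1
  commit_id=$2
  # cat-file -e exits 0 when the object exists, non-zero otherwise;
  # the ^{commit} suffix also requires it to resolve to a commit.
  git -C "$repo_dir" cat-file -e "${commit_id}^{commit}" 2>/dev/null
}

# Example:
#   if commit_exists /path/to/repo.git b851075285678c5234ec7416528108f148a216e5; then
#     echo "commit is still stored"
#   else
#     echo "commit is gone"
#   fi
```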
(Optional) Deleting the PR
If you don’t want to preserve the PR information (like comments, etc.), you can delete the PR instead of de-referencing it.
- Log in to the GitHub server and start the GHE console:
  ghe-console -y
- Execute the following script:
  repo = that "badshah/secret-repo"
  pr_numbers = [2, 3]
  prs = repo.pull_requests.select { |pr| pr_numbers.include?(pr.number) }
  prs.each do |pr|
    pr.destroy_tracking_refs
    pr.issue.destroy
  end
- Quit the GHE console using quit
- Start the GitHub garbage collection:
  ghe-repo-gc -v --prune badshah/secret-repo
- On the GitHub web app, head over to https://github.enterprise.domain.com/stafftools/repositories/badshah/secret-repo, select “Network”, and then click “Invalidate Git cache”.
Final thoughts
As I showed you in this blog post, removing something from a git repo is not just about modifying git history. It requires you to modify the git history, force-push the modified history, and also take extra steps to de-reference/delete PRs.
While this blog post shows you how to “completely” remove secrets from GitHub repositories, this should be the last resort. Do everything possible to prevent secrets from getting into git repos, say through developer awareness, pre-commit hooks, pre-receive hooks, etc.
PS: If you have a way (other than static regex checks on pre-receive hooks) to detect or prevent secrets/sensitive data from getting into git repos, I’m very interested. Please send a DM to @bnchandrapal or email to badshah -at- badshah.io.