What should you use - CloudQuery or Steampipe?
CloudQuery and Steampipe offer very similar functionality. The real difference lies in how they work and the problems they solve. This blog post compares the two tools and helps you answer the question: What should I use - CloudQuery or Steampipe?
TL;DR:
- CloudQuery and Steampipe have very similar functionality
- The real difference is in how they work and the problems they solve

CloudQuery:
- It is an asset inventory out of the box.
- It doesn’t support resource history, but it can dump all the latest resources to a local database with a single command.
- All resources are dumped to the local DB, and all queries execute against that local data.

Steampipe:
- It adds a SQL wrapper around web services and APIs. It can be used to create an asset inventory.
- It comes with an embedded Postgres DB but doesn’t store any resource data.
- Resources and their configurations are fetched on demand for each SQL query. Steampipe uses Postgres Foreign Data Wrappers to abstract the web API calls that fetch those details.
Scanning your public assets periodically (at least once a week) is important. Period.
If you use the cloud for your prod and dev environments, it’s even more important. Who knows - maybe someone just opened an internal payments portal to the internet, “assuming” the AWS security groups take care of network access. 🤷‍♂️
With that said, I was looking for a tool that fetches all public assets on the AWS cloud. The idea was to pass the results on to network and web scanners for periodic scans. I was too lazy to build such a tool myself (because if I write it, I have to maintain it over time).
NOTE: I have full read access to the AWS accounts, so I am against the idea of running discovery tools like subfinder, amass, or other recon scripts. The problem with this “black-box” approach is that it can never guarantee 100% coverage of all your public assets over time.
After a few hours of Google searches, I found there’s no off-the-shelf tool to fetch all public assets of AWS. I came across Project Discovery’s cloudlist tool, but it was limited to EC2 public IPs and Route53 records. Finally, I ended up with two tools that looked very similar (at least based on their home pages) - CloudQuery and Steampipe. Both tools support listing AWS resources using simple SQL statements, and the results can be segregated to get the list of public resources.
AWS fanboys, I know what you are thinking - why not use AWS Config? Simple: AWS Config is costly (as it’s a full-fledged asset inventory), and it doesn’t support all resource types in all regions. For example, it doesn’t monitor CloudFront outside the us-east-1 region.
Both CloudQuery and Steampipe looked like the tool I needed:
| | CloudQuery | Steampipe |
|---|---|---|
| Fetches all assets | ✅ | ✅ |
| Supports AWS Org | ✅ | ✅ |
| Allows segregating public assets | ✅ | ✅ |
| Easy to set up | ✅* | ✅ |
| Other programs can interact | ✅ | ✅ |
| Good community support | ✅ | ✅ |

\* CloudQuery requires an external database even for the initial setup, which adds cost.
You can fetch data from both tools using similar SQL statements. Sometimes the only part that changes in both input queries is the table name.
For example, to get a list of all publicly accessible ELBv2 instances:

CloudQuery:

```sql
SELECT * FROM aws_elbv2_load_balancers WHERE scheme = 'internet-facing'
```

Steampipe:

```sql
SELECT * FROM aws_ec2_application_load_balancer WHERE scheme = 'internet-facing'
```
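Whichever tool produces the rows, the downstream segregation step is just a filter. As a sketch (the field names and values here are illustrative, not either tool's exact schema), pulling the public entries out of an exported JSON result might look like:

```python
import json

# Hypothetical export: each row is one load balancer record.
rows = json.loads("""[
  {"name": "payments-lb", "scheme": "internet-facing", "dns_name": "pay.example.com"},
  {"name": "internal-lb", "scheme": "internal", "dns_name": "int.example.com"}
]""")

# Keep only the public-facing entries and hand their DNS names
# to a downstream network/web scanner.
public = [r for r in rows if r["scheme"] == "internet-facing"]
scan_targets = [r["dns_name"] for r in public]
print(scan_targets)  # ['pay.example.com']
```

The same pattern works for any resource type: query everything, then filter on whatever attribute marks the resource as public.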
Both these tools also have other similarities. Both tools:
- support Kubernetes and cloud platforms like GCP and Azure
- allow integration with other visualization tools (like Grafana)
- come with off-the-shelf policies (like AWS CIS Benchmarks, AWS Foundational Security Best Practices, etc.)
So I was puzzled about why two open-source tools were competing to achieve the same thing. 🤔
After a few more hours of experiments with both tools, I finally understood the difference.
CloudQuery is an asset inventory out of the box. You can think of it as an open-source (and cheaper) alternative to the AWS Config snapshot feature. It collects configuration details of multiple resource types, with multi-region and multi-account support.
Its setup requires an external database to store all the resources. The installation docs show how to set up a temporary local Postgres DB in a Docker container.
Once you install CloudQuery, set up the database, and configure AWS credentials, just execute:

```
cloudquery init aws
cloudquery fetch
```

You will have all the resources across all regions in your default AWS profile dumped to the local database. It “extracts” the resource data from AWS via APIs and “loads” the data into your DB.
You can then use the data as per your requirements. For example, you can check whether it’s compliant with CIS Benchmark checks by executing:

```
cloudquery policy run aws//cis_v1.2.0
```

If you have any visualization tools, you can try to visualize the asset inventory. There are a few open-source CloudQuery Grafana dashboards.
The only (major) disadvantage is that it doesn’t support asset history. Let’s say five S3 buckets were deleted in the past; they won’t appear in the latest data. The support for asset history using TimescaleDB is deprecated.
There are workarounds for this, like creating backups after each run or dumping results to JSON/CSV. But that’s extra work you need to do.
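One such workaround, sketched below with made-up resource data and paths: after each fetch, write the query results to a timestamped snapshot file, so older snapshots retain resources that later get deleted.

```python
import csv
import datetime
import pathlib

# Hypothetical rows exported from the CloudQuery database after a fetch.
rows = [
    {"bucket": "payments-logs", "region": "us-east-1"},
    {"bucket": "static-site", "region": "eu-west-1"},
]

# Timestamped snapshot file: old files keep resources that disappear later.
snapshot_dir = pathlib.Path("snapshots")
snapshot_dir.mkdir(exist_ok=True)
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
path = snapshot_dir / f"s3-buckets-{stamp}.csv"

with path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["bucket", "region"])
    writer.writeheader()
    writer.writerows(rows)
```

Run this after every `cloudquery fetch` (e.g. from cron) and you get a crude, file-based history of your assets.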
Steampipe is a utility that abstracts web APIs behind a SQL interface. You can fetch all AWS resources using simple SQL queries. However, it is not an asset inventory: every query you execute makes HTTP requests to the cloud (unless the data is already in the cache).
It uses Postgres Foreign Data Wrappers under the hood. One can create a Steampipe plugin for almost any website and define how SQL queries would interact with the website’s APIs. These plugins take care of things like pagination and hitting multiple APIs for a single query.
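Steampipe's real plugins are written in Go against its plugin SDK, but the core idea - a table abstraction that turns a row request into paginated API calls - can be sketched in a few lines. Everything below is illustrative (a fake API and made-up names), not Steampipe's actual interface:

```python
# A fake paginated API: returns (items, next_page_token).
def fake_api_list_buckets(page_token=0, page_size=2):
    buckets = ["logs", "backups", "assets", "public-site", "tmp"]
    page = buckets[page_token:page_token + page_size]
    next_token = page_token + page_size if page_token + page_size < len(buckets) else None
    return page, next_token

# The "foreign table": when the SQL engine asks for rows, this
# generator transparently walks every page of the API.
def scan_table():
    token = 0
    while token is not None:
        items, token = fake_api_list_buckets(token)
        for name in items:
            yield {"name": name}

# A query like `SELECT name FROM buckets` just consumes the generator.
print([row["name"] for row in scan_table()])
# ['logs', 'backups', 'assets', 'public-site', 'tmp']
```

The caller never sees the pagination; it just gets rows. That is essentially what the Foreign Data Wrapper does between Postgres and the plugin.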
Once you install Steampipe and configure AWS credentials, just execute:

```
steampipe plugin install aws
steampipe query
```

You now have an interactive shell where you can fetch any AWS resource you want. Visualizing the data and creating compliance reports are just a few more commands away.
At the same time, the more plugins Steampipe has, the more advantageous it becomes.
NOTE: Steampipe fetches the latest data on demand. It doesn’t store the query results anywhere; you need to explicitly save them to disk. If you wish to use Steampipe as a database for your application, it has a service mode.
A disadvantage of Steampipe is that you can unintentionally hit the API rate limits of the web services you are querying.
What should you use: CloudQuery or Steampipe?
Use CloudQuery, if you want:
- an asset inventory
- to store all data about resources and configurations in a DB for compliance reasons
- to query the data thousands of times and don’t want to get rate-limited by AWS
- to check historical assets (this feature was deprecated, but I hope it will be added back in the future)
Use Steampipe, if you want:
- to query AWS resources on-demand a few times a day
- to set up simple yet powerful asset inventory dashboards (without setting up an external DB and visualization tools like Grafana)
- to get CIS benchmarks/FedRAMP/HIPAA compliance reports with a single command
- to use a single query to get data from two or more services that Steampipe supports (for example, using Shodan to test AWS public IPs)
Huge thanks to Bhanu Teja for helping me understand and experiment around CloudQuery.