unsubbed.co

CloudQuery

For security & authentication, CloudQuery is a self-hosted solution that provides an ELT platform for easy data integration from hundreds of cloud and...

Open-source cloud security and governance, honestly reviewed. No marketing fluff, just what you get when you self-host it.

TL;DR

  • What it is: Open-source (MPL-2.0) data pipeline tool that syncs cloud infrastructure metadata into your database, then lets you query and govern it with SQL [README][website].
  • Who it’s for: Platform engineers, DevOps teams, and cloud governance leads managing multi-cloud infrastructure. This is not a tool for non-technical founders: it requires SQL, Docker, and an understanding of cloud IAM [README].
  • Cost savings: The CLI is free. The managed Platform product has contact-sales pricing with no public numbers [4][3]. Cost savings come from replacing or avoiding expensive CSPM/cloud governance SaaS tools like Wiz, Lacework, or Orca.
  • Key strength: Genuine open-source cloud asset inventory with 70+ source integrations, SQL-based policy engine, and full data sovereignty — your cloud config data never touches CloudQuery’s servers [README].
  • Key weakness: Steep setup curve for anything beyond the quickstart. Some plugins have been quietly moved from open-source to closed-source [README]. The tool recently announced it’s “joining env zero,” signaling a pivot or acquisition whose implications for the open-source project are unclear [website].

What is CloudQuery

CloudQuery is a data pipeline tool for cloud infrastructure metadata. It connects to AWS, Azure, GCP, Kubernetes, Cloudflare, GitHub, Okta, Wiz, and other sources (70+ integrations in total), pulls configuration and security data, and writes it into a destination of your choice: PostgreSQL, Snowflake, BigQuery, S3, Elasticsearch, and others [README].

Once the data is in your warehouse, you query it with SQL. Want to find all EC2 instances missing the Backup tag? Write a SELECT. Want to detect public S3 buckets across all accounts? Write a SELECT. Want to build a FinOps dashboard showing cost by region and team? Point your BI tool at the tables CloudQuery populated.
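To make the "write a SELECT" idea concrete, here is a minimal sketch of the missing-Backup-tag query. It runs against an in-memory SQLite database rather than a real CloudQuery destination, and the table and column names (ec2_instances, ec2_instance_tags, tag_key) are hypothetical stand-ins with tags flattened for portability; the actual synced schema will differ.

```python
import sqlite3

# Hypothetical, simplified stand-ins for synced tables. Real CloudQuery
# table and column names will differ; this only illustrates the query shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ec2_instances (instance_id TEXT PRIMARY KEY, account_id TEXT, region TEXT);
    CREATE TABLE ec2_instance_tags (instance_id TEXT, tag_key TEXT, tag_value TEXT);
    INSERT INTO ec2_instances VALUES
        ('i-001', '111111111111', 'us-east-1'),
        ('i-002', '111111111111', 'eu-west-1'),
        ('i-003', '222222222222', 'us-east-1');
    INSERT INTO ec2_instance_tags VALUES
        ('i-001', 'Backup', 'daily'),
        ('i-002', 'Team', 'payments'),
        ('i-003', 'Backup', 'weekly');
""")

# Instances that have no row at all for the 'Backup' tag key.
MISSING_BACKUP_TAG = """
    SELECT i.instance_id, i.account_id, i.region
    FROM ec2_instances AS i
    WHERE NOT EXISTS (
        SELECT 1
        FROM ec2_instance_tags AS t
        WHERE t.instance_id = i.instance_id AND t.tag_key = 'Backup'
    )
    ORDER BY i.instance_id
"""

missing = conn.execute(MISSING_BACKUP_TAG).fetchall()
print(missing)
```

With this sample data only i-002 is reported, because it carries a Team tag but no Backup tag.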

The GitHub description puts it plainly: “Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions.” [README]. The website’s headline is less useful — “Cloud operations, without the chaos” — but the use cases listed tell you what it actually does: cloud asset inventory, cloud security posture management (CSPM), and FinOps cost aggregation [website].

The project has 6,344 GitHub stars and is licensed under MPL-2.0, not MIT. That distinction matters for commercial use and embedding — MPL-2.0 requires that modifications to MPL-licensed files be shared under the same license, but you can combine MPL code with proprietary code in larger works without open-sourcing everything [README][3].

One significant development at the time of writing: CloudQuery has announced it is “joining env zero” — the homepage banner reads “We’re moving from data to decisions” [website]. What this means for the open-source CLI and plugin ecosystem long-term is not stated. For anyone making a multi-year bet on this tool, that uncertainty is worth tracking.


Why people choose it

Independent review coverage of CloudQuery is thin. The third-party sources available for this review are mostly CloudQuery’s own learning center articles [1][2] and a software directory listing [4], so this section draws primarily from the README, the website, and what the tool’s use cases reveal about who chooses it and why.

Versus proprietary CSPM tools (Wiz, Lacework, Orca, Prisma Cloud). These enterprise security platforms typically run to six figures per year at mid-market scale. CloudQuery’s core pitch is that you can build the same cloud asset inventory and policy layer yourself, running on your own Postgres instance, with full control over what data goes where. SoftwarePlaza’s product listing describes it as “a secure data movement platform that helps enterprises sync cloud and SaaS data into data lakes or warehouses” with “volume-based pricing and flexible support packages” [4]. That is still enterprise pricing, but negotiated by contract rather than set by a fixed SaaS seat model.

Versus AWS Config / Azure Policy / GCP Asset Inventory. Each cloud provider has its own native asset management and policy tooling. The problem is that these tools are siloed: AWS Config can’t tell you about your GCP resources. CloudQuery’s value proposition is unification — one schema, one SQL interface, across all your cloud accounts and providers simultaneously [README][1].
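The unification argument can be illustrated with a short sketch: once data from several providers sits in one database, a single query can span them. The tables below (aws_instances, gcp_instances) and their columns are invented for the example and populated by hand in SQLite; they are not the real synced schema.

```python
import sqlite3

# Hand-populated, hypothetical tables standing in for two providers' data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aws_instances (instance_id TEXT, region TEXT, state TEXT);
    CREATE TABLE gcp_instances (id TEXT, zone TEXT, status TEXT);
    INSERT INTO aws_instances VALUES
        ('i-001', 'us-east-1', 'running'),
        ('i-002', 'eu-west-1', 'stopped');
    INSERT INTO gcp_instances VALUES
        ('vm-a', 'europe-west1-b', 'RUNNING');
""")

# One result set covering both providers, normalized to common column names.
RUNNING_EVERYWHERE = """
    SELECT 'aws' AS provider, instance_id AS resource_id, region AS location
    FROM aws_instances
    WHERE state = 'running'
    UNION ALL
    SELECT 'gcp', id, zone
    FROM gcp_instances
    WHERE status = 'RUNNING'
    ORDER BY provider, resource_id
"""

running = conn.execute(RUNNING_EVERYWHERE).fetchall()
for provider, resource_id, location in running:
    print(provider, resource_id, location)
```

A provider-native tool can answer either half of this query, but not both in one result set.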

Versus Steampipe. Steampipe is the closest open-source competitor: also SQL-based, also targeting cloud inventory and security. Steampipe works at query time (it translates SQL into live API calls), while CloudQuery syncs data into a warehouse first and queries the stored copy. CloudQuery is therefore better for historical analysis, scheduled reporting, and high-volume data; Steampipe is better for ad-hoc interactive queries where you want live data without maintaining a warehouse [README].
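The historical-analysis advantage of a stored copy can be sketched as a diff between two syncs. This assumes rows from each sync are retained with a timestamp column (named sync_time here, a hypothetical name), which depends on how the destination is configured; the table, columns, and dates are made up for illustration.

```python
import sqlite3

# Hypothetical table holding two retained snapshots of bucket state.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s3_buckets (name TEXT, is_public INTEGER, sync_time TEXT);
    INSERT INTO s3_buckets VALUES
        ('logs',    0, '2024-01-01'),
        ('assets',  0, '2024-01-01'),
        ('logs',    0, '2024-01-02'),
        ('assets',  1, '2024-01-02'),
        ('scratch', 1, '2024-01-02');
""")

# Buckets that are public in the current sync but were private
# (or did not exist) in the previous one.
NEWLY_PUBLIC = """
    SELECT cur.name
    FROM s3_buckets AS cur
    LEFT JOIN s3_buckets AS prev
      ON prev.name = cur.name AND prev.sync_time = :previous
    WHERE cur.sync_time = :current
      AND cur.is_public = 1
      AND COALESCE(prev.is_public, 0) = 0
    ORDER BY cur.name
"""

params = {"previous": "2024-01-01", "current": "2024-01-02"}
newly_public = [row[0] for row in conn.execute(NEWLY_PUBLIC, params)]
print(newly_public)
```

A query-time tool sees only the current state, so a "what changed since yesterday" question like this needs the stored history.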

The reasons people pick CloudQuery over the alternatives tend to be: they’re already running their own data infrastructure and want to add cloud governance data to it; they work across multiple cloud providers and need a single schema; or they need full data sovereignty because their security or compliance posture prohibits sending cloud config data to a third-party CSPM vendor [README][1][3].


Features

Source plugins (70+ integrations): The hub at cloudquery.io/hub lists source plugins for AWS, GCP, Azure, Kubernetes, GitHub, GitLab, Cloudflare, Oracle, PagerDuty, Okta, Wiz, Terraform, and many more [website][README]. Coverage for the three major cloud providers is deep: the AWS plugin alone covers hundreds of resource types across IAM, EC2, S3, RDS, Lambda, ECS, EKS, and so on.

Destination plugins: Write synced data to PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, S3, Azure Blob, Elasticsearch, Kafka, and others. The architecture is ETL: extract from source, transform into a normalized schema, load into destination.

SQL-based policy engine: Write detective policies as SQL queries. CloudQuery provides a policy framework where you define queries that should return zero rows (clean) or flag rows on violation. These can be run on a schedule, wired to Slack alerts, or integrated with Jira ticketing [website][README].
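The "zero rows means clean" convention is simple enough to sketch as a generic runner. This is not CloudQuery's policy framework, only an illustration of the pattern; the PolicyResult class, run_policies function, policy names, and tables are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class PolicyResult:
    name: str
    violations: list

    @property
    def passed(self) -> bool:
        # A policy passes when its query returns no rows.
        return not self.violations


def run_policies(conn, policies):
    """Run each policy query; every returned row is treated as a violation."""
    return [
        PolicyResult(name, conn.execute(sql).fetchall())
        for name, sql in policies.items()
    ]


# Hypothetical tables and sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s3_buckets (name TEXT, is_public INTEGER);
    CREATE TABLE ebs_volumes (volume_id TEXT, encrypted INTEGER);
    INSERT INTO s3_buckets VALUES ('logs', 0), ('assets', 1);
    INSERT INTO ebs_volumes VALUES ('vol-1', 1);
""")

policies = {
    "no_public_buckets": "SELECT name FROM s3_buckets WHERE is_public = 1",
    "no_unencrypted_volumes": "SELECT volume_id FROM ebs_volumes WHERE encrypted = 0",
}

results = run_policies(conn, policies)
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.name}: {len(result.violations)} violation(s)")
```

Keeping each policy as a plain query string means the same text can be run by hand in a SQL console when a failure needs investigating.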

Apache Arrow transport: Internally, CloudQuery uses Apache Arrow as the in-memory data format for sync operations. This is a performance choice — Arrow’s columnar format is significantly faster for bulk data movement than row-by-row serialization [README].

Multi-language plugin SDK: Plugins can be written in Go, Python, JavaScript, or Java. The Plugin SDK is open-source at github.com/cloudquery/plugin-sdk [README]. If the built-in AWS plugin doesn’t cover a resource type you need, you can extend it or write your own.

Automation (Platform only): The managed Platform tier adds event-driven workflows — trigger on a policy violation, cost spike, or infrastructure drift, and route to Slack, webhooks, or a ticketing system automatically [website]. This is not available in the open-source CLI.

Important caveat on open-source scope: The README explicitly notes that some code has been moved from open-source to closed-source, and links to zip/CSV archives of the MPL-licensed versions for compliance [README]. This is a pattern worth watching — the trend has been toward moving more advanced plugin functionality behind the commercial Platform tier.


Pricing: SaaS vs self-hosted math

CloudQuery publishes no public pricing for the Platform tier. The website routes all pricing inquiries to a demo form [website]. SoftwarePlaza categorizes it as “Price Upon Request” [4].

What is free (CLI + open-source plugins):

  • The CLI binary and framework: $0 (MPL-2.0)
  • Community plugins for AWS, GCP, Azure, Kubernetes, and most major sources
  • Running syncs on your own infrastructure
  • No telemetry in “offline” mode (per the terms of service [3])

What costs money (Platform):

  • Managed platform with UI, policy engine, automation workflows
  • Commercial plugins (some source integrations are closed-source per the README)
  • Support contracts and SLAs

Infrastructure costs to self-host the CLI:

  • A VPS or server to run sync jobs: $10–50/mo depending on scale
  • A destination database (Postgres on a $5–20/mo instance, or your existing warehouse)
  • Optional: Grafana for dashboarding, which is free for self-hosted

What you’re replacing: Commercial CSPM/cloud governance tools don’t have transparent pricing, but public reports put mid-market Wiz at $50K–200K/year; Lacework and Orca are in similar ranges. For a startup or SMB, this category is often completely out of reach. CloudQuery self-hosted brings the capability into reach at infrastructure cost only.
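The arithmetic behind that comparison, using only the figures quoted above, works out as follows. It is a rough sketch: it annualizes the monthly infrastructure ranges and deliberately leaves out engineering time, which is the real cost of self-hosting.

```python
def annual_range(monthly_low, monthly_high):
    """Convert a monthly cost range in USD to an annual one."""
    return monthly_low * 12, monthly_high * 12


# Monthly ranges quoted in this section (USD).
vps = annual_range(10, 50)        # server for sync jobs
database = annual_range(5, 20)    # small Postgres instance

self_hosted = (vps[0] + database[0], vps[1] + database[1])
commercial_cspm = (50_000, 200_000)  # mid-market annual range cited above

# Most conservative and most optimistic gaps between the two ranges.
smallest_gap = commercial_cspm[0] - self_hosted[1]
largest_gap = commercial_cspm[1] - self_hosted[0]

print(f"Self-hosted infrastructure: ${self_hosted[0]:,}-${self_hosted[1]:,} per year")
print(f"Gap versus commercial CSPM: ${smallest_gap:,}-${largest_gap:,} per year")
```

Even the most conservative pairing leaves a gap of roughly $49K per year, which is the budget available to cover the engineering hours the self-hosted route requires.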

The realistic comparison isn’t CloudQuery vs. Zapier — it’s CloudQuery vs. “we don’t have cloud governance because we can’t afford Wiz.” For the right team, the math is compelling.


Deployment reality check

This is where honest assessment matters: CloudQuery is not a tool you hand to a non-technical founder. The installation path is brew install cloudquery/tap/cloudquery [README], which is clean. But what comes after requires:

  • Writing a YAML config file that specifies source and destination connections, including cloud credentials and API keys
  • Setting up IAM roles in AWS (or equivalent in GCP/Azure) with the right read permissions across every service you want to inventory — this alone can take hours across a large AWS account
  • Running and operating a destination database
  • Scheduling sync jobs (cron, GitHub Actions, Kubernetes CronJob, or Airflow)
  • Writing SQL policies yourself, or adapting the community policy packs
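The first and fourth steps in the list above can be sketched together: generating a config file and building the command a scheduler would run. The YAML field names below are assumptions based on the publicly documented config format and may differ between CLI versions (newer versions may require additional fields), the version strings are placeholders, and render_config and sync_command are hypothetical helpers. Check the current documentation before using any of it.

```python
from string import Template

# Assumed config layout: one source block and one destination block.
# "$$" renders a literal "$", so the connection string stays an
# environment-variable reference instead of embedding a secret.
CONFIG_TEMPLATE = Template("""\
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "$source_version"
  tables: [$tables]
  destinations: ["postgresql"]
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "$destination_version"
  spec:
    connection_string: "$${PG_CONNECTION_STRING}"
""")


def render_config(tables, source_version, destination_version):
    """Render the config text for the given table list and plugin versions."""
    quoted = ", ".join(f'"{table}"' for table in tables)
    return CONFIG_TEMPLATE.substitute(
        tables=quoted,
        source_version=source_version,
        destination_version=destination_version,
    )


def sync_command(config_path):
    """Argument list a scheduler (cron, CI job) could pass to subprocess.run."""
    return ["cloudquery", "sync", config_path]


# "vX.Y.Z" is a placeholder, not a real plugin version.
config = render_config(["aws_ec2_instances", "aws_s3_buckets"], "vX.Y.Z", "vX.Y.Z")
command = sync_command("config.yml")
print(config)
print(" ".join(command))
```

Listing tables explicitly, rather than syncing everything, also narrows the IAM read permissions that have to be granted.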

The CloudQuery quickstart guide gets you from zero to first sync in under 30 minutes if you know Docker and cloud IAM. If you don’t know what an IAM role is, you’re not the target user.

For teams that are already operating a data warehouse and have cloud infrastructure engineers, the integration lift is moderate — maybe 2–8 hours to get a meaningful first sync covering AWS and write a few basic policies.

What can go sideways:

  • IAM permission scoping errors will cause partial syncs with no obvious error — you’ll notice a resource type is missing and have to debug permissions
  • Large AWS Organizations with hundreds of member accounts can produce multi-GB syncs per run; destination database sizing matters
  • The closed-source plugin situation creates uncertainty about future feature availability
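Silent partial syncs are easier to catch with an explicit coverage check after each run. The sketch below compares a list of tables you expect to be populated against what is actually in the database. It uses SQLite's catalog for portability (on PostgreSQL the equivalent lookup would go through information_schema.tables), and the function name and table names are hypothetical.

```python
import sqlite3


def find_sync_gaps(conn, expected_tables):
    """Return {table: 'missing' | 'empty'} for expected tables with no data.

    Table names come from a trusted, hard-coded list, so quoting them
    into the COUNT query is acceptable here.
    """
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    gaps = {}
    for table in expected_tables:
        if table not in existing:
            gaps[table] = "missing"
        elif conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0] == 0:
            gaps[table] = "empty"
    return gaps


# Simulated destination after a sync where one permission was missing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aws_ec2_instances (instance_id TEXT);
    CREATE TABLE aws_s3_buckets (name TEXT);
    INSERT INTO aws_ec2_instances VALUES ('i-001');
""")

gaps = find_sync_gaps(conn, ["aws_ec2_instances", "aws_s3_buckets", "aws_iam_roles"])
print(gaps)
```

An empty table is not always an error (an account may simply have no resources of that type), so the result is a prompt to check permissions rather than a verdict.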

Pros and cons

Pros

  • Full data sovereignty. Your cloud configuration data never touches CloudQuery’s servers in CLI mode. For regulated industries (finance, healthcare, government), this is a hard requirement that rules out most CSPM SaaS tools [README][3].
  • SQL is the right interface for this. Governance and inventory questions map naturally to relational queries. Anyone who knows SQL can write policies without learning a proprietary DSL.
  • 70+ source integrations with deep cloud coverage. The AWS plugin alone covers a breadth of resource types that would take years to build in-house [README][website].
  • Apache Arrow for performance. Large-scale syncs across hundreds of AWS accounts are genuinely fast [README].
  • Multi-cloud unification. One query that joins AWS, GCP, and Azure resources in a single result set is a real capability that native cloud tools can’t provide [1].
  • Multi-language plugin SDK. Go, Python, JavaScript, Java — you can extend it in whatever language your team uses [README].
  • MPL-2.0 is permissive enough for most uses. Unlike AGPL, you can embed MPL-2.0 code in larger proprietary works without open-sourcing everything [README].

Cons

  • Not for non-technical users. Zero chance a founder without cloud engineering experience deploys and operates this themselves. This is a tool for platform engineers [README].
  • Open-source scope is shrinking. The README explicitly acknowledges that code has been moved from open-source to closed-source [README]. This creates risk for teams building on the free tier.
  • “Joining env zero” is an open question. The homepage announcement of a pivot or acquisition creates real uncertainty about the product roadmap, especially for the open-source CLI [website].
  • No public pricing for the Platform. You can’t evaluate cost without a sales call [website][4]. For startups, this is a friction point.
  • Platform features are opaque. Automation, workflow triggers, and the UI layer are Platform-only but not well-documented without a demo engagement.
  • No native remediation. CloudQuery tells you what’s wrong; it doesn’t fix it. You’re responsible for wiring alerts to action (Slack, Jira, Lambda, whatever). That’s by design but requires additional engineering.
  • Thin independent review coverage. Most available written material is from CloudQuery’s own learning center. Independent third-party benchmarks and honest long-term user reviews are scarce.
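As an illustration of the alert wiring the "no native remediation" point refers to, here is a minimal sketch that turns policy violations into a Slack incoming-webhook message. The {"text": ...} payload is the basic shape Slack incoming webhooks accept; the function names, message layout, and policy name are hypothetical, and the webhook URL is something you would supply yourself.

```python
import json
import urllib.request


def build_slack_message(policy_name, violations, max_rows=5):
    """Format violation rows as a plain-text Slack webhook payload."""
    lines = [f"Policy failed: {policy_name} ({len(violations)} violation(s))"]
    lines += [f"- {', '.join(str(value) for value in row)}" for row in violations[:max_rows]]
    if len(violations) > max_rows:
        lines.append(f"... and {len(violations) - max_rows} more")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, message):
    """POST the payload to a Slack incoming webhook; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Only the message is built here; sending requires a real webhook URL.
message = build_slack_message(
    "no_public_buckets",
    [("assets", "us-east-1"), ("scratch", "eu-west-1")],
)
print(message["text"])
```

Capping the number of rows keeps a large failure from flooding the channel; the full result set stays queryable in the database.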

Who should use this / who shouldn’t

Use CloudQuery if:

  • You’re on a platform engineering or DevOps team managing multi-cloud infrastructure (AWS + GCP + Azure) and you need a unified asset inventory without paying $50K/year for a CSPM tool.
  • You already operate a data warehouse and want cloud governance data alongside your business data.
  • Your security or compliance posture requires that cloud configuration data stays on your own infrastructure.
  • You have engineers who know SQL and can write their own policies.
  • You want to build custom cloud governance tooling rather than configure a black-box SaaS product.

Skip it (stay on native cloud tools) if:

  • You’re only on one cloud provider and AWS Config / Azure Policy / GCP Asset Inventory covers your needs.
  • You don’t have engineering capacity to set up and operate a data pipeline.

Skip it (wait for more clarity) if:

  • You’re making a multi-year platform bet and the “joining env zero” pivot concerns you about the open-source trajectory.
  • You need the Platform tier features (automation, UI) and want transparent pricing before starting a vendor relationship.

Skip it (pick Steampipe) if:

  • You want interactive SQL queries against live cloud data without managing a sync pipeline and destination database.
  • Your use case is ad-hoc investigation rather than scheduled reporting and policy enforcement.

This isn’t for non-technical founders at all. The core value proposition — SQL queries over synced cloud configuration data — requires cloud IAM knowledge, database operations, and engineering capacity to maintain. There is no GUI wizard that replaces those prerequisites.


Alternatives worth considering

  • Steampipe — the closest open-source comparison. Query-time rather than sync-time; better for ad-hoc investigation. Also SQL-based, free, large plugin library.
  • AWS Config / Azure Policy / GCP Asset Inventory — native cloud governance tools. Siloed by provider, but deep for single-cloud shops and no additional cost within your cloud bill.
  • Wiz, Lacework, Orca, Prisma Cloud — enterprise CSPM. Full-featured with managed service, AI-powered findings, and dedicated security research teams behind them. Priced accordingly (six figures for mid-market).
  • Prowler — open-source CSPM focused specifically on AWS security checks. Less breadth than CloudQuery but more opinionated about security findings. Free.
  • Cartography — open-source cloud asset inventory from Lyft’s security team. Graph-based rather than relational; useful for relationship analysis between cloud resources. Less actively maintained.
  • Driftctl / Terracognita — infrastructure drift detection tools focused on IaC alignment rather than general inventory.

For a platform team choosing between open-source options, the realistic shortlist is CloudQuery vs. Steampipe vs. Prowler. CloudQuery if you want a warehouse-centric pipeline with broad multi-cloud coverage. Steampipe if you want live interactive queries. Prowler if you want an opinionated AWS security checker without building your own policy layer.


Bottom line

CloudQuery is a legitimate open-source tool for cloud asset inventory and governance, built by and for platform engineering teams. The SQL-based approach is sensible — it reuses skills engineers already have and integrates naturally with existing data infrastructure. The 70+ source integrations and multi-cloud coverage are genuinely strong. For a team that would otherwise spend $50K–$200K/year on a commercial CSPM, the self-hosted CLI is a compelling alternative.

The caveats are real: this is not beginner territory, some open-source functionality has been quietly moved behind the commercial tier, and the recent “joining env zero” announcement adds product uncertainty that any serious adopter should investigate before committing. The pricing opacity of the Platform tier is also a friction point for startups trying to evaluate total cost.

For the right team (cloud-native, SQL-literate, already running a data warehouse), CloudQuery earns serious consideration. For a non-technical founder trying to cut SaaS bills, it’s the wrong tool entirely. That’s not a criticism; it’s just an honest match between tool and audience.


Sources

  1. CloudQuery Learning Center — “Best Cloud Asset Management Software in 2026 (Compared)”. https://www.cloudquery.io/learning-center/cloud-asset-management-software
  2. CloudQuery Learning Center — “AWS Cost Optimization - 11 Tools and 11 Critical Best Practices”. https://www.cloudquery.io/learning-center/aws-cost-optimization
  3. CloudQuery, Inc. — Product Terms of Service (licensing and offline/online software definitions). https://www.cloudquery.io/legal/product-terms-of-service
  4. SoftwarePlaza — “CloudQuery Data Movement Platform” listing. https://www.softwareplaza.com/development/software/devops.html
