
Tabby

Open-source AI coding assistant. A self-contained, self-hosted alternative to GitHub Copilot tailored for your development needs.

Open-source AI code completion, honestly reviewed. Privacy-first, no per-seat subscription, and it runs on hardware you already own.

TL;DR

  • What it is: Self-hosted AI coding assistant with code completion, inline chat, and an Answer Engine — a privacy-preserving alternative to GitHub Copilot [README].
  • Who it’s for: Developer teams and individual engineers who want AI-assisted coding without sending source code to a third-party cloud — particularly useful for companies with IP sensitivity or compliance requirements [README][2].
  • Cost savings: GitHub Copilot Business runs $19/user/month. For a 5-person team, that’s $1,140/year. Tabby self-hosted on a ~$40/month GPU VPS is about $480/year regardless of headcount, saving that team roughly $660/year, and the gap widens as the team grows [README][2].
  • Key strength: Fully local inference, no data leaves your network, supports consumer-grade GPUs, and integrates with VS Code, JetBrains, Vim, and Neovim out of the box [README][2].
  • Key weakness: Completion quality in the project’s early days lagged behind cloud-hosted competitors, and the setup requires GPU-capable hardware or a VPS with GPU pass-through to match Copilot’s response speed [1][2]. License terms are listed as unspecified in the package metadata — check the GitHub repo directly before committing to this for commercial use.

What is Tabby

Tabby is a self-hosted AI coding assistant. You run a server on your own hardware or VPS, install an IDE extension, and get AI code completion, a chat interface, and an answer engine — all without a single line of your source code touching GitHub’s or OpenAI’s servers.

The GitHub README’s pitch is direct: “Self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot.” Three architectural decisions distinguish it from the cloud alternatives. First, it is self-contained — no external database management system required, no cloud service dependency, everything runs in a single Docker container [README]. Second, it exposes an OpenAPI interface, which means you can integrate it into Cloud IDEs, CI systems, or any custom tooling [README]. Third, it supports consumer-grade GPUs — you don’t need an A100 to run it, a gaming GPU or a VPS with a mid-range GPU will do [README][2].
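Because the server exposes that HTTP interface, anything that can make a request can ask for a completion. A minimal sketch, assuming a server on localhost:8080 and an access token generated in the admin UI; the request shape follows Tabby’s completion endpoint as documented in its OpenAPI spec, but verify it against the API docs your own server serves before building on it:

curl -s http://localhost:8080/v1/completions \
  -H "Authorization: Bearer $TABBY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "python",
    "segments": {
      "prefix": "def fib(n):\n    "
    }
  }'

The response is JSON containing completion candidates, which is what makes CI hooks and custom-editor integrations a matter of plumbing rather than reverse engineering.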

The project has accumulated 33,022 GitHub stars, which places it among the most-starred self-hosted developer tools. It ships IDE plugins for VS Code, IntelliJ IDEA, PyCharm, Vim, and Neovim. The server handles model hosting, context indexing, and user management, while the plugins handle the real-time completion overlay in your editor [README][2].
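Plugin installation is the lightweight half of the setup. A sketch for VS Code, assuming the marketplace identifier is TabbyML.vscode-tabby (worth double-checking in the marketplace, since extension IDs occasionally change):

code --install-extension TabbyML.vscode-tabby

After installing, the extension needs the server URL and a token from the Tabby admin UI; the JetBrains and Vim/Neovim plugins follow the same connect-and-authenticate pattern.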

Recent development (as of 2025) has moved well beyond basic completions. The v0.30 release added GitLab Merge Request indexing as context. v0.29 added REST APIs for injecting your own documentation. v0.28 introduced persistent, shareable Answer Engine pages. v0.24 added LDAP authentication — the kind of feature that signals a project has moved from “hobby tool” to “enterprise-viable” [README].


Why people choose it

The case for Tabby comes down to three things: privacy, cost, and control. These are not marketing points — they come directly from the technical constraints that push engineering teams away from Copilot.

The privacy argument is real. When you use GitHub Copilot or Cursor, your code is sent to a cloud inference endpoint. For companies with IP-sensitive codebases, NDAs, or security compliance requirements, that’s a non-starter. Tabby’s entire value proposition is that inference happens on your hardware: “All data processing remains on your own hardware and network” [2]. No code snippet leaves the building.

The per-seat pricing argument compounds over time. Copilot Individual is $10/month. Copilot Business is $19/user/month. Cursor Pro is $20/month. These numbers look small per person but scale badly. A 10-person dev team on Copilot Business is $2,280/year — before any enterprise features. Tabby self-hosted on a single GPU-enabled VPS costs the same regardless of team size [2][README].

The model flexibility argument is underrated. With Tabby, you choose which model backs your completions. The VirtualizationHowto comparison [2] shows an example deployment using StarCoder-1B for completions and Qwen2-1.5B-Instruct for chat. If a better open-source model drops next month, you update your deployment. With Copilot, you get whatever Microsoft decided to ship, with no control over model updates or rollbacks.

The early 2023 GIGAZINE hands-on review [1] is worth reading as a historical document. At that point in the project’s life, completions were unreliable — a Fibonacci function request returned syntactically plausible but logically wrong code, and inference on an i7-6800K took 10–30 seconds. The GIGAZINE reviewer did note that adding a comment (“// calculate Fibonacci numbers”) dramatically improved the output, highlighting a lesson that applies to all code LLMs: context quality directly determines completion quality [1]. That 2023 snapshot is not the current state of the project — Tabby has shipped 30+ major releases since — but it’s honest context for what the tool was when it started.


Features

Code completion: Real-time, context-aware completions as you type. The engine pulls context from open files, recently edited buffers, and indexed repository code [README][2]. The VirtualizationHowto review notes that it provides “real-time suggestions, completions, and inline documentation” across all supported IDEs [2].

Answer Engine: A chat interface embedded in your IDE that can answer questions about your codebase, generate code from descriptions, explain functions, and more. As of v0.13 (July 2024), the Answer Engine became a “central knowledge engine for internal engineering teams” with integration into internal dev data sources [README]. v0.28 added persistent, shareable pages from Answer Engine threads — useful for distributing technical decisions across a team [README].

Inline Chat: Contextual AI discussion tied directly to a code block, without leaving the editor. The website describes this as making “collaboration more efficient and focused” — you select a function, ask a question, get a response in context [website].

Data Connectors (Context Providers): Tabby can index external documentation, configuration files, and data sources to inform its completions. v0.29 added REST APIs for pushing your own internal documentation into the context engine [README]. v0.30 added GitLab Merge Request indexing, so the assistant can reference open MRs as context [README].
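Those v0.29 APIs mean context can be pushed from scripts or CI rather than configured by hand. The sketch below shows only the shape of such an integration: the endpoint path and payload fields are hypothetical placeholders, not Tabby’s documented route, so consult the API docs on your own server for the real one:

# WARNING: hypothetical route and fields, for illustration only.
# Check your server's API documentation for the actual ingestion endpoint.
curl -s -X POST http://localhost:8080/v1/ingestion \
  -H "Authorization: Bearer $TABBY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"source": "internal-wiki", "title": "Deploy runbook", "link": "https://wiki.example.com/deploy", "body": "..."}'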

Model flexibility: You choose your completion and chat models independently. The Docker example from the VirtualizationHowto review uses --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct — two different models for two different tasks [2]. Compatible with Code Llama, CodeGemma, CodeQwen, Codestral, and others via the Tabby model registry [README].
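Swapping models is a redeploy, not a migration. A sketch built on the same Docker command shown in the deployment section below, assuming CodeLlama-7B is a valid name in the Tabby model registry (check the registry listing for current names):

docker stop tabby && docker rm tabby
docker run -d --name tabby --gpus all -p 8080:8080 \
  -v $HOME/.tabby:/data \
  registry.tabbyml.com/tabbyml/tabby \
  serve --model CodeLlama-7B --chat-model Qwen2-1.5B-Instruct --device cuda

Downloaded weights are cached under the mounted data directory, so switching back to a previous model is cheap.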

Administration features:

  • LDAP authentication (v0.24+) [README]
  • SSO support [merged profile]
  • REST API for programmatic management [merged profile][README]
  • Plugin system [merged profile]
  • Llamafile deployment integration (v0.21+) for simplified local model setup [README]

Pochi (Agent — private preview): A separate but related tool that connects GitHub issues to implementation tasks and can create PRs with CI/lint/test result breakdowns directly from the IDE sidebar [README]. As of late 2025 this is in private preview — worth watching but not a shipping feature yet.


Pricing: SaaS vs self-hosted math

GitHub Copilot (the primary comparison):

  • Individual: $10/month (one seat)
  • Business: $19/user/month
  • Enterprise: $39/user/month

Cursor:

  • Free tier with limited completions
  • Pro: $20/month per user

Tabby self-hosted:

  • Software: $0 [README]
  • Hardware: your own server with consumer GPU, or a cloud VPS with GPU
  • A GPU-enabled VPS or dedicated GPU server: approximately $35–60/month depending on provider and tier — but this single server covers an unlimited number of developers

Team math:

Team size  | Copilot Business (annual) | Cursor Pro (annual) | Tabby on $40/mo VPS (annual)
1 person   | $228                      | $240                | $480
3 people   | $684                      | $720                | $480
5 people   | $1,140                    | $1,200              | $480
10 people  | $2,280                    | $2,400              | $480
20 people  | $4,560                    | $4,800              | $480

The crossover point for a GPU VPS sits around 2–3 developers: $480/year for the VPS against $228/user/year for Copilot Business puts break-even at $480 / $228 ≈ 2.1 seats. Past 3 people, Tabby wins on raw cost even with a more expensive VPS. For an on-premises deployment using existing hardware, the VPS cost drops to zero.

Important caveat: If you don’t have GPU hardware and need a dedicated GPU VPS, costs vary widely. On your own existing hardware (a workstation with an RTX 3090, for example), the marginal cost is effectively zero. On a cloud instance, factor in the GPU rental cost and whether it’s shared or dedicated.

Tabby Cloud pricing data was not available in the source material — check the official pricing page directly.


Deployment reality check

The happy path looks like this [2]:

docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  registry.tabbyml.com/tabbyml/tabby \
  serve \
  --model StarCoder-1B \
  --chat-model Qwen2-1.5B-Instruct \
  --device cuda

One command, GPU passthrough, two models, done. For someone comfortable with Docker, this is a 15-minute setup [2].
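Before touching any IDE configuration, it’s worth confirming the server is actually serving. A sketch assuming the default port; the health route below is part of Tabby’s HTTP API as I understand it, so verify it against your version’s API docs:

curl -s http://localhost:8080/v1/health

A JSON response with model and device details means inference is wired up; from there, open http://localhost:8080 in a browser to create the admin account and generate access tokens.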

What you actually need:

  • Docker (and the NVIDIA Container Toolkit, formerly nvidia-docker, if using a GPU)
  • A GPU with sufficient VRAM for your chosen model. Smaller models (1B–3B parameters) run on 8GB VRAM. Larger coding models (7B+) need 16GB+. As a rule of thumb, weights take roughly one byte per parameter at 8-bit quantization, so a 7B model occupies about 7GB before context overhead.
  • A reverse proxy (Caddy or nginx) for HTTPS if you’re exposing this beyond localhost (see the one-liner sketch after this list)
  • Port forwarding or a VPN if your team needs remote access to the server
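For the reverse proxy, Caddy’s one-liner mode is the lowest-effort path to HTTPS. A sketch assuming a DNS record for tabby.example.com (a placeholder domain) already points at the server:

caddy reverse-proxy --from tabby.example.com --to localhost:8080

Caddy provisions and renews the TLS certificate automatically; nginx works just as well but needs a site config and a separate certbot setup.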

What can go sideways:

  • The GIGAZINE early review [1] hit a gotcha immediately: Docker’s relative path in the volume mount specification caused the container to fail. Rewriting to an absolute path fixed it (see the sketch after this list). This is a Docker footgun, not a Tabby-specific issue, but worth noting for non-Docker-native users.
  • The same 2023 review reported ~30GB memory consumption during inference [1]. This was on early model versions — modern smaller models (1B–3B parameters) run comfortably in 8–16GB RAM — but if you’re running a 13B+ parameter model, memory planning matters.
  • Inference latency on CPU is real. The 2023 review reported 10–30 seconds per completion on an i7-6800K with no GPU [1]. GPU is not optional if you want response times that don’t break your flow. A CPU-only deployment is functional for testing but not production use.
  • The license field in the package metadata registers as “NOASSERTION” — meaning the license wasn’t cleanly identified in the packaging data. Before using this in a commercial product or distributing it, verify the actual license terms directly on the GitHub repository.
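The volume-mount gotcha from [1] is easy to sidestep once you know docker run wants an absolute host path. A sketch of the failing and working forms, with the rest of the command elided as in the happy path above:

docker run -v ./tabby:/data ...            # fails: relative host paths are rejected
docker run -v "$HOME/.tabby":/data ...     # works: absolute path, as used above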

Realistic time estimates:

  • Experienced Docker user with a GPU machine: 15–30 minutes to a working server
  • Setting up HTTPS with a reverse proxy: add 30–60 minutes
  • Non-technical user following a guide: 2–4 hours for full setup including IDE plugin configuration
  • Deploying for a 10-person team with LDAP: half a day

Pros and cons

Pros

  • True local inference. No code leaves your network. This is the core value proposition and it’s real — all inference, context processing, and model storage happens on your hardware [README][2].
  • Consumer GPU compatible. You don’t need cloud GPU instances or enterprise hardware. An RTX 3090 is sufficient for most 7B-parameter coding models [README][2].
  • IDE coverage. VS Code, IntelliJ, PyCharm, Vim, Neovim — the mainstream editors are all covered [README][2].
  • Model flexibility. Pick your completion model and chat model independently. Switch to a better open-source model when one ships without waiting for a vendor to update [README][2].
  • LDAP + SSO for teams. Enterprise authentication is in the box, not behind a commercial tier [README][merged profile].
  • REST API. Programmatic access to the server — useful for CI integration and custom tooling [README][merged profile].
  • Active release cadence. 30+ major releases with consistent feature additions — GitLab MR indexing, shareable Answer Engine pages, Llamafile integration, LDAP — this is a living project [README].
  • 33K GitHub stars. Enough community investment to expect ongoing development and community support.
  • No per-seat pricing. One server, unlimited developers. The cost structure fundamentally changes at scale [2].

Cons

  • GPU is a hard requirement for production use. CPU inference is too slow for a real workflow — the 2023 review measured 10–30 seconds per completion on an i7-6800K [1]. This isn’t a knock on Tabby specifically, but it’s a real infrastructure dependency.
  • Memory footprint is significant. Larger models need 16GB+ VRAM. For a team needing a high-quality coding model, a consumer GPU may not be sufficient and a server-grade GPU or multi-GPU setup becomes necessary [1].
  • Early completion quality was inconsistent. The 2023 GIGAZINE hands-on found that poorly-contextualized prompts produced logically wrong completions [1]. Completion quality correlates directly with model quality and context — smaller, faster models will trade accuracy for speed.
  • License is unclear from package metadata. The license field registers as “NOASSERTION” — verify directly on the GitHub repository before using in commercial contexts.
  • Setup requires technical competence. Unlike Copilot (which is a VS Code extension with a login), Tabby requires a running server, Docker familiarity, and either local GPU hardware or a GPU-enabled VPS. Not a tool for non-technical founders.
  • Pochi (agent feature) is in private preview. The most compelling agentic capability — implementing GitHub issues as pull requests — isn’t generally available yet [README].
  • No managed model pipeline. Unlike Copilot, which Microsoft maintains and improves automatically, you’re responsible for model selection, updates, and rollbacks.

Who should use this / who shouldn’t

Use Tabby if:

  • Your team handles proprietary source code that cannot leave your infrastructure under any circumstances — legal, contractual, or security reasons.
  • You’re running 3+ developers on Copilot or Cursor and the math on a GPU VPS is favorable.
  • You have existing GPU hardware sitting underutilized (a workstation, a local server, a NAS with a GPU) and want to put it to work.
  • You want model flexibility — the ability to switch between StarCoder, Code Llama, Codestral, and new open-source releases as the field evolves.
  • Your company already uses LDAP for identity management and wants that to extend to dev tooling.

Skip it (use Copilot or Cursor instead) if:

  • You’re a solo developer or small team where the per-seat cost is acceptable and you don’t have the infrastructure to run a local server.
  • You don’t have GPU hardware and don’t want to pay for a GPU VPS on top of the tooling.
  • Code privacy isn’t a concern and you’d rather have the fastest, best-maintained model with zero ops overhead.
  • You need the Copilot integration with GitHub pull requests, code review, and issue management — GitHub’s own ecosystem integration is deeper than anything Tabby offers today.

Skip it (use Continue.dev instead) if:

  • You want a similar self-hosted, open-source AI coding assistant but prefer a configuration-file-driven approach with broader model-provider flexibility (Ollama, OpenAI-compatible endpoints, local API servers) and don’t want to run a dedicated server.

Alternatives worth considering

  • GitHub Copilot — the incumbent. Best model quality and GitHub integration, $10–39/user/month, all code goes to Microsoft’s servers. The option Tabby exists to replace.
  • Cursor — AI-first IDE (fork of VS Code) rather than a server-side assistant. $20/month, cloud inference, strong agent capabilities. Not self-hostable, but the best cloud option if privacy isn’t a constraint.
  • Continue.dev — open-source IDE extension that proxies to any LLM backend (Ollama, OpenAI API, Claude API). No dedicated server required — lighter operational footprint than Tabby, but less opinionated and fewer built-in features.
  • Codeium — free cloud-hosted AI coding assistant for individuals, with enterprise self-hosted options. Sits between “free cloud” and “full self-hosted” in the spectrum.
  • FauxPilot — earlier open-source Copilot alternative using Copilot-compatible APIs, allowing existing Copilot IDE plugins to work with a self-hosted backend [2]. Less active development than Tabby.
  • Ollama + Continue.dev — the DIY stack. Run any open-source model via Ollama, connect it to Continue.dev’s VS Code extension. Maximum flexibility, most configuration required.
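For reference, the DIY stack is only a couple of commands. A sketch assuming Ollama’s defaults and a coding model currently in its library (tags change, so check ollama.com/library):

ollama pull qwen2.5-coder:1.5b   # any coding-tuned model in the Ollama library
# Ollama's background service exposes an API on localhost:11434 by default;
# point Continue.dev's config at that endpoint as its model provider.

There is no separate server process to operate beyond Ollama itself, which is the operational appeal.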

For a non-technical founder, none of these are the right tool — AI coding assistants require a developer to set up and operate. The real comparison is Tabby vs Copilot for technical teams where privacy or cost at scale is the driver.


Bottom line

Tabby is the serious self-hosted answer to the GitHub Copilot question for privacy-sensitive or cost-conscious engineering teams. The value proposition is simple and defensible: your code stays on your hardware, the per-seat pricing disappears, and you control which models power your completions. The trade-offs are equally honest — you need GPU infrastructure, Docker competence, and willingness to manage a server that Copilot users never think about. The early (2023) completion quality concerns are a historical artifact; the project has shipped 30+ major releases since then and added enterprise features (LDAP, SSO, REST API) that signal real production use. For a team of five on Copilot Business ($95/month), a $40/month GPU VPS cuts the tooling bill by more than half from the first month. Whether that’s worth the operational overhead depends entirely on your team’s infrastructure comfort — but the math is not ambiguous.


Sources

  1. GIGAZINE, “Coding assistance AI ‘tabby’ that can be self-hosted on a local PC and can be used like Github Copilot” (Apr 10, 2023). https://gigazine.net/gsc_news/en/20230410-tabby-self-host-copilot/
  2. VirtualizationHowto, “Best Self-hosted GitHub Copilot AI Coding Alternatives” (May 19, 2025). https://www.virtualizationhowto.com/2025/05/best-self-hosted-github-copilot-ai-coding-alternatives/

Primary sources:

  • Tabby GitHub repository README — https://github.com/TabbyML/tabby [README]
  • Official website — https://www.tabbyml.com [website]
  • Merged directory profile (feature and metadata listing, including LDAP/SSO, plugin system, and REST API entries) [merged profile]