GitHub

Overview

GitHub uses a Personal Access Token (PAT) for API authentication. Both classic and fine-grained PATs work — fine-grained is recommended because it lets you scope the token to specific orgs and repos. Tokens are free on every GitHub plan; rate limits and which org-level data you can read depend on your account type and (for fine-grained) on the resource owner's approval.

Setup guide

Create the token

  1. Sign in at github.com.
  2. Open Settings → Developer settings → Personal access tokens and pick either Fine-grained tokens (recommended) or Tokens (classic).
  3. Click Generate new token, give it a name like ingest, and set an expiration that fits your org's policy.
  4. Apply scopes:
    • Fine-grained: under Repository permissions grant Contents: Read-only, Metadata: Read-only, Issues: Read-only, Pull requests: Read-only, Actions: Read-only, Deployments: Read-only, Environments: Read-only. For org-level data also set Organization permissions → Members: Read-only, and choose the org as the Resource owner.
    • Classic: select repo (full access to repositories, including private ones; classic tokens have no read-only repo scope), read:org (orgs and members), and read:user (authenticated user).
  5. Generate and copy the token immediately. GitHub only shows the value once; tokens start with github_pat_ (fine-grained) or ghp_ (classic).
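
Before pasting the value into a connector, a quick prefix check can catch a mis-copied token. The sketch below relies only on the two prefixes named in step 5; the function name `token_kind` is hypothetical and not part of Ingest or GitHub.

```python
def token_kind(token: str) -> str:
    """Classify a GitHub PAT by its prefix (hypothetical helper)."""
    # Check the longer fine-grained prefix first.
    if token.startswith("github_pat_"):
        return "fine-grained"
    if token.startswith("ghp_"):
        return "classic"
    # Anything else is probably truncated or not a PAT at all.
    return "unknown"
```

A result of "unknown" usually means the value was truncated or copied with surrounding whitespace.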

Add it to Ingest

In the Ingest UI under Connectors → GitHub, paste the token. Ingest stores it in AWS Secrets Manager under the key token.

Mind the limits

The Ingest runtime dispatches GitHub requests at 1 req/sec by default, well under the 5,000/hour primary cap (about 1.4 req/sec), and applies AIMD (additive-increase, multiplicative-decrease) rate adjustment on 429s. Status 401 is treated as fatal, so the request fails immediately, without retries, if the token is invalid or revoked; 429 and 500/502/503/504 are retried with exponential backoff up to five times. Watch the secondary limits of 100 concurrent requests and 900 points/minute if you enable many fan-out endpoints (per-PR reviews/files/commits, per-repo workflow runs) across a large repo set.
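
The retry policy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Ingest's actual implementation: the base delay, the cap, and the names `backoff_delay` and `send_with_retry` are hypothetical, and the AIMD rate adjustment is left out.

```python
import time
from typing import Callable

FATAL = {401}                            # invalid or revoked token: never retried
RETRYABLE = {429, 500, 502, 503, 504}    # transient: retried with backoff
MAX_RETRIES = 5                          # "up to five times"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (1-based); base and cap are assumed values."""
    return min(cap, base * 2 ** (attempt - 1))


def send_with_retry(send: Callable[[], int],
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (which returns an HTTP status) until it succeeds or retries run out."""
    retries = 0
    while True:
        status = send()
        if status in FATAL:
            raise PermissionError(f"fatal status {status}: check the token")
        if status in RETRYABLE and retries < MAX_RETRIES:
            retries += 1
            sleep(backoff_delay(retries))
            continue
        # Either a non-retryable status or the retry budget is exhausted.
        return status
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would typically also add jitter to the delay.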

Pick endpoints

Start with user_repos — every per-repo endpoint downstream fans out from the (owner, repo) pairs it produces. From there:

  • user, user_orgs, org_members: identity and org membership (org endpoints need a token that can read that org: a fine-grained token must have the org as its resource owner, a classic token needs read:org)
  • repo_details, repo_languages, repo_topics, repo_branches, repo_contributors — slow-changing repo metadata
  • repo_issues, repo_pulls, repo_issue_comments, repo_issue_events, repo_pull_review_comments, repo_labels, repo_milestones — issues and pull-request collaboration
  • repo_pull_reviews, repo_pull_files, repo_pull_commits — per-PR review history (one request per pull request, so volume scales with PR count)
  • repo_commits, repo_releases, repo_tags, repo_stargazers, repo_forks — version-control and adoption signals
  • repo_workflows, repo_workflow_runs, repo_deployments, repo_environments — Actions and deployment history
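
To see how volume grows with fan-out, the sketch below expands (owner, repo) pairs into request paths. The path templates are GitHub REST API routes, but the mapping from Ingest endpoint names to those routes is an assumption made for illustration, as are the names `fan_out`, `PER_REPO_PATHS`, and `PER_PULL_PATHS`.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

RepoKey = Tuple[str, str]  # (owner, repo), as produced by user_repos

# Assumed mapping of endpoint names to GitHub REST routes (illustrative subset).
PER_REPO_PATHS = {
    "repo_languages": "/repos/{owner}/{repo}/languages",
    "repo_pulls": "/repos/{owner}/{repo}/pulls",
}
PER_PULL_PATHS = {
    "repo_pull_reviews": "/repos/{owner}/{repo}/pulls/{number}/reviews",
    "repo_pull_files": "/repos/{owner}/{repo}/pulls/{number}/files",
    "repo_pull_commits": "/repos/{owner}/{repo}/pulls/{number}/commits",
}


def fan_out(pairs: Iterable[RepoKey],
            pull_numbers: Dict[RepoKey, List[int]]) -> Iterator[Tuple[str, str]]:
    """Yield (endpoint, path) for every request the fan-out would issue."""
    for owner, repo in pairs:
        # One request per repo for each per-repo endpoint.
        for name, template in PER_REPO_PATHS.items():
            yield name, template.format(owner=owner, repo=repo)
        # One request per pull request for each per-PR endpoint.
        for number in pull_numbers.get((owner, repo), []):
            for name, template in PER_PULL_PATHS.items():
                yield name, template.format(owner=owner, repo=repo, number=number)
```

With one repo and two pull requests this yields 2 + 3 × 2 = 8 requests before pagination, so at the default 1 req/sec a repo set with thousands of PRs takes hours rather than minutes for the per-PR endpoints.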

Supported streams

28 endpoints are available out of the box. Each endpoint syncs into its own Iceberg table in Snowflake.

Endpoint
  • org_members
  • repo_branches
  • repo_commits
  • repo_contributors
  • repo_deployments
  • repo_details
  • repo_environments
  • repo_forks
  • repo_issue_comments
  • repo_issue_events
  • repo_issues
  • repo_labels
  • repo_languages
  • repo_milestones
  • repo_pull_commits
  • repo_pull_files
  • repo_pull_review_comments
  • repo_pull_reviews
  • repo_pulls
  • repo_releases
  • repo_stargazers
  • repo_tags
  • repo_topics
  • repo_workflow_runs
  • repo_workflows
  • user
  • user_orgs
  • user_repos

Authentication

  • Auth type: Bearer Token
  • Sent as header: Authorization
  • Provider docs: docs.github.com
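
For checking a token outside Ingest, a bearer-token request can be built with the standard library alone. This is a minimal sketch: `build_request` is a hypothetical helper, and the `Accept` value is GitHub's recommended media type rather than something this page specifies.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub API request carrying the PAT."""
    return urllib.request.Request(
        API_ROOT + path,
        headers={
            # The token travels in the Authorization header as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the result of `build_request("/user", token)` to `urllib.request.urlopen` sends the request; a 401 response there indicates the same invalid or revoked token condition that Ingest treats as fatal.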

Performance & limits

  • Rate limit: 5,000 req/hour per authenticated PAT (~1.4 req/sec sustained). Secondary limits cap concurrent requests at 100 and content-creation at 80 req/min, but read endpoints rarely brush them. Link-header pagination at up to 100 items per page.
  • Automatic backoff: Ingest throttles requests to the published rate limit and retries with exponential backoff on transient errors. You don't need to handle 429s, retries, or pagination yourself.
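
Ingest handles pagination for you, but when debugging a sync it can help to know what Link-header pagination looks like. The sketch below pulls the `rel="next"` URL out of a `Link` response header; `next_page_url` is a hypothetical helper, and the regular expression covers the common format GitHub emits rather than the full header grammar.

```python
import re
from typing import Optional

# Matches entries of the form: <https://...>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None
```

A client would keep requesting the returned URL until the function yields `None`, which marks the last page.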

Resources