GitHub
This page contains the setup guide and reference information for the GitHub source connector.
Prerequisites
- List of GitHub Repositories (and access for them in case they are private)
For Airbyte Cloud:
- OAuth
- Personal Access Token (see Permissions and scopes)
For Airbyte Open Source:
- Personal Access Token (see Permissions and scopes)
Setup guide
Step 1: Set up GitHub
Create a GitHub Account.
Airbyte Open Source additional setup steps
Log into GitHub and then generate a personal access token. To load balance your API quota consumption across multiple API tokens, input multiple tokens separated with a comma (,).
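As a rough illustration of how several comma-separated tokens could share the API quota, the Python sketch below splits the input and hands tokens out in round-robin order. The function names and the round-robin strategy are assumptions made for this example; the connector's actual balancing logic may differ.

```python
from itertools import cycle


def parse_tokens(raw):
    """Split a comma-separated token string, dropping blanks and whitespace."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def token_rotation(raw):
    """Return an endless iterator that yields tokens in round-robin order."""
    tokens = parse_tokens(raw)
    if not tokens:
        raise ValueError("at least one personal access token is required")
    return cycle(tokens)


# Placeholder values, not real tokens.
rotation = token_rotation("token_a, token_b,token_c")
first_four = [next(rotation) for _ in range(4)]
print(first_four)  # ['token_a', 'token_b', 'token_c', 'token_a']
```

Each request would take the next token from the rotation, so consumption is spread evenly across all supplied tokens.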
Step 2: Set up the GitHub connector in Airbyte
For Airbyte Cloud:
- Log into your Airbyte Cloud account.
- In the left navigation bar, click Sources.
- On the source selection page, select GitHub from the list of Sources.
- Add a name for your GitHub connector.
- To authenticate:
  - For Airbyte Cloud: Use Authenticate your GitHub account to authorize the connection. Airbyte authenticates the GitHub account you are currently logged in to, so make sure you are logged into the right account.
  - For Airbyte Open Source: Authenticate with a Personal Access Token. Log into GitHub, generate a personal access token, and enter it in the connector. To load balance your API quota consumption across multiple API tokens, input multiple tokens separated with a comma (,).
- GitHub Repositories - Enter a list of GitHub organizations/repositories, e.g. airbytehq/airbyte for a single repository, or airbytehq/airbyte airbytehq/another-repo for multiple repositories. To receive data from all repositories of an organization, specify it as in the following example: airbytehq/*. Repositories that do not exist, or whose names are misspelled or incorrectly formatted, are skipped, and a WARN message is written to the logs.
- Start date (Optional) - The date from which you'd like to replicate data for streams. For streams which support this configuration, only data generated on or after the start date will be replicated.
  - These streams will only sync records generated on or after the Start Date: comments, commit_comment_reactions, commit_comments, commits, deployments, events, issue_comment_reactions, issue_events, issue_milestones, issue_reactions, issues, project_cards, project_columns, projects, pull_request_comment_reactions, pull_requests, pull_request_stats, releases, review_comments, reviews, stargazers, workflow_runs, workflows.
  - The Start Date does not apply to the streams below, and all data will be synced for these streams: assignees, branches, collaborators, issue_labels, organizations, pull_request_commits, pull_request_stats, repositories, tags, teams, users.
- Branch (Optional) - List of GitHub repository branches to pull commits from, e.g. airbytehq/airbyte/master for a single branch, or airbytehq/airbyte/master airbytehq/airbyte/my-branch for multiple branches. If no branches are specified for a repository, the default branch will be pulled.
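The input formats for GitHub Repositories and Branch described above can be sketched in Python as follows. This is only an illustration: the validation pattern, the logger name, and both helper functions are assumptions, and skipping malformed branch entries is a choice made for this example rather than documented connector behaviour.

```python
import logging
import re
from collections import defaultdict

logger = logging.getLogger("github_source_config")  # hypothetical logger name

# Assumed shape of a valid entry: "owner/repo" or "owner/*".
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/(\*|[A-Za-z0-9_.-]+)$")


def parse_repositories(raw):
    """Split a space-separated value into (repositories, organizations)."""
    repositories, organizations = [], []
    for entry in raw.split():
        if not REPO_PATTERN.match(entry):
            # Badly formatted names are skipped with a WARN log message.
            logger.warning("Skipping malformed repository entry: %s", entry)
            continue
        owner, name = entry.split("/")
        if name == "*":
            organizations.append(owner)  # "owner/*" means every repo of the org
        else:
            repositories.append(entry)
    return repositories, organizations


def parse_branches(raw):
    """Group space-separated "owner/repo/branch" entries by repository."""
    grouped = defaultdict(list)
    for entry in raw.split():
        # maxsplit=2 keeps slashes that belong to the branch name itself.
        parts = entry.split("/", 2)
        if len(parts) != 3 or not all(parts):
            logger.warning("Skipping malformed branch entry: %s", entry)
            continue
        owner, repo, branch = parts
        grouped[f"{owner}/{repo}"].append(branch)
    return dict(grouped)


repos, orgs = parse_repositories("airbytehq/airbyte airbytehq/* not-a-repo")
branches = parse_branches("airbytehq/airbyte/master airbytehq/airbyte/my-branch")
print(repos)     # ['airbytehq/airbyte']
print(orgs)      # ['airbytehq']
print(branches)  # {'airbytehq/airbyte': ['master', 'my-branch']}
```

A repository that has no entry in the branch mapping would fall back to its default branch.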
Supported sync modes
The GitHub source connector supports the following sync modes:
- Full Refresh - Overwrite
- Full Refresh - Append
- Incremental Sync - Append
- Incremental Sync - Append + Deduped
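To illustrate how the two incremental modes differ at the destination, here is a minimal Python sketch. It assumes each record has an id primary key and an updated_at cursor field; those field names and both helper functions are hypothetical and are not part of the connector.

```python
def append(destination, new_records):
    """Append mode: new records are added, older versions are kept."""
    return destination + new_records


def append_deduped(destination, new_records, primary_key="id", cursor_field="updated_at"):
    """Append + Deduped mode: keep only the latest version per primary key."""
    latest = {}
    for record in destination + new_records:
        key = record[primary_key]
        # ISO 8601 date strings compare correctly as plain strings.
        if key not in latest or record[cursor_field] >= latest[key][cursor_field]:
            latest[key] = record
    return list(latest.values())


existing = [{"id": 1, "updated_at": "2023-01-01", "title": "old title"}]
incoming = [
    {"id": 1, "updated_at": "2023-02-01", "title": "new title"},
    {"id": 2, "updated_at": "2023-02-02", "title": "another issue"},
]

appended = append(existing, incoming)
deduped = append_deduped(existing, incoming)
print(len(appended))  # 3
print(len(deduped))   # 2
```

With Append, both versions of record 1 remain in the destination; with Append + Deduped, only the most recent version survives.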
Supported Streams
This connector outputs the following full refresh streams:
- Assignees
- Branches
- Contributor Activity
- Collaborators
- Issue labels
- Organizations
- Pull request commits
- Tags
- TeamMembers
- TeamMemberships
- Teams
- Users
- Issue timeline events
This connector outputs the following incremental streams:
- Comments
- Commit comment reactions
- Commit comments
- Commits
- Deployments
- Events
- Issue comment reactions
- Issue events
- Issue milestones
- Issue reactions
- Issues
- Project (Classic) cards
- Project (Classic) columns
- Projects (Classic)
- ProjectsV2
- Pull request comment reactions
- Pull request stats
- Pull requests
- Releases
- Repositories
- Review comments
- Reviews
- Stargazers
- WorkflowJobs
- WorkflowRuns
- Workflows
Notes
- Only 4 streams (comments, commits, issues and review_comments) from the streams listed above are pure incremental, meaning that they:
  - read only new records;
  - output only new records.
- Streams workflow_runs and workflow_jobs are almost pure incremental.
- The other 19 incremental streams are also incremental, but with one difference. They:
  - read all records;
  - output only new records.
  Please consider this behaviour when using those 19 incremental streams, because it may affect your API call limits.
- Sometimes, for large streams, specifying a start_date very far in the past may result in repeated errors from GitHub instead of records (a corresponding WARN message is written to the logs). In this case, specifying a more recent start_date may help.
The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering:
- assignees
- branches
- collaborators
- issue_labels
- organizations
- pull_request_commits
- pull_request_stats
- repositories
- tags
- teams
- users
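The difference between the pure incremental streams and the other incremental streams described in the notes above can be sketched in Python. The in-memory API_RECORDS list stands in for the GitHub API, and the updated_at cursor field and both function names are assumptions made for this example only.

```python
# Stand-in for records held by the remote API (hypothetical data).
API_RECORDS = [
    {"id": 1, "updated_at": "2023-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2023-02-01T00:00:00Z"},
    {"id": 3, "updated_at": "2023-03-01T00:00:00Z"},
]


def sync_pure_incremental(cursor):
    """The API filters by date, so only new records are read and output."""
    read = [r for r in API_RECORDS if r["updated_at"] > cursor]
    return read, read


def sync_read_all_output_new(cursor):
    """All records are read; filtering to new records happens afterwards."""
    read = list(API_RECORDS)
    output = [r for r in read if r["updated_at"] > cursor]
    return read, output


cursor = "2023-01-15T00:00:00Z"
pure_read, pure_output = sync_pure_incremental(cursor)
all_read, all_output = sync_read_all_output_new(cursor)
print(len(pure_read), len(pure_output))  # 2 2
print(len(all_read), len(all_output))    # 3 2
```

Both approaches output the same records, but the second one reads the full stream on every sync, which is why it can consume more of the API call limit.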