Time Window: Analyzes the last N days (default 365, user-configurable 1-365)
Data Sources:
- Recent commits within the analysis window
- Recent issues and pull requests within the analysis window
- Review comments and participants
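As a rough illustration of the window logic, the sketch below validates the day count and keeps only items created inside the window. The names `window_start` and `within_window`, and the `created_at` key, are hypothetical and not taken from the tool itself.

```python
from datetime import datetime, timedelta, timezone


def window_start(days=365, now=None):
    """Return the earliest timestamp that still falls inside the analysis window."""
    if not 1 <= days <= 365:
        raise ValueError("days must be between 1 and 365")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


def within_window(items, start):
    """Keep items whose (assumed) 'created_at' datetime is at or after the window start."""
    return [item for item in items if item["created_at"] >= start]
```

The same cutoff would apply to commits, issues, and pull requests alike, assuming each record carries a timezone-aware timestamp.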
Bot Detection: Filters out automated accounts using multiple criteria:
- Username contains: "bot", "automated", "auto", "ci", "cd", "deploy", "build", "github-actions", "dependabot", "renovate", "codecov", "travis", "jenkins", etc.
- Display name contains similar bot indicators
- Email patterns suggesting automation
Result: Only human contributors are included in the analysis
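The bot filter described above could look roughly like the following sketch. The indicator list is copied from the text (which ends in "etc.", so it is not complete), while the `[bot]` email marker and the function names are assumptions made for illustration.

```python
# Indicator substrings taken from the username list above; illustrative, not complete.
BOT_INDICATORS = (
    "bot", "automated", "auto", "ci", "cd", "deploy", "build",
    "github-actions", "dependabot", "renovate", "codecov", "travis", "jenkins",
)

# Assumed email marker: the source does not say which email patterns it checks.
EMAIL_HINTS = ("[bot]",)


def is_bot(username="", display_name="", email=""):
    """Return True if any field contains a bot indicator (case-insensitive substring match)."""
    for field in (username, display_name):
        lowered = (field or "").lower()
        if any(indicator in lowered for indicator in BOT_INDICATORS):
            return True
    lowered_email = (email or "").lower()
    return any(hint in lowered_email for hint in EMAIL_HINTS)


def human_contributors(contributors):
    """Keep only contributor dicts that is_bot() does not flag."""
    return [
        c for c in contributors
        if not is_bot(c.get("username", ""), c.get("name", ""), c.get("email", ""))
    ]
```

Plain substring matching is deliberately broad: short indicators such as "ci" also match human usernames like "lucia", so a stricter implementation might match on word boundaries instead.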
For each human contributor, the system tracks:
- Commits: Number of commits authored
- Issues Created: Issues opened by the contributor
- PRs Created: Pull requests opened by the contributor
- Reviews Given: Code reviews provided to other PRs
- Comments Made: Estimated comments in issues/PRs (distributed among participants)
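One way to hold these per-contributor counters is sketched below. It assumes that "distributed among participants" means a thread's comment count is split evenly across its participants, and that total activity is the sum of all five metrics; the source confirms neither, and the names `ContributorStats` and `distribute_comments` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContributorStats:
    commits: int = 0
    issues_created: int = 0
    prs_created: int = 0
    reviews_given: int = 0
    comments_made: float = 0.0  # fractional because comments are estimated

    @property
    def total_activity(self):
        """Assumed definition: the sum of all five tracked metrics."""
        return (self.commits + self.issues_created + self.prs_created
                + self.reviews_given + self.comments_made)


def distribute_comments(stats, participants, comment_count):
    """Split a thread's comment count evenly among its participants (assumed rule)."""
    if not participants or comment_count <= 0:
        return
    share = comment_count / len(participants)
    for login in participants:
        stats.setdefault(login, ContributorStats()).comments_made += share
```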
Each contributor's email is classified as:
- Company: Known corporate domains (microsoft.com, google.com, etc.)
- Personal: Gmail, Yahoo, Outlook, etc.
- Academic: .edu, .ac.uk, .edu.au domains
- Custom: Company domains supplied by the user for filtering
- "no email available": When email is missing or invalid
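The classification above can be sketched as a simple domain lookup. The domain sets below contain only the examples named in the text; the precedence order (custom first), the `unknown` fallback for unmatched domains, and the function name are assumptions.

```python
COMPANY_DOMAINS = {"microsoft.com", "google.com"}             # examples from the text
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}  # examples from the text
ACADEMIC_SUFFIXES = (".edu", ".ac.uk", ".edu.au")


def classify_email(email, custom_domains=()):
    """Classify an email address by its domain; the order of checks is an assumption."""
    if not email or email.count("@") != 1:
        return "no email available"
    local, domain = email.strip().lower().split("@")
    if not local or "." not in domain:
        return "no email available"
    if domain in {d.lower() for d in custom_domains}:
        return "custom"
    if domain in COMPANY_DOMAINS:
        return "company"
    if domain in PERSONAL_DOMAINS:
        return "personal"
    if domain.endswith(ACADEMIC_SUFFIXES):
        return "academic"
    return "unknown"  # assumed fallback; the source does not define this case
```

For example, `classify_email("dev@acme.io", custom_domains=["acme.io"])` would return `"custom"` under these assumptions.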
Quarterly Breakdown: Divides the analysis window into 4 quarters, numbered Q0 (oldest) through Q3 (most recent)
Trend Calculation: Compares activity in the recent half (Q2+Q3) with activity in the older half (Q0+Q1)
Trend Categories:
- Increasing: Recent activity > 1.5x older activity
- Decreasing: Recent activity < 0.67x older activity
- Stable: Recent activity between 0.67x and 1.5x older activity
- Insufficient Data: Fewer than 10 total activities
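The quarter bucketing and trend thresholds above can be sketched as follows. The thresholds come from the text; the function names and the choice to split the window into four equal time slices are assumptions.

```python
from datetime import datetime, timedelta, timezone


def quarter_counts(timestamps, window_days, now):
    """Count activities in four equal slices of the window; index 0 is the oldest."""
    start = now - timedelta(days=window_days)
    span = (now - start).total_seconds()
    counts = [0, 0, 0, 0]
    for ts in timestamps:
        if start <= ts <= now:
            # Clamp to 3 so a timestamp exactly at `now` lands in the last quarter.
            idx = min(3, int((ts - start).total_seconds() / span * 4))
            counts[idx] += 1
    return counts


def classify_trend(counts):
    """Compare the recent half (Q2+Q3) against the older half (Q0+Q1)."""
    if sum(counts) < 10:
        return "insufficient data"
    older = counts[0] + counts[1]
    recent = counts[2] + counts[3]
    if recent > 1.5 * older:
        return "increasing"
    if recent < 0.67 * older:
        return "decreasing"
    return "stable"
```

Comparing `recent` against a multiple of `older`, rather than dividing, avoids a division by zero when a contributor has no activity in the older half.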
Eligibility: Only contributors with 10+ activities get sentiment analysis
Data Sources: Commit messages, issue comments, PR comments, review comments
Output: Average polarity, subjectivity, and sentiment label (positive/negative/neutral)
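A minimal sketch of the aggregation step is shown below. It assumes each text has already been scored as a (polarity, subjectivity) pair by some sentiment library (TextBlob, for instance, returns such pairs, though the source does not name the library used). The label cutoff of 0.1 and the function name are hypothetical.

```python
from statistics import mean

MIN_ACTIVITIES = 10    # eligibility threshold from the text
LABEL_THRESHOLD = 0.1  # assumed cutoff; the source does not state one


def summarize_sentiment(scores, total_activities):
    """Average per-text (polarity, subjectivity) pairs and attach a label.

    Returns None for contributors below the eligibility threshold or with no scored text.
    """
    if total_activities < MIN_ACTIVITIES or not scores:
        return None
    polarity = mean(p for p, _ in scores)
    subjectivity = mean(s for _, s in scores)
    if polarity > LABEL_THRESHOLD:
        label = "positive"
    elif polarity < -LABEL_THRESHOLD:
        label = "negative"
    else:
        label = "neutral"
    return {"polarity": polarity, "subjectivity": subjectivity, "label": label}
```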
Each contributor entry includes all tracked metrics and classifications.
Sorting: Contributors sorted by total activity (highest first)
Concentration Risk: Derived from the top contributor's share of total activity
Activity Distribution: Shows how activity is distributed among top 1, 3, and 5 contributors
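The sorting, distribution, and concentration steps can be sketched as below, taking a mapping from contributor to total activity. The risk thresholds (50% and 25%) and all function names are assumptions; the source only says that risk depends on the top contributor's percentage.

```python
def rank_contributors(totals):
    """Sort (contributor, total) pairs by total activity, highest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def top_share(totals, k):
    """Percentage of all activity that comes from the k most active contributors."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    top = sorted(totals.values(), reverse=True)[:k]
    return 100.0 * sum(top) / grand_total


def activity_distribution(totals):
    """Share of activity held by the top 1, 3, and 5 contributors."""
    return {f"top_{k}": top_share(totals, k) for k in (1, 3, 5)}


def concentration_risk(totals, high=50.0, medium=25.0):
    """Label risk from the top contributor's share; thresholds are hypothetical."""
    share = top_share(totals, 1)
    if share >= high:
        return "high"
    if share >= medium:
        return "medium"
    return "low"
```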
Together, these steps show who is actively contributing to a repository, how their engagement is changing over time, the sentiment of their communication, and their likely organizational affiliation based on email domain.