This project is a GitHub scraper that uses Puppeteer to extract information about GitHub organizations/users and their repositories.
It collects data such as organization/user details, top languages, and repository information.
Clone the repository:
git clone <repository-url>
cd github-scraper
Install the dependencies:
npm install
Set the PERMALINK environment variable to the GitHub organization/user you want to scrape:
➜ github-scraper git:(main) env PERMALINK=ranbot-ai WITH_REPOS=false npx ts-node src/index.ts
// Organization Info with Repos:
{
  name: 'RanBOT Lab',
  picImageURL: '',
  description: 'RanBOT uses AI/ML to transform web content into structured data.',
  topLanguages: [ 'TypeScript', 'JavaScript', 'CSS' ],
  followers: 4,
  peopleCount: 1,
  website: '',
  location: 'China',
  socialLinks: [ '', '', '' ]
}
➜ github-scraper git:(main) env PERMALINK=encoreshao WITH_REPOS=false npx ts-node src/index.ts
// User Info with Repos:
{
  name: 'Encore Shao',
  nickname: 'encoreshao',
  picImageURL: '',
  followers: 26,
  following: 35,
  website: '',
  location: 'Shanghai, China',
  currentCompany: 'Ekohe',
  position: 'Engineer Manager | Researcher',
  organizations: [
    { name: 'ekohe', link: '/ekohe', orgImageURL: '' },
    { name: 'ranbot-ai', link: '/ranbot-ai', orgImageURL: '' },
    { name: '', link: '/encoreshao?tab=overview&org=ranbot-ai', orgImageURL: '' },
    { name: '', link: '/encoreshao?tab=overview&org=ekohe', orgImageURL: '' },
    { name: '', link: '/encoreshao?tab=overview&org=linktr-ai', orgImageURL: '' }
  ]
}
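The commands above pass their settings through environment variables. As a rough sketch of how a script might read them, the helper below parses PERMALINK and WITH_REPOS into a typed object. The names `ScraperConfig` and `readConfig` are hypothetical and not taken from this project, and the rule that only the string "true" enables repository scraping is an assumption; the real src/index.ts may behave differently.

```typescript
// Hypothetical config shape; not taken from the project's source.
interface ScraperConfig {
  permalink: string; // GitHub organization or user name, e.g. "ranbot-ai"
  withRepos: boolean; // whether repository data should be scraped as well
}

// Hypothetical helper: reads the two variables used in the examples above.
function readConfig(env: Record<string, string | undefined>): ScraperConfig {
  const permalink = (env["PERMALINK"] ?? "").trim();
  if (permalink === "") {
    throw new Error("PERMALINK must be set to a GitHub organization or user name");
  }
  // Assumption: only the string "true" (case-insensitive) turns repository scraping on.
  const withRepos = (env["WITH_REPOS"] ?? "false").trim().toLowerCase() === "true";
  return { permalink, withRepos };
}
```

In a Node.js script this could be called as `readConfig(process.env)`. Failing early on a missing PERMALINK avoids launching a browser only to request an empty profile URL.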
- Extracts organization/user information including name, description, top languages, people count, website, and social links.
- Scrapes repository data such as name, link, description, stars, forks, and pull requests.
- Handles pagination to scrape multiple pages of repositories.
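The pagination behaviour listed above can be illustrated with a small loop that keeps requesting pages until none remain. This is only a sketch under stated assumptions: `Repository`, `RepoPage`, `FetchPage`, and `scrapeAllRepos` are hypothetical names, the `Repository` fields simply follow the feature list, and the page fetcher is passed in as a function so the loop can be shown without a browser. In the actual project that step would presumably be done with Puppeteer.

```typescript
// Fields follow the feature list; the project's real property names may differ.
interface Repository {
  name: string;
  link: string;
  description: string;
  stars: number;
  forks: number;
  pullRequests: number;
}

// One page of results plus a flag saying whether another page exists.
interface RepoPage {
  repos: Repository[];
  hasNext: boolean;
}

// Hypothetical signature: loads page `pageNumber` (1-based) of a repository list.
type FetchPage = (pageNumber: number) => Promise<RepoPage>;

// Collects repositories page by page, stopping when no next page is reported
// or when the safety cap `maxPages` is reached.
async function scrapeAllRepos(
  fetchPage: FetchPage,
  maxPages: number = 50
): Promise<Repository[]> {
  const all: Repository[] = [];
  for (let pageNumber = 1; pageNumber <= maxPages; pageNumber++) {
    const { repos, hasNext } = await fetchPage(pageNumber);
    all.push(...repos);
    if (!hasNext) {
      break;
    }
  }
  return all;
}
```

The `maxPages` cap is a defensive choice so that a page that always reports a next page cannot keep the loop running indefinitely.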
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License.