github-fetcher: Checking New Repos from GitHub with Go
Go is a very interesting language, and GitHub is already the central hub for open source on the Internet. Companies have lots of repositories, and it's very hard to keep up nowadays. So I built a simple, silly tool in Go that fetches all the repositories a company has, compares them with the previous fetch, and shows whether there are new repositories. Thanks to Go's elegance I was able to do that in fewer than 200 lines of code. To get there we will use some basic Go functionality: IO, HTTP calls, JSON support for structs, slices, maps, error handling (which is the part that sucks), and a bit of logic. This program can fail because I'm not using a proper token for GitHub, so you will hit the GitHub rate limits pretty quickly if you keep running it. You can create a proper developer account and change the code to use your token, or you can schedule the program in your crontab to run at different times. This is not production ready, but it's a way to have fun with Go and build something somewhat useful :D Let's get started!
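If you do want to use a token, a minimal sketch could look like the one below. The helper name newGitHubRequest and the GITHUB_TOKEN environment variable are my own assumptions for illustration; the tool in this post sends unauthenticated requests.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newGitHubRequest builds a GET request for the GitHub API and, when a token
// is supplied, attaches it as a bearer token. The function name is
// hypothetical and not part of the original tool.
func newGitHubRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// Assumption: the token is exported in the environment as GITHUB_TOKEN.
	token := os.Getenv("GITHUB_TOKEN")
	req, err := newGitHubRequest("https://api.github.com/orgs/facebook/repos?page=1", token)
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	fmt.Println("authenticated:", req.Header.Get("Authorization") != "")
}
```

The request built here could be passed to client.Do in place of the plain request, which should raise the rate limit for the account that owns the token.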
github-fetcher: How the Program Works
First of all, the program looks on disk for a JSON file named after the organization you want to check, and if that file exists it is loaded into memory. If there is no JSON file on disk (the first time you run it for an organization), that's fine.
After loading the file from disk (if present), we go to the GitHub API and fetch all repositories from the organization you pass as a parameter. The GitHub organization repos API is paginated, so I make multiple calls in a loop in order to get repos from all pages. Nowadays it is very common for companies to have more than 100 repositories.
After getting the repositories from the GitHub API, we compare the repositories from disk (previous run) with the repositories from the site. This gives us a diff, which will be new repos or deleted ones. Then the JSON on disk is updated with the current run.
Getting the Code and Running
Download the main.go file; then we can run it in 2 ways. We can do: go run main.go facebook, or we can build a binary with: go build -o github-fetcher main.go and then run it with: ./github-fetcher facebook. When you run it you will see output like this:
$ go build -o github-fetcher main.go
$ ./github-fetcher facebook
1. Loading previous JSON from disk:
JSON from disk:
facebook/codemod
facebook/hhvm
facebook/pyre2
facebook/open-graph-protocol
facebook/facebook-android-sdk
facebook/facebook-ios-sdk
facebook/pfff
facebook/php-webdriver
facebook/SocketRocket
facebook/folly
facebook/jcommon
facebook/sparts
facebook/facebook-oss-pom
facebook/caf8teen
facebook/nailgun
facebook/watchman
facebook/rocksdb
facebook/FBMock
facebook/libphenom
facebook/chef-utils
facebook/mysql-5.6
facebook/buck
facebook/xctool
facebook/emitter
facebook/hblog
facebook/react
facebook/fbthrift
facebook/fishhook
facebook/glusterfs
facebook/react-devtools
facebook/pyaib
facebook/regenerator
facebook/rebound
facebook/treadmill
facebook/jest
facebook/puewue-backend
facebook/puewue-frontend
facebook/mcrouter
facebook/conceal
facebook/planout
facebook/bistro
facebook/liblogfaf
facebook/Shimmer
facebook/chisel
facebook/facebook-clang-plugins
facebook/KVOController
facebook/Specs
facebook/Tweaks
facebook/IT-CPE
facebook/augmented-traffic-control
facebook/pop
facebook/Haxl
facebook/yoga
facebook/php-graph-sdk
facebook/proguard
facebook/pose-aligned-deep-networks
facebook/fb-adb
facebook/tac_plus
facebook/facebook-php-business-sdk
facebook/thpp
facebook/immutable-js
facebook/flux
facebook/wdt
facebook/rebound-js
facebook/dfuse
facebook/osquery
facebook/jsx
facebook/facebook-python-business-sdk
facebook/grocery-delivery
facebook/taste-tester
facebook/between-meals
facebook/fbpca
facebook/fatal
facebook/proxygen
facebook/ds2
facebook/flow
facebook/wangle
facebook/fbcuda
facebook/react-native
facebook/stetho
facebook/zstd
facebook/infer
facebook/mysqlclient-python
facebook/libafdt
facebook/FBFetchedResultsController
facebook/ThreatExchange
facebook/facebook-ruby-business-sdk
facebook/shimmer-android
facebook/nuclide
facebook/fresco
facebook/device-year-class
facebook/jscodeshift
facebook/openbmc
facebook/fboss
facebook/network-connection-class
facebook/componentkit
facebook/Stack-RNN
facebook/C3D
facebook/react-native-applinks
facebook/PathPicker
facebook/gnlpy
facebook/fbjs
facebook/eyescream
facebook/FLAnimatedImage
facebook/fbpush
facebook/graphql
facebook/android-jsc
facebook/react-native-fbsdk
facebook/Recipes-for-AutoPkg
facebook/relay
facebook/fbkutils
facebook/screenshot-tests-for-android
facebook/facebook-sdk-for-unity
facebook/WebDriverAgent
facebook/FBSimulatorControl
facebook/ocpjbod
facebook/dataloader
facebook/homebrew-fb
facebook/bAbI-tasks
facebook/MemNN
facebook/xcbuild
facebook/Conditional-character-based-RNN
facebook/robolectric
facebook/SoLoader
facebook/prepack
facebook/reason
facebook/fb-caffe-exts
facebook/facebook-java-business-sdk
facebook/learningSimpleAlgorithms
facebook/MazeBase
facebook/chef-cookbooks
facebook/transform360
facebook/fb.resnet.torch
facebook/UdpPinger
facebook/fbtracert
facebook/draft-js
facebook/fb-util-for-appx
facebook/fbtftp
facebook/UETorch
facebook/fbctf
facebook/fbshipit
facebook/BridgeIC
facebook/facebook-instant-articles-sdk-php
facebook/redex
facebook/makeitopen
facebook/FBRetainCycleDetector
facebook/FBAllocationTracker
facebook/FBMemoryProfiler
facebook/FBNotifications
facebook/remodel
facebook/Surround360
facebook/facebook-sdk-swift
facebook/create-react-app
facebook/TextLayoutBuilder
facebook/prophet
facebook/react-360
facebook/metro
facebook/Carmel-Starter-Kit
facebook/DelegatedRecoverySpecification
facebook/litho
facebook/fbzmq
facebook/duckling
facebook/facebook-instant-articles-sdk-extensions-in-php
facebook/DelegatedRecoveryReferenceImplementation
facebook/360-Capture-SDK
facebook/prop-types
facebook/mysql-8.0
facebook/Docusaurus
facebook/facebook-nodejs-business-sdk
facebook/openr
facebook/react-native-website
facebook/pyre-check
facebook/instant-articles-builder
facebook/FAI-PEP
facebook/Sonar
2. Fetching all repos for: facebook
facebook/codemod
facebook/hhvm
facebook/pyre2
facebook/open-graph-protocol
facebook/facebook-android-sdk
facebook/facebook-ios-sdk
facebook/pfff
facebook/php-webdriver
facebook/SocketRocket
facebook/folly
facebook/jcommon
facebook/sparts
facebook/facebook-oss-pom
facebook/caf8teen
facebook/nailgun
facebook/watchman
facebook/rocksdb
facebook/FBMock
facebook/libphenom
facebook/chef-utils
facebook/mysql-5.6
facebook/buck
facebook/xctool
facebook/emitter
facebook/hblog
facebook/react
facebook/fbthrift
facebook/fishhook
facebook/glusterfs
facebook/react-devtools
facebook/pyaib
facebook/regenerator
facebook/rebound
facebook/treadmill
facebook/jest
facebook/puewue-backend
facebook/puewue-frontend
facebook/mcrouter
facebook/conceal
facebook/planout
facebook/bistro
facebook/liblogfaf
facebook/Shimmer
facebook/chisel
facebook/facebook-clang-plugins
facebook/KVOController
facebook/Specs
facebook/Tweaks
facebook/IT-CPE
facebook/augmented-traffic-control
facebook/pop
facebook/Haxl
facebook/yoga
facebook/php-graph-sdk
facebook/proguard
facebook/pose-aligned-deep-networks
facebook/fb-adb
facebook/tac_plus
facebook/facebook-php-business-sdk
facebook/thpp
facebook/immutable-js
facebook/flux
facebook/wdt
facebook/rebound-js
facebook/dfuse
facebook/osquery
facebook/jsx
facebook/facebook-python-business-sdk
facebook/grocery-delivery
facebook/taste-tester
facebook/between-meals
facebook/fbpca
facebook/fatal
facebook/proxygen
facebook/ds2
facebook/flow
facebook/wangle
facebook/fbcuda
facebook/react-native
facebook/stetho
facebook/zstd
facebook/infer
facebook/mysqlclient-python
facebook/libafdt
facebook/FBFetchedResultsController
facebook/ThreatExchange
facebook/facebook-ruby-business-sdk
facebook/shimmer-android
facebook/nuclide
facebook/fresco
facebook/device-year-class
facebook/jscodeshift
facebook/openbmc
facebook/fboss
facebook/network-connection-class
facebook/componentkit
facebook/Stack-RNN
facebook/C3D
facebook/react-native-applinks
facebook/PathPicker
facebook/gnlpy
facebook/fbjs
facebook/eyescream
facebook/FLAnimatedImage
facebook/fbpush
facebook/graphql
facebook/android-jsc
facebook/react-native-fbsdk
facebook/Recipes-for-AutoPkg
facebook/relay
facebook/fbkutils
facebook/screenshot-tests-for-android
facebook/facebook-sdk-for-unity
facebook/WebDriverAgent
facebook/FBSimulatorControl
facebook/ocpjbod
facebook/dataloader
facebook/homebrew-fb
facebook/bAbI-tasks
facebook/MemNN
facebook/xcbuild
facebook/Conditional-character-based-RNN
facebook/robolectric
facebook/SoLoader
facebook/prepack
facebook/reason
facebook/fb-caffe-exts
facebook/facebook-java-business-sdk
facebook/learningSimpleAlgorithms
facebook/MazeBase
facebook/chef-cookbooks
facebook/transform360
facebook/fb.resnet.torch
facebook/UdpPinger
facebook/fbtracert
facebook/draft-js
facebook/fb-util-for-appx
facebook/fbtftp
facebook/UETorch
facebook/fbctf
facebook/fbshipit
facebook/BridgeIC
facebook/facebook-instant-articles-sdk-php
facebook/redex
facebook/makeitopen
facebook/FBRetainCycleDetector
facebook/FBAllocationTracker
facebook/FBMemoryProfiler
facebook/FBNotifications
facebook/remodel
facebook/Surround360
facebook/facebook-sdk-swift
facebook/create-react-app
facebook/TextLayoutBuilder
facebook/prophet
facebook/react-360
facebook/metro
facebook/Carmel-Starter-Kit
facebook/DelegatedRecoverySpecification
facebook/litho
facebook/fbzmq
facebook/duckling
facebook/facebook-instant-articles-sdk-extensions-in-php
facebook/DelegatedRecoveryReferenceImplementation
facebook/360-Capture-SDK
facebook/prop-types
facebook/mysql-8.0
facebook/Docusaurus
facebook/facebook-nodejs-business-sdk
facebook/openr
facebook/react-native-website
facebook/pyre-check
facebook/instant-articles-builder
facebook/FAI-PEP
facebook/Sonar
Repos from DISK : 175
Repos from Github: 175
3. **** NEW REPOS ****
[]
JSON persisted in disk
The Go Code
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strconv"
	"time"
)

// Repo represents a GitHub repository that belongs to an organization.
type Repo struct {
	Name string `json:"full_name"`
}

// extractRepos fetches one page of repositories for the given organization.
func extractRepos(page int, orgname string) ([]Repo, error) {
	url := "https://api.github.com/orgs/" + orgname + "/repos?page=" + strconv.Itoa(page)
	client := http.Client{
		Timeout: time.Second * 2,
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		fmt.Println("Error 1")
		return nil, err
	}
	res, getErr := client.Do(req)
	if getErr != nil {
		fmt.Println("Error 2")
		return nil, getErr
	}
	defer res.Body.Close() // release the connection once the body is read
	body, readErr := io.ReadAll(res.Body)
	if readErr != nil {
		fmt.Println("Error 3")
		return nil, readErr
	}
	repos := make([]Repo, 0)
	jsonErr := json.Unmarshal(body, &repos)
	if jsonErr != nil {
		fmt.Println("Error 4 - Often this means we HIT rate limit of Github API")
		return nil, jsonErr
	}
	return repos, nil
}

// getAllRepos walks the pages until one comes back empty.
func getAllRepos(orgname string) []Repo {
	var allRepos []Repo
	pagination := 1
	for {
		// The error is ignored on purpose: a failed call returns no repos,
		// which ends the loop just like running past the last page.
		repos, _ := extractRepos(pagination, orgname)
		if len(repos) == 0 {
			break
		}
		allRepos = append(allRepos, repos...)
		pagination = pagination + 1
	}
	return allRepos
}

// persistInDisk writes the repos as JSON to path; an empty slice is skipped.
func persistInDisk(path string, v []Repo) error {
	if len(v) == 0 {
		return nil
	}
	binary, err := json.Marshal(v)
	if err != nil {
		return err
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, bytes.NewReader(binary))
	return err
}

// loadFromDisk decodes the JSON file at path into the slice v points to.
func loadFromDisk(path string, v *[]Repo) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// v is already a pointer, so it is passed to Decode as-is.
	return json.NewDecoder(f).Decode(v)
}

// diff returns the names that appear in only one of the two slices.
func diff(slice1 []Repo, slice2 []Repo) []string {
	diffRepo := make([]string, 0)
	m := map[string]int{}
	for _, s1Val := range slice1 {
		m[s1Val.Name] = 1
	}
	for _, s2Val := range slice2 {
		m[s2Val.Name] = m[s2Val.Name] + 1
	}
	for mKey, mVal := range m {
		if mVal == 1 {
			diffRepo = append(diffRepo, mKey)
		}
	}
	return diffRepo
}

func main() {
	args := os.Args
	if len(args) <= 1 {
		fmt.Println(" _____ _ _ _ _ ______ _ _ ")
		fmt.Println("| __ (_) | | | | | | ___| | | | | ")
		fmt.Println("| | \\/_| |_| |__ _ _| |__ | |_ ___ ___| |_| |__ ___ _ __ ")
		fmt.Println("| | __| | __| '_ \\| | | | '_ \\ | _/ _ \\/ __| __| '_ \\ / _ \\ '__|")
		fmt.Println("| |_\\ \\ | |_| | | | |_| | |_) | | || __/ (__| |_| | | | __/ | ")
		fmt.Println(" \\____/_|\\__|_| |_|\\__,_|_.__/ \\_| \\___|\\___|\\__|_| |_|\\___|_| ")
		fmt.Println(" ")
		fmt.Println("github-fetcher: Fetch all repos from an organization and tell you the new repos! ")
		fmt.Println("USAGE : ./github-fetcher Facebook")
		fmt.Println("BY : Diego Pacheco - 2018")
		return
	}
	// JSON file that stores the previous run for this organization.
	path := "/home/diego/github.fetcher/" + args[1] + ".json"

	fmt.Println("1. Loading previous JSON from disk: ")
	allReposFromDisk := make([]Repo, 0)
	if loadFromDisk(path, &allReposFromDisk) != nil {
		fmt.Print("No Previous JSON in DISK.\n")
	} else {
		fmt.Println("JSON from disk:")
		for _, o := range allReposFromDisk {
			fmt.Println(o.Name)
		}
	}

	fmt.Println("2. Fetching all repos for: " + args[1])
	allRepos := getAllRepos(args[1])
	for _, o := range allRepos {
		fmt.Println(o.Name)
	}

	if len(allRepos) == 0 {
		fmt.Println("There are no repos on the WEB we will not proceed! ")
	} else {
		fmt.Print("Repos from DISK : ")
		fmt.Println(len(allReposFromDisk))
		fmt.Print("Repos from Github: ")
		fmt.Println(len(allRepos))
		diffRepo := diff(allReposFromDisk, allRepos)
		fmt.Println("\n\n3. **** NEW REPOS **** ")
		fmt.Println(diffRepo)
		if persistInDisk(path, allRepos) != nil {
			fmt.Println("JSON NOT persisted in disk")
		} else {
			fmt.Println("JSON persisted in disk")
		}
	}
}
So let's go through the code. First of all, we import the libs we need. After the imports, we create a struct called Repo. Here we are using an interesting Go feature, struct tags, which lets us map JSON to structs and vice versa. The GitHub API returns many attributes, but I only care about the repository full_name, so that's why there is just 1 field.
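To see the JSON mapping on its own, here is a small sketch that decodes a made-up payload shaped like the GitHub response. The parseRepos helper is a name I picked for this example; only the Repo struct comes from the program.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Repo mirrors the struct from the post: only full_name is mapped.
type Repo struct {
	Name string `json:"full_name"`
}

// parseRepos decodes a JSON array of repository objects, keeping only the
// fields that Repo declares and silently dropping the rest.
func parseRepos(body []byte) ([]Repo, error) {
	var repos []Repo
	if err := json.Unmarshal(body, &repos); err != nil {
		return nil, err
	}
	return repos, nil
}

func main() {
	// A trimmed, made-up payload; the real response has many more fields.
	body := []byte(`[{"id": 1, "full_name": "facebook/react", "fork": false}]`)
	repos, err := parseRepos(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(repos[0].Name) // facebook/react

	// Marshaling goes the other way and uses the same tag as the key.
	out, err := json.Marshal(repos)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(out)) // [{"full_name":"facebook/react"}]
}
```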
There is a function called extractRepos which receives the page number and the organization name. This function returns 2 things: a slice of Repo (a slice is like an array, but its length can grow) and an error, if one happens. This is how we do error handling in Go: since there are no exceptions, a function that can fail returns an error as its last value. I do the HTTP call and parse the result. You can see there is a json.Unmarshal call which receives the HTTP body content and &repos, a pointer to the slice of repos. Passing &repos means Unmarshal can fill our slice instead of a copy. In the previous line, you might notice we are using the make function; that's there in order to create the empty slice.
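One possible refinement, sketched below, is to look at the HTTP status before decoding, so a throttled call reports the status instead of a JSON type mismatch. The decodePage helper is hypothetical, and I am assuming a throttled call comes back with a non-200 status (I believe 403 or 429) and a JSON object body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type Repo struct {
	Name string `json:"full_name"`
}

// decodePage turns one API response (status code plus raw body) into repos.
// Checking the status first yields a clearer error than a JSON type mismatch.
func decodePage(status int, body []byte) ([]Repo, error) {
	if status != http.StatusOK {
		return nil, fmt.Errorf("github returned %d %s", status, http.StatusText(status))
	}
	repos := make([]Repo, 0)
	if err := json.Unmarshal(body, &repos); err != nil {
		return nil, fmt.Errorf("decoding repos: %w", err)
	}
	return repos, nil
}

func main() {
	// Made-up bodies standing in for real responses.
	_, err := decodePage(http.StatusForbidden, []byte(`{"message": "API rate limit exceeded"}`))
	fmt.Println(err) // github returned 403 Forbidden

	repos, err := decodePage(http.StatusOK, []byte(`[{"full_name": "facebook/folly"}]`))
	fmt.Println(repos, err) // [{facebook/folly}] <nil>
}
```

In extractRepos this would be called with res.StatusCode and the body that was just read.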
The next function is getAllRepos, which calls extractRepos with increasing page numbers until a call returns no repos, either because we ran past the last page or because an error happened. This is how I know how many pages there are. You might notice that when I call extractRepos I write repos, _ which means repos will be the slice of repos and _ discards the error; since I ignore the error, I use the underscore. Each page's repos are appended to the slice of all repos with the append function, which returns the updated slice.
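Ignoring the error means a throttled run looks the same as an organization with fewer repos. As a sketch of an alternative, the loop below stops on an empty page but returns the error when a call fails. The pageFetcher type, collectAll, and errRateLimited are names invented for this example, and the fake fetchers stand in for extractRepos so nothing touches the network.

```go
package main

import (
	"errors"
	"fmt"
)

type Repo struct {
	Name string `json:"full_name"`
}

// errRateLimited is a made-up error used by the fake fetcher below.
var errRateLimited = errors.New("rate limited")

// pageFetcher is a hypothetical stand-in for extractRepos so the loop can be
// exercised without calling the network.
type pageFetcher func(page int) ([]Repo, error)

// collectAll walks pages starting at 1 until a page comes back empty.
// Unlike the original loop, an error aborts the walk and is returned.
func collectAll(fetch pageFetcher) ([]Repo, error) {
	var all []Repo
	for page := 1; ; page++ {
		repos, err := fetch(page)
		if err != nil {
			return nil, fmt.Errorf("page %d: %w", page, err)
		}
		if len(repos) == 0 {
			return all, nil
		}
		all = append(all, repos...)
	}
}

func main() {
	// Fake API: two pages of data, then empty pages.
	pages := map[int][]Repo{
		1: {{Name: "facebook/react"}, {Name: "facebook/jest"}},
		2: {{Name: "facebook/folly"}},
	}
	all, err := collectAll(func(page int) ([]Repo, error) { return pages[page], nil })
	fmt.Println(len(all), err) // 3 <nil>

	_, err = collectAll(func(page int) ([]Repo, error) { return nil, errRateLimited })
	fmt.Println(err) // page 1: rate limited
}
```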
The next function is persistInDisk. It receives a path (a string), which is the location where we want to persist on disk, and a []Repo, which is a slice of Repos. Here we use json.Marshal, passing the slice of Repos, in order to turn our slice of Repo structs into JSON bytes. Then we use io.Copy to copy them to the file on disk and persist it.
The next function is loadFromDisk, which also receives a path, but now there is a *[]Repo, which is a pointer to a Repo slice. We need that because the function fills the caller's slice through the pointer. We read the content of the file, decode it from JSON, and store the result in that slice.
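The save and load pair can also be written with os.WriteFile and os.ReadFile. The sketch below is one way to do it, not the code from the program: saveRepos and loadRepos are hypothetical names, and treating a missing file as an empty first run is my own choice.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

type Repo struct {
	Name string `json:"full_name"`
}

// saveRepos writes the slice as JSON, replacing any previous file.
func saveRepos(path string, repos []Repo) error {
	data, err := json.Marshal(repos)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadRepos reads a previously saved file. A missing file is treated as a
// first run and yields an empty slice instead of an error.
func loadRepos(path string) ([]Repo, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return []Repo{}, nil
	}
	if err != nil {
		return nil, err
	}
	var repos []Repo
	if err := json.Unmarshal(data, &repos); err != nil {
		return nil, err
	}
	return repos, nil
}

func main() {
	// A throwaway directory keeps the example away from real data.
	dir, err := os.MkdirTemp("", "github-fetcher")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "facebook.json")

	first, err := loadRepos(path)
	fmt.Println(len(first), err) // 0 <nil>, nothing saved yet

	if err := saveRepos(path, []Repo{{Name: "facebook/react"}}); err != nil {
		fmt.Println(err)
		return
	}
	again, err := loadRepos(path)
	fmt.Println(again, err) // [{facebook/react}] <nil>
}
```

Returning the slice instead of filling a pointer is a design choice; the pointer version in the program works just as well.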
The next function is the diff one. Here we receive 2 slices: the one from disk and the one from the GitHub call. To get the difference we use the following algorithm. First, we loop through the first slice (from disk) and add every item to a map where the key is the repo name and the value is a counter, which this loop sets to 1 for all keys. Then we do the same with the other slice (from the GitHub API call), but now we get the value from the map, if it exists, and add 1. If a key is in both slices its value will be 2, otherwise 1. Finally, we loop through the map and collect the keys where the counter is 1; these appear in only one of the slices, and that's what we want: this is the diff. Right now the algorithm doesn't distinguish between new repos and deleted ones. That would be pretty easy to do, either by tracking which slice the count came from or by adding a different number for the second slice.
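Here is a sketch of the "different number for the second slice" idea: each source sets its own bit in the map value, so a name seen only on disk was removed and a name seen only on GitHub is new. The splitDiff function and the onDisk and onGitHub constants are names I made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

type Repo struct {
	Name string `json:"full_name"`
}

const (
	onDisk   = 1 << iota // seen in the previous run
	onGitHub             // seen in the current fetch
)

// splitDiff reports which names appear only in current (added) and which
// appear only in previous (removed). Results are sorted for stable output.
func splitDiff(previous, current []Repo) (added, removed []string) {
	marks := map[string]int{}
	for _, r := range previous {
		marks[r.Name] |= onDisk
	}
	for _, r := range current {
		marks[r.Name] |= onGitHub
	}
	for name, mark := range marks {
		switch mark {
		case onGitHub:
			added = append(added, name)
		case onDisk:
			removed = append(removed, name)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	previous := []Repo{{Name: "facebook/react"}, {Name: "facebook/nuclide"}}
	current := []Repo{{Name: "facebook/react"}, {Name: "facebook/Sonar"}}
	added, removed := splitDiff(previous, current)
	fmt.Println("added:", added)     // added: [facebook/Sonar]
	fmt.Println("removed:", removed) // removed: [facebook/nuclide]
}
```

Using bit flags instead of a counter also means a name that shows up twice in the same slice is not mistaken for one that is in both.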
Finally, the main function. Here we orchestrate the main flow described previously in this post. We get the organization name from the command line arguments through os.Args. We call the other functions, and if the fetch returns no repos (for example because of an error), I don't proceed.
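The argument handling and the file location can be pulled into small helpers, which also avoids hardcoding a home directory. This is only a sketch under my own assumptions: orgFromArgs and statePath are hypothetical names, and keeping the file in a github.fetcher folder under os.UserHomeDir is one possible choice.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// orgFromArgs returns the organization name, which is expected to be the
// first argument after the program name.
func orgFromArgs(args []string) (string, error) {
	if len(args) < 2 || args[1] == "" {
		return "", errors.New("usage: github-fetcher <organization>")
	}
	return args[1], nil
}

// statePath builds the JSON file location for an organization inside baseDir.
func statePath(baseDir, org string) string {
	return filepath.Join(baseDir, org+".json")
}

func main() {
	org, err := orgFromArgs(os.Args)
	if err != nil {
		fmt.Println(err)
		return
	}
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot find home directory:", err)
		return
	}
	fmt.Println("state file:", statePath(filepath.Join(home, "github.fetcher"), org))
}
```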
That's it!
Cheers,
Diego Pacheco