github-fecther: Checking new Repos from Github with Go

Go lang is a very interesting language. Github already is the central hub for the open source on the Internet. Companys have lots of repositories and it's very hard to catch up nowadays. So I build a simple silly tool in Go in order to fetch all repositories a company might have and they compare with the previous fetch and see if there are new repositories.  Thanks to GO elegance I was able to do that with less than 200 lines of code. In order to do what we need to do, we will use some basic functionality in Go such as IO, Http Calls, JSON support for Structs, Slices, Maps, Error Handling(Which is the sucking part) and a bit of logic. This program can fail since I'm not using a proper TOKEN for github so you will HIT the github Throttling limits pretty quickly if you keep running the program. You can create a proper DEV account and fix the code to use your token or you can schedule the program to run in different times in your crontab. This is not production ready, but it's a way to have fun with go and build something somehow useful :D Let's get started!


github-fetcher: How the Program Works?

First of all the program looks into the DISK to see if there is a JSON file(using the name of the organization you want to check) and if there is that JSON file is loaded to memory. IF there is no JSON file in DISK(First time you run for each organization) is fine.

After Loading the file from DISK if present we will go to the Github API and fetch all repositories from the Organization you pass by the parameter. GIthub organization / repos API is paginated. So I do multiples calls in a loop in order to get reports from all pages. Nowadays is very common to have companies with more than 100 repositories.

After getting the repositories from Github API we will compare the repositories from Disk(previous run) with repositories from the site. This will give us a DIFF - which will be new repos or deleted ones. Then the JSON is updated in DISK with current run.

Getting the Code and Running 

Download the main.go file them we can run in 2 ways. We can do: go run main.go facebook or we can build a binary file by doing: go build and then we can run with: ./github-fetcher facebook. When you run it you will see an output like this:



The Go Code



So let's go through the code. First of all, we are importing the libs we need for this code. After imports, we are creating a Struct called Repo. Here we are using an interesting Go lang feature which allows us Map JSON to Structs and vice versa. Github API has many attributes but I just care about the repository full_name so that's why there is just 1 field there.

There is a function called extractRepos which receives the pagination and the organization name. This function returns 2 things: A slice(which is like an Array but not) of Repo and error if happens. This is how we do error handler in go - since there are no exceptions, every function needs to return 2 things.  I do the HTTP call and parse the result. You can see there is a json.Unmarshal which receives the http body content and a pointer reference to an slice of repos called &repos. So &repos means we are pasing repos by reference not by value. In the previous line, you might realize we are using the make function - that's there in order to create an Array.

The next function is getAllRepos which will call extractRepos with different pagination until we receive an error - this is how I know how many pages are there. You might realize when I call extractRepos I have repos, _ this means repos will be the array of repos and _ is the error, since I won't ignore that I use underline. The current repo is appended to the array of repos - this is done by using the function append where we pass the 2 arrays we want append and this results in a 3rd array.

Next function is persistInDisk, here we receive a path(which is a string) so this is the location where we want to persist in the disk and receives an []Repo - this is an array of Repos. Here we are using json.Marshal and passing the array of Repos in order to transfer our array of Struct Repo in JSON string. Them we use io.copy yo copy to the file in the disk and persist it.

Next function is loadFromDisk which also receives a path but now there is *[]Repo which is a pointer to Repo array. We need that because we will load a value by reference. We will read the content from the file and decode it to JSON and send back to the array struct.

Next function is the diff one. Here we receive 2 slices(arrays) which will be the array from DISK and the array from github call. In order to get the difference, we will do the following algorithm. First, we will loop throw the first array and add all items on a first slice(array from disk) to a map which the key will be the repo name and we will assign a counter - for this loop will be 1 to all keys. Them we will be doing the same with the other slice(from github api call) however now we will get the value from ma ap if exit and add 1. If we got a duplicated key the value will be 2 otherwise 1. Finally, we loop throu our map and find keys where the counter is 1 so this means they are unique and thats waht we want this is the diff. Right now the algorithm doesn't make difference between new repos and deleted ones this could be pretty easy to do just by checking the source of the number or by using the different number when is the second array.

Finally the main function. Here we orchestrate the main flow described previously in this post. We are getting the organization name by parameter doing os.Args so we get from command line arguments. We call other functions and if there are errors I dont proceed.

Thats it!

Cheers,
Diego Pacheco

Popular posts from this blog

Kafka Streams with Java 15

Rust and Java Interoperability

HMAC in Java