Hydro Labs experiment time! Over the weekend I was working on a little side project. I wanted to find a way to visualize all of the code repositories for a decentralized project over a period of time. This experiment idea came directly from community feedback telling us how Hydro development activity is challenging to see by looking at our official code repository alone. The dev team usually articulates development activity for the week with a blog post summary. This week we wanted to try something a bit more engaging.
A decentralized project like Hydro faces a few challenges that centralized projects do not when it comes to its code bases. The main differences are how code gets added, who can collaborate, and where the code actually lives. The whole point of open source is to encourage outside contributors to enhance and build on the underlying code base in a completely transparent setting. Over the past few years in the blockchain space, a project's GitHub activity has become an increasingly common metric for the health of a project. Since this is a metric some are using, at Hydro Labs we want to accurately account for the true scope of code we maintain and interact with day to day.
One of the challenges with decentralized projects is that code usually lives in multiple repositories or spans multiple accounts. That is not a bug; it is actually a strong feature of a decentralized project. But it can create a false impression of activity if only some of the code is visible or the code is viewed in only one location. Here is an example. At any given time, there could be dozens of code repositories for Hydro scattered all over the globe, with varying rates of code activity. Since anyone can start building on the ecosystem, there could be developers and projects out there we are not even aware of. This is great and we encourage it! We just want to account for it all when we talk metrics and show the scope of the Hydro ecosystem.
Typically, with a centralized project, code is kept in a central location. Since it is centralized, you can see it all in one repository (if it is even public in the first place). You can very quickly get a sense of how active a project truly is. With decentralized projects and code potentially scattered in various places, it becomes much more challenging to get an idea of what is going on unless you know where to look.
Here is where the experiment comes in. I wanted a way to show the true scope of Hydro’s code. This required a couple of things: a way to visualize data so code activity could clearly be seen, and the ability to add multiple sources of data over time. After some quick research and poking around for a bit, I landed on something that would check those boxes. Once I fired up Terminal and learned how to use the new tool, I came up with exactly what I needed. Take a look below at some of Hydro’s code activity for 2019. Look closely, it tells a story. Each 0.1 second that goes by is a new day. Every person icon is a developer in the ecosystem. Each laser beam, color change or movement is a modification to the code base. As projects get larger, they branch off, cluster and bloom.
Pretty neat, huh? What you are looking at are actual developers developing: adding, editing and modifying code across multiple Hydro projects, in parallel across a timeline. This could be Solidity smart contracts, ReactJS files, Swift for iOS, Java, HTML and CSS, to name just a few. What makes it even more interesting is that each minute movement has a story behind it. It came from a specific problem to solve or feature to add. This could have been a task that was self-assigned and proactively worked on, or assigned by another person in the community, sometimes with meetings, Google Hangouts and lots of caffeine included.
You can literally see a story being told here: many developers working across multiple projects and files. You can see when pull requests are made, when chunks of sprites get pulled into one another with that elastic bounce, when code gets deleted or merged (it turns red and puffs away), and the collaboration happening across the files in a project as the ecosystem expands.
Here is another look, this time at one project specifically: Hydro Hail, which is Hydro's security token smart contract. You can clearly see multiple engineers working on the contracts in tandem. Looking at a project on its own allows us to get a much closer look at some things.
Another example is for the Hydro dApp Store. Again, multiple developers over time across files and resources.
Hydro dApp Store:
If you want to kick it up a few notches, take a look at the Ethereum and Bitcoin projects. I will point out that for each of these projects only one repository is being analyzed, and it is massive, versus the collections of multiple smaller repositories above. Open source is pretty sweet.
How To Create The GitHub Visualizations:
For anyone that wants to geek out a little further like I did, here is how I accomplished the above visualizations and my thought process.
A lot of the heavy lifting here came from Gource, the visualization tool, and some fancy terminal commands. Once I had a visualization tool capable of representing what I wanted to do, I needed to learn about its parameters and what it had to offer. Luckily, it didn’t take too long once I had all the dependencies installed and on my path in Terminal. I first needed to install Gource, plus ffmpeg for the video creation. Both install through a package manager called Homebrew, which I already had. Homebrew lets you run brew commands like brew install <package name>.
Gource works by analyzing git commit logs and outputting the results into a new log file, which gets parsed and fed into a fancy visual simulation. Once I was able to get it up and running and play around with one repository, I thought to myself: I could do this for multiple repositories, then simply merge the logs and order the results by time into one master log. With this master log, I could then create the magic on this page showing the scope of activity. When it worked the first time, I got pretty excited.
Something I quickly learned is that looks can be deceiving. Sometimes there were massive blooms and births in the visualization, so I took a closer look at the logs to see what was going on. A few repos across the network had their whole “node_modules” folder checked in, which added about 30k files to the project at once. For the Hydro videos above, I removed any “node_modules” entries and any other log lines that could inflate the analysis. I did this simply by doing a global find for “node_modules” and deleting the thousands of matching lines in the generated .txt file.
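The same cleanup can be done from the terminal instead of a text editor. Here is a minimal sketch, not necessarily how the videos above were cleaned: strip_paths is a hypothetical helper name, combined.txt and filtered.txt are assumed file names, and the folder name is passed in as a pattern so you can swap in whatever shows up in your own logs.

```shell
#!/bin/sh
# strip_paths PATTERN LOGFILE
# Print every log line that does NOT contain PATTERN (matched as a fixed string).
# Hypothetical helper, for illustration only.
strip_paths() {
    # grep exits with 1 when nothing is left to print; treat that as success.
    grep -v -F -- "$1" "$2" || [ "$?" -eq 1 ]
}

# Example: drop a checked-in dependency folder from the merged log.
# File names here are assumptions; adjust them to match your own.
if [ -f combined.txt ]; then
    strip_paths "node_modules" combined.txt > filtered.txt
fi
```

Note that the pattern is matched against the whole line, so pick something that cannot also appear in a username.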
Want to make your own visualization? Here is a walkthrough of how I did it.
Install the dependencies:
We need some things before we begin.
brew install gource
brew install ffmpeg
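Before moving on, it can help to confirm both tools actually ended up on your path. A small sketch, assuming a POSIX shell; missing_tools is a hypothetical helper name, not part of Homebrew or Gource.

```shell
#!/bin/sh
# missing_tools TOOL...
# Print the name of each tool that cannot be found on the PATH.
# Hypothetical helper, for illustration only.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gource ffmpeg)
if [ -n "$missing" ]; then
    # Unquoted on purpose so several names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "gource and ffmpeg are both available"
fi
```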
Xcode developer tools pathing:
I am using OSX, and I needed to make sure my Xcode developer tools were pathed correctly. Make sure you have Xcode developer tools installed first. This worked for me.
sudo xcode-select --switch /Library/Developer/CommandLineTools
Get the logs:
I used git to clone the latest version of each repository from GitHub, then told Gource to create a dataset based on the git commit log. Run these commands from the folder that contains the cloned repositories; the last argument is the repository folder to analyze.
gource --output-custom-log log1.txt new-snowflake-dashboard
gource --output-custom-log log2.txt hail
This creates a file called log1.txt (and log2.txt for the second repository) with one line per file change.
Picture this log file with 10,000+ lines.
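For orientation, Gource's custom log format is pipe-delimited: a Unix timestamp, a username, a change type (A for added, M for modified, D for deleted) and a file path, optionally followed by a colour. Here is a small sketch for getting a feel for a log before visualizing it. It assumes that field layout, and changes_per_user is a hypothetical helper name.

```shell
#!/bin/sh
# changes_per_user LOGFILE
# Count log entries per username (field 2), busiest contributor first.
# Hypothetical helper, for illustration only.
changes_per_user() {
    awk -F'|' '{ count[$2]++ } END { for (u in count) print count[u], u }' "$1" |
        sort -k1,1nr -k2
}

# Example: size up the first log, peek at it, then rank contributors.
if [ -f log1.txt ]; then
    wc -l < log1.txt          # total number of file changes
    head -n 5 log1.txt        # first few entries
    changes_per_user log1.txt
fi
```

Each entry is one file change rather than one commit, so a single large commit can account for thousands of lines.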
Combine the logs:
Now that I have a log file for each repository, I can use a simple terminal command to combine and sort them.
cat log1.txt log2.txt | sort -n > combined.txt
Now picture this combined log file with 500,000+ lines
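One side effect of concatenating logs as-is is that identical paths from different repositories (every repo's README, for instance) can land on the same node in the tree. A possible way around that is to put each repository's paths under its own top-level folder before merging. This is a sketch under assumptions, not necessarily what was done for the videos above: it assumes the file path is the fourth pipe-delimited field, prefix_log is a hypothetical helper name, and combined-prefixed.txt is an assumed output name.

```shell
#!/bin/sh
# prefix_log PREFIX LOGFILE
# Rewrite field 4 (the file path) so it sits under /PREFIX.
# Hypothetical helper, for illustration only.
prefix_log() {
    awk -F'|' -v OFS='|' -v prefix="$1" 'NF >= 4 {
        sub(/^\//, "", $4)        # drop a leading slash if there is one
        $4 = "/" prefix "/" $4    # re-root the path under the repo name
        print
    }' "$2"
}

# Example: prefix each log with its repository name, then merge by timestamp.
if [ -f log1.txt ] && [ -f log2.txt ]; then
    {
        prefix_log new-snowflake-dashboard log1.txt
        prefix_log hail log2.txt
    } | sort -n > combined-prefixed.txt
fi
```

With the paths re-rooted this way, each repository should show up as its own branch off the root instead of blending into the others.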
Run the visualization and dynamically output to a video:
Next we run the visualization and, on the fly, pipe its frames into ffmpeg to create a video with a specific set of parameters. Two neat features here: hiding data I did not want to display, like file names, because it cramped up the screen, and adjusting the speed of the animation.
gource combined.txt --hide dirnames,filenames --seconds-per-day 0.3 --auto-skip-seconds 1 -1280x720 -o - | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -pix_fmt yuv420p -crf 1 -threads 0 -bf 0 combined-zoom.mp4
And voilà, we have our visualization.
I wanted to point out some keyboard shortcuts that I found useful while the visualization is running:
(V) Toggle camera mode
(K) Toggle file extension key
(M) Toggle mouse visibility
(N) Jump forward in time to next log entry
(S) Randomize colors
(D) Toggle directory name display mode
(F) Toggle file name display mode
(U) Toggle user name display mode
(G) Toggle display of users
(T) Toggle display of directory tree edges
(R) Toggle display of root directory edges
(+-) Adjust simulation speed
(<>) Adjust time scale
(TAB) Cycle through visible users
That being said..
I hope this experiment can give some further insight into the way Hydro works with its decentralized ecosystem of smart contracts, products and protocols. If you have any questions, feel free to reach out on Twitter or Telegram! Hope you liked it.
Timothy Allard / HCDP & Development Lead