Choosing an open source product or platform upon which to build an ICT4D service is hard. Creating a sustainable, volunteer-driven open source project is even harder. There is a proliferation of open source tools in the world, but the messaging used to describe a given project does not always line up with the underlying technology. For example, the project may make claims about modularity or pluggability that, upon further investigation, prove to be exaggerations at best. Similarly, managers of ICT4D projects may be attracted to open source because of the promise of a “free” product, but as we’ve learned through trial and error at Caktus, it’s not always less costly to adapt an existing open source project than it would be to engineer a quality system from the ground up.
In this post I will go over some of the criteria we look at when evaluating a new open source project, from a developer’s perspective, in the hopes that it helps managers of ICT4D projects make educated decisions about when it makes sense to adopt a pre-existing open source solution. For those ICT4D managers looking to release a new open source platform, what follows may also prove helpful when deciding how best to allocate resources to the initial release and ongoing management of an open source product or platform. To that end, I’ll provide a high level overview of what matters most: licensing, code quality assessments, automated testing, development workflow, documentation, release frequency, and community engagement.
The three things that are most important to ICT4D projects, I would argue, are quick iteration, replicability, and scalability. Quick iteration is required in order to get early drafts of solutions out in front of beneficiaries to pilot as quickly as possible. Replicability is important when a pilot project is ready to be tested in multiple locations. Similarly, once a pilot has been shown to be successful, the ability to quickly scale up that project to meet regional, national, or even international demand is critical.
The problem is that these three success factors often place competing demands on the project. Doing things the quick and dirty way may be perceived as shortening the time to a working solution, but it also means the solution might not work in other contexts. Similarly, the project might hit a technical barrier when it comes time to scale up. With proper planning and execution, however, I believe all three of these goals — quick iteration, replicability, and scalability — can be achieved in a way that requires neither compromises nor starting over from scratch when it comes time to replicate or scale an ICT4D project. Furthermore, we believe strongly at Caktus that doing things the right way the first time minimizes both risk and the time to develop a software project, even for quick, iterative pilots.
Selecting permissive licenses lowers the barrier to entry
There are many types and subtypes of open source licensing, and trying to select a project based on a license can easily get confusing. Generally speaking, we opt for the more permissive BSD- or MIT-style licenses at Caktus when we have the choice. The main thing to consider when using software with more restrictive licenses such as the GPL or AGPL is that they tend to be less business- or donor-friendly and hence may attract a smaller overall community than they would have otherwise. They can also impose requirements your project might not otherwise have had, such as an obligation to release your own source code under the same license.
Creating code readable by humans improves scalability
Code quality is something that is easy to forget about early in a project. ICT4D pilots are often like startups: the drive is to get features out the door as quickly as possible to test and prove the minimum viable product (MVP). We believe you can produce work that is both speedily deployed and later easy to scale by focusing on code quality from the start. In software development there is a concept of “technical debt”: moving quickly without concern for quality creates “debt” that must be paid back, with interest accruing over time.
Code quality includes creating code that is readable to fellow developers. As with any language, clarity matters to the people reading it. At Caktus our preference generally tends to be for the Python programming language because it is well known for being highly readable and easy to learn.
For those ICT4D program managers starting new projects, regardless of the programming language, it’s helpful to build in time for the development team to add automated checks to the project that enforce a code formatting standard. For those evaluating a new open source solution, apart from reviewing the code itself, ICT4D program managers can check for the existence of documented coding standards. The end goal is for all developers on a project to write code that is indistinguishable from another developer’s code; you should not be able to tell from looking at a piece of code who wrote it. This makes it easier both to bring new people into the project and for a developer to jump into a part of the code he or she didn’t write, in case the person who wrote it happens to be inaccessible at the time an urgent change is needed. The code should be the product of the team, not a set of disparate individuals, and having code formatting standards in place helps encourage that. At Caktus, we typically use flake8 (run via Travis CI) to check the format of our code automatically each time a developer makes a commit or submits a pull request.
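As a sketch of what such an automated check might look like, a flake8 configuration can be committed to the repository so that every developer and the CI server apply the same rules. The file name and the specific settings below are illustrative assumptions, not taken from any particular Caktus project:

```ini
# setup.cfg -- a hypothetical flake8 configuration kept in the repository
[flake8]
max-line-length = 100
exclude = .git,__pycache__,migrations
```

With a file like this in place, a CI service such as Travis CI can run flake8 on each commit or pull request and fail the build on any style violation, keeping the standard enforced automatically rather than by convention alone.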
Automated code testing ensures reliability
Automated code testing is both a best practice and necessary to avoid software failures, but we have seen it dismissed in the rush to deploy. A key question for ICT4D program managers to consider in the planning process is what kind of automated testing the developers are using. Automated testing includes both “unit” and “integration” testing. “Unit tests” are pieces of code that individually test discrete parts of the overall code base to ensure they continue to work as expected as changes are made to the system. “Integration tests,” similarly, verify that the different components function correctly when combined into a complete system. The end goal of both types of tests is the same: to ensure that the existing software does not break as features are added or changed or bugs are fixed. Absent automated tests, it’s all too easy for something as small as a bug fix to introduce one or more new, unanticipated bugs in other parts of the system.
At Caktus we primarily use Django’s testing framework, which is based on Python’s built-in unittest framework. We also set up Continuous Integration to run tests on every set of changes automatically and email the developers when tests fail, so the team is always aware when the tests aren’t passing. When evaluating whether or not a project relies heavily on automated testing, two things to look for are (a) whether or not the project advertises test coverage (as a percentage, at least 85-90% is preferred), and (b) whether or not the development process requires new features to come bundled with unit tests. As with code quality, if automated tests are left out of a project, I would argue that the time to develop the project will actually increase rather than decrease because the development team will end up spending time tracking down bugs that would have been caught by the testing framework, time that could have been spent developing features.
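To make the idea concrete, here is a minimal sketch of a unit test written with Python’s built-in unittest module, the framework Django’s test tools build on. The parse_keyword helper and its behavior are invented for illustration and not drawn from any real project:

```python
import unittest


def parse_keyword(message):
    """Extract the first word of an incoming SMS as a lowercase keyword.

    Hypothetical helper for an SMS-based ICT4D service, shown only to
    illustrate unit testing.
    """
    if not message or not message.strip():
        return None
    return message.strip().split()[0].lower()


class ParseKeywordTests(unittest.TestCase):
    """Unit tests: each test exercises one discrete behavior in isolation."""

    def test_extracts_first_word(self):
        # Keywords should be recognized regardless of capitalization
        self.assertEqual(parse_keyword("REGISTER clinic 12"), "register")

    def test_blank_message_returns_none(self):
        # Empty or whitespace-only messages carry no keyword
        self.assertIsNone(parse_keyword("   "))


if __name__ == "__main__":
    unittest.main(exit=False, verbosity=2)
```

If a later change accidentally altered how blank messages are handled, the second test would fail immediately in CI, which is exactly the safety net the paragraph above describes.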
A documented development workflow streamlines new contributions
The development workflow is another important part of any software project, in particular open source projects. Open source projects should have a clearly documented, community-supported method for (a) proposing and discussing potential features or other changes, (b) developing those changes, (c) having those changes reviewed and approved by other developers, (d) merging those changes into the main branch(es), and (e) releasing sets of those changes as numbered releases (e.g., v1.2). Whether a project has these things documented can usually be discovered easily by searching for a “developer manual” or “contributors guide,” as well as reviewing the content of the project’s developer mailing list to see evidence of how contributions work in practice. This documentation acts as a clear entry point for both users and developers, without which open source projects wither.
At Caktus we typically use a variant of the GitHub Flow model that includes one additional “staging” or “develop” branch that is used to deploy the code to an intermediary “staging” server. This allows code to be tested before being deployed to the production server. A key part of this workflow is the peer code review, a process by which a fellow developer reviews every new change. Not only does the process help detect potential issues early, it also broadens overall knowledge of the code base. Code reviews shouldn’t happen only intermittently or when it’s convenient; they should cover every change made to the project. We believe creating a culture of code reviews allows individual developers to forgo ego in favor of a drive towards system integrity. One can evaluate whether a project does code reviews by checking a number of places, including the project developer mailing list, the GitHub or BitBucket “pull requests” feature which allows line-by-line reviews, or simply by reviewing the commit log to see if changes are made directly to the “master” or “default” branch or if they’re made to separate “feature” branches first.
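The branch-and-review cycle described above can be sketched with a few git commands. The branch and file names here are hypothetical, and the example builds a throwaway repository purely for illustration:

```shell
# Build a throwaway repository to demonstrate the workflow
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"
git commit -q --allow-empty -m "Initial commit"

# A long-lived "develop" branch is what gets deployed to the staging server
git branch develop

# New work happens on a feature branch cut from develop
git checkout -q -b feature/sms-report develop
echo "print('weekly SMS report')" > report.py
git add report.py
git commit -q -m "Add SMS report module"

# After peer code review (e.g., via a pull request), the feature branch is
# merged into develop and tested on staging; once verified, develop is
# merged into master and deployed to production
git checkout -q develop
git merge -q --no-ff feature/sms-report -m "Merge feature/sms-report"
```

The --no-ff merge preserves the feature branch as a distinct unit in the history, which makes it easy to see from the commit log (as suggested above) whether changes went through feature branches or were committed straight to the main branch.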
Clear documentation helps create sustainable open source projects
Good documentation is fundamental to any successful open source project. Perhaps counterintuitively, it’s just as easy to have too much documentation as it is to have too little. Signs that an open source project takes documentation seriously include things like how often the documentation is referenced on the project’s mailing list(s), where the documentation is stored, how the documentation is edited, and how easy the documentation makes it, both for new users and developers of the project, to come on board. While not always the case, documentation that is automatically generated by the code can be a case of “too much” rather than “good” documentation. Jacob Kaplan-Moss of the Django project wrote a great blog post back in 2009 on writing good technical documentation that is worth a read for anyone putting together documentation for an open source project.
At Caktus we generally have a preference for storing developer-written documentation in the code repository itself; this allows the team to quickly update documentation when code changes are made, and also makes it easy to spot discrepancies between code changes and documentation changes when doing code reviews. While wikis may be easy to update, they tend to fall out of sync with the code because updating them happens as part of a different process. Hosting documentation in a wiki also makes it harder to refer back to older versions of the documentation if you have a system that’s been running for a few years and have not been able to upgrade the underlying platform.
Regular releases and recent “commits” help ensure continuity
One of the first things we tend to look at (in part because it’s one of the easiest) is how recently the project we’re evaluating released a new version and/or how recently someone committed new changes to the code. While it’s not always a bad sign if there hasn’t been a release in a year or two, it’s generally better to find projects that make regular releases, at least 2-3 times a year. It can also be a bad sign, for example, if there are lots of frequent commits to the code repository, but the last “released” (numbered) version is many months or years old. This may mean that release management has fallen off track, and the project is targeting only internal users rather than the larger open source / ICT4D community.
Developer community engagement is necessary to leverage the power of open source
Community engagement and openness are two more important factors to consider when selecting an open source project as the foundation for (or to add to) an ICT4D solution. Community engagement matters because projects without a community of users and contributors tend not to be maintained over the long run. Engagement of the community can be evaluated by reviewing traffic on the project’s mailing list(s) and bug tracker (for both users and developers) and determining the prevailing character of project communications. Key events to look for include the usual response when someone enters a bug report, submits a suggested change or pull request, or proposes a discussion around the project’s development workflow. While reasonable demands can (and should) be placed on new users for following protocol, a high number of rejected changes or disgruntled first-time users tends to be an indicator of poor community relations. These are some of the reasons why we’re big proponents of the Django framework: the community is almost always warm and welcoming and is quick to enforce this culture. In addition to communications, other positive attributes to look for include documentation around adding new members to the core development team as well as codes of conduct or other policies that set forth in a public way the desire to create an inclusive community for all. These things matter because developers are people, and communication, as in any discipline, is critical.
While by no means an all-inclusive list, these are some of the factors I think it’s important to consider when selecting a new open source product to use for an ICT4D solution. I hope to have provided useful insight into the developer’s perspective, one that I think ICT4D program managers should consider when evaluating open source projects. I realize selecting projects that hold themselves to the highest standard on all of these points may be a difficult task, so, as with many things, deficiencies in one area may be offset by excellence in others. Similarly, implementing all of the above points on an open source project you release will not result in a sudden wave of contributions from volunteer developers, but the more you can do, the more you’ll lower the barrier to entry for developers and facilitate community growth.
I hope to update this post from time to time with new ideas and approaches for evaluating open source projects for use in ICT4D, so if you have any questions, comments, or suggested additions, please leave them in the comments section below. I look forward to your feedback!