The Art of Picking Dependencies
Advice for choosing open source dependencies
In this post I’ll discuss some of the things I look for when picking an open source library. With so many options available, it can be overwhelming to pick something that meets my needs now yet doesn’t cause problems in the future. In the worst case it becomes an arbitrary process, so let’s go over some of the considerations to keep in mind when picking a new open source library.
First, Do You Even Need a Library?
Keep Things Small
Before searching for a new library, I spend time thinking about whether I need a new dependency at all. I try to add as few dependencies as possible to my applications. This isn’t to save disk space (it’s 2018, and disk is practically free), but to save complexity. Oftentimes adding a new dependency to do something simple just isn’t worth the trouble. At the other extreme, a large dependency brings its own issues, such as a larger surface area for bugs and security vulnerabilities, or breaking changes in parts of the library I don’t even use. So whenever I can, I avoid adding new dependencies to my projects entirely.
Do You Already Have Something Good Enough?
If you are using a well-known framework like Spring, there’s a chance it already has what you need. For example, one of the primary benefits of Spring Boot is its opinionated, maintained set of dependencies. You don’t have to worry about incompatibilities, the Spring team is more on top of security issues than most groups are, and you can use more or less anything they package.
Search through your transitive dependencies; you might be surprised at what’s in there. There is a slight risk that a transitive dependency gets dropped in the future, but in my experience this is somewhat rare.
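To see everything you already pull in, your build tool can print the full dependency tree (`mvn dependency:tree` in Maven, or the `dependencies` task in Gradle). If you already know which class you’re after, a quick classpath check also works. The sketch below is a hypothetical helper, not part of any library; the class names are only examples, and it reports what the running application’s class loader can see, which may differ from your compile-time dependencies.

```java
public class AlreadyOnClasspath {

    // Hypothetical helper: returns true if the named class can be loaded
    // from this application's class loader. Passing false for "initialize"
    // avoids running the class's static initializers.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, AlreadyOnClasspath.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either missing entirely, or present but unloadable
            // (for example, a class whose own dependencies are missing).
            return false;
        }
    }

    public static void main(String[] args) {
        // Example class names only; substitute whatever you were about to add a dependency for.
        String[] candidates = {
            "java.util.StringJoiner",                  // part of the JDK since Java 8
            "com.google.common.collect.ImmutableList", // Guava
            "org.apache.commons.lang3.StringUtils"     // Apache Commons Lang 3
        };
        for (String name : candidates) {
            System.out.println(name + ": " + (isPresent(name) ? "already available" : "not found"));
        }
    }
}
```

What it prints depends entirely on your project’s runtime classpath, so treat it as a quick sanity check rather than a replacement for reading the dependency tree.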
Nobody Else Has This Problem
Once you’ve decided that your problem is large enough to warrant its own dependency, what happens if you don’t find anything that solves it? This is either an opportunity to write something the open source community hasn’t realized it needs yet, or an indication that you are on the wrong track. Not every problem needs to be solved with a new dependency, so take some time to consider whether the absence of a pre-packaged solution is a signal that the problem doesn’t really exist in the first place. This is especially true for languages that have been around for a while, such as Java, where most common problems already have well-established solutions.
What I Look For
Here are some of the factors I consider when picking a new library. They aren’t presented in any specific order, and I don’t weight them equally or consistently. Depending on the size and complexity of the problem the library is going to solve for me, I might treat some of these factors as must-haves, but for the most part they are subjective and optional to one degree or another.
License
“I’m not a lawyer, but I play one at work.”
If you work for a large company, they probably have a list of “acceptable” licenses to choose from. My company certainly does, and if the new dependency has a license that’s not on this list I tend to look elsewhere. Frequently, it’s just not worth the effort to get a new license approved.
If you work for a smaller company or this is for a side project, your options might not be as limited, but you should definitely be aware of what license you are agreeing to and what its implications are. For example, if you modify the library and distribute the result, a copyleft license such as the GPL may require you to release more of your code to the public than you are comfortable with. When it comes to understanding each license, I find the Choose a License site helpful. Prefer libraries with an explicit, mainstream license over ones that are unlicensed or use a homegrown license.
When I author open source software myself I pick the MIT license because I want things to be simple for everybody.
Simple Explanation
If I see a library in a project that I’ve never heard of, I go look it up. Sometimes it’s very hard to figure out what an open source project actually does. Think of the people who will maintain your project in years to come. Will they understand what this library does, or why you picked it? A project should have a simple, clear, professionally written explanation of its purpose and intended use cases. Figuring that out shouldn’t be a guessing game when you open its GitHub page, for example.
Tests
“How am I going to know that it works if there aren’t tests?”
Remember when I said most of these factors were subjective? This one isn’t. I will pass on projects without tests, or whose tests cover only a trivial portion of the useful code. I don’t need tests for everything, such as getters, setters, or other trivial methods (in fact, testing those might be a sign of an immature test suite!), but projects need tests for the important bits. Otherwise bugs slip through from release to release.
Like it or not, the absence of tests sends a signal that the maintainers don’t care enough to bother. Why should you tie the fate of your project to something that wasn’t taken seriously enough to even unit test?
Overlap with Existing Libraries
Do you already have a dependency that does most of what your new library does? Seriously consider either replacing the current library with the new contender, or finding something different to cover the missing parts. Heck, if your need is small enough, write it yourself (taking inspiration from the other projects you’ve found). I personally dislike having several dependencies that all do similar things (I’m looking at you, String and Collections manipulation libraries).
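As an illustration of the “write it yourself” option, the JDK alone (Java 8 or later) covers many of the small string and collection chores people often reach for a utility library to handle. The helper names below are hypothetical, and this is a sketch rather than a drop-in replacement for any particular library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SmallEnoughToWrite {

    // Hypothetical helper: true for null, empty, or whitespace-only strings
    // (whitespace as defined by String.trim()). Java 11+ also offers
    // String.isBlank() for the non-null case.
    static boolean isNullOrBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Hypothetical helper: trims each non-blank part and joins them with a
    // separator, using only java.util.stream from the JDK.
    static String joinNonBlank(List<String> parts, String separator) {
        return parts.stream()
                .filter(p -> !isNullOrBlank(p))
                .map(String::trim)
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(" alpha", "", null, "beta ", "   ");
        System.out.println(joinNonBlank(parts, ", ")); // prints "alpha, beta"
    }
}
```

If a handful of helpers like these cover your needs, that’s one less library to track for licenses, security fixes, and releases.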
Additional Uses
“I only need it to do one thing, but I’m sure I’ll find a use for the other 99 things this library does eventually!”
Don’t pick a library to do one simple thing just because you feel that you might use all of the other things it does one day. More code means more bugs, more vulnerabilities, more interconnections, and more unrelated documentation to wade through.
When developing software, complexity is the enemy. Don’t add a library that does WAY more than you need, just because you can.
Documentation and Examples
Libraries should at least have a README file explaining how to integrate and use them. Ideally, comprehensive documentation and examples covering all major features are available as well. This matters more for larger projects. If a library is small and focused, I’ll overlook a lot of missing documentation as long as I have a general idea how to use it.
Essentially, the larger a project is, the better the documentation and examples need to be.
Quality of the Code
It’s open source, why not look at the code you’re going to be depending on? I bet most people don’t bother looking too deeply (if at all) into the source code of their dependencies. And in most cases, that’s totally fine.
Looking at the code, do you think you could find and fix a bug if you had to? When you pull it into your IDE, are there lots of yellow warnings about possible bugs? Does your static analysis tool give up, complaining that it has already generated too many warnings?
Don’t confuse poor-quality code with disorganized or poorly formatted code. Sure, those are things to look out for, but I’d argue that most of them are subjective (for instance, I’m the only developer in the world who doesn’t care about tabs versus spaces). Besides, code can look disorganized at first if you don’t really understand the architecture of a project or system.
Security Warnings
Hey, security is tricky stuff. You only need to screw up once and a whole army of black hats will be on you. It’s important that the maintainers of your new dependency have a good track record of avoiding or quickly fixing security vulnerabilities. If your organization has licenses for something like Black Duck or Snyk, I’d highly recommend scanning any potential new dependency with them first. There are online versions of both products that offer an overview of various open source projects, and that’s a good start if you don’t license the full versions.
Release Frequency
With small, tightly focused projects, I’m willing to overlook the fact that the project may not have had updates for months or even years. If there’s no reason to change, why release something just for the sake of it? On the other hand, if the project is larger, it should be updating its dependencies regularly, taking advantage of new language features, and generally fixing bugs and improving test coverage. A large project without updates in years is a sign to find something else.
Issues and Bugs
Most open source projects will have an issue tracker, so go check that out. In fact, if they don’t have an issue tracker, I’d find something else. Do there seem to be a lot of legitimate issues?
A large number of questions filed in the issue tracker can be a sign of poor documentation, or of the lack of another place for users to ask questions.
Assuming there are bugs (hey, we’ve all written buggy software), do they seem like they are getting fixed? Do the maintainers respond to bugs, or just close them? Even worse, do bugs just stay open with no triage or comment? Imagine integrating this library and finding something critical. How do you feel it would be addressed?
Would you risk the success of your project on the way bugs in your potential new dependency are fixed?
Community Attitude
There is absolutely no reason to use software from a generally negative group of people. Do issues get closed with demeaning or condescending comments? Pass. I’ll stress that this is really rare; most communities are perfectly fine. Some are even great, with friendly people willing to help (Kotlin, for example).
If you run into one of those rare bad communities, find something else. Life is too short.
Popularity
Despite the trend in our industry, I don’t measure the usefulness of a project by the number of stars it has on GitHub (Gasp! Heretic!). However, I do try to get a general sense of what people think of any library I plan on using. On the one hand, I don’t want to always pick the most popular solution just because it’s popular. On the other hand, I generally don’t want to be the only user of something.
I realize that an unloved project might grow into something the community embraces, but since I work in payments (read: moving other people’s money around), I’m generally not willing to take that risk. The last thing I want is to pick a dependency and rely on it for something critical, only to have the project die off while I’m stuck maintaining the code around it for ten more years (yes, this has happened).
Conclusion
I suppose the overarching theme is that larger projects should have more of the good qualities and fewer of the bad qualities, while smaller projects can probably be cut some slack.
TL;DR:
- Try as hard as possible not to add a new dependency at all.
- Be sure you understand what license terms you are agreeing to.
- If it can’t be explained in simple terms, it probably isn’t very simple.
- If there aren’t tests, it’s not worth your time.
- It shouldn’t do the same things your existing dependencies already do.
- You aren’t going to need all of the extra bells and whistles; be careful about big libraries.
- Documentation and examples should exist and be high quality, especially on larger projects.
- Code quality is important: strive for code that is easy to understand and has few static analysis warnings.
- A track record of security and prompt fixes for security issues is a must.
- Large projects should have regular releases.
- There should be a history of quick triage and timely fixes for bugs and issues.
- Nobody deserves negativity in their life.
- Popularity isn’t everything, but it’s something.
- When in doubt, see point one.