Sopping Wet — Today’s Software Ecosystem Isn’t DRY

TL;DR:

  • Everyone seems to understand that DRY is good at the program level, but few apply it at the community level.
  • Examples of useless duplication include many programming languages, libraries, package managers, data stores, and tools.
  • This community-level duplication reduces interoperability and slows productivity across the board.

Section 1: Some examples

1. Why is there more than one Unix/Linux package manager? Do we really need a separate package manager for each programming language, with the same commands renamed? And why does each distro need its own?

2. Nobody seems to admit it, but PHP, Ruby, Python, and JavaScript are essentially the same language, with a little sugar added here or there and different libraries. More formally, I’d say that for 99% of lines of code written, there’s a 1:1, per-line translation between each of these languages. I get differences like curly braces vs. indenting, but it does strike me as wet that each language has rebuilt so much core functionality (date parsing, database connectivity, HTML parsing, regex, etc.). Wrapping libcurl, for example, is a great way to stay DRY.

This leads to a scenario where “learning a language” is more about learning the libraries than anything else (e.g. “How do timezones work again in PHP?”).
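As a rough illustration of the per-line claim, here is the same pair of operations in all four languages. The runnable lines are Python; the other three appear as comments (syntax from memory, shown for comparison only):

```python
# Splitting a string is nearly identical across the four languages.
parts = "2024-01-15".split("-")
# Ruby:       parts = "2024-01-15".split("-")
# JavaScript: const parts = "2024-01-15".split("-");
# PHP:        $parts = explode("-", "2024-01-15");

# Joining it back is just as interchangeable.
joined = "-".join(parts)
# Ruby:       joined = parts.join("-")
# JavaScript: const joined = parts.join("-");
# PHP:        $joined = implode("-", $parts);
```

The differences are spelling, not substance, which is why a mechanical per-line translation covers so much ordinary code.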

3. Did MongoDB really need to exist as a standalone application? What if MongoDB had simply been a storage engine? The concept of a datastore that adapts its schema on the fly and drops relations for speed is fine, but does that justify creating an entirely new data-storage technology? It means millions of engineers learning a new query syntax for a potentially temporary technology. The same goes for the security policy and all the DB drivers. There’s no reason the tools for visibility (e.g. Sequel Pro) and for backing up the database need to be reinvented. Plus, if it were just a storage engine, migrating tables to InnoDB would be easier.

The same point holds for Cassandra (which is basically MySQL with sharding and more sophisticated replication built in), Elasticsearch, and even Kafka (basically just MySQL’s write-ahead log without columns). For example, a Kafka topic could be seen as a table with two columns: offset and value. Remember that storage engines can process different variations on SQL to handle any special functionality or performance characteristics as needed (I recognize that what I’m describing is easier said than done, but I recommend it nonetheless).
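To make the topic-as-table idea concrete, here is a minimal sketch using SQLite (table and column names are mine; a real implementation would also need retention, partitioning, and replication):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A "topic" is just an append-only table: the auto-incrementing key is the offset.
conn.execute(
    "CREATE TABLE topic (msg_offset INTEGER PRIMARY KEY AUTOINCREMENT, value BLOB)"
)

# "Produce" is an INSERT; the database assigns the next offset.
for msg in [b"first", b"second", b"third"]:
    conn.execute("INSERT INTO topic (value) VALUES (?)", (msg,))

# "Consume" is a SELECT from a given offset onward, like a consumer
# reading a partition from its last committed position.
def consume(conn, after_offset):
    return conn.execute(
        "SELECT msg_offset, value FROM topic "
        "WHERE msg_offset > ? ORDER BY msg_offset",
        (after_offset,),
    ).fetchall()
```

With this framing, every existing SQL tool (backups, replication, ad-hoc queries) works on the “topic” for free.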

4. Overly specialized technologies should not exist (unless built directly around a general technology). Have you ever seen a fancy dinner set where, for “convenience,” guests are offered 5 forks and spoons, each meant for a slightly different task? That’s how I feel about overly specialized technologies. For example, people seem to love job queues. What if all job queues were implemented on top of a SQL backend, so that engineers get the usual benefits:

  1. engineers know how to diagnose the system if it fails because it’s a known system (e.g. performance issues, permissions)
  2. engineers can always query the system to see what’s happening because it’s using a standardized query language
  3. engineers can modify the system if necessary because it provides visibility into its workings
  4. engineers can use existing backup, replication, monitoring, and other technologies to store/distribute the queue (giving interoperability)
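A minimal sketch of such a SQL-backed queue, using SQLite for illustration (the schema and function names are mine, not from any real product):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending'
    )
""")

def enqueue(conn, payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def dequeue(conn):
    # Claim the oldest pending job. In a multi-worker setup this
    # SELECT + UPDATE pair would run in one transaction (or use
    # SELECT ... FOR UPDATE on a server database).
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row

enqueue(conn, "send_welcome_email")
enqueue(conn, "rebuild_search_index")
job = dequeue(conn)
```

Benefit 2 above falls out for free: any engineer can run `SELECT * FROM jobs` to see exactly what the queue is doing, and standard backup and replication tooling applies unchanged.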

Section 2: What’s the result of all this?

  • Every time a brand-new hype technology is introduced, senior engineers are set back years relative to junior ones (bad for senior engineers, good for junior ones).
  • The ecosystem is set back as a whole (all the tools and libraries that interact with the old technology must be rebuilt for the new one).
  • The company is placed in an uncomfortable position because it now has only junior engineers in the given technology. When I was junior, I worked at a startup that accidentally lost most of its customers’ phone numbers because its PHP driver for Mongo would convert numeric strings to numbers; phone numbers would overflow the default integer, producing no fatal errors, just negative phone numbers.
  • The company runs the risk of being saddled with a technology that gets dropped (e.g. CouchDB, Backbone) and will either require a rewrite back to a standard technology or be perceived as behind the times.
  • Slow-learning or part-time engineers must keep pace with the changing landscape or face irrelevance. Those who can’t learn several technologies a year will stumble.
  • Fast-paced engineers will lose half of their learning capacity on trivialities and gotchas of each technology’s idiosyncrasies (e.g. why can’t Apache configs and nginx configs bear any resemblance to each other?). Once these technologies are phased out (e.g. now it’s mostly cloud ELBs), all of that memorization is for naught. It’s a treadmill effect: engineers have to sprint (keep learning new technologies) to move forward at all, walk just to stay in place, and those who can’t keep pace with the treadmill fall behind.

Quick aside: I think the move to the cloud is probably one of the greatest dev benefits I’ve seen in my life. But it illustrates the same point: if different cloud providers don’t standardize, the whole development community is slowed down.
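The phone-number anecdote above comes down to silent 32-bit overflow, and the arithmetic is easy to reproduce. Here ctypes stands in for a driver’s fixed-width integer (the actual driver behavior is recalled from memory, not verified):

```python
import ctypes

# A typical ten-digit US phone number arrives from user input as a string.
phone = "4155551234"

# A driver that eagerly casts numeric strings into a signed 32-bit integer
# wraps silently instead of raising: 4,155,551,234 exceeds 2**31 - 1.
as_int32 = ctypes.c_int32(int(phone)).value
```

The result is a negative “phone number” with no fatal error anywhere in the pipeline, which is exactly why the data loss went unnoticed.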

Section 3: The exceptions

There are a few exceptions I can think of where a complete rebuild from scratch was an improvement. One is Git. In a few months, one of the most prominent software geniuses of our era invented a source-control system so superior to everything else that, despite its intimidating interface, it was adopted almost universally within a few years.

The times a rebuild is justified seem to be when many of these criteria apply:

  • You’re a known and well-respected name that people trust enough that the community might standardize on what you build (e.g. Linus Torvalds, Google)
  • The existing systems are all awful in fundamental ways, not just in easily patchable ways
  • You’ve got the ability, the time [and we’re talking at least a decade of support], and the money to dedicate yourself to the project (Git, AWS, Gmail, jQuery in 2006)
  • You can make your system backward compatible (e.g. C++ allows C, C allows assembler, Scala allows Java, and many game consoles and storage devices can read previous-generation media), and thus can reuse existing knowledge, libraries, and tools
  • You’re so smart, and so far from average, that your system won’t have the myriad unanticipated flaws that plague most of the software systems you want to replace. For example, Angular, Backbone, and NoSQL may all, in hindsight, not have been worth learning. Of the current high-buzz languages as of this writing (Go, Clojure, Haskell, Ruby), it’s open to speculation which will stand the test of time.
  • Your system is already built into, or easily integrated with, existing systems (e.g. JSON being interpretable in all browsers automatically, or moving your service to the web, where it works cross-platform and is accessible without installation)

Section 4: What can one do?

  1. Learn the technologies that have stood the test of time: the Linux CLI, C++/Java, JavaScript, SQL
  2. Wait years before adopting a technology professionally for a major use case – let other companies be the guinea pigs
  3. Be judicious in your use of new technologies. For whatever reason, it’s culturally “cool” to know about the “next big thing,” but it’s better to be late and right than early and wrong.