Let There Be Bad Code: A New Approach to Deploying Code in an Age of Machine Learning

The fields of data science, machine learning and artificial intelligence are starting to have a material impact on technology and business. We may still be in the “Wild West” phase, but no longer are these topics esoteric and reserved for college professors.

If you’re a developer or architect who I’ve argued with about code quality, give me a second to explain. To understand why I believe that bad code may be acceptable, you should recognize the shift that is occurring in programming. There are a few undeniable trends to examine:

  • Systems have grown steadily more complicated in order to tackle tougher and tougher problems.
  • The skills required to code an entire system have multiplied. It has been a long time since anyone wrote an enterprise system in a single language like COBOL, with no database and no 3rd party libraries. Enterprise systems (and many other systems) require multiple languages, 3rd party tools and 3rd party libraries.
  • Users will no longer accept a system that just does what it’s supposed to. We expect the system to anticipate our needs. We no longer tolerate typing in a complete address; we expect the system to complete it once we give it just enough information. As a result, there is a drive, and a need, for smarter and more complex systems.

Until now, several different approaches have been deployed to battle this complexity:

  • Developers have better tools to manage the complexity, such as Integrated Development Environments (IDEs) and very mature products like Visual Studio.

  • We’re moving further away from machine code. (You’d be hard pressed to find someone who still codes in Assembler.)

  • The introduction of abstraction such as Object-Oriented Programming (OOP).

  • The introduction of abstraction through layers such as network stacks, database connectors, etc.

  • There is a separation of concerns through approaches such as microservices.

  • There are newer trends toward model-based coding, such as 4GLs (fourth-generation languages).

  • There are better requirements through improved processes and tools.

  • There is a greater use of automated testing frameworks.

The above approaches don’t change the fundamental fact that most code is really some form of “If this thing is true, then do this other thing.” If the user presses the red button on the screen, turn on the alarm. Better tools, abstraction and microservices are simply techniques used to build bigger and more complicated systems.

But the shift from an “if-this-then-that” programming style to a probability-based programming style is fundamentally different. Machine learning (ML), artificial intelligence (AI) and data science are all about probabilities. Processing millions of data points with advanced math yields a probability, not 100% certainty. Examples are all around us in everyday life: When your car has an issue, you take it to the mechanic. They may say something like “sounds like there is a loose belt, but I won’t know for sure until I take a look inside and try a couple of things.” If a mechanic looks at 1,000 cars a year, they might be right 93% of the time. That’s good enough for them to run a good business and fix loose belts all day long.
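
To make the contrast concrete, here is a minimal sketch in Python. The function names and the 0.93 score are invented for illustration; in a real system the score would come from a trained model rather than a hard-coded constant.

```python
def rule_based_diagnosis(belt_noise: bool) -> str:
    """Classic if-this-then-that: the answer is asserted with certainty."""
    if belt_noise:
        return "loose belt"
    return "no fault found"


def probabilistic_diagnosis(features: dict):
    """Probability-based: the answer comes with a confidence, never 100%."""
    # A trained model would produce this score; the constant stands in for it.
    score = 0.93 if features.get("belt_noise") else 0.05
    label = "loose belt" if score >= 0.5 else "no fault found"
    return label, score


label, confidence = probabilistic_diagnosis({"belt_noise": True})
print(f"{label} (confidence: {confidence:.0%})")  # loose belt (confidence: 93%)
```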

I would even suggest that the very nature of this newly adopted programming style makes testing with any level of certainty impossible. If you write some new code or a formula that increases sales by 10%, does it really matter if the code is “technically” correct? Isn’t the jump in sales what matters?

Of course I can’t totally let go of writing good code. Here are some suggestions on how to approach quality in this new style:

  • Don’t dismiss the value of writing good code; just do it later and be selective. When building your data pipeline, make the code live and then look at the pieces that exhibit both slowness and high usage (see the first sketch after this list).

  • Create monitors that observe performance. Have the team use alerts to go back and tune and test that code. Let the bad code run and then tune the most important areas. Don’t waste a lot of time trying to pre-tune.  

  • Implement A/B testing. Seriously. A/B test everything. In some cases, you can test new algorithms against old algorithms. In other cases, you may need two slightly different variations of the same algorithm (see the second sketch after this list).

  • Many businesses have independent data science teams, which I am going on record as calling a mistake. If we have learned anything from Agile’s approach to software development, it is that cross-functional teams are the way to organize. In time (another prediction here), machine learning and artificial intelligence are going to merge back into the tech industry’s normal organizational charts. Data science experts are great; just make them part of a team that delivers value throughout the normal development lifecycle.

  • Don’t re-invent the wheel. I recently attended a local ML/Big Data user group and the presenter was so proud of himself for “inventing” a way to keep track of his algorithm changes. I almost lost it right in the middle of his presentation. “A” for enthusiasm, “F” for not using tools like Git or Bitbucket. Development teams around the world have been working on solutions for how to organize and track code for decades, so leverage their efforts.
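
The first two bullets boil down to “let it run, measure it, then tune what is both slow and heavily used.” Below is a minimal sketch of that idea; the decorator, the function names and the pipeline step are all hypothetical, and in practice the collected numbers would feed whatever alerting tool your team already uses.

```python
import functools
import time
from collections import defaultdict

# Cumulative call count and runtime per function name.
call_stats = defaultdict(lambda: {"calls": 0, "total_seconds": 0.0})


def monitored(func):
    """Record how often a pipeline step runs and how long it takes in total."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = call_stats[func.__name__]
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@monitored
def enrich_records(records):
    # Stand-in for a real pipeline step.
    return [dict(r, enriched=True) for r in records]


enrich_records([{"id": 1}, {"id": 2}])

# Tune only what is both slow and heavily used; leave the rest alone.
for name, stats in sorted(call_stats.items(),
                          key=lambda item: item[1]["total_seconds"],
                          reverse=True):
    print(name, stats["calls"], f"{stats['total_seconds']:.3f}s")
```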
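
And a hedged sketch of the A/B testing bullet: split traffic between the old and new algorithm and compare outcomes. The recommend_* functions, the 50/50 split and the simulated purchase rates are placeholders, not real results.

```python
import random


def recommend_old(user_id):
    return ["product-a", "product-b"]


def recommend_new(user_id):
    return ["product-b", "product-c"]


outcomes = {"old": [], "new": []}


def recommend(user_id):
    """Route each user to a variant; the same user always sees the same one."""
    variant = "new" if user_id % 2 else "old"
    result = (recommend_new if variant == "new" else recommend_old)(user_id)
    return variant, result


def record_purchase(variant, purchased):
    outcomes[variant].append(1 if purchased else 0)


# Simulated traffic; in production these events would come from real users.
for user_id in range(1_000):
    variant, _ = recommend(user_id)
    record_purchase(variant, random.random() < (0.12 if variant == "new" else 0.10))

for variant, results in outcomes.items():
    print(variant, f"conversion: {sum(results) / len(results):.1%}")
```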

Examine the early adopters of data-driven programming based on probabilities and you will see teams successfully navigate all of the above issues—and more. Old habits must be broken, and non-data scientists will need to be educated, because it will no longer make sense to ask whether something is 100% accurate.

With this new approach, we will all need to learn a little bit of math. We won’t need to do linear regressions by hand, but we should know when a logistic regression should be used over a linear one.
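
For instance, a small sketch (assuming scikit-learn is installed, with data invented purely for illustration): a continuous target such as a dollar amount calls for linear regression, while a yes/no target calls for logistic regression.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Continuous outcome (e.g. revenue in dollars) -> linear regression.
hours_of_training = [[1], [2], [3], [4], [5]]
revenue = [110, 205, 290, 410, 500]
linear = LinearRegression().fit(hours_of_training, revenue)
print(linear.predict([[6]]))  # an estimated dollar amount

# Binary outcome (did the customer buy: yes/no) -> logistic regression.
bought = [0, 0, 1, 1, 1]
logistic = LogisticRegression().fit(hours_of_training, bought)
print(logistic.predict_proba([[6]]))  # a probability, not a certainty
```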

The idea of programming as an art and a science is evolving just as much as the underlying languages. It’s exciting. Soon systems will have abilities that were thought impossible only a decade ago. Many of these systems will be powered by this new style of programming, so you might as well prepare for it now.