#BUGS #DEBUGGING #PROBLEM-SOLVING #PROGRAMMING #REPROGRAMMING DEBUGGING STAGES #SOFTWARE DEVELOPMENT #TOOLS

A programming classic

There’s a classic programmer joke - the stages of debugging:

  1. That can’t happen
  2. That doesn’t happen on my machine
  3. That shouldn’t happen
  4. Why does that happen?
  5. Oh, I see…
  6. How did that ever work?

It’s funny because it’s true.

But why do we laugh at this? It’s a pretty terrible state of affairs.

There’s a lot to unpack in this joke.

“That can’t happen”

First off, why is our immediate reaction to deny the very existence of the bug? It’s unlikely that someone will have gone to the effort of cooking up an elaborate lie to waste our time looking for a non-existent bug.

Bugs are a violation of expectation: someone expected the system to behave in a certain way and it didn’t.

From a developer’s point of view, our expectations have also been violated. We told the computer to do one thing, and it has decided to do something completely different.

This leads to the classic - “there must be a bug in the compiler” or “must be user error” and the equally popular blame the tools: “it’s because we’re using XYZ language or framework - everyone knows it’s buggy/broken”.

Developers take a lot of pride in their work - we’re generally compensated well because we are considered to be experts in our field - suddenly we’re exposed as being just as fallible as the next person.

Obviously, “That can’t happen” is a foolish response. A computer just does what it is told to do. It is not an evil, mischievous imp that is deliberately trying to sabotage our work.

We need to change our immediate response to one of acceptance - there’s a bug, no point in pretending it doesn’t exist.

“That doesn’t happen on my machine”

What kind of a developer just chucks code over the wall without testing it? Of course it works on my machine!

Well, what can we say about this one? We could just lump this in with the denial of the bug existence, but it’s worth breaking it out into its own discussion.

In complex systems this is a real possibility: code that works in isolation may not work when deployed. Interactions between different parts of a system can cause the behaviour to change in unexpected ways.

However, in a well-architected system this should be rare, and if it’s really happening then it’s a sign that something is wrong.

There’s no point saying “it works on my machine” until you’ve actually gone and tried to reproduce the bug on your machine.

Once you can prove categorically that it works on your machine then you can add that to the evidence pile for debugging the problem.

“That shouldn’t happen”

“Umm, yes, that’s why I’ve reported it as a bug” would be the facetious reply.

But this is another facet of the “I told the computer to do this, but instead it’s doing that”. It’s a violation of expectations on the developer’s side of things.

This is usually the phase of acceptance. We’ve now reached the point where we agree that something is wrong: we’ve seen the bug with our own eyes, it’s something wrong with what we’ve done, and there are no more excuses to hide behind.

“Why does that happen?”

This is where things start to get interesting. This is the fun part of bug bashing.

Why is this bug happening? What’s our hypothesis for what we are seeing and how do we test it?

“Oh, I see”

The lightbulb moment of insight: through investigation you’ve developed a hypothesis of why the bug is happening and you have an idea of how to fix it.

The problem now becomes impossible not to see. It’s obvious. How did this code ever ship? Which moves us nicely onto the next stage.

“How did that ever work?”

Hindsight is a wonderful thing.

Now that you know how to create the bug, and you know how the code is wrong, you’re wondering which idiot wrote it (spoiler alert - git blame will point the finger at you).

The code could never have worked properly. You’ll start to wonder how many other bits of the codebase are complete nonsense.

Adjusting our approach

Let’s turn the programming classic on its head and rewrite the stages of debugging:

  1. This is happening
  2. Research
  3. Create a hypothesis
  4. Test hypothesis
  5. Fix the problem
  6. How do we stop this happening again?

“This is happening”

No point denying it - there’s a bug, I’m glad you found it.

Research

We need to gather information on the bug:

  • how do we reproduce it?
  • what test data do we need?
  • how much of the system do we need to run to recreate it?
  • what’s the minimum I need to recreate to debug it?
  • which part of the codebase is it happening in?
  • which bit of code is the likely problem?
  • do we have any relevant logs from when the problem occurred?

The more information we can gather the better.

Create a hypothesis

Our research should have pointed us at the potential problem, and we should have developed enough knowledge to form a working hypothesis about what is causing the bug. We should hopefully be looking at the bit of code that is wrong and have an idea of how to fix it.

Test hypothesis

How are you going to test that your fix works?

Before jumping in and changing code, can you definitely recreate the bug? Does it happen consistently in your test environment?

Can you write a unit test to recreate the bug?

When you apply your fix does the test now pass? When you run through the steps to recreate does it now consistently work?
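As a minimal sketch of what such a test might look like in Python, assume the bug report said that the 21st item in a list shown 10 per page could never be reached. The page_count function, its buggy predecessor and the numbers are all hypothetical, invented purely for illustration:

```python
def page_count_buggy(total_items: int, page_size: int) -> int:
    """Hypothetical original code: floor division drops a partial last page."""
    return total_items // page_size


def page_count(total_items: int, page_size: int) -> int:
    """Hypothetical fix: round up so a partial last page is counted."""
    return (total_items + page_size - 1) // page_size


def test_partial_last_page_is_counted():
    # Recreates the (invented) bug report: 21 items at 10 per page.
    # Pointed at page_count_buggy this assertion fails, because it returns 2.
    assert page_count(21, 10) == 3


def test_existing_behaviour_is_unchanged():
    # Guard against the fix breaking cases that already worked.
    assert page_count(20, 10) == 2
    assert page_count(0, 10) == 0
```

A test runner such as pytest would collect the test_ functions. The first one fails against the buggy version and passes against the fix, so it can stay in the suite as a regression test.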

Fix the problem

If we’re lucky, the previous step proves that our thinking was correct: we’ve changed the code and everything works.

Clean up any debugging code, go through code reviews and deploy - everyone is happy!

Don’t forget to check that you’ve not broken anything else…

The bug is fixed when the person who raised the bug in the first place is happy.

How do we stop this happening again?

This is the real value in finding and fixing bugs. The bug should never have happened in the first place.

  • Are we missing unit tests for this part of the codebase?
  • Are we missing a whole class of unit tests across the codebase, making this kind of bug more likely?
  • Are we missing integration tests?
  • Do we have automated tests to catch these bugs?
  • Is there something wrong in our process that allowed this bug to slip through the net?

Types of Bugs

What kind of bugs do we encounter? And how do we fix them?

Easy(?) Bugs

There’s a set of bugs that can be classed as “easy(?)”. There’s a question mark next to the “easy” because a bug being obvious or repeatable does not necessarily mean that finding the underlying cause and fixing it is easy.

  • UI Bugs
  • Bugs of Omission or Misinterpretation
  • Repeatable bugs

UI Bugs

It functions but it doesn’t look right.

UI bugs tend to revolve around the styling and positioning of elements.

Well organised companies will have wireframes and high-fidelity mockups that you should be working from. They should have style guides and component libraries that tell you how things should look and behave.

Sometimes we are working in the dark. There may not have been time or resources to design wireframes and mockups, and you may be working from some scribbles on a napkin - make sure you fit with the rest of the application. Don’t break people’s expectations!

Another source of these bugs is different device formats - maybe it’s fine on your large desktop monitor, but on small laptops or mobile devices the UI you’ve created just doesn’t work.

There’s also a class of bugs around accessibility issues - these often get overlooked, and unless attention is paid to this area it’s easy to forget about it, only to have it flagged by a diligent QA person.

Solving these bugs should be straightforward:

  • What is it supposed to look like?
  • Make it look right
  • Test on the correct target devices and sizes

There may be some fundamental process issues to be addressed here - someone knows what it should look like as they have raised the bug.

Why didn’t you know what it was supposed to look like when you built it?

Bugs of Omission or Misinterpretation

You thought you’d built the right thing.

You didn’t…

In theory, this should be an easy one to fix - find out what was supposed to be built, build it…

There are some questions to be asked around what went wrong in this situation - was the task not specified in enough detail, is there a communication gap between the product managers and the dev team that leads to the wrong thing being built?

Or did you just fundamentally misunderstand what was being asked of you?

Sometimes it’s simply a case of trying to hit a moving target. By the time you’ve finished building something everyone’s understanding of what should be built has changed. Expectations have changed and someone forgot to tell you…

Something is broken in your process - it’s important to work out what it is if this class of bug keeps occurring.

Repeatable bugs

Every time I do these steps, this thing happens, it’s not what I expect to happen, it should do this instead.

This is a nice class of bugs - repeatable with a clear set of steps to recreate the problem.

Should be an easy fix:

  • Look at the application logs whilst recreating the bug
  • Inspect any relevant crash logs and stack traces
  • Run through the steps with a debugger attached and have it break on exceptions
  • Simply walk through the code and sanity check it - does it make sense?
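To make “break on exceptions” concrete, here is one possible sketch in Python using the standard library’s pdb.post_mortem. The break_on_exception wrapper and the parse_quantity function are hypothetical names made up for this example; most IDE debuggers offer an equivalent setting without any code changes.

```python
import functools
import pdb
import sys


def break_on_exception(func, debugger=pdb.post_mortem):
    """Wrap func so an unhandled exception opens a post-mortem debugger.

    `debugger` is any callable that accepts a traceback. It defaults to
    pdb.post_mortem and is only a parameter so it can be swapped out.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Hand the traceback to the debugger at the point of failure,
            # then re-raise so the error is not silently swallowed.
            debugger(sys.exc_info()[2])
            raise
    return wrapper


def parse_quantity(text):
    """Hypothetical code under suspicion."""
    return int(text)


# Calling debug_parse_quantity("three") would drop into pdb at the failing int() call.
debug_parse_quantity = break_on_exception(parse_quantity)
```

Running a whole script with python -m pdb gives similar post-mortem behaviour without touching the code.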

However, for new developers or people unfamiliar with the codebase these can also be extremely frustrating bugs.

I can happily recreate the bug, it breaks on my machine, but I have no idea where to even start looking in the codebase to fix it.

Senior developer strolls over, takes one look at the bug and immediately brings up the line of code that is the problem.

Someone who knows the codebase intimately will probably know where most data in the system is coming from and will appear to have some magical power for identifying where a bug is.

This is why fixing a few simple bugs can be such a good onboarding process.

What can we do if we don’t know the codebase?

We’ll need to start employing our powers of detection and deduction.

Look at the architecture of the system: how does data flow from one place to another? What are good places to inspect the current state of the system? Where does data get transformed? These are all good places to start tracking down bugs.

We can work backwards from the UI: search the codebase for a string that appears near the problem area and, hopefully, the bit of code showing the value will be nearby. Then work backwards from where the value is displayed to where it is generated.

Keep working backwards sanity checking the code as you go. You should eventually work your way to the bug or at least a place where you can start debugging the code.
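The “search for a string near the problem area” step is usually a job for your editor’s search or grep, but a small Python sketch shows the idea. The find_text helper, the file suffixes and the “Total due” label are all assumptions made for illustration:

```python
from pathlib import Path


def find_text(root, needle, suffixes=(".py", ".html", ".js")):
    """Yield (path, line_number, line) for every source line containing needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


def report(root, needle):
    """Print each match so we know where to start reading the code."""
    for path, number, line in find_text(root, needle):
        print(f"{path}:{number}: {line}")
```

Calling report(".", "Total due") lists every place the label appears. From there we can follow the displayed value back to where it is calculated.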

Hard Bugs

Now we start getting onto the more difficult class of bugs:

  • User/Data Specific
  • Heisenbugs/Rare/Weird Bugs

User/Data Specific

One user or one subset of users has a problem; everyone else is OK. You can’t recreate it locally and you can’t recreate it with any of your test user accounts.

This is a really nasty kind of bug. Sometimes it can help to have the user demonstrate exactly what they are doing - there may be some subtleties about what they are doing that aren’t captured in the steps to reproduce.

On some systems you may be able to get permission to log in as the user.

What is specific about their environment? You need to try and recreate the exact user environment to reproduce the issue.

Remote logging can be a life saver in these situations. Pull the logs for when the bug happens: is there anything out of the ordinary?

If you have good logging in place then you should be able to see deviations from the happy path.
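Spotting a deviation from the happy path can even be partly automated. The sketch below assumes each step of a request writes a recognisable message to the log; the step names and the first_missing_step helper are hypothetical and would need to match whatever your own system logs:

```python
# Hypothetical happy-path messages, in the order a healthy request logs them.
HAPPY_PATH = [
    "request received",
    "user loaded",
    "order validated",
    "payment taken",
    "response sent",
]


def first_missing_step(log_lines, expected_steps=HAPPY_PATH):
    """Return the first expected step that never appears, in order, in the logs.

    Returns None when every step was logged in the expected order.
    """
    position = 0
    for step in expected_steps:
        for index in range(position, len(log_lines)):
            if step in log_lines[index]:
                position = index + 1  # keep searching after this match
                break
        else:
            return step  # the request never got this far
    return None
```

Run against the affected user’s log lines, this tells us the last step that succeeded and the first one that didn’t, which narrows down where to look.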

You need to become a detective. What is special about this user that makes them different from the other users?

Bring in other people to help create and test different hypotheses about what could be causing the issue.

Heisenbugs/Rare/Weird Bugs

  • When you try and debug it, it stops happening
  • Only happens in release mode
  • Time/Date specific
  • Network state specific
  • Race conditions/threading issues
  • Memory corruption/runaway pointers

You’re now into deep detective work. You may need to run soak tests for days on end to recreate the problem.
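A soak test for a suspected race condition can be as simple as hammering the suspect code from several threads, over and over, and counting how often the result is wrong. This is only a sketch: the Counter class is a hypothetical stand-in for whatever shared state you suspect. Whether the unsafe version actually loses updates on a given run depends on the interpreter and on timing, which is rather the point.

```python
import threading


class Counter:
    """Hypothetical shared state that several threads update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        current = self.value        # read
        self.value = current + 1    # write: another thread may have written in between

    def increment_safe(self):
        with self._lock:            # the read-modify-write is now one critical section
            self.value += 1


def lost_updates(method_name, threads=8, per_thread=10_000):
    """Hammer one Counter from several threads; return how many updates were lost."""
    counter = Counter()
    increment = getattr(counter, method_name)

    def worker():
        for _ in range(per_thread):
            increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return threads * per_thread - counter.value


def soak(method_name, runs=20, **kwargs):
    """Repeat the stress run; return the number of runs that lost at least one update."""
    return sum(1 for _ in range(runs) if lost_updates(method_name, **kwargs) > 0)
```

Comparing soak("increment_unsafe") with soak("increment_safe") gives some evidence for or against the hypothesis that a missing lock is the cause.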

You’ll need to add detailed logging - but you might find adding the logging moves the problem, or even worse, the problem disappears when you turn on logging!

There’s only one way to track these down and that’s to apply brain power to the problem. You’ll need to keep generating and eliminating hypotheses until you hit upon the correct one.
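For a time/date specific bug, one way to test a hypothesis is to make the date an input rather than reading the system clock, so the suspect day can be recreated on demand. As a sketch, assume the hypothesis is “renewals break on 29 February”. The renewal_date functions are hypothetical, invented for this example:

```python
from datetime import date


def renewal_date_buggy(today: date) -> date:
    """Hypothetical original code: same day next year.

    Raises ValueError when today is 29 February, because that date
    does not exist in the following year.
    """
    return today.replace(year=today.year + 1)


def renewal_date(today: date) -> date:
    """Hypothetical fix: fall back to 28 February when needed."""
    try:
        return today.replace(year=today.year + 1)
    except ValueError:
        # Only 29 February can fail here, so use the 28th instead.
        return today.replace(year=today.year + 1, day=28)


def reproduces_bug(func, today: date) -> bool:
    """Return True if func fails for the given date."""
    try:
        func(today)
    except ValueError:
        return True
    return False
```

Passing date(2024, 2, 29) to reproduces_bug confirms the hypothesis against the buggy function and shows the fix holds, with no need to wait four years for the next occurrence.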

The tools of the trade

What tools do we have at our disposal?

Logging

I cannot emphasise enough how important good logging is to debugging. Good logging should show the happy path through the code and any errors that can occur. The beauty of this is that when something is going wrong you should be able to see when the code deviates from the happy path. Why does it suddenly stop half way through processing this request?
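As a rough illustration of logging the happy path, here is a sketch using Python’s standard logging module. The process_order function, the “orders” logger name and the order fields are hypothetical; the point is only that each step leaves a trace, so a missing line tells us where things stopped.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def process_order(order):
    """Hypothetical request handler that logs each step of the happy path."""
    logger.info("order %s: received", order["id"])

    if order["quantity"] <= 0:
        # Errors are logged with enough detail to explain the deviation.
        logger.error("order %s: rejected, quantity=%r", order["id"], order["quantity"])
        return False
    logger.info("order %s: validated", order["id"])

    total = order["quantity"] * order["unit_price"]
    logger.info("order %s: total calculated as %s", order["id"], total)

    logger.info("order %s: completed", order["id"])
    return True
```

With logging.basicConfig(level=logging.INFO) in place, a healthy order logs received, validated, total calculated and completed. An order that stops after “received” with an error line shows exactly where it left the happy path.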

The debugger

A lot of people just don’t seem to know how to use the debugger for their language!

This is one of the nuclear weapons in our arsenal - learn how to use it.

If anyone tells you that it’s not possible to use a debugger for your particular language - don’t believe them! Check for yourself and learn how to use the tools at your disposal.

Our brains

Computers are only doing what we tell them to do. Debugging is simply the art of tracking down where the instructions we’ve given them are incorrect.

We can reason about bugs, we can hypothesise about why the system is behaving in a certain way, and then test the hypothesis.

You have everything you need at your disposal. Take a step back from the coal face and the solution will generally present itself.


Chris Greening



atomic14
