Friday, February 2, 2007

Mastering Linux debugging techniques

This article presents four scenarios for debugging Linux programs. Scenario 1 uses two sample programs with memory allocation problems, which we debug using the MEMWATCH and Yet Another Malloc Debugger (YAMD) tools. Scenario 2 uses the Linux strace utility, which traces system calls and signals, to identify where a program is failing. Scenario 3 uses the Oops functionality of the Linux kernel to solve a segmentation fault problem, and then shows you how to set up the kernel source-level debugger (kgdb) to solve the same problem using the GNU debugger (gdb); kgdb lets a remote gdb debug the Linux kernel over a serial connection. Scenario 4 uses a magic key sequence available on Linux to display information about the component that is causing a hang.

General debugging strategies

When your program contains a bug, it is likely that somewhere in the code, a condition that you believe to be true is actually false. Finding your bug is a process of confirming what you believe is true until you find something that is false.

The following are examples of the types of things you may believe to be true:

  • At a certain point in the source code, a variable has a certain value.
  • At a given point, a structure has been set up correctly.
  • At a given if-then-else statement, the if part is the path that was executed.
  • When the subroutine is called, the routine receives its parameters correctly.

Finding the bug involves confirming all of these things. If you believe that a certain variable should have a specific value when a subroutine is called, check it. If you believe that an if construct is executed, check it. Usually you will confirm your assumptions, but eventually you will find a case where your belief is wrong. As a result, you will know the location of the bug.
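One way to check such beliefs is to state them directly in the code with assert(). The following is a minimal sketch, not a definitive implementation; the buffer structure and the buffer_is_valid() and buffer_push() functions are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, used only for illustration. */
struct buffer {
    char  *data;  /* storage owned by the caller */
    size_t len;   /* number of bytes currently stored */
    size_t cap;   /* total capacity of data */
};

/* Returns 1 if the structure's invariants hold, 0 otherwise. */
int buffer_is_valid(const struct buffer *b)
{
    return b != NULL && b->data != NULL && b->len <= b->cap;
}

/* Appends one byte; returns 0 on success, -1 if the buffer is full. */
int buffer_push(struct buffer *b, char c)
{
    /* Belief: the routine receives its parameters correctly. */
    assert(buffer_is_valid(b));

    /* Belief: this branch is taken only when the buffer is full. */
    if (b->len == b->cap)
        return -1;

    b->data[b->len++] = c;

    /* Belief: the structure is still set up correctly afterwards. */
    assert(buffer_is_valid(b));
    return 0;
}
```

If one of these beliefs is false, the program aborts at the failing assert() and reports the file and line, which points at the first place the assumption broke. Compiling with -DNDEBUG removes the checks.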

Debugging is something that you cannot avoid. There are many ways to go about debugging, such as printing out messages to the screen, using a debugger, or just thinking about the program execution and making an educated guess about the problem.
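As a rough sketch of the "printing out messages" approach, a small macro can tag each message with the file, line, and function that produced it. The DBG macro and the sum_array() function below are hypothetical names chosen for this example, and the DEBUG switch is an assumed convention rather than anything the article prescribes.

```c
#include <stdio.h>

/* Hypothetical macro: prints file, line, and function before the message.
 * Compile with -DDEBUG to enable it; otherwise it expands to nothing. */
#ifdef DEBUG
#define DBG(...) \
    do { \
        fprintf(stderr, "%s:%d %s(): ", __FILE__, __LINE__, __func__); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)
#else
#define DBG(...) do { } while (0)
#endif

/* Sums count integers, tracing each step when DEBUG is defined. */
int sum_array(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
        DBG("i=%d value=%d total=%d", i, values[i], total);
    }
    return total;
}
```

Writing to stderr keeps the trace separate from normal program output, and because stderr is unbuffered by default, the last message is more likely to appear before a crash.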

Before you can fix a bug, you must locate its source. For example, with a segmentation fault, you need to know on which line of code the fault occurred. Once you find that line, determine the values of the variables in the function, how the function was called, and specifically why the error occurred. A debugger makes finding all of this information simple. If a debugger is not available, there are other tools to use. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a debugger built in.)

This article looks at a class of problems that are difficult to find by visually inspecting code and that may occur only under rare circumstances. Often, a memory error occurs only under a particular combination of circumstances, and sometimes you discover memory bugs only after you deploy your program.

Full Article @ IBM