Python — PDB usage and reproducing program execution | Some Thoughts on AI, LLMs and Tech

Imagine you have a Python program and you want to inspect some values when an error occurs.

There are many possible ways to do that; I’d like to talk about a basic one, which involves a debugger. Just like GDB for C/C++, Python has PDB.

PDB is a command-line debugger, which can be attached to a process or started from within the process.

Just add the line import pdb; pdb.set_trace() and you will get a shell where you can interact with the process (since Python 3.7, the built-in breakpoint() does the same by default).

Needless to say, this works well only for CLI programs. Others, like servers, need other solutions (Rookout, the PyCharm remote debugger, etc.).

Let’s say we run a program that calls some_erroneous_function, and we want to know some value from inside this function:

main() -> foo() -> some_erroneous_function()

How can we know the value inside some_erroneous_function()? Simple: add the following line:

import pdb; pdb.set_trace()

Without the breakpoint we can’t see the value of a; with it, we can:

[Screenshots: “Can’t see value of a” and “Do manage to see value of a”]
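
As a rough illustration of what such a session looks like, the sketch below feeds pdb the commands p a and continue from a string instead of the keyboard, and collects what the debugger prints. The helper name inspect_value and the scripted input are assumptions made for this example; in real use you would type the same commands at the (Pdb) prompt.

```python
import io
import pdb
import random


def inspect_value():
    """Stop inside the function and ask pdb for the value of `a`."""
    a = random.randint(1, 100)
    transcript = io.StringIO()
    # Scripted stand-in for the keyboard: print `a`, then resume execution.
    commands = io.StringIO("p a\ncontinue\n")
    debugger = pdb.Pdb(stdin=commands, stdout=transcript,
                       nosigint=True, readrc=False)
    # Interactive equivalent of this line: import pdb; pdb.set_trace()
    debugger.set_trace()
    return a, transcript.getvalue()
```

Printing the second return value shows the (Pdb) prompt followed by the value of a, which is the part the interactive shell would display on the terminal.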

What happens when program A runs program B?

When we have main() -> bar() -> cli_app.py -> some_erroneous_function(),

the import pdb; pdb.set_trace() trick simply doesn’t work; we get a stuck process instead. This is because pdb opens in the child process, whose output the parent captures through a pipe, so the prompt never reaches the terminal. Meanwhile the parent process is waiting for the child process to finish, and we’re stuck.
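
To make the "stuck" behavior more concrete, here is a minimal sketch in which a hypothetical child stands in for the debugger: it writes a fake (Pdb) prompt and waits for a line of input. The child code and the function name run_child_captured are assumptions for illustration. Unlike bar() in the article, the sketch also pipes stdin, purely so that it can answer the prompt and terminate on its own.

```python
import subprocess
import sys

# Hypothetical stand-in for a child that opens a debugger: it writes a
# prompt, then blocks until a line arrives on stdin.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('(Pdb) ')\n"
    "sys.stdout.flush()\n"
    "line = sys.stdin.readline()\n"
    "print('child received:', line.strip())\n"
)


def run_child_captured(answer):
    """Run the child with its output going into a pipe, as bar() does."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # communicate() sends the answer, then blocks until the child exits.
    stdout, _ = proc.communicate(input=answer)
    return stdout.decode()
```

The prompt only shows up in the string returned to the parent after the child has exited. Nothing is displayed while the child is waiting, which is why the real pdb session looks like a hang.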

In this case, we should run the child process ourselves.

Which parts are required to run a child process ourselves?

Two parts are required; one is obvious, the other is often forgotten:

Program name + command-line arguments
Environment variables!!
(There’s a 3rd part, IPC messages, but it’s very hard to mimic such behavior…)

Let’s see how we capture this:

Modify the program to save its CLI arguments and env vars
Run it ourselves using the saved CLI arguments and env vars

Getting cmd + env vars

There are several methods. To get the env vars of a running process on Linux, you could use:

cat /proc/46/environ | tr '\0' '\n'

(replace 46 with the process ID)
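
The same file can also be read from Python. The following is a sketch under the assumption of a Linux system with /proc mounted; the function names parse_environ_blob and read_proc_environ are made up for this example. As far as I know, this file reflects the environment the process was started with, so later changes made inside the process may not appear in it.

```python
import os


def parse_environ_blob(blob):
    """Turn the NUL-separated KEY=VALUE records of an environ file into a dict."""
    env = {}
    for record in blob.split(b"\0"):
        if not record:
            continue  # the data ends with a trailing NUL, giving an empty record
        key, _, value = record.partition(b"=")
        env[os.fsdecode(key)] = os.fsdecode(value)
    return env


def read_proc_environ(pid):
    """Linux only: read /proc/<pid>/environ for the given process ID."""
    with open("/proc/%d/environ" % pid, "rb") as fin:
        return parse_environ_blob(fin.read())
```

For example, read_proc_environ(46) would return the variables of process 46 as a dictionary, provided you have permission to read that file.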

From within the Python process, we want to print the env vars in a “ready to go” format, e.g. with the export prefix:

import os

with open('/tmp/params.txt', 'w') as fout:
    # print all env vars
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k,v))
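
One possible variation, sketched here as an assumption rather than as the article's method, also records the command line and uses shlex.quote so that values containing spaces, quotes or $ survive being pasted into a shell. The function name dump_invocation and its parameters are hypothetical. Note that the output format differs slightly from the one above (no double quotes around keys and values).

```python
import shlex


def dump_invocation(path, argv, environ):
    """Write a shell snippet that restores the env vars and repeats the command."""
    with open(path, "w") as fout:
        for key, value in sorted(environ.items()):
            # Assumes keys are valid shell identifiers; only values are quoted.
            fout.write("export %s=%s\n" % (key, shlex.quote(value)))
        # Last line: the command itself, quoted the same way (Python 3.8+).
        fout.write(shlex.join(argv) + "\n")
```

Inside the child program it could be called as dump_invocation(some_path, [sys.executable] + sys.argv, dict(os.environ)), where some_path is any writable file of your choice.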

And then diff against the env vars of the current shell:

echo "creating bar before"
cat <<EOF > create_before.py
#!/usr/bin/python3
import os
with open('/tmp/params.before.txt', 'w') as fout:
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k,v))
EOF

python create_before.py

echo "print some stats"
wc -l /tmp/params.txt /tmp/params.before.txt

echo "get keys"
cat /tmp/params.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.keys.txt
cat /tmp/params.before.txt | awk -F '=' ' { print $1 } ' | sort > /tmp/params.before.keys.txt
wc -l /tmp/params.keys.txt /tmp/params.before.keys.txt
diff /tmp/params.keys.txt /tmp/params.before.keys.txt

And we get the newly added env var key, EXTRA:

YuvShell $ diff /tmp/params.keys.txt /tmp/params.before.keys.txt
1d0
< export "EXTRA"
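
The same key comparison can be done in Python instead of awk + diff. This is a small sketch that assumes the export "KEY"="VALUE" line format written above; the names parse_exports and added_keys are invented for the example.

```python
import re

# Matches lines such as: export "EXTRA"="True"
EXPORT_LINE = re.compile(r'^export "([^"]+)"="(.*)"$')


def parse_exports(text):
    """Parse lines of the form export "KEY"="VALUE" into a dict."""
    env = {}
    for line in text.splitlines():
        match = EXPORT_LINE.match(line)
        if match:  # lines that do not fit (e.g. multi-line values) are skipped
            env[match.group(1)] = match.group(2)
    return env


def added_keys(child_text, parent_text):
    """Keys present in the child's dump but missing from the parent's."""
    return sorted(set(parse_exports(child_text)) - set(parse_exports(parent_text)))
```

Feeding it the contents of /tmp/params.txt and /tmp/params.before.txt should list the keys that only the child process received.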

Questions

**Q: What is “YuvShell”?** A: It’s just me editing ~/.bashrc and changing the PS1 (prompt string) variable.

[Screenshot: ~/.bashrc edited to change the shell prefix]

**Q: What is the difference between cat some_file.txt | wc -l and wc -l some_file.txt?** A: With cat + wc we use a pipe to transfer the data from cat’s output to wc’s input; with wc alone, wc reads the file directly and no pipe is involved.

Let’s create a big file from urandom and compare the time output of both options:

cat /dev/urandom | base64 | head -c 1GB > /tmp/random_1GB_file.txt

time cat /tmp/random_1GB_file.txt | wc -l
time wc -l /tmp/random_1GB_file.txt
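
For readers who prefer to run the comparison from Python, the sketch below wires up both variants with subprocess and times them. It assumes cat and wc are available on the PATH; the function names are hypothetical, and the timings will of course depend on the machine and the file size.

```python
import subprocess
import time


def count_lines_direct(path):
    """wc -l FILE: wc opens the file itself."""
    out = subprocess.run(["wc", "-l", path], capture_output=True, check=True).stdout
    return int(out.split()[0])


def count_lines_via_pipe(path):
    """cat FILE | wc -l: the data travels through a pipe between two processes."""
    cat = subprocess.Popen(["cat", path], stdout=subprocess.PIPE)
    wc = subprocess.Popen(["wc", "-l"], stdin=cat.stdout, stdout=subprocess.PIPE)
    cat.stdout.close()  # the parent no longer needs its copy of the pipe end
    out, _ = wc.communicate()
    cat.wait()
    return int(out.split()[0])


def timed(func, path):
    """Return (line count, elapsed seconds) for one of the functions above."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start
```

Calling timed(count_lines_direct, path) and timed(count_lines_via_pipe, path) on the same file gives the two numbers to compare.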

Source Code

#!/usr/bin/python3
import argparse
import subprocess
import os
from err_module import some_erroneous_function

def foo():
    some_erroneous_function()

def bar():
    # Run cli_app.py as a child process with one extra env var
    my_env = os.environ.copy()
    my_env["EXTRA"] = 'True'
    cmd = "python cli_app.py --run"
    cmds = cmd.split(' ')

    # stderr is merged into stdout, so the returned stderr is always None
    stdout, stderr = subprocess.Popen(cmds, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=my_env).communicate()
    print("stdout [%s], stderr [%s]" % (stdout, stderr))

def do_main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', dest='foo', default=False, action='store_true')
    parser.add_argument('--bar', dest='bar', default=False, action='store_true')
    args = parser.parse_args()

    if args.foo:
        foo()
    if args.bar:
        bar()

if __name__ == '__main__':
    do_main()


#!/usr/bin/python3
import random

def some_erroneous_function():
    print("some_erroneous_function:: enter")
    a = random.randint(1,100)
    import pdb; pdb.set_trace()     # adding PDB to open debugger
    raise Exception("some error")
    print("value of a: %d" % (a))   # never reached: the exception above is raised first


#!/usr/bin/python3
import sys, os
from err_module import some_erroneous_function

# code to save program execution parameters
with open('/tmp/params.txt', 'w') as fout:
    # print all env vars
    for k, v in os.environ.items():
        fout.write('export "%s"="%s"\n' % (k, v))

if __name__ == '__main__':
    if sys.argv[-1] == '--run' and os.environ.get('EXTRA') == 'True':
        some_erroneous_function()
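
Putting the pieces together, running the child process ourselves could look roughly like the sketch below. It is an illustration under assumptions, not the article's code: rerun_child is a made-up helper that starts the command with the captured extra env vars and, unlike bar(), does not redirect stdout or stderr, so a pdb prompt opened inside the child appears on our terminal.

```python
import os
import subprocess


def rerun_child(cmd, extra_env):
    """Re-run the child command with the captured env vars, without pipes."""
    env = os.environ.copy()
    env.update(extra_env)
    # No stdout/stderr redirection: the child shares our terminal, so a
    # debugger prompt opened inside it is visible and accepts input.
    return subprocess.run(cmd, env=env).returncode
```

With the listings above, the call would be rerun_child(["python", "cli_app.py", "--run"], {"EXTRA": "True"}), which mirrors the command and the variable that bar() uses.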

Originally published at https://dev.to on November 11, 2022.