Python peephole optimizations

The peephole optimizations performed by Python are implemented in the file Python/peephole.c, around line number 300.

It has one giant for loop which goes through every opcode and performs basic optimizations. I would urge you to look into the optimizations performed for the different opcodes.

For peephole optimizations written in Python, look at the bytecode library: http://bytecode.readthedocs.io/en/latest/peephole.html
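
One quick way to see the effect of compile-time optimization without a debugger is to inspect the constants of a compiled code object. The sketch below assumes a Python 3 interpreter (the post itself targets Python 2.7), and the variable name seconds_per_day is only illustrative. Which compiler stage performs the folding has changed between CPython releases, so treat this as an observation of the result rather than of peephole.c itself.

```python
import dis

# A constant expression that the compiler can evaluate ahead of time.
source = "seconds_per_day = 60 * 60 * 24"
code = compile(source, "<example>", "exec")

# If folding happened, the product is stored as a single constant.
folded = 86400 in code.co_consts

# Print the bytecode so the folded constant can be seen being loaded.
dis.dis(code)
```

If folded is True, the multiplication never runs at execution time; the bytecode simply loads the precomputed value.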

Python sockets

This post gives a brief overview of how sockets are implemented in Python. The implementation is OS specific, so I will only cover UNIX based systems; I would urge you to avoid the Windows code paths because of the gory details. The main socket structure is declared in the file Modules/socketmodule.h. Observe how the appropriate header files are included depending on the operating system.

typedef struct {
    PyObject_HEAD
    SOCKET_T sock_fd;           /* Socket file descriptor */
    int sock_family;            /* Address family, e.g., AF_INET */
    int sock_type;              /* Socket type, e.g., SOCK_STREAM */
    int sock_proto;             /* Protocol type, usually 0 */
    PyObject *(*errorhandler)(void); /* Error handler; checks
                                        errno, returns NULL and
                                        sets a Python exception */
    double sock_timeout;        /* Operation timeout in seconds;
                                   0.0 means non-blocking */
    PyObject *weakreflist;
} PySocketSockObject;

This structure is the main socket object. Insert a breakpoint in the file Modules/socketmodule.c on line number 806. Then write a small program that creates sockets and observe how they operate.
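
The fields of PySocketSockObject are visible from Python code, which makes a handy program to run under the debugger. This is a minimal sketch assuming Python 3 and a system that allows creating an AF_INET socket; the variable names are my own and no connection is made.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # These attributes mirror sock_family, sock_type and sock_proto.
    fields = {"family": sock.family, "type": sock.type, "proto": sock.proto}

    # sock_fd is exposed through fileno().
    has_descriptor = sock.fileno() >= 0

    # sock_timeout: None means blocking, 0.0 means non-blocking.
    default_timeout = sock.gettimeout()
    sock.settimeout(0.0)
    nonblocking_timeout = sock.gettimeout()
finally:
    sock.close()
```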

Python threads

In this post let us delve deeper into how Python threads are implemented on Linux. Open the file Python/thread.c and observe how different headers are included depending on the thread library available in the OS. On Linux we use pthreads, so open the file Python/thread_pthread.h.

Insert a breakpoint on line number 172.

Open the debugger and enter the following statements:

def threadFunc():
    print "hello"

import thread

thread.start_new_thread(threadFunc, ())

Observe how the thread is initialized, created and spawned to execute the function.
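
On Python 3 the thread module was renamed to _thread, and most code uses the higher-level threading module built on top of it. The sketch below is an assumed Python 3 equivalent of the snippet above; thread_func and messages are illustrative names. It collects the output in a list instead of printing so the result can be checked after join().

```python
import threading

messages = []

def thread_func():
    # Runs in the new thread.
    messages.append("hello")

worker = threading.Thread(target=thread_func)
worker.start()   # spawns the OS thread
worker.join()    # wait until thread_func has finished
```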

Python arenas

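Python arenas are used to allocate objects smaller than 512 bytes. They make memory management for the Python VM very efficient.

Let us look at the structures for the arenas. Note that the block and _arena structures below are defined in Python/pyarena.c, the arena used by the compiler; the small-object allocator in Objects/obmalloc.c has its own struct arena_object.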

typedef struct _block {
    /* Total number of bytes owned by this block available to pass out.
     * Read-only after initialization.  The first such byte starts at
     * ab_mem.
     */
    size_t ab_size;

    /* Total number of bytes already passed out.  The next byte available
     * to pass out starts at ab_mem + ab_offset.
     */
    size_t ab_offset;

    /* An arena maintains a singly-linked, NULL-terminated list of
     * all blocks owned by the arena.  These are linked via the
     * ab_next member.
     */
    struct _block *ab_next;

    /* Pointer to the first allocatable byte owned by this block.  Read-
     * only after initialization.
     */
    void *ab_mem;
} block;

/* The arena manages two kinds of memory, blocks of raw memory
   and a list of PyObject* pointers.  PyObjects are decrefed
   when the arena is freed.
*/

struct _arena {
    /* Pointer to the first block allocated for the arena, never NULL.
       It is used only to find the first block when the arena is
       being freed.
     */
    block *a_head;

    /* Pointer to the block currently used for allocation.  Its
       ab_next field should be NULL.  If it is not-null after a
       call to block_alloc(), it means a new block has been allocated
       and a_cur should be reset to point it.
     */
    block *a_cur;

    /* A Python list object containing references to all the PyObject
       pointers associated with this area.  They will be DECREFed
       when the arena is freed.
     */
    PyObject *a_objects;

#if defined(Py_DEBUG)
    /* Debug output */
    size_t total_allocs;
    size_t total_size;
    size_t total_blocks;
    size_t total_block_size;
    size_t total_big_blocks;
#endif
};

More about how arenas work is described here: http://www.evanjones.ca/memoryallocator/

The implementation is in the file Objects/obmalloc.c, around line number 792. I would urge you to debug further and understand how arenas are used in memory management.
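
The block and _arena structures shown above describe a bump-pointer allocator: each block hands out bytes from ab_offset onward, and a new block is linked in when the current one is full. The following is only a toy model in Python, not the C implementation. The Block and Arena classes, the alloc method and the DEFAULT_BLOCK_SIZE value are assumptions made for illustration, and alignment is ignored.

```python
class Block:
    """Toy stand-in for the C block struct."""

    def __init__(self, size):
        self.ab_size = size            # bytes this block can hand out
        self.ab_offset = 0             # bytes already handed out
        self.ab_next = None            # next block in the linked list
        self.ab_mem = bytearray(size)  # the raw memory itself


class Arena:
    """Toy stand-in for struct _arena (hypothetical, no alignment)."""

    DEFAULT_BLOCK_SIZE = 8192  # illustrative value

    def __init__(self):
        self.a_head = Block(self.DEFAULT_BLOCK_SIZE)
        self.a_cur = self.a_head

    def alloc(self, nbytes):
        """Return (block, offset) for nbytes of space."""
        block = self.a_cur
        if block.ab_offset + nbytes > block.ab_size:
            # Current block is full: link a new one and make it current.
            new_block = Block(max(nbytes, self.DEFAULT_BLOCK_SIZE))
            block.ab_next = new_block
            self.a_cur = block = new_block
        offset = block.ab_offset
        block.ab_offset += nbytes      # bump the pointer
        return block, offset

    def block_count(self):
        """Walk the list from a_head, as freeing the arena would."""
        count, block = 0, self.a_head
        while block is not None:
            count += 1
            block = block.ab_next
        return count


arena = Arena()
first_block, first_offset = arena.alloc(100)
second_block, second_offset = arena.alloc(200)
big_block, big_offset = arena.alloc(10000)   # does not fit, forces a new block
```

Allocation is just an addition to ab_offset, which is why this style of allocator is cheap; the trade-off is that memory is only released when the whole arena is freed.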

The PyFileObject

The PyFileObject is used for file management in Python. The structure is defined in the file Include/fileobject.h.

typedef struct {
    PyObject_HEAD
    FILE *f_fp;
    PyObject *f_name;
    PyObject *f_mode;
    int (*f_close)(FILE *);
    int f_softspace;            /* Flag used by 'print' command */
    int f_binary;               /* Flag which indicates whether the file is
                                   open in binary (1) or text (0) mode */
    char* f_buf;                /* Allocated readahead buffer */
    char* f_bufend;             /* Points after last occupied position */
    char* f_bufptr;             /* Current buffer position */
    char *f_setbuf;             /* Buffer for setbuf(3) and setvbuf(3) */
    int f_univ_newline;         /* Handle any newline convention */
    int f_newlinetypes;         /* Types of newlines seen */
    int f_skipnextlf;           /* Skip next \n */
    PyObject *f_encoding;
    PyObject *f_errors;
    PyObject *weakreflist;      /* List of weak references */
    int unlocked_count;         /* Num. currently running sections of code
                                   using f_fp with the GIL released. */
    int readable;
    int writable;
} PyFileObject;

We observe that this is a wrapper around a plain C FILE pointer. The f_buf, f_bufend and f_bufptr pointers track a readahead buffer, and f_setbuf holds the buffer used for setbuf(3) and setvbuf(3).

Open the file Objects/fileobject.c, insert a breakpoint on line number 322, and in the debug console type the expression:

a = open('test.py')

You will see that the breakpoint is hit. Observe how the file object is created and how the pointers are set up for later operations.
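
Some of the structure's fields can also be read back from Python. This is a small sketch assuming Python 3, where file objects are implemented by the io module rather than PyFileObject, but the name and mode attributes still correspond to what f_name and f_mode store. The temporary file is created only so the example does not depend on test.py existing.

```python
import os
import tempfile

# Create an empty temporary file to open.
fd, path = tempfile.mkstemp(suffix=".py")
os.close(fd)

f = open(path, "r")
info = (f.name, f.mode, f.closed)   # roughly f_name, f_mode, and whether f_fp is live
f.close()
closed_after = f.closed

os.remove(path)
```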

Memory Management

I shall not delve into every aspect of Python's memory management, but I will explain how memory is allocated, how the reference count is tracked and how an object is deallocated.

Open the file Objects/listobject.c and observe line number 142, where there is a call to the macro PyObject_GC_New. Let us look at its code:

#define PyObject_GC_New(type, typeobj) \
    ( (type *) _PyObject_GC_New(typeobj) )

It calls the _PyObject_GC_New function, which internally calls _PyObject_GC_Malloc:

PyObject *
_PyObject_GC_New(PyTypeObject *tp)
{
    PyObject *op = _PyObject_GC_Malloc(_PyObject_SIZE(tp));
    if (op != NULL)
        op = PyObject_INIT(op, tp);
    return op;
}

PyObject *
_PyObject_GC_Malloc(size_t basicsize)
{
    PyObject *op;
    PyGC_Head *g;
    if (basicsize > PY_SSIZE_T_MAX - sizeof(PyGC_Head))
        return PyErr_NoMemory();
    g = (PyGC_Head *)PyObject_MALLOC(
        sizeof(PyGC_Head) + basicsize);
    if (g == NULL)
        return PyErr_NoMemory();
    g->gc.gc_refs = GC_UNTRACKED;
    generations[0].count++; /* number of allocated GC objects */
    if (generations[0].count > generations[0].threshold &&
        enabled &&
        generations[0].threshold &&
        !collecting &&
        !PyErr_Occurred()) {
        collecting = 1;
        collect_generations();
        collecting = 0;
    }
    op = FROM_GC(g);
    return op;
}

_PyObject_GC_Malloc in turn calls PyObject_MALLOC, which allocates small objects from Python arenas and falls back to malloc for larger objects. Arenas are covered in the Python arenas section.

Let us observe how the reference counting mechanisms work.

There are basically two macros:

#define Py_INCREF(op) ( \
    _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \
    ((PyObject*)(op))->ob_refcnt++)

#define Py_DECREF(op) \
    do { \
        if (_Py_DEC_REFTOTAL _Py_REF_DEBUG_COMMA \
        --((PyObject*)(op))->ob_refcnt != 0) \
            _Py_CHECK_REFCNT(op) \
        else \
            _Py_Dealloc((PyObject *)(op)); \
    } while (0)

The first macro is called when a new reference to the object is created, and the second one when a reference is removed, for example by using del.

We observe that the object is automatically deallocated using _Py_Dealloc when the reference count drops to 0. I would suggest that you step through the macros in the debugger once.
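
The effect of the two macros can be watched from Python with sys.getrefcount. The sketch below assumes CPython 3; the Tracked class and its deallocated list are illustrative names. The absolute numbers returned by getrefcount include temporary references, so only the differences are meaningful.

```python
import sys

class Tracked:
    deallocated = []

    def __del__(self):
        # Called when the object is deallocated.
        Tracked.deallocated.append("freed")

obj = Tracked()
base = sys.getrefcount(obj)

alias = obj                           # a new reference: Py_INCREF
with_alias = sys.getrefcount(obj)

del alias                             # reference removed: Py_DECREF
without_alias = sys.getrefcount(obj)

del obj                               # last reference gone: count hits 0
freed_immediately = Tracked.deallocated == ["freed"]
```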

When an object cannot be deallocated because of cyclic references, the cyclic garbage collector takes over. It is located in the file Modules/gcmodule.c, around line number 872. I would urge you to debug it and understand how it works.
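
A reference cycle is easy to build and observe with the gc and weakref modules. This is a sketch assuming CPython 3; Node and probe are illustrative names. Automatic collection is disabled for the duration so that the cycle is only reclaimed by the explicit gc.collect() call.

```python
import gc
import weakref

class Node:
    pass

was_enabled = gc.isenabled()
gc.disable()                 # keep the collector from running on its own
try:
    a = Node()
    b = Node()
    a.other = b              # a -> b
    b.other = a              # b -> a, forming a cycle
    probe = weakref.ref(a)   # lets us check whether a is still alive

    del a, b                 # refcounts stay above zero because of the cycle
    alive_after_del = probe() is not None

    gc.collect()             # the cyclic collector finds and frees the pair
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

# Per-generation thresholds, the values compared against generations[i].count.
thresholds = gc.get_threshold()
```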

The PyGenObject

The PyGenObject is the structure that wraps the Python generator.  The declaration is in the file Include/genobject.h.

typedef struct {
    PyObject_HEAD
    /* The gi_ prefix is intended to remind of generator-iterator. */

    /* Note: gi_frame can be NULL if the generator is "finished" */
    struct _frame *gi_frame;

    /* True if generator is being executed. */
    int gi_running;

    /* The code object backing the generator */
    PyObject *gi_code;

    /* List of weak reference. */
    PyObject *gi_weakreflist;
} PyGenObject;

You can observe that the generator object holds a reference to the frame in which execution resumes after each yield.

Open the file Objects/genobject.c and insert a breakpoint on line number 385.

Enter the function:

>>> def gen(a):
...     while a < 10:
...         a += 1
...         yield
...
>>> a = gen(5)

You will see that the breakpoint is hit. Observe how the generator is created.

>>> a.next()

Now insert a breakpoint on line number 283. There is a call to the function gen_send_ex. Insert another breakpoint on line number 85, which is the call into the interpreter loop for the current frame and instruction pointer.
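
The fields of PyGenObject are exposed as attributes on generator objects, so their behaviour can be checked from Python. The sketch below assumes Python 3, where a.next() is spelled next(a). Unlike the session above, the generator here yields the value of a so there is something to inspect.

```python
def gen(a):
    while a < 10:
        a += 1
        yield a

g = gen(5)

has_frame_before = g.gi_frame is not None   # frame exists before first next()
running_before = g.gi_running               # not executing yet
same_code = g.gi_code is gen.__code__       # gi_code is the function's code object

first = next(g)      # runs the frame up to the first yield
rest = list(g)       # drains the generator until it finishes

frame_after = g.gi_frame   # NULL in C shows up as None once finished
```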

The PyFrameObject

The Python frame object represents an entry in the execution stack. It wraps the code being executed and holds an instruction pointer. The declaration is in the file Include/frameobject.h.

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */

    /* If an exception is raised in this frame, the next three are used to
     * record the exception info (if any) originally in the thread state.  See
     * comments before set_exc_info() -- it's not obvious.
     * Invariant:  if _type is NULL, then so are _value and _traceback.
     * Desired invariant:  all three are NULL, or all three are non-NULL.  That
     * one isn't currently true, but "should be".
     */
    PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

    PyThreadState *f_tstate;
    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;

To understand more about the frame object, debug the interpreter loop and see how opcodes are fetched from the frame object and then executed.
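
Frames can also be inspected from Python itself. The following sketch assumes CPython 3 and uses sys._getframe, a CPython-specific helper; inner and outer are illustrative function names. It follows f_back from the current frame to its caller and reads the name stored in each frame's f_code.

```python
import sys

def inner():
    frame = sys._getframe()                   # the frame executing inner()
    current = frame.f_code.co_name            # f_code: the code being run
    caller = frame.f_back.f_code.co_name      # f_back: the previous frame
    return current, caller

def outer():
    return inner()

names = outer()
```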

The PyCodeObject

The PyCodeObject represents the bytecode that is executed by the interpreter. We have seen it in the function PyEval_EvalFrameEx, which receives the frame object that in turn holds the code to be executed. The structure is defined in the file Include/code.h.

typedef struct {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest doesn't count for hash/cmp */
    PyObject *co_filename;      /* string (where it was loaded from) */
    PyObject *co_name;          /* string (name, for reference) */
    int co_firstlineno;         /* first source line number */
    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;       /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
} PyCodeObject;

The most important entry is co_code, which contains the Python bytecode.

Open the file Objects/codeobject.c and insert a breakpoint on line number 101. Open the debug console and type any expression:

a = 100

We see that a corresponding code object is created. I would further urge you to debug and understand more about this object.
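
The same code object can be created and examined without a debugger by calling compile(). This is a sketch assuming Python 3, where co_code is a bytes object; the filename "<example>" and the summary dictionary are illustrative.

```python
# Compile the statement from the debug console into a code object.
code = compile("a = 100", "<example>", "exec")

summary = {
    "co_names": code.co_names,                  # names used by the bytecode
    "co_filename": code.co_filename,            # where it was loaded from
    "co_firstlineno": code.co_firstlineno,      # first source line number
    "bytecode_type": type(code.co_code).__name__,
}

# Executing the code object runs its bytecode in the given namespace.
namespace = {}
exec(code, namespace)
```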

The PyFunctionObject

The PyFunctionObject represents a function defined outside the scope of a class. The structure representing the function is declared in the file Include/funcobject.h.

typedef struct {
    PyObject_HEAD
    PyObject *func_code;        /* A code object */
    PyObject *func_globals;     /* A dictionary (other mappings won't do) */
    PyObject *func_defaults;    /* NULL or a tuple */
    PyObject *func_closure;     /* NULL or a tuple of cell objects */
    PyObject *func_doc;         /* The __doc__ attribute, can be anything */
    PyObject *func_name;        /* The __name__ attribute, a string object */
    PyObject *func_dict;        /* The __dict__ attribute, a dict or NULL */
    PyObject *func_weakreflist; /* List of weak references */
    PyObject *func_module;      /* The __module__ attribute, can be anything */

    /* Invariant:
     *     func_closure contains the bindings for func_code->co_freevars, so
     *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
     *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
     */
} PyFunctionObject;
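
Most of these fields are reachable as attributes of a function object. The sketch below assumes Python 3, where func_code, func_defaults and func_closure are spelled __code__, __defaults__ and __closure__; make_adder and add are illustrative names. It also checks the invariant from the comment: the closure tuple has one cell per free variable.

```python
def make_adder(n):
    def add(x, step=1):
        """Add n times step to x."""
        return x + n * step
    return add

add5 = make_adder(5)

free_vars = add5.__code__.co_freevars           # names bound by the closure
cells = add5.__closure__                        # tuple of cell objects
captured = cells[0].cell_contents               # the value of n
defaults = add5.__defaults__                    # default argument values
invariant_holds = len(cells) == len(free_vars)
result = add5(10, step=2)
```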

The structure is self-explanatory and I will not delve into further details. Insert breakpoints in the file Objects/funcobject.c and observe the lifecycle of functions. Leave a comment below if you need help.