The rendering pipeline

This document describes how prompt_toolkit applications are rendered. It’s a complex but logical process that happens after more or less every keystroke. We’ll go through all the steps, from the moment the user hits a key until the character appears on the screen.

Waiting for user input

Most of the time, when a prompt_toolkit application is running, it is idle: it sits in the event loop, waiting for some I/O to happen. The most important kind of I/O we’re waiting for is user input. So, within the event loop, we have one file descriptor that represents the input device from which we receive key presses. The details differ a little between operating systems, but it comes down to a selector (like select or epoll) which waits on one or more file descriptors. The event loop is then responsible for calling the appropriate callback when one of the file descriptors becomes ready.
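
To make this concrete, below is a minimal sketch of that waiting loop, using the standard library selectors module rather than prompt_toolkit’s actual event loop. The read_from_input function here is a stand-in for the real callback:

    import selectors
    import sys

    def read_from_input() -> None:
        # Stand-in for the real callback: in prompt_toolkit, the bytes would
        # be handed to the input parser. Here we just consume and echo them.
        data = sys.stdin.buffer.raw.read(1024)
        print(repr(data))

    selector = selectors.DefaultSelector()
    selector.register(sys.stdin, selectors.EVENT_READ)

    while True:
        for key, _ in selector.select():  # Blocks until a registered fd is ready.
            read_from_input()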

This is what happens when the user presses a key: the input device becomes ready for reading, and the appropriate callback is called. This is the read_from_input function, which reads the input from the Input object by calling read_keys().
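
prompt_toolkit exposes this mechanism through its Input abstraction. The following sketch, modeled on the prompt_toolkit 3 API, attaches a callback to the input and calls read_keys() whenever the event loop reports that the input is ready:

    import asyncio

    from prompt_toolkit.input import create_input
    from prompt_toolkit.keys import Keys

    async def main() -> None:
        done = asyncio.Event()
        input = create_input()

        def keys_ready() -> None:
            # Called by the event loop when the input device becomes readable.
            for key_press in input.read_keys():
                print(key_press)
                if key_press.key == Keys.ControlC:
                    done.set()

        with input.raw_mode():
            with input.attach(keys_ready):
                await done.wait()

    asyncio.run(main())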

Reading the user input

The actual reading is also operating system dependent. For instance, on a Linux machine with a vt100 terminal, we read the input from the pseudo terminal device with a low-level read (os.read). This, however, returns a sequence of bytes. There are two difficulties:

  • The input could be UTF-8 encoded, and there is always the possibility that we receive only a portion of a multi-byte character.

  • vt100 key presses consist of multiple characters. For instance, the “left arrow” generates something like \x1b[D. It could be that, when we read the input stream, we only get the first part of such a key press, and we have to wait for the rest to arrive.

Both problems are solved using state machines.

  • The UTF-8 problem is solved using codecs.getincrementaldecoder, which gives us an object into which we can feed the incoming bytes; it returns only the complete UTF-8 characters received so far and buffers the rest for the next read operation.

  • Vt100 parsing is solved by the Vt100Parser state machine, which is implemented using a generator. We feed the incoming characters to the generator, and it calls the appropriate callback for each key press once it is complete. One thing to keep in mind here is that the character sequences of some key presses are prefixes of others: for instance, escape (\x1b) is a prefix of the left arrow key (\x1b[D). For those, we don’t know which key was pressed until more data arrives or the input is flushed because of a timeout. Both state machines are demonstrated in the sketch after this list.
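
Both state machines are easy to demonstrate in isolation. The following sketch (assuming the prompt_toolkit 3 module layout for Vt100Parser) shows the incremental decoder buffering a partial multi-byte character, and the parser holding back an ambiguous escape until it is flushed:

    import codecs

    from prompt_toolkit.input.vt100_parser import Vt100Parser

    # Incremental UTF-8 decoding: only complete characters are returned;
    # incomplete byte sequences are buffered for the next read.
    decoder = codecs.getincrementaldecoder("utf-8")()
    print(repr(decoder.decode(b"\xe2\x82")))  # '' -- first bytes of '€', buffered
    print(repr(decoder.decode(b"\xac")))      # '€' -- completed on the next read

    # Vt100 parsing: feed characters, receive key presses through a callback.
    parser = Vt100Parser(lambda key_press: print(key_press))
    parser.feed("\x1b[D")  # A complete left-arrow sequence: the callback fires.
    parser.feed("\x1b")    # Escape, or the prefix of a longer sequence?
    parser.flush()         # A timeout would flush: only now is escape reported.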

For Windows systems, it’s a little different. Here we use Win32 syscalls for reading the console input.

Processing the key presses

The KeyPress objects that we receive are then passed to the KeyProcessor, which matches them against the currently registered and active key bindings.

This is another state machine, because key bindings are bound to a sequence of key presses. We cannot call a handler until all of those key presses have arrived and we’re sure that the combination is not a prefix of another combination. For instance, people sometimes bind jj (a double j key press) to esc in Vi mode. This is convenient, but we want to make sure that pressing j only once, followed by a different key, still inserts the j character as usual.
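
As an illustration, here is roughly what such a binding could look like, assuming the prompt_toolkit 3 key binding API:

    from prompt_toolkit.filters import vi_insert_mode
    from prompt_toolkit.key_binding import KeyBindings
    from prompt_toolkit.key_binding.key_processor import KeyPress
    from prompt_toolkit.keys import Keys

    kb = KeyBindings()

    @kb.add("j", "j", filter=vi_insert_mode)
    def _(event):
        "Map 'jj' to escape in Vi insert mode."
        event.app.key_processor.feed(KeyPress(Keys.Escape))

The key processor waits after the first j: only when the second j arrives does the handler run; any other key makes it fall back to inserting the j as usual.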

Now, there are hundreds of key bindings in prompt_toolkit (in ptpython, right now we have 585 bindings). This is mainly caused by the way the Vi key bindings are generated. To make matching efficient, we keep a cache of handlers that match certain sequences of keys.
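
Conceptually, that cache looks something like the sketch below, built on prompt_toolkit’s SimpleCache. The names expensive_match and get_bindings_for_keys are illustrative, not the real internals:

    from prompt_toolkit.cache import SimpleCache

    _cache: SimpleCache = SimpleCache(maxsize=10000)

    def expensive_match(keys: tuple) -> list:
        # Placeholder for scanning all registered bindings for matches.
        return []

    def get_bindings_for_keys(keys: tuple) -> list:
        # SimpleCache.get computes and stores the value on a cache miss,
        # so repeated key sequences skip the expensive scan.
        return _cache.get(keys, lambda: expensive_match(keys))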

Of course, key bindings also have filters attached for enabling/disabling them. So when we get a list of handlers from that cache, we still have to discard the inactive bindings. Luckily, many bindings share exactly the same filter, so we only have to check each filter once.
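
Such filters are typically built with Condition. A minimal sketch, where the is_recording condition is a hypothetical piece of application state:

    from prompt_toolkit.filters import Condition
    from prompt_toolkit.key_binding import KeyBindings

    kb = KeyBindings()

    @Condition
    def is_recording() -> bool:
        # Hypothetical application state; re-evaluated every time the
        # binding could fire.
        return False

    @kb.add("c-t", filter=is_recording)
    def _(event):
        "Only active while the filter evaluates to True."
        event.current_buffer.insert_text("*")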

Read more about key bindings …

The key handlers

Once a key sequence is matched, its handler is called. A handler can manipulate text, change the focus, or do anything else.

After the handler is called, the user interface is invalidated and rendered again.
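
For instance, a handler that inserts text does not have to trigger the repaint itself; invalidation happens automatically after the handler returns. Calling invalidate() explicitly is typically only needed when the UI changes outside of a key handler. A small sketch:

    from prompt_toolkit.application import get_app

    def handler(event):
        # Ordinary text manipulation; a repaint is scheduled automatically
        # after the handler returns.
        event.current_buffer.insert_text("x")

    def on_external_event():
        # State changed outside of a key handler: request a repaint manually.
        get_app().invalidate()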

Rendering the user interface

The rendering is pretty complex for several reasons:

  • We have to compute the dimensions of all user interface elements. Sometimes they are given, but sometimes this requires calculating the size of UIControl objects.

  • It needs to be very efficient, because it’s something that happens on every single keystroke.

  • We should output as little as possible on stdout in order to reduce latency on slow network connections and older terminals.

Calculating the total UI height

Unless the application runs full screen, we have to know how much vertical space will be consumed. The total available width is given, but the vertical space is more dynamic. We compute it by asking the root Container object to calculate its preferred height. If this is a VSplit or HSplit, this involves recursively querying the child objects for their preferred widths and heights and either summing them up or taking maximum values, depending on the actual layout. In the end we get the preferred height, which we make sure is at least the distance from the cursor position to the bottom of the screen.
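
A sketch of that query, assuming the prompt_toolkit 3 layout API:

    from prompt_toolkit.layout.containers import HSplit, Window
    from prompt_toolkit.layout.controls import FormattedTextControl

    # An HSplit stacks its children vertically, so its preferred height is
    # derived from the sum of the children's preferred heights.
    root = HSplit([
        Window(FormattedTextControl("first window\nwith two lines")),
        Window(FormattedTextControl("status bar"), height=1),
    ])

    dimension = root.preferred_height(width=80, max_available_height=24)
    print(dimension.preferred)  # Expected: 3 (two text lines + one status line).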

Painting to the screen

Then we create a Screen object. This is like a canvas on which user controls can paint their content. The write_to_screen() method of the root Container is called with the screen dimensions, and it recursively calls the write_to_screen() methods of the nested child containers, each time passing smaller dimensions as we traverse what is a tree of Container objects.
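
The Screen is essentially a sparse two-dimensional grid of styled characters. A minimal sketch of it being used as a canvas:

    from prompt_toolkit.layout.screen import Char, Screen

    screen = Screen()

    # data_buffer maps row -> column -> Char; missing cells default to blanks.
    screen.data_buffer[0][0] = Char("H", "class:title")
    screen.data_buffer[0][1] = Char("i", "class:title")
    print(screen.data_buffer[0][0].char)  # 'H'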

The innermost containers are Window objects; they do the actual painting of the UIControl to the screen. This involves line wrapping the UIControl’s text and possibly scrolling the content horizontally or vertically.

Rendering to stdout

Finally, when the screen has been painted, it needs to be rendered to stdout. This is done by taking the difference between the previously rendered screen and the new one. The algorithm is heavily optimized to compute this difference as quickly as possible and to call the appropriate output functions of the Output back-end. At the end, it positions the cursor in the right place.
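
A simplified illustration of the idea (not prompt_toolkit’s actual algorithm): compare the two screens cell by cell and emit output only for the cells that changed, prefixing each one with a vt100 cursor-position escape:

    def diff_screens(previous: list[str], new: list[str]) -> list[str]:
        output = []
        for y, (old_row, new_row) in enumerate(zip(previous, new)):
            for x, (old_char, new_char) in enumerate(zip(old_row, new_row)):
                if old_char != new_char:
                    # Move the cursor (vt100 "CUP" escape) and rewrite the cell.
                    output.append(f"\x1b[{y + 1};{x + 1}H{new_char}")
        return output

    before = ["hello world", "second line"]
    after = ["hello there", "second line"]
    print(diff_screens(before, after))  # Only the changed cells are rewritten.

The real renderer also has to handle styling, double-width characters, scrolling, and resizes, but the core idea is the same: write only what changed since the previous frame.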