Python internal data structures

July 18, 2018 | 4 Minute Read

Python has different implementations, CPython and Python-JIT are popular ones.

  1. List— Lists are implemented as C like arrays in consecutive memory blocks.
    • .append(x): O(1) complexity, since appending at the end of list contents.
    • .insert(index, val): O(n) complexity, append at the specific index and then SHIFT everything after it by 1 index.
    • .pop(): O(1) complexity, pops the value of last item in the list.
    • .pop(index) or, remove(value) : O(n) complexity, delete the value and shift other items up.
    • List growth optimization : Python lists initialized with no data then it grows in a particular order using list_resize(). To reduce total number of list_resize () calls which allocates new memory the growth happens in the following order 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, …
    • Detailed reference

  2. Dictionaries—Implmented using hash tables. Steps in dictionary appending
    • hash(value): Find the hash of the value using any hash function such as %52 for strings represented as their ascii value.
    • hash(key) & 7 : finds an address <=7. Such as if hash(a)=1; hash(a)&7 is 1; So list[1]=value.
    • Collision: When a address has the value. Use a collision resolve method. Standard is open addressing. It could be a lined list, or better a BST on that particular value.
    • Growing: When the total entries are over 2/3 of the array’s capacity. dict_resize() is called to allocate a larger array. This function also copies the old table entries to the new table. To reduce the number of resize steps and increase sparseness it is . The new table size needs to be greater than 24 and it is calculated by shifting the current size 1 bit left until it is greater than 24. It ends up being 32 e.g. 8 -> 16 -> 32. Read more here. TO summarize a new table of size 32 is allocated. Old table entries are inserted into the new table using the new mask value which is 31.

  3. Tuples—Tuples
  4. Sets—Sets



List object C structure—

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

An apple guy once asked me what are two fundamental data structures? The answer he was looking for is stack and queuue.

Python Data Structures

  1. Python data structures include lists, dictionaries, tuples, and sets.
  2. Python lists are implemented internally as variable-length arrays, rather than linked lists.
  3. Python dictionaries are implemented internally as resizable hash tables.
  4. Python tuples are records.
  5. Python sets are implemented internally as dictionaries (hash tables) with keys and no values.
  6. The data types for a list, dictionary, tuple, and set, are ‘list’, ‘dict’, ‘tuple’, and ‘set’, respectively.
  7. The list(), dict(), tuple(), and set() functions may be used to create list, dictionary, tuple, and set objects, respectively.
  8. List and dictionary items are mutable. Tuple and set items are immutable.
  9. Lists and tuples maintain order. Dictionaries and sets are unordered.
  10. Lists and tuples allow duplication. Dictionary and set items are unique.
  11. Sets may be used to remove duplication found in other data structures.
  12. List, dictionary, tuple, and set items may be accessed using a for loop.
  13. List and tuple items may be accessed by index. Dictionary items are accessed by key. Set items cannot be accessed by index.
  14. Python supports having a tuple on the left side of an assignment statement. This allows you to assign more than one variable at a time.
  15. Sets support logical set operations, including membership, subset, superset, union, intersection, and difference.
  16. Reference: wikiversity Python data structure summary.

Data Structure Concepts

  1. A data structure is a particular way of organizing data in a computer so that it can be used efficiently.
  2. Data structures can be used to organize the storage and retrieval of information stored in both main memory and secondary memory.
  3. The array and record data structures are based on computing the addresses of data items with arithmetic operations; while the linked data structures are based on storing addresses of data items within the structure itself.
  4. An array is a number of elements in a specific order, typically all of the same type. Elements are accessed using an integer index to specify which element is required. Arrays may be fixed-length or resizable.
  5. A linked list is a linear collection of data elements of any type, called nodes, where each node has itself a value, and points to the next node in the linked list.
  6. A record (also called tuple or struct) is an aggregate data structure. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
  7. A class is a data structure that contains data fields, like a record, as well as various methods which operate on the contents of the record.


My refernces—
1. https://en.wikiversity.org/wiki/Python_Programming/Tuples_and_Sets 2. https://www.laurentluce.com/posts/python-list-implementation/ 1. https://en.wikiversity.org/wiki/Python_Programming/Tuples_and_Sets a wikiversity