Python internal data structures
Python has different implementations, CPython and Python-JIT are popular ones.
- List— Lists are implemented as C like arrays in consecutive memory blocks.
- .append(x): O(1) complexity, since appending at the end of list contents.
- .insert(index, val): O(n) complexity, append at the specific index and then SHIFT everything after it by 1 index.
- .pop(): O(1) complexity, pops the value of last item in the list.
- .pop(index) or, remove(value) : O(n) complexity, delete the value and shift other items up.
- List growth optimization : Python lists initialized with no data then it grows in a particular order using list_resize(). To reduce total number of list_resize () calls which allocates new memory the growth happens in the following order 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, …
- Detailed reference
- Dictionaries—Implmented using hash tables. Steps in dictionary appending
- hash(value): Find the hash of the value using any hash function such as %52 for strings represented as their ascii value.
- hash(key) & 7 : finds an address <=7. Such as if hash(a)=1; hash(a)&7 is 1; So list[1]=value.
- Collision: When a address has the value. Use a collision resolve method. Standard is open addressing. It could be a lined list, or better a BST on that particular value.
- Growing: When the total entries are over 2/3 of the array’s capacity. dict_resize() is called to allocate a larger array. This function also copies the old table entries to the new table. To reduce the number of resize steps and increase sparseness it is . The new table size needs to be greater than 24 and it is calculated by shifting the current size 1 bit left until it is greater than 24. It ends up being 32 e.g. 8 -> 16 -> 32. Read more here. TO summarize a new table of size 32 is allocated. Old table entries are inserted into the new table using the new mask value which is 31.
- Tuples—Tuples
- Sets—Sets
List object C structure—
An apple guy once asked me what are two fundamental data structures? The answer he was looking for is stack and queuue.
Python Data Structures
- Python data structures include lists, dictionaries, tuples, and sets.
- Python lists are implemented internally as variable-length arrays, rather than linked lists.
- Python dictionaries are implemented internally as resizable hash tables.
- Python tuples are records.
- Python sets are implemented internally as dictionaries (hash tables) with keys and no values.
- The data types for a list, dictionary, tuple, and set, are ‘list’, ‘dict’, ‘tuple’, and ‘set’, respectively.
- The list(), dict(), tuple(), and set() functions may be used to create list, dictionary, tuple, and set objects, respectively.
- List and dictionary items are mutable. Tuple and set items are immutable.
- Lists and tuples maintain order. Dictionaries and sets are unordered.
- Lists and tuples allow duplication. Dictionary and set items are unique.
- Sets may be used to remove duplication found in other data structures.
- List, dictionary, tuple, and set items may be accessed using a for loop.
- List and tuple items may be accessed by index. Dictionary items are accessed by key. Set items cannot be accessed by index.
- Python supports having a tuple on the left side of an assignment statement. This allows you to assign more than one variable at a time.
- Sets support logical set operations, including membership, subset, superset, union, intersection, and difference.
- Reference: wikiversity Python data structure summary.
Data Structure Concepts
- A data structure is a particular way of organizing data in a computer so that it can be used efficiently.
- Data structures can be used to organize the storage and retrieval of information stored in both main memory and secondary memory.
- The array and record data structures are based on computing the addresses of data items with arithmetic operations; while the linked data structures are based on storing addresses of data items within the structure itself.
- An array is a number of elements in a specific order, typically all of the same type. Elements are accessed using an integer index to specify which element is required. Arrays may be fixed-length or resizable.
- A linked list is a linear collection of data elements of any type, called nodes, where each node has itself a value, and points to the next node in the linked list.
- A record (also called tuple or struct) is an aggregate data structure. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
- A class is a data structure that contains data fields, like a record, as well as various methods which operate on the contents of the record.
My refernces—
1. https://en.wikiversity.org/wiki/Python_Programming/Tuples_and_Sets
2. https://www.laurentluce.com/posts/python-list-implementation/
1. https://en.wikiversity.org/wiki/Python_Programming/Tuples_and_Sets a wikiversity