Python @ DjangoSpin

50+ Tips & Tricks for Python Developers


Page #4


When to use repr() & str() functions

The builtin repr() function returns a string containing the printable representation of an object. Every Python object, whether a list, a set, a tuple or anything else, has a magic method __repr__ (Learn more about magic methods here), which is called implicitly when repr() is called on it. Let's look at a few examples.

>>> repr('string')
"'string'"
>>> repr(4)
'4'
>>> repr( set( [1, 2, 'three', 'four'] ) )        # The order of set elements may differ on your machine.
"{1, 2, 'four', 'three'}"
>>> repr( [1, 2, 'three', 'four'] )
"[1, 2, 'three', 'four']"
>>> repr( (1, 2, 'three', 'four') )
"(1, 2, 'three', 'four')"
>>> # The same inputs passed to str()
>>> str('string')
'string'
>>> str(4)
'4'
>>> str( set( [1, 2, 'three', 'four'] ) )
"{1, 2, 'four', 'three'}"
>>> str( [1, 2, 'three', 'four'] )
"[1, 2, 'three', 'four']"
>>> str( (1, 2, 'three', 'four') )
"(1, 2, 'three', 'four')"

The outputs of repr() and str() differ only slightly for the same inputs: in the examples above, only the string input produces a different result, because repr() wraps it in quotes. str(object) calls the __str__ magic method of the object and returns the "informal", nicely printable string representation of the object. If the object does not define a __str__ method, str(object) returns the string returned by repr(object).

One important thing to note about the repr() function is that it shows escape sequences in their escaped form instead of interpreting them, like we saw in the BufferedReader code. For example, repr('a\nb') contains a backslash followed by the letter n, whereas printing 'a\nb' itself outputs an actual line break.

In a nutshell, the __repr__() method of an object is defined to make its representation unambiguous, whereas the __str__() method is defined to make it readable.
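To see both methods side by side, here is a minimal sketch. The Point class is a hypothetical example, not something from the tips above; it defines both __repr__ and __str__, and the last lines show how repr() keeps an escape sequence visible.

```python
class Point:
    """Hypothetical class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous: looks like the code needed to recreate the object.
        return "Point(x={!r}, y={!r})".format(self.x, self.y)

    def __str__(self):
        # Readable: meant for end users.
        return "({}, {})".format(self.x, self.y)


p = Point(1, 2)
print(repr(p))     # Point(x=1, y=2)
print(str(p))      # (1, 2)
print([p])         # [Point(x=1, y=2)]  (containers use repr() for their items)

text = "a\tb"
print(repr(text))  # 'a\tb'  (the escape sequence stays visible)
print(text)        # a, then an actual tab character, then b
```

Note that print() and str.format() use __str__, while the interactive prompt and containers such as lists use __repr__.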


Naming slices for cleaner code

The constructor of the builtin slice class creates a slice object, which can be used in places where a slice is normally employed. It is a better alternative to hardcoded slices, especially when they begin to create readability and maintenance issues.

>>> listOfNumbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> TWOtoFOUR = slice(2, 5)									# Slices include elements from the starting index up to the ending index - 1.
>>> TWOtoFOUR
slice(2, 5, None)											# No step was given, hence the None; it behaves like a step of 1.

>>> listOfNumbers[TWOtoFOUR]
[2, 3, 4]
>>> listOfNumbers[2:5]
[2, 3, 4]

>>> listOfNumbers[TWOtoFOUR] = [12, 13, 14]
>>> listOfNumbers
[0, 1, 12, 13, 14, 5, 6, 7, 8, 9]
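Named slices pay off most when the positions carry meaning. The following is a minimal sketch that assumes a made-up fixed-width record layout (date, name, count); the record string and the slice names are hypothetical.

```python
# A hypothetical fixed-width record: 10 chars of date, 10 of name, 4 of count.
record = "2023-07-15ETHAN     0042"

DATE = slice(0, 10)
NAME = slice(10, 20)
COUNT = slice(20, 24)

date = record[DATE]
name = record[NAME].strip()   # remove the padding spaces
count = int(record[COUNT])

print(date)    # 2023-07-15
print(name)    # ETHAN
print(count)   # 42
```

If the layout changes, only the three slice definitions need to be updated, not every place where the record is read.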

The slice objects provide three read-only attributes to access the indices individually. These are start, stop & step.

>>> twoToFourWithStepOne = slice(2, 5, 1)
>>> twoToFourWithStepOne.start
2
>>> twoToFourWithStepOne.stop
5
>>> twoToFourWithStepOne.step
1

In addition to these attributes, a slice object also provides a method called indices(). The indices() method maps a slice onto a sequence of a specific size. It takes the length of the sequence as input and returns a tuple (start, stop, step) in which out-of-bounds indices are clipped to fit within bounds.

>>> x = slice(2, 25, 3)
>>> seq = 'Hi, my name is Ethan.'
>>> x.indices(len(seq))
(2, 21, 3)
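One way to put indices() to work, sketched here with the same sequence as above, is to feed its result to range() so that you get the exact positions a slice would touch. The variable names are only illustrative.

```python
seq = 'Hi, my name is Ethan.'
x = slice(2, 25, 3)

# Unpack the clipped (start, stop, step) tuple straight into range().
start, stop, step = x.indices(len(seq))
positions = list(range(start, stop, step))
print(positions)                      # [2, 5, 8, 11, 14, 17, 20]

# Picking the characters one by one gives the same result as slicing.
picked = ''.join(seq[i] for i in positions)
print(picked == seq[x])               # True

# Negative and missing indices are resolved as well.
print(slice(-3, None).indices(10))    # (7, 10, 1)
```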

Call functions upon termination of program

The standard library module atexit lets you call functions upon normal termination of a Python script. This is achieved using its register() function.

# Contents of testAtExit.py
import atexit

def bye():
    print("Goodbye...")

atexit.register(bye)

# Command prompt / Terminal
$ python path_to_testAtExit.py
Goodbye...

Python allows you to register more than one function, and to pass arguments to these functions.

# Contents of testAtExit.py
import atexit

def bye(name):
    print("Goodbye {}...".format(name))

atexit.register(bye, 'Ethan')

# Command prompt / Terminal
$ python path_to_testAtExit.py
Goodbye Ethan...

Note that registered functions are not called in the following events:

  1. When the program is exited using os._exit(). The registered functions are, however, called when the program is exited using sys.exit().
  2. When the program is killed by a signal that Python does not handle. Such a signal can be sent using the subprocess module or the kill() function of the os module.
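The behaviour described above can be checked with a small experiment. This is only a sketch: it runs two hypothetical child scripts in a fresh interpreter through subprocess, so that the exit handlers fire inside the child and their output can be captured. It shows that several registered functions run in reverse order of registration, and that os._exit() skips them.

```python
import subprocess
import sys

# Hypothetical child script: registers the same function twice.
NORMAL_EXIT = '''
import atexit

def bye(name):
    print("Goodbye {}...".format(name))

atexit.register(bye, "first")
atexit.register(bye, "second")
'''

# Same script, but it ends with os._exit(), which skips the handlers.
HARD_EXIT = NORMAL_EXIT + '''
import os
os._exit(0)
'''


def run(script):
    """Run the script in a child interpreter and return its output lines."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.splitlines()


normal_output = run(NORMAL_EXIT)
hard_output = run(HARD_EXIT)
print(normal_output)   # ['Goodbye second...', 'Goodbye first...']
print(hard_output)     # []
```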

Arrays in Python

An array is a data structure which stores homogeneous elements i.e. elements of the same type. This is in contrast to lists in Python, which can store elements of different types. While lists are a native data structure in Python, arrays are not; they are provided by the array module of the standard library. Here's a basic example of creating an array in Python.

>>> import array
>>> myArray = array.array('i', [1, 2, 3, 4, 5, 6, 7, 8])

The constructor of the array class takes one required argument, typecode, and one optional argument, initializer. The typecode is a single-character string which tells Python the data type of the elements that make up the array. The initializer can be a list, a bytes-like object or any other iterable of elements of the appropriate type. Listed below are the various typecodes and the type of data they represent.

  TYPECODE    TYPE OF DATA IT REPRESENTS            MINIMUM SIZE OF EACH ELEMENT
  'b'         signed integer                        1 byte
  'B'         unsigned integer                      1 byte
  'c'         character (Python 2 only)             1 byte
  'u'         unicode character                     2 bytes
  'w'         unicode character (Python 3.13+)      4 bytes
  'h'         signed integer                        2 bytes
  'H'         unsigned integer                      2 bytes
  'i'         signed integer                        2 bytes
  'I'         unsigned integer                      2 bytes
  'l'         signed integer                        4 bytes
  'L'         unsigned integer                      4 bytes
  'q'         signed integer                        8 bytes
  'Q'         unsigned integer                      8 bytes
  'f'         floating point                        4 bytes
  'd'         floating point                        8 bytes

An array object, returned by the constructor of the array class, has a multitude of methods. These methods allow the user to append elements to the array, insert elements at specific positions of the array, extend an existing array etc. To learn about these operations, visit this page.
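As a quick illustration of the operations mentioned above, here is a minimal sketch using append(), insert() and extend(), followed by an attempt to add an element of the wrong type. The variable names are only illustrative.

```python
import array

numbers = array.array('i', [1, 2, 3])

numbers.append(4)         # add one element at the end
numbers.insert(0, 0)      # insert the value 0 at position 0
numbers.extend([5, 6])    # add several elements at once
print(numbers.tolist())   # [0, 1, 2, 3, 4, 5, 6]

print(numbers.typecode)   # i
print(numbers.itemsize)   # size of one element in bytes (platform dependent)

# Unlike a list, an array rejects elements of the wrong type.
try:
    numbers.append(1.5)
    rejected = False
except TypeError:
    rejected = True
print(rejected)           # True
```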


Archiving files using zipfile & tarfile modules

The standard library modules zipfile and tarfile facilitate creating and manipulating zip & Unix tar archive files. The tarfile module is not limited to Unix; it can be used on Windows as well, and is capable of working with bzip2 & gzip compressed archives. The following snippets show how to create archives, append to them and extract them, using basic examples.

###### Creating new archives: zipfile.ZipFile().write() & tarfile.open().add() ########
The ZipFile class of the zipfile module creates a ZipFile object, which provides a write() method to add files to a new archive. The open() function of the tarfile module returns a TarFile object, which provides an add() method for the same purpose.

>>> import zipfile
>>> with zipfile.ZipFile('zipFileOne.zip', mode = 'w') as zF:
	zF.write('fileOne.txt')
	zF.write('fileTwo.txt')
	
>>> import tarfile
>>> with tarfile.open('tarFileOne.tar', mode = 'w') as tF:
	tF.add('fileOne.txt')
	tF.add('fileTwo.txt')

	
	
###### Appending to an archive ######
In order to append to existing archive files, change the mode in which you are opening archive files to 'a', signalling append mode, rather than 'w'.

>>> with zipfile.ZipFile('zipFileOne.zip', mode = 'a') as zF:
	zF.write('fileThree.txt')
	
>>> with tarfile.open('tarFileOne.tar', mode = 'a') as tF:
	tF.add('fileThree.txt')


###### Extracting archives: zipfile.ZipFile().extract(), zipfile.ZipFile().extractall() & tarfile.open().extract(), tarfile.open().extractall() ######
The ZipFile & TarFile objects provide two methods, extract() and extractall(), to extract contents from archive files. extract() extracts the named file from the archive, whereas extractall() extracts all contents of the archive.

>>> with zipfile.ZipFile('zipFileOne.zip') as zF:
	zF.extract('fileThree.txt')							# returns 'complete_path_to_extracted_file'
	
>>> with zipfile.ZipFile('zipFileOne.zip') as zF:
	zF.extractall()

	

>>> with tarfile.open('tarFileOne.tar') as tF:
	tF.extract('fileThree.txt')
	
>>> with tarfile.open('tarFileOne.tar') as tF:
	tF.extractall()
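Before extracting, it is often useful to check what an archive contains. The sketch below is self-contained: it creates two throwaway files in a temporary directory (an assumption made purely so the example can run anywhere), archives them, and then lists the members with namelist() and getnames().

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Create two small sample files to archive.
    names = ['fileOne.txt', 'fileTwo.txt']
    for name in names:
        with open(os.path.join(workdir, name), 'w') as handle:
            handle.write('contents of ' + name)

    # arcname stores the bare file name instead of the full path.
    zip_path = os.path.join(workdir, 'zipFileOne.zip')
    with zipfile.ZipFile(zip_path, mode='w') as zF:
        for name in names:
            zF.write(os.path.join(workdir, name), arcname=name)

    tar_path = os.path.join(workdir, 'tarFileOne.tar')
    with tarfile.open(tar_path, mode='w') as tF:
        for name in names:
            tF.add(os.path.join(workdir, name), arcname=name)

    # List the members, and read one file without extracting it to disk.
    with zipfile.ZipFile(zip_path) as zF:
        zip_members = zF.namelist()
        first_text = zF.read('fileOne.txt').decode('ascii')
    with tarfile.open(tar_path) as tF:
        tar_members = tF.getnames()

print(zip_members)   # ['fileOne.txt', 'fileTwo.txt']
print(tar_members)   # ['fileOne.txt', 'fileTwo.txt']
print(first_text)    # contents of fileOne.txt
```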

Using Comprehensions to create lists, sets & dictionaries

Comprehensions are an excellent Python feature that offers a concise, alternative way to create lists, dictionaries & sets. For example, to create a list of numbers 0 to 9, you would normally write the following code:

>>> listOfNumbers = []
>>> for number in range(0, 10):
	listOfNumbers.append(number)

>>> listOfNumbers                       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

To create the same list, use the expression number for number in range(0, 10) inside a pair of square brackets.

>>> listOfNumbers = [ number for number in range(0, 10) ]	# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
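The same syntax carries over to sets and dictionaries by switching to curly braces. Here is a minimal sketch with illustrative variable names, which also shows an optional if clause for filtering.

```python
# List comprehension: squares of 0 to 4.
squares_list = [n * n for n in range(5)]
print(squares_list)   # [0, 1, 4, 9, 16]

# Set comprehension with a filter: even numbers below 10.
even_set = {n for n in range(10) if n % 2 == 0}
print(even_set == {0, 2, 4, 6, 8})   # True

# Dictionary comprehension: map each number to its square.
square_map = {n: n * n for n in range(5)}
print(square_map)     # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```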

Read more about Comprehensions here.


Number System Interconversion: int(), bin(), oct() & hex()

Python provides the handy builtin functions bin(), hex() & oct() to convert integers to strings in the corresponding number systems.

>>> a = 4
>>> bin(a)
'0b100'
>>> hex(a)
'0x4'
>>> oct(a)
'0o4'

To convert these strings back into integers, use the builtin int() function with the appropriate base value.

>>> int('0b100', 2)
4
>>> int('0x4', 16)
4
>>> int('0o4', 8)
4

You can combine these functions to convert from one non-decimal system to another. For example, converting a binary value to its hexadecimal equivalent:

>>> hex(   int('0b100', 2)   )
'0x4'
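A couple of related builtins are worth knowing, shown here as a short sketch: format() produces the digits with or without the prefix and with optional padding, and int() with a base of 0 infers the base from the prefix of the string.

```python
value = 4

# format() without a prefix, with a prefix, and with zero padding.
print(format(value, 'b'))     # 100
print(format(value, '#x'))    # 0x4
print(format(255, '08b'))     # 11111111

# int() accepts strings without a prefix when the base is given...
print(int('100', 2))          # 4

# ...and works out the base from the prefix when the base is 0.
print(int('0x4', 0))          # 4
print(int('0o17', 0))         # 15
```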

Encoding in Python: encode() & decode()

Encoding is the process of transforming a string into a specialized format for efficient storage or transmission. In other words, encoding is the process of transforming content into a sequence of bytes, which will make sense again when it is decoded with the same encoding with which it was encoded. A character encoding defines how each character that belongs to its character set is represented as bytes.

For example, let's talk about two encodings: ASCII and Unicode.

ASCII (American Standard Code for Information Interchange) has a total of 128 characters (codes 0 to 127), which is roughly a list of all the characters that you can type using a standard US keyboard, plus a set of control characters. You can view the list of symbols here. Basically, it covers numbers, uppercase letters, lowercase letters and a bunch of other symbols.

Unicode covers almost every character there is. It contains over 128 thousand characters, covering 135 modern and historic scripts, as well as multiple symbol sets, as per Wikipedia. Strings in Python 3 are sequences of Unicode characters, and utf-8, one of the encodings of Unicode, is the default encoding used by encode() and decode(). You can read about Unicode here.

Now, the Python part. The encode() method acts on a string and produces a sequence of bytes. The decode() method acts on bytes and produces the original string.

>>> "hello".encode(encoding = 'ascii')
b'hello'
>>> b'hello'.decode(encoding = 'ascii')
'hello'

Python raises a UnicodeEncodeError when you try to encode a string using an encoding whose character set lacks one or more of the characters in the string. To read more about encoding and decoding, visit this link.
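Here is a minimal sketch of that error, using the word 'café' as an arbitrary example of a string with a non-ASCII character. It also shows the errors argument of encode(), which lets you replace or drop characters that cannot be encoded instead of raising an exception.

```python
word = 'café'

# utf-8 can represent every Unicode character; é takes two bytes.
utf8_bytes = word.encode('utf-8')
print(utf8_bytes)                     # b'caf\xc3\xa9'
print(utf8_bytes.decode('utf-8'))     # café

# ASCII has no é, so encoding fails.
try:
    word.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True
print(failed)                         # True

# The errors argument changes how unencodable characters are handled.
replaced = word.encode('ascii', errors='replace')
ignored = word.encode('ascii', errors='ignore')
print(replaced)                       # b'caf?'
print(ignored)                        # b'caf'
```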


Regular Expressions (RegEx) in Python

A Regular Expression (also known as RegEx) is a sequence of characters which make up a search pattern, which attempts to match text in a longer string. Different symbols match different characters, and different languages have different interpretations for these symbols. Python has included support for Regular Expressions since version 1.5. Its Regular Expression Engine is derived from Secret Labs’ Regular Expression Engine i.e. SRE, and is designed to work, more or less, the same way as Regular Expressions work in Perl. Using the standard library module re, you can perform advanced string matching.

>>> import re
>>> # The following pattern looks for a pair of paragraph tags and its contents.
>>> re.search('<p>.*?</p>', '<p>Paragraph 1</p>')
<_sre.SRE_Match object; span=(0, 18), match='<p>Paragraph 1</p>'>    # Newer Python versions display this as <re.Match object; ...>.
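The question mark in .*? makes the match non-greedy, which matters as soon as the text contains more than one pair of tags. The following sketch, with a made-up two-paragraph string, compares the two behaviours using re.findall() and then captures only the contents with a group.

```python
import re

html = '<p>One</p><p>Two</p>'

# Non-greedy: each match stops at the first closing tag it finds.
lazy = re.findall(r'<p>.*?</p>', html)
print(lazy)       # ['<p>One</p>', '<p>Two</p>']

# Greedy: the match runs on to the last closing tag in the string.
greedy = re.findall(r'<p>.*</p>', html)
print(greedy)     # ['<p>One</p><p>Two</p>']

# A capturing group returns only the text between the tags.
contents = re.findall(r'<p>(.*?)</p>', html)
print(contents)   # ['One', 'Two']
```

Regular expressions are fine for simple, well-formed snippets like this one; for real-world HTML, a dedicated parser is usually the safer choice.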

To learn more about regular expressions in Python, visit this link.


PyDoc: Python's Documentation System

pydoc is a command line tool, just like perldoc in Perl. It ships with the Python installation. You pass it the name of a module or of an object inside a module, and it renders a documentation page from the docstrings of that module. In order to run pydoc, you need to use the -m flag of Python in the command prompt, which asks Python to run the library module as a script.

Obtaining help from PyDoc on the replace() method of the str class:

$ python -m pydoc str.replace
Help on method_descriptor in str:

str.replace = replace(...)
    S.replace(old, new[, count]) -> str

    Return a copy of S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.

This looks rather similar to the output of the builtin function help(), doesn't it? The builtin function help() is in fact provided by the pydoc module i.e. pydoc.help().
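Since pydoc is an ordinary module, the same documentation can also be produced from within Python. This sketch uses pydoc.render_doc() with the plain text renderer on a hypothetical function, greet(), to show that the page is built from the docstring.

```python
import pydoc


def greet(name):
    """Return a greeting for the given name."""
    return "Hello, {}".format(name)


# render_doc() returns the documentation as a string instead of paging it.
# The plaintext renderer avoids the terminal bold markup used by default.
text = pydoc.render_doc(greet, renderer=pydoc.plaintext)
print(text)
```

The printed page contains the signature of greet() followed by its docstring, much like the output of help(greet).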

PyDoc even provides interactive browser documentation using an HTTP server. It also lets you generate documentation for your own modules. To be able to do this, you need to know about the different flags that pydoc offers in the command line. For this, check this post out.


See also: 50+ Know-How(s) Every Pythonista Must Know

