- Fetching Environment Variables using the os module
- Encoding in Python: encode() & decode()
- Using a buffer while manipulating files
- Removing duplicate items from lists
- The "_" identifier in Interactive Sessions
- Advanced Text Matching with Regular Expressions
Fetching Environment Variables using the os module
The environment variables of your system can be read using the os module's environ attribute.
>>> import os
>>> os.environ  # a dictionary-like mapping whose values are accessed by key
>>> os.environ['PATH']  # value of the PATH environment variable
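Indexing os.environ with a key that is not set raises a KeyError. The minimal sketch below shows a safer lookup with os.environ.get() and how to set a variable for the current process. The helper get_env and the variable names DEMO_TIP_VAR and DEMO_TIP_MISSING are hypothetical, chosen only for the demo.

```python
import os

def get_env(name, default=None):
    """Return the value of environment variable `name`, or `default` if it is unset."""
    # Unlike os.environ[name], .get() does not raise KeyError for a missing key.
    return os.environ.get(name, default)

# Assigning to os.environ sets the variable for this process and for child
# processes started afterwards; the parent shell is not affected.
os.environ['DEMO_TIP_VAR'] = 'hello'      # hypothetical variable name
os.environ.pop('DEMO_TIP_MISSING', None)  # make sure this one is unset

print(get_env('DEMO_TIP_VAR'))             # hello
print(get_env('DEMO_TIP_MISSING', 'n/a'))  # n/a
```

os.getenv(name, default) is an equivalent shortcut for reading. Values stored in os.environ must be strings.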
Encoding in Python: encode() & decode()
Encoding is the process of transforming a string into a specialized format for efficient storage or transmission. More precisely, encoding turns text into a sequence of bytes, which can be turned back into the same text only when it is decoded with the same encoding that was used to encode it. The set of characters that an encoding can represent is called its character set.
For example, let's compare two character standards: ASCII and Unicode.
ASCII (American Standard Code for Information Interchange) defines 128 characters (code points 0 to 127), which is roughly the set of characters you can type on a standard US keyboard plus a group of control characters. It covers the digits, the uppercase and lowercase English letters, punctuation and a handful of other symbols.
Unicode, by contrast, aims to cover almost every character in use. According to Wikipedia, at the time of writing it contained over 128 thousand characters covering 135 modern and historic scripts, as well as multiple symbol sets. Python 3 strings are sequences of Unicode characters, and UTF-8, the most widely used encoding of Unicode, is the default encoding for str.encode() and bytes.decode().
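The size difference between ASCII and Unicode is easy to see from Python: ord() returns a character's Unicode code point, and encoding the character to UTF-8 shows how many bytes it needs. A short sketch; the sample characters are arbitrary choices.

```python
# ord() returns a character's Unicode code point; UTF-8 is a variable-length
# encoding that uses 1 to 4 bytes per character.
for ch in ['A', 'é', '€', '😀']:
    code_point = ord(ch)
    utf8 = ch.encode('utf-8')
    in_ascii = code_point < 128  # ASCII covers code points 0-127 only
    print(ch, hex(code_point), len(utf8), in_ascii)
# A 0x41 1 True
# é 0xe9 2 False
# € 0x20ac 3 False
# 😀 0x1f600 4 False
```

Because the first 128 Unicode code points coincide with ASCII and UTF-8 stores each of them in a single byte, pure-ASCII text produces identical bytes in both encodings.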
In Python, the str.encode() method turns a string into a bytes object, and bytes.decode() turns a bytes object back into a string. Decoding with the same encoding that was used for encoding gives back the original string.
>>> "hello".encode(encoding='ascii')
b'hello'
>>> b'hello'.decode(encoding='ascii')
'hello'
Python raises a UnicodeEncodeError when you try to encode a string that contains one or more characters missing from the target encoding's character set. Likewise, decoding bytes that are not valid in the chosen encoding raises a UnicodeDecodeError.
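A sketch of both errors follows, along with the errors= argument that encode() and decode() accept for choosing a fallback instead of raising ('replace' and 'ignore' are two of the standard error handlers). The sample string 'café' is an arbitrary choice.

```python
text = "café"

# UTF-8 can represent every Unicode character; 'é' takes two bytes.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)                  # b'caf\xc3\xa9'
print(utf8_bytes.decode('utf-8'))  # café

# ASCII has no 'é', so strict encoding (the default) fails.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc.reason)  # UnicodeEncodeError: ordinal not in range(128)

# The errors= argument picks a fallback instead of raising.
print(text.encode('ascii', errors='replace'))  # b'caf?'
print(text.encode('ascii', errors='ignore'))   # b'caf'

# Decoding with the wrong encoding can fail too: 0xc3 is not a valid ASCII byte.
try:
    utf8_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)  # UnicodeDecodeError: ordinal not in range(128)
```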
Using a buffer while manipulating files
A buffer holds a chunk of data from the operating system's file stream until it is consumed, at which point more data is loaded into the buffer. Using buffers is good practice because interacting with the raw stream can have high latency, i.e. fetching data from it and writing to it can take considerable time. Let's take an example.
Say you want to read 100 characters from a file over a network every 2 minutes. Instead of reading from the raw file stream each time, it is better to load a portion of the file into a buffer in memory and consume it when needed. Then the next portion of the file is loaded into the buffer, and so on.
The following snippet reads a 196-byte file through a 20-byte buffer and appends its contents to another file, 20 bytes at a time.
# A practical example would use much larger buffer and file sizes.
buffersize = 20  # maximum number of bytes to read in one go

# Binary mode ('rb'/'ab') makes read() count bytes rather than characters.
# 'ab' opens the output in append mode and creates the file if it doesn't exist.
with open('fileToBeReadFrom.txt', 'rb') as inputFile, \
        open('fileToBeWrittenInto.txt', 'ab') as outputFile:
    buffer = inputFile.read(buffersize)  # first chunk of up to 20 bytes
    counter = 0  # counts the 20-byte instalments that have been copied
    # Write the contents of the buffer to the other file, 20 bytes at a time;
    # read() returns an empty bytes object at end of file, which ends the loop.
    while buffer:
        counter += 1
        outputFile.write(buffer)
        print(counter)
        buffer = inputFile.read(buffersize)  # next 20 bytes from the input file
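Python's open() already places a buffered reader or writer between your code and the raw OS stream, so a manual loop like the one above mainly controls how much data is handled per step. A more compact sketch of the same chunked copy uses iter() with a sentinel. The function name copy_in_chunks and the temporary file names are illustrative, and unlike the snippet above this version overwrites the output ('wb') instead of appending.

```python
import io
import os
import tempfile
from functools import partial

def copy_in_chunks(src_path, dst_path, chunk_size=20):
    """Copy src_path to dst_path chunk_size bytes at a time; return the chunk count."""
    chunks = 0
    # buffering=io.DEFAULT_BUFFER_SIZE only makes open()'s default buffering explicit.
    with open(src_path, 'rb', buffering=io.DEFAULT_BUFFER_SIZE) as src, \
            open(dst_path, 'wb') as dst:
        # iter(callable, sentinel) keeps calling src.read(chunk_size)
        # until it returns b'' (end of file).
        for chunk in iter(partial(src.read, chunk_size), b''):
            dst.write(chunk)
            chunks += 1
    return chunks

# Demo with temporary files so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.bin')
    dst = os.path.join(tmp, 'out.bin')
    with open(src, 'wb') as f:
        f.write(b'x' * 196)         # a 196-byte input file
    print(copy_in_chunks(src, dst))  # 10 (nine 20-byte chunks plus one 16-byte chunk)
    with open(dst, 'rb') as f:
        print(len(f.read()))         # 196
```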
Removing duplicate items from lists
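A simple way to remove duplicate items from a list is to convert it to a set and then back to a list, using the constructors of the built-in set and list classes. Note that this only works when the items are hashable, and because sets are unordered, the original order of the list is not guaranteed to be preserved.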
>>> aList = [1, 2, 3, 1, 2, 3, 4, 5]
>>> list(set(aList))
[1, 2, 3, 4, 5]
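When order matters, one common alternative (not the approach shown above) builds a dict from the items: dictionaries keep insertion order, guaranteed since Python 3.7, so only the first occurrence of each item survives in its original position. The helper name dedupe_keep_order is hypothetical.

```python
def dedupe_keep_order(items):
    """Remove duplicates, keeping the first occurrence of each item in order.

    Like set(), this requires the items to be hashable.
    """
    # dict.fromkeys() makes each item a key; repeated keys are dropped.
    return list(dict.fromkeys(items))

print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
print(dedupe_keep_order(['b', 'a', 'b']))  # ['b', 'a']
```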
The "_" identifier in Interactive Sessions
In an interactive session, the _ identifier holds the result of the most recently evaluated expression, which makes it handy for reusing the last computed result without retyping it. Statements such as assignments do not update it, and results that are None are not stored. The interpreter sets _ this way only in interactive sessions, not when running modules or scripts.
>>> a = 5
>>> a
5
>>> _
5
>>> sum(range(10))
45
>>> _
45
>>> _ + _
90
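Under the hood, the interactive interpreter passes the value of each expression statement to sys.displayhook. The default hook prints repr(value) and stores the value in builtins._, but does nothing when the value is None. Calling the hook by hand in a script is a small sketch of that mechanism, not something you would normally do in real code.

```python
import builtins
import sys

# Simulate what the interactive interpreter does after evaluating an expression.
sys.displayhook(sum(range(10)))  # prints 45 and stores it in builtins._
print(builtins._)                # 45

# None results are neither printed nor stored, so _ keeps its previous value.
sys.displayhook(None)            # prints nothing
print(builtins._)                # 45
```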
Advanced Text Matching with Regular Expressions
A regular expression (also known as a regex) is a sequence of characters that defines a search pattern used to match text within a longer string. Special symbols in the pattern match different kinds of characters, and the exact meaning of these symbols varies between languages. Python added support for regular expressions in version 1.5; its current engine, SRE, comes from Secret Labs AB, and its syntax is modeled closely on Perl's. The built-in re module gives you access to this advanced string matching.
>>> import re
>>> # The following pattern looks for a pair of paragraph tags and their contents.
>>> re.search('<p>.*?</p>', '<p>Paragraph 1</p>')
<re.Match object; span=(0, 18), match='<p>Paragraph 1</p>'>
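The ? in .*? makes the match non-greedy, which matters as soon as a string has more than one paragraph. The sketch below contrasts greedy and non-greedy matching and extracts the text inside each pair with a capturing group. The sample html string is made up, and for real-world HTML a proper parser such as the standard library's html.parser module is more robust than regular expressions.

```python
import re

html = '<p>Paragraph 1</p><p>Paragraph 2</p>'  # made-up sample input

# Greedy .* runs to the LAST '</p>', swallowing both paragraphs.
print(re.search(r'<p>.*</p>', html).group())
# <p>Paragraph 1</p><p>Paragraph 2</p>

# Non-greedy .*? stops at the FIRST '</p>' after the opening tag.
print(re.search(r'<p>.*?</p>', html).group())
# <p>Paragraph 1</p>

# With a capturing group, findall() returns only the text inside each pair.
print(re.findall(r'<p>(.*?)</p>', html))
# ['Paragraph 1', 'Paragraph 2']

# Compiling once is convenient when a pattern is reused.
paragraph = re.compile(r'<p>(.*?)</p>')
for match in paragraph.finditer(html):
    print(match.start(), match.group(1))
# 0 Paragraph 1
# 18 Paragraph 2
```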