Python @ DjangoSpin

Encoding in Python: encode() & decode()



Encoding is the process of transforming a string into a specialized format for efficient storage or transmission. In other words, encoding turns content into a sequence of bytes, which makes sense again only when it is decoded with the same encoding that was used to produce it. A character encoding defines how each character in a character set is represented as bytes.

For example, let's talk about two standards: ASCII and Unicode.

ASCII (American Standard Code for Information Interchange) defines a total of 128 characters (code points 0 to 127), which is roughly the set of characters you can type on a standard US keyboard, plus a number of control characters. You can view the list of symbols here. Basically, it covers digits, uppercase and lowercase letters, and a bunch of other symbols.

Unicode covers almost every character there is. It contains over 128 thousand characters, covering 135 modern and historic scripts, as well as multiple symbol sets, as per Wikipedia. Python 3 strings are sequences of Unicode characters. UTF-8 is one encoding of Unicode, and it is the default encoding used by encode() and decode(). You can read about Unicode here.
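To make the difference between a character set and an encoding concrete, here is a minimal sketch. The variable names are illustrative only. It shows that a character has a single Unicode code point, while the bytes that represent it depend on the encoding chosen.

```python
# A character has exactly one Unicode code point (an integer)...
char = "é"
code_point = ord(char)                 # 233, i.e. U+00E9

# ...but its byte representation depends on the encoding used.
utf8_bytes = char.encode("utf-8")      # two bytes:  b'\xc3\xa9'
latin1_bytes = char.encode("latin-1")  # one byte:   b'\xe9'

# ASCII characters are the first 128 Unicode code points,
# and UTF-8 encodes each of them as the same single byte as ASCII does.
ascii_code_point = ord("A")            # 65
same_bytes = "A".encode("ascii") == "A".encode("utf-8")  # True

print(code_point, utf8_bytes, latin1_bytes, ascii_code_point, same_bytes)
```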

Now for the Python part. The encode() method acts on a string and produces a sequence of bytes (a bytes object). The decode() method acts on bytes and produces the original string, provided the same encoding is used.

>>> "hello".encode(encoding='ascii')
b'hello'
>>> b'hello'.decode(encoding='ascii')
'hello'
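The same round trip works for text outside ASCII when a Unicode encoding such as UTF-8 is used. The sketch below uses an arbitrary sample string. It also shows what happens when bytes are decoded with a different encoding from the one that produced them: no error is raised here, but the result is garbled.

```python
text = "café"

# Encode to UTF-8: 'é' takes two bytes, so the byte count exceeds the character count.
data = text.encode("utf-8")
print(data)                   # b'caf\xc3\xa9'
print(len(text), len(data))   # 4 5

# Decoding with the same encoding restores the original string.
round_trip = data.decode("utf-8")
print(round_trip == text)     # True

# Decoding with a different encoding "succeeds" but yields the wrong characters.
garbled = data.decode("latin-1")
print(garbled)                # cafÃ©
```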

Python raises a UnicodeEncodeError when you try to encode a string with an encoding whose character set lacks one or more of the characters in the string. Likewise, decode() raises a UnicodeDecodeError when the bytes are not valid in the chosen encoding. To read more about encoding and decoding, visit this link.
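As a rough illustration of that error, and of the optional errors argument that str.encode() accepts, consider the sketch below. The sample string is arbitrary. Which error handler suits you depends on whether losing or escaping characters is acceptable in your application.

```python
text = "naïve"

# 'ï' is not an ASCII character, so strict encoding (the default) fails.
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_char = exc.object[exc.start:exc.end]   # the offending character
    print(f"cannot encode {bad_char!r} with {exc.encoding}")

# The errors argument selects a fallback instead of raising.
replaced = text.encode("ascii", errors="replace")          # b'na?ve'
ignored = text.encode("ascii", errors="ignore")            # b'nave'
escaped = text.encode("ascii", errors="backslashreplace")  # b'na\\xefve'

# Decoding can fail the same way: byte 0xEF is outside the ASCII range.
try:
    b"na\xefve".decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(replaced, ignored, escaped, decode_failed)
```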

