Serialization in Python
Serialization:
Serialization is the process of converting native language objects into a sequence of bytes, or into the objects of an interchange format, which can then be stored in a file or kept in memory (as a string or bytes object) so that the data can be loaded in a different session or transmitted over a network. Saving the state of a game when you exit it, and loading it the next time you launch the game, is one example of serialization at work. We'll discuss three Python standard-library modules that implement serialization: pickle, shelve and json.
The pickle Module
The pickle module provides the dump() method to store a data structure in a file and the load() method to read it back, for example in a new session. The serialized data is in a binary format, which load() turns back into the original objects. pickle can store complex Python data structures, such as nested dictionaries and lists.
The signature of dump(), as given in the Python documentation, is as follows:
pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
where obj is the object you want to store as bytes, and file is a file object, opened for writing in binary mode, that points to your file. You will seldom need the optional arguments; if you are interested, see the Python documentation for the pickle module.
>>> import pickle
>>> saveGameDetails = {
    'playerName': 'Ethan',
    'level': 3,
    'arrowCount': 12
}
>>> with open('saveGame01.robinhood', 'wb') as pickleFileHandler:
    pickle.dump(saveGameDetails, pickleFileHandler)
The contents of 'saveGame01.robinhood' are binary. You can confirm this at this point by opening the file in a text editor of your choice.
The pickle module converts the data structure to a series of bytes according to a set of rules called the pickle protocol. This protocol is Python-specific and makes no cross-language compatibility guarantees: programs written in other languages generally cannot recover the original information from a pickled file.
The pickle protocol has changed across Python versions. Python 1.x supported two formats: a text format (protocol 0) and a binary format (protocol 1). Python 2.3 introduced protocol 2, a binary-only format that supports newer features of Python objects. Python 3.0 introduced protocol 3, also binary, which adds explicit support for bytes objects; later releases added protocol 4 (Python 3.4) and protocol 5 (Python 3.8). Because of the difference between bytes and strings, Python 3 can read data pickled by Python 2, but Python 2 cannot read data pickled with protocol 3 or higher.
Since pickled data is binary, files used with pickle must be opened in binary access modes: 'wb' for writing and 'rb' for reading.
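To see the protocol versions and the binary-mode requirement in action, here is a minimal sketch. It relies on the fact that pickles made with protocol 2 or later begin with the PROTO opcode byte 0x80 followed by the protocol number. It also uses in-memory io.BytesIO and io.StringIO streams as stand-ins for files opened in binary ('wb') and text ('w') mode, an assumption made so that the example needs no files on disk.

```python
import io
import pickle

saveGameDetails = {'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}

# From protocol 2 onwards, a pickle starts with the PROTO opcode (0x80)
# followed by the protocol number.
for protocol in range(2, pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(saveGameDetails, protocol=protocol)
    print(protocol, data[:2])

# Protocol 2 is the highest protocol that Python 2 can read.
py2Compatible = pickle.dumps(saveGameDetails, protocol=2)
print(pickle.loads(py2Compatible) == saveGameDetails)  # True

# io.BytesIO stands in for a file opened with 'wb': it accepts the bytes.
binaryStream = io.BytesIO()
pickle.dump(saveGameDetails, binaryStream)

# io.StringIO stands in for a file opened with 'w': writing bytes to it fails.
textModeFailed = False
try:
    pickle.dump(saveGameDetails, io.StringIO())
except TypeError:
    textModeFailed = True
print(textModeFailed)  # True
```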
Let's look at how to un-pickle the serialized data. Signature of the load() method:
pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
where file is a file object, opened in binary mode ('rb'), from which the data is read.
>>> loadGameDetails
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    loadGameDetails
NameError: name 'loadGameDetails' is not defined
>>> with open('saveGame01.robinhood', 'rb') as pickleFileHandler:
    loadGameDetails = pickle.load(pickleFileHandler)

>>> loadGameDetails
{'level': 3, 'playerName': 'Ethan', 'arrowCount': 12}
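You are not limited to pickling dictionaries. Primitive data types (strings, integers, etc.), data structures (sets, dictionaries, etc.) and instances of most classes can all be pickled. Functions and classes can be pickled too, but only if they are defined at the top level of a module (the first indentation level). This is because pickle stores them by reference, as a module name plus a qualified name, rather than storing their code, so the defining module must be importable when the data is un-pickled.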
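As a quick check of what "by reference" means, this sketch pickles a standard-library object, a module-level function and a lambda. It uses datetime.date and math.sqrt purely as convenient examples. Any importable class or top-level function should behave the same way.

```python
import datetime
import math
import pickle

# An instance of a standard-library class round-trips by value.
lastSaved = datetime.date(2020, 1, 31)
restoredDate = pickle.loads(pickle.dumps(lastSaved))
print(restoredDate == lastSaved)  # True

# A top-level function is stored by reference: the pickle contains the
# names 'math' and 'sqrt', and loading looks up that same object again.
sqrtPickle = pickle.dumps(math.sqrt)
print(b'sqrt' in sqrtPickle)                  # True
print(pickle.loads(sqrtPickle) is math.sqrt)  # True

# A lambda has no name that can be looked up, so pickling it fails.
lambdaFailed = False
try:
    pickle.dumps(lambda x: x * 2)
except (pickle.PicklingError, AttributeError):
    lambdaFailed = True
print(lambdaFailed)  # True
```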
Pickling without a file
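You can also pickle data without a file, using the dumps() and loads() methods. dumps() converts the data into a bytes object, and loads() turns that bytes object back into the original data. The bytes object can be transmitted over a network and un-pickled on a destination machine. The exact bytes depend on the pickle protocol, so the output below varies between Python versions.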
>>> import pickle
>>> saveGameDetails = {
    'playerName': 'Ethan',
    'level': 3,
    'arrowCount': 12
}
>>> saveGameDetailsBinary = pickle.dumps(saveGameDetails)
>>> saveGameDetailsBinary
b'\x80\x03}q\x00(X\n\x00\x00\x00arrowCountq\x01K\x0cX\n\x00\x00\x00playerNameq\x02X\x05\x00\x00\x00Ethanq\x03X\x05\x00\x00\x00levelq\x04K\x03u.'
>>> loadGameDetails = pickle.loads(saveGameDetailsBinary)
>>> loadGameDetails
{'level': 3, 'playerName': 'Ethan', 'arrowCount': 12}
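To make "transmitted over a network" concrete, here is a minimal sketch that sends a pickled object through a pair of connected sockets on the local machine. The 4-byte length prefix and the helper names sendObject(), receiveExactly() and receiveObject() are illustrative assumptions, not part of the pickle API. Keep in mind that the pickle documentation warns against un-pickling data from untrusted sources, because loading a pickle can execute arbitrary code. Only use this between programs you control.

```python
import pickle
import socket
import struct

def sendObject(sock, obj):
    """Hypothetical helper: pickle obj and send it with a 4-byte length prefix."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack('!I', len(payload)) + payload)

def receiveExactly(sock, size):
    """Read exactly size bytes, since recv() may return fewer."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = sock.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError('socket closed before the message was complete')
        buffer.extend(chunk)
    return bytes(buffer)

def receiveObject(sock):
    """Hypothetical helper: read the length prefix, then un-pickle the payload."""
    (size,) = struct.unpack('!I', receiveExactly(sock, 4))
    return pickle.loads(receiveExactly(sock, size))

saveGameDetails = {'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}

# socketpair() returns two connected sockets, standing in for a real network link.
sender, receiver = socket.socketpair()
with sender, receiver:
    sendObject(sender, saveGameDetails)
    received = receiveObject(receiver)

print(received == saveGameDetails)  # True
```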
The shelve Module
The pickle module serializes a Python object (or objects) as a single stream of bytes in a file. The shelve module builds on this and implements a persistent, dictionary-like object: each object is pickled and stored under a key (a string), and that key is used to retrieve the corresponding pickle when the shelf is opened again. This is more convenient when you want to serialize many objects. See the Python documentation for the shelve module for details.
It works like this:
- Using the open() method provided by the shelve module, you open a file which you want to use to store the information. The open() method returns a shelf object.
- This shelf object behaves like a dictionary: you can add key-pickle pairs to it, where the key is used to identify the 'pickle'.
- Using this key, you can load the data, just like you would in a dictionary.
In this way, you can store multiple 'pickles' onto a shelf.
Signature of shelve.open():
shelve.open(filename, flag='c', protocol=None, writeback=False)
>>> import shelve
>>> saveGameOneDetails = {'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}
>>> saveGameTwoDetails = {'playerName': 'Ethan', 'level': 5, 'arrowCount': 6}
>>> with shelve.open('save_games.robinhood') as saveGames:  # like saveGames = shelve.open('save_games.robinhood'), but closed automatically
    saveGames['saveGame001'] = saveGameOneDetails

>>> with shelve.open('save_games.robinhood') as saveGames:
    saveGames['saveGame002'] = saveGameTwoDetails

# Which files are created depends on the dbm backend available. With dbm.dumb, this creates
# save_games.robinhood.dat (containing the serialized bytes, like the output file of pickle) and
# save_games.robinhood.dir (containing records of the individual pickles, like a register).
# contents of .dir:
'saveGame002', (512, 70)
'saveGame001', (0, 70)

# UNPICKLING A SPECIFIC PICKLE FROM THE SHELF
>>> with shelve.open('save_games.robinhood') as saveGames:
    loadGameOneDetails = saveGames['saveGame001']
    print(loadGameOneDetails)

{'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
>>> loadGameOneDetails == saveGameOneDetails
True
The open() method provided by the shelve module has two optional arguments that I would like to talk about: flag and writeback. The flag argument is the access mode in which the shelf is opened, and it has the same meaning as in dbm.open(). 'r' opens an existing shelf for reading only. 'w' opens an existing shelf for reading and writing. 'c', the default, opens the shelf for reading and writing and creates it if it doesn't exist. 'n' always creates a new, empty shelf, discarding any existing contents.
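The following sketch exercises the 'r', 'c' and 'n' flag values against a shelf in a temporary directory. The file name save_games is an arbitrary choice. Which files appear on disk depends on the dbm backend your Python installation uses.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'save_games')

    # 'r' never creates a shelf, so opening a missing one raises dbm.error.
    missingFailed = False
    try:
        shelve.open(path, flag='r')
    except dbm.error:
        missingFailed = True

    # 'c' (the default) creates the shelf if it does not exist.
    with shelve.open(path, flag='c') as saveGames:
        saveGames['saveGame001'] = {'playerName': 'Ethan', 'level': 3}

    # 'r' can read an existing shelf.
    with shelve.open(path, flag='r') as saveGames:
        readBack = saveGames['saveGame001']

    # 'n' always starts from a new, empty shelf, discarding the old contents.
    with shelve.open(path, flag='n') as saveGames:
        emptyCount = len(saveGames)

print(missingFailed, readBack, emptyCount)
```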
The other argument, writeback, is a boolean with a default value of False. By default, a 'pickle' is written to the file only when it is assigned to a key of the shelf, so modifying a mutable entry in place (for example, adding a key to a stored dictionary) is not saved. When writeback is set to True, every entry you access is also cached in memory, and the cache is written back to the file when you call the shelf's sync() or close() method. Although this makes it easy to modify pickles in place, it can consume a large amount of memory and makes closing the shelf slow, since every accessed pickle is written back.
# when writeback = False
>>> import shelve
>>> saveGamesShelf = shelve.open('saveGames')  # writeback = False by default
>>> saveGamesShelf['saveGameOne'] = {'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
>>> print(saveGamesShelf['saveGameOne'])
{'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
>>> saveGamesShelf['saveGameOne']['livesSpared'] = '90%'  # adding information to the 'pickle'
>>> print(saveGamesShelf['saveGameOne'])
{'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
# the new information is not reflected, as writeback is False by default
>>>
>>> alteredSaveGameOne = saveGamesShelf['saveGameOne']  # indirect modification of the 'pickle'
>>> alteredSaveGameOne['livesSpared'] = '90%'  # adding information to the copy
>>> saveGamesShelf['saveGameOne'] = alteredSaveGameOne  # assigning the modified copy back to the key
>>> print(saveGamesShelf['saveGameOne'])
{'level': 3, 'arrowCount': 12, 'playerName': 'Ethan', 'livesSpared': '90%'}
>>> saveGamesShelf.close()

# when writeback = True
>>> saveGamesShelf = shelve.open('saveGames', writeback = True)
>>> saveGamesShelf['saveGameOne'] = {'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
>>> print(saveGamesShelf['saveGameOne'])
{'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
>>> saveGamesShelf['saveGameOne']['livesSpared'] = '90%'
>>> print(saveGamesShelf['saveGameOne'])
{'level': 3, 'arrowCount': 12, 'playerName': 'Ethan', 'livesSpared': '90%'}
# the new information is reflected, as writeback = True allows direct modification
>>> saveGamesShelf.close()
To get a list of the stored pickles, use the keys() method of the shelf object.
>>> with shelve.open('save_games.robinhood') as saveGamesDetails:
    for saveGame in saveGamesDetails.keys():
        print(saveGame, end = " : ")
        print(saveGamesDetails[saveGame])

saveGame002 : {'level': 5, 'arrowCount': 6, 'playerName': 'Ethan'}
saveGame001 : {'level': 3, 'arrowCount': 12, 'playerName': 'Ethan'}
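Since a shelf is a dictionary-like object, the other familiar mapping operations work on it too. This sketch uses a throwaway shelf in a temporary directory and shows len(), the in operator, items() and del. The items are sorted only because the key order of a shelf depends on the dbm backend.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'save_games')
    with shelve.open(path) as saveGames:
        saveGames['saveGame001'] = {'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}
        saveGames['saveGame002'] = {'playerName': 'Ethan', 'level': 5, 'arrowCount': 6}

        print(len(saveGames))              # 2
        print('saveGame001' in saveGames)  # True

        # items() yields (key, un-pickled value) pairs, like a dictionary.
        for key, details in sorted(saveGames.items()):
            print(key, ':', details['level'])

        del saveGames['saveGame001']       # removes that pickle from the shelf
        remainingKeys = sorted(saveGames.keys())

print(remainingKeys)  # ['saveGame002']
```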
The json Module
The data format used by the pickle and shelve modules is Python-specific, so it doesn't help when data must be shared with programs written in other languages. JSON is a serialization format suited to that purpose.
JavaScript Object Notation, or JSON, is derived from JavaScript object literal syntax. As the name suggests, it is a notation for describing objects, and these objects serve as a compact way of transmitting data between systems. Thanks to its low overhead, it has displaced XML as the data interchange format of choice in many applications. One significant difference from pickle and shelve is that JSON is human-readable, since it is stored as text rather than in binary form. See the Python documentation for the json module for details. Here is an example of JSON data:
{
    "firstName": "Ethan",
    "lastName": "Hunt",
    "address": {
        "streetAddress": "221B Baker Street",
        "city": "Narnia",
        "country": "La-La Land"
    },
    "phoneNumbers": [
        "+91-978-675-6452",
        "+91-978-567-2345"
    ]
}
Python aside, let's have a brief overview of these JSON concepts:
- Object: An object in JSON is a collection of string-value pairs. It is enclosed by curly braces { }.
- Array: An array in JSON is an ordered list of values. It is enclosed in square brackets [ ]. The values corresponding to the "phoneNumbers" string constitute an array.
- Value: Values in JSON could hold any of the following: a string, a number, an object, an array, literal 'true', literal 'false' and 'null'. In the example, "Ethan" is a value, so is "+91-978-675-6452".
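To connect these concepts to Python, here is a minimal sketch that parses a trimmed-down version of the sample above with json.loads() and inspects the resulting types. The isAgent and middleName fields are hypothetical additions, included so that the true and null literals appear.

```python
import json

sample = '''
{
    "firstName": "Ethan",
    "address": {"city": "Narnia", "country": "La-La Land"},
    "phoneNumbers": ["+91-978-675-6452", "+91-978-567-2345"],
    "isAgent": true,
    "middleName": null
}
'''

person = json.loads(sample)
print(type(person).__name__)                     # dict  (JSON object)
print(type(person['address']).__name__)          # dict  (nested JSON object)
print(type(person['phoneNumbers']).__name__)     # list  (JSON array)
print(person['phoneNumbers'][0])                 # +91-978-675-6452
print(person['isAgent'], person['middleName'])   # True None
```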
Now let's implement serialization using JSON in Python to see how the json module transforms Python objects into JSON objects. The json module provides the dump() method, which has the following arguments, as given in the documentation:
json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
dump() serializes obj as a JSON formatted stream and writes it to fp, a file object (or any object with a write() method) opened in text mode. For basic use, you can ignore the remaining arguments except indent, separators and sort_keys. I'll elaborate on these arguments later; first, let's look at an example of dumping a Python data structure to a file as JSON.
>>> import json
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> saveGameDetails = {
    'playerName': 'Ethan',
    'level': 3,
    'livesSpared': 0.88,
    'easyDifficultyLevel': False,
    'intermediateDifficultyLevel': True,
    'hardDifficultyLevel': False,
    'visitsToHolyLand': None,
    'merryMen': {
        'Healer': 'Maid Marian',
        'SwordsMan': 'Will Scarlet',
        'FistFighter': 'Little John'
    },
    'locations': [
        'Nottinghamshire',
        'Yorkshire',
        'Sherwood'
    ]
}
>>> json.dump(saveGameDetails, jsonFileHandler)
>>> jsonFileHandler.close()
Let's view how this data is stored in 'saveGame.robinhood':
{"locations": ["Nottinghamshire", "Yorkshire", "Sherwood"], "level": 3, "intermediateDifficultyLevel": true, "merryMen": {"FistFighter": "Little John", "Healer": "Maid Marian", "SwordsMan": "Will Scarlet"}, "easyDifficultyLevel": false, "hardDifficultyLevel": false, "livesSpared": 0.88, "visitsToHolyLand": null, "playerName": "Ethan"}

# Putting it in a more readable form:
{
    "locations": [
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ],
    "level": 3,
    "intermediateDifficultyLevel": true,
    "merryMen": {
        "FistFighter": "Little John",
        "Healer": "Maid Marian",
        "SwordsMan": "Will Scarlet"
    },
    "easyDifficultyLevel": false,
    "hardDifficultyLevel": false,
    "livesSpared": 0.88,
    "visitsToHolyLand": null,
    "playerName": "Ethan"
}
Wow! That's really neat. To highlight a few subtle changes that our data went through:
- Our dictionary 'merryMen' got converted to a JSON object.
- Our list 'locations' got converted to a JSON array.
- All strings, including the dictionary keys, got wrapped in double quotes instead of single quotes.
- Integer 'level' and floating point number 'livesSpared' remained as they were.
- Value changes: True -> true, False -> false, None -> null
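The changes listed above can be reproduced on a small dictionary with json.dumps(), which returns the JSON text as a string instead of writing it to a file. The expected string in the comment assumes Python 3.7 or later, where dictionaries keep their insertion order.

```python
import json

# Each change from the list above, checked on a small dictionary.
original = {
    'playerName': 'Ethan',          # single-quoted Python string
    'level': 3,                     # int stays a number
    'livesSpared': 0.88,            # float stays a number
    'flags': [True, False, None],   # Python keywords
    'merryMen': {'Healer': 'Maid Marian'},
}
encoded = json.dumps(original)
print(encoded)
# {"playerName": "Ethan", "level": 3, "livesSpared": 0.88, "flags": [true, false, null], "merryMen": {"Healer": "Maid Marian"}}
```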
We'll look at these changes again later; for now, let's focus on the open() and dump() calls we used here. First, we opened the file we wanted to store the data in with an encoding of 'utf-8'. Because JSON is a text-based format, it is safest to open such files with an explicit character encoding rather than relying on the platform default. Specifying the encoding when writing ensures that any special symbols or non-English characters come back unchanged when you read the file using the same encoding. If you are not familiar with character encodings, it is worth reading up on them. UTF-8 can encode every character in the Unicode character set, so it is a safe choice for our purposes; the JSON specification (RFC 8259) also requires UTF-8 for JSON exchanged between systems.
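A related detail is dump()'s ensure_ascii argument, which appears in the signature above. It defaults to True and escapes every non-ASCII character. This sketch compares both settings and round-trips the data through a UTF-8 file in a temporary directory. The 'city' value with an accented letter is a hypothetical addition to the save game.

```python
import json
import os
import tempfile

details = {'playerName': 'Ethan', 'city': 'Zürich'}

# By default (ensure_ascii=True) non-ASCII characters are written as \u escapes.
asciiOnly = json.dumps(details)
print(asciiOnly)  # {"playerName": "Ethan", "city": "Z\u00fcrich"}

# With ensure_ascii=False they are written as-is, so the file encoding matters.
asIs = json.dumps(details, ensure_ascii=False)
print(asIs)       # {"playerName": "Ethan", "city": "Zürich"}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'saveGame.robinhood')
    with open(path, mode='w', encoding='utf-8') as jsonFileHandler:
        json.dump(details, jsonFileHandler, ensure_ascii=False)
    # Reading with the same encoding gives back the same characters.
    with open(path, mode='r', encoding='utf-8') as jsonFileHandler:
        loaded = json.load(jsonFileHandler)

print(loaded == details)  # True
```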
Let's move on to the arguments of the dump() function. We'll talk about three optional arguments: indent, separators and sort_keys. The indent argument, when specified, makes your data easier to read. Its default value is None, which puts all content on a single line without any newline characters, as we saw in the example above. An indent of 0, a negative number, or "" inserts only newlines, one after each element, without any indentation. A positive integer indents each nesting level by that many spaces. If indent is a string, such as '\t', that string is used to indent each level. The JSON under the comment "# Putting it in a more readable form:" above was produced with an indent of 4. Here are a few more examples.
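For a compact comparison of these indent settings, this sketch renders a small made-up dictionary with indent values of None, 0, 2 and '\t'. It assumes Python 3.7 or later so that keys keep their insertion order.

```python
import json

sample = {'level': 3, 'locations': ['Yorkshire', 'Sherwood']}

# Render the same data with each indent setting described above.
rendered = {}
for indent in (None, 0, 2, '\t'):
    rendered[indent] = json.dumps(sample, indent=indent)
    print(f'indent={indent!r}')
    print(rendered[indent])
```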
# default indent i.e. None
>>> import json
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> saveGameDetails = {
    'playerName': 'Ethan',
    'level': 3,
    'livesSpared': 0.88,
    'easyDifficultyLevel': False,
    'intermediateDifficultyLevel': True,
    'hardDifficultyLevel': False,
    'visitsToHolyLand': None,
    'merryMen': {
        'Healer': 'Maid Marian',
        'SwordsMan': 'Will Scarlet',
        'FistFighter': 'Little John'
    },
    'locations': [
        'Nottinghamshire',
        'Yorkshire',
        'Sherwood'
    ]
}
>>> json.dump(saveGameDetails, jsonFileHandler)
>>> jsonFileHandler.close()
# content of saveGame.robinhood
{"locations": ["Nottinghamshire", "Yorkshire", "Sherwood"], "level": 3, "intermediateDifficultyLevel": true, "merryMen": {"FistFighter": "Little John", "Healer": "Maid Marian", "SwordsMan": "Will Scarlet"}, "easyDifficultyLevel": false, "hardDifficultyLevel": false, "livesSpared": 0.88, "visitsToHolyLand": null, "playerName": "Ethan"}

# indent = 0
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> json.dump(saveGameDetails, jsonFileHandler, indent = 0)
>>> jsonFileHandler.close()
# content of saveGame.robinhood
{
"locations": [
"Nottinghamshire",
"Yorkshire",
"Sherwood"
],
"level": 3,
"intermediateDifficultyLevel": true,
"merryMen": {
"FistFighter": "Little John",
"Healer": "Maid Marian",
"SwordsMan": "Will Scarlet"
},
"easyDifficultyLevel": false,
"hardDifficultyLevel": false,
"livesSpared": 0.88,
"visitsToHolyLand": null,
"playerName": "Ethan"
}

# indent = 2
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> json.dump(saveGameDetails, jsonFileHandler, indent = 2)
>>> jsonFileHandler.close()
# contents of saveGame.robinhood
{
  "locations": [
    "Nottinghamshire",
    "Yorkshire",
    "Sherwood"
  ],
  "level": 3,
  "intermediateDifficultyLevel": true,
  "merryMen": {
    "FistFighter": "Little John",
    "Healer": "Maid Marian",
    "SwordsMan": "Will Scarlet"
  },
  "easyDifficultyLevel": false,
  "hardDifficultyLevel": false,
  "livesSpared": 0.88,
  "visitsToHolyLand": null,
  "playerName": "Ethan"
}

# indent = 4
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> json.dump(saveGameDetails, jsonFileHandler, indent = 4)
>>> jsonFileHandler.close()
# contents of saveGame.robinhood
{
    "locations": [
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ],
    "level": 3,
    "intermediateDifficultyLevel": true,
    "merryMen": {
        "FistFighter": "Little John",
        "Healer": "Maid Marian",
        "SwordsMan": "Will Scarlet"
    },
    "easyDifficultyLevel": false,
    "hardDifficultyLevel": false,
    "livesSpared": 0.88,
    "visitsToHolyLand": null,
    "playerName": "Ethan"
}
The optional separators argument of dump() tells it how to delimit items and keys in the resulting JSON string. It takes a 2-item tuple: the first element is the item separator and the second is the key separator. The item separator delimits items such as the elements of an array and the key-value pairs of an object. The key separator delimits keys from values in objects. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. The following example illustrates these separators.
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> json.dump(saveGameDetails, jsonFileHandler, indent = 4, separators = ('#', '@'))
>>> jsonFileHandler.close()
# contents of saveGame.robinhood
{
    "locations"@[
        "Nottinghamshire"#
        "Yorkshire"#
        "Sherwood"
    ]#
    "level"@3#
    "intermediateDifficultyLevel"@true#
    "merryMen"@{
        "FistFighter"@"Little John"#
        "Healer"@"Maid Marian"#
        "SwordsMan"@"Will Scarlet"
    }#
    "easyDifficultyLevel"@false#
    "hardDifficultyLevel"@false#
    "livesSpared"@0.88#
    "visitsToHolyLand"@null#
    "playerName"@"Ethan"
}
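A common practical use of separators is producing the most compact JSON possible, for example to save bandwidth. This sketch uses a small made-up dictionary. It contrasts the default separators with the compact (',', ':') pair and shows that, when indent is set, the default item separator leaves no trailing spaces at the ends of lines.

```python
import json

scores = {'level': 3, 'locations': ['Nottinghamshire', 'Sherwood']}

# indent=None: the default separators are (', ', ': ')
spaced = json.dumps(scores)
print(spaced)   # {"level": 3, "locations": ["Nottinghamshire", "Sherwood"]}

# Compact form: drop the spaces after ',' and ':'
compact = json.dumps(scores, separators=(',', ':'))
print(compact)  # {"level":3,"locations":["Nottinghamshire","Sherwood"]}

# With indent set, the default item separator is ',' (no trailing space)
indented = json.dumps([1, 2], indent=2)
print(repr(indented))  # '[\n  1,\n  2\n]'
```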
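The optional sort_keys argument of dump(), as the name suggests, sorts the keys of every object, nested objects included, when it is set to True. Its default value is False. Note that the unsorted output below comes from an older Python version: since Python 3.7, dictionaries preserve insertion order, so without sort_keys the keys are written in the order in which they were inserted.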
# sort_keys = False by default
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> json.dump(saveGameDetails, jsonFileHandler, indent = 4)
>>> jsonFileHandler.close()
# contents of saveGame.robinhood
{
    "locations": [
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ],
    "level": 3,
    "intermediateDifficultyLevel": true,
    "merryMen": {
        "FistFighter": "Little John",
        "Healer": "Maid Marian",
        "SwordsMan": "Will Scarlet"
    },
    "easyDifficultyLevel": false,
    "hardDifficultyLevel": false,
    "livesSpared": 0.88,
    "visitsToHolyLand": null,
    "playerName": "Ethan"
}

# sort_keys = True
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'w', encoding = 'utf-8')
>>> json.dump(saveGameDetails, jsonFileHandler, indent = 4, sort_keys = True)
>>> jsonFileHandler.close()
# contents of saveGame.robinhood
{
    "easyDifficultyLevel": false,
    "hardDifficultyLevel": false,
    "intermediateDifficultyLevel": true,
    "level": 3,
    "livesSpared": 0.88,
    "locations": [
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ],
    "merryMen": {
        "FistFighter": "Little John",
        "Healer": "Maid Marian",
        "SwordsMan": "Will Scarlet"
    },
    "playerName": "Ethan",
    "visitsToHolyLand": null
}
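The sketch below shows that sort_keys also sorts nested objects. It contrasts sorted output with the insertion order that Python 3.7 and later preserve by default. The dictionary is a trimmed version of the save game.

```python
import json

merryMen = {'SwordsMan': 'Will Scarlet', 'Healer': 'Maid Marian', 'FistFighter': 'Little John'}
details = {'playerName': 'Ethan', 'merryMen': merryMen, 'level': 3}

# Python 3.7+: keys appear in insertion order.
insertionOrder = json.dumps(details)
print(insertionOrder)
# {"playerName": "Ethan", "merryMen": {"SwordsMan": "Will Scarlet", "Healer": "Maid Marian", "FistFighter": "Little John"}, "level": 3}

# sort_keys=True sorts the top-level keys and the nested ones.
sortedOrder = json.dumps(details, sort_keys=True)
print(sortedOrder)
# {"level": 3, "merryMen": {"FistFighter": "Little John", "Healer": "Maid Marian", "SwordsMan": "Will Scarlet"}, "playerName": "Ethan"}
```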
Deserializing JSON data: Converting JSON formatted string to Python objects
The beauty of the JSON format is that almost every modern language has built-in or widely used library support for reading it. Let's see how Python does it. The json module provides the load() method to deserialize JSON data into Python objects. Its signature, as given in the documentation, is:
json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
We will look only at the first argument here; you can look up the rest in the documentation linked in the Helpful Links section at the bottom of this page. The first argument takes the file object (or file handler, whatever you wish to call it) of the file containing the JSON data. The file should be opened in a reading mode, preferably with the same encoding that was used to write it.
# contents of 'saveGame.robinhood'
{
    "easyDifficultyLevel": false,
    "hardDifficultyLevel": false,
    "intermediateDifficultyLevel": true,
    "level": 3,
    "livesSpared": 0.88,
    "locations": [
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ],
    "merryMen": {
        "FistFighter": "Little John",
        "Healer": "Maid Marian",
        "SwordsMan": "Will Scarlet"
    },
    "playerName": "Ethan",
    "visitsToHolyLand": null
}

# loading it in Python
>>> import json
>>> jsonFileHandler = open('saveGame.robinhood', mode = 'r', encoding = 'utf-8')
>>> loadGameDetails = json.load(jsonFileHandler)
>>> loadGameDetails
{'locations': ['Nottinghamshire', 'Yorkshire', 'Sherwood'], 'easyDifficultyLevel': False, 'hardDifficultyLevel': False, 'visitsToHolyLand': None, 'playerName': 'Ethan', 'intermediateDifficultyLevel': True, 'level': 3, 'merryMen': {'FistFighter': 'Little John', 'SwordsMan': 'Will Scarlet', 'Healer': 'Maid Marian'}, 'livesSpared': 0.88}
>>> for key in loadGameDetails:
    print(key, ": ", loadGameDetails[key])

locations :  ['Nottinghamshire', 'Yorkshire', 'Sherwood']
easyDifficultyLevel :  False
hardDifficultyLevel :  False
visitsToHolyLand :  None
playerName :  Ethan
intermediateDifficultyLevel :  True
level :  3
merryMen :  {'FistFighter': 'Little John', 'SwordsMan': 'Will Scarlet', 'Healer': 'Maid Marian'}
livesSpared :  0.88
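To tie dump() and load() together, here is a minimal sketch of a save/load round trip. It uses with-statements, so the files are closed automatically. saveGame() and loadGame() are hypothetical helper names, and the file lives in a temporary directory.

```python
import json
import os
import tempfile

def saveGame(path, details):
    """Hypothetical helper: write details as indented, UTF-8 encoded JSON."""
    with open(path, mode='w', encoding='utf-8') as jsonFileHandler:
        json.dump(details, jsonFileHandler, indent=4, sort_keys=True)

def loadGame(path):
    """Hypothetical helper: read the JSON back into Python objects."""
    with open(path, mode='r', encoding='utf-8') as jsonFileHandler:
        return json.load(jsonFileHandler)

saveGameDetails = {
    'playerName': 'Ethan',
    'level': 3,
    'livesSpared': 0.88,
    'visitsToHolyLand': None,
    'merryMen': {'Healer': 'Maid Marian', 'SwordsMan': 'Will Scarlet'},
    'locations': ['Nottinghamshire', 'Yorkshire', 'Sherwood'],
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'saveGame.robinhood')
    saveGame(path, saveGameDetails)
    loadGameDetails = loadGame(path)

# Dictionaries compare by content, so the key order in the file does not matter.
print(loadGameDetails == saveGameDetails)  # True
```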
Datatype matching in Python and JSON
JSON is meant to be language-independent, so it has its own data types, such as objects, arrays and strings. On serialization, a Python dictionary becomes a JSON object, a Python list becomes a JSON array, and so on. On deserialization, these JSON data types are converted into their Python counterparts. Follow along with the example below, and it will become clear what becomes what.
>>> saveGameDetails = {
    'playerName': 'Ethan',  # Python string
    'level': 3,  # Python integer
    'livesSpared': 0.88,  # Python floating point number
    'easyDifficultyLevel': False,  # Python False
    'intermediateDifficultyLevel': True,  # Python True
    'hardDifficultyLevel': False,
    'visitsToHolyLand': None,
    'merryMen': {  # Python dictionary
        'Healer': 'Maid Marian',
        'SwordsMan': 'Will Scarlet',
        'FistFighter': 'Little John'
    },
    'locations': [  # Python list
        'Nottinghamshire',
        'Yorkshire',
        'Sherwood'
    ],
    'arrowsAndMerryMen': (12, 4),  # Python tuple
}

# SERIALIZATION
>>> with open('saveGame.robinhood', mode = 'w', encoding = 'utf-8') as jsonFileHandler:
    json.dump(saveGameDetails, jsonFileHandler, indent = 4)

# contents of saveGame.robinhood
# (the // annotations are added here for explanation only; JSON itself does not allow comments)
{
    "locations": [  // Python list -> JSON array
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ],
    "easyDifficultyLevel": false,  // Python False -> JSON false
    "hardDifficultyLevel": false,
    "arrowsAndMerryMen": [  // Python tuple -> JSON array
        12,
        4
    ],
    "visitsToHolyLand": null,  // Python None -> JSON null
    "playerName": "Ethan",  // Python string -> JSON string
    "intermediateDifficultyLevel": true,  // Python True -> JSON true
    "level": 3,  // Python integer -> JSON number
    "merryMen": {  // Python dictionary -> JSON object
        "FistFighter": "Little John",
        "SwordsMan": "Will Scarlet",
        "Healer": "Maid Marian"
    },
    "livesSpared": 0.88  // Python floating point number -> JSON number
}

# DESERIALIZATION
>>> with open('saveGame.robinhood', mode = 'r', encoding = 'utf-8') as jsonFileHandler:
    loadGameDetails = json.load(jsonFileHandler)
    print(loadGameDetails)

{'livesSpared': 0.88, 'easyDifficultyLevel': False, 'hardDifficultyLevel': False, 'visitsToHolyLand': None, 'level': 3, 'playerName': 'Ethan', 'locations': ['Nottinghamshire', 'Yorkshire', 'Sherwood'], 'intermediateDifficultyLevel': True, 'merryMen': {'FistFighter': 'Little John', 'SwordsMan': 'Will Scarlet', 'Healer': 'Maid Marian'}, 'arrowsAndMerryMen': [12, 4]}

# Prettifying the output manually
{
    'livesSpared': 0.88,  # Python floating point number -> JSON number -> Python floating point number
    'easyDifficultyLevel': False,  # Python False -> JSON false -> Python False
    'hardDifficultyLevel': False,
    'visitsToHolyLand': None,  # Python None -> JSON null -> Python None
    'level': 3,  # Python integer -> JSON number -> Python integer
    'playerName': 'Ethan',  # Python string -> JSON string -> Python string
    'locations': [  # Python list -> JSON array -> Python list
        'Nottinghamshire',
        'Yorkshire',
        'Sherwood'
    ],
    'intermediateDifficultyLevel': True,  # Python True -> JSON true -> Python True
    'merryMen': {  # Python dictionary -> JSON object -> Python dictionary
        'FistFighter': 'Little John',
        'SwordsMan': 'Will Scarlet',
        'Healer': 'Maid Marian'
    },
    'arrowsAndMerryMen': [12, 4]  # Python tuple -> JSON array -> Python list (worth noting)
}
>>> type(loadGameDetails['arrowsAndMerryMen'])
<class 'list'>
It's important to note that tuples and bytes have no direct counterparts among the JSON data types. As we saw above with 'arrowsAndMerryMen', tuples turn into arrays on serialization, and when they are deserialized they become Python lists rather than tuples. JSON doesn't support Python's bytes type at all. In the following example, the b prefix in front of the string literal makes it a bytes object rather than a str.
>>> type(b'RobinHood')
<class 'bytes'>
>>> saveGameDetails = {
    'playerName': 'Ethan',
    'password': b'RobinHood'
}
>>> with open('saveGame.robinhood', mode = 'w', encoding = 'utf-8') as jsonFileHandler:
    json.dump(saveGameDetails, jsonFileHandler, indent = 4)

Traceback (most recent call last):
  # Traceback information
TypeError: b'RobinHood' is not JSON serializable
# newer Python versions word this message differently, e.g. "Object of type bytes is not JSON serializable"
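If you do need to store bytes, one common workaround is the default argument of dump()/dumps(). It takes a function that is called for any object json cannot serialize on its own. In this sketch, the hypothetical encodeExtras() hook stores bytes as Base64 text. Decoding the value back on load is up to you, because JSON keeps no record that it was ever bytes.

```python
import base64
import json

def encodeExtras(obj):
    """Hypothetical `default` hook: json.dumps() calls it for unsupported objects."""
    if isinstance(obj, bytes):
        # Store bytes as Base64 text, which JSON can hold as a string.
        return base64.b64encode(obj).decode('ascii')
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')

saveGameDetails = {'playerName': 'Ethan', 'password': b'RobinHood', 'arrowsAndMerryMen': (12, 4)}

encoded = json.dumps(saveGameDetails, default=encodeExtras)
loaded = json.loads(encoded)

# The tuple comes back as a list, and the bytes come back as a Base64 string
# that must be decoded explicitly.
print(loaded['arrowsAndMerryMen'])           # [12, 4]
print(base64.b64decode(loaded['password']))  # b'RobinHood'
```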
The following table summarises the datatypes in Python and JSON.
Python 3 datatype/keyword | JSON counterpart |
string | string |
integer | number (integer) |
float | number (real) |
dictionary | object |
list | array |
True | true |
False | false |
None | null |
tuple | Stored as array in JSON, and turns to Python list on deserialization. |
bytes | No equivalent in JSON; json.dump() raises a TypeError. |
What happens when we try to deserialize invalid JSON data?
As we know by now, the JSON format is human-readable, and since it is human-readable, it is also human-editable. This is one reason JSON is popular for configuration files. But there are a few things to be wary of before you jump straight into editing JSON data that you will load with Python. In particular, JSON doesn't allow trailing commas or single-quoted strings, and its literals must be written exactly as true, false and null (changing true to True, for instance, breaks the file). If any of these occur, Python raises a json.JSONDecodeError, a subclass of ValueError. The error carries a description (which may not always be helpful) and the line and column in the JSON text where parsing failed. Let's look at each of these with examples. If you know of any other prominent caveats when working with JSON, please comment below and I'll include them here.
To begin with, JSON strings must be wrapped in double quotes; if you change them to single quotes, you will get an error like the one below.
>>> import json
>>> configDetails = {
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI"
}
>>> with open('saveGame.robinhood', mode = 'w', encoding = 'utf-8') as jsonFileHandler:
    json.dump(configDetails, jsonFileHandler, indent = 4)

# contents of saveGame.robinhood
{
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI"
}
# deserializing JSON data to Python data
>>> with open('saveGame.robinhood', mode = 'r', encoding = 'utf-8') as jsonFileHandler:
    configDetailsLoad = json.load(jsonFileHandler)
    print(configDetailsLoad)

{'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI'}
# Now, replacing the double quotes around ABC with single quotes, the contents of saveGame.robinhood become:
{
    "dbUser": 'ABC',
    "dbPass": "DEF",
    "dbSID": "GHI"
}
# deserializing JSON data to Python data
>>> with open('saveGame.robinhood', mode = 'r', encoding = 'utf-8') as jsonFileHandler:
    configDetailsLoad = json.load(jsonFileHandler)
    print(configDetailsLoad)

Traceback (most recent call last):
  # Traceback information
ValueError: Expecting value: line 2 column 15 (char 16)
Another thing to bear in mind is that JSON does not allow trailing commas, that is, a comma after the last element of an object or array. Python, on the other hand, has no problem with trailing commas.
>>> configDetails = {
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI",  # note the trailing comma at the end
}
>>> configDetails
{'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI'}
>>> configDetails = {
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI"  # no trailing comma here
}
>>> configDetails
{'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI'}  # same result
>>> import json
>>> configDetails = {
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI",
}
>>> with open('saveGame.robinhood', mode = 'w', encoding = 'utf-8') as jsonFileHandler:
    json.dump(configDetails, jsonFileHandler, indent = 4)

# contents of saveGame.robinhood: there is no trailing comma, because the comma
# was only part of the Python syntax, not of the data itself.
{
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI"
}
# This will load fine. Let's tweak the JSON data by adding a trailing comma at the end and see what happens when we load it.
# tweaked contents of saveGame.robinhood
{
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI",
}
# deserializing JSON data to Python data
>>> with open('saveGame.robinhood', mode = 'r', encoding = 'utf-8') as jsonFileHandler:
    configDetailsLoad = json.load(jsonFileHandler)
    print(configDetailsLoad)

Traceback (most recent call last):
  # Traceback information
ValueError: Expecting property name enclosed in double quotes: line 5 column 1 (char 64)
Moving on. You might have noticed in our first JSON example that the keywords True, False and None were converted to true, false and null respectively. When JSON data is loaded, these literals are converted into the equivalent keywords of the loading language: in Python, true becomes True, false becomes False and null becomes None. So if you are editing a file with JSON data that you want to load in Python, do NOT change these literals to their Python spelling; for example, do not change true to True in the JSON file itself.
>>> import json
>>> configDetails = {
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI",
    "isAdmin": False
}
>>> with open('saveGame.robinhood', mode = 'w', encoding = 'utf-8') as jsonFileHandler:
    json.dump(configDetails, jsonFileHandler, indent = 4)

# contents of saveGame.robinhood
{
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI",
    "isAdmin": false
}
# Upon deserialization
>>> with open('saveGame.robinhood', mode = 'r', encoding = 'utf-8') as jsonFileHandler:
    configDetailsLoad = json.load(jsonFileHandler)
    print(configDetailsLoad)

{'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI', 'isAdmin': False}
# Now, try tweaking the JSON data by, say, replacing false with False.
# tweaked contents of saveGame.robinhood
{
    "dbUser": "ABC",
    "dbPass": "DEF",
    "dbSID": "GHI",
    "isAdmin": False
}
# let's deserialize the JSON data now, or at least try to.
>>> with open('saveGame.robinhood', mode = 'r', encoding = 'utf-8') as jsonFileHandler:
    configDetailsLoad = json.load(jsonFileHandler)
    print(configDetailsLoad)

Traceback (most recent call last):
  # Traceback information
ValueError: Expecting value: line 5 column 16 (char 79)
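To handle these errors in a program rather than in the interactive shell, you can catch json.JSONDecodeError and read its lineno, colno and msg attributes. In this sketch, loadConfig() is a hypothetical helper, and the sample strings are one-line versions of the three mistakes above. The exact error messages vary between Python versions, so the code prints them rather than relying on specific wording.

```python
import json

def loadConfig(text):
    """Hypothetical helper: parse JSON text, reporting where parsing failed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno are 1-based; err.pos is the 0-based index.
        print(f'invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}')
        return None

badSamples = [
    '{"dbUser": \'ABC\'}',   # single quotes
    '{"dbUser": "ABC",}',    # trailing comma
    '{"isAdmin": False}',    # Python literal instead of false
]
results = [loadConfig(sample) for sample in badSamples]

good = loadConfig('{"isAdmin": false}')
print(good)                                          # {'isAdmin': False}
print(issubclass(json.JSONDecodeError, ValueError))  # True
```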
Serializing using JSON without a file
Like the pickle module, the json module also provides dumps() and loads() methods, which convert data to and from a JSON formatted string. That string can be transmitted over a network and deserialized on the destination machine.
>>> configDetails = {
    "dbPass": "DEF",
    "dbUser": "ABC",
    "dbSID": "GHI"
}
>>> configDetailsJSON = json.dumps(configDetails)
>>> configDetailsJSON
'{"dbUser": "ABC", "dbPass": "DEF", "dbSID": "GHI"}'
>>> type(configDetailsJSON)
<class 'str'>
>>> configDetailsDeserialized = json.loads(configDetailsJSON)
>>> configDetailsDeserialized
{'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI'}
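Sockets and most network APIs carry bytes rather than str, so the JSON string is usually encoded before it is sent. This sketch encodes it as UTF-8. It then shows that json.loads() accepts the bytes directly (supported since Python 3.6) as well as the decoded string.

```python
import json

configDetails = {'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI'}

# Encode the JSON text as UTF-8 bytes before sending it over the network.
payload = json.dumps(configDetails).encode('utf-8')
print(type(payload).__name__)  # bytes

# On the receiving side, json.loads() accepts UTF-8 bytes directly,
# or you can decode them to str first.
received = json.loads(payload)
print(received == configDetails)                             # True
print(json.loads(payload.decode('utf-8')) == configDetails)  # True
```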
Summary of what we have learnt
Today, we looked at the serialization modules in Python: pickle, shelve and json. To recapitulate what we learnt, here are a few code snippets outlining these modules.
####### pickle MODULE #######
### Serialization: pickle.dump(obj, file, protocol=None, *, fix_imports=True)
>>> import pickle
>>> saveGameDetails = { 'playerName': 'Ethan', 'level': 3, 'arrowCount': 12 }
>>> with open('saveGame01.robinhood', 'wb') as pickleFileHandler:
...     pickle.dump(saveGameDetails, pickleFileHandler)
...
### Contents of 'saveGame01.robinhood' are in binary format.
### Deserialization: pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict")
>>> with open('saveGame01.robinhood', 'rb') as pickleFileHandler:
...     loadGameDetails = pickle.load(pickleFileHandler)
...
>>> loadGameDetails
{'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}
### Pickling without a file: dumps() and loads()
>>> saveGameDetails = { 'playerName': 'Ethan', 'level': 3, 'arrowCount': 12 }
>>> saveGameDetailsBinary = pickle.dumps(saveGameDetails)
>>> saveGameDetailsBinary
### Output below is from an older Python 3 release (pickle protocol 3); the exact bytes differ between Python versions and protocols.
b'\x80\x03}q\x00(X\n\x00\x00\x00arrowCountq\x01K\x0cX\n\x00\x00\x00playerNameq\x02X\x05\x00\x00\x00Ethanq\x03X\x05\x00\x00\x00levelq\x04K\x03u.'
>>> loadGameDetails = pickle.loads(saveGameDetailsBinary)
>>> loadGameDetails
{'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}
####### shelve MODULE #######
### Serialization ###
>>> import shelve
>>> saveGameOneDetails = {'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}
>>> saveGameTwoDetails = {'playerName': 'Ethan', 'level': 5, 'arrowCount': 6}
>>> with shelve.open('save_games.robinhood') as saveGames:  # same as saveGames = shelve.open(...), but the shelf is closed automatically
...     saveGames['saveGame001'] = saveGameOneDetails
...
>>> with shelve.open('save_games.robinhood') as saveGames:
...     saveGames['saveGame002'] = saveGameTwoDetails
...
### The files created depend on the dbm backend shelve uses. With the dbm.dumb fallback you get
### save_games.robinhood.dat (containing the serialized bytes, like the output file of a pickle) and
### save_games.robinhood.dir (recording where each individual pickle is stored, like a register),
### possibly plus a .bak backup of the index; other backends may create a single file instead.
### Example contents of the .dir file (dbm.dumb backend):
'saveGame002', (512, 70)
'saveGame001', (0, 70)
### Deserialization ###
>>> with shelve.open('save_games.robinhood') as saveGames:
...     loadGameOneDetails = saveGames['saveGame001']
...     print(loadGameOneDetails)
...
{'playerName': 'Ethan', 'level': 3, 'arrowCount': 12}
>>> loadGameOneDetails == saveGameOneDetails
True
####### json MODULE #######
### Serialization: json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
>>> import json
>>> jsonFileHandler = open('saveGame.robinhood', mode='w', encoding='utf-8')
>>> saveGameDetails = {
...     'playerName': 'Ethan',
...     'level': 3,
...     'livesSpared': 0.88,
...     'easyDifficultyLevel': False,
...     'intermediateDifficultyLevel': True,
...     'hardDifficultyLevel': False,
...     'visitsToHolyLand': None,
...     'merryMen': {
...         'Healer': 'Maid Marian',
...         'SwordsMan': 'Will Scarlet',
...         'FistFighter': 'Little John'
...     },
...     'locations': ['Nottinghamshire', 'Yorkshire', 'Sherwood']
... }
>>> json.dump(saveGameDetails, jsonFileHandler, indent=4)
>>> jsonFileHandler.close()
### contents of 'saveGame.robinhood' in text format
{
    "playerName": "Ethan",
    "level": 3,
    "livesSpared": 0.88,
    "easyDifficultyLevel": false,
    "intermediateDifficultyLevel": true,
    "hardDifficultyLevel": false,
    "visitsToHolyLand": null,
    "merryMen": {
        "Healer": "Maid Marian",
        "SwordsMan": "Will Scarlet",
        "FistFighter": "Little John"
    },
    "locations": [
        "Nottinghamshire",
        "Yorkshire",
        "Sherwood"
    ]
}
### Deserialization: json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
>>> jsonFileHandler = open('saveGame.robinhood', mode='r', encoding='utf-8')
>>> loadGameDetails = json.load(jsonFileHandler)
>>> jsonFileHandler.close()
>>> loadGameDetails
{'playerName': 'Ethan', 'level': 3, 'livesSpared': 0.88, 'easyDifficultyLevel': False, 'intermediateDifficultyLevel': True, 'hardDifficultyLevel': False, 'visitsToHolyLand': None, 'merryMen': {'Healer': 'Maid Marian', 'SwordsMan': 'Will Scarlet', 'FistFighter': 'Little John'}, 'locations': ['Nottinghamshire', 'Yorkshire', 'Sherwood']}
>>> for key in loadGameDetails:
...     print(key, ":", loadGameDetails[key])
...
playerName : Ethan
level : 3
livesSpared : 0.88
easyDifficultyLevel : False
intermediateDifficultyLevel : True
hardDifficultyLevel : False
visitsToHolyLand : None
merryMen : {'Healer': 'Maid Marian', 'SwordsMan': 'Will Scarlet', 'FistFighter': 'Little John'}
locations : ['Nottinghamshire', 'Yorkshire', 'Sherwood']
### Serialization without files using JSON: dumps() and loads()
>>> configDetails = { "dbUser": "ABC", "dbPass": "DEF", "dbSID": "GHI" }
>>> configDetailsJSON = json.dumps(configDetails)
>>> configDetailsJSON
'{"dbUser": "ABC", "dbPass": "DEF", "dbSID": "GHI"}'
>>> type(configDetailsJSON)
<class 'str'>
>>> configDetailsDeserialized = json.loads(configDetailsJSON)
>>> configDetailsDeserialized
{'dbUser': 'ABC', 'dbPass': 'DEF', 'dbSID': 'GHI'}
### NOTE: JSON has no tuple or bytes type. json.dumps() writes tuples as arrays, which load back as lists, and raises TypeError for bytes objects.
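To make the closing note concrete, here is a short sketch of what happens to tuples and bytes. Storing bytes as base64 text through the default= parameter of json.dumps() is one possible workaround, assumed here for illustration. encode_bytes() is a hypothetical helper, and converting the string back into bytes after loading is left to the reading code.

```python
import base64
import json

# Tuples are written as JSON arrays and come back as lists.
round_tripped = json.loads(json.dumps({"position": (10, 20)}))
print(round_tripped)  # {'position': [10, 20]}

# bytes objects are rejected by the default encoder.
save_data = {"position": (10, 20), "checksum": b"\x00\xff"}
bytes_rejected = False
try:
    json.dumps(save_data)
except TypeError as err:
    bytes_rejected = True
    print("TypeError:", err)


def encode_bytes(obj):
    """Hypothetical default= hook: store bytes as a base64 text string."""
    if isinstance(obj, bytes):
        return base64.b64encode(obj).decode("ascii")
    # Anything else still fails, matching the encoder's normal behaviour.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps(save_data, default=encode_bytes)
print(encoded)  # {"position": [10, 20], "checksum": "AP8="}

# The reading code must know which fields were base64-encoded.
restored = json.loads(encoded)
restored["checksum"] = base64.b64decode(restored["checksum"])
print(restored)  # {'position': [10, 20], 'checksum': b'\x00\xff'}
```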
Helpful Links
- Python documentation for marshal module
- Python documentation for shelve module
- Python documentation for pickle module
- Python documentation for json module
- Online JSON editor