Adding static data files to the package

If we're going to add static data files to the package, where should we put them?

Well, we can put them anywhere convenient within the package folder, but it's often a good idea to create a subfolder specifically for holding the data files. This keeps data files separate from the source code and generally makes them a little easier to work with.

The data files that are part of a package should be assumed to be read-only.

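There are many reasons why the files might not be writable at runtime. For example, the package might be installed in a directory that our user account isn't allowed to modify, or it might be loaded from a ZIP archive. So, if we want to write data to a file while our code is running, we need to pick somewhere else to store it. Only files that do not change are appropriate for inclusion in a package: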

$ ls example/
__init__.py data
$ ls example/data
datafile.txt
$ cat example/data/datafile.txt
Hello world of data
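For data that does need to change at runtime, here is a minimal sketch of what "somewhere else" could look like. The writable_data_dir helper, the ~/.example naming convention, and the runtime.log filename are all hypothetical choices made for illustration, and real applications may prefer a platform-specific location. The demonstration writes into a temporary directory rather than a real home directory:

```python
import tempfile
from pathlib import Path


def writable_data_dir(app_name, base=None):
    """Return a per-user directory for files that change at runtime.

    Hypothetical helper: it assumes a dot-directory under the user's home
    (or under `base`, when given) is an acceptable place for such files.
    """
    root = Path(base) if base is not None else Path.home()
    target = root / f".{app_name}"
    target.mkdir(parents=True, exist_ok=True)
    return target


# Use a temporary directory as the base so the demonstration
# does not touch the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = writable_data_dir("example", base=tmp)
    log_file = data_dir / "runtime.log"
    log_file.write_text("written at runtime\n", encoding="utf-8")
    contents = log_file.read_text(encoding="utf-8")
```

The point is only that runtime writes go to a location outside the package folder, so the packaged files themselves stay untouched.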

So, that said, all we have to do to include a data file in our package is drop it into our package and then access the data with the get_data function from the pkgutil module in the standard library:

>>> from pkgutil import get_data
>>> get_data('example', 'data/datafile.txt')
b'Hello world of data\n'

The get_data function takes two parameters:

  • The name of the package we want to get the data from
  • The relative path of the data file inside the package

We pass it these two pieces of information, using forward slashes to separate the path components regardless of the operating system, and it returns a bytes object containing the contents of the file.
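To try the two parameters without installing anything, the following sketch rebuilds a similar layout in a temporary directory and reads the file back. The package name demo_pkg is a hypothetical stand-in chosen to avoid clashing with any real package, and the sys.path manipulation is only there to make the throwaway package importable:

```python
import importlib
import sys
import tempfile
from pathlib import Path
from pkgutil import get_data

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout: demo_pkg/__init__.py and demo_pkg/data/datafile.txt
    pkg = Path(tmp) / "demo_pkg"
    (pkg / "data").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "data" / "datafile.txt").write_bytes(b"Hello world of data\n")

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # First argument: the package name.
        # Second argument: the path relative to the package, using "/".
        raw = get_data("demo_pkg", "data/datafile.txt")
    finally:
        # Undo the changes made to the import system.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)
```

After this runs, raw holds the file's contents as a bytes object, just as in the interactive session above.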

If we want a text string instead of bytes, that's easily done. We just need to call the decode method of the bytes object with the encoding the file was saved in, and we'll get back a Unicode text string. This technique will work even if our package has been compressed into a ZIP file or otherwise hidden away because it uses the same underlying mechanism that Python uses to load module source code.
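Both points can be checked with a short sketch: it builds a ZIP archive containing a package, reads the data file through get_data, and decodes the result. The names bundle.zip and zipped_pkg are hypothetical, and the code assumes the data file is encoded as UTF-8:

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path
from pkgutil import get_data

with tempfile.TemporaryDirectory() as tmp:
    # Build an archive holding a package and its data file.
    archive = str(Path(tmp) / "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_pkg/__init__.py", "")
        zf.writestr("zipped_pkg/data/datafile.txt", "Hello world of data\n")

    # Putting the archive itself on sys.path lets Python import from it.
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        raw = get_data("zipped_pkg", "data/datafile.txt")
    finally:
        # Undo the changes made to the import system.
        sys.path.remove(archive)
        sys.modules.pop("zipped_pkg", None)

# Assumption: the file is UTF-8 text; use the real encoding if it differs.
text = raw.decode("utf-8")
```

The call to get_data is identical to the one used for a package stored in an ordinary folder; only the location of the package has changed.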

If Python can find the code, it can find the data file as well. That's all there is to working with static data that's packaged up alongside our code. It's simple and useful.