Archiving

The archive module

The archive module manages archival of notebooks to storage (e.g. S3) when a notebook save occurs.

ArchiveRecord

Bookstore uses an immutable ArchiveRecord to represent a notebook file by its storage path.

class bookstore.archive.ArchiveRecord

Represents an archival record.

An ArchiveRecord is a typed version of collections.namedtuple() (see typing.NamedTuple). The record is immutable.

Example

An archive record (filepath, content, queued_time) contains:

  • a filepath to the record
  • the content for archival
  • the queued time, i.e. the length of time spent waiting in the queue for archiving

content

Alias for field number 1

filepath

Alias for field number 0

queued_time

Alias for field number 2
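
The record layout described above can be sketched with typing.NamedTuple. This is a minimal illustration, not the Bookstore source: the class name ArchiveRecordSketch, the field types, and the sample values are assumptions chosen for the example. Only the field names and their order come from the documentation.

```python
from typing import NamedTuple


class ArchiveRecordSketch(NamedTuple):
    """Illustrative stand-in for bookstore.archive.ArchiveRecord."""

    filepath: str       # field 0: storage path of the notebook (type assumed)
    content: str        # field 1: content to archive (type assumed)
    queued_time: float  # field 2: queue timing value (type assumed)


record = ArchiveRecordSketch(
    filepath="notebooks/demo.ipynb",  # hypothetical path
    content='{"cells": []}',          # hypothetical serialized notebook
    queued_time=0.0,
)

# Fields are reachable by name or by position.
same_field = record.filepath == record[0]

# The record is immutable: assigning to a field raises AttributeError.
try:
    record.content = "changed"
except AttributeError:
    mutated = False
else:
    mutated = True
```

Because the tuple is immutable, a record that has been queued cannot be altered by later code before it is written.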

BookstoreContentsArchiver

class bookstore.archive.BookstoreContentsArchiver(*args, **kwargs)

Manages archival of notebooks to storage (S3) when a notebook save occurs.

This class is a custom Jupyter FileContentsManager that holds the storage location, the path within it, and the file to be written.

Example

  • Bookstore settings combine with the parent Jupyter application settings.
  • A session is created for the current event loop.
  • To write to a particular path on S3, acquire a lock.
  • After acquiring the lock, the archive method authenticates using the storage service’s credentials.
  • If allowed, the notebook is queued to be written to storage (e.g. S3).

path_locks

Dictionary of paths to storage and the lock associated with a path.

Type: dict

path_lock_ready

A mutex lock associated with a path.

Type: asyncio mutex lock

archive(record: bookstore.archive.ArchiveRecord)

Process a record to write to storage.

A path lock must be acquired before archiving. A write to a given path is allowed only when a valid path_lock is held and the path is not locked by another process.

Parameters: record (ArchiveRecord) – A notebook and where it should be written to storage
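
The per-path locking described above can be sketched with asyncio.Lock. This is a simplified illustration under stated assumptions, not the Bookstore implementation. The class PathLockingWriter is hypothetical, the written list stands in for real storage, and skipping a write when the path is already locked is one assumed way to honor "not locked by another process".

```python
import asyncio
from typing import Dict, List


class PathLockingWriter:
    """Hypothetical writer that allows one in-flight write per path."""

    def __init__(self) -> None:
        self.path_locks: Dict[str, asyncio.Lock] = {}
        self.path_lock_ready = asyncio.Lock()
        self.written: List[str] = []  # stand-in for the storage backend

    async def archive(self, filepath: str, content: str) -> bool:
        # Guard the dictionary while looking up or creating the path's lock.
        async with self.path_lock_ready:
            lock = self.path_locks.setdefault(filepath, asyncio.Lock())

        # Assumption: skip the write if the path is already being written.
        if lock.locked():
            return False

        async with lock:
            await asyncio.sleep(0)  # placeholder for the real upload
            self.written.append(filepath)
        return True


async def demo():
    writer = PathLockingWriter()
    results = await asyncio.gather(
        writer.archive("a.ipynb", "{}"),
        writer.archive("a.ipynb", "{}"),  # same path while the first is in flight
        writer.archive("b.ipynb", "{}"),
    )
    return writer, results


writer, results = asyncio.run(demo())
```

In this sketch the second request for a.ipynb returns False because the first still holds that path's lock, while b.ipynb proceeds independently.
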
run_pre_save_hook(model, path, **kwargs)

Send a request to store the notebook in S3.

This hook offloads the storage request to the event loop. Once the event loop is free to execute the request, the notebook is written to storage.

Parameters:
  • model (dict) – The type of file and its contents
  • path (str) – The storage location
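
The offloading behavior described for this hook can be sketched with plain asyncio. This is an illustration only: the names run_pre_save_hook_sketch, archive, and archived are hypothetical, the in-memory list stands in for storage, and ignoring non-notebook models is an assumption rather than documented behavior.

```python
import asyncio
import json
from typing import List, Set, Tuple

archived: List[Tuple[str, str]] = []  # stand-in for the storage backend
_tasks: Set[asyncio.Task] = set()     # keeps scheduled tasks referenced


async def archive(path: str, content: str) -> None:
    """Placeholder for the real write to storage."""
    archived.append((path, content))


def run_pre_save_hook_sketch(model: dict, path: str) -> None:
    # Assumption: only notebook models are archived.
    if model["type"] != "notebook":
        return
    content = json.dumps(model["content"])
    # Schedule the write and return immediately; it runs when the loop is free.
    task = asyncio.get_running_loop().create_task(archive(path, content))
    _tasks.add(task)
    task.add_done_callback(_tasks.discard)


async def demo() -> int:
    run_pre_save_hook_sketch({"type": "notebook", "content": {"cells": []}}, "demo.ipynb")
    run_pre_save_hook_sketch({"type": "file", "content": "text"}, "notes.txt")
    seen_before_yield = len(archived)  # nothing has been written yet
    await asyncio.sleep(0)             # yield so the scheduled write can run
    return seen_before_yield


seen_before_yield = asyncio.run(demo())
```

The hook itself is synchronous and returns before any write happens, so saving the notebook is not blocked by the upload.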