Folder Structure Download#

This example downloads a LabArchives notebook subtree to your local computer while preserving its directory hierarchy. Pages become folders, and individual entries are written out as separate files.

When to Use It#

This is useful for:

Creating local backups of your LabArchives notebooks.
Exporting notebook content for offline viewing.
Archiving completed projects.
Migrating content to other systems.
Integrating notebook content with version control.

Requirements#

This example assumes the recommended local interactive profile, labapi[dotenv,builtin-auth]. See Installation.

No additional third-party packages are required.

Configuration#

For the local interactive workflow, create a .env file in the repository root:

API_URL="https://api.labarchives.com"
ACCESS_KEYID="your_access_key_id"
ACCESS_PWD="your_password"

You can also provide the same values through shell environment variables. See Your First Entry for both options.
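If you go the shell-variable route, a quick pre-flight check can confirm that all three settings are visible to the process. This is a minimal sketch using only the standard library. `missing_settings` is a hypothetical helper, not part of labapi, and it inspects environment variables only; it does not read the .env file.

```python
import os
from collections.abc import Mapping

# Setting names taken from the .env example above.
REQUIRED_SETTINGS = ("API_URL", "ACCESS_KEYID", "ACCESS_PWD")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required setting names that are unset or empty in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print(f"Missing settings: {', '.join(missing)}")
    else:
        print("All required settings are present.")
```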

Common Commands#

Download an entire notebook:

uv run python examples/folder_download/folder_download.py ./backup --notebook "My Notebook"

Download only a specific subtree:

uv run python examples/folder_download/folder_download.py ./2024_experiments --notebook "My Notebook" --path "Experiments/2024"

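Write into an existing, non-empty output directory. The --overwrite flag skips the emptiness check; files with matching names are overwritten, and other existing files are left in place: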

uv run python examples/folder_download/folder_download.py ./backup --notebook "My Notebook" --overwrite

How It Works#

The script mirrors the LabArchives structure:

LabArchives Structure:          Local File Structure:

My Notebook/                    output/My Notebook/
|- Experiments/                 |- Experiments/
|  |- Trial 1/  (page)          |  |- Trial 1/
|  |  |- Header entry           |  |  |- 001_header.txt
|  |  |- Text entry             |  |  |- 002_text.html
|  |  `- Attachment             |  |  `- 003_attachment_image.png
|  `- Trial 2/  (page)          |  `- Trial 2/
|     `- Text entry             |     `- 001_text.html
`- Data/  (directory)           `- Data/
   `- Results/  (page)             `- Results/
      `- Attachment                  `- 001_attachment_data.csv
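
The mirroring idea can be tried without a LabArchives connection. The sketch below is illustrative only: it uses a nested dict as a hypothetical stand-in for the notebook tree (the real script walks labapi objects, as shown in Source Code), and the names `Tree`, `mirror`, and `demo` are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for a notebook tree: a dict value is a directory or
# page (a folder on disk), a str value is the text of one entry (a file).
Tree = dict[str, "Tree | str"]


def mirror(tree: Tree, dest: Path) -> list[Path]:
    """Recreate tree under dest and return the files written, in order."""
    written: list[Path] = []
    dest.mkdir(parents=True, exist_ok=True)
    for name, node in tree.items():
        if isinstance(node, dict):
            # Containers become folders; recurse into their children.
            written.extend(mirror(node, dest / name))
        else:
            # Leaves become files holding the entry content.
            path = dest / name
            path.write_text(node, encoding="utf-8")
            written.append(path)
    return written


def demo() -> list[str]:
    """Mirror a small sample tree into a temporary directory."""
    notebook: Tree = {
        "Experiments": {
            "Trial 1": {"001_header.txt": "Setup", "002_text.html": "<p>Notes</p>"},
            "Trial 2": {"001_text.html": "<p>Repeat</p>"},
        },
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        return [p.relative_to(root).as_posix() for p in mirror(notebook, root)]


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Recursion keeps the local layout identical to the source layout at every depth, which is the same approach `download_directory` takes in the script.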

File Naming Convention#

Downloaded entries follow this naming pattern:

001_header.txt          # First entry (header)
002_text.html           # Second entry (rich text)
003_attachment_data.csv # Third entry (attachment)
003_caption.txt         # Caption for the attachment
004_plaintext.txt       # Fourth entry (plain text)

Entries are numbered in the order they appear on the page.
Entry type is indicated in the filename.
Attachments keep their original filename (sanitized), prefixed with the entry number and _attachment_.
Captions are saved in separate *_caption.txt files.
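
The naming rules above can be condensed into one function. This is a sketch for illustration: `entry_filename` and `TEXT_SUFFIXES` are hypothetical names, the content-type strings are the ones the script in Source Code compares against, and the attachment name is assumed to be sanitized already.

```python
# Suffixes copied from the naming pattern above; keys are the entry
# content types used by the script in Source Code.
TEXT_SUFFIXES = {
    "heading": "_header.txt",
    "text entry": "_text.html",
    "plain text entry": "_plaintext.txt",
}


def entry_filename(index: int, content_type: str, attachment_name: str = "") -> str:
    """Build the local filename for the entry at 1-based position index."""
    prefix = f"{index:03d}"  # zero-padded so files sort in page order
    if content_type == "Attachment":
        return f"{prefix}_attachment_{attachment_name}"
    if content_type in TEXT_SUFFIXES:
        return f"{prefix}{TEXT_SUFFIXES[content_type]}"
    if content_type == "widget entry":
        return f"{prefix}_widget.txt"
    return f"{prefix}_unknown.txt"


if __name__ == "__main__":
    print(entry_filename(1, "heading"))
    print(entry_filename(3, "Attachment", "data.csv"))
```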

Output Layout#

Each downloaded location contains:

For pages:

_metadata.txt with page information such as name, ID, and entry count.
001_*, 002_*, and similar files for each entry on the page.

For directories:

Subdirectories for each child directory.
Subdirectories for each page.
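
Because every page folder carries a _metadata.txt file, a finished export can be summarized without contacting LabArchives. The sketch below assumes the three-line `Key: value` format that the script in Source Code writes (Page, ID, Entry count); `parse_metadata` and `collect_metadata` are hypothetical helpers, not part of the example.

```python
from pathlib import Path


def parse_metadata(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines into a dict, ignoring lines without ': '."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so page names may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def collect_metadata(root: Path) -> dict[str, dict[str, str]]:
    """Map each page folder (relative to root) to its parsed metadata."""
    return {
        meta.parent.relative_to(root).as_posix(): parse_metadata(
            meta.read_text(encoding="utf-8")
        )
        for meta in sorted(root.rglob("_metadata.txt"))
    }


if __name__ == "__main__":
    print(parse_metadata("Page: Trial 1\nID: abc123\nEntry count: 3\n"))
```

All values are returned as strings; convert `Entry count` with `int()` if you need to compare it with the number of entry files on disk.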

Notes#

The script preserves the complete directory structure.
Filenames are sanitized to be filesystem-safe.
Widget entries are read-only and cannot be fully exported; the script writes a placeholder *_widget.txt file containing the entry ID instead.
Large notebooks may take significant time to download.
The script creates a _metadata.txt file for each page with additional information.

Ways to Extend It#

Add resume capability for interrupted downloads.
Verify downloaded files with checksums.
Create ZIP archives after export.
Filter by entry type or date range.
Add progress bars with tqdm.
Implement incremental backups for only new or changed content.
Export metadata as JSON.
Write detailed logs during long downloads.
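
Two of these extensions, checksum verification and ZIP archiving, need only the standard library. The following is a rough sketch that runs against an already downloaded output directory. All function names are hypothetical, SHA-256 is an assumed choice of hash, and nothing here is part of the example script.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large attachments are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest paths that are missing or whose hash no longer matches."""
    current = build_manifest(root)
    return [rel for rel, digest in manifest.items() if current.get(rel) != digest]


def zip_export(root: Path, archive_base: Path) -> Path:
    """Create archive_base.zip from root and return the archive path.

    Keep archive_base outside root so the archive does not include itself.
    """
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=root))
```

A typical flow would be to call `build_manifest` right after a download, store the result (for example as JSON), and later pass it to `changed_files` to detect corruption or local edits before archiving with `zip_export`.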

Source Code#

#!/usr/bin/env python3
"""Download a LabArchives notebook folder tree to local disk."""

import argparse
import sys
from pathlib import Path

from labapi import AbstractTreeContainer, Client, NotebookPage, User


def sanitize_filename(name: str) -> str:
    """Sanitize a name to be safe for filesystem use."""
    # Replace problematic characters
    unsafe_chars = '<>:"/\\|?*'
    for char in unsafe_chars:
        name = name.replace(char, "_")

    # Remove leading/trailing spaces and dots
    name = name.strip(". ")

    # Limit length to avoid filesystem issues
    if len(name) > 200:
        name = name[:200]

    return name or "untitled"


def get_unique_path(
    base_dir: Path, name: str, used_paths: set[Path], unique_suffix: str
) -> Path:
    """Return a collision-safe path for a sanitized LabArchives name."""
    sanitized_name = sanitize_filename(name)
    candidate = base_dir / sanitized_name

    if candidate not in used_paths:
        used_paths.add(candidate)
        return candidate

    sanitized_suffix = sanitize_filename(unique_suffix)[:8] or "dup"
    candidate = base_dir / f"{sanitized_name}_{sanitized_suffix}"

    counter = 1
    while candidate in used_paths:
        candidate = base_dir / f"{sanitized_name}_{sanitized_suffix}_{counter}"
        counter += 1

    used_paths.add(candidate)
    return candidate


def download_page(page: NotebookPage, output_dir: Path, used_paths: set[Path]) -> None:
    """Download a page and its entries to a directory."""
    page_dir = get_unique_path(output_dir, page.name, used_paths, page.id)
    page_dir.mkdir(parents=True, exist_ok=True)

    print(f"  Downloading page: {page.name}")

    # Save page metadata
    metadata_file = page_dir / "_metadata.txt"
    with metadata_file.open("w", encoding="utf-8") as f:
        f.write(f"Page: {page.name}\n")
        f.write(f"ID: {page.id}\n")
        f.write(f"Entry count: {len(page.entries)}\n")

    # Maps content_type → (filename suffix, display label)
    text_entry_types = {
        "text entry": ("_text.html", "Text entry"),
        "plain text entry": ("_plaintext.txt", "Plain text entry"),
        "heading": ("_header.txt", "Header"),
    }

    # Download each entry
    for i, entry in enumerate(page.entries, start=1):
        entry_prefix = f"{i:03d}"

        try:
            if entry.content_type == "Attachment":
                attachment = entry.content
                filename = sanitize_filename(attachment.filename)
                output_path = page_dir / f"{entry_prefix}_attachment_{filename}"
                print(f"    Entry {i}: Attachment - {filename}")
                with output_path.open("wb") as f:
                    attachment.seek(0)
                    f.write(attachment.read())
                if attachment.caption:
                    caption_file = page_dir / f"{entry_prefix}_caption.txt"
                    with caption_file.open("w", encoding="utf-8") as f:
                        f.write(attachment.caption)

            elif entry.content_type in text_entry_types:
                suffix, label = text_entry_types[entry.content_type]
                output_path = page_dir / f"{entry_prefix}{suffix}"
                print(f"    Entry {i}: {label}")
                with output_path.open("w", encoding="utf-8") as f:
                    f.write(entry.content)

            elif entry.content_type == "widget entry":
                output_path = page_dir / f"{entry_prefix}_widget.txt"
                print(f"    Entry {i}: Widget (read-only)")
                with output_path.open("w", encoding="utf-8") as f:
                    f.write(
                        f"Widget Entry (ID: {entry.id})\nNote: Widget entries are read-only and cannot be fully exported\n"
                    )

            else:
                output_path = page_dir / f"{entry_prefix}_unknown.txt"
                print(f"    Entry {i}: Unknown type ({entry.content_type})")
                with output_path.open("w", encoding="utf-8") as f:
                    f.write(
                        f"Unknown entry type: {entry.content_type}\nEntry ID: {entry.id}\n"
                    )

        except Exception as e:
            print(f"    Entry {i}: Error - {e}")
            error_file = page_dir / f"{entry_prefix}_error.txt"
            with error_file.open("w", encoding="utf-8") as f:
                f.write(
                    f"Error downloading entry {i}: {e}\nEntry type: {entry.content_type}\n"
                )


def download_directory(
    directory: AbstractTreeContainer, output_dir: Path, used_paths: set[Path]
) -> None:
    """Recursively download a directory and its contents."""
    dir_path = get_unique_path(output_dir, directory.name, used_paths, directory.id)
    dir_path.mkdir(parents=True, exist_ok=True)

    print(f"Downloading directory: {directory.name}")

    # Process all children
    for child in directory.children:
        if child.is_dir():
            # Recursively download subdirectory
            download_directory(child.as_dir(), dir_path, used_paths)
        else:
            # Download page
            download_page(child.as_page(), dir_path, used_paths)


def download_notebook_or_folder(
    user: User, notebook_name: str, path: str | None, output_dir: Path
) -> None:
    """Download a notebook or folder from LabArchives."""
    notebooks = user.notebooks
    try:
        notebook = notebooks[notebook_name]

        # Navigate to subfolder if specified
        if path:
            print(f"Navigating to: {path}")
            target = notebook.traverse(path)
        else:
            target = notebook

        used_paths: set[Path] = set()

        # Download the target
        if target.is_dir():
            download_directory(target.as_dir(), output_dir, used_paths)
        else:
            # It's a page
            download_page(target.as_page(), output_dir, used_paths)

        print(f"\nDownload complete! Content saved to: {output_dir.absolute()}")

    except KeyError as e:
        print(f"Error: Could not find notebook '{notebook_name}' or path '{path}': {e}")
        print(f"Available notebooks: {list(notebooks.keys())}")
        sys.exit(1)
    except Exception as e:
        print(f"Error during download: {e}")
        sys.exit(1)


def main() -> None:
    """Run the folder download example CLI."""
    parser = argparse.ArgumentParser(
        description="Download LabArchives folder structure to local disk"
    )
    parser.add_argument("output", help="Local output directory path")
    parser.add_argument(
        "--notebook",
        "-n",
        required=True,
        help="Name of the LabArchives notebook to download from",
    )
    parser.add_argument(
        "--path",
        "-p",
        help="Optional path within notebook (e.g., 'Experiments/2024'). If not specified, downloads entire notebook.",
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="Overwrite existing files"
    )

    args = parser.parse_args()

    output_dir = Path(args.output)

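    # Refuse to write into a non-empty directory unless --overwrite is given.
    # is_dir() (rather than exists()) avoids calling iterdir() on a plain file.
    if output_dir.is_dir() and not args.overwrite and any(output_dir.iterdir()):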
        print(f"Error: Output directory '{output_dir}' exists and is not empty")
        print("Use --overwrite to overwrite existing files")
        sys.exit(1)

    print("Connecting to LabArchives...")
    try:
        with Client() as client:
            print("Authenticating...")
            user = client.default_authenticate()
            print("✓ Authenticated successfully")

            download_notebook_or_folder(user, args.notebook, args.path, output_dir)
    except Exception as e:
        print(f"Authentication error: {e}")
        print("\nMake sure you have a .env file with your credentials:")
        print("  ACCESS_KEYID=your_access_key_id")
        print("  ACCESS_PWD=your_password")
        sys.exit(1)


if __name__ == "__main__":
    main()