s1f - File Splitter

A modern file extraction tool with async I/O that reconstructs the original files from m1f bundles (combined archives) with full metadata preservation.

Overview

The s1f tool (v2.0.0) is the counterpart to m1f, designed to extract and reconstruct original files from a combined file. Built with Python 3.10+ and modern async architecture, it ensures reliable extraction with checksum verification and proper encoding handling.

Key Features

  • Async I/O: High-performance concurrent file writing
  • Smart Parser Framework: Automatic format detection with dedicated parsers
  • Type Safety: Full type annotations throughout the codebase
  • Modern Architecture: Clean modular design with dependency injection
  • Checksum Verification: SHA256 integrity checking with line ending normalization
  • Encoding Support: Intelligent encoding detection and conversion
  • Error Recovery: Graceful fallbacks and detailed error reporting
  • Progress Tracking: Real-time extraction statistics

Quick Start

# Basic extraction (positional arguments - recommended)
m1f-s1f ./combined.txt ./extracted_files

# Basic extraction (option-style arguments)
m1f-s1f -i ./combined.txt -d ./extracted_files

# List files without extracting
m1f-s1f --list ./combined.txt

# Force overwrite of existing files
m1f-s1f ./combined.txt ./extracted_files -f

# Verbose output to see detailed extraction progress
m1f-s1f ./combined.txt ./extracted_files -v

# Extract with specific encoding
m1f-s1f ./combined.txt ./extracted_files --target-encoding utf-16-le

Command Line Options

s1f supports both positional and option-style arguments:

Positional Arguments (recommended)

m1f-s1f <input_file> <destination_directory>

Option-Style Arguments (backward compatibility)

m1f-s1f -i <input_file> -d <destination_directory>

All Options

Option                         Description
-i, --input-file               Path to the combined input file (can also be first positional argument)
-d, --destination-directory    Directory where extracted files will be saved (can also be second positional argument)
-l, --list                     List files in the archive without extracting them
-f, --force                    Force overwrite of existing files without prompting
-v, --verbose                  Enable verbose output
--version                      Show version information and exit
--timestamp-mode               How to set file timestamps (original or current)
--ignore-checksum              Skip checksum verification for MachineReadable files
--respect-encoding             Try to use the original file encoding when writing extracted files
--target-encoding              Explicitly specify the character encoding for all extracted files

Usage Examples

Basic Operations

# Basic command (positional arguments)
m1f-s1f /path/to/combined_output.txt /path/to/output_folder

# Basic command (option-style)
m1f-s1f --input-file /path/to/combined_output.txt \
  --destination-directory /path/to/output_folder

# List files in archive without extracting
m1f-s1f --list ./output/bundle.m1f.txt

# Splitting a MachineReadable file with force overwrite and verbose output
m1f-s1f ./output/bundle.m1f.txt ./extracted_project -f -v

# Check version
m1f-s1f --version

Advanced Operations

# Using current system time for timestamps
m1f-s1f -i ./combined_file.txt -d ./extracted_files \
  --timestamp-mode current

# Preserving original file encodings
m1f-s1f -i ./with_encodings.txt -d ./extracted_files \
  --respect-encoding

# Using a specific encoding for all extracted files
m1f-s1f -i ./combined_file.txt -d ./extracted_files \
  --target-encoding utf-8

# Ignoring checksum verification (when files were intentionally modified)
m1f-s1f -i ./modified_bundle.m1f.txt -d ./extracted_files \
  --ignore-checksum

Supported File Formats

The s1f tool can extract files from combined files created with any of the m1f separator styles:

  • Standard Style - Simple separators with file paths and checksums
  • Detailed Style - Comprehensive separators with full metadata
  • Markdown Style - Formatted with Markdown syntax for documentation
  • MachineReadable Style - Structured format with JSON metadata and UUID boundaries
  • None Style - Files combined without separators (limited extraction capability)

For the most reliable extraction, use files created with the MachineReadable separator style, as these contain complete metadata and checksums for verification.

Common Workflows

Extract and Verify

This workflow ensures the integrity of extracted files:

# Step 1: Extract the files with verification
m1f-s1f -i ./project_bundle.m1f.txt -d ./extracted_project -v

# Step 2: Check for any checksum errors in the output
# If errors are reported and the files were intentionally modified, re-run with --ignore-checksum

Multiple Extraction Targets

When you need to extract the same combined file to different locations:

# Extract for development
m1f-s1f -i ./project.m1f.txt -d ./dev_workspace

# Extract for backup with original timestamps
m1f-s1f -i ./project.m1f.txt -d ./backup --timestamp-mode original

Working with Different Encodings

# Respect original file encodings where possible
m1f-s1f ./combined.txt ./extracted/ --respect-encoding

# Force all files to UTF-8
m1f-s1f ./combined.txt ./extracted/ --target-encoding utf-8

# Use specific encoding for all files
m1f-s1f ./combined.txt ./extracted/ --target-encoding latin-1

Architecture

s1f v2.0.0 features a modern, modular architecture:

tools/s1f/
├── __init__.py       # Package initialization
├── __main__.py       # Entry point for module execution
├── cli.py            # Command-line interface
├── config.py         # Configuration management
├── core.py           # Core extraction logic with async I/O
├── exceptions.py     # Custom exceptions
├── logging.py        # Structured logging
├── models.py         # Data models (ExtractedFile, etc.)
├── parsers.py        # Abstract parser framework
├── utils.py          # Utility functions
└── writers.py        # Output writers (file, stdout)

Key Components

  • Async I/O: Concurrent file operations for better performance
  • Parser Framework: Extensible system for handling different file formats
  • Type Safety: Full type hints and dataclass models
  • Clean Architecture: Separation of concerns with dependency injection

Parser Framework

s1f uses a parser framework to handle different separator styles:

Standard Parser

Handles simple separators like:

======= path/to/file.py ======
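
As a rough illustration of how such separators can be used to split a bundle, the sketch below matches the sample line with a regular expression. It assumes the separator carries only the file path, exactly as in the sample above; the names SEPARATOR and split_standard are hypothetical and are not part of s1f.

```python
import re

# Matches lines shaped like the sample: "======= path/to/file.py ======".
# Assumption: the text between the runs of '=' is only the file path.
SEPARATOR = re.compile(r"^=+ (?P<path>\S.*?) =+$")


def split_standard(text: str) -> dict[str, str]:
    """Map each file path to the content that follows its separator."""
    collected: dict[str, list[str]] = {}
    current: str | None = None
    for line in text.splitlines(keepends=True):
        match = SEPARATOR.match(line.rstrip("\r\n"))
        if match:
            current = match.group("path")
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
        # Lines before the first separator are ignored.
    return {path: "".join(lines) for path, lines in collected.items()}
```

A content line that happens to look like a separator would be misread by this sketch, which is one reason the styles with richer metadata extract more reliably.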

Detailed Parser

Handles comprehensive metadata like:

========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
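
The header above is a set of KEY: value pairs, some of them joined by " | " on one line. A minimal sketch of turning it into a dictionary follows; parse_detailed_header is a hypothetical name, and the sketch assumes that values never contain " | ".

```python
def parse_detailed_header(lines: list[str]) -> dict[str, str]:
    """Collect the KEY: value pairs from a Detailed-style header."""
    fields: dict[str, str] = {}
    for line in lines:
        if not line.startswith("== "):
            continue  # skip the rows of '=' that frame the header
        for part in line[3:].split(" | "):
            # Split on the first ": " only, so times like 14:30:21 survive.
            key, sep, value = part.partition(": ")
            if sep:
                fields[key.strip()] = value.strip()
    return fields
```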

Markdown Parser

Handles Markdown-formatted separators with code blocks.

MachineReadable Parser

Handles structured format with JSON metadata:

--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_uuid ---
METADATA_JSON: {...}
--- PYMK1F_END_FILE_METADATA_BLOCK_uuid ---
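
Because the begin and end markers share a UUID, a metadata block can be located unambiguously. The sketch below shows one way to read such blocks with the standard json and re modules. The marker text comes from the sample above; the function name read_metadata_blocks is hypothetical, and the keys inside the JSON are not specified in this document.

```python
import json
import re

BEGIN = re.compile(r"^--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_(?P<uuid>\S+) ---$")
PREFIX = "METADATA_JSON:"


def read_metadata_blocks(text: str) -> list[dict]:
    """Return the decoded JSON object of every metadata block in text."""
    lines = text.splitlines()
    blocks: list[dict] = []
    index = 0
    while index < len(lines):
        match = BEGIN.match(lines[index])
        if match is None:
            index += 1
            continue
        # The end marker must carry the same UUID as the begin marker.
        end_marker = f"--- PYMK1F_END_FILE_METADATA_BLOCK_{match.group('uuid')} ---"
        try:
            end = lines.index(end_marker, index + 1)
        except ValueError:
            break  # unterminated block: stop rather than guess
        body = "\n".join(lines[index + 1:end])
        if body.startswith(PREFIX):
            blocks.append(json.loads(body[len(PREFIX):]))
        index = end + 1
    return blocks
```

The sketch accepts the JSON on the same line as METADATA_JSON: or spread over the following lines.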

Performance

s1f v2.0.0 includes significant performance improvements:

  • Async I/O: Concurrent file writing for 3-5x faster extraction on SSDs
  • Optimized Parsing: Efficient line-by-line processing with minimal memory usage
  • Smart Buffering: Adaptive buffer sizes based on file characteristics
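
The idea behind concurrent file writing can be sketched with the standard library alone. This is not the s1f implementation: write_all and its limit parameter are hypothetical, and blocking writes are pushed to worker threads with asyncio.to_thread rather than through a dedicated async file library.

```python
import asyncio
from pathlib import Path


def _write(path: Path, content: str) -> None:
    """Blocking write; creates parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")


async def write_all(dest: Path, files: dict[str, str], limit: int = 8) -> int:
    """Write all files concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def write_one(relative: str, content: str) -> None:
        async with semaphore:
            # Run the blocking write in a worker thread.
            await asyncio.to_thread(_write, dest / relative, content)

    await asyncio.gather(*(write_one(rel, text) for rel, text in files.items()))
    return len(files)
```

It would be called as asyncio.run(write_all(Path("./extracted"), files)). The semaphore bounds the number of writes in flight so that a large bundle does not open thousands of files at once.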

Error Handling

The tool provides comprehensive error handling:

  • Checksum Verification: Automatic integrity checking with clear error messages
  • Encoding Fallbacks: Graceful handling of encoding issues with multiple fallback strategies
  • Permission Errors: Clear reporting of file system permission issues
  • Partial Recovery: Continue extraction even if individual files fail
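
Partial recovery amounts to catching the error for one file, recording it, and moving on to the next. A minimal sketch under that assumption follows; extract_all is a hypothetical name, and the real tool's error types and reporting format are not described here.

```python
from pathlib import Path


def extract_all(dest: Path, files: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Write every file; return (written paths, {failed path: reason})."""
    written: list[str] = []
    failed: dict[str, str] = {}
    for relative, content in files.items():
        target = dest / relative
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        except OSError as exc:
            # Permission problems, name clashes and similar issues end up
            # here; record them and continue with the remaining files.
            failed[relative] = f"{type(exc).__name__}: {exc}"
            continue
        written.append(relative)
    return written, failed
```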

Timestamp Modes

Control how file timestamps are set:

Original Mode (default)

m1f-s1f ./combined.txt ./extracted/ --timestamp-mode original

Uses the timestamps from when files were originally combined.

Current Mode

m1f-s1f ./combined.txt ./extracted/ --timestamp-mode current

Uses the current system time for all extracted files.

Encoding Handling

s1f provides flexible encoding options:

Default Behavior

All files are written in UTF-8 encoding.

Respect Original Encoding

m1f-s1f ./combined.txt ./extracted/ --respect-encoding

Attempts to use the original file encoding when available in metadata.

Force Specific Encoding

m1f-s1f ./combined.txt ./extracted/ --target-encoding latin-1

Forces all files to use the specified encoding.

Checksum Verification

For MachineReadable files, s1f automatically verifies file integrity:

Automatic Verification

m1f-s1f ./bundle.m1f.txt ./extracted/

Checksums are verified automatically and errors are reported.
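
The feature list mentions SHA256 checking with line ending normalization. A minimal sketch of that idea follows, assuming line endings are normalized to LF and the text is hashed as UTF-8; the exact normalization s1f applies is not specified here, and normalized_sha256 and verify are hypothetical names.

```python
import hashlib


def normalized_sha256(content: str) -> str:
    """Hash content after converting CRLF and CR line endings to LF."""
    normalized = content.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def verify(content: str, expected_hex: str) -> bool:
    """Return True when content matches the recorded checksum."""
    return normalized_sha256(content) == expected_hex.lower()
```

With normalization, a file checked out with CRLF endings on one machine and LF on another yields the same checksum, so only real content changes are reported.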

Skip Verification

m1f-s1f ./bundle.m1f.txt ./extracted/ --ignore-checksum

Useful when files were intentionally modified after being combined.

Integration with m1f Workflow

s1f complements the m1f workflow:

# 1. Create bundle with m1f
m1f -s ./project -o ./bundle.m1f.txt --separator-style MachineReadable

# 2. Extract with s1f
m1f-s1f ./bundle.m1f.txt ./extracted_project

# 3. Verify extraction
diff -r ./project ./extracted_project
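
Where diff is not available, the verification step can be approximated in Python. The sketch below compares two directory trees byte for byte; tree_differences is a hypothetical helper and is not part of m1f or s1f.

```python
from pathlib import Path


def tree_differences(original: Path, extracted: Path) -> dict[str, list[str]]:
    """Report files that are missing, extra, or different in extracted."""

    def listing(root: Path) -> set[str]:
        # Relative POSIX-style paths of all regular files under root.
        return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

    left, right = listing(original), listing(extracted)
    changed = sorted(
        rel for rel in left & right
        if (original / rel).read_bytes() != (extracted / rel).read_bytes()
    )
    return {
        "missing": sorted(left - right),
        "extra": sorted(right - left),
        "changed": changed,
    }
```

A byte-level comparison flags files whose encoding or line endings changed during extraction, so a non-empty "changed" list does not always mean content was lost.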

Common Use Cases

Development Backup and Restore

# Create backup
m1f -s ./project -o ./backup.m1f.txt --separator-style MachineReadable

# Restore from backup
m1f-s1f ./backup.m1f.txt ./restored_project

Code Review Distribution

# Reviewer receives bundle
m1f-s1f ./code_review.m1f.txt ./review_files

# Make changes and create new bundle
m1f -s ./review_files -o ./reviewed_code.m1f.txt

Template Distribution

# Create template bundle
m1f -s ./template -o ./project_template.m1f.txt

# Extract template for new project
m1f-s1f ./project_template.m1f.txt ./new_project

Troubleshooting

Checksum Errors

# If checksums don't match because files were intentionally modified, skip verification
m1f-s1f ./bundle.m1f.txt ./extracted/ --ignore-checksum

Encoding Issues

# Try respecting original encoding
m1f-s1f ./bundle.txt ./extracted/ --respect-encoding

# Or force UTF-8
m1f-s1f ./bundle.txt ./extracted/ --target-encoding utf-8

Permission Errors

# Extract with verbose output to see detailed errors
m1f-s1f ./bundle.txt ./extracted/ -v

Empty Extractions

# List files first to verify content
m1f-s1f --list ./bundle.txt

Best Practices

  1. Use MachineReadable format: For most reliable extraction
  2. Verify checksums: Don’t ignore checksum errors unless necessary
  3. Test extraction: Always verify extracted files match originals
  4. Use verbose mode: For debugging and progress tracking
  5. Backup before extraction: Use --force carefully

Related Tools

  • m1f - Create bundles that s1f can extract
  • html2md - Convert HTML before bundling
  • scrape - Download web content for bundling
  • token-counter - Estimate token usage
