s1f - File Splitter

A modern file extraction tool with async I/O that reconstructs original files from combined archives with full metadata preservation.

Overview

The s1f tool (v2.0.0) is the counterpart to m1f: it extracts and reconstructs the original files from a combined file. Built on Python 3.10+ with an async architecture, it verifies checksums and handles encodings during extraction.

Key Features

Async I/O: High-performance concurrent file writing
Smart Parser Framework: Automatic format detection with dedicated parsers
Type Safety: Full type annotations throughout the codebase
Modern Architecture: Clean modular design with dependency injection
Checksum Verification: SHA256 integrity checking with line ending normalization
Encoding Support: Intelligent encoding detection and conversion
Error Recovery: Graceful fallbacks and detailed error reporting
Progress Tracking: Real-time extraction statistics

Quick Start

# Basic extraction (positional arguments - recommended)
m1f-s1f ./combined.txt ./extracted_files

# Basic extraction (option-style arguments)
m1f-s1f -i ./combined.txt -d ./extracted_files

# List files without extracting
m1f-s1f --list ./combined.txt

# Force overwrite of existing files
m1f-s1f ./combined.txt ./extracted_files -f

# Verbose output to see detailed extraction progress
m1f-s1f ./combined.txt ./extracted_files -v

# Extract with specific encoding
m1f-s1f ./combined.txt ./extracted_files --target-encoding utf-16-le

Command Line Options

s1f supports both positional and option-style arguments for flexibility:

Positional Arguments (recommended)

m1f-s1f <input_file> <destination_directory>

Option-Style Arguments (backward compatibility)

m1f-s1f -i <input_file> -d <destination_directory>

All Options

Option	Description
`-i, --input-file`	Path to the combined input file (can also be first positional argument)
`-d, --destination-directory`	Directory where extracted files will be saved (can also be second positional argument)
`-l, --list`	List files in the archive without extracting them
`-f, --force`	Force overwrite of existing files without prompting
`-v, --verbose`	Enable verbose output
`--version`	Show version information and exit
`--timestamp-mode`	How to set file timestamps: original (default) or current
`--ignore-checksum`	Skip checksum verification for MachineReadable files
`--respect-encoding`	Try to use the original file encoding when writing extracted files
`--target-encoding`	Explicitly specify the character encoding for all extracted files
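
The following is a minimal sketch of how a CLI can accept both the positional and the option-style forms described above, using only the standard `argparse` module. It is not taken from s1f's `cli.py`; the names `build_parser`, `resolve_paths`, `input_pos`, and `dest_pos` are hypothetical, and the rule that an explicit option wins over a positional value is an assumption.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few of the options in the table above."""
    parser = argparse.ArgumentParser(prog="m1f-s1f")
    # Positionals are optional so that -i/-d can be used instead.
    parser.add_argument("input_pos", nargs="?", help="combined input file")
    parser.add_argument("dest_pos", nargs="?", help="destination directory")
    parser.add_argument("-i", "--input-file")
    parser.add_argument("-d", "--destination-directory")
    parser.add_argument("-l", "--list", action="store_true")
    return parser


def resolve_paths(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Return (input_file, destination), whichever argument style was used."""
    args = build_parser().parse_args(argv)
    # Assumption: an explicit option takes precedence over a positional value.
    input_file = args.input_file or args.input_pos
    destination = args.destination_directory or args.dest_pos
    if input_file is None:
        raise SystemExit("an input file is required")
    if destination is None and not args.list:
        raise SystemExit("a destination directory is required unless --list is given")
    return input_file, destination
```

With this sketch, `resolve_paths(["a.txt", "out"])` and `resolve_paths(["-i", "a.txt", "-d", "out"])` resolve to the same pair of paths.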

Usage Examples

Basic Operations

# Basic command (positional arguments)
m1f-s1f /path/to/combined_output.txt /path/to/output_folder

# Basic command (option-style)
m1f-s1f --input-file /path/to/combined_output.txt \
  --destination-directory /path/to/output_folder

# List files in archive without extracting
m1f-s1f --list ./output/bundle.m1f.txt

# Splitting a MachineReadable file with force overwrite and verbose output
m1f-s1f ./output/bundle.m1f.txt ./extracted_project -f -v

# Check version
m1f-s1f --version

Advanced Operations

# Using current system time for timestamps
m1f-s1f -i ./combined_file.txt -d ./extracted_files \
  --timestamp-mode current

# Preserving original file encodings
m1f-s1f -i ./with_encodings.txt -d ./extracted_files \
  --respect-encoding

# Using a specific encoding for all extracted files
m1f-s1f -i ./combined_file.txt -d ./extracted_files \
  --target-encoding utf-8

# Ignoring checksum verification (when files were intentionally modified)
m1f-s1f -i ./modified_bundle.m1f.txt -d ./extracted_files \
  --ignore-checksum

Supported File Formats

The s1f tool can extract files from combined files created with any of the m1f separator styles:

Standard Style - Simple separators with file paths and checksums
Detailed Style - Comprehensive separators with full metadata
Markdown Style - Formatted with Markdown syntax for documentation
MachineReadable Style - Structured format with JSON metadata and UUID boundaries
None Style - Files combined without separators (limited extraction capability)

For the most reliable extraction, use files created with the MachineReadable separator style, as these contain complete metadata and checksums for verification.

Common Workflows

Extract and Verify

This workflow ensures the integrity of extracted files:

# Step 1: Extract the files with verification
m1f-s1f -i ./project_bundle.m1f.txt -d ./extracted_project -v

# Step 2: Check for any checksum errors in the output
# If errors are reported, investigate the mismatch; use --ignore-checksum only when the files were intentionally modified

Multiple Extraction Targets

When you need to extract the same combined file to different locations:

# Extract for development
m1f-s1f -i ./project.m1f.txt -d ./dev_workspace

# Extract for backup with original timestamps
m1f-s1f -i ./project.m1f.txt -d ./backup --timestamp-mode original

Working with Different Encodings

# Respect original file encodings where possible
m1f-s1f ./combined.txt ./extracted/ --respect-encoding

# Force all files to UTF-8
m1f-s1f ./combined.txt ./extracted/ --target-encoding utf-8

# Use specific encoding for all files
m1f-s1f ./combined.txt ./extracted/ --target-encoding latin-1

Architecture

s1f v2.0.0 features a modern, modular architecture:

tools/s1f/
├── __init__.py       # Package initialization
├── __main__.py       # Entry point for module execution
├── cli.py            # Command-line interface
├── config.py         # Configuration management
├── core.py           # Core extraction logic with async I/O
├── exceptions.py     # Custom exceptions
├── logging.py        # Structured logging
├── models.py         # Data models (ExtractedFile, etc.)
├── parsers.py        # Abstract parser framework
├── utils.py          # Utility functions
└── writers.py        # Output writers (file, stdout)

Key Components

Async I/O: Concurrent file operations for better performance
Parser Framework: Extensible system for handling different file formats
Type Safety: Full type hints and dataclass models
Clean Architecture: Separation of concerns with dependency injection

Parser Framework

s1f uses a parser framework to handle different separator styles:

Standard Parser

Handles simple separators like:

======= path/to/file.py ======

Detailed Parser

Handles comprehensive metadata like:

========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================

Markdown Parser

Handles Markdown-formatted separators with code blocks.

MachineReadable Parser

Handles structured format with JSON metadata:

--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_uuid ---
METADATA_JSON: {...}
--- PYMK1F_END_FILE_METADATA_BLOCK_uuid ---
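
To illustrate the idea of automatic format detection, here is a rough sketch that picks a separator style by matching the example separators shown above. The patterns are derived only from those examples and the function name `detect_style` is hypothetical; the real parsers in `parsers.py` may recognize more variants and use a different detection order.

```python
import re
from typing import Optional

# Patterns modeled on the separator examples in this section (assumption:
# the most specific styles are checked first).
STYLE_PATTERNS = {
    "MachineReadable": re.compile(
        r"^--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_\S+ ---$", re.MULTILINE
    ),
    "Detailed": re.compile(r"^== FILE: .+$", re.MULTILINE),
    "Standard": re.compile(r"^=+ \S.* =+$", re.MULTILINE),
}


def detect_style(text: str) -> Optional[str]:
    """Return the name of the first style whose separator appears in text."""
    for name, pattern in STYLE_PATTERNS.items():
        if pattern.search(text):
            return name
    return None  # no known separator found (for example, the None style)
```

Checking the more specific styles first keeps a Detailed or MachineReadable bundle from being misread as Standard when file contents happen to contain lines of equals signs.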

Performance

s1f v2.0.0 includes significant performance improvements:

Async I/O: Concurrent file writing for 3-5x faster extraction on SSDs
Optimized Parsing: Efficient line-by-line processing with minimal memory usage
Smart Buffering: Adaptive buffer sizes based on file characteristics
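
As an illustration of concurrent file writing, the sketch below writes several files at once with `asyncio`, pushing each blocking write to a worker thread and capping concurrency with a semaphore. It is an assumption about how such a writer could look, not the code in `core.py` or `writers.py`; `write_all`, `_write_one`, and the `limit` parameter are hypothetical names.

```python
import asyncio
from pathlib import Path
from typing import Dict


def _write_one(dest: Path, content: str, encoding: str) -> None:
    """Blocking write of a single file, creating parent directories as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(content, encoding=encoding)


async def write_all(root: Path, files: Dict[str, str], limit: int = 8) -> int:
    """Write every {relative_path: content} entry under root concurrently."""
    semaphore = asyncio.Semaphore(limit)  # bounds the number of in-flight writes

    async def worker(rel_path: str, content: str) -> None:
        async with semaphore:
            # to_thread keeps the event loop free while the disk write runs.
            await asyncio.to_thread(_write_one, root / rel_path, content, "utf-8")

    await asyncio.gather(*(worker(p, c) for p, c in files.items()))
    return len(files)
```

A call such as `asyncio.run(write_all(Path("./extracted"), {"a.txt": "alpha"}))` runs the whole batch. Any speedup depends on the storage device and file sizes.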

Error Handling

The tool provides comprehensive error handling:

Checksum Verification: Automatic integrity checking with clear error messages
Encoding Fallbacks: Graceful handling of encoding issues with multiple fallback strategies
Permission Errors: Clear reporting of file system permission issues
Partial Recovery: Continue extraction even if individual files fail
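
The partial-recovery behavior listed above can be sketched as a loop that records a failure and moves on instead of aborting. This is a simplified, synchronous illustration under assumed names (`extract_with_recovery` is hypothetical), not s1f's actual error handling.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def extract_with_recovery(
    root: Path, files: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Write each file; collect failures instead of stopping at the first one."""
    written: List[str] = []
    failed: Dict[str, str] = {}
    for rel_path, content in files.items():
        try:
            target = root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
            written.append(rel_path)
        except OSError as exc:  # covers PermissionError, FileExistsError, etc.
            failed[rel_path] = f"{type(exc).__name__}: {exc}"
    return written, failed
```

Returning the failures alongside the successes lets the caller print a summary at the end, which matches the goal of clear reporting without losing the files that did extract.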

Timestamp Modes

Control how file timestamps are set:

Original Mode (default)

m1f-s1f ./combined.txt ./extracted/ --timestamp-mode original

Uses the timestamps from when files were originally combined.

Current Mode

m1f-s1f ./combined.txt ./extracted/ --timestamp-mode current

Uses the current system time for all extracted files.
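
One way the two modes could be applied after a file is written is shown below. The sketch assumes the bundle metadata carries the modification time as an ISO 8601 string, which this document does not specify; `apply_timestamp` and `modified_iso` are hypothetical names.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def apply_timestamp(
    path: Path, modified_iso: Optional[str], mode: str = "original"
) -> None:
    """Set the file's access and modification times according to the mode."""
    if mode == "current" or modified_iso is None:
        return  # the file keeps the time at which it was written
    # A string without a UTC offset is interpreted as local time.
    timestamp = datetime.fromisoformat(modified_iso).timestamp()
    os.utime(path, (timestamp, timestamp))
```

In this sketch, "current" mode needs no work at all, because a freshly written file already carries the current system time.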

Encoding Handling

s1f provides flexible encoding options:

Default Behavior

All files are written in UTF-8 encoding.

Respect Original Encoding

m1f-s1f ./combined.txt ./extracted/ --respect-encoding

Attempts to use the original file encoding when available in metadata.

Force Specific Encoding

m1f-s1f ./combined.txt ./extracted/ --target-encoding latin-1

Forces all files to use the specified encoding.
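
The three behaviors above can be summarized as a small decision function. The sketch below is an assumption about how the options interact: the precedence (an explicit target encoding first, then the original encoding, then UTF-8) and the UTF-8 fallback on encoding failure are not confirmed by this document, and `choose_encoding` and `encode_content` are hypothetical names.

```python
from typing import Optional, Tuple


def choose_encoding(
    original: Optional[str],
    respect_encoding: bool = False,
    target_encoding: Optional[str] = None,
) -> str:
    """Pick the output encoding for one extracted file."""
    if target_encoding:  # --target-encoding (assumed to take precedence)
        return target_encoding
    if respect_encoding and original:  # --respect-encoding with metadata present
        return original
    return "utf-8"  # documented default


def encode_content(text: str, encoding: str) -> Tuple[bytes, str]:
    """Encode text, falling back to UTF-8 if the chosen encoding cannot be used."""
    try:
        return text.encode(encoding), encoding
    except (UnicodeEncodeError, LookupError):
        # Assumed fallback: unknown codec or unrepresentable characters.
        return text.encode("utf-8"), "utf-8"
```

For example, `encode_content("€", "latin-1")` falls back to UTF-8 because Latin-1 has no euro sign, while `encode_content("é", "latin-1")` succeeds as requested.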

Checksum Verification

For MachineReadable files, s1f automatically verifies file integrity:

Automatic Verification

m1f-s1f ./bundle.m1f.txt ./extracted/

Checksums are verified automatically and errors are reported.
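
For readers curious what SHA256 checking with line ending normalization might involve, here is a minimal sketch. It assumes that line endings are normalized to LF and that the hash is taken over the UTF-8 bytes; the exact normalization s1f performs is not described here, and `normalized_sha256` and `verify` are hypothetical names.

```python
import hashlib


def normalized_sha256(content: str) -> str:
    """Hash content after converting CRLF and lone CR line endings to LF."""
    normalized = content.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def verify(content: str, expected_hex: str) -> bool:
    """Compare the normalized hash with the checksum stored in the metadata."""
    return normalized_sha256(content) == expected_hex.lower()
```

Normalizing before hashing means a file that only changed from LF to CRLF (for instance after a checkout on another platform) still passes verification, while any change to the actual text does not.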

Skip Verification

m1f-s1f ./bundle.m1f.txt ./extracted/ --ignore-checksum

Useful when files were intentionally modified after being combined.

Integration with m1f Workflow

s1f complements the m1f workflow:

# 1. Create bundle with m1f
m1f -s ./project -o ./bundle.m1f.txt --separator-style MachineReadable

# 2. Extract with s1f
m1f-s1f ./bundle.m1f.txt ./extracted_project

# 3. Verify extraction
diff -r ./project ./extracted_project
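
Where `diff` is not available, the verification step can be approximated in Python. The sketch below compares two directory trees by file content hash; `tree_digest` and `diff_trees` are hypothetical helpers and are not part of m1f or s1f.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digest(root: Path) -> Dict[str, str]:
    """Map each file's POSIX-style relative path to the SHA256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(original: Path, extracted: Path) -> Dict[str, List[str]]:
    """Report files that are missing, extra, or changed in the extracted tree."""
    a, b = tree_digest(original), tree_digest(extracted)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

The comparison is byte-exact, so files whose encoding or line endings were converted during extraction will show up under "changed" even if their text is the same.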

Common Use Cases

Development Backup and Restore

# Create backup
m1f -s ./project -o ./backup.m1f.txt --separator-style MachineReadable

# Restore from backup
m1f-s1f ./backup.m1f.txt ./restored_project

Code Review Distribution

# Reviewer receives bundle
m1f-s1f ./code_review.m1f.txt ./review_files

# Make changes and create new bundle
m1f -s ./review_files -o ./reviewed_code.m1f.txt

Template Distribution

# Create template bundle
m1f -s ./template -o ./project_template.m1f.txt

# Extract template for new project
m1f-s1f ./project_template.m1f.txt ./new_project

Troubleshooting

Checksum Errors

# If checksums don't match because the files were intentionally modified, skip verification
m1f-s1f ./bundle.m1f.txt ./extracted/ --ignore-checksum

Encoding Issues

# Try respecting original encoding
m1f-s1f ./bundle.txt ./extracted/ --respect-encoding

# Or force UTF-8
m1f-s1f ./bundle.txt ./extracted/ --target-encoding utf-8

Permission Errors

# Extract with verbose output to see detailed errors
m1f-s1f ./bundle.txt ./extracted/ -v

Empty Extractions

# List files first to verify content
m1f-s1f --list ./bundle.txt

Best Practices

Use MachineReadable format: For most reliable extraction
Verify checksums: Don’t ignore checksum errors unless necessary
Test extraction: Always verify extracted files match originals
Use verbose mode: For debugging and progress tracking
Back up before extraction: --force overwrites existing files without prompting, so use it carefully

Related Tools

m1f - Create bundles that s1f can extract
html2md - Convert HTML before bundling
scrape - Download web content for bundling
token-counter - Estimate token usage

Next Steps

Learn about m1f bundling to create extractable files
Explore presets for consistent bundling
Check out auto-bundling workflows
Review security best practices