s1f - File Splitter
Extract and reconstruct original files from m1f bundles with full metadata preservation
A modern file extraction tool with async I/O that reconstructs original files from combined archives with full metadata preservation.
Overview
The s1f tool (v2.0.0) is the counterpart to m1f, designed to extract and reconstruct original files from a combined file. Built with Python 3.10+ and modern async architecture, it ensures reliable extraction with checksum verification and proper encoding handling.
Key Features
- Async I/O: High-performance concurrent file writing
- Smart Parser Framework: Automatic format detection with dedicated parsers
- Type Safety: Full type annotations throughout the codebase
- Modern Architecture: Clean modular design with dependency injection
- Checksum Verification: SHA256 integrity checking with line ending normalization
- Encoding Support: Intelligent encoding detection and conversion
- Error Recovery: Graceful fallbacks and detailed error reporting
- Progress Tracking: Real-time extraction statistics
Quick Start
# Basic extraction (positional arguments - recommended)
m1f-s1f ./combined.txt ./extracted_files
# Basic extraction (option-style arguments)
m1f-s1f -i ./combined.txt -d ./extracted_files
# List files without extracting
m1f-s1f --list ./combined.txt
# Force overwrite of existing files
m1f-s1f ./combined.txt ./extracted_files -f
# Verbose output to see detailed extraction progress
m1f-s1f ./combined.txt ./extracted_files -v
# Extract with specific encoding
m1f-s1f ./combined.txt ./extracted_files --target-encoding utf-16-le
Command Line Options
s1f supports both positional and option-style arguments for flexibility:
Positional Arguments (recommended)
m1f-s1f <input_file> <destination_directory>
Option-Style Arguments (backward compatibility)
m1f-s1f -i <input_file> -d <destination_directory>
All Options
Option | Description |
---|---|
-i, --input-file | Path to the combined input file (can also be first positional argument) |
-d, --destination-directory | Directory where extracted files will be saved (can also be second positional argument) |
-l, --list | List files in the archive without extracting them |
-f, --force | Force overwrite of existing files without prompting |
-v, --verbose | Enable verbose output |
--version | Show version information and exit |
--timestamp-mode | How to set file timestamps (original or current) |
--ignore-checksum | Skip checksum verification for MachineReadable files |
--respect-encoding | Try to use the original file encoding when writing extracted files |
--target-encoding | Explicitly specify the character encoding for all extracted files |
Usage Examples
Basic Operations
# Basic command (positional arguments)
m1f-s1f /path/to/combined_output.txt /path/to/output_folder
# Basic command (option-style)
m1f-s1f --input-file /path/to/combined_output.txt \
--destination-directory /path/to/output_folder
# List files in archive without extracting
m1f-s1f --list ./output/bundle.m1f.txt
# Splitting a MachineReadable file with force overwrite and verbose output
m1f-s1f ./output/bundle.m1f.txt ./extracted_project -f -v
# Check version
m1f-s1f --version
Advanced Operations
# Using current system time for timestamps
m1f-s1f -i ./combined_file.txt -d ./extracted_files \
--timestamp-mode current
# Preserving original file encodings
m1f-s1f -i ./with_encodings.txt -d ./extracted_files \
--respect-encoding
# Using a specific encoding for all extracted files
m1f-s1f -i ./combined_file.txt -d ./extracted_files \
--target-encoding utf-8
# Ignoring checksum verification (when files were intentionally modified)
m1f-s1f -i ./modified_bundle.m1f.txt -d ./extracted_files \
--ignore-checksum
Supported File Formats
The s1f tool can extract files from combined files created with any of the m1f separator styles:
- Standard Style - Simple separators with file paths and checksums
- Detailed Style - Comprehensive separators with full metadata
- Markdown Style - Formatted with Markdown syntax for documentation
- MachineReadable Style - Structured format with JSON metadata and UUID boundaries
- None Style - Files combined without separators (limited extraction capability)
For the most reliable extraction, use files created with the MachineReadable separator style, as these contain complete metadata and checksums for verification.
Common Workflows
Extract and Verify
This workflow ensures the integrity of extracted files:
# Step 1: Extract the files with verification
m1f-s1f -i ./project_bundle.m1f.txt -d ./extracted_project -v
# Step 2: Check for any checksum errors in the output
# If errors are reported and the mismatch is expected, rerun with --ignore-checksum
Multiple Extraction Targets
When you need to extract the same combined file to different locations:
# Extract for development
m1f-s1f -i ./project.m1f.txt -d ./dev_workspace
# Extract for backup with original timestamps
m1f-s1f -i ./project.m1f.txt -d ./backup --timestamp-mode original
Working with Different Encodings
# Respect original file encodings where possible
m1f-s1f ./combined.txt ./extracted/ --respect-encoding
# Force all files to UTF-8
m1f-s1f ./combined.txt ./extracted/ --target-encoding utf-8
# Use specific encoding for all files
m1f-s1f ./combined.txt ./extracted/ --target-encoding latin-1
Architecture
s1f v2.0.0 features a modern, modular architecture:
tools/s1f/
├── __init__.py # Package initialization
├── __main__.py # Entry point for module execution
├── cli.py # Command-line interface
├── config.py # Configuration management
├── core.py # Core extraction logic with async I/O
├── exceptions.py # Custom exceptions
├── logging.py # Structured logging
├── models.py # Data models (ExtractedFile, etc.)
├── parsers.py # Abstract parser framework
├── utils.py # Utility functions
└── writers.py # Output writers (file, stdout)
Key Components
- Async I/O: Concurrent file operations for better performance
- Parser Framework: Extensible system for handling different file formats
- Type Safety: Full type hints and dataclass models
- Clean Architecture: Separation of concerns with dependency injection
Parser Framework
s1f uses a parser framework to handle different separator styles:
Standard Parser
Handles simple separators like:
======= path/to/file.py ======
Detailed Parser
Handles comprehensive metadata like:
========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
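A header in this shape can be turned into a dictionary with plain string operations. The sketch below only illustrates the layout shown above and is not the parser in `parsers.py`; the function name `parse_detailed_header` is hypothetical, and it assumes that fields on one line are separated by ` | ` and that file paths do not contain that sequence.

```python
def parse_detailed_header(lines: list[str]) -> dict[str, str]:
    """Collect KEY: value pairs from the '== ' lines of a Detailed separator."""
    fields: dict[str, str] = {}
    for line in lines:
        # The rule lines consist only of '=' and never start with '== '.
        if not line.startswith("== "):
            continue
        for part in line[3:].split(" | "):
            # Split on the first ': ' only, so times such as 14:30:21 survive.
            key, sep, value = part.partition(": ")
            if sep:
                fields[key.strip()] = value.strip()
    return fields


header = [
    "=" * 88,
    "== FILE: path/to/file.py",
    "== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py",
    "== CHECKSUM_SHA256: abcdef1234567890...",
    "=" * 88,
]
fields = parse_detailed_header(header)
```

Here `fields["FILE"]` holds the relative path and `fields["DATE"]` the full timestamp string, which a caller could then convert with `datetime.strptime`.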
Markdown Parser
Handles Markdown-formatted separators with code blocks.
MachineReadable Parser
Handles structured format with JSON metadata:
--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_uuid ---
METADATA_JSON: {...}
--- PYMK1F_END_FILE_METADATA_BLOCK_uuid ---
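Reading such a block amounts to finding the begin marker, collecting lines until the end marker with the same identifier, and decoding the JSON payload. The following is a rough sketch under assumptions: the function name `read_metadata_block` is hypothetical, the payload is taken to be everything after `METADATA_JSON:` inside the block, and the `path` key used in the usage note is illustrative rather than a documented field name.

```python
import json
import re

# The identifier after the prefix is treated as an opaque token.
BEGIN_RE = re.compile(r"^--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_(?P<uuid>\S+) ---$")


def read_metadata_block(lines):
    """Return (uuid, metadata) for the first complete block, else None."""
    it = iter(lines)
    for line in it:
        match = BEGIN_RE.match(line.strip())
        if not match:
            continue
        uuid = match.group("uuid")
        end_marker = f"--- PYMK1F_END_FILE_METADATA_BLOCK_{uuid} ---"
        body = []
        # Continue on the same iterator so the block is consumed exactly once.
        for inner in it:
            if inner.strip() == end_marker:
                _, _, payload = "\n".join(body).partition("METADATA_JSON:")
                return uuid, json.loads(payload)
            body.append(inner)
        return None  # begin marker without a matching end marker
    return None
```

Given the three lines of a block whose payload is `{"path": "a.py"}`, the function returns the identifier together with `{"path": "a.py"}`. Requiring the end marker to carry the same identifier is what keeps separator-like text inside file content from being mistaken for a boundary.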
Performance
s1f v2.0.0 includes significant performance improvements:
- Async I/O: Concurrent file writing for 3-5x faster extraction on SSDs
- Optimized Parsing: Efficient line-by-line processing with minimal memory usage
- Smart Buffering: Adaptive buffer sizes based on file characteristics
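Concurrent file writing can be approximated with the standard library alone. The sketch below is an illustration of the general technique, not the code in `core.py`: it offloads blocking writes to worker threads with `asyncio.to_thread` and caps concurrency with a semaphore. The names `write_all` and `_write_file` and the default limit of 8 are assumptions.

```python
import asyncio
from pathlib import Path


def _write_file(path: Path, data: bytes) -> None:
    """Blocking write, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


async def write_all(dest: Path, files: dict[str, bytes], limit: int = 8) -> int:
    """Write every file under dest, with at most `limit` writes in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def write_one(rel_path: str, data: bytes) -> None:
        async with semaphore:
            # Run the blocking write in a thread so the event loop stays free.
            await asyncio.to_thread(_write_file, dest / rel_path, data)

    await asyncio.gather(*(write_one(rel, data) for rel, data in files.items()))
    return len(files)
```

A caller would run it with `asyncio.run(write_all(Path("./extracted"), files))`. The semaphore keeps the number of open file handles bounded when a bundle contains thousands of entries.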
Error Handling
The tool provides comprehensive error handling:
- Checksum Verification: Automatic integrity checking with clear error messages
- Encoding Fallbacks: Graceful handling of encoding issues with multiple fallback strategies
- Permission Errors: Clear reporting of file system permission issues
- Partial Recovery: Continue extraction even if individual files fail
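Partial recovery, as listed above, means that one failing file does not abort the run. A minimal sketch of that pattern follows. It is not s1f's implementation: `ExtractionReport` and `extract_all` are hypothetical names, and the `write` callback stands in for whatever actually writes a file.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExtractionReport:
    written: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


def extract_all(
    files: dict[str, str], write: Callable[[str, str], None]
) -> ExtractionReport:
    """Try every file and record failures instead of stopping at the first."""
    report = ExtractionReport()
    for rel_path, content in files.items():
        try:
            write(rel_path, content)
        except (OSError, UnicodeError) as exc:
            # PermissionError is a subclass of OSError, so it is caught here.
            report.failed[rel_path] = f"{type(exc).__name__}: {exc}"
        else:
            report.written.append(rel_path)
    return report
```

After the loop, `report.failed` maps each problem path to a short message, which is the kind of summary a verbose run could print.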
Timestamp Modes
Control how file timestamps are set:
Original Mode (default)
m1f-s1f ./combined.txt ./extracted/ --timestamp-mode original
Restores the file timestamps stored in the combined file's metadata.
Current Mode
m1f-s1f ./combined.txt ./extracted/ --timestamp-mode current
Uses the current system time for all extracted files.
Encoding Handling
s1f provides flexible encoding options:
Default Behavior
All files are written in UTF-8 encoding.
Respect Original Encoding
m1f-s1f ./combined.txt ./extracted/ --respect-encoding
Attempts to use the original file encoding when available in metadata.
Force Specific Encoding
m1f-s1f ./combined.txt ./extracted/ --target-encoding latin-1
Forces all files to use the specified encoding.
Checksum Verification
For MachineReadable files, s1f automatically verifies file integrity:
Automatic Verification
m1f-s1f ./bundle.m1f.txt ./extracted/
Checksums are verified automatically and errors are reported.
Skip Verification
m1f-s1f ./bundle.m1f.txt ./extracted/ --ignore-checksum
Useful when files were intentionally modified after being combined.
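SHA256 verification with line ending normalization can be sketched as follows. This shows the general idea only: the exact normalization and text encoding that s1f applies before hashing are not specified here, so converting CRLF and lone CR to LF and hashing the UTF-8 bytes are assumptions, as are the function names.

```python
import hashlib


def normalized_sha256(text: str) -> str:
    """Hash text after converting CRLF and lone CR line endings to LF."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def checksum_matches(text: str, expected_hex: str) -> bool:
    """Compare against the stored checksum, ignoring hex digit case."""
    return normalized_sha256(text) == expected_hex.strip().lower()
```

Normalizing before hashing means a bundle created on Windows still verifies after extraction on Linux or macOS, because the checksum no longer depends on the platform's line endings.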
Integration with m1f Workflow
s1f complements the m1f workflow:
# 1. Create bundle with m1f
m1f -s ./project -o ./bundle.m1f.txt --separator-style MachineReadable
# 2. Extract with s1f
m1f-s1f ./bundle.m1f.txt ./extracted_project
# 3. Verify extraction
diff -r ./project ./extracted_project
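Where `diff -r` is not available, for example on Windows, step 3 can be approximated with Python's `filecmp` module. This is a sketch with a hypothetical function name. Note that `filecmp.dircmp` compares files shallowly by default, so files with identical size and modification time are treated as equal without reading their contents.

```python
import filecmp
from pathlib import Path


def tree_differences(left: str, right: str) -> list[str]:
    """Return relative paths that differ or exist on one side only."""
    problems: list[str] = []

    def walk(cmp: filecmp.dircmp, prefix: Path) -> None:
        names = cmp.left_only + cmp.right_only + cmp.diff_files + cmp.funny_files
        for name in names:
            problems.append(str(prefix / name))
        # Recurse into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(left, right), Path("."))
    return sorted(problems)
```

An empty result from `tree_differences("./project", "./extracted_project")` corresponds to `diff -r` printing nothing.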
Common Use Cases
Development Backup and Restore
# Create backup
m1f -s ./project -o ./backup.m1f.txt --separator-style MachineReadable
# Restore from backup
m1f-s1f ./backup.m1f.txt ./restored_project
Code Review Distribution
# Reviewer receives bundle
m1f-s1f ./code_review.m1f.txt ./review_files
# Make changes and create new bundle
m1f -s ./review_files -o ./reviewed_code.m1f.txt
Template Distribution
# Create template bundle
m1f -s ./template -o ./project_template.m1f.txt
# Extract template for new project
m1f-s1f ./project_template.m1f.txt ./new_project
Troubleshooting
Checksum Errors
# If checksums don't match because the bundle was edited on purpose, skip verification
m1f-s1f ./bundle.m1f.txt ./extracted/ --ignore-checksum
Encoding Issues
# Try respecting original encoding
m1f-s1f ./bundle.txt ./extracted/ --respect-encoding
# Or force UTF-8
m1f-s1f ./bundle.txt ./extracted/ --target-encoding utf-8
Permission Errors
# Extract with verbose output to see detailed errors
m1f-s1f ./bundle.txt ./extracted/ -v
Empty Extractions
# List files first to verify content
m1f-s1f --list ./bundle.txt
Best Practices
- Use MachineReadable format: For most reliable extraction
- Verify checksums: Don’t ignore checksum errors unless necessary
- Test extraction: Always verify extracted files match originals
- Use verbose mode: For debugging and progress tracking
- Back up before extraction: Use --force carefully
Related Tools
- m1f - Create bundles that s1f can extract
- html2md - Convert HTML before bundling
- scrape - Download web content for bundling
- token-counter - Estimate token usage
Next Steps
- Learn about m1f bundling to create extractable files
- Explore presets for consistent bundling
- Check out auto-bundling workflows
- Review security best practices