m1f - Main Bundler
The core tool for combining multiple files into a single file with metadata and deduplication
A modern, high-performance tool that combines multiple files into a single file with rich metadata, content deduplication, and async I/O support.
Overview
The m1f tool (v3.4.0) solves a common challenge when working with LLMs: providing sufficient context without exceeding token limits. Built with Python 3.10+ and modern architecture patterns, it creates optimized reference files from multiple sources while automatically handling duplicates and providing comprehensive metadata.
Key Features
- Content Deduplication: Automatically detects and skips duplicate files based on SHA256 checksums
- Async I/O: High-performance file operations with concurrent processing
- Type Safety: Full type annotations throughout the codebase
- Modern Architecture: Modular package structure with clean separation of concerns
- Smart Filtering: Advanced file filtering with size limits, extensions, and patterns
- Symlink Support: Intelligent symlink handling with cycle detection
- Professional Security: Integration with detect-secrets for sensitive data detection
- Colorized Output: Beautiful console output with progress indicators
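The deduplication feature above can be illustrated with a minimal sketch. This is not m1f's own code: the function name `deduplicate_by_checksum` and the in-memory `(path, bytes)` input are assumptions made for the example. It only shows the general idea of keeping the first file seen for each SHA256 digest.

```python
import hashlib
from typing import Iterable


def deduplicate_by_checksum(
    files: Iterable[tuple[str, bytes]],
) -> list[tuple[str, str]]:
    """Return (path, sha256) for the first occurrence of each distinct content."""
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical content was already kept under another path
        seen.add(digest)
        kept.append((path, digest))
    return kept


# Hypothetical sample data: two paths share the same bytes.
sample = [
    ("src/a.py", b"print('hi')\n"),
    ("src/copy_of_a.py", b"print('hi')\n"),
    ("src/b.py", b"print('bye')\n"),
]
kept_paths = [path for path, _ in deduplicate_by_checksum(sample)]
print(kept_paths)
```

Keying on the content digest rather than the file name means a renamed copy is still treated as a duplicate.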
Quick Start
Initialize m1f in Your Project
# Quick setup for any project
cd /your/project
m1f-init
# Initialize without creating symlink to m1f documentation
m1f-init --no-symlink
# This creates in the m1f/ directory:
# - <project>_complete.txt # Full project bundle
# - <project>_complete_filelist.txt # List of all included files
# - <project>_complete_dirlist.txt # List of all directories
# - <project>_docs.txt # Documentation bundle
# - <project>_docs_filelist.txt # List of documentation files
# - <project>_docs_dirlist.txt # Documentation directories
# - .m1f.config.yml # Configuration file
Basic m1f Commands
# Basic usage with a source directory
m1f -s ./your_project -o ./combined.txt
# Include only specific file types
m1f -s ./your_project -o ./combined.txt --include-extensions .py .js .md
# Include only documentation files (62 extensions)
m1f -s ./your_project -o ./docs_bundle.txt --docs-only
# Exclude specific directories
m1f -s ./your_project -o ./combined.txt --excludes "node_modules/" "build/" "dist/"
# Filter by file size
m1f -s ./your_project -o ./combined.txt --max-file-size 50KB
Command Line Options
Option | Description |
---|---|
-s, --source-directory | Path to the directory containing files to process. Can be specified multiple times |
-i, --input-file | Path to a file containing a list of files/directories to process |
-o, --output-file | Path for the combined output file |
-f, --force | Force overwrite of existing output file without prompting |
-t, --add-timestamp | Add a timestamp (_YYYYMMDD_HHMMSS) to the output filename |
--filename-mtime-hash | Append a hash of file modification timestamps to the filename |
--include-extensions | Space-separated list of file extensions to include |
--exclude-extensions | Space-separated list of file extensions to exclude |
--includes | Space-separated list of gitignore-style patterns to include |
--docs-only | Include only documentation files (62 extensions) |
--max-file-size | Skip files larger than the specified size (e.g., 50KB) |
--exclude-paths-file | Path to file containing paths or patterns to exclude |
--no-default-excludes | Disable default directory exclusions |
--excludes | Space-separated list of paths to exclude |
--include-dot-paths | Include files and directories that start with a dot |
--include-binary-files | Attempt to include files with binary extensions |
--remove-scraped-metadata | Remove scraped metadata from HTML2MD files |
--separator-style | Style of separators between files (Standard, Detailed, Markdown, MachineReadable, None) |
--line-ending | Line ending for script-generated separators (lf or crlf) |
--convert-to-charset | Convert all files to the specified character encoding |
--abort-on-encoding-error | Abort processing if encoding conversion errors occur |
-v, --verbose | Enable verbose logging |
--minimal-output | Generate only the combined output file (no auxiliary files) |
--skip-output-file | Execute operations but skip writing the final output file |
-q, --quiet | Suppress all console output |
--create-archive | Create a backup archive of all processed files |
--archive-type | Type of archive to create (zip or tar.gz) |
--security-check | Scan files for secrets before merging (abort, skip, warn) |
--preset | One or more preset configuration files for file-specific processing |
--preset-group | Specific preset group to use from the configuration |
--disable-presets | Disable all preset processing |
Usage Examples
Basic Operations
# Basic command using a source directory
m1f --source-directory /path/to/your/code \
--output-file /path/to/combined_output.txt
# Using multiple source directories
m1f -s ./src -s ./docs -s ./tests -o combined_output.txt
# Using an input file containing paths to process
m1f -i filelist.txt -o combined_output.txt
# Using both source directory and input file together
m1f -s ./source_code -i ./file_list.txt -o ./combined.txt
# Using include patterns to filter files
m1f -s ./project -o output.txt --includes "src/**" "*.py" "!*_test.py"
# Remove scraped metadata from HTML2MD files
m1f -s ./scraped_docs -o ./clean_docs.txt \
--include-extensions .md --remove-scraped-metadata
Advanced Operations
# Using MachineReadable style with verbose logging
m1f -s ./my_project -o ./output/bundle.m1f.txt \
--separator-style MachineReadable --force --verbose
# Creating a combined file and a backup zip archive
m1f -s ./source_code -o ./dist/combined.txt \
--create-archive --archive-type zip
# Only include text files under 50KB to avoid large generated files
m1f -s ./project -o ./text_only.txt \
--max-file-size 50KB --include-extensions .py .js .md .txt .json
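As a rough illustration of how a limit such as 50KB might be interpreted, the sketch below parses a size string into bytes and compares a file size against it. The helper names `parse_size` and `within_limit` are hypothetical, and the use of binary units (1 KB = 1024 bytes) is an assumption; m1f's actual parsing rules may differ.

```python
import re

# Assumed binary units; not confirmed against m1f's implementation.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as '50KB' or '1.5 MB' into a byte count."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)?\s*", text, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[(unit or "B").upper()])


def within_limit(size_bytes: int, limit: str) -> bool:
    """True when a file of size_bytes would not be skipped by the limit."""
    return size_bytes <= parse_size(limit)


print(parse_size("50KB"))
print(within_limit(2560, "50KB"))
```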
Security Check
The --security-check option scans files for potential secrets using detect-secrets if available:
- abort: stop processing immediately and do not create the output file
- skip: omit files that contain secrets from the final output
- warn: include all files but print a summary warning at the end
# Stop processing if secrets are found
m1f -s . -o output.txt --security-check abort
# Skip files with secrets
m1f -s . -o output.txt --security-check skip
# Include all files but show warnings
m1f -s . -o output.txt --security-check warn
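The difference between the three modes can be sketched in a few lines. Everything here is illustrative: `SecurityMode`, `SecretsFoundError`, and `apply_security_mode` are hypothetical names, and the `has_secret` callback stands in for a real scanner such as detect-secrets, whose API is not used in this example.

```python
from enum import Enum
from typing import Callable


class SecurityMode(Enum):
    ABORT = "abort"
    SKIP = "skip"
    WARN = "warn"


class SecretsFoundError(Exception):
    """Raised in abort mode when at least one file is flagged."""


def apply_security_mode(
    paths: list[str],
    has_secret: Callable[[str], bool],  # hypothetical stand-in for a scanner
    mode: SecurityMode,
) -> tuple[list[str], list[str]]:
    """Return (paths to include, flagged paths) for the chosen mode."""
    flagged = [p for p in paths if has_secret(p)]
    if flagged and mode is SecurityMode.ABORT:
        raise SecretsFoundError(f"secrets detected in: {', '.join(flagged)}")
    if mode is SecurityMode.SKIP:
        included = [p for p in paths if p not in flagged]
    else:
        included = list(paths)  # warn: keep everything, report flagged files
    return included, flagged


def looks_secret(path: str) -> bool:
    """Toy predicate used only for this example."""
    return path == ".env"


files = ["app.py", ".env", "README.md"]
print(apply_security_mode(files, looks_secret, SecurityMode.SKIP))
```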
Output Files
By default, m1f creates several output files:
- Primary output file - The combined file specified by --output-file
- Log file - A .log file with detailed processing information
- File list - A _filelist.txt file containing paths of all included files
- Directory list - A _dirlist.txt file containing all unique directories
- Archive file - An optional backup archive if --create-archive is specified
To create only the primary output file:
m1f -s ./src -o ./combined.txt --minimal-output
Separator Styles
Choose how files are separated in the combined output:
Standard Style
======= path/to/file.py ======
Detailed Style (Default)
========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
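To make the fields of the Detailed header concrete, here is a small sketch that renders a header of the same shape. It is not m1f's separator code: `detailed_header` is a hypothetical helper, the rule width is an assumption, and the size is computed with 1 KB = 1024 bytes.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath

RULE = "=" * 88  # rule width is an assumption, not taken from m1f


def detailed_header(path: str, content: bytes, modified: datetime) -> str:
    """Build a Detailed-style header for one file (illustrative only)."""
    size_kb = len(content) / 1024
    checksum = hashlib.sha256(content).hexdigest()
    return "\n".join(
        [
            RULE,
            f"== FILE: {path}",
            f"== DATE: {modified:%Y-%m-%d %H:%M:%S} | SIZE: {size_kb:.2f} KB "
            f"| TYPE: {PurePosixPath(path).suffix}",
            f"== CHECKSUM_SHA256: {checksum}",
            RULE,
        ]
    )


# 2560 placeholder bytes stand in for real file content.
header = detailed_header(
    "path/to/file.py", b"x" * 2560, datetime(2025, 5, 15, 14, 30, 21)
)
print(header)
```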
Markdown Style
## path/to/file.py
**Date Modified:** 2025-05-15 14:30:21 | **Size:** 2.50 KB | **Type:** .py
```python
# File content starts here
def example():
    return "Hello, world!"
```
MachineReadable Style
--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
METADATA_JSON:
{
    "original_filepath": "path/to/file.py",
    "timestamp_utc_iso": "2025-05-15T14:30:21Z",
    "size_bytes": 2560,
    "checksum_sha256": "abcdef1234567890..."
}
--- PYMK1F_END_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---
# File content here
--- PYMK1F_END_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---
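Because each block is delimited by markers that share one identifier, this style lends itself to programmatic parsing. The sketch below is one possible reader built only from the marker layout shown above; it is not the s1f tool, and `BLOCK_RE` and `parse_bundle` are hypothetical names. It assumes the markers appear exactly as in the example, one per line.

```python
import json
import re

# The same identifier must appear in all four markers of a file entry.
BLOCK_RE = re.compile(
    r"--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_(?P<id>[0-9a-f-]+) ---\n"
    r"METADATA_JSON:\n(?P<meta>.*?)\n"
    r"--- PYMK1F_END_FILE_METADATA_BLOCK_(?P=id) ---\n"
    r"--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_(?P=id) ---\n"
    r"(?P<content>.*?)\n?"
    r"--- PYMK1F_END_FILE_CONTENT_BLOCK_(?P=id) ---",
    re.DOTALL,
)


def parse_bundle(text: str) -> list[tuple[dict, str]]:
    """Return (metadata, content) pairs for every file entry found."""
    return [
        (json.loads(m.group("meta")), m.group("content"))
        for m in BLOCK_RE.finditer(text)
    ]


uid = "12345678-1234-1234-1234-123456789abc"
sample = "\n".join(
    [
        f"--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_{uid} ---",
        "METADATA_JSON:",
        "{",
        '    "original_filepath": "path/to/file.py",',
        '    "size_bytes": 2560',
        "}",
        f"--- PYMK1F_END_FILE_METADATA_BLOCK_{uid} ---",
        f"--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_{uid} ---",
        "# File content here",
        f"--- PYMK1F_END_FILE_CONTENT_BLOCK_{uid} ---",
    ]
)
entries = parse_bundle(sample)
print(entries[0][0]["original_filepath"], repr(entries[0][1]))
```

The backreference `(?P=id)` ties the end markers to the start marker, so file content that happens to contain marker-like text with a different identifier does not terminate the block early.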
None Style
Files are concatenated directly without separators.
Common Use Cases
Documentation Compilation
m1f -s ./docs -o ./doc_bundle.m1f.txt --include-extensions .md
Code Review Preparation
m1f -i code_review_files.txt -o ./review_bundle.m1f.txt
WordPress Development
m1f -s ./wp-content/themes/my-theme -o ./theme_context.m1f.txt \
--include-extensions .php .js .css --exclude-paths-file ./exclude_build_files.txt
Project Knowledge Base
m1f -s ./project -o ./knowledge_base.m1f.txt \
--include-extensions .md .txt .rst --minimal-output
Documentation Bundles
# Create a documentation-only bundle
m1f -s ./project -o ./docs_bundle.txt --docs-only
HTML2MD Integration
# Combine scraped markdown files and remove metadata
m1f -s ./scraped_content -o ./clean_content.m1f.txt \
--include-extensions .md --remove-scraped-metadata
# Merge multiple scraped websites into clean documentation
m1f -s ./web_content -o ./web_docs.m1f.txt \
--include-extensions .md --remove-scraped-metadata --separator-style Markdown
Working with File Lists
The generated file lists can be edited and used as input for subsequent operations:
# Initial project analysis
m1f-init
# Creates: m1f/<project>_complete_filelist.txt and m1f/<project>_docs_filelist.txt
# Edit the file list to remove unwanted files
vi m1f/myproject_complete_filelist.txt
# Use the edited list for a custom bundle
m1f -i m1f/myproject_complete_filelist.txt -o m1f/custom_bundle.txt
# Combine multiple file lists
cat m1f/*_filelist.txt | sort -u > m1f/all_files.txt
m1f -i m1f/all_files.txt -o m1f/combined.txt
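Where a shell is not available, the merge step (`cat ... | sort -u`) can be approximated in Python. This is only a sketch: `merge_file_lists` is a hypothetical helper, and it assumes each list holds one path per line. Note that Python's default string ordering may differ from a locale-aware `sort`.

```python
from typing import Iterable


def merge_file_lists(*lists: Iterable[str]) -> list[str]:
    """Merge path lists, dropping blank lines and duplicates; result is sorted."""
    unique = {line.strip() for lines in lists for line in lines if line.strip()}
    return sorted(unique)


# Hypothetical contents of two generated file lists.
complete = ["src/a.py\n", "README.md\n", "\n"]
docs = ["README.md\n", "docs/guide.md\n"]
merged = merge_file_lists(complete, docs)
print(merged)
```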
Performance Considerations
With the new async I/O architecture, m1f handles large projects efficiently:
- Concurrent file reading and processing
- Memory-efficient streaming for large files
- Smart caching to avoid redundant operations
- Content deduplication saves space and processing time
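Concurrent file reading of the kind listed above can be sketched with the standard library alone. This is an assumption-laden illustration, not m1f's file processor: `read_all` and `max_concurrent` are hypothetical names, and blocking reads are simply pushed to worker threads with `asyncio.to_thread` while a semaphore caps how many run at once.

```python
import asyncio
import tempfile
from pathlib import Path


async def read_all(paths: list[Path], max_concurrent: int = 8) -> dict[Path, bytes]:
    """Read many files concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def read_one(path: Path) -> tuple[Path, bytes]:
        async with semaphore:
            # Path.read_bytes blocks, so run it in a worker thread.
            return path, await asyncio.to_thread(path.read_bytes)

    return dict(await asyncio.gather(*(read_one(p) for p in paths)))


def demo() -> dict[str, bytes]:
    """Create a few temporary files and read them back concurrently."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = Path(tmp) / f"file{i}.txt"
            path.write_bytes(f"content {i}".encode())
            paths.append(path)
        contents = asyncio.run(read_all(paths))
        return {p.name: data for p, data in contents.items()}


result = demo()
print(result)
```

The semaphore keeps the number of open files bounded, which matters more than raw parallelism on large trees.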
Architecture
The m1f tool features a modular Python package structure:
tools/m1f/
├── __init__.py # Package initialization
├── cli.py # Command-line interface
├── core.py # Main orchestration logic
├── config.py # Configuration management
├── constants.py # Constants and enums
├── exceptions.py # Custom exceptions
├── file_processor.py # File handling with async I/O
├── encoding_handler.py # Smart encoding detection
├── security_scanner.py # Secret detection integration
├── output_writer.py # Output generation
├── archive_creator.py # Archive functionality
├── separator_generator.py # Separator formatting
├── logging.py # Structured logging
└── utils.py # Utility functions
Documentation File Extensions
m1f recognizes these extensions as documentation files (when using --docs-only):
- Man pages: .1, .1st, .2, .3, .4, .5, .6, .7, .8
- Markup formats: .adoc, .asciidoc, .md, .markdown, .mdx, .rst, .org, .textile, .wiki
- Text formats: .txt, .text, .readme, .changelog, .changes, .todo, .notes
- Developer docs: .pod, .rdoc, .yard, .lhs, .litcoffee
- LaTeX/TeX: .tex, .ltx, .texi, .texinfo
- Other: .rtf, .nfo, .faq, .help, .history, .info, .news, .release, .story
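A filter based on the list above might look like the following sketch. It covers only the extensions shown in this section, `is_documentation` is a hypothetical helper, and treating the match as case-insensitive is an assumption rather than documented m1f behavior.

```python
from pathlib import PurePosixPath

# Only the extensions listed in this section.
DOC_EXTENSIONS = frozenset(
    """
    .1 .1st .2 .3 .4 .5 .6 .7 .8
    .adoc .asciidoc .md .markdown .mdx .rst .org .textile .wiki
    .txt .text .readme .changelog .changes .todo .notes
    .pod .rdoc .yard .lhs .litcoffee
    .tex .ltx .texi .texinfo
    .rtf .nfo .faq .help .history .info .news .release .story
    """.split()
)


def is_documentation(path: str) -> bool:
    """True when the final suffix of path is a listed documentation extension."""
    return PurePosixPath(path).suffix.lower() in DOC_EXTENSIONS


for candidate in ["docs/guide.md", "man/m1f.1", "src/main.py"]:
    print(candidate, is_documentation(candidate))
```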
Best Practices
- Start with exclusions: Always use --exclude-paths-file .gitignore to exclude build artifacts
- Use appropriate separators: Markdown for documentation, MachineReadable for programmatic use
- Monitor file sizes: Use --max-file-size to avoid including large generated files
- Enable security scanning: Use --security-check warn to detect potential secrets
- Create backups: Use --create-archive for important bundles
Related Tools
- s1f - Extract files from m1f bundles
- html2md - Convert HTML to Markdown before bundling
- scrape - Download web content for bundling
- token-counter - Estimate token usage of bundles
Next Steps
- Learn about presets for reusable configurations
- Explore auto-bundling for automated workflows
- Check out Claude integration for AI-powered development
- Review security best practices