m1f - Main Bundler

The core m1f tool combines multiple files into a single file with rich metadata, content deduplication, and async I/O support.

Overview

The m1f tool (v3.4.0) solves a common challenge when working with LLMs: providing sufficient context without exceeding token limits. Built with Python 3.10+ and modern architecture patterns, it creates optimized reference files from multiple sources while automatically handling duplicates and providing comprehensive metadata.

Key Features

  • Content Deduplication: Automatically detects and skips duplicate files based on SHA256 checksums
  • Async I/O: High-performance file operations with concurrent processing
  • Type Safety: Full type annotations throughout the codebase
  • Modern Architecture: Modular package structure with clean separation of concerns
  • Smart Filtering: Advanced file filtering with size limits, extensions, and patterns
  • Symlink Support: Intelligent symlink handling with cycle detection
  • Professional Security: Integration with detect-secrets for sensitive data detection
  • Colorized Output: Beautiful console output with progress indicators
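
The checksum-based deduplication listed above can be pictured with a short sketch. This is not m1f's actual code: the helper names `sha256_of` and `unique_by_content` are hypothetical, and the only detail taken from the feature list is that duplicates are detected by SHA256 checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_by_content(paths: list[Path]) -> list[Path]:
    """Keep the first file seen for each distinct checksum (hypothetical helper)."""
    seen: set[str] = set()
    unique: list[Path] = []
    for path in paths:
        checksum = sha256_of(path)
        if checksum in seen:
            continue  # identical content was already included
        seen.add(checksum)
        unique.append(path)
    return unique
```

Keeping the first occurrence is an assumption made for the sketch; which copy m1f keeps is not stated here.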

Quick Start

Initialize m1f in Your Project

# Quick setup for any project
cd /your/project
m1f-init

# Initialize without creating symlink to m1f documentation
m1f-init --no-symlink

# This creates in the m1f/ directory:
# - <project>_complete.txt          # Full project bundle
# - <project>_complete_filelist.txt # List of all included files
# - <project>_complete_dirlist.txt  # List of all directories
# - <project>_docs.txt              # Documentation bundle
# - <project>_docs_filelist.txt     # List of documentation files
# - <project>_docs_dirlist.txt      # Documentation directories
# - .m1f.config.yml                 # Configuration file

Basic m1f Commands

# Basic usage with a source directory
m1f -s ./your_project -o ./combined.txt

# Include only specific file types
m1f -s ./your_project -o ./combined.txt --include-extensions .py .js .md

# Include only documentation files (62 extensions)
m1f -s ./your_project -o ./docs_bundle.txt --docs-only

# Exclude specific directories
m1f -s ./your_project -o ./combined.txt --excludes "node_modules/" "build/" "dist/"

# Filter by file size
m1f -s ./your_project -o ./combined.txt --max-file-size 50KB

Command Line Options

Option                       Description
-s, --source-directory       Path to the directory containing files to process. Can be specified multiple times
-i, --input-file             Path to a file containing a list of files/directories to process
-o, --output-file            Path for the combined output file
-f, --force                  Force overwrite of existing output file without prompting
-t, --add-timestamp          Add a timestamp (_YYYYMMDD_HHMMSS) to the output filename
--filename-mtime-hash        Append a hash of file modification timestamps to the filename
--include-extensions         Space-separated list of file extensions to include
--exclude-extensions         Space-separated list of file extensions to exclude
--includes                   Space-separated list of gitignore-style patterns to include
--docs-only                  Include only documentation files (62 extensions)
--max-file-size              Skip files larger than the specified size (e.g., 50KB)
--exclude-paths-file         Path to file containing paths or patterns to exclude
--no-default-excludes        Disable default directory exclusions
--excludes                   Space-separated list of paths to exclude
--include-dot-paths          Include files and directories that start with a dot
--include-binary-files       Attempt to include files with binary extensions
--remove-scraped-metadata    Remove scraped metadata from HTML2MD files
--separator-style            Style of separators between files (Standard, Detailed, Markdown, MachineReadable, None)
--line-ending                Line ending for script-generated separators (lf or crlf)
--convert-to-charset         Convert all files to the specified character encoding
--abort-on-encoding-error    Abort processing if encoding conversion errors occur
-v, --verbose                Enable verbose logging
--minimal-output             Generate only the combined output file (no auxiliary files)
--skip-output-file           Execute operations but skip writing the final output file
-q, --quiet                  Suppress all console output
--create-archive             Create a backup archive of all processed files
--archive-type               Type of archive to create (zip or tar.gz)
--security-check             Scan files for secrets before merging (abort, skip, warn)
--preset                     One or more preset configuration files for file-specific processing
--preset-group               Specific preset group to use from the configuration
--disable-presets            Disable all preset processing
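
To illustrate the `_YYYYMMDD_HHMMSS` suffix that --add-timestamp describes, here is a minimal sketch. The function name `with_timestamp` is hypothetical, and placing the suffix before the file extension is an assumption, not something the option description confirms.

```python
from datetime import datetime
from pathlib import Path


def with_timestamp(output: Path, now: datetime) -> Path:
    """Insert a _YYYYMMDD_HHMMSS suffix before the file extension.

    Assumption: the suffix goes between the stem and the extension.
    """
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return output.with_name(f"{output.stem}_{stamp}{output.suffix}")
```

Under that assumption, `with_timestamp(Path("combined.txt"), datetime(2025, 5, 15, 14, 30, 21))` yields `combined_20250515_143021.txt`.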

Usage Examples

Basic Operations

# Basic command using a source directory
m1f --source-directory /path/to/your/code \
  --output-file /path/to/combined_output.txt

# Using multiple source directories
m1f -s ./src -s ./docs -s ./tests -o combined_output.txt

# Using an input file containing paths to process
m1f -i filelist.txt -o combined_output.txt

# Using both source directory and input file together
m1f -s ./source_code -i ./file_list.txt -o ./combined.txt

# Using include patterns to filter files
m1f -s ./project -o output.txt --includes "src/**" "*.py" "!*_test.py"

# Remove scraped metadata from HTML2MD files
m1f -s ./scraped_docs -o ./clean_docs.txt \
  --include-extensions .md --remove-scraped-metadata

Advanced Operations

# Using MachineReadable style with verbose logging
m1f -s ./my_project -o ./output/bundle.m1f.txt \
  --separator-style MachineReadable --force --verbose

# Creating a combined file and a backup zip archive
m1f -s ./source_code -o ./dist/combined.txt \
  --create-archive --archive-type zip

# Only include text files under 50KB to avoid large generated files
m1f -s ./project -o ./text_only.txt \
  --max-file-size 50KB --include-extensions .py .js .md .txt .json
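
A size argument such as `50KB` has to be turned into a byte count before it can be compared with file sizes. The sketch below shows one way to do that. The name `parse_size`, the accepted units, and the 1 KB = 1024 bytes convention are all assumptions (the last one matches the "2.50 KB" shown for a 2560-byte file in the separator examples).

```python
import re

# Assumed unit table: binary multiples, case-insensitive.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string like '50KB' or '1.5MB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    # A bare number is treated as bytes.
    return int(float(number) * _UNITS[(unit or "B").upper()])
```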

Security Check

The --security-check option scans files for potential secrets, using detect-secrets when it is available. It takes one of three modes:

  • abort – stop processing immediately and do not create the output file
  • skip – omit files that contain secrets from the final output
  • warn – include all files but print a summary warning at the end

# Stop processing if secrets are found
m1f -s . -o output.txt --security-check abort

# Skip files with secrets
m1f -s . -o output.txt --security-check skip

# Include all files but show warnings
m1f -s . -o output.txt --security-check warn
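
The difference between the three modes can be sketched as a small policy function. Everything here is hypothetical (`SecurityMode`, `SecretsFound`, `apply_policy`); the sketch also assumes that some scanner has already produced the set of flagged files, and it does not call detect-secrets.

```python
from enum import Enum


class SecurityMode(Enum):
    ABORT = "abort"
    SKIP = "skip"
    WARN = "warn"


class SecretsFound(Exception):
    """Raised in abort mode when any file is flagged (hypothetical)."""


def apply_policy(
    files: list[str], flagged: set[str], mode: SecurityMode
) -> tuple[list[str], list[str]]:
    """Return (files_to_include, warnings) for the chosen mode."""
    hits = [name for name in files if name in flagged]
    if not hits:
        return list(files), []
    if mode is SecurityMode.ABORT:
        # No output is produced at all.
        raise SecretsFound(f"secrets detected in: {', '.join(hits)}")
    if mode is SecurityMode.SKIP:
        # Flagged files are dropped from the bundle.
        return [name for name in files if name not in flagged], []
    # WARN: keep everything and report at the end.
    return list(files), [f"possible secret in {name}" for name in hits]
```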

Output Files

By default, m1f creates several output files:

  1. Primary output file - The combined file specified by --output-file
  2. Log file - A .log file with detailed processing information
  3. File list - A _filelist.txt file containing paths of all included files
  4. Directory list - A _dirlist.txt file containing all unique directories
  5. Archive file - An optional backup archive if --create-archive is specified

To create only the primary output file:

m1f -s ./src -o ./combined.txt --minimal-output
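
If a script needs to locate the auxiliary files next to a bundle, the names can be derived from the output path. The sketch below assumes each auxiliary file shares the output file's stem, which matches the `<project>_complete.txt` / `<project>_complete_filelist.txt` pairing created by m1f-init but is otherwise unverified. `auxiliary_paths` is a hypothetical helper.

```python
from pathlib import Path


def auxiliary_paths(output: Path) -> dict[str, Path]:
    """Guess the auxiliary file names for a given output file.

    Assumption: each auxiliary file is the output stem plus a fixed suffix.
    """
    stem = output.stem
    return {
        "log": output.with_name(f"{stem}.log"),
        "filelist": output.with_name(f"{stem}_filelist.txt"),
        "dirlist": output.with_name(f"{stem}_dirlist.txt"),
    }
```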

Separator Styles

Choose how files are separated in the combined output:

Standard Style

======= path/to/file.py ======

Detailed Style (Default)

========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
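
As a rough sketch of how a header in this style could be produced, the function below formats the same four fields. `detailed_separator` is a hypothetical name, the rule width of 88 characters is an assumption, and sizes are shown in units of 1024 bytes so that 2560 bytes prints as 2.50 KB.

```python
from datetime import datetime
from pathlib import PurePosixPath

RULE = "=" * 88  # assumed width of the horizontal rule


def detailed_separator(
    path: str, modified: datetime, size_bytes: int, checksum: str
) -> str:
    """Build a Detailed-style header for one file (illustrative only)."""
    extension = PurePosixPath(path).suffix
    lines = [
        RULE,
        f"== FILE: {path}",
        f"== DATE: {modified:%Y-%m-%d %H:%M:%S} | "
        f"SIZE: {size_bytes / 1024:.2f} KB | TYPE: {extension}",
        f"== CHECKSUM_SHA256: {checksum}",
        RULE,
    ]
    return "\n".join(lines)
```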

Markdown Style

## path/to/file.py

**Date Modified:** 2025-05-15 14:30:21 | **Size:** 2.50 KB | **Type:** .py

```python
# File content starts here
def example():
    return "Hello, world!"
```

MachineReadable Style

--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
METADATA_JSON:
{
    "original_filepath": "path/to/file.py",
    "timestamp_utc_iso": "2025-05-15T14:30:21Z",
    "size_bytes": 2560,
    "checksum_sha256": "abcdef1234567890..."
}
--- PYMK1F_END_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---

# File content here

--- PYMK1F_END_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---
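
Because each block is delimited by markers that share a UUID, a bundle in this style can be split back into metadata and content with a regular expression. The parser below is a minimal sketch based only on the layout shown above. `parse_bundle` is a hypothetical name, the handling of blank lines around the content is an assumption, and for real extraction the s1f tool is the intended route.

```python
import json
import re

# Marker names are copied from the example; (?P=id) requires matching UUIDs.
_BLOCK = re.compile(
    r"--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_(?P<id>[0-9a-f-]+) ---\n"
    r"METADATA_JSON:\n(?P<meta>.*?)\n"
    r"--- PYMK1F_END_FILE_METADATA_BLOCK_(?P=id) ---\n"
    r"--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_(?P=id) ---\n"
    r"(?P<content>.*?)\n?"
    r"--- PYMK1F_END_FILE_CONTENT_BLOCK_(?P=id) ---",
    re.DOTALL,
)


def parse_bundle(text: str) -> list[tuple[dict, str]]:
    """Return a (metadata, content) pair for every block found in text."""
    return [
        (json.loads(match["meta"]), match["content"])
        for match in _BLOCK.finditer(text)
    ]
```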

None Style

Files are concatenated directly without separators.

Common Use Cases

Documentation Compilation

m1f -s ./docs -o ./doc_bundle.m1f.txt --include-extensions .md

Code Review Preparation

m1f -i code_review_files.txt -o ./review_bundle.m1f.txt

WordPress Development

m1f -s ./wp-content/themes/my-theme -o ./theme_context.m1f.txt \
  --include-extensions .php .js .css --exclude-paths-file ./exclude_build_files.txt

Project Knowledge Base

m1f -s ./project -o ./knowledge_base.m1f.txt \
  --include-extensions .md .txt .rst --minimal-output

Documentation Bundles

# Create a documentation-only bundle
m1f -s ./project -o ./docs_bundle.txt --docs-only

HTML2MD Integration

# Combine scraped markdown files and remove metadata
m1f -s ./scraped_content -o ./clean_content.m1f.txt \
  --include-extensions .md --remove-scraped-metadata

# Merge multiple scraped websites into clean documentation
m1f -s ./web_content -o ./web_docs.m1f.txt \
  --include-extensions .md --remove-scraped-metadata --separator-style Markdown

Working with File Lists

The generated file lists can be edited and used as input for subsequent operations:

# Initial project analysis
m1f-init
# Creates: m1f/<project>_complete_filelist.txt and m1f/<project>_docs_filelist.txt

# Edit the file list to remove unwanted files
vi m1f/myproject_complete_filelist.txt

# Use the edited list for a custom bundle
m1f -i m1f/myproject_complete_filelist.txt -o m1f/custom_bundle.txt

# Combine multiple file lists
cat m1f/*_filelist.txt | sort -u > m1f/all_files.txt
m1f -i m1f/all_files.txt -o m1f/combined.txt
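
Where a shell is not available, the `cat ... | sort -u` step can be approximated in Python. This sketch assumes one path per line and ignores blank lines. `merge_file_lists` is a hypothetical helper, and its sort order may differ from `sort -u` under some locales.

```python
from pathlib import Path


def merge_file_lists(list_files: list[Path]) -> list[str]:
    """Merge several file lists into one sorted list without duplicates."""
    entries: set[str] = set()
    for list_file in list_files:
        for line in list_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                entries.add(line)
    return sorted(entries)
```

The result can be written to a file, one entry per line, and passed to m1f with -i.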

Performance Considerations

With the new async I/O architecture, m1f handles large projects efficiently:

  • Concurrent file reading and processing
  • Memory-efficient streaming for large files
  • Smart caching to avoid redundant operations
  • Content deduplication saves space and processing time
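
Concurrent file reading of the kind listed above can be sketched with asyncio from the standard library. This is an illustration, not m1f's implementation: `read_all` and the `max_concurrent` limit are hypothetical, and the blocking reads are pushed to worker threads with `asyncio.to_thread`.

```python
import asyncio
from pathlib import Path


async def read_all(paths: list[Path], max_concurrent: int = 8) -> dict[Path, bytes]:
    """Read many files concurrently, with a cap on simultaneous reads."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def read_one(path: Path) -> tuple[Path, bytes]:
        async with semaphore:
            # The blocking read runs in a worker thread so the event loop stays free.
            return path, await asyncio.to_thread(path.read_bytes)

    results = await asyncio.gather(*(read_one(path) for path in paths))
    return dict(results)
```

A caller would run it with `asyncio.run(read_all(paths))`. The semaphore keeps the number of open files bounded on very large projects.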

Architecture

The m1f tool features a modular Python package structure:

tools/m1f/
├── __init__.py          # Package initialization
├── cli.py               # Command-line interface
├── core.py              # Main orchestration logic
├── config.py            # Configuration management
├── constants.py         # Constants and enums
├── exceptions.py        # Custom exceptions
├── file_processor.py    # File handling with async I/O
├── encoding_handler.py  # Smart encoding detection
├── security_scanner.py  # Secret detection integration
├── output_writer.py     # Output generation
├── archive_creator.py   # Archive functionality
├── separator_generator.py # Separator formatting
├── logging.py           # Structured logging
└── utils.py             # Utility functions

Documentation File Extensions

With --docs-only, m1f treats 62 extensions as documentation files, including:

  • Man pages: .1, .1st, .2, .3, .4, .5, .6, .7, .8
  • Markup formats: .adoc, .asciidoc, .md, .markdown, .mdx, .rst, .org, .textile, .wiki
  • Text formats: .txt, .text, .readme, .changelog, .changes, .todo, .notes
  • Developer docs: .pod, .rdoc, .yard, .lhs, .litcoffee
  • LaTeX/TeX: .tex, .ltx, .texi, .texinfo
  • Other: .rtf, .nfo, .faq, .help, .history, .info, .news, .release, .story
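
A check along these lines could decide whether a path counts as documentation. The set below contains only the extensions listed above, so it may not cover everything --docs-only accepts. `is_documentation_file` is a hypothetical helper, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path

# Only the extensions named in the list above.
DOC_EXTENSIONS = frozenset({
    ".1", ".1st", ".2", ".3", ".4", ".5", ".6", ".7", ".8",
    ".adoc", ".asciidoc", ".md", ".markdown", ".mdx", ".rst", ".org",
    ".textile", ".wiki",
    ".txt", ".text", ".readme", ".changelog", ".changes", ".todo", ".notes",
    ".pod", ".rdoc", ".yard", ".lhs", ".litcoffee",
    ".tex", ".ltx", ".texi", ".texinfo",
    ".rtf", ".nfo", ".faq", ".help", ".history", ".info", ".news",
    ".release", ".story",
})


def is_documentation_file(path: str) -> bool:
    """Return True when the file extension is in the documentation set."""
    return Path(path).suffix.lower() in DOC_EXTENSIONS
```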

Best Practices

  1. Start with exclusions: Always use --exclude-paths-file .gitignore to exclude build artifacts
  2. Use appropriate separators: Markdown for documentation, MachineReadable for programmatic use
  3. Monitor file sizes: Use --max-file-size to avoid including large generated files
  4. Enable security scanning: Use --security-check warn to detect potential secrets
  5. Create backups: Use --create-archive for important bundles

Related Tools

  • s1f - Extract files from m1f bundles
  • html2md - Convert HTML to Markdown before bundling
  • scrape - Download web content for bundling
  • token-counter - Estimate token usage of bundles

Next Steps