m1f - Main Bundler

A modern, high-performance tool that combines multiple files into a single file with rich metadata, content deduplication, and async I/O support.

Overview

The m1f tool (v3.4.0) solves a common challenge when working with LLMs: providing sufficient context without exceeding token limits. Built with Python 3.10+ and modern architecture patterns, it creates optimized reference files from multiple sources while automatically handling duplicates and providing comprehensive metadata.

Key Features

Content Deduplication: Automatically detects and skips duplicate files based on SHA256 checksums
Async I/O: High-performance file operations with concurrent processing
Type Safety: Full type annotations throughout the codebase
Modern Architecture: Modular package structure with clean separation of concerns
Smart Filtering: Advanced file filtering with size limits, extensions, and patterns
Symlink Support: Intelligent symlink handling with cycle detection
Professional Security: Integration with detect-secrets for sensitive data detection
Colorized Output: Beautiful console output with progress indicators
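
As a rough illustration of checksum-based deduplication, the sketch below keeps the first file seen for each distinct SHA256 digest. The function names `sha256_of` and `unique_files` are hypothetical and are not taken from the m1f codebase; the real implementation may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_files(paths: list[Path]) -> list[Path]:
    """Keep the first file seen for each distinct content checksum."""
    seen: set[str] = set()
    kept: list[Path] = []
    for path in paths:
        checksum = sha256_of(path)
        if checksum in seen:
            continue  # identical content was already included
        seen.add(checksum)
        kept.append(path)
    return kept
```

Because the comparison uses content rather than file names, two files with different paths but identical bytes count as duplicates.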

Quick Start

Initialize m1f in Your Project

# Quick setup for any project
cd /your/project
m1f-init

# Initialize without creating a symlink to the m1f documentation
m1f-init --no-symlink

# m1f-init creates these files in the m1f/ directory:
# - <project>_complete.txt          # Full project bundle
# - <project>_complete_filelist.txt # List of all included files
# - <project>_complete_dirlist.txt  # List of all directories
# - <project>_docs.txt              # Documentation bundle
# - <project>_docs_filelist.txt     # List of documentation files
# - <project>_docs_dirlist.txt      # Documentation directories
# - .m1f.config.yml                 # Configuration file

Basic m1f Commands

# Basic usage with a source directory
m1f -s ./your_project -o ./combined.txt

# Include only specific file types
m1f -s ./your_project -o ./combined.txt --include-extensions .py .js .md

# Include only documentation files (62 extensions)
m1f -s ./your_project -o ./docs_bundle.txt --docs-only

# Exclude specific directories
m1f -s ./your_project -o ./combined.txt --excludes "node_modules/" "build/" "dist/"

# Filter by file size
m1f -s ./your_project -o ./combined.txt --max-file-size 50KB

Command Line Options

Option	Description
`-s, --source-directory`	Path to the directory containing files to process. Can be specified multiple times
`-i, --input-file`	Path to a file containing a list of files/directories to process
`-o, --output-file`	Path for the combined output file
`-f, --force`	Force overwrite of existing output file without prompting
`-t, --add-timestamp`	Add a timestamp (_YYYYMMDD_HHMMSS) to the output filename
`--filename-mtime-hash`	Append a hash of file modification timestamps to the filename
`--include-extensions`	Space-separated list of file extensions to include
`--exclude-extensions`	Space-separated list of file extensions to exclude
`--includes`	Space-separated list of gitignore-style patterns to include
`--docs-only`	Include only documentation files (62 extensions)
`--max-file-size`	Skip files larger than the specified size (e.g., 50KB)
`--exclude-paths-file`	Path to file containing paths or patterns to exclude
`--no-default-excludes`	Disable default directory exclusions
`--excludes`	Space-separated list of paths to exclude
`--include-dot-paths`	Include files and directories that start with a dot
`--include-binary-files`	Attempt to include files with binary extensions
`--remove-scraped-metadata`	Remove scraped metadata from HTML2MD files
`--separator-style`	Style of separators between files (Standard, Detailed, Markdown, MachineReadable, None). Detailed is the default
`--line-ending`	Line ending for script-generated separators (lf or crlf)
`--convert-to-charset`	Convert all files to the specified character encoding
`--abort-on-encoding-error`	Abort processing if encoding conversion errors occur
`-v, --verbose`	Enable verbose logging
`--minimal-output`	Generate only the combined output file (no auxiliary files)
`--skip-output-file`	Execute operations but skip writing the final output file
`-q, --quiet`	Suppress all console output
`--create-archive`	Create a backup archive of all processed files
`--archive-type`	Type of archive to create (zip or tar.gz)
`--security-check`	Scan files for secrets before merging (abort, skip, warn)
`--preset`	One or more preset configuration files for file-specific processing
`--preset-group`	Specific preset group to use from the configuration
`--disable-presets`	Disable all preset processing
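
To make the `_YYYYMMDD_HHMMSS` suffix from `--add-timestamp` concrete, here is a minimal sketch of how such a name could be built. The helper `with_timestamp` is hypothetical, and placing the suffix before the file extension is an assumption rather than documented behavior.

```python
from datetime import datetime
from pathlib import Path


def with_timestamp(output: Path, now: datetime) -> Path:
    """Insert a _YYYYMMDD_HHMMSS suffix before the file extension.

    Assumption: the suffix goes between the stem and the extension.
    """
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return output.with_name(f"{output.stem}_{stamp}{output.suffix}")


# Example: combined.txt written at 2025-05-15 14:30:21
print(with_timestamp(Path("combined.txt"), datetime(2025, 5, 15, 14, 30, 21)))
# combined_20250515_143021.txt
```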

Usage Examples

Basic Operations

# Basic command using a source directory
m1f --source-directory /path/to/your/code \
  --output-file /path/to/combined_output.txt

# Using multiple source directories
m1f -s ./src -s ./docs -s ./tests -o combined_output.txt

# Using an input file containing paths to process
m1f -i filelist.txt -o combined_output.txt

# Using both source directory and input file together
m1f -s ./source_code -i ./file_list.txt -o ./combined.txt

# Using include patterns to filter files
m1f -s ./project -o output.txt --includes "src/**" "*.py" "!*_test.py"

# Remove scraped metadata from HTML2MD files
m1f -s ./scraped_docs -o ./clean_docs.txt \
  --include-extensions .md --remove-scraped-metadata

Advanced Operations

# Using MachineReadable style with verbose logging
m1f -s ./my_project -o ./output/bundle.m1f.txt \
  --separator-style MachineReadable --force --verbose

# Creating a combined file and a backup zip archive
m1f -s ./source_code -o ./dist/combined.txt \
  --create-archive --archive-type zip

# Only include text files under 50KB to avoid large generated files
m1f -s ./project -o ./text_only.txt \
  --max-file-size 50KB --include-extensions .py .js .md .txt .json
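
Size values such as `50KB` need to be turned into a byte count before files can be compared against the limit. The sketch below shows one way to do that. The function `parse_size`, the set of accepted units, and the choice of 1 KB = 1024 bytes are assumptions for illustration, not a description of m1f's parser.

```python
import re

# Assumption: binary units (1 KB = 1024 bytes).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as '50KB' or '1.5 MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper() or "B"  # a bare number is treated as bytes
    if unit not in UNITS:
        raise ValueError(f"unknown size unit: {unit!r}")
    return int(float(number) * UNITS[unit])


def within_limit(size_bytes: int, limit: str) -> bool:
    """Return True if a file of size_bytes passes the given limit."""
    return size_bytes <= parse_size(limit)
```

With these assumptions, `parse_size("50KB")` yields 51200, so a 60,000-byte file would be skipped.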

Security Check

The --security-check option scans files for potential secrets before they are merged. It uses detect-secrets if that package is available, and takes one of three modes:

abort – stop processing immediately and do not create the output file
skip – omit files that contain secrets from the final output
warn – include all files but print a summary warning at the end

# Stop processing if secrets are found
m1f -s . -o output.txt --security-check abort

# Skip files with secrets
m1f -s . -o output.txt --security-check skip

# Include all files but show warnings
m1f -s . -o output.txt --security-check warn
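
The three modes differ only in what happens once a file has been flagged. The sketch below models that decision in isolation. It does not call detect-secrets; the set of flagged files is passed in, and all names (`SecurityMode`, `SecretsFound`, `apply_security_mode`) are hypothetical.

```python
from enum import Enum


class SecurityMode(Enum):
    ABORT = "abort"
    SKIP = "skip"
    WARN = "warn"


class SecretsFound(Exception):
    """Raised in abort mode when any file is flagged."""


def apply_security_mode(
    files: list[str], flagged: set[str], mode: SecurityMode
) -> tuple[list[str], list[str]]:
    """Return (files_to_include, warnings) for the chosen mode."""
    hits = [name for name in files if name in flagged]
    if not hits:
        return list(files), []
    if mode is SecurityMode.ABORT:
        # Stop before any output is written.
        raise SecretsFound(f"secrets detected in: {', '.join(hits)}")
    if mode is SecurityMode.SKIP:
        # Drop only the flagged files.
        return [name for name in files if name not in flagged], []
    # WARN: keep everything and report the hits at the end.
    return list(files), [f"possible secret in {name}" for name in hits]
```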

Output Files

By default, m1f creates several output files:

Primary output file - The combined file specified by --output-file
Log file - A .log file with detailed processing information
File list - A _filelist.txt file containing paths of all included files
Directory list - A _dirlist.txt file containing all unique directories
Archive file - An optional backup archive if --create-archive is specified

To create only the primary output file:

m1f -s ./src -o ./combined.txt --minimal-output
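
When scripting around m1f, it can help to predict where the auxiliary files will land. The sketch below derives the names from the output path, assuming each one shares the stem of the primary output file. This matches the `<project>_complete.txt` and `<project>_complete_filelist.txt` pairing shown under Quick Start, but the exact naming rule, in particular for the log file, is an assumption. The helper `auxiliary_paths` is hypothetical.

```python
from pathlib import Path


def auxiliary_paths(output: Path) -> dict[str, Path]:
    """Guess the auxiliary file paths that accompany an output file.

    Assumption: every auxiliary file reuses the stem of the output file.
    """
    stem = output.stem
    return {
        "log": output.with_suffix(".log"),
        "filelist": output.with_name(f"{stem}_filelist.txt"),
        "dirlist": output.with_name(f"{stem}_dirlist.txt"),
    }


for kind, path in auxiliary_paths(Path("combined.txt")).items():
    print(kind, path)
```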

Separator Styles

Choose how files are separated in the combined output:

Standard Style

======= path/to/file.py ======

Detailed Style (Default)

========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
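
For readers who want to reproduce this layout in their own tooling, the following sketch renders a header in the same shape. The functions `format_size` and `detailed_separator` are hypothetical, the rule width of 88 characters is illustrative, and treating 1 KB as 1024 bytes is an assumption chosen because it makes 2560 bytes print as 2.50 KB.

```python
from datetime import datetime
from pathlib import PurePosixPath


def format_size(size_bytes: int) -> str:
    """Format a byte count in KB, assuming 1 KB = 1024 bytes."""
    return f"{size_bytes / 1024:.2f} KB"


def detailed_separator(
    path: str, modified: datetime, size_bytes: int, checksum: str, width: int = 88
) -> str:
    """Build a Detailed-style header block for one file."""
    rule = "=" * width  # illustrative width, not taken from m1f
    file_type = PurePosixPath(path).suffix
    lines = [
        rule,
        f"== FILE: {path}",
        f"== DATE: {modified:%Y-%m-%d %H:%M:%S} | "
        f"SIZE: {format_size(size_bytes)} | TYPE: {file_type}",
        f"== CHECKSUM_SHA256: {checksum}",
        rule,
    ]
    return "\n".join(lines)
```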

Markdown Style

## path/to/file.py

**Date Modified:** 2025-05-15 14:30:21 | **Size:** 2.50 KB | **Type:** .py

```python
# File content starts here
def example():
    return "Hello, world!"
```

MachineReadable Style

--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
METADATA_JSON:
{
    "original_filepath": "path/to/file.py",
    "timestamp_utc_iso": "2025-05-15T14:30:21Z",
    "size_bytes": 2560,
    "checksum_sha256": "abcdef1234567890..."
}
--- PYMK1F_END_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---

# File content here

--- PYMK1F_END_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---
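
Because each metadata block is delimited by markers that carry the same identifier, a bundle in this style can be read back programmatically. The sketch below extracts the JSON metadata with a regular expression. It assumes the marker layout is exactly as shown in the example above; `METADATA_BLOCK` and `read_metadata` are hypothetical names. For real extraction work, s1f is the tool intended for this job.

```python
import json
import re

# Assumes the marker layout shown in the example; the same 36-character
# identifier must appear in both the BEGIN and END markers.
METADATA_BLOCK = re.compile(
    r"--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_(?P<uuid>[0-9a-f-]{36}) ---\n"
    r"METADATA_JSON:\n"
    r"(?P<json>.*?)\n"
    r"--- PYMK1F_END_FILE_METADATA_BLOCK_(?P=uuid) ---",
    re.DOTALL,
)


def read_metadata(bundle_text: str) -> list[dict]:
    """Return the parsed metadata of every file found in a bundle."""
    return [
        json.loads(match.group("json"))
        for match in METADATA_BLOCK.finditer(bundle_text)
    ]
```

A caller could then list the bundled paths with `[m["original_filepath"] for m in read_metadata(text)]`.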

None Style

Files are concatenated directly without separators.

Common Use Cases

Documentation Compilation

m1f -s ./docs -o ./doc_bundle.m1f.txt --include-extensions .md

Code Review Preparation

m1f -i code_review_files.txt -o ./review_bundle.m1f.txt

WordPress Development

m1f -s ./wp-content/themes/my-theme -o ./theme_context.m1f.txt \
  --include-extensions .php .js .css --exclude-paths-file ./exclude_build_files.txt

Project Knowledge Base

m1f -s ./project -o ./knowledge_base.m1f.txt \
  --include-extensions .md .txt .rst --minimal-output

Documentation Bundles

# Create a documentation-only bundle
m1f -s ./project -o ./docs_bundle.txt --docs-only

HTML2MD Integration

# Combine scraped markdown files and remove metadata
m1f -s ./scraped_content -o ./clean_content.m1f.txt \
  --include-extensions .md --remove-scraped-metadata

# Merge multiple scraped websites into clean documentation
m1f -s ./web_content -o ./web_docs.m1f.txt \
  --include-extensions .md --remove-scraped-metadata --separator-style Markdown

Working with File Lists

The generated file lists can be edited and used as input for subsequent operations:

# Initial project analysis
m1f-init
# Creates: m1f/<project>_complete_filelist.txt and m1f/<project>_docs_filelist.txt

# Edit the file list to remove unwanted files
vi m1f/myproject_complete_filelist.txt

# Use the edited list for a custom bundle
m1f -i m1f/myproject_complete_filelist.txt -o m1f/custom_bundle.txt

# Combine multiple file lists
cat m1f/*_filelist.txt | sort -u > m1f/all_files.txt
m1f -i m1f/all_files.txt -o m1f/combined.txt

Performance Considerations

With the new async I/O architecture, m1f handles large projects efficiently:

Concurrent file reading and processing
Memory-efficient streaming for large files
Smart caching to avoid redundant operations
Content deduplication saves space and processing time
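
The idea of concurrent file reading can be sketched with the standard library alone. The example below reads files in worker threads and caps how many reads run at once. It is a generic asyncio pattern, not m1f's implementation; `read_one`, `read_all`, and the default limit of 8 are hypothetical.

```python
import asyncio
from pathlib import Path


async def read_one(path: Path, limit: asyncio.Semaphore) -> tuple[Path, str]:
    """Read one file in a worker thread, respecting the concurrency cap."""
    async with limit:
        text = await asyncio.to_thread(
            path.read_text, encoding="utf-8", errors="replace"
        )
    return path, text


async def read_all(
    paths: list[Path], max_concurrent: int = 8
) -> list[tuple[Path, str]]:
    """Read all files concurrently; results keep the input order."""
    limit = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(read_one(path, limit) for path in paths))


# Usage: results = asyncio.run(read_all([Path("a.txt"), Path("b.txt")]))
```

The semaphore keeps the number of simultaneously open files bounded, which matters when a project contains thousands of files.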

Architecture

The m1f tool features a modular Python package structure:

tools/m1f/
├── __init__.py          # Package initialization
├── cli.py               # Command-line interface
├── core.py              # Main orchestration logic
├── config.py            # Configuration management
├── constants.py         # Constants and enums
├── exceptions.py        # Custom exceptions
├── file_processor.py    # File handling with async I/O
├── encoding_handler.py  # Smart encoding detection
├── security_scanner.py  # Secret detection integration
├── output_writer.py     # Output generation
├── archive_creator.py   # Archive functionality
├── separator_generator.py # Separator formatting
├── logging.py           # Structured logging
└── utils.py             # Utility functions

Documentation File Extensions

m1f treats these extensions as documentation files when --docs-only is used (the groups below show 43 of the 62 recognized extensions):

Man pages: .1, .1st, .2, .3, .4, .5, .6, .7, .8
Markup formats: .adoc, .asciidoc, .md, .markdown, .mdx, .rst, .org, .textile, .wiki
Text formats: .txt, .text, .readme, .changelog, .changes, .todo, .notes
Developer docs: .pod, .rdoc, .yard, .lhs, .litcoffee
LaTeX/TeX: .tex, .ltx, .texi, .texinfo
Other: .rtf, .nfo, .faq, .help, .history, .info, .news, .release, .story
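
A filter of this kind boils down to a set lookup on the file suffix. The sketch below uses only the extensions listed above and compares them case-insensitively. The names `DOC_EXTENSIONS` and `is_documentation` are hypothetical, and the case-insensitive match is an assumption.

```python
from pathlib import Path

# Extensions copied from the groups listed above.
DOC_EXTENSIONS = frozenset({
    ".1", ".1st", ".2", ".3", ".4", ".5", ".6", ".7", ".8",
    ".adoc", ".asciidoc", ".md", ".markdown", ".mdx", ".rst", ".org",
    ".textile", ".wiki",
    ".txt", ".text", ".readme", ".changelog", ".changes", ".todo", ".notes",
    ".pod", ".rdoc", ".yard", ".lhs", ".litcoffee",
    ".tex", ".ltx", ".texi", ".texinfo",
    ".rtf", ".nfo", ".faq", ".help", ".history", ".info", ".news",
    ".release", ".story",
})


def is_documentation(path: Path) -> bool:
    """Return True if the file suffix is a listed documentation extension."""
    # Assumption: matching ignores case, so NOTES.TXT counts as .txt.
    return path.suffix.lower() in DOC_EXTENSIONS
```

Note that a file with no suffix at all, such as a bare `README`, would not match under this scheme.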

Best Practices

Start with exclusions: Always use --exclude-paths-file .gitignore to exclude build artifacts
Use appropriate separators: Markdown for documentation, MachineReadable for programmatic use
Monitor file sizes: Use --max-file-size to avoid including large generated files
Enable security scanning: Use --security-check warn to detect potential secrets
Create backups: Use --create-archive for important bundles

Related Tools

s1f - Extract files from m1f bundles
html2md - Convert HTML to Markdown before bundling
scrape - Download web content for bundling
token-counter - Estimate token usage of bundles

Next Steps

Learn about presets for reusable configurations
Explore auto-bundling for automated workflows
Check out Claude integration for AI-powered development
Review security best practices