Core Concepts

Understanding these core concepts will help you get the most out of m1f and make it easier to work with Large Language Models (LLMs).

What is m1f?

m1f (Make One File) is a toolkit for working within the context limits of LLMs. It combines multiple files into a single, well-structured file that AI assistants such as Claude or GPT can consume easily.

The Problem m1f Solves

When working with LLMs, you often need to provide context from multiple files in your codebase. Manually copying and pasting files is:

Time-consuming: Selecting and copying multiple files
Error-prone: Missing files or including wrong versions
Difficult to maintain: Context becomes stale as code changes
Limited by token counts: Hard to optimize for token usage

m1f automates this process and adds deduplication, filtering, and configurable output formatting.

Core Architecture

The m1f Toolkit

m1f is a suite of tools that work together:

┌─────────────────────────────────────────────────────────────┐
│                    m1f Toolkit                              │
├─────────────────────────────────────────────────────────────┤
│  m1f           │ Main bundler - combines files              │
│  s1f           │ File splitter - extracts files from bundle │
│  html2md       │ HTML to Markdown converter                 │
│  scrape        │ Web scraper for documentation              │
│  token-counter │ Estimates token usage                      │
└─────────────────────────────────────────────────────────────┘

Modern Architecture (v3.4.0)

Built with Python 3.10+ and modern patterns:

Async I/O: Concurrent file processing for better performance
Type Safety: Full type annotations throughout
Modular Design: Clean separation of concerns
Content Deduplication: Automatic SHA256-based duplicate detection
Memory Efficient: Streaming operations for large files

Key Concepts

1. File Bundling

File bundling combines multiple files into a single output file:

# Basic bundling
m1f -s ./src -o bundle.txt

What happens:

Scans the source directory
Filters files based on your criteria
Reads each file and adds metadata
Combines everything with separators
Writes the output file

2. Content Deduplication

m1f automatically detects and skips duplicate files using SHA256 checksums:

# Files with identical content are included only once
m1f -s ./project -o bundle.txt

Benefits:

Reduces file size
Eliminates redundant context
Speeds up processing
Reduces token usage
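
To make the idea concrete, here is a minimal sketch of checksum-based deduplication using Python's standard `hashlib`. The function name `deduplicate` is hypothetical, and keeping the first path seen for each distinct content is an assumption; the text above does not say which copy m1f retains.

```python
import hashlib


def deduplicate(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep the first file for each distinct content hash, in insertion order."""
    seen: set[str] = set()
    unique: dict[str, bytes] = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical content was already included under another path
        seen.add(digest)
        unique[path] = content
    return unique
```

Because the comparison is on content hashes rather than names, two files with different paths but byte-identical content count as duplicates, while files that differ by even one byte are both kept.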

3. Separator Styles

Separator styles control how the boundaries between files are formatted:

Standard Style

======= path/to/file.py ======

Detailed Style (default)

========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
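
As a rough illustration of where the fields in the Detailed header come from, the sketch below renders a header in the same layout. The function name `detailed_separator`, the rule width of 88 characters, and the size computed as bytes divided by 1024 are assumptions made for the example, not details confirmed by m1f.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath


def detailed_separator(path: str, content: bytes, modified: datetime, width: int = 88) -> str:
    """Render a header block in the layout of the Detailed style."""
    rule = "=" * width  # assumed width; adjust to match real output
    size_kb = len(content) / 1024  # assumes 1 KB = 1024 bytes
    checksum = hashlib.sha256(content).hexdigest()
    suffix = PurePosixPath(path).suffix
    return "\n".join([
        rule,
        f"== FILE: {path}",
        f"== DATE: {modified:%Y-%m-%d %H:%M:%S} | SIZE: {size_kb:.2f} KB | TYPE: {suffix}",
        f"== CHECKSUM_SHA256: {checksum}",
        rule,
    ])
```

Calling it with a 2560-byte file and a modification time of 2025-05-15 14:30:21 reproduces the DATE, SIZE, and TYPE line shown above.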

Markdown Style

## path/to/file.py

**Date Modified:** 2025-05-15 14:30:21 | **Size:** 2.50 KB

```python
# File content here
```

MachineReadable Style

--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_uuid ---
METADATA_JSON: {"original_filepath": "path/to/file.py", ...}
--- PYMK1F_END_FILE_METADATA_BLOCK_uuid ---
--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_uuid ---
# File content here
--- PYMK1F_END_FILE_CONTENT_BLOCK_uuid ---

4. File Filtering

m1f offers several ways to control which files are included:

By Extension

# Only Python files
m1f -s . -o code.txt --include-extensions .py

# Multiple extensions
m1f -s . -o docs.txt --include-extensions .md .txt .rst

# Exclude extensions
m1f -s . -o output.txt --exclude-extensions .pyc .log

By Path Patterns

# Exclude directories
m1f -s . -o output.txt --excludes "node_modules/" "build/"

# Use gitignore patterns
m1f -s . -o output.txt --exclude-paths-file .gitignore

# Include patterns
m1f -s . -o output.txt --includes "src/**" "*.py"

By Size

# Skip large files
m1f -s . -o output.txt --max-file-size 100KB

5. Preset System

Presets are reusable configurations for consistent processing:

# .m1f-presets.yml
globals:
  global_settings:
    include_extensions: [.py, .js, .md]
    max_file_size: 1MB
    
  presets:
    frontend:
      extensions: [.js, .jsx, .ts, .tsx]
      actions: [minify]
    
    backend:
      extensions: [.py]
      security_check: warn

# Use preset
m1f -s . -o output.txt --preset .m1f-presets.yml --preset-group frontend

6. Security Scanning

m1f can scan for secrets and sensitive data, and lets you choose what happens when something is found:

# Stop if secrets found
m1f -s . -o output.txt --security-check abort

# Skip files with secrets
m1f -s . -o output.txt --security-check skip

# Include all but warn
m1f -s . -o output.txt --security-check warn
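
The difference between the three modes is easiest to see in code. The sketch below is illustrative only: the two regular expressions are hypothetical stand-ins for a real secret scanner, and `apply_security_check` and `SecretsFound` are names invented for the example. Only the abort, skip, and warn behaviors are taken from the commands above.

```python
import re

# Hypothetical patterns for illustration; real scanners use far larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(password|api_key|secret)\s*=\s*['\"][^'\"]{8,}['\"]"),
]


class SecretsFound(Exception):
    """Raised in 'abort' mode when any file contains a suspected secret."""


def apply_security_check(files: dict[str, str], mode: str) -> tuple[dict[str, str], list[str]]:
    """Return (files to bundle, flagged paths) for mode 'abort', 'skip', or 'warn'."""
    if mode not in {"abort", "skip", "warn"}:
        raise ValueError(f"unknown mode: {mode}")
    flagged = [
        path for path, text in files.items()
        if any(pattern.search(text) for pattern in SECRET_PATTERNS)
    ]
    if flagged and mode == "abort":
        raise SecretsFound(", ".join(flagged))  # stop before anything is written
    if mode == "skip":
        kept = {path: text for path, text in files.items() if path not in flagged}
        return kept, flagged  # flagged files are left out of the bundle
    return dict(files), flagged  # warn (or nothing flagged): keep everything
```

In every mode the caller gets the list of flagged paths, so a warning can be printed even when nothing is dropped.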

File Lifecycle

Files flow through the system in seven stages:

1. Discovery     → Scan directories and build file list
2. Filtering     → Apply include/exclude rules
3. Reading       → Read file contents with encoding detection
4. Processing    → Apply presets and transformations
5. Deduplication → Skip files with identical content
6. Security      → Scan for secrets (optional)
7. Output        → Write to bundle with separators

Working with LLMs

Token Management

m1f helps optimize token usage:

# Estimate tokens
m1f-token-counter ./bundle.txt

# Limit file sizes
m1f -s . -o output.txt --max-file-size 50KB

# Include only essential files
m1f -s . -o output.txt --includes "src/**" "*.py" "!test_*"
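
If you want a quick budget check inside a script, a character-count heuristic is often enough for a first pass. The sketch below assumes roughly four characters per token, a common rule of thumb for English text. It is not how m1f-token-counter works, and a real tokenizer will give different numbers, especially for code and non-English text. Both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def fits_budget(text: str, max_tokens: int, chars_per_token: float = 4.0) -> bool:
    """True if the estimated token count is within max_tokens."""
    return estimate_tokens(text, chars_per_token) <= max_tokens
```

Treat the result as an order-of-magnitude guide and confirm with m1f-token-counter before relying on it.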

Context Strategies

Different approaches for different use cases:

Full Project Context

# Everything for comprehensive analysis
m1f -s . -o full-context.txt --exclude-paths-file .gitignore

Focused Context

# Specific feature or component
m1f -s ./src/components/auth -o auth-context.txt

Documentation Context

# Documentation only
m1f -s . -o docs-context.txt --docs-only

File Extraction

s1f reverses the process, extracting the original files from a bundle:

# Extract all files
m1f-s1f ./bundle.txt ./extracted/

# List files without extracting
m1f-s1f --list ./bundle.txt

# Force overwrite
m1f-s1f ./bundle.txt ./extracted/ -f
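
Conceptually, extraction means finding each separator and collecting the lines that follow it. The sketch below parses only the Standard separator style and is not s1f's implementation; `split_bundle` and `SEPARATOR` are names chosen for the example. A content line that happens to look like a separator would be misread, which is one reason the MachineReadable style is the one suggested for programmatic use.

```python
import re

# Matches a Standard-style separator line such as "======= path/to/file.py ======".
SEPARATOR = re.compile(r"^=======\s(?P<path>.+?)\s======$")


def split_bundle(bundle: str) -> dict[str, str]:
    """Map each file path in a Standard-style bundle to its content."""
    files: dict[str, list[str]] = {}
    current: list[str] | None = None
    for line in bundle.splitlines():
        match = SEPARATOR.match(line)
        if match:
            # Start collecting lines for this path.
            current = files.setdefault(match.group("path"), [])
        elif current is not None:
            current.append(line)
        # Lines before the first separator are ignored.
    return {path: "\n".join(lines) for path, lines in files.items()}
```

Writing the result to disk is then a matter of creating each parent directory and saving the content under its path.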

Web Documentation Workflow

Complete workflow for processing web documentation:

# 1. Download documentation
m1f-scrape https://docs.example.com -o ./html/

# 2. Analyze HTML structure
m1f-html2md analyze ./html/ --suggest-selectors

# 3. Convert to Markdown
m1f-html2md convert ./html/ -o ./markdown/ \
    --content-selector "main.content"

# 4. Bundle for LLM
m1f -s ./markdown/ -o ./docs-bundle.txt \
    --remove-scraped-metadata

Best Practices

1. Start Small

# Test with limited scope first
m1f -s ./src -o test.txt --max-files 10

2. Use Exclusions

# Always exclude build artifacts
m1f -s . -o output.txt --exclude-paths-file .gitignore

3. Choose Appropriate Separators

# Markdown for documentation
m1f -s ./docs -o docs.txt --separator-style Markdown

# MachineReadable for programmatic use
m1f -s . -o data.txt --separator-style MachineReadable

4. Monitor Size

# Keep files manageable
m1f -s . -o output.txt --max-file-size 100KB

# Check token count
m1f-token-counter ./output.txt

5. Use Presets

# Consistent configuration
m1f -s . -o output.txt --preset project-settings.yml

Common Patterns

Documentation Bundle

m1f -s ./docs -o documentation.txt \
    --include-extensions .md .rst .txt \
    --separator-style Markdown

Source Code Analysis

m1f -s ./src -o source.txt \
    --include-extensions .py .js .ts \
    --excludes "*.test.*" "*.spec.*" \
    --max-file-size 500KB

Security Review

m1f -s . -o security-review.txt \
    --security-check warn \
    --include-extensions .py .js .php \
    --excludes "node_modules/" "vendor/"

Next Steps

Now that you understand the core concepts:

Explore tools: Learn about s1f, html2md, and scrape
Advanced features: Check out presets and auto-bundling
AI workflows: See Claude integration
Security: Read security best practices

Understanding these concepts will help you use m1f more effectively and get better results when working with LLMs.