Core Concepts
Understanding the fundamental concepts behind m1f
Knowing these core concepts will help you get the most out of m1f when you work with Large Language Models (LLMs).
What is m1f?
m1f (Make One File) is a toolkit for working within the context limits of LLMs. It combines multiple files into a single, well-structured file that AI assistants such as Claude or GPT can consume easily.
The Problem m1f Solves
When working with LLMs, you often need to provide context from multiple files in your codebase. Manually copying and pasting files is:
- Time-consuming: Selecting and copying multiple files
- Error-prone: Missing files or including wrong versions
- Difficult to maintain: Context becomes stale as code changes
- Limited by token counts: Hard to optimize for token usage
m1f automates this process and provides intelligent features like deduplication, filtering, and formatting.
Core Architecture
The m1f Toolkit
m1f is a suite of tools that work together:
┌─────────────────────────────────────────────────────────────┐
│                         m1f Toolkit                         │
├─────────────────────────────────────────────────────────────┤
│ m1f            │ Main bundler - combines files              │
│ s1f            │ File splitter - extracts files from bundle │
│ html2md        │ HTML to Markdown converter                 │
│ scrape         │ Web scraper for documentation              │
│ token-counter  │ Estimates token usage                      │
└─────────────────────────────────────────────────────────────┘
Modern Architecture (v3.4.0)
Built with Python 3.10+ and modern patterns:
- Async I/O: Concurrent file processing for better performance
- Type Safety: Full type annotations throughout
- Modular Design: Clean separation of concerns
- Content Deduplication: Automatic SHA256-based duplicate detection
- Memory Efficient: Streaming operations for large files
Key Concepts
1. File Bundling
The process of combining multiple files into a single output file:
# Basic bundling
m1f -s ./src -o bundle.txt
What happens:
- Scans the source directory
- Filters files based on your criteria
- Reads each file and adds metadata
- Combines everything with separators
- Writes the output file
2. Content Deduplication
m1f automatically detects and skips duplicate files using SHA256 checksums:
# Files with identical content are included only once
m1f -s ./project -o bundle.txt
Benefits:
- Reduces file size
- Eliminates redundant context
- Speeds up processing
- Saves token usage
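Checksum-based deduplication is easy to picture in code. The sketch below assumes files are already loaded into a dict of path to bytes; `dedupe_by_content` is a hypothetical helper, and which copy m1f keeps when duplicates exist is an assumption here (the first one seen).

```python
import hashlib


def dedupe_by_content(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep the first file seen for each distinct SHA256 content digest."""
    seen: set[str] = set()
    unique: dict[str, bytes] = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # identical content was already kept under another path
        seen.add(digest)
        unique[path] = content
    return unique
```

Because the comparison uses the content digest rather than the file name, two copies of the same file in different directories count as duplicates.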
3. Separator Styles
Different ways to format the boundaries between files:
Standard Style
======= path/to/file.py ======
Detailed Style (default)
========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890...
========================================================================================
Markdown Style
## path/to/file.py
**Date Modified:** 2025-05-15 14:30:21 | **Size:** 2.50 KB
```python
# File content here
```
MachineReadable Style
--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_uuid ---
METADATA_JSON: {"original_filepath": "path/to/file.py", ...}
--- PYMK1F_END_FILE_METADATA_BLOCK_uuid ---
--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_uuid ---
# File content here
--- PYMK1F_END_FILE_CONTENT_BLOCK_uuid ---
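To make the Detailed style concrete, here is a rough sketch that builds a header in that shape. The function name `detailed_header` and the 88-character rule width are assumptions for illustration; m1f's own formatting code may differ.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath

RULE = "=" * 88  # assumed width of the horizontal rule


def detailed_header(path: str, content: bytes, modified: datetime) -> str:
    """Build a Detailed-style header for one file."""
    size_kb = len(content) / 1024
    checksum = hashlib.sha256(content).hexdigest()
    suffix = PurePosixPath(path).suffix
    return "\n".join([
        RULE,
        f"== FILE: {path}",
        f"== DATE: {modified:%Y-%m-%d %H:%M:%S} | SIZE: {size_kb:.2f} KB | TYPE: {suffix}",
        f"== CHECKSUM_SHA256: {checksum}",
        RULE,
    ])
```

The checksum in the header is the same SHA256 digest that deduplication relies on, so a reader of the bundle can verify a file's content independently.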
4. File Filtering
Multiple ways to control which files are included:
By Extension
# Only Python files
m1f -s . -o code.txt --include-extensions .py
# Multiple extensions
m1f -s . -o docs.txt --include-extensions .md .txt .rst
# Exclude extensions
m1f -s . -o output.txt --exclude-extensions .pyc .log
By Path Patterns
# Exclude directories
m1f -s . -o output.txt --excludes "node_modules/" "build/"
# Use gitignore patterns
m1f -s . -o output.txt --exclude-paths-file .gitignore
# Include patterns
m1f -s . -o output.txt --includes "src/**" "*.py"
By Size
# Skip large files
m1f -s . -o output.txt --max-file-size 100KB
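The three filter types combine into a single yes/no decision per file. The sketch below is an approximation under stated assumptions: `is_included` and its parameters are hypothetical, sizes are plain byte counts, and the pattern matching uses `fnmatch`, which is simpler than real gitignore semantics.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_included(path, size, include_extensions=None, excludes=(), max_file_size=None) -> bool:
    """Decide whether one file passes extension, path-pattern, and size filters."""
    p = PurePosixPath(path)
    # By extension: keep only listed suffixes when a list is given.
    if include_extensions is not None and p.suffix not in include_extensions:
        return False
    # By path pattern: "name/" excludes a directory, anything else is a glob.
    for pattern in excludes:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in p.parts[:-1]:
                return False
        elif fnmatch(p.name, pattern) or fnmatch(path, pattern):
            return False
    # By size: skip files larger than the limit (in bytes).
    if max_file_size is not None and size > max_file_size:
        return False
    return True
```

For example, `is_included("src/app.test.js", 10, excludes=["*.test.*"])` returns `False`, while the same call without `excludes` returns `True`.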
5. Preset System
Reusable configurations for consistent processing:
# .m1f-presets.yml
globals:
  global_settings:
    include_extensions: [.py, .js, .md]
    max_file_size: 1MB
presets:
  frontend:
    extensions: [.js, .jsx, .ts, .tsx]
    actions: [minify]
  backend:
    extensions: [.py]
    security_check: warn
# Use preset
m1f -s . -o output.txt --preset .m1f-presets.yml --preset-group frontend
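One way to think about a preset file is as global settings plus per-group overrides. The sketch below models that with a plain Python dict mirroring the YAML above. Both the `resolve_settings` helper and the merge rule (group values override globals) are assumptions for illustration; this section does not specify how m1f resolves precedence.

```python
def resolve_settings(config: dict, group: str) -> dict:
    """Merge global settings with one preset group; group values win (assumed rule)."""
    presets = config.get("presets", {})
    if group not in presets:
        raise KeyError(f"unknown preset group: {group}")
    merged = dict(config.get("globals", {}).get("global_settings", {}))
    merged.update(presets[group])
    return merged


# The preset file from above, written as a dict to avoid a YAML dependency.
CONFIG = {
    "globals": {
        "global_settings": {
            "include_extensions": [".py", ".js", ".md"],
            "max_file_size": "1MB",
        }
    },
    "presets": {
        "frontend": {"extensions": [".js", ".jsx", ".ts", ".tsx"], "actions": ["minify"]},
        "backend": {"extensions": [".py"], "security_check": "warn"},
    },
}
```

With this model, `resolve_settings(CONFIG, "backend")` carries the global `max_file_size` alongside the group's own `security_check` value.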
6. Security Scanning
Automatic detection of secrets and sensitive data:
# Stop if secrets found
m1f -s . -o output.txt --security-check abort
# Skip files with secrets
m1f -s . -o output.txt --security-check skip
# Include all but warn
m1f -s . -o output.txt --security-check warn
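The three modes differ only in what happens after a file is flagged. The sketch below shows that control flow with a few illustrative regular expressions. The patterns, the `SecretsFound` exception, and `apply_security_check` are all hypothetical and much cruder than a real secret scanner.

```python
import re

# Illustrative patterns only; a real scanner covers far more cases.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    re.compile(r"(?i)\b(?:password|secret|api_key)\s*=\s*['\"][^'\"]{8,}['\"]"),
]


class SecretsFound(Exception):
    """Raised in 'abort' mode when any file is flagged."""


def apply_security_check(files: dict[str, str], mode: str) -> tuple[dict[str, str], list[str]]:
    """Return (files to bundle, flagged paths) according to the chosen mode."""
    if mode not in {"abort", "skip", "warn"}:
        raise ValueError(f"unknown mode: {mode}")
    flagged = [
        path for path, text in files.items()
        if any(pattern.search(text) for pattern in SECRET_PATTERNS)
    ]
    if flagged and mode == "abort":
        raise SecretsFound(", ".join(flagged))  # stop before writing anything
    if mode == "skip":
        kept = {path: text for path, text in files.items() if path not in flagged}
    else:
        kept = dict(files)  # 'warn': keep everything, report the flagged paths
    return kept, flagged
```

In `warn` mode the caller is expected to print the flagged paths; the bundle itself is unchanged.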
File Lifecycle
Understanding how files flow through the system:
1. Discovery → Scan directories and build file list
2. Filtering → Apply include/exclude rules
3. Reading → Read file contents with encoding detection
4. Processing → Apply presets and transformations
5. Deduplication → Skip files with identical content
6. Security → Scan for secrets (optional)
7. Output → Write to bundle with separators
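Step 3 is the least obvious stage, so here is a small sketch of reading with encoding detection. It assumes a simple strategy (byte order mark first, then UTF-8, then Latin-1 as a last resort); `decode_with_fallback` is a hypothetical name and m1f's actual detection logic is not described in this section.

```python
import codecs


def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Return (text, encoding name) using BOM sniffing, then UTF-8, then Latin-1."""
    boms = (
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in boms:
        if data.startswith(bom):
            # These codecs consume the BOM, so it does not appear in the text.
            return data.decode(name), name
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail,
        # though the result may be wrong for other legacy encodings.
        return data.decode("latin-1"), "latin-1"
```

Detecting the encoding before the later stages matters because deduplication, security scanning, and output all operate on the decoded content.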
Working with LLMs
Token Management
m1f helps optimize token usage:
# Estimate tokens
m1f-token-counter ./bundle.txt
# Limit file sizes
m1f -s . -o output.txt --max-file-size 50KB
# Include only essential files
m1f -s . -o output.txt --includes "src/**" "*.py" "!test_*"
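If you want a quick estimate before running the counter, a character-based heuristic is often close enough for planning. The sketch below assumes roughly four characters per token, a common rule of thumb for English text and not the method `m1f-token-counter` uses; both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers vary by model and language."""
    return math.ceil(len(text) / chars_per_token)


def select_within_budget(files: dict[str, str], budget: int) -> list[str]:
    """Greedily keep the smallest files first until the estimated budget is used."""
    chosen: list[str] = []
    used = 0
    for path, text in sorted(files.items(), key=lambda item: len(item[1])):
        cost = estimate_tokens(text)
        if used + cost > budget:
            break  # every remaining file is at least this large
        chosen.append(path)
        used += cost
    return chosen
```

Smallest-first selection maximizes the number of files that fit, which is not always what you want; sorting by relevance instead is a reasonable alternative.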
Context Strategies
Different approaches for different use cases:
Full Project Context
# Everything for comprehensive analysis
m1f -s . -o full-context.txt --exclude-paths-file .gitignore
Focused Context
# Specific feature or component
m1f -s ./src/components/auth -o auth-context.txt
Documentation Context
# Documentation only
m1f -s . -o docs-context.txt --docs-only
File Extraction
s1f reverses the process by extracting files from a bundle:
# Extract all files
m1f-s1f ./bundle.txt ./extracted/
# List files without extracting
m1f-s1f --list ./bundle.txt
# Force overwrite
m1f-s1f ./bundle.txt ./extracted/ -f
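Extraction amounts to finding separator lines and writing what lies between them. The sketch below handles only the Standard separator style and is not how s1f is implemented; `split_bundle`, `extract`, and the `force` parameter (loosely mirroring `-f`) are hypothetical, and raising an error on existing files without `force` is an assumption.

```python
import re
from pathlib import Path

SEPARATOR = re.compile(r"^======= (.+) ======$")


def split_bundle(bundle_text: str) -> dict[str, str]:
    """Map each path found in a Standard-style bundle to its content."""
    files: dict[str, list[str]] = {}
    current = None
    for line in bundle_text.splitlines(keepends=True):
        match = SEPARATOR.match(line.rstrip("\r\n"))
        if match:
            current = match.group(1)
            files[current] = []
        elif current is not None:
            files[current].append(line)
    return {path: "".join(lines) for path, lines in files.items()}


def extract(bundle_text: str, target_dir: str, force: bool = False) -> list[str]:
    """Write every file from the bundle under target_dir; return the paths written."""
    root = Path(target_dir).resolve()
    written = []
    for rel, content in split_bundle(bundle_text).items():
        dest = (root / rel).resolve()
        # Refuse paths such as "../x" that would land outside the target.
        if root not in dest.parents:
            raise ValueError(f"refusing to write outside {root}: {rel}")
        if dest.exists() and not force:
            raise FileExistsError(dest)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(content, encoding="utf-8")
        written.append(rel)
    return written
```

A file whose own content contains a line shaped like the separator would be split incorrectly here, which is one reason the MachineReadable style with unique block markers is better suited to round trips.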
Web Documentation Workflow
Complete workflow for processing web documentation:
# 1. Download documentation
m1f-scrape https://docs.example.com -o ./html/
# 2. Analyze HTML structure
m1f-html2md analyze ./html/ --suggest-selectors
# 3. Convert to Markdown
m1f-html2md convert ./html/ -o ./markdown/ \
    --content-selector "main.content"
# 4. Bundle for LLM
m1f -s ./markdown/ -o ./docs-bundle.txt \
    --remove-scraped-metadata
Best Practices
1. Start Small
# Test with limited scope first
m1f -s ./src -o test.txt --max-files 10
2. Use Exclusions
# Always exclude build artifacts
m1f -s . -o output.txt --exclude-paths-file .gitignore
3. Choose Appropriate Separators
# Markdown for documentation
m1f -s ./docs -o docs.txt --separator-style Markdown
# MachineReadable for programmatic use
m1f -s . -o data.txt --separator-style MachineReadable
4. Monitor Size
# Keep files manageable
m1f -s . -o output.txt --max-file-size 100KB
# Check token count
m1f-token-counter ./output.txt
5. Use Presets
# Consistent configuration
m1f -s . -o output.txt --preset project-settings.yml
Common Patterns
Documentation Bundle
m1f -s ./docs -o documentation.txt \
    --include-extensions .md .rst .txt \
    --separator-style Markdown
Source Code Analysis
m1f -s ./src -o source.txt \
    --include-extensions .py .js .ts \
    --excludes "*.test.*" "*.spec.*" \
    --max-file-size 500KB
Security Review
m1f -s . -o security-review.txt \
    --security-check warn \
    --include-extensions .py .js .php \
    --excludes "node_modules/" "vendor/"
Next Steps
Now that you understand the core concepts:
- Explore tools: Learn about s1f, html2md, and scrape
- Advanced features: Check out presets and auto-bundling
- AI workflows: See Claude integration
- Security: Read security best practices
Understanding these concepts will help you use m1f more effectively and get better results when working with LLMs.