Bundle Format Specification
Technical specification of the m1f bundle file format
The m1f tool generates bundle files that combine multiple source files into a single output file. This document provides a complete technical specification of the bundle format, including separator styles, metadata structure, and deduplication mechanisms.
Overview
An m1f bundle consists of:
- File separators that mark boundaries between files
- File metadata containing information about each file
- File content with optional processing applied
- Deduplication references for files with identical content
Separator Styles
m1f supports five different separator styles, each designed for specific use cases:
Standard Style
A simple, concise separator showing only the file path:
======= path/to/file.py ======
Use cases:
- Quick file navigation
- Minimal overhead for AI tools
- Human-readable bundles
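As a rough illustration, a Standard-style bundle can be split back into files by scanning for separator lines. The sketch below assumes each separator sits on its own line exactly as shown above and that no file content contains a line of the same shape; `split_standard_bundle` is a hypothetical helper name, not part of m1f.

```python
import re

# Separator line as documented: seven '=', the path, six '='.
SEPARATOR = re.compile(r"^=======\s+(.+?)\s+======$")


def split_standard_bundle(text):
    """Return a list of (path, content) pairs from a Standard-style bundle."""
    files = []
    path, lines = None, []
    for line in text.splitlines():
        match = SEPARATOR.match(line)
        if match:
            # A new separator closes the previous file, if any.
            if path is not None:
                files.append((path, "\n".join(lines)))
            path, lines = match.group(1), []
        elif path is not None:
            lines.append(line)
    if path is not None:
        files.append((path, "\n".join(lines)))
    return files


bundle = "======= a.py ======\nprint('a')\n======= b.py ======\nprint('b')\n"
print(split_standard_bundle(bundle))
```

Because the Standard style carries no checksum or size, a reader like this cannot verify that the recovered content is complete.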
Detailed Style (Default)
Comprehensive separator including full metadata:
========================================================================================
== FILE: path/to/file.py
== DATE: 2025-05-15 14:30:21 | SIZE: 2.50 KB | TYPE: .py
== CHECKSUM_SHA256: abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890
========================================================================================
Metadata fields:
- FILE: Relative path from the source directory
- DATE: Last modification timestamp (local timezone)
- SIZE: Human-readable file size
- TYPE: File extension
- CHECKSUM_SHA256: SHA-256 hash of the file content
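The SIZE and CHECKSUM_SHA256 values can be reproduced with the Python standard library. This is a minimal sketch under two assumptions that the specification does not state: the hash is taken over the file's raw bytes, and sizes use 1024-based units with two decimals (consistent with 2560 bytes appearing as 2.50 KB in the examples). Both function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest: a 64-character string, as in CHECKSUM_SHA256."""
    return hashlib.sha256(data).hexdigest()


def human_size(num_bytes: int) -> str:
    """Format a byte count in 1024-based units (an assumed convention)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.0f} B" if unit == "B" else f"{size:.2f} {unit}"
        size /= 1024


print(human_size(2560))           # 2.50 KB
print(len(sha256_hex(b"hello")))  # 64
```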
Markdown Style
Formats metadata as Markdown with proper code blocks:
## path/to/file.py
**Date Modified:** 2025-05-15 14:30:21 | **Size:** 2.50 KB | **Type:** .py | **Checksum (SHA256):** abcdef1234567890...
```python
# File content starts here
def example():
    return "Hello, world!"
```
Features:
- Uses file extension for syntax highlighting
- Renders nicely in Markdown viewers
- Ideal for documentation bundles
MachineReadable Style
A robust format for automated parsing and processing:
--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
METADATA_JSON:
{
    "original_filepath": "path/to/file.py",
    "original_filename": "file.py",
    "timestamp_utc_iso": "2025-05-15T14:30:21Z",
    "type": ".py",
    "size_bytes": 2560,
    "checksum_sha256": "abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890",
    "encoding": "utf-8",
    "original_encoding": "utf-8"
}
--- PYMK1F_END_FILE_METADATA_BLOCK_12345678-1234-1234-1234-123456789abc ---
--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---
# File content here
--- PYMK1F_END_FILE_CONTENT_BLOCK_12345678-1234-1234-1234-123456789abc ---
Features:
- UUID-based block markers for reliable parsing
- JSON metadata for easy extraction
- Timestamps in UTC ISO format
- Encoding information included
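To make the layout concrete, the sketch below renders one file in the MachineReadable shape shown above. It is not m1f's writer: `render_block` is a hypothetical name, only a subset of the metadata fields is filled in, and sharing one UUID across all four markers is inferred from the example rather than stated by the specification.

```python
import json
import uuid


def render_block(path: str, content: str) -> str:
    """Render one file in the MachineReadable layout (illustrative only)."""
    block_id = uuid.uuid4()  # assumed: one UUID shared by all four markers
    metadata = {
        "original_filepath": path,
        "original_filename": path.rsplit("/", 1)[-1],
        "size_bytes": len(content.encode("utf-8")),
        "encoding": "utf-8",
    }
    return "\n".join([
        f"--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_{block_id} ---",
        "METADATA_JSON:",
        json.dumps(metadata, indent=4),
        f"--- PYMK1F_END_FILE_METADATA_BLOCK_{block_id} ---",
        f"--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_{block_id} ---",
        content,
        f"--- PYMK1F_END_FILE_CONTENT_BLOCK_{block_id} ---",
    ])


print(render_block("path/to/file.py", "# File content here"))
```

A fresh UUID per file means a marker line is very unlikely to collide with anything inside the file content, which is what makes the block boundaries safe to search for.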
None Style
Files are concatenated directly without any separators:
# First file content
def function1():
    pass
# Second file content
def function2():
    pass
Use cases:
- Creating single concatenated files
- Minimal processing needed
- When file boundaries don’t matter
Content Deduplication
By default, m1f deduplicates files with identical content to save space and reduce token usage.
Deduplication Format
When a duplicate file is encountered:
========================================================================================
== FILE: src/components/Header.js
== DUPE OF: src/widgets/Header.js
========================================================================================
Behavior:
- First occurrence includes full content
- Subsequent duplicates show reference only
- Based on SHA-256 checksum matching
- Can be disabled with --allow-duplicate-files
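The behavior listed above can be sketched as a single pass that remembers the first path seen for each checksum. This illustrates the idea only, assuming the hash is computed over raw file bytes; `plan_bundle` is a hypothetical name and does not reflect m1f's internals.

```python
import hashlib


def plan_bundle(files):
    """Decide, per file, whether to emit content or a duplicate reference.

    `files` is an iterable of (path, content_bytes) pairs in bundle order.
    Returns (path, original_path_or_None) pairs in the same order.
    """
    first_seen = {}  # checksum -> path of the first file with that content
    plan = []
    for path, data in files:
        checksum = hashlib.sha256(data).hexdigest()
        original = first_seen.get(checksum)
        if original is None:
            first_seen[checksum] = path  # first occurrence keeps its content
        plan.append((path, original))
    return plan


files = [
    ("src/utils/constants.js", b"export const A = 1;\n"),
    ("lib/shared/constants.js", b"export const A = 1;\n"),
    ("src/index.js", b"import './utils';\n"),
]
for path, original in plan_bundle(files):
    print(path, "->", f"DUPE OF: {original}" if original else "full content")
```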
Deduplication Example
========================================================================================
== FILE: src/utils/constants.js
== DATE: 2025-05-15 10:00:00 | SIZE: 150 B | TYPE: .js
== CHECKSUM_SHA256: 123abc...
========================================================================================
export const API_KEY = "example";
export const API_URL = "https://api.example.com";
========================================================================================
== FILE: lib/shared/constants.js
== DUPE OF: src/utils/constants.js
========================================================================================
========================================================================================
== FILE: backup/constants.js
== DUPE OF: src/utils/constants.js
========================================================================================
Metadata Structure
Core Metadata Fields
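The Detailed, Markdown, and MachineReadable styles draw on the fields below, although not every style prints every field. Standard shows only the file path, and None emits no metadata: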
Field | Description | Format |
---|---|---|
filepath | Relative path from source directory | String |
filename | Base filename | String |
timestamp | File modification time | ISO 8601 or Unix timestamp |
size | File size | Bytes (number) or human-readable |
type | File extension | String (including dot) |
checksum_sha256 | Content hash | 64-character hex string |
Extended Metadata (MachineReadable)
Additional fields in JSON format:
Field | Description | Example |
---|---|---|
encoding | Output encoding | "utf-8" |
original_encoding | Detected source encoding | "windows-1252" |
line_count | Number of lines | 150 |
binary | Is binary file | false |
mime_type | MIME type (if detected) | "text/javascript" |
Encoding Handling
Character Encoding
- Default output encoding: UTF-8
- Original encoding detected and preserved in metadata
- Configurable with --convert-to-charset
- Supported encodings: utf-8, utf-16, ascii, latin-1, cp1252
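The relationship between the encoding and original_encoding metadata fields can be shown with a small conversion sketch. It assumes the source encoding is already known (detection is out of scope here), and `convert_to_utf8` is a hypothetical helper rather than an m1f function.

```python
def convert_to_utf8(raw: bytes, original_encoding: str):
    """Decode bytes from their source encoding and re-encode as UTF-8."""
    text = raw.decode(original_encoding)
    metadata = {"encoding": "utf-8", "original_encoding": original_encoding}
    return text.encode("utf-8"), metadata


raw = "café".encode("cp1252")  # b'caf\xe9' in Windows-1252
converted, meta = convert_to_utf8(raw, "cp1252")
print(converted)               # b'caf\xc3\xa9'
print(meta)
```

Recording the original encoding alongside the output encoding is what lets a later tool write the file back in its source encoding.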
Line Endings
- Preserved from source files by default
- Separator line endings configurable with --line-ending
- Options: lf (Unix) or crlf (Windows)
Binary File Handling
When --include-binary-files is enabled:
========================================================================================
== FILE: images/logo.png
== DATE: 2025-05-15 12:00:00 | SIZE: 45.2 KB | TYPE: .png
== BINARY FILE: Content read as UTF-8 with errors ignored
========================================================================================
[Binary content appears as garbled text]
Note: Binary content is not suitable for text processing and significantly increases bundle size.
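A short demonstration of why the result is garbled and lossy: decoding arbitrary bytes as UTF-8 with errors ignored silently drops every byte that is not valid UTF-8. The example uses the standard eight-byte PNG signature; how m1f reads the file internally is not shown here.

```python
# First eight bytes of every PNG file (the PNG signature).
png_signature = b"\x89PNG\r\n\x1a\n"

# Bytes that are not valid UTF-8 are silently dropped.
text = png_signature.decode("utf-8", errors="ignore")

print(repr(text))
print(len(png_signature), "bytes ->", len(text), "characters")

# The dropped byte cannot be recovered from the decoded text.
print(text.encode("utf-8") == png_signature)
```

Since the original bytes cannot be reconstructed, binary files included this way should not be expected to round-trip through extraction.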
Special Processing
Scraped Content Metadata Removal
When --remove-scraped-metadata is enabled, m1f removes metadata blocks from HTML2MD files:
# Original file with metadata
Article content here...
---
**Source:** https://example.com/article
**Scraped:** 2025-05-15 10:00:00
---
# After processing
Article content here...
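One way to approximate this step is a regular expression that strips a trailing block of bold `**Key:** value` lines enclosed by `---` rules. The exact layout m1f matches is not specified here, so the pattern below is an assumption based solely on the example above, and `remove_scraped_metadata` is a hypothetical name.

```python
import re

# Trailing block: '---', one or more '**Key:** value' lines, '---'.
SCRAPED_BLOCK = re.compile(
    r"\n---\n(?:\*\*[^*\n]+:\*\* .*\n)+---\s*\Z"
)


def remove_scraped_metadata(text: str) -> str:
    """Strip a trailing scraped-metadata block, if one is present."""
    return SCRAPED_BLOCK.sub("\n", text)


sample = (
    "Article content here...\n"
    "---\n"
    "**Source:** https://example.com/article\n"
    "**Scraped:** 2025-05-15 10:00:00\n"
    "---\n"
)
print(repr(remove_scraped_metadata(sample)))
```

Anchoring the pattern to the end of the text keeps ordinary `---` rules elsewhere in the article untouched.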
Content Processors
When using presets, files can have processors applied:
per_file_settings:
  "*.min.js":
    processors:
      - minify_content
This affects the content but not the metadata structure.
Bundle File Structure Example
A complete bundle showing various features:
========================================================================================
== FILE: README.md
== DATE: 2025-05-15 09:00:00 | SIZE: 1.2 KB | TYPE: .md
== CHECKSUM_SHA256: abc123...
========================================================================================
# My Project
This is the main documentation...
========================================================================================
== FILE: src/index.js
== DATE: 2025-05-15 10:30:00 | SIZE: 3.5 KB | TYPE: .js
== CHECKSUM_SHA256: def456...
========================================================================================
import { utils } from './utils';
export function main() {
  // Main application code
}
========================================================================================
== FILE: src/utils.js
== DATE: 2025-05-15 10:00:00 | SIZE: 2.1 KB | TYPE: .js
== CHECKSUM_SHA256: ghi789...
========================================================================================
export const utils = {
  // Utility functions
};
========================================================================================
== FILE: tests/index.test.js
== DUPE OF: src/index.js
========================================================================================
Parsing Bundle Files
Regular Expression Patterns
For parsing Standard style:
^=======\s+(.+?)\s+======$
For parsing Detailed style:
^==\s+FILE:\s+(.+)$
^==\s+DATE:\s+(.+?)\s+\|\s+SIZE:\s+(.+?)\s+\|\s+TYPE:\s+(.+)$
^==\s+CHECKSUM_SHA256:\s+([a-fA-F0-9]{64})$
For parsing MachineReadable blocks:
^---\s+PYMK1F_BEGIN_FILE_METADATA_BLOCK_([a-f0-9-]+)\s+---$
^---\s+PYMK1F_END_FILE_METADATA_BLOCK_([a-f0-9-]+)\s+---$
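The Detailed-style patterns above can be combined into a small line-based reader. This is a sketch under stated assumptions: the rule lines consist only of `=` characters (the `RULE` pattern is a guess at a safe minimum length), the `DUPE OF` pattern is written by analogy with the documented `FILE` pattern, and no file content contains a line made up solely of `=`. The function name is hypothetical. The walrus operator requires Python 3.8 or later.

```python
import re

RULE = re.compile(r"^={20,}$")                # assumed: a line of only '='
FILE = re.compile(r"^==\s+FILE:\s+(.+)$")
DUPE = re.compile(r"^==\s+DUPE OF:\s+(.+)$")  # assumed, by analogy with FILE
CHECKSUM = re.compile(r"^==\s+CHECKSUM_SHA256:\s+([a-fA-F0-9]{64})$")


def parse_detailed_bundle(text):
    """Return one dict per file: path, checksum, dupe_of, content."""
    entries, current, in_header = [], None, False
    for line in text.splitlines():
        if RULE.match(line):
            in_header = not in_header  # rules open and close each header
            continue
        if in_header:
            if m := FILE.match(line):
                current = {"path": m.group(1), "checksum": None,
                           "dupe_of": None, "lines": []}
                entries.append(current)
            elif current and (m := CHECKSUM.match(line)):
                current["checksum"] = m.group(1)
            elif current and (m := DUPE.match(line)):
                current["dupe_of"] = m.group(1)
        elif current is not None:
            current["lines"].append(line)
    for entry in entries:
        entry["content"] = "\n".join(entry.pop("lines"))
    return entries


rule = "=" * 88
sample = "\n".join([
    rule,
    "== FILE: a.js",
    "== DATE: 2025-05-15 10:00:00 | SIZE: 150 B | TYPE: .js",
    "== CHECKSUM_SHA256: " + "ab" * 32,
    rule,
    "export const A = 1;",
    rule,
    "== FILE: b.js",
    "== DUPE OF: a.js",
    rule,
])
for entry in parse_detailed_bundle(sample):
    print(entry["path"], entry["dupe_of"], repr(entry["content"]))
```

Entries with a non-empty `dupe_of` carry no content of their own; a caller would look up the referenced path to recover it.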
Example Parser (Python)
import json
import re


def parse_machine_readable_bundle(content):
    """Parse a MachineReadable format bundle."""
    files = []

    # Pattern for metadata blocks
    metadata_pattern = re.compile(
        r'--- PYMK1F_BEGIN_FILE_METADATA_BLOCK_(.+?) ---\n'
        r'METADATA_JSON:\n(.+?)\n'
        r'--- PYMK1F_END_FILE_METADATA_BLOCK_\1 ---',
        re.DOTALL
    )

    # Pattern for content blocks
    content_pattern = re.compile(
        r'--- PYMK1F_BEGIN_FILE_CONTENT_BLOCK_(.+?) ---\n'
        r'(.*?)\n'
        r'--- PYMK1F_END_FILE_CONTENT_BLOCK_\1 ---',
        re.DOTALL
    )

    # Index every content block by its UUID so that each metadata block
    # is paired with its own content rather than the first block found
    contents = {
        m.group(1): m.group(2) for m in content_pattern.finditer(content)
    }

    # Extract all metadata and attach the corresponding content
    for match in metadata_pattern.finditer(content):
        uuid = match.group(1)
        metadata = json.loads(match.group(2))
        if uuid in contents:
            metadata['content'] = contents[uuid]
            files.append(metadata)

    return files
Version Compatibility
- v3.2+: Current format as documented
- v3.1: Standard separator included SHA256 in the separator line
- v2.x: Different metadata format, fewer separator styles
- s1f tool: Can parse all m1f bundle formats
Best Practices
- Choose an appropriate separator style:
  - Standard for AI tools (minimal tokens)
  - Detailed for debugging and verification
  - Markdown for documentation
  - MachineReadable for automation
  - None for simple concatenation
- Deduplication considerations:
  - Enable for large codebases with repeated files
  - Disable when preserving file instances matters
  - Check DUPE references when debugging
- Encoding handling:
  - Use UTF-8 output for maximum compatibility
  - Preserve original encoding when needed
  - Include encoding metadata for processing tools
- Bundle organization:
  - Group related files together
  - Use consistent source paths
  - Include metadata for traceability
See Also
- CLI Reference - Command-line options
- Auto-Bundle Guide - Automated bundle generation
- Preset Reference - Advanced configuration