m1f Preset System Complete Reference
This document provides a comprehensive reference for the m1f preset system, including all available settings, clarifications, and advanced usage patterns.
Table of Contents
- Quick Start
- Preset File Format
- All Available Settings
- Available Actions
- Pattern Matching
- Processing Order
- Important Clarifications
- Advanced Features
- Examples
- Debugging and Best Practices
Quick Start
The m1f preset system allows you to define file-specific processing rules and configurations. Here’s a minimal example:
# my-preset.yml
web_assets:
  description: "Process web assets"
  presets:
    javascript:
      extensions: [".js", ".jsx"]
      actions: ["minify", "strip_comments"]
Use it with:
# Direct command invocation (requires m1f to be installed)
m1f -s ./src -o bundle.txt --preset my-preset.yml
Preset File Format
Modern Format (Recommended)
# Group name - can be selected with --preset-group
group_name:
  description: "Optional description of this preset group"
  enabled: true # Can disable entire group
  priority: 10 # Higher numbers are processed first (default: 0)
  base_path: "src" # Optional base path for all patterns in this group
  presets:
    # Preset name (for internal reference)
    preset_name:
      patterns: ["*.js", "*.jsx"] # Glob patterns
      extensions: [".js", ".jsx"] # Extension matching (with or without dot)
      actions:
        - minify
        - strip_comments
        - compress_whitespace
      # Per-file overrides
      security_check: "warn" # error, skip, warn
      max_file_size: "500KB"
      include_dot_paths: true
      include_binary_files: false
      remove_scraped_metadata: true
      # Custom processor with arguments
      custom_processor: "truncate"
      processor_args:
        max_lines: 100
        add_marker: true
# Global settings (apply to all groups)
globals:
  global_settings:
    # Input/Output settings (NEW in v3.2.0)
    source_directory: "./src"
    input_file: "files_to_process.txt"
    output_file: "bundle.txt"
    input_include_files:
      - "README.md"
      - "INTRO.txt"
    # Output control (NEW in v3.2.0)
    add_timestamp: true
    filename_mtime_hash: false
    force: false
    minimal_output: false
    skip_output_file: false
    # Archive settings (NEW in v3.2.0)
    create_archive: false
    archive_type: "zip" # zip or tar.gz
    # Runtime behavior (NEW in v3.2.0)
    verbose: false
    quiet: false
    # Default file processing
    security_check: "warn"
    max_file_size: "1MB"
    # Per-extension settings
    extensions:
      .py:
        security_check: "error"
        max_file_size: "2MB"
      .env:
        security_check: "skip"
        actions: ["redact_secrets"]
All Available Settings
Group-Level Settings
Setting | Type | Default | Description |
---|---|---|---|
description | string | none | Human-readable description |
enabled | boolean | true | Enable/disable this group |
priority | integer | 0 | Processing order (higher first) |
base_path | string | none | Base path for pattern matching |
enabled_if_exists | string | none | Only enable if this path exists |
Global Settings (NEW in v3.2.0)
These settings can be specified in the global_settings section and override CLI defaults:
Input/Output Settings
Setting | Type | Default | Description |
---|---|---|---|
source_directory | string | none | Source directory path |
input_file | string | none | Input file listing paths to process |
output_file | string | none | Output file path |
input_include_files | string/list | [] | Files to include at beginning (intro files) |
Output Control Settings
Setting | Type | Default | Description |
---|---|---|---|
add_timestamp | boolean | false | Add timestamp to output filename |
filename_mtime_hash | boolean | false | Add hash of file mtimes to filename |
force | boolean | false | Force overwrite existing output file |
minimal_output | boolean | false | Only create main output file |
skip_output_file | boolean | false | Skip creating main output file |
allow_duplicate_files | boolean | false | Allow duplicate content (v3.2) |
Archive Settings
Setting | Type | Default | Description |
---|---|---|---|
create_archive | boolean | false | Create backup archive of files |
archive_type | string | "zip" | Archive format: "zip" or "tar.gz" |
Runtime Settings
Setting | Type | Default | Description |
---|---|---|---|
verbose | boolean | false | Enable verbose output |
quiet | boolean | false | Suppress all console output |
File Processing Settings
Setting | Type | Default | Description |
---|---|---|---|
encoding | string | "utf-8" | Target encoding for all files |
separator_style | string | none | File separator style |
line_ending | string | "lf" | Line ending style (lf/crlf) |
security_check | string | "warn" | How to handle secrets (error, skip, warn) |
max_file_size | string | none | Maximum file size to process |
enable_content_deduplication | boolean | true | Enable content deduplication (v3.2) |
prefer_utf8_for_text_files | boolean | true | Prefer UTF-8 for text files (v3.2) |
Preset-Level Settings
Setting | Type | Default | Description |
---|---|---|---|
patterns | list | [] | Glob patterns to match files |
extensions | list | [] | File extensions to match |
actions | list | [] | Processing actions to apply |
security_check | string | "warn" | How to handle secrets (error, skip, warn) |
max_file_size | string | none | Maximum file size to process |
include_dot_paths | boolean | false | Include hidden files |
include_binary_files | boolean | false | Process binary files |
remove_scraped_metadata | boolean | false | Remove HTML2MD metadata |
custom_processor | string | none | Name of custom processor |
processor_args | dict | {} | Arguments for custom processor |
line_ending | string | "lf" | Convert line endings (lf, crlf) |
separator_style | string | none | Override default separator style |
include_metadata | boolean | true | Include file metadata in output |
max_lines | integer | none | Truncate file after N lines |
strip_tags | list | [] | HTML tags to remove (for strip_tags action) |
preserve_tags | list | [] | HTML tags to preserve when stripping |
Available Actions
Built-in Actions
- minify - Remove unnecessary whitespace and formatting
  - Reduces file size
  - Maintains functionality
  - Best for: JS, CSS, HTML
- strip_tags - Remove HTML/XML tags
  - Extracts text content only
  - Preserves text between tags
  - Best for: HTML, XML, Markdown with HTML
- strip_comments - Remove code comments
  - Removes single and multi-line comments
  - Language-aware (JS, Python, CSS, etc.)
  - Best for: Production code bundles
- compress_whitespace - Reduce multiple spaces/newlines
  - Converts multiple spaces to single space
  - Reduces multiple newlines to double newline
  - Best for: Documentation, logs
- remove_empty_lines - Remove blank lines
  - Removes lines with only whitespace
  - Keeps single blank lines between sections
  - Best for: Clean documentation
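To make the compress_whitespace description concrete, here is a minimal Python sketch of the two rules listed for it. It is an approximation written for illustration, not m1f's own implementation; the function name simply mirrors the action name, and the exact regular expressions are assumptions.

```python
import re


def compress_whitespace(text: str) -> str:
    """Approximate the compress_whitespace action described above."""
    # Rule 1: runs of two or more spaces become a single space.
    text = re.sub(r" {2,}", " ", text)
    # Rule 2: runs of three or more newlines become a double newline.
    return re.sub(r"\n{3,}", "\n\n", text)
```

For example, `compress_whitespace("a   b\n\n\n\nc")` yields `"a b\n\nc"`: the inner spaces collapse and the four newlines shrink to one blank line.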
Custom Processors
Currently implemented:
- truncate - Limit file length
  custom_processor: "truncate"
  processor_args:
    max_lines: 100
    max_chars: 10000
    add_marker: true # Add "... truncated ..." marker
- redact_secrets - Remove sensitive data
  custom_processor: "redact_secrets"
  processor_args:
    patterns:
      # Inside a YAML single-quoted string, '' is a literal single quote
      - '(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*["'']?[\w-]+["'']?'
      - '(?i)bearer\s+[\w-]+'
    replacement: "[REDACTED]"
- extract_functions - Extract function definitions
  custom_processor: "extract_functions"
  processor_args:
    languages: ["python", "javascript"]
    include_docstrings: true
Note: Other processors mentioned in examples (like extract_code_cells) are illustrative and would need to be implemented.
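The effect of the two redact_secrets patterns can be checked outside m1f. The sketch below applies them with Python's `re` module; the `redact_secrets` function here is a hypothetical stand-in, and m1f's processor may differ in details such as how much of the match it replaces.

```python
import re

# The two patterns from the redact_secrets example, written as Python raw strings.
PATTERNS = [
    r"""(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*["']?[\w-]+["']?""",
    r"(?i)bearer\s+[\w-]+",
]


def redact_secrets(text: str, patterns=PATTERNS, replacement: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with the replacement string."""
    for pattern in patterns:
        text = re.sub(pattern, replacement, text)
    return text
```

Note that in this sketch the whole match is replaced, key name included: `API_KEY = "abc-123"` becomes `[REDACTED]`, and `Authorization: Bearer abc123` becomes `Authorization: [REDACTED]`.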
Pattern Matching
Pattern Types
- Extension Matching
  extensions: [".py", ".pyx", "py"] # With or without the leading dot; ".py" and "py" are equivalent
- Glob Patterns
  patterns:
    - "*.test.js" # All test files
    - "src/**/*.js" # All JS in src/
- Combined Matching
  # File must match BOTH extension AND pattern
  extensions: [".js"]
  patterns: ["src/**/*"]
Base Path Behavior
group_name:
  base_path: "src"
  presets:
    example:
      patterns: ["components/*.js"] # Actually matches: src/components/*.js
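The matching rules above (dot-insensitive extensions, combined extension and pattern matching, and base_path prefixing) can be sketched in a few lines of Python. This is an illustrative approximation, not m1f's matcher: `normalize_ext` and `preset_matches` are hypothetical names, and `fnmatch` only approximates glob semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def normalize_ext(ext: str) -> str:
    """Treat "py" and ".py" as the same extension."""
    return ext if ext.startswith(".") else "." + ext


def preset_matches(path: str, extensions=(), patterns=(), base_path: str = "") -> bool:
    """Return True if the path satisfies every criterion the preset defines."""
    if extensions:
        allowed = {normalize_ext(e) for e in extensions}
        if PurePosixPath(path).suffix not in allowed:
            return False
    if patterns:
        # base_path is prepended to each pattern, as in the example above.
        full = [str(PurePosixPath(base_path) / p) if base_path else p for p in patterns]
        # Caveat: fnmatch lets "*" cross "/" boundaries, unlike a strict glob.
        return any(fnmatchcase(path, p) for p in full)
    return True
```

With `base_path="src"` and `patterns=["components/*.js"]`, the path `src/components/Button.js` matches, while a `.ts` file under `src/` is rejected by a preset that lists only `.js` in extensions.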
Processing Order
- Group Priority - Higher priority groups are checked first
- Preset Order - Within a group, presets are checked in definition order
- First Match Wins - First matching preset is applied
- Action Order - Actions are applied in the order listed
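The selection rules above can be summarized as a small Python sketch. The `Preset` and `Group` classes and the `select_preset` function are hypothetical and match only on file extension for brevity; they illustrate the ordering, not m1f's internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Preset:
    name: str
    extensions: tuple[str, ...]
    actions: tuple[str, ...]


@dataclass
class Group:
    name: str
    priority: int = 0
    enabled: bool = True
    presets: list[Preset] = field(default_factory=list)


def select_preset(filename: str, groups: list[Group]) -> Preset | None:
    """Return the first matching preset, or None if nothing matches."""
    # Higher-priority groups are checked first; sorted() is stable, so groups
    # with equal priority keep their definition order.
    for group in sorted(groups, key=lambda g: g.priority, reverse=True):
        if not group.enabled:
            continue
        # Within a group, presets are checked in definition order.
        for preset in group.presets:
            if filename.endswith(preset.extensions):
                return preset  # first match wins
    return None
```

If two groups both define a preset for `.js`, the one in the group with the higher priority is returned, and the returned preset's actions would then run in the order listed.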
Setting Precedence
- CLI arguments (highest priority)
- Preset-specific settings
- Global per-extension settings
- Global default settings
- m1f defaults (lowest priority)
Note: CLI arguments ALWAYS override preset values.
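One way to picture this precedence is as a stack of dictionaries in which the first layer that defines a key wins. The sketch below uses `collections.ChainMap`; `resolve_settings` is a hypothetical helper, and it assumes the CLI layer contains only options the user explicitly passed.

```python
from collections import ChainMap


def resolve_settings(cli, preset, per_extension, global_defaults, m1f_defaults):
    """Layer the five sources; lookups return the value from the first layer that has the key."""
    return ChainMap(cli, preset, per_extension, global_defaults, m1f_defaults)


settings = resolve_settings(
    cli={"security_check": "skip"},
    preset={"security_check": "error", "max_file_size": "500KB"},
    per_extension={"max_file_size": "2MB"},
    global_defaults={"encoding": "utf-8"},
    m1f_defaults={"security_check": "warn", "line_ending": "lf"},
)
```

Here `settings["security_check"]` is `"skip"` because the CLI layer wins, `settings["max_file_size"]` is `"500KB"` from the preset, and `settings["line_ending"]` falls through to the default `"lf"`.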
Important Clarifications
Pattern Matching Limitations
Exclude patterns with a ! prefix are not supported in preset patterns. To exclude files:
- Use Global Settings (Recommended):
  globals:
    global_settings:
      exclude_patterns: ["*.min.js", "*.map", "dist/**/*"]
- Use CLI Arguments:
  m1f -s . -o out.txt --exclude-patterns "*.min.js" "*.map"
Settings Hierarchy
Understanding where settings can be applied:
- Global Settings Level (globals.global_settings):
  - include_patterns / exclude_patterns
  - include_extensions / exclude_extensions
  - All general m1f settings
- Preset Level (individual presets):
  - patterns and extensions (for matching)
  - actions (processing actions)
  - Override settings like security_check
- Extension-Specific Global Settings (globals.global_settings.extensions.{ext}):
  - All preset-level settings per extension
Common Misconceptions
- Exclude Patterns in Presets
  ❌ Incorrect:
  presets:
    my_preset:
      exclude_patterns: ["*.min.js"] # Doesn't work here
  ✅ Correct:
  globals:
    global_settings:
      exclude_patterns: ["*.min.js"] # Works here
- Actions vs Settings
  Actions (go in the actions list): minify, strip_tags, strip_comments, etc.
  Settings (separate fields):
  - strip_tags: ["script", "style"] (configuration)
  - max_lines: 100 (configuration)
Advanced Features
Conditional Presets
production:
  enabled_if_exists: ".env.production" # Only active in production
  presets:
    minify_all:
      extensions: [".js", ".css", ".html"]
      actions: ["minify", "strip_comments"]
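The activation check for a conditional group can be sketched as follows. This is a hypothetical helper, and it assumes the enabled_if_exists path is resolved relative to the project root; the document does not say which directory m1f actually uses.

```python
from pathlib import Path


def group_is_active(group: dict, project_root: Path) -> bool:
    """Decide whether a preset group should be considered at all."""
    # An explicit enabled: false always disables the group.
    if not group.get("enabled", True):
        return False
    # Without enabled_if_exists, the group is unconditionally active.
    marker = group.get("enabled_if_exists")
    return marker is None or (project_root / marker).exists()
```

With this logic, the production group above would be skipped on a machine where `.env.production` is absent and picked up where it is present.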
Multiple Preset Files
# Files are merged in order (later files override earlier ones)
m1f -s . -o out.txt \
--preset base.yml \
--preset project.yml \
--preset overrides.yml
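The "later files override earlier ones" rule can be pictured as a dictionary merge over the parsed preset files. The sketch below assumes a recursive (deep) merge, which the document does not confirm; m1f may merge at a coarser level, and `merge_presets` is a hypothetical name.

```python
def merge_presets(*configs: dict) -> dict:
    """Merge parsed preset files in order; values from later files win."""
    merged: dict = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them key by key.
                merged[key] = merge_presets(merged[key], value)
            else:
                # Scalars, lists, and new keys: the later file replaces the earlier one.
                merged[key] = value
    return merged
```

Under this assumption, an overrides file that sets only a group's priority would change that one value while leaving the group's presets from the base file in place.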
Preset Locations
- Project presets: ./presets/*.m1f-presets.yml
- Local preset: ./.m1f-presets.yml
- User presets: ~/m1f/*.m1f-presets.yml
- Specified presets: Via the --preset flag
Complete Parameter Control (v3.2.0+)
Starting with v3.2.0, ALL m1f parameters can be controlled via presets:
# production.m1f-presets.yml
production:
  description: "Production build configuration"
  global_settings:
    # Define all inputs/outputs
    source_directory: "./src"
    output_file: "dist/bundle.txt"
    input_include_files: ["README.md", "LICENSE"]
    # Enable production features
    add_timestamp: true
    create_archive: true
    archive_type: "tar.gz"
    force: true
    # Production optimizations
    minimal_output: true
    quiet: true
    # File processing
    separator_style: "MachineReadable"
    encoding: "utf-8"
    security_check: "error"
Usage comparison:
Before v3.2.0 (long command):
m1f -s ./src -o dist/bundle.txt \
--input-include-files README.md LICENSE \
--add-timestamp --create-archive --archive-type tar.gz \
--force --minimal-output --quiet \
--separator-style MachineReadable \
--security-check error
After v3.2.0 (simple command):
m1f --preset production.m1f-presets.yml -o output.txt
Examples
Web Development Preset
web_development:
  description: "Modern web development bundle"
  presets:
    # Minify production assets
    production_assets:
      patterns: ["dist/**/*", "build/**/*"]
      extensions: [".js", ".css"]
      actions: ["minify", "strip_comments"]
    # Source code - keep readable
    source_code:
      patterns: ["src/**/*"]
      extensions: [".js", ".jsx", ".ts", ".tsx"]
      actions: ["strip_comments"]
      security_check: "error"
    # Documentation
    docs:
      extensions: [".md", ".mdx"]
      actions: ["compress_whitespace", "remove_empty_lines"]
    # Configuration files
    config:
      patterns: ["*.json", "*.yml", "*.yaml"]
      security_check: "error"
      custom_processor: "redact_secrets"
Data Science Preset
data_science:
  presets:
    # Large data files - truncate
    data_files:
      extensions: [".csv", ".json", ".parquet"]
      max_file_size: "100KB"
      custom_processor: "truncate"
      processor_args:
        max_lines: 1000
    # Scripts - full content
    scripts:
      extensions: [".py", ".r", ".jl"]
      actions: ["strip_comments"]
Multiple Environment Presets
# environments.m1f-presets.yml
development:
  priority: 10
  global_settings:
    source_directory: "./src"
    output_file: "dev-bundle.txt"
    verbose: true
    include_dot_paths: true
    security_check: "warn"
staging:
  priority: 20
  global_settings:
    source_directory: "./src"
    output_file: "stage-bundle.txt"
    create_archive: true
    security_check: "error"
production:
  priority: 30
  global_settings:
    source_directory: "./dist"
    output_file: "prod-bundle.txt"
    minimal_output: true
    quiet: true
    create_archive: true
    archive_type: "tar.gz"
Select one group from this file with --preset-group:
# Development build
m1f --preset environments.m1f-presets.yml --preset-group development
# Production build
m1f --preset environments.m1f-presets.yml --preset-group production
Debugging and Best Practices
Debugging Tips
- Verbose Mode
  m1f -s . -o out.txt --preset my.yml --verbose
  Shows which preset is applied to each file and processing details.
- Check What's Applied
  m1f -s . -o out.txt --preset my.yml --verbose 2>&1 | grep "Applying preset"
- Validate YAML
  python -c "import yaml; yaml.safe_load(open('my-preset.yml'))"
- Test Small First
  Create a test directory with a few files to verify preset behavior before running on large codebases.
Best Practices
- Start Simple - Begin with basic actions, add complexity as needed
- Test Thoroughly - Use verbose mode to verify behavior
- Layer Presets - Use multiple files for base + overrides
- Document Presets - Add descriptions to groups and complex presets
- Version Control - Keep presets in your repository
- Performance First - Apply expensive actions only where needed
- Use Priority Wisely - Higher priority groups are checked first
Common Issues
- Preset not applied
  - Check pattern matching
  - Verify preset group is enabled
  - Use verbose mode to debug
- Wrong action order
  - Actions are applied sequentially
  - Order matters (e.g., minify before strip_comments)
- Performance issues
  - Limit expensive actions to necessary files
  - Use max_file_size to skip large files
  - Consider minimal_output mode
Version Information
This documentation is accurate as of m1f version 3.2.0.