Filtering Files in m1f Bundles
Learn how to include and exclude files when creating m1f bundles using patterns, extensions, and configuration files
Control which files are included in your m1f bundles with powerful filtering options. This guide covers everything from basic exclusions to advanced pattern matching.
Understanding Default Excludes
m1f automatically excludes many common directories and files to keep your bundles lean. You don’t need to explicitly exclude these:
Automatically Excluded Directories
# These are excluded by default - no need to add them!
- vendor/ # Composer dependencies (PHP)
- node_modules/ # NPM dependencies (JavaScript)
- build/ # Common build output directory
- dist/ # Distribution/compiled files
- cache/ # Cache directories
- .git/ # Git repository data
- .svn/ # Subversion data
- .hg/ # Mercurial data
- __pycache__/ # Python bytecode cache
- .pytest_cache/ # Pytest cache
- .mypy_cache/ # MyPy type checker cache
- .tox/ # Tox testing cache
- .coverage/ # Coverage.py data
- .eggs/ # Python eggs
- htmlcov/ # HTML coverage reports
- .idea/ # IntelliJ IDEA settings
- .vscode/ # Visual Studio Code settings
Automatically Excluded Files
- LICENSE # License files
- package-lock.json # NPM lock file
- composer.lock # Composer lock file
- poetry.lock # Poetry lock file
- Pipfile.lock # Pipenv lock file
- yarn.lock # Yarn lock file
To include these normally excluded items, use the --no-default-excludes flag.
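As a rough illustration of how a default-exclude check of this kind could behave, the Python sketch below tests a path against a subset of the names listed above. The function name, the constants, and the assumption that directory names match at any depth are illustrative only; this is not m1f's actual implementation.

```python
from pathlib import PurePosixPath

# Subset of the names listed above, for illustration (not m1f's internal list).
DEFAULT_EXCLUDED_DIRS = {"vendor", "node_modules", "build", "dist", ".git", "__pycache__"}
DEFAULT_EXCLUDED_FILES = {"LICENSE", "package-lock.json", "composer.lock",
                          "poetry.lock", "Pipfile.lock", "yarn.lock"}


def is_default_excluded(path, no_default_excludes=False):
    """Return True if `path` would be dropped by the default excludes."""
    if no_default_excludes:
        # Mirrors the effect of --no-default-excludes: nothing is dropped here.
        return False
    p = PurePosixPath(path)
    if p.name in DEFAULT_EXCLUDED_FILES:
        return True
    # Assumption: an excluded directory name matches at any depth in the path.
    return any(part in DEFAULT_EXCLUDED_DIRS for part in p.parts[:-1])
```

Under these assumptions, node_modules/react/index.js and yarn.lock are dropped while src/build.py is kept, because only directory components are compared against the directory list.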
Basic Filtering Options
Filter by File Extensions
Include only specific file types:
# Include only Python and JavaScript files
m1f -s . -o output.txt --include-extensions .py .js
# Exclude certain file types
m1f -s . -o output.txt --exclude-extensions .log .tmp .bak
# Include only documentation files (62 extensions)
m1f -s . -o output.txt --docs-only
Filter by File Size
Skip large files to keep bundles manageable:
# Exclude files larger than 1MB
m1f -s . -o output.txt --max-file-size 1MB
# Supports various units: B, KB, MB, GB, TB
m1f -s . -o output.txt --max-file-size 500KB
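To make the size units concrete, here is a minimal Python sketch that converts strings such as 500KB into a byte count. The parse_size name is hypothetical, and the use of binary multiples (1KB = 1024 bytes) is an assumption; check the CLI reference for the convention m1f actually applies.

```python
import re

# Assumption: binary multiples. m1f may define these units differently.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a size string like '500KB' or '1.5MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])
```

With this convention, parse_size("1MB") gives 1048576 and parse_size("500KB") gives 512000.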
Pattern-Based Filtering
Using Include Patterns
The --includes parameter supports gitignore-style patterns for precise control:
# Include all Python files in src directory
m1f -s . -o output.txt --includes "src/**/*.py"
# Multiple patterns
m1f -s . -o output.txt --includes "*.py" "src/**" "!test_*.py"
# Combine with extensions for extra precision
m1f -s . -o output.txt --include-extensions .py .js \
  --includes "src/**" "lib/**" "!**/tests/**"
Pattern syntax:
- *.py - All Python files
- src/** - Everything in src directory
- !test_*.py - Exclude test files (negation)
- src/**/*.js - All JS files under src
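The pattern rules above can be approximated in a few lines of Python. The sketch below is a simplified model of gitignore-style matching, not m1f's matcher: glob_to_regex and is_included are hypothetical names, and the "last matching pattern wins" handling of ! is borrowed from gitignore as an assumption.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified gitignore-style glob into a compiled regex."""
    # Assumption: a pattern without a slash matches the file name at any depth.
    if "/" not in pattern:
        pattern = "**/" + pattern
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")       # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_included(path, patterns):
    """Apply patterns in order; the last matching pattern decides."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if glob_to_regex(body).fullmatch(path):
            included = not negated
    return included
```

With the patterns "*.py", "src/**" and "!test_*.py", this model keeps main.py and src/util/helpers.js but drops src/test_api.py, because the negated pattern is the last one to match it.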
Using Exclude Patterns
Add custom exclusions on top of defaults:
# Exclude specific patterns
m1f -s . -o output.txt --excludes "**/tmp/**" "**/*.log" "**/secrets/**"
Using Configuration Files
Leverage Your .gitignore
Instead of manually listing excludes, use your existing .gitignore:
# .m1f.config.yml
global:
  global_settings:
    exclude_paths_file: ".gitignore"
Or use multiple exclude files:
global:
  global_settings:
    exclude_paths_file:
      - ".gitignore" # Version control ignores
      - ".m1fignore" # m1f-specific ignores
Create Include/Exclude Files
Create pattern files for complex filtering:
# important-files.txt
*.py
src/**/*
api/**/*
!test_*.py
# Use the include file
m1f -s . -o output.txt --include-paths-file important-files.txt
# Combine multiple files
m1f -s . -o output.txt \
  --include-paths-file core-files.txt api-files.txt \
  --exclude-paths-file test-files.txt
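A pattern file like important-files.txt is just one pattern per line. As a minimal sketch, assuming blank lines and lines starting with # are ignored (the gitignore convention, not confirmed here for m1f), such a file could be read like this. The read_patterns name is hypothetical.

```python
def read_patterns(text):
    """Return the patterns from a pattern file's text, one per non-comment line."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no pattern.
        if not line or line.startswith("#"):
            continue
        patterns.append(line)
    return patterns


example = """\
# important-files.txt
*.py
src/**/*
api/**/*

!test_*.py
"""
patterns = read_patterns(example)
```

Here patterns ends up holding the four pattern lines in their original order, which matters if negated patterns are applied in sequence.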
Configuration Examples
Minimal Configuration
Keep your .m1f.config.yml clean by only adding project-specific exclusions:
# ✅ GOOD - Only project-specific patterns
global:
  global_excludes:
    - "**/*.pyc" # Python bytecode
    - "**/logs/**" # Your log files
    - "**/tmp/**" # Temporary directories
    - "/m1f/**" # Output directory
    - "**/secrets/**" # Sensitive data
bundles:
  - name: main
    sources:
      - "./src"
    output_file: "m1f/main.txt"
Python Project Example
global:
  global_excludes:
    # Virtual environments (not in defaults)
    - "**/venv/**"
    - "**/.venv/**"
    - "**/env/**"
    # Python-specific
    - "**/*.pyc"
    - "**/*.pyo"
    - "**/*.pyd"
    # Project-specific
    - "**/migrations/**"
    - "**/.coverage"
    - "**/htmlcov/**"
bundles:
  - name: python-app
    sources:
      - "./app"
      - "./tests"
    output_file: "m1f/python-app.txt"
    includes:
      - "*.py"
      - "requirements.txt"
      - "pyproject.toml"
Node.js Project Example
global:
  global_excludes:
    # Framework-specific (not in defaults)
    - "**/.next/**"
    - "**/.nuxt/**"
    - "**/coverage/**"
    - "**/*.log"
bundles:
  - name: frontend
    sources:
      - "./src"
    output_file: "m1f/frontend.txt"
    includes:
      - "*.js"
      - "*.jsx"
      - "*.ts"
      - "*.tsx"
      - "*.json"
      - "!*.test.js"
      - "!*.spec.js"
Documentation Bundle Example
bundles:
  - name: docs
    sources:
      - "./docs"
      - "./README.md"
      - "./CONTRIBUTING.md"
    output_file: "m1f/docs.txt"
    include_extensions:
      - .md
      - .mdx
      - .rst
      - .txt
    excludes:
      - "**/drafts/**"
      - "**/_archive/**"
Processing Order
Understanding how m1f processes filters helps you debug issues:
- Input files (-i) are always included, bypassing all filters
- Source directory (-s) files are processed through:
  - Include patterns/extensions first (if specified)
  - Then exclude patterns/extensions
  - Default excludes (unless disabled)
  - File size limits
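The order described above can be pictured as a short chain of checks. The following Python sketch is an illustrative model only: Filters, select, and their fields are hypothetical names, pattern matching is left out in favor of extensions for brevity, and the default-excluded directory set is a small sample.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class Filters:
    include_extensions: set = field(default_factory=set)
    exclude_extensions: set = field(default_factory=set)
    use_default_excludes: bool = True
    default_excluded_dirs: set = field(default_factory=lambda: {"node_modules", ".git"})
    max_file_size: Optional[int] = None  # bytes; None means no limit


def select(path, size, filters, explicit_inputs=frozenset()):
    """Return True if the file at `path` (with `size` bytes) ends up in the bundle."""
    # Explicitly listed input files bypass every filter.
    if path in explicit_inputs:
        return True
    p = PurePosixPath(path)
    # 1. Includes first: if any are given, the file must match one.
    if filters.include_extensions and p.suffix not in filters.include_extensions:
        return False
    # 2. Then excludes.
    if p.suffix in filters.exclude_extensions:
        return False
    # 3. Default excludes, unless disabled.
    if filters.use_default_excludes and any(
        part in filters.default_excluded_dirs for part in p.parts[:-1]
    ):
        return False
    # 4. File size limit last.
    if filters.max_file_size is not None and size > filters.max_file_size:
        return False
    return True
```

A useful debugging consequence of this model: a file that matches an include pattern can still be dropped by any later stage, so a missing file is worth checking against excludes, default excludes, and the size limit in that order.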
Advanced Filtering
Hidden Files and Directories
By default, hidden files (starting with .) are excluded:
# Include hidden files
m1f -s . -o output.txt --include-dot-paths
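For reference, a "dot path" check can be sketched as follows in Python. The is_dot_path name is hypothetical, and treating a file as hidden when any directory in its path starts with a dot is an assumption based on the section title rather than on m1f's source.

```python
from pathlib import PurePosixPath


def is_dot_path(path):
    """Return True if the file or any parent directory name starts with a dot."""
    return any(
        part.startswith(".") and part not in (".", "..")
        for part in PurePosixPath(path).parts
    )
```

Under this reading, both .env and src/.config/app.yml count as hidden, while ./src/app.py does not, since a leading ./ is not a hidden directory.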
Binary Files
Binary files are excluded by default:
# Include binary files (use with caution)
m1f -s . -o output.txt --include-binary-files
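This guide does not say how m1f decides that a file is binary. One common heuristic, shown here purely as an assumed example and not as m1f's method, is to look for a NUL byte in the first few kilobytes of the file. The looks_binary name and the 8192-byte sample size are illustrative choices.

```python
def looks_binary(data, sample_size=8192):
    """Heuristic: treat content as binary if its first bytes contain a NUL byte."""
    return b"\x00" in data[:sample_size]


def file_looks_binary(path, sample_size=8192):
    """Apply the same heuristic to a file on disk, reading only the sample."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(sample_size), sample_size)
```

Heuristics like this can misclassify some files, such as UTF-16 text, which is one reason to review the bundle contents when enabling --include-binary-files.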
Symbolic Links
Symlinks are not followed by default:
# Follow symbolic links (beware of loops!)
m1f -s . -o output.txt --include-symlinks
Debugging Filters
Check What’s Being Excluded
Use verbose mode to see filtering in action:
m1f -s . -o test.txt --verbose
This shows:
- Default excluded directories
- Patterns from your config
- Files matched by exclude patterns
Test Include Patterns
Test your patterns before adding to config:
# See what files match your pattern
m1f -s . -o test.txt --includes "src/**/*.py" --verbose
Best Practices
- Start Simple: Begin with default excludes and add only as needed
- Use .gitignore: Leverage existing ignore patterns when possible
- Document Patterns: Add comments explaining non-obvious excludes
- Test First: Use --verbose to verify your filters work correctly
- Be Specific: Use precise patterns to avoid accidentally excluding important files
Example with Comments
global:
  global_excludes:
    # Build artifacts specific to our toolchain
    - "**/generated/**" # Auto-generated code from protobuf
    - "**/reports/**" # Test coverage and lint reports
    # Large data files
    - "**/*.sqlite" # Local development databases
    - "**/*.csv" # Data exports over 10MB
    # Sensitive information
    - "**/.env*" # Environment files with API keys
    - "**/secrets/**" # SSL certificates and private keys
Common Filtering Scenarios
Create a Code-Only Bundle
bundles:
  - name: source-code
    sources:
      - "."
    output_file: "m1f/source-code.txt"
    include_extensions:
      - .py
      - .js
      - .java
      - .cpp
      - .h
    excludes:
      - "**/test/**"
      - "**/tests/**"
      - "**/*_test.*"
Create a Documentation Bundle
bundles:
  - name: all-docs
    sources:
      - "."
    output_file: "m1f/all-docs.txt"
    includes:
      - "**/*.md"
      - "**/*.rst"
      - "**/*.txt"
      - "docs/**"
      - "!**/node_modules/**" # Exclude even from docs/
Create a Configuration Bundle
bundles:
  - name: configs
    sources:
      - "."
    output_file: "m1f/configs.txt"
    includes:
      - "*.yml"
      - "*.yaml"
      - "*.json"
      - "*.toml"
      - ".env.example"
      - "!**/node_modules/**"
Related Documentation
- Configuration Guide - Full .m1f.config.yml reference
- CLI Reference - Complete command-line options
- Auto Bundle Guide - Automated bundle creation
- Best Practices - General m1f usage tips