Filtering Files in m1f Bundles

Learn how to include and exclude files when creating m1f bundles using patterns, extensions, and configuration files.

Control which files are included in your m1f bundles with powerful filtering options. This guide covers everything from basic exclusions to advanced pattern matching.

Understanding Default Excludes

m1f automatically excludes many common directories and files to keep your bundles lean. You don’t need to explicitly exclude these:

Automatically Excluded Directories

# These are excluded by default - no need to add them!
- vendor/          # Composer dependencies (PHP)
- node_modules/    # NPM dependencies (JavaScript)
- build/           # Common build output directory
- dist/            # Distribution/compiled files
- cache/           # Cache directories
- .git/            # Git repository data
- .svn/            # Subversion data
- .hg/             # Mercurial data
- __pycache__/     # Python bytecode cache
- .pytest_cache/   # Pytest cache
- .mypy_cache/     # MyPy type checker cache
- .tox/            # Tox testing cache
- .coverage/       # Coverage.py data
- .eggs/           # Python eggs
- htmlcov/         # HTML coverage reports
- .idea/           # IntelliJ IDEA settings
- .vscode/         # Visual Studio Code settings

Automatically Excluded Files

- LICENSE           # License files
- package-lock.json # NPM lock file
- composer.lock     # Composer lock file
- poetry.lock       # Poetry lock file
- Pipfile.lock      # Pipenv lock file
- yarn.lock         # Yarn lock file

To include these normally excluded items, use the --no-default-excludes flag.
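
To make the idea of a directory-level default exclude concrete, here is a minimal Python sketch. It is an illustration only, not m1f's implementation: the function name `hits_default_exclude` is hypothetical, and the rule that a path is excluded when any of its directory components matches a listed name is an assumption.

```python
from pathlib import PurePosixPath

# Directory names copied from the default-exclude list above.
DEFAULT_EXCLUDED_DIRS = {
    "vendor", "node_modules", "build", "dist", "cache",
    ".git", ".svn", ".hg", "__pycache__", ".pytest_cache",
    ".mypy_cache", ".tox", ".coverage", ".eggs", "htmlcov",
    ".idea", ".vscode",
}


def hits_default_exclude(rel_path: str) -> bool:
    """Return True if any directory component is a default-excluded name.

    Hypothetical helper: m1f's real matching rules may differ.
    """
    parts = PurePosixPath(rel_path).parts
    # Only directory components count; the last part is the file name.
    return any(part in DEFAULT_EXCLUDED_DIRS for part in parts[:-1])
```

Under this assumption, a file named src/build.py is kept because only directory names are compared, while anything below node_modules/ is dropped.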

Basic Filtering Options

Filter by File Extensions

Include only specific file types:

# Include only Python and JavaScript files
m1f -s . -o output.txt --include-extensions .py .js

# Exclude certain file types
m1f -s . -o output.txt --exclude-extensions .log .tmp .bak

# Include only documentation files (62 extensions)
m1f -s . -o output.txt --docs-only

Filter by File Size

Skip large files to keep bundles manageable:

# Exclude files larger than 1MB
m1f -s . -o output.txt --max-file-size 1MB

# Supports various units: B, KB, MB, GB, TB
m1f -s . -o output.txt --max-file-size 500KB
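
For readers curious how a size string such as 500KB could map to a byte limit, the following Python sketch parses the units listed above. The helper name `parse_size` is hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption; m1f may use different factors.

```python
import re

# Assumption: binary multiples (1 KB = 1024 bytes). m1f may differ.
UNIT_FACTORS = {
    "B": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}

_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a string such as '500KB' or '1MB' to a number of bytes."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNIT_FACTORS[unit.upper()])


def exceeds_limit(size_in_bytes: int, limit: str) -> bool:
    """A file is skipped only when it is strictly larger than the limit."""
    return size_in_bytes > parse_size(limit)
```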

Pattern-Based Filtering

Using Include Patterns

The --includes parameter supports gitignore-style patterns for precise control:

# Include all Python files in src directory
m1f -s . -o output.txt --includes "src/**/*.py"

# Multiple patterns
m1f -s . -o output.txt --includes "*.py" "src/**" "!test_*.py"

# Combine with extensions for extra precision
m1f -s . -o output.txt --include-extensions .py .js \
  --includes "src/**" "lib/**" "!**/tests/**"

Pattern syntax:

  • *.py - All Python files
  • src/** - Everything under the src directory, at any depth
  • !test_*.py - Exclude files matched by test_*.py (a leading ! negates the pattern)
  • src/**/*.js - All JS files under src, at any depth
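
The pattern rules above can be sketched in Python as follows. This is a simplified model of gitignore-style matching, not m1f's own matcher: the names `glob_to_regex` and `is_included` are hypothetical, only the syntax shown in this guide is handled (`*`, `?`, `**`, leading `!`), and the "last matching pattern wins" rule is an assumption borrowed from .gitignore semantics.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified gitignore-style file pattern into a regex."""
    # As in .gitignore, a pattern without a slash may match at any depth.
    anchored = "/" in pattern.rstrip("/")
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")       # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile(prefix + "".join(out))


def is_included(path: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last pattern that matches decides."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if glob_to_regex(body).fullmatch(path):
            included = not negated
    return included


patterns = ["*.py", "src/**", "!test_*.py"]
kept = [p for p in ["src/app/main.py", "src/test_main.py", "README.md"]
        if is_included(p, patterns)]
```

With these three patterns, `kept` holds only src/app/main.py: the test file is matched by the negated pattern last, and README.md is matched by nothing.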

Using Exclude Patterns

Add custom exclusions on top of defaults:

# Exclude specific patterns
m1f -s . -o output.txt --excludes "**/tmp/**" "**/*.log" "**/secrets/**"

Using Configuration Files

Leverage Your .gitignore

Instead of manually listing excludes, use your existing .gitignore:

# .m1f.config.yml
global:
  global_settings:
    exclude_paths_file: ".gitignore"

Or use multiple exclude files:

global:
  global_settings:
    exclude_paths_file:
      - ".gitignore"      # Version control ignores
      - ".m1fignore"      # m1f-specific ignores

Create Include/Exclude Files

Create pattern files for complex filtering:

# important-files.txt
*.py
src/**/*
api/**/*
!test_*.py

# Use the include file
m1f -s . -o output.txt --include-paths-file important-files.txt

# Combine multiple files
m1f -s . -o output.txt \
  --include-paths-file core-files.txt api-files.txt \
  --exclude-paths-file test-files.txt
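
A pattern file like important-files.txt above is just one pattern per line. The Python sketch below shows one plausible way to read such a file. It assumes, as in .gitignore, that blank lines and lines starting with # are ignored; `read_patterns` is a hypothetical helper and m1f's own parser may treat edge cases differently.

```python
def read_patterns(text: str) -> list[str]:
    """Return the patterns in a pattern file, skipping comments and blanks."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: '#' starts a comment line, as in .gitignore.
        if not line or line.startswith("#"):
            continue
        patterns.append(line)
    return patterns


example = """\
# important-files.txt
*.py
src/**/*
api/**/*

!test_*.py
"""
loaded = read_patterns(example)
```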

Configuration Examples

Minimal Configuration

Keep your .m1f.config.yml clean by only adding project-specific exclusions:

# ✅ GOOD - Only project-specific patterns
global:
  global_excludes:
    - "**/*.pyc"        # Python bytecode
    - "**/logs/**"      # Your log files
    - "**/tmp/**"       # Temporary directories
    - "/m1f/**"         # Output directory
    - "**/secrets/**"   # Sensitive data

bundles:
  - name: main
    sources:
      - "./src"
    output_file: "m1f/main.txt"

Python Project Example

global:
  global_excludes:
    # Virtual environments (not in defaults)
    - "**/venv/**"
    - "**/.venv/**"
    - "**/env/**"
    # Python-specific
    - "**/*.pyc"
    - "**/*.pyo"
    - "**/*.pyd"
    # Project-specific
    - "**/migrations/**"
    - "**/.coverage"      # Coverage.py data file
    - "**/htmlcov/**"     # Also covered by the default excludes

bundles:
  - name: python-app
    sources:
      - "./app"
      - "./tests"
    output_file: "m1f/python-app.txt"
    includes:
      - "*.py"
      - "requirements.txt"
      - "pyproject.toml"

Node.js Project Example

global:
  global_excludes:
    # Framework-specific (not in defaults)
    - "**/.next/**"
    - "**/.nuxt/**"
    - "**/coverage/**"
    - "**/*.log"

bundles:
  - name: frontend
    sources:
      - "./src"
    output_file: "m1f/frontend.txt"
    includes:
      - "*.js"
      - "*.jsx"
      - "*.ts"
      - "*.tsx"
      - "*.json"
      - "!*.test.js"
      - "!*.spec.js"

Documentation Bundle Example

bundles:
  - name: docs
    sources:
      - "./docs"
      - "./README.md"
      - "./CONTRIBUTING.md"
    output_file: "m1f/docs.txt"
    include_extensions:
      - .md
      - .mdx
      - .rst
      - .txt
    excludes:
      - "**/drafts/**"
      - "**/_archive/**"

Processing Order

Understanding how m1f processes filters helps you debug issues:

  1. Input files (-i) are always included, bypassing all filters
  2. Source directory (-s) files pass through the following filters, in this order:
    • Include patterns/extensions first (if specified)
    • Then exclude patterns/extensions
    • Default excludes (unless disabled)
    • File size limits
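
The order above can be modeled as a short chain of checks. The Python sketch below mirrors the listed steps and nothing more: `FilterConfig` and `keep_file` are hypothetical names, only a small subset of the default-excluded directories is included, and `fnmatchcase` stands in for m1f's gitignore-style matcher (note that its `*` also matches slashes).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class FilterConfig:
    """Hypothetical container for the filter settings discussed above."""
    include_extensions: set = field(default_factory=set)
    exclude_patterns: list = field(default_factory=list)
    # Small subset of the default-excluded directories, for illustration.
    default_excluded_dirs: set = field(
        default_factory=lambda: {"node_modules", ".git", "__pycache__"}
    )
    use_default_excludes: bool = True
    max_file_size: Optional[int] = None  # bytes


def keep_file(path: str, size: int, cfg: FilterConfig,
              explicit_input: bool = False) -> bool:
    """Decide whether a file ends up in the bundle."""
    # 1. Files passed explicitly (-i) bypass every filter.
    if explicit_input:
        return True
    p = PurePosixPath(path)
    # 2a. Include filters first (only if any were specified).
    if cfg.include_extensions and p.suffix not in cfg.include_extensions:
        return False
    # 2b. Then user-supplied exclude patterns.
    if any(fnmatchcase(path, pat) for pat in cfg.exclude_patterns):
        return False
    # 2c. Then default excludes, unless disabled.
    if cfg.use_default_excludes and any(
        part in cfg.default_excluded_dirs for part in p.parts[:-1]
    ):
        return False
    # 2d. Finally the size limit.
    if cfg.max_file_size is not None and size > cfg.max_file_size:
        return False
    return True
```

Walking a problem file through these checks by hand is often the quickest way to see which filter removed it.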

Advanced Filtering

Hidden Files and Directories

By default, hidden files and directories (names starting with .) are excluded:

# Include hidden files
m1f -s . -o output.txt --include-dot-paths

Binary Files

Binary files are excluded by default:

# Include binary files (use with caution)
m1f -s . -o output.txt --include-binary-files

Symbolic Links

Symlinks are not followed by default:

# Follow symbolic links (beware of loops!)
m1f -s . -o output.txt --include-symlinks
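
To illustrate why loops matter when links are followed, the Python sketch below walks a directory tree, follows directory symlinks, and refuses to enter the same real directory twice. It shows the general technique only; the function name `walk_following_symlinks` is hypothetical and nothing here describes how m1f itself guards against loops.

```python
import os


def walk_following_symlinks(root: str):
    """Yield file paths under root, following directory symlinks.

    Each real directory is visited at most once, so a link that points
    back to an ancestor cannot cause endless recursion.
    """
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # already visited: a loop or a second link to it
        seen.add(real)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path
```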

Debugging Filters

Check What’s Being Excluded

Use verbose mode to see filtering in action:

m1f -s . -o test.txt --verbose

This shows:

  • Default excluded directories
  • Patterns from your config
  • Files matched by exclude patterns

Test Include Patterns

Test your patterns before adding to config:

# See what files match your pattern
m1f -s . -o test.txt --includes "src/**/*.py" --verbose

Best Practices

  1. Start Simple: Begin with default excludes and add only as needed
  2. Use .gitignore: Leverage existing ignore patterns when possible
  3. Document Patterns: Add comments explaining non-obvious excludes
  4. Test First: Use --verbose to verify your filters work correctly
  5. Be Specific: Use precise patterns to avoid accidentally excluding important files

Example with Comments

global:
  global_excludes:
    # Build artifacts specific to our toolchain
    - "**/generated/**"    # Auto-generated code from protobuf
    - "**/reports/**"      # Test coverage and lint reports
    
    # Large data files
    - "**/*.sqlite"        # Local development databases
    - "**/*.csv"           # Data exports, often over 10MB (matches every CSV, regardless of size)
    
    # Sensitive information
    - "**/.env*"           # Environment files with API keys
    - "**/secrets/**"      # SSL certificates and private keys

Common Filtering Scenarios

Create a Code-Only Bundle

bundles:
  - name: source-code
    sources:
      - "."
    output_file: "m1f/source-code.txt"
    include_extensions:
      - .py
      - .js
      - .java
      - .cpp
      - .h
    excludes:
      - "**/test/**"
      - "**/tests/**"
      - "**/*_test.*"

Create a Documentation Bundle

bundles:
  - name: all-docs
    sources:
      - "."
    output_file: "m1f/all-docs.txt"
    includes:
      - "**/*.md"
      - "**/*.rst"
      - "**/*.txt"
      - "docs/**"
      - "!**/node_modules/**"  # Exclude even from docs/

Create a Configuration Bundle

bundles:
  - name: configs
    sources:
      - "."
    output_file: "m1f/configs.txt"
    includes:
      - "*.yml"
      - "*.yaml"
      - "*.json"
      - "*.toml"
      - ".env.example"
      - "!**/node_modules/**"