Security

This guide documents the security best practices and protective measures implemented in the m1f toolkit. Following these practices helps keep operation safe and guards against common security vulnerabilities.

Overview

m1f implements multiple layers of security protection:

Path traversal protection - Prevents access to files outside intended directories
Secret detection - Automatically scans for sensitive data in files
SSRF protection - Blocks requests to internal network resources
Input validation - Validates all user inputs and configuration files
Safe command execution - Prevents command injection attacks

Path Validation and Traversal Protection

Why It Matters

Path traversal attacks can allow malicious actors to access files outside intended directories, potentially exposing sensitive system files or overwriting critical data.

Best Practices

1. Always Validate Resolved Paths

# Good practice - validate after resolving
from pathlib import Path

from tools.m1f.utils import validate_safe_path

# user_input and base_path come from the caller
target_path = Path(user_input).resolve()
validate_safe_path(target_path, base_path)

2. Use Provided Validation Utilities

validate_safe_path() in tools/m1f/utils.py ensures paths stay within allowed boundaries
All user-provided paths should be validated before use

3. Symlink Safety

Symlinks are resolved and validated to prevent escaping directories
Target of symlinks must be within the allowed base path
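
The checks described in this section can be sketched in a few lines. This is a minimal illustration, not the toolkit's actual `validate_safe_path()`; the helper name `is_within_base` is hypothetical. Because `Path.resolve()` follows symlinks, a link that points outside the base directory is rejected along with plain `..` traversal.

```python
from pathlib import Path


def is_within_base(candidate: str, base_dir: str) -> bool:
    """Return True if candidate, once resolved, stays inside base_dir.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    # Joining onto the base interprets relative input relative to it;
    # resolve() then collapses ".." segments and follows symlinks.
    target = (base / candidate).resolve()
    return target == base or base in target.parents
```

With this sketch, `is_within_base("../../etc/passwd", "/srv/project")` should return `False`, because the resolved target no longer has the base directory among its parents.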

Common Pitfalls to Avoid

**Security Risk**: Common mistakes that can lead to vulnerabilities:

Never use user input directly in file paths without validation
Don’t trust relative paths without resolving and validating them
Never use paths from configuration files and presets without validating them

Security Scanning for Sensitive Data

Built-in Secret Detection

m1f includes automatic scanning for:

API keys and tokens
Passwords and credentials
Private keys
High-entropy strings that might be secrets
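
To illustrate the last item: high-entropy detection is commonly based on Shannon entropy, since random tokens spread evenly over many distinct characters while ordinary words do not. The sketch below is illustrative only. The function names, the 4.0 bits-per-character threshold, and the minimum length of 20 are assumptions, not m1f's actual detection logic.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, threshold: float = 4.0, min_length: int = 20) -> bool:
    """Flag long tokens whose entropy exceeds the (assumed) threshold."""
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

A heuristic like this produces false positives (hashes, minified code) and false negatives (short or patterned secrets), which is why it is usually combined with pattern-based rules.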

Security Check Modes

1. Abort Mode (Recommended)

Stops processing if secrets are found:

m1f -s ./src -o output.txt --security-check abort

2. Skip Mode

Excludes files with secrets but continues processing:

m1f -s ./src -o output.txt --security-check skip

3. Warn Mode

Logs warnings but continues processing:

m1f -s ./src -o output.txt --security-check warn

4. Disabled Mode

**Not Recommended**: With security scanning disabled, secrets in your source files can end up in the output bundle unnoticed.

m1f -s ./src -o output.txt --security-check null

Handling False Positives

If legitimate content is flagged as sensitive:

Review the warnings carefully
Use --security-check warn if you’re certain the content is safe
Consider refactoring code to avoid patterns that trigger detection

Web Scraping Security

SSRF (Server-Side Request Forgery) Protection

**Built-in Protection**: The toolkit automatically blocks requests to internal network resources, including:

Private IP ranges (10.x.x.x, 172.16.x.x, 192.168.x.x)
Localhost and loopback addresses (127.0.0.1, ::1)
Link-local addresses (169.254.x.x)
Cloud metadata endpoints (169.254.169.254)
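
A check along these lines can be written with Python's standard `ipaddress` module. This is a hedged sketch rather than the toolkit's implementation; the function name `is_blocked_address` is hypothetical, and it only handles IP literals.

```python
import ipaddress


def is_blocked_address(ip_text: str) -> bool:
    """Return True for addresses an SSRF filter would typically refuse.

    Hypothetical helper; expects an IPv4 or IPv6 literal, not a hostname.
    """
    addr = ipaddress.ip_address(ip_text)
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

Two details matter in practice. The second private block is 172.16.0.0/12, which spans 172.16.x.x through 172.31.x.x. A hostname must also be resolved first, and every resulting address checked, because a public-looking name can point at an internal address.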

SSL/TLS Validation

Default Behavior

SSL certificates are validated by default.
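
For comparison, this is what "validated by default" means in Python's standard library. The snippet does not show m1f-scrape internals, which may use a different HTTP client; it only illustrates the two checks that certificate validation normally includes.

```python
import ssl

context = ssl.create_default_context()

# The default context requires a valid certificate chain and checks
# that the certificate matches the requested hostname.
verifies_certificates = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname
```

Disabling validation typically turns off both checks, so neither the identity of the server nor the integrity of the connection is guaranteed.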

Disabling Validation (Use with Caution)

# Only for trusted internal sites or testing
m1f-scrape --ignore-https-errors https://internal-site.com

**Warning**: Disabling SSL validation exposes you to man-in-the-middle attacks. Only use for trusted internal resources.

robots.txt Compliance

All scrapers automatically respect robots.txt files:

robots.txt is fetched and parsed automatically for each domain
Scraping is blocked for disallowed paths
User-agent-specific rules are respected
This behavior is always enabled; there is no configuration option to disable it
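
The rule matching itself can be illustrated with Python's standard `urllib.robotparser`. This is a sketch of how robots.txt rules are evaluated in general, not the scrapers' own code. The rules, the URLs, and the user-agent names `my-scraper` and `examplebot` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example rules, parsed from a string so the sketch needs no network access.
rules = """\
User-agent: *
Disallow: /private/

User-agent: examplebot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The generic "*" rules apply to agents without a dedicated section.
allowed = parser.can_fetch("my-scraper", "https://example.com/docs/index.html")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data.html")
# An agent with its own section follows that section instead.
bot_blocked = parser.can_fetch("examplebot", "https://example.com/docs/index.html")
```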

JavaScript Execution Safety

**Caution**: Only execute JavaScript from trusted sources.

When using Playwright with custom scripts:

Scripts are validated for dangerous patterns
Avoid executing untrusted JavaScript code
Use built-in actions instead of custom scripts when possible

Command Injection Prevention

Safe Command Execution

The toolkit uses proper escaping for all system commands:

# Good - using shlex.quote()
import shlex
command = f"httrack {shlex.quote(url)} -O {shlex.quote(output_dir)}"

# Bad - direct string interpolation
command = f"httrack {url} -O {output_dir}"  # DON'T DO THIS

**Never use direct string interpolation** for system commands - it can lead to command injection vulnerabilities.
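
The difference becomes visible with a hostile input. The sketch below builds both command lines for a made-up "URL" that tries to smuggle in a second command, then uses `shlex.split()` to show how each string breaks into arguments. The input values are hypothetical, and nothing is executed.

```python
import shlex

# A hostile "URL" that tries to append a second shell command.
url = "https://example.com; rm -rf ~"
output_dir = "./mirror"

unsafe = f"httrack {url} -O {output_dir}"
safe = f"httrack {shlex.quote(url)} -O {shlex.quote(output_dir)}"

# With quoting, the whole hostile string stays a single argument.
safe_args = shlex.split(safe)
# Without quoting, it falls apart into separate words, and a real shell
# would additionally treat ";" as a command separator.
unsafe_args = shlex.split(unsafe)
```

Where the command does not need shell features, passing an argument list to `subprocess.run()` without `shell=True` avoids shell parsing altogether.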

Preset System Security

File Size Limits

Preset files are limited to 10MB to prevent memory exhaustion
Large preset files are rejected with an error
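
A size guard of this kind is easy to sketch: check the file size before reading anything into memory. The function name `read_preset_text` is hypothetical, and interpreting 10MB as 10 * 1024 * 1024 bytes is an assumption; the toolkit's own loader may differ.

```python
from pathlib import Path

# Assumed interpretation of the 10MB limit described above.
MAX_PRESET_BYTES = 10 * 1024 * 1024


def read_preset_text(path: str, max_bytes: int = MAX_PRESET_BYTES) -> str:
    """Read a preset file, refusing files larger than max_bytes."""
    preset = Path(path)
    size = preset.stat().st_size
    if size > max_bytes:
        raise ValueError(f"Preset file {path} is {size} bytes, limit is {max_bytes}")
    return preset.read_text(encoding="utf-8")
```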

Path Validation in Presets

All paths in preset files are validated
Paths cannot escape the project directory
Absolute paths outside the project are blocked

Custom Processor Validation

Processor names must be alphanumeric with underscores only
Special characters that could enable code injection are blocked
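
The naming rule maps directly onto a regular expression. The following is a minimal sketch of such a check; the function name `is_valid_processor_name` is hypothetical and not taken from the toolkit.

```python
import re

# Letters, digits, and underscores only, as described above.
PROCESSOR_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_processor_name(name: str) -> bool:
    """Return True if name consists solely of allowed characters."""
    # fullmatch() requires the entire string to match, so trailing
    # newlines or embedded separators are rejected.
    return PROCESSOR_NAME.fullmatch(name) is not None
```

Names such as `strip_comments` pass, while `os.system`, `a;b`, or an empty string are rejected.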

Secure Temporary File Handling

The toolkit uses Python’s tempfile module for all temporary files:

Temporary directories are created with restricted permissions
All temporary files are cleaned up after use
No sensitive data is left in temporary locations
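
The standard-library pattern behind these guarantees looks roughly like this. It is a generic sketch of `tempfile.TemporaryDirectory`, not code from the toolkit; the prefix and file name are made up.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory(prefix="m1f-example-") as tmp_dir:
    # On POSIX systems the directory is created so that only the
    # current user can read, write, or enter it.
    mode = stat.S_IMODE(os.stat(tmp_dir).st_mode)
    work_file = os.path.join(tmp_dir, "scratch.txt")
    with open(work_file, "w", encoding="utf-8") as handle:
        handle.write("intermediate data")
    existed_inside = os.path.exists(work_file)

# Leaving the with-block removes the directory and everything in it.
removed_after = not os.path.exists(tmp_dir)
```

Using the context manager ties cleanup to the block, so the files are removed even if processing raises an exception.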

Input Validation Best Practices

File Type Validation

Use include/exclude patterns to limit processed file types
Be explicit about allowed file extensions
Validate file contents match expected formats

Size and Resource Limits

Set appropriate limits for file sizes
Use --max-file-size to prevent processing huge files
Monitor memory usage for large file sets

Encoding Safety

The toolkit automatically detects file encodings
UTF-8 is preferred for text files by default
Binary files are handled safely without interpretation

Configuration Security

Secure Configuration Files

# Example secure configuration
bundles:
  secure-bundle:
    description: "Security-focused bundle"
    output: "secure/bundle.txt"
    
    # Enable strict security checking
    security_check: "abort"
    
    # Limit file sizes
    max_file_size: "1MB"
    
    # Exclude sensitive patterns
    exclude_patterns:
      - "**/*.key"
      - "**/*.pem"
      - "**/.env*"
      - "**/secrets/**"
      - "**/config/database*"
    
    # Only include specific file types
    include_extensions:
      - ".py"
      - ".js"
      - ".md"
      - ".yml"

Environment-Specific Security

# Different security levels for different environments
development:
  global_settings:
    security_check: "warn"  # More lenient for development
    
production:
  global_settings:
    security_check: "abort"  # Strict for production
    exclude_patterns:
      - "**/*.test.*"
      - "**/*.spec.*"
      - "**/debug/**"

Deployment Security Recommendations

Environment Configuration

Run with minimal required permissions
Use dedicated service accounts when possible
Avoid running as root/administrator

Network Security

Use HTTPS for all web scraping when possible
Configure firewall rules to limit outbound connections
Monitor for unusual network activity

Logging and Monitoring

Enable verbose logging for security-sensitive operations
Review logs regularly for suspicious patterns
Set up alerts for security check failures

Security Checklist for Users

**Production Readiness**: Complete this checklist before deploying m1f in production environments:

Validate all input paths and patterns
Review security check mode settings
Enable SSL validation for web scraping
Set appropriate file size limits
Use minimal required permissions
Review preset files for suspicious content
Test security scanning on sample data
Configure proper logging and monitoring
Keep the toolkit updated to the latest version

Common Security Patterns

External Sharing Bundle

# Create a secure bundle for sharing with external parties
m1f -s ./src -o external-bundle.txt \
  --security-check abort \
  --exclude-patterns "**/*.env*" "**/*.key" "**/secrets/**" \
  --max-file-size 500KB \
  --include-extensions .py .js .md .yml

Internal Development Bundle

# More permissive for internal development
m1f -s ./src -o dev-bundle.txt \
  --security-check warn \
  --exclude-paths-file .gitignore \
  --max-file-size 2MB

Production Deployment Bundle

# Strict security for production
m1f -s ./src -o prod-bundle.txt \
  --security-check abort \
  --exclude-patterns "**/*.test.*" "**/*.spec.*" "**/debug/**" \
  --max-file-size 1MB \
  --minimal-output

Security Monitoring

Log Analysis

Monitor m1f logs for:

Security check failures
Unusual file access patterns
Large file processing attempts
Failed path validations

Automated Security Checks

#!/bin/bash
# Security monitoring script
m1f -s ./src -o /tmp/security-check.txt \
  --security-check abort \
  --verbose 2>&1 | grep -E "(SECURITY|ERROR|WARNING)"

Incident Response

If Security Issues Are Detected

Stop processing immediately
Review the flagged content
Determine if it’s a false positive
Update exclusion patterns if needed
Restart with appropriate security settings

Security Audit Trail

# Create an audit trail of security checks
m1f -s ./src -o audit-bundle.txt \
  --security-check abort \
  --verbose \
  --log-file security-audit.log

Updates and Security Patches

Stay informed about security updates:

Check the CHANGELOG for security-related fixes
Update to new versions promptly
Review breaking changes that might affect security
Subscribe to security notifications

Reporting Security Issues

If you discover a security vulnerability in m1f:

Do NOT open a public issue
Email security details to the maintainers
Include steps to reproduce the issue
Allow time for a fix before public disclosure

Advanced Security Features

Per-File Security Settings

# Example preset with per-file security settings
security_preset:
  global_settings:
    security_check: "abort"  # Default strict
    
    extensions:
      .md:
        security_check: null  # Disable for markdown
      .py:
        security_check: "abort"  # Strict for Python
      .js:
        security_check: "warn"  # Warn for JavaScript
      .env:
        security_check: "abort"  # Very strict for env files

Content-Based Security

# Security based on file content patterns
content_security:
  presets:
    sensitive_files:
      patterns: ["**/config/**", "**/secrets/**"]
      security_check: "abort"
      max_file_size: "10KB"
      
    public_files:
      patterns: ["**/public/**", "**/static/**"]
      security_check: "warn"

Best Practices Summary

Always enable security scanning in production environments
Use the strictest security settings appropriate for your use case
Regularly review and update exclusion patterns
Monitor logs for security-related events
Test security configurations before deployment
Keep the toolkit updated with the latest security patches
Follow the principle of least privilege for file access
Document your security configurations for team members

Related Documentation

Auto Bundle - Automated bundling with security considerations
Presets - Advanced configuration with security settings
Claude Integration - AI-powered security configuration
CLI Reference - Complete command-line security options

Next Steps

Assess your security requirements for different environments
Configure appropriate security settings for your use cases
Test security configurations with sample data
Set up monitoring and alerting for security events
Train team members on security best practices

Remember: Security is a shared responsibility. While m1f implements many protective measures, proper configuration and usage are essential for maintaining a secure environment.