Security

Security best practices and protective measures for safe m1f operation

This guide documents the security best practices and protective measures implemented in the m1f toolkit. Following these practices helps keep operation safe and reduces exposure to common security vulnerabilities.

Overview

m1f implements multiple layers of security protection:

  • Path traversal protection - Prevents access to files outside intended directories
  • Secret detection - Automatically scans for sensitive data in files
  • SSRF protection - Blocks requests to internal network resources
  • Input validation - Validates all user inputs and configuration files
  • Safe command execution - Prevents command injection attacks

Path Validation and Traversal Protection

Why It Matters

Path traversal attacks can allow malicious actors to access files outside intended directories, potentially exposing sensitive system files or overwriting critical data.

Best Practices

1. Always Validate Resolved Paths

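# Good practice - validate after resolving
from pathlib import Path

from tools.m1f.utils import validate_safe_path

base_path = Path.cwd()  # allowed root directory; adjust to your project
target_path = Path(user_input).resolve()  # user_input: untrusted path string
validate_safe_path(target_path, base_path)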

2. Use Provided Validation Utilities

  • validate_safe_path() in tools/m1f/utils.py ensures paths stay within allowed boundaries
  • All user-provided paths should be validated before use
  • Symlinks are resolved and validated to prevent escaping directories
  • Target of symlinks must be within the allowed base path
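The containment check described above can be sketched roughly as follows. This is not the implementation of validate_safe_path(); the helper name is_within_base and the example paths are hypothetical, and the sketch only illustrates the general idea of resolving a path before comparing it against an allowed base.

```python
from pathlib import Path


def is_within_base(target: Path, base: Path) -> bool:
    """Return True if target, after resolving symlinks and '..', lies inside base.

    Hypothetical helper for illustration; not part of the m1f API.
    """
    resolved_target = target.resolve()
    resolved_base = base.resolve()
    try:
        # relative_to raises ValueError when the path is not under the base
        resolved_target.relative_to(resolved_base)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    base = Path("/srv/project")  # example base directory (assumption)
    print(is_within_base(base / "src" / "main.py", base))        # stays inside
    print(is_within_base(base / ".." / "etc" / "passwd", base))  # escapes the base
```

Resolving before comparing matters because a path such as `project/../etc/passwd` looks like it starts inside the project but points elsewhere once `..` and symlinks are followed.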

Common Pitfalls to Avoid

**Security Risk**: Common mistakes that can lead to vulnerabilities:
  • Using user input directly in file paths without validation
  • Trusting relative paths without resolving and validating them
  • Skipping validation for paths that come from configuration files and presets

Security Scanning for Sensitive Data

Built-in Secret Detection

m1f includes automatic scanning for:

  • API keys and tokens
  • Passwords and credentials
  • Private keys
  • High-entropy strings that might be secrets
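To illustrate what "high-entropy strings" means, here is a minimal sketch of an entropy-based heuristic. It is an assumption about how such a check can work in general, not m1f's actual detector; the function names, the minimum length of 20, and the threshold of 4.0 bits per character are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Hypothetical heuristic: flag long tokens whose characters look random."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold
```

Heuristics like this explain why false positives occur: hashes, UUIDs, and minified data can score as highly as real credentials.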

Security Check Modes

1. Abort Mode

Stops processing if secrets are found:

m1f -s ./src -o output.txt --security-check abort

2. Skip Mode

Excludes files with secrets but continues processing:

m1f -s ./src -o output.txt --security-check skip

3. Warn Mode

Logs warnings but continues processing:

m1f -s ./src -o output.txt --security-check warn

4. Disabled Mode

**Not Recommended**: Disabling security scanning leaves your data vulnerable.

m1f -s ./src -o output.txt --security-check null

Handling False Positives

If legitimate content is flagged as sensitive:

  1. Review the warnings carefully
  2. Use --security-check warn if you’re certain the content is safe
  3. Consider refactoring code to avoid patterns that trigger detection

Web Scraping Security

SSRF (Server-Side Request Forgery) Protection

**Built-in Protection**: The toolkit automatically blocks requests to internal network resources, including:

  • Private IP ranges (10.x.x.x, 172.16.x.x through 172.31.x.x, 192.168.x.x)
  • Localhost and loopback addresses (127.0.0.1, ::1)
  • Link-local addresses (169.254.x.x)
  • Cloud metadata endpoints (169.254.169.254)
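A check against these address classes can be sketched with Python's standard ipaddress module. This is an illustrative assumption, not the toolkit's own code; is_blocked_address is a hypothetical name, and the sketch only handles IP literals. A complete SSRF guard would also resolve hostnames (including "localhost") and check every resolved address.

```python
import ipaddress


def is_blocked_address(host: str) -> bool:
    """Return True if host is an IP literal in a private, loopback, or link-local range.

    Hypothetical helper for illustration; hostnames are not resolved here.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; a real check would resolve the hostname first
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

The cloud metadata endpoint 169.254.169.254 falls inside the link-local range, so it is covered by the same test.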

SSL/TLS Validation

Default Behavior

SSL certificates are validated by default.
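As a point of reference, this is what default certificate validation looks like in Python's standard library. How m1f-scrape configures its HTTP clients internally is not shown in this guide, so treat this only as a sketch of the behavior that "validated by default" usually refers to.

```python
import ssl

# The default context requires a valid certificate chain and a matching hostname
context = ssl.create_default_context()

certificate_required = context.verify_mode == ssl.CERT_REQUIRED
hostname_checked = context.check_hostname

print(certificate_required, hostname_checked)
```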

Disabling Validation (Use with Caution)

# Only for trusted internal sites or testing
m1f-scrape --ignore-https-errors https://internal-site.com

**Warning**: Disabling SSL validation exposes you to man-in-the-middle attacks. Only use it for trusted internal resources.

robots.txt Compliance

All scrapers automatically respect robots.txt files:

  • Automatically fetched and parsed for each domain
  • Scraping is blocked for disallowed paths
  • User-agent specific rules are respected
  • Compliance is always enabled; there is no configuration option to disable it
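The kind of check involved can be sketched with Python's standard urllib.robotparser. Whether the scrapers use this module is an assumption; the robots.txt content, the user agent "example-bot", and the helper may_fetch are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (assumption); normally fetched from the target domain
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, user_agent: str = "example-bot") -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)
```

A scraper would call a check like this before each request and skip any URL for which it returns False.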

JavaScript Execution Safety

**Caution**: Only execute JavaScript from trusted sources.

When using Playwright with custom scripts:

  • Scripts are validated for dangerous patterns
  • Avoid executing untrusted JavaScript code
  • Use built-in actions instead of custom scripts when possible

Command Injection Prevention

Safe Command Execution

The toolkit uses proper escaping for all system commands:

# Good - using shlex.quote()
import shlex
command = f"httrack {shlex.quote(url)} -O {shlex.quote(output_dir)}"

# Bad - direct string interpolation
command = f"httrack {url} -O {output_dir}"  # DON'T DO THIS

**Never use direct string interpolation** for system commands - it can lead to command injection vulnerabilities.
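To make the effect of quoting concrete, the following sketch builds the same httrack command for a hostile input and shows that the shell would still see it as a single argument. The function names are hypothetical. The second helper shows a common alternative, an argument list suitable for subprocess.run without a shell, which avoids shell parsing entirely.

```python
import shlex


def build_httrack_command(url: str, output_dir: str) -> str:
    """Build a shell command string with every untrusted value quoted."""
    return f"httrack {shlex.quote(url)} -O {shlex.quote(output_dir)}"


def build_httrack_argv(url: str, output_dir: str) -> list[str]:
    """Alternative: an argument list for subprocess.run(argv) with no shell involved."""
    return ["httrack", url, "-O", output_dir]


if __name__ == "__main__":
    hostile = "https://example.com; rm -rf /"
    command = build_httrack_command(hostile, "./out dir")
    # Splitting the command the way a POSIX shell would keeps the URL in one piece
    print(shlex.split(command))
```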

Preset System Security

File Size Limits

  • Preset files are limited to 10MB to prevent memory exhaustion
  • Large preset files are rejected with an error
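A size guard of this kind can be sketched as a check that runs before the file is parsed. The names below are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption; the toolkit may define the limit differently.

```python
import os

MAX_PRESET_SIZE = 10 * 1024 * 1024  # assumed interpretation of "10MB"


def check_preset_size(path: str, limit: int = MAX_PRESET_SIZE) -> None:
    """Raise ValueError if the file at path is larger than limit bytes.

    Hypothetical helper for illustration; not part of the m1f API.
    """
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"Preset file too large: {size} bytes (limit {limit})")
```

Checking the size on disk first means an oversized file is rejected before any of it is loaded into memory.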

Path Validation in Presets

  • All paths in preset files are validated
  • Paths cannot escape the project directory
  • Absolute paths outside the project are blocked

Custom Processor Validation

  • Processor names must be alphanumeric with underscores only
  • Special characters that could enable code injection are blocked
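The naming rule above maps naturally onto an allow-list pattern. This is a minimal sketch under that assumption; PROCESSOR_NAME_RE and is_valid_processor_name are hypothetical names, not the toolkit's own.

```python
import re

# Letters, digits, and underscores only; anything else is rejected
PROCESSOR_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_processor_name(name: str) -> bool:
    """Return True if name consists solely of alphanumerics and underscores."""
    return PROCESSOR_NAME_RE.fullmatch(name) is not None
```

Using fullmatch rather than match ensures that a trailing newline or a suffix such as `; import os` cannot slip past the check.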

Secure Temporary File Handling

The toolkit uses Python’s tempfile module for all temporary files:

  • Temporary directories are created with restricted permissions
  • All temporary files are cleaned up after use
  • No sensitive data is left in temporary locations
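The pattern described here can be sketched with tempfile.TemporaryDirectory. The function name, prefix, and file name are hypothetical; the sketch only demonstrates the standard-library behavior the bullets rely on, not the toolkit's own code.

```python
import os
import stat
import tempfile
from pathlib import Path


def process_in_temp_dir() -> tuple[str, bool]:
    """Create a scratch directory, use it, and return (path, owner_only)."""
    with tempfile.TemporaryDirectory(prefix="m1f-example-") as tmp:
        mode = stat.S_IMODE(os.stat(tmp).st_mode)
        # On POSIX the directory is created accessible only by its owner (0o700)
        owner_only = (mode & 0o077) == 0
        (Path(tmp) / "scratch.txt").write_text("intermediate data", encoding="utf-8")
    # Leaving the with-block removes the directory and everything in it
    return tmp, owner_only
```

Because cleanup is tied to the with-block, the directory is removed even if processing raises an exception.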

Input Validation Best Practices

File Type Validation

  • Use include/exclude patterns to limit processed file types
  • Be explicit about allowed file extensions
  • Validate file contents match expected formats

Size and Resource Limits

  • Set appropriate limits for file sizes
  • Use --max-file-size to prevent processing huge files
  • Monitor memory usage for large file sets

Encoding Safety

  • The toolkit automatically detects file encodings
  • UTF-8 is preferred for text files by default
  • Binary files are handled safely without interpretation

Configuration Security

Secure Configuration Files

# Example secure configuration
bundles:
  secure-bundle:
    description: "Security-focused bundle"
    output: "secure/bundle.txt"
    
    # Enable strict security checking
    security_check: "abort"
    
    # Limit file sizes
    max_file_size: "1MB"
    
    # Exclude sensitive patterns
    exclude_patterns:
      - "**/*.key"
      - "**/*.pem"
      - "**/.env*"
      - "**/secrets/**"
      - "**/config/database*"
    
    # Only include specific file types
    include_extensions:
      - ".py"
      - ".js"
      - ".md"
      - ".yml"

Environment-Specific Security

# Different security levels for different environments
development:
  global_settings:
    security_check: "warn"  # More lenient for development
    
production:
  global_settings:
    security_check: "abort"  # Strict for production
    exclude_patterns:
      - "**/*.test.*"
      - "**/*.spec.*"
      - "**/debug/**"

Deployment Security Recommendations

Environment Configuration

  1. Run with minimal required permissions
  2. Use dedicated service accounts when possible
  3. Avoid running as root/administrator

Network Security

  1. Use HTTPS for all web scraping when possible
  2. Configure firewall rules to limit outbound connections
  3. Monitor for unusual network activity

Logging and Monitoring

  1. Enable verbose logging for security-sensitive operations
  2. Review logs regularly for suspicious patterns
  3. Set up alerts for security check failures

Security Checklist for Users

**Production Readiness**: Complete this checklist before deploying m1f in production environments:

  • Validate all input paths and patterns
  • Review security check mode settings
  • Keep SSL validation enabled for web scraping
  • Set appropriate file size limits
  • Use minimal required permissions
  • Review preset files for suspicious content
  • Test security scanning on sample data
  • Configure proper logging and monitoring
  • Keep the toolkit updated to the latest version

Common Security Patterns

Secure Bundle for External Sharing

# Create a secure bundle for sharing with external parties
m1f -s ./src -o external-bundle.txt \
  --security-check abort \
  --exclude-patterns "**/*.env*" "**/*.key" "**/secrets/**" \
  --max-file-size 500KB \
  --include-extensions .py .js .md .yml

Internal Development Bundle

# More permissive for internal development
m1f -s ./src -o dev-bundle.txt \
  --security-check warn \
  --exclude-paths-file .gitignore \
  --max-file-size 2MB

Production Deployment Bundle

# Strict security for production
m1f -s ./src -o prod-bundle.txt \
  --security-check abort \
  --exclude-patterns "**/*.test.*" "**/*.spec.*" "**/debug/**" \
  --max-file-size 1MB \
  --minimal-output

Security Monitoring

Log Analysis

Monitor m1f logs for:

  • Security check failures
  • Unusual file access patterns
  • Large file processing attempts
  • Failed path validations

Automated Security Checks

#!/bin/bash
# Security monitoring script
m1f -s ./src -o /tmp/security-check.txt \
  --security-check abort \
  --verbose 2>&1 | grep -E "(SECURITY|ERROR|WARNING)"

Incident Response

If Security Issues Are Detected

  1. Stop processing immediately
  2. Review the flagged content
  3. Determine if it’s a false positive
  4. Update exclusion patterns if needed
  5. Restart with appropriate security settings

Security Audit Trail

# Create an audit trail of security checks
m1f -s ./src -o audit-bundle.txt \
  --security-check abort \
  --verbose \
  --log-file security-audit.log

Updates and Security Patches

Stay informed about security updates:

  • Check the CHANGELOG for security-related fixes
  • Update to new versions promptly
  • Review breaking changes that might affect security
  • Subscribe to security notifications

Reporting Security Issues

If you discover a security vulnerability in m1f:

  1. Do NOT open a public issue
  2. Email security details to the maintainers
  3. Include steps to reproduce the issue
  4. Allow time for a fix before public disclosure

Advanced Security Features

Per-File Security Settings

# Example preset with per-file security settings
security_preset:
  global_settings:
    security_check: "abort"  # Default strict
    
    extensions:
      .md:
        security_check: null  # Disable for markdown
      .py:
        security_check: "abort"  # Strict for Python
      .js:
        security_check: "warn"  # Warn for JavaScript
      .env:
        security_check: "abort"  # Very strict for env files

Content-Based Security

# Security based on file content patterns
content_security:
  presets:
    sensitive_files:
      patterns: ["**/config/**", "**/secrets/**"]
      security_check: "abort"
      max_file_size: "10KB"
      
    public_files:
      patterns: ["**/public/**", "**/static/**"]
      security_check: "warn"

Best Practices Summary

  1. Always enable security scanning in production environments
  2. Use the strictest security settings appropriate for your use case
  3. Regularly review and update exclusion patterns
  4. Monitor logs for security-related events
  5. Test security configurations before deployment
  6. Keep the toolkit updated with the latest security patches
  7. Follow the principle of least privilege for file access
  8. Document your security configurations for team members

Next Steps

  1. Assess your security requirements for different environments
  2. Configure appropriate security settings for your use cases
  3. Test security configurations with sample data
  4. Set up monitoring and alerting for security events
  5. Train team members on security best practices

Remember: Security is a shared responsibility. While m1f implements many protective measures, proper configuration and usage are essential for maintaining a secure environment.