Security
Security best practices and protective measures for safe m1f operation
This guide documents the security best practices and protective measures implemented in the m1f toolkit. Following them helps keep operation safe and reduces the risk of common security vulnerabilities.
Overview
m1f implements multiple layers of security protection:
- Path traversal protection - Prevents access to files outside intended directories
- Secret detection - Automatically scans for sensitive data in files
- SSRF protection - Blocks requests to internal network resources
- Input validation - Validates all user inputs and configuration files
- Safe command execution - Prevents command injection attacks
Path Validation and Traversal Protection
Why It Matters
Path traversal attacks can allow malicious actors to access files outside intended directories, potentially exposing sensitive system files or overwriting critical data.
Best Practices
1. Always Validate Resolved Paths
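# Good practice - validate after resolving
from pathlib import Path
from tools.m1f.utils import validate_safe_path
# user_input is the untrusted path; base_path is the directory it must stay within
target_path = Path(user_input).resolve()
validate_safe_path(target_path, base_path)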
2. Use Provided Validation Utilities
- validate_safe_path() in tools/m1f/utils.py ensures paths stay within allowed boundaries
- All user-provided paths should be validated before use
3. Symlink Safety
- Symlinks are resolved and validated to prevent escaping directories
- Target of symlinks must be within the allowed base path
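The resolve-then-validate pattern described above can be sketched with the standard library alone. This is an illustration of the idea, not the implementation of validate_safe_path(); the helper name is_within_base is hypothetical.

```python
from pathlib import Path


def is_within_base(candidate: str, base: str) -> bool:
    """Return True if candidate, once resolved, stays inside base.

    Hypothetical helper for illustration; not part of m1f.
    """
    base_path = Path(base).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check runs against the real target location.
    target = (base_path / candidate).resolve()
    try:
        target.relative_to(base_path)
    except ValueError:
        # relative_to() raises when target is not located under base_path.
        return False
    return True
```

Because the check happens after resolution, a symlink inside the base directory that points outside it is rejected in the same way as a path containing "..".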
Common Pitfalls to Avoid
- Never use user input directly in file paths without validation
- Don’t trust relative paths without resolving and validating them
- Don’t skip validation for paths that come from configuration files and presets
Security Scanning for Sensitive Data
Built-in Secret Detection
m1f includes automatic scanning for:
- API keys and tokens
- Passwords and credentials
- Private keys
- High-entropy strings that might be secrets
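As a rough illustration of the last item, high-entropy detection is commonly based on Shannon entropy: tokens whose characters are close to uniformly distributed look more like generated keys than like words. The sketch below shows that general technique. It is not m1f's detector, and the function names, min_length, and threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_high_entropy(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens with near-uniform character distribution.

    min_length and threshold are illustrative values, not m1f defaults.
    """
    return len(token) >= min_length and shannon_entropy(token) >= threshold
```

A heuristic like this produces false positives on hashes, minified code, and encoded data, which is one reason the warn and skip modes exist alongside abort.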
Security Check Modes
1. Abort Mode (Recommended)
Stops processing if secrets are found:
m1f -s ./src -o output.txt --security-check abort
2. Skip Mode
Excludes files with secrets but continues processing:
m1f -s ./src -o output.txt --security-check skip
3. Warn Mode
Logs warnings but continues processing:
m1f -s ./src -o output.txt --security-check warn
4. Disabled Mode
Disables secret scanning:
m1f -s ./src -o output.txt --security-check null
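The difference between the modes can be summarized as a small decision function. The sketch below is a hypothetical model of the behavior described above, not m1f's code; apply_security_mode and SecretsFoundError are names invented for the example, and None stands in for the disabled mode.

```python
import logging
from typing import Optional

logger = logging.getLogger("security_sketch")


class SecretsFoundError(Exception):
    """Raised in abort mode when a file contains suspected secrets."""


def apply_security_mode(mode: Optional[str], path: str, findings: list[str]) -> bool:
    """Return True if the file should be included in the output."""
    if mode is None or not findings:
        return True  # scanning disabled, or nothing was flagged
    if mode == "abort":
        raise SecretsFoundError(f"{path}: {len(findings)} finding(s)")
    if mode == "skip":
        return False  # leave this file out, keep processing the rest
    if mode == "warn":
        logger.warning("%s: %d possible secret(s)", path, len(findings))
        return True
    raise ValueError(f"unknown security mode: {mode!r}")
```

Only abort stops the whole run; skip and warn differ in whether the flagged file still ends up in the bundle.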
Handling False Positives
If legitimate content is flagged as sensitive:
- Review the warnings carefully
- Use --security-check warn if you’re certain the content is safe
- Consider refactoring code to avoid patterns that trigger detection
Web Scraping Security
SSRF (Server-Side Request Forgery) Protection
The toolkit blocks access to:
- Private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Localhost and loopback addresses (127.0.0.1, ::1)
- Link-local addresses (169.254.x.x)
- Cloud metadata endpoints (169.254.169.254)
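The categories in this list map directly onto properties of Python's ipaddress module. The following is a minimal sketch of such a check, assuming the address has already been resolved from the hostname; is_blocked_address is a hypothetical name and this is not m1f's implementation.

```python
import ipaddress


def is_blocked_address(host_ip: str) -> bool:
    """Return True if the IP falls into a range treated as internal."""
    addr = ipaddress.ip_address(host_ip)
    # is_private covers the RFC 1918 ranges, is_loopback covers
    # 127.0.0.0/8 and ::1, and is_link_local covers 169.254.0.0/16,
    # which includes the 169.254.169.254 metadata endpoint.
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

A complete SSRF defense would also need to check every address a hostname resolves to and re-check after redirects; the sketch covers only the range test.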
SSL/TLS Validation
Default Behavior
SSL certificates are validated by default.
Disabling Validation (Use with Caution)
# Only for trusted internal sites or testing
m1f-scrape --ignore-https-errors https://internal-site.com
robots.txt Compliance
All scrapers automatically respect robots.txt files:
- Automatically fetched and parsed for each domain
- Scraping is blocked for disallowed paths
- User-agent specific rules are respected
- Compliance is always enabled; there is no configuration option to disable it
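For readers who want to see what a robots.txt check looks like in practice, the standard library's urllib.robotparser offers one way to do it. The sketch below parses an in-memory example file so that it needs no network access; the rules, the may_fetch helper, and the example-bot user agent are all made up for illustration and say nothing about how the m1f scrapers implement the check.

```python
from urllib.robotparser import RobotFileParser

# Example rules, supplied as text so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, user_agent: str = "example-bot") -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)
```

With these rules, URLs under /private/ are refused for every user agent, while other paths on the same host remain allowed.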
JavaScript Execution Safety
When using Playwright with custom scripts:
- Scripts are validated for dangerous patterns
- Avoid executing untrusted JavaScript code
- Use built-in actions instead of custom scripts when possible
Command Injection Prevention
Safe Command Execution
The toolkit uses proper escaping for all system commands:
# Good - using shlex.quote()
import shlex
command = f"httrack {shlex.quote(url)} -O {shlex.quote(output_dir)}"
# Bad - direct string interpolation
command = f"httrack {url} -O {output_dir}" # DON'T DO THIS
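An alternative to quoting a command string is to avoid the shell altogether and pass an argument list (for example to subprocess.run without shell=True), so that each value remains a single argument. The sketch below only builds the arguments and does not run httrack; build_httrack_argv is a hypothetical helper, and whether m1f uses this approach internally is not stated here.

```python
import shlex


def build_httrack_argv(url: str, output_dir: str) -> list[str]:
    """Build an argument vector; each value stays a single argument."""
    return ["httrack", url, "-O", output_dir]


# A URL carrying a shell metacharacter and an extra command.
malicious_url = "https://example.com/; rm -rf ~"
argv = build_httrack_argv(malicious_url, "./mirror")

# shlex.join() quotes every element, producing a shell-safe string
# that splits back into exactly the same arguments.
command = shlex.join(argv)
```

In both forms the injected "; rm -rf ~" stays part of the URL argument instead of being interpreted as a second command.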
Preset System Security
File Size Limits
- Preset files are limited to 10MB to prevent memory exhaustion
- Large preset files are rejected with an error
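A size limit of this kind is most useful when it is enforced before the file is read. The following is a minimal sketch of that idea, assuming 10MB means 10 * 1024 * 1024 bytes; check_preset_size and PresetTooLargeError are hypothetical names, not m1f APIs.

```python
import os

MAX_PRESET_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


class PresetTooLargeError(ValueError):
    """Raised when a preset file exceeds the configured size limit."""


def check_preset_size(path: str, limit: int = MAX_PRESET_BYTES) -> int:
    """Return the file size, or raise before the file is loaded into memory."""
    size = os.path.getsize(path)
    if size > limit:
        raise PresetTooLargeError(f"{path} is {size} bytes; limit is {limit}")
    return size
```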
Path Validation in Presets
- All paths in preset files are validated
- Paths cannot escape the project directory
- Absolute paths outside the project are blocked
Custom Processor Validation
- Processor names must be alphanumeric with underscores only
- Special characters that could enable code injection are blocked
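The "alphanumeric with underscores only" rule can be expressed as a single allowlist pattern. This is a sketch of that rule as stated above, with a hypothetical function name; the actual validation code in m1f may differ.

```python
import re

# Allowlist: one or more ASCII letters, digits, or underscores.
_PROCESSOR_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_processor_name(name: str) -> bool:
    """Accept only names made entirely of allowed characters."""
    # fullmatch() requires the whole string to match, so a trailing
    # newline or an embedded "." or ";" causes rejection.
    return _PROCESSOR_NAME.fullmatch(name) is not None
```

Using an allowlist instead of a list of forbidden characters means anything unexpected is rejected by default.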
Secure Temporary File Handling
The toolkit uses Python’s tempfile module for all temporary files:
- Temporary directories are created with restricted permissions
- All temporary files are cleaned up after use
- No sensitive data is left in temporary locations
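The behavior listed above corresponds to how tempfile.TemporaryDirectory works in the standard library: the directory is created via mkdtemp(), which makes it accessible only to the creating user, and it is removed when the context exits. The sketch below demonstrates that pattern in isolation; the prefix and file name are arbitrary and it is not taken from m1f.

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="m1f-sketch-") as tmp_dir:
    scratch = os.path.join(tmp_dir, "scratch.txt")
    with open(scratch, "w", encoding="utf-8") as handle:
        handle.write("intermediate data")
    existed_inside = os.path.exists(scratch)

# Leaving the with-block removes the directory and everything in it.
removed_after = not os.path.exists(tmp_dir)
```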
Input Validation Best Practices
File Type Validation
- Use include/exclude patterns to limit processed file types
- Be explicit about allowed file extensions
- Validate file contents match expected formats
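Being explicit about allowed extensions amounts to an allowlist check. The following sketch shows one way to write it; the extension set and the has_allowed_extension helper are examples for illustration, not m1f defaults or APIs.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = frozenset({".py", ".js", ".md", ".yml"})  # example allowlist


def has_allowed_extension(filename: str, allowed=ALLOWED_EXTENSIONS) -> bool:
    """Return True if the file's final suffix is on the allowlist."""
    # suffix is the last extension only, and is empty for dotfiles
    # such as ".env", so those are rejected here.
    return PurePosixPath(filename).suffix.lower() in allowed
```

An extension check says nothing about the file's contents, which is why the list above also recommends validating that contents match the expected format.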
Size and Resource Limits
- Set appropriate limits for file sizes
- Use --max-file-size to prevent processing huge files
- Monitor memory usage for large file sets
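Size values such as 500KB or 1MB have to be turned into byte counts before they can be compared against a file. The sketch below shows one plausible way to do that, assuming binary multiples (1KB = 1024 bytes); the parse_size helper and the accepted units are assumptions, and m1f's own parsing rules may differ.

```python
import re

# Assumption: units are binary multiples.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
_SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a string such as "500KB" into a number of bytes."""
    match = _SIZE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])
```

Rejecting unrecognized input, instead of silently treating it as "no limit", keeps a typo from disabling the protection.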
Encoding Safety
- The toolkit automatically detects file encodings
- UTF-8 is preferred for text files by default
- Binary files are handled safely without interpretation
Configuration Security
Secure Configuration Files
# Example secure configuration
bundles:
  secure-bundle:
    description: "Security-focused bundle"
    output: "secure/bundle.txt"
    # Enable strict security checking
    security_check: "abort"
    # Limit file sizes
    max_file_size: "1MB"
    # Exclude sensitive patterns
    exclude_patterns:
      - "**/*.key"
      - "**/*.pem"
      - "**/.env*"
      - "**/secrets/**"
      - "**/config/database*"
    # Only include specific file types
    include_extensions:
      - ".py"
      - ".js"
      - ".md"
      - ".yml"
Environment-Specific Security
# Different security levels for different environments
development:
  global_settings:
    security_check: "warn" # More lenient for development
production:
  global_settings:
    security_check: "abort" # Strict for production
    exclude_patterns:
      - "**/*.test.*"
      - "**/*.spec.*"
      - "**/debug/**"
Deployment Security Recommendations
Environment Configuration
- Run with minimal required permissions
- Use dedicated service accounts when possible
- Avoid running as root/administrator
Network Security
- Use HTTPS for all web scraping when possible
- Configure firewall rules to limit outbound connections
- Monitor for unusual network activity
Logging and Monitoring
- Enable verbose logging for security-sensitive operations
- Review logs regularly for suspicious patterns
- Set up alerts for security check failures
Security Checklist for Users
Before running m1f in production:
- Validate all input paths and patterns
- Review security check mode settings
- Enable SSL validation for web scraping
- Set appropriate file size limits
- Use minimal required permissions
- Review preset files for suspicious content
- Test security scanning on sample data
- Configure proper logging and monitoring
- Keep the toolkit updated to the latest version
Common Security Patterns
Secure Bundle for External Sharing
# Create a secure bundle for sharing with external parties
m1f -s ./src -o external-bundle.txt \
  --security-check abort \
  --exclude-patterns "**/*.env*" "**/*.key" "**/secrets/**" \
  --max-file-size 500KB \
  --include-extensions .py .js .md .yml
Internal Development Bundle
# More permissive for internal development
m1f -s ./src -o dev-bundle.txt \
  --security-check warn \
  --exclude-paths-file .gitignore \
  --max-file-size 2MB
Production Deployment Bundle
# Strict security for production
m1f -s ./src -o prod-bundle.txt \
  --security-check abort \
  --exclude-patterns "**/*.test.*" "**/*.spec.*" "**/debug/**" \
  --max-file-size 1MB \
  --minimal-output
Security Monitoring
Log Analysis
Monitor m1f logs for:
- Security check failures
- Unusual file access patterns
- Large file processing attempts
- Failed path validations
Automated Security Checks
#!/bin/bash
# Security monitoring script
m1f -s ./src -o /tmp/security-check.txt \
  --security-check abort \
  --verbose 2>&1 | grep -E "(SECURITY|ERROR|WARNING)"
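The same filter can be applied in Python when the output is captured by another tool. The sketch below reuses the pattern from the grep command above; the security_lines helper is hypothetical, and no particular m1f log format is assumed beyond those three keywords.

```python
import re

# Same alternation as the grep -E expression in the script above.
SECURITY_PATTERN = re.compile(r"SECURITY|ERROR|WARNING")


def security_lines(log_text: str) -> list[str]:
    """Return only the lines that mention a security-relevant keyword."""
    return [line for line in log_text.splitlines() if SECURITY_PATTERN.search(line)]
```

Note that in the shell version the pipeline's exit status comes from grep, not from m1f, so a wrapper that needs the m1f exit code has to capture it separately.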
Incident Response
If Security Issues Are Detected
- Stop processing immediately
- Review the flagged content
- Determine if it’s a false positive
- Update exclusion patterns if needed
- Restart with appropriate security settings
Security Audit Trail
# Create an audit trail of security checks
m1f -s ./src -o audit-bundle.txt \
  --security-check abort \
  --verbose \
  --log-file security-audit.log
Updates and Security Patches
Stay informed about security updates:
- Check the CHANGELOG for security-related fixes
- Update to new versions promptly
- Review breaking changes that might affect security
- Subscribe to security notifications
Reporting Security Issues
If you discover a security vulnerability in m1f:
- Do NOT open a public issue
- Email security details to the maintainers
- Include steps to reproduce the issue
- Allow time for a fix before public disclosure
Advanced Security Features
Per-File Security Settings
# Example preset with per-file security settings
security_preset:
  global_settings:
    security_check: "abort" # Default strict
    extensions:
      .md:
        security_check: null # Disable for markdown
      .py:
        security_check: "abort" # Strict for Python
      .js:
        security_check: "warn" # Warn for JavaScript
      .env:
        security_check: "abort" # Very strict for env files
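Conceptually, per-extension settings act as overrides on top of the global default. The sketch below models that lookup with plain dictionaries mirroring the values in the example above; resolve_security_check is a hypothetical function, None represents the YAML null (scanning disabled), and the real precedence rules in m1f may be more involved.

```python
from pathlib import PurePosixPath
from typing import Optional

GLOBAL_DEFAULT = "abort"
EXTENSION_OVERRIDES = {".md": None, ".py": "abort", ".js": "warn", ".env": "abort"}


def resolve_security_check(
    filename: str,
    default: Optional[str] = GLOBAL_DEFAULT,
    overrides: Optional[dict] = None,
) -> Optional[str]:
    """Return the mode for a file: the extension override if present, else the default."""
    if overrides is None:
        overrides = EXTENSION_OVERRIDES
    suffix = PurePosixPath(filename).suffix.lower()
    # A membership test is needed because None is a valid override value.
    if suffix in overrides:
        return overrides[suffix]
    return default
```

Files with an extension that has no override, such as main.rs in this example, fall back to the strict global default.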
Content-Based Security
# Security based on file content patterns
content_security:
  presets:
    sensitive_files:
      patterns: ["**/config/**", "**/secrets/**"]
      security_check: "abort"
      max_file_size: "10KB"
    public_files:
      patterns: ["**/public/**", "**/static/**"]
      security_check: "warn"
Best Practices Summary
- Always enable security scanning in production environments
- Use the strictest security settings appropriate for your use case
- Regularly review and update exclusion patterns
- Monitor logs for security-related events
- Test security configurations before deployment
- Keep the toolkit updated with the latest security patches
- Follow the principle of least privilege for file access
- Document your security configurations for team members
Related Topics
- Auto Bundle - Automated bundling with security considerations
- Presets - Advanced configuration with security settings
- Claude Integration - AI-powered security configuration
- CLI Reference - Complete command-line security options
Next Steps
- Assess your security requirements for different environments
- Configure appropriate security settings for your use cases
- Test security configurations with sample data
- Set up monitoring and alerting for security events
- Train team members on security best practices
Remember: Security is a shared responsibility. While m1f implements many protective measures, proper configuration and usage are essential for maintaining a secure environment.