How I Analyze 200+ File Codebases in Minutes, Not Hours
Using m1f's Intelligent Bundles to Speed Up Code Analysis by 10x
Last week, I analyzed a 224-file Python codebase in 15 minutes. Here’s exactly how I did it using m1f’s intelligent bundles to give AI precisely the context it needs for fast, comprehensive analysis.
Here’s the brutal truth about analyzing large codebases: You’re either spending hours navigating through hundreds of files, trying to understand how everything connects, or you’re feeding your AI random snippets and getting shallow, incomplete answers. Most developers are stuck with this lose-lose situation, burning precious time on what should be a straightforward task.
Enter m1f (make 1 file) - a tool that transforms how we analyze codebases by creating intelligent, topic-specific bundles. Instead of drowning in file navigation or overwhelming your AI with millions of tokens, m1f helps you go from “I need to understand this codebase” to “I have actionable insights” in minutes, not hours.
In this post, I’ll walk you through my exact workflow using the Marker project - a 224-file Python tool for PDF conversion - as a real-world example. You’ll see every step and how long each one takes.
My 15-Minute Analysis Workflow
The Complete Timeline
Here’s exactly how I analyze a 200+ file codebase:
- Minutes 1-2: Run `m1f-init` to create initial bundles
- Minutes 3-5: Run `m1f-claude --setup` for intelligent bundle creation
- Minutes 5-6: Generate the bundles with `m1f-update`
- Minutes 7-15: Upload specific bundles to Claude and get insights
Total time: 15 minutes from git clone to actionable insights
The Challenge: Why Manual Analysis Fails
When you try to analyze a codebase manually, you’re fighting multiple battles:
- Navigation Overhead: Jumping between 200+ files takes forever
- Mental Context Switching: Each file jump breaks your flow
- Missing Connections: You can’t hold the entire dependency graph in your head
- AI Token Limits: You end up cherry-picking files, missing crucial context
- Repetitive Work: Every new analysis starts from scratch
Getting Started with m1f
Let’s clone the Marker project and initialize m1f:
git clone git@github.com:datalab-to/marker.git
cd marker
m1f-init
Here’s what m1f does automatically:
🚀 m1f Project Initialization
==================================================
📋 Setting up m1f documentation...
✅ Created symlink: m1f/m1f.txt -> /home/m1f-user/git/m1f/m1f/m1f/87_m1f_only_docs.txt
✅ Git repository detected in current directory
⚠️ No m1f configuration found - will create one
📊 Project Analysis
==============================
Analyzing project structure...
✅ Found 224 files in 43 directories
📁 Project Type: Python Project
💻 Programming Languages: Python (208 files)
📦 Creating Initial Bundles
==============================
Creating complete project bundle...
✅ Created: m1f/marker_complete.txt
Creating documentation bundle...
✅ Created: m1f/marker_docs.txt
✏️ Creating .m1f.config.yml...
✅ Configuration created: .m1f.config.yml
✅ Quick Setup Complete!
📁 Here are your files:
• m1f/m1f.txt (symlink)
• m1f/marker_complete.txt
• m1f/marker_docs.txt
• .m1f.config.yml
📌 Next Steps:
1. Use 'm1f-update' to regenerate bundles after changes
2. Reference @m1f/m1f.txt in AI tools for m1f documentation
3. Preview your bundle: cat m1f/marker_complete.txt | head -50
🚀 Additional Setup Available!
For topic-specific bundles (components, API, tests, etc.), run:
m1f-claude --setup
This will:
• Analyze your project structure in detail
• Create focused bundles for different aspects
• Optimize configuration for your project type
Within seconds, m1f has analyzed the entire project structure and created two initial bundles. But how large are these files? Let’s check:
Understanding Token Limits: Why 90% Gets Left Behind
Remember our headline question? Here’s the proof. AI models have strict token limits, and most codebases far exceed them:
m1f-token-counter marker_complete.txt
The file 'marker_complete.txt' contains approximately 3385753 tokens (using 'cl100k_base' encoding).
Token Limit Reality Check
That’s 3.4 million tokens! Here’s what each AI can actually see:
- Google Gemini Pro 2.5 (1M tokens): ~30% of the codebase - still missing ~70%!
- Claude Opus 4 Code (256k): Only 7.5% of the codebase
- Claude Web (5MB): ~15-20% at best
- ChatGPT o3 (200k): Just 5.9% of the codebase
- Grok 4 (256k): Only 7.5% of the codebase
Even the best AI model misses 70% of your code. Most miss over 90%!
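Those percentages are simply the ratio of each model's context window to the bundle's 3,385,753 tokens. A quick sketch to reproduce them (window sizes as quoted above):

```python
# Context-window coverage of the 3.39M-token marker_complete.txt bundle
TOTAL_TOKENS = 3_385_753  # reported by m1f-token-counter

context_windows = {
    "Google Gemini Pro 2.5": 1_000_000,
    "Claude Opus 4": 256_000,
    "ChatGPT o3": 200_000,
    "Grok 4": 256_000,
}

for model, window in context_windows.items():
    seen = window / TOTAL_TOKENS
    print(f"{model}: sees {seen:.1%}, misses {1 - seen:.1%}")
```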
The Hidden Problem: AI Prioritizes Without Your Context
But here’s an even bigger issue that token limits don’t solve: Even if an AI could process your entire codebase, it will prioritize based on its own logic - which rarely aligns with your specific needs.
The Prioritization Challenge
Imagine dumping 3.4 million tokens on an AI and asking “Find the security issues.” The AI will:
- Apply its own prioritization criteria without explaining why
- Potentially focus on minor style issues while missing critical auth flaws
- Analyze test files and documentation with the same intensity as core logic
- Produce a response based on patterns it thinks are important - not what YOU need
The AI doesn’t tell you how or why it prioritized certain areas - and worse, its priorities might lead you in the wrong direction. That’s exactly what m1f solves by letting YOU control the focus.
Let’s check the documentation bundle:
m1f-token-counter marker_docs.txt
The file 'marker_docs.txt' contains approximately 162861 tokens (using 'cl100k_base' encoding).
Pro Tip
Still 163K tokens - quite large. This is where m1f’s intelligent bundling comes into play.
The Solution: Intelligent, Prioritized Bundles
This is where m1f’s true power shines. Instead of dumping everything on the AI, m1f creates intelligent, topic-specific bundles that group related code together. You maintain control over what the AI sees and in what context.
Automatic Bundle Creation
Let’s use Claude to intelligently split the project into focused bundles:
m1f-claude --setup
🤖 m1f Setup with Claude
==================================================
This command adds topic-specific bundles to your existing m1f setup.
✅ Prerequisites:
• Run 'm1f-init' first to create basic bundles
• Claude Code must be installed
• .m1f.config.yml should exist
📝 Project Information
==================================================
Please provide some information about your project to help create better bundles.
📋 Project Description
Describe your project briefly (what it does, main technologies):
> its a tool for converting pdfs to markdown
🎯 Project Priorities
What's important for this project? (e.g., performance, security, maintainability, documentation):
> documentation
✅ Git repository detected: /home/m1f-user/tmp/marker
✅ m1f documentation available
✅ m1f configuration found: .m1f.config.yml
✅ Claude Code is available
📊 Project Analysis
==============================
Analyzing project structure...
📄 Created file list: project_analysis_filelist.txt
📁 Created directory list: project_analysis_dirlist.txt
✅ Found 224 files in 43 directories
📁 Project Type: Not specified
💻 Languages: Python (208 files)
📝 User Description: its a tool for converting pdfs to markdown
🎯 User Priorities: documentation
🤖 Creating Topic-Specific Bundles
──────────────────────────────────────────────────
Claude will analyze your project and create focused bundles.
🤖 Sending to Claude Code...
⏳ Claude will now analyze your project and create topic-specific bundles...
⚠️ IMPORTANT: This process may take 1-3 minutes as Claude:
• Reads and analyzes all project files
• Understands your project structure
• Creates intelligent bundle configurations
🔄 Please wait while Claude works...
[ 92.7s] I've successfully created topic-specific bundles for the Marker PDF to Markdown conversion project. Here's what I've created:
[ 92.7s]
[ 92.7s] ## 📦 Created Bundles (27 total):
[ 92.7s]
[ 92.7s] ### 🚀 Quick Start & Reference
[ 92.7s] - **quick-reference.txt** - README and main conversion scripts
[ 92.7s]
[ 92.7s] ### 🔧 Core API Bundles
[ 92.7s] - **api-01-core.txt** - Core settings, models, logger, utilities
[ 92.7s] - **api-02-builders.txt** - Document builders (layout, OCR, structure)
[ 92.7s] - **api-03-converters.txt** - Format converters (PDF, table, extraction)
[ 92.7s] - **api-04-processors-core.txt** - Core text processors (non-LLM)
[ 92.7s] - **api-05-processors-llm.txt** - LLM-powered processors
[ 92.7s] - **api-06-providers.txt** - Input providers (PDF, HTML, image)
[ 92.7s] - **api-07-renderers.txt** - Output renderers (Markdown, HTML, JSON)
[ 92.7s] - **api-08-schema.txt** - Data structures and document models
[ 92.7s] - **api-09-services.txt** - External service integrations (OpenAI, Claude, Gemini)
[ 92.7s] - **api-10-config.txt** - Configuration system
[ 92.7s]
[ 92.7s] ### 📜 Scripts & Applications
[ 92.7s] - **scripts-01-cli.txt** - CLI conversion scripts
[ 92.7s] - **scripts-02-apps.txt** - Web applications (Streamlit, extraction app)
[ 92.7s]
[ 92.7s] ### 🧪 Test Bundles
[ 92.7s] - **tests-01-builders.txt** - Builder module tests
[ 92.7s] - **tests-02-processors.txt** - Processor module tests
[ 92.7s] - **tests-03-other.txt** - Tests for converters, providers, renderers
[ 92.7s]
[ 92.7s] ### 📊 Benchmarks
[ 92.7s] - **benchmarks-01-overall.txt** - Overall performance benchmarks
[ 92.7s] - **benchmarks-02-methods.txt** - Conversion methods and scorers
[ 92.7s] - **benchmarks-03-other.txt** - Table conversion and throughput tests
[ 92.7s]
[ 92.7s] ### 📁 Other Bundles
[ 92.7s] - **config-files.txt** - Configuration files (pyproject.toml, etc.)
[ 92.7s] - **examples.txt** - Example markdown and JSON outputs
[ 92.7s] - **common-errors.txt** - Placeholder for error documentation
[ 92.7s] - **best-practices.txt** - Placeholder for best practices guide
[ 92.7s]
[ 92.7s] Each bundle is designed to be under 180KB for optimal Claude Code performance. The bundles are organized by logical modules, making it easy to prov...
[ 92.7s]
[ 92.7s] To generate these bundles, run:
[ 92.7s] m1f-update
[ 92.7s]
[ 92.7s] Or to generate a specific bundle:
[ 92.7s] m1f auto-bundle api-01-core
✅ Claude completed in 92.9s
✅ Phase 1 complete: Topic-specific bundles added!
📝 Claude has analyzed your project and updated .m1f.config.yml
The Power of Intelligent Bundling
Look at what just happened! m1f didn’t just split files randomly. It:
- Grouped related functionality (all builders together, all processors together)
- Separated concerns (tests separate from implementation, benchmarks isolated)
- Created logical entry points (quick-reference.txt for getting started)
- Maintained relationships (API bundles numbered to show dependencies)
Now you can tell your AI: “Review the authentication flow” and upload just api-01-core.txt and api-09-services.txt - giving it exactly what it needs, nothing more.
Let’s continue setting up:
🔄 Phase 2: Generating bundles and verifying configuration...
⏳ Running m1f-update to generate bundles...
✅ Bundles generated successfully!
🤖 Phase 2: Claude will now verify and improve the configuration...
⏳ This includes checking bundle quality and fixing any issues...
[100.7s] Perfect! The configuration has been optimized. Here's a summary of the improvements made:
[100.7s]
[100.7s] ## ✅ Configuration Verified and Improved
[100.7s]
[100.7s] ### Changes Made:
[100.7s]
[100.7s] 1. **Removed empty bundles** (52 bytes each):
[100.7s] - Removed 'common-errors' bundle (files don't exist)
[100.7s] - Removed 'best-practices' bundle (files don't exist)
[100.7s]
[100.7s] 2. **Split large examples bundle** (was 9.4M):
[100.7s] - 'examples-markdown.txt' (628K) - Only markdown examples
[100.7s] - 'examples-json-meta.txt' (336K) - Only JSON metadata
[100.7s]
[100.7s] ### Final Bundle Organization:
[100.7s]
[100.7s] **✅ Task-focused bundles (all under 180KB ideal size):**
[100.7s] - API modules: 15K-81K each (perfectly sized)
[100.7s] - Test modules: 5K-15K each
[100.7s] - Scripts: 8K-34K each
[100.7s] - Benchmarks: 17K-26K each
[100.7s] - Config files: 7K
[100.7s] - Quick reference: 37K
[100.7s]
[100.7s] **✅ Reference bundles (under 5MB):**
[100.7s] - examples-markdown: 628K
[100.7s] - examples-json-meta: 336K
[100.7s] - marker_docs: 660K
[100.7s]
[100.7s] **✅ Complete bundle:**
[100.7s] - marker_complete: 10M (expected for full project)
[100.7s]
[100.7s] ### Key Improvements:
[100.7s] 1. All bundles generate without errors
[100.7s] 2. No bundles under 10KB (configuration error indicator)
[100.7s] 3. Task-focused bundles are ideally sized for Claude Code usage
[100.7s] 4. The large JSON example files (5.9M, 1.9M) are excluded from task bundles
[100.7s] 5. Clear, logical organization by module and purpose
[100.7s] 6. Each bundle serves a distinct purpose without excessive overlap
[100.7s]
[100.7s] The M1F configuration is now optimal for the Marker project! All bundles are appropriately sized and organized for efficient use with Claude Code a...
✅ Claude completed in 100.9s
✅ Phase 2 complete: Configuration verified and improved!
🚀 Next steps:
• Your .m1f.config.yml has been created and verified
• Run 'm1f-update' to regenerate bundles with any improvements
• Use topic-specific bundles with your AI tools
Real-World Usage: From Hours to Minutes
Let me show you exactly how intelligent bundles transform analysis time:
Example 1: Security Review (10 minutes)
# Upload these focused bundles to Claude:
- api-01-core.txt (authentication, models)
- api-09-services.txt (external API integrations)
- api-10-config.txt (configuration handling)
# Time taken: 10 minutes
# Result: Complete security analysis with actionable findings
Example 2: Performance Optimization (15 minutes)
# Upload performance-focused bundles:
- benchmarks-01-overall.txt
- api-04-processors-core.txt
- api-02-builders.txt
# Time taken: 15 minutes
# Result: Identified all bottlenecks with benchmark context
Example 3: Adding New Features (20 minutes)
# For adding new document format support:
- quick-reference.txt (understand project structure)
- api-03-converters.txt (existing converters)
- api-06-providers.txt (input handling)
# Time taken: 20 minutes
# Result: Complete implementation plan following existing patterns
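Before uploading any of these selections, it helps to sanity-check that the combined bundles actually fit the target model's context window. Here is a rough sketch using the common ~4-characters-per-token heuristic (the paths and the 200k default window are placeholders; run m1f-token-counter for exact counts):

```python
import os

def fits_context(paths, context_window=200_000, chars_per_token=4):
    """Roughly estimate whether the given bundle files fit a context window."""
    total_bytes = sum(os.path.getsize(p) for p in paths)
    est_tokens = total_bytes // chars_per_token
    return est_tokens, est_tokens <= context_window

# Example: check the security-review selection against a 200k-token window
# tokens, ok = fits_context([
#     "m1f/api-01-core.txt",
#     "m1f/api-09-services.txt",
#     "m1f/api-10-config.txt",
# ])
```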
The Human-AI Partnership
This is the key insight: m1f doesn’t replace human judgment - it amplifies it.
You decide:
- Which bundles to upload based on your task
- What context the AI needs
- How to prioritize the AI’s attention
The AI gets:
- Complete, coherent code sections
- Related files grouped together
- Just enough context to be helpful
Result: Instead of overwhelming the AI with 3.4M tokens or limiting it to random snippets, you provide exactly what’s needed for the task at hand.
To recap, the setup phase also:
- Optimized bundle sizes for AI consumption
- Removed empty placeholders
- Split oversized bundles
Let’s see the resulting files:
ls -lah --time-style='+%Y-%m-%d %H:%M' | awk 'NR>1 {print $5, $6, $7, $8}' | column -t
4.0K 2025-07-23 11:03 .
4.0K 2025-07-23 11:03 ..
38K 2025-07-23 11:03 api-01-core.txt
54K 2025-07-23 11:03 api-02-builders.txt
17K 2025-07-23 11:03 api-03-converters.txt
81K 2025-07-23 11:03 api-04-processors-core.txt
80K 2025-07-23 11:03 api-05-processors-llm.txt
42K 2025-07-23 11:03 api-06-providers.txt
31K 2025-07-23 11:03 api-07-renderers.txt
70K 2025-07-23 11:03 api-08-schema.txt
22K 2025-07-23 11:03 api-09-services.txt
15K 2025-07-23 11:03 api-10-config.txt
17K 2025-07-23 11:03 benchmarks-01-overall.txt
26K 2025-07-23 11:03 benchmarks-02-methods.txt
22K 2025-07-23 11:03 benchmarks-03-other.txt
6.8K 2025-07-23 11:03 config-files.txt
336K 2025-07-23 11:03 examples-json-meta.txt
628K 2025-07-23 11:03 examples-markdown.txt
51 2025-07-23 10:59 m1f.txt
10M 2025-07-23 11:03 marker_complete.txt
660K 2025-07-23 11:03 marker_docs.txt
797 2025-07-23 11:00 project_analysis_dirlist.txt
7.0K 2025-07-23 11:00 project_analysis_filelist.txt
764 2025-07-23 11:00 project_analysis.log
37K 2025-07-23 11:03 quick-reference.txt
34K 2025-07-23 11:03 scripts-01-cli.txt
7.9K 2025-07-23 11:03 scripts-02-apps.txt
15K 2025-07-23 11:03 tests-01-builders.txt
13K 2025-07-23 11:03 tests-02-processors.txt
5.3K 2025-07-23 11:03 tests-03-other.txt
Fine-Tuning the Configuration
Notice that while individual bundles are small (5K-81K), the complete bundle is still 10MB. Let’s investigate and optimize further using m1f-claude’s interactive mode:
m1f-claude
🤖 m1f-claude Interactive Mode
==================================================
I'll enhance your prompts with m1f knowledge!
Commands: 'help', 'context', 'examples', 'quit', '/e'
💡 Tip: Run 'm1f-link' first for better assistance!
You: check the .m1f.config.yml - the complete project has 10mb, but the bundles are very small, so I guess important files are missing. also look in m1f/
🤖 Claude is thinking...
I'll check the existing .m1f.config.yml file and the m1f directory to understand why the bundles are so small.
Now let me check what's in the m1f directory:
Let me check the size of the complete bundle and see what might be missing:
The complete bundle is 10MB. Let me check some of the smaller bundles:
Let me check what's actually in the project to understand what might be missing:
The data directory is 11MB! Let me check what's in there:
Now let me check if there are PDF files or other binary files that should be excluded but aren't:
The issue is that the 'data' directory contains 11MB of files including images and PDFs that are being included in the complete bundle. Let me update the .m1f.config.yml to exclude these binary files:
Now let me run m1f-update to regenerate the bundles:
Now let me check the new size of the complete bundle:
Still 10MB. Let me check what large files are still being included:
Let me check if poetry.lock is being included:
Let me check the bundle structure and find large files:
Let me analyze which files in the bundle are taking up the most space:
Found it! The example JSON files are huge - 'thinkpython.json' alone is 6MB. Let me update the config to exclude these:
Excellent! The complete bundle is now 1.6MB instead of 10MB. Let me verify the examples-json-meta bundle doesn't include the large files:
Good, that's also much smaller now. Let me check all bundle sizes:
Perfect! The issue has been resolved. The large example JSON files (thinkpython.json, switch_trans.json, multicolcnn.json) were being included in the bundles, making them unnecessarily large. After adding these files to the global_excludes in .m1f.config.yml:
- 'marker_complete.txt' reduced from 10MB to 1.6MB
- 'examples-json-meta.txt' now 336KB (previously much larger)
- All other bundles remain appropriately sized
The bundles now contain only the actual source code and documentation, excluding:
- Large example JSON files
- Binary files (PDFs, images, etc.)
- Lock files
- Other unnecessary large files
[✅ Response complete]
Claude automatically identified and fixed the issue: large JSON example files were bloating the bundles. After optimization, the complete bundle shrank from 10MB to 1.6MB!
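The transcript doesn't show the actual diff, but based on Claude's summary the fix amounts to a global_excludes section in .m1f.config.yml along these lines (the glob patterns are illustrative assumptions, not copied from the real config; check your generated file and the m1f docs for the exact schema):

```yaml
# Hypothetical excerpt - patterns are illustrative
global_excludes:
  - "data/**"              # 11MB of images and PDFs
  - "**/thinkpython.json"  # 6MB example output
  - "**/switch_trans.json"
  - "**/multicolcnn.json"
  - "poetry.lock"
```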
Creating Custom Bundles
Sometimes you need specific views of the codebase. Let’s create a code-only bundle:
You: Great. can you also generate a bundle where all program files are included? (only program files no documentation)
🤖 Claude is thinking...
I'll create a new bundle that includes only program files (Python code) without documentation. Let me add this to the .m1f.config.yml:
Now I'll add the code-only bundle:
Now let me regenerate the bundles:
Let me check the size and content of the new code-only bundle:
Let me verify what files are included in the code-only bundle:Perfect! The code-only bundle has been created with:
- Size: 473KB
- Contains all Python source files from the main codebase
- Excludes tests, benchmarks, data, docs, and examples directories
- Includes the main conversion scripts, marker package modules, and web apps
This bundle contains only the program files (Python code) without any documentation, test files, or example data.
[✅ Response complete]
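Again, the config edit itself isn't shown. A code-only bundle entry consistent with Claude's description might look roughly like this (key names and paths are illustrative assumptions, not the verified m1f schema):

```yaml
# Hypothetical sketch of the code-only bundle definition
bundles:
  code-only:
    output: "m1f/marker_code_only.txt"
    includes:
      - "marker/**/*.py"
    excludes:
      - "tests/**"
      - "benchmarks/**"
      - "data/**"
      - "docs/**"
```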
Let’s check the token count of our new code-only bundle:
m1f-token-counter marker_code_only.txt
The file 'marker_code_only.txt' contains approximately 101844 tokens (using 'cl100k_base' encoding).
Perfect Size
Now we have a manageable 102K token bundle containing just the code - ideal for AI analysis!
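As a cross-check, the measured count works out to about 4.8 bytes per token for this Python-heavy bundle, close to the common 4-characters-per-token rule of thumb:

```python
bundle_bytes = 473 * 1024   # 473K, as reported by ls
measured_tokens = 101_844   # from m1f-token-counter

print(round(bundle_bytes / measured_tokens, 2))  # bytes per token -> 4.76
```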
Putting It All Together: Real-World Analysis
With our optimized bundles, we can now perform meaningful analysis. Here are practical examples of questions developers actually ask:
Understanding Core Features
claude -p "Read marker_code_only.txt - How does the LLM integration work with Gemini/Claude for table merging across pages? Show me the implementation details."
Finding Specific Implementations
claude -p "Read marker_code_only.txt - Where is the table detection algorithm implemented? I need to understand how it identifies table boundaries in PDFs."
Extending the Codebase
claude -p "Read marker_code_only.txt - How would I add support for a new document format like RTF? What interfaces need to be implemented?"
Security Analysis
claude -p "Read marker_code_only.txt - Are there security risks using this program? What are the strengths and weaknesses?"
The result of the security query:
Security Analysis Result
The marker codebase is a well-architected PDF-to-markdown conversion tool with good structural design but several security vulnerabilities that need addressing, particularly around input validation, resource management, and protection against malicious PDF files.
The Power of m1f: Key Benefits
- Intelligent Organization: Claude automatically understands your project structure and creates logical bundles
- Token Optimization: Bundles are sized appropriately for AI model limits
- Flexible Views: Create custom bundles for specific analysis needs (code-only, docs-only, tests, etc.)
- Iterative Refinement: Use m1f-claude to fine-tune configurations and fix issues
- AI-Ready: Output files are perfectly formatted for consumption by Claude, GPT-4, or other LLMs
Real-World Applications
Beyond security reviews, m1f enables numerous use cases:
- Code Understanding: Quickly grasp how unfamiliar codebases work
- Technical Documentation: Generate comprehensive docs from code
- Architecture Analysis: Understand system design and dependencies
- Migration Planning: Assess effort required to port or upgrade projects
- Code Quality Review: Identify patterns, anti-patterns, and improvement opportunities
- Onboarding: Help new team members understand existing codebases faster
Conclusion: Get Your Time Back
The 224-file Marker project? Fully analyzed in 15 minutes. That includes understanding the architecture, identifying key components, and getting actionable insights.
Here’s what m1f enables:
- Complete codebase analysis: 15-20 minutes
- Security reviews: 10 minutes
- Performance audits: 15 minutes
- Architecture understanding: 10 minutes
- Feature implementation planning: 20 minutes
The key is intelligent bundling. Instead of overwhelming AI with 3.4 million tokens or cherry-picking random files, m1f creates focused bundles that give AI exactly the context it needs. You control what goes where, and the AI delivers targeted insights fast.
Next time you face a new codebase, you know the drill: m1f-init, create bundles, analyze. 15 minutes later, you’re ready to code.
Get Started with m1f
m1f Documentation
Learn all about m1f's features and capabilities
GitHub Repository
Explore the source code and contribute
Quick Start Guide
Get up and running with m1f in minutes
Advanced Bundling
Master custom bundle configurations
Speaking of Security - Is m1f Itself Secure?
Curious about m1f’s security? Just ask Claude:
claude -p "Read m1f/m1f/99_m1f_complete.txt - this is the complete codebase of m1f in one file. Analyze it and provide a concise review (1-2 paragraphs) covering: Key strengths and good practices | Major weaknesses or areas for improvement | Critical security vulnerabilities or risks | Focus on the most significant findings and their potential impact."
The beauty of m1f is that you can always verify its security yourself - the entire codebase is transparent and AI-analyzable. Try it with your preferred AI assistant!
Related Posts
The Irony of Claude Code: When AI Needs to Google Itself
A tale of how a multi-billion dollar AI needs to look up its own documentation online – and how m1f helps solve this existential crisis.
Introducing m1f: Your AI's Bridge to Complex Codebases
Transform thousands of project files into AI-ready bundles. Because uploading files one-by-one to ChatGPT isn't how we should work in 2025.