How I Analyze 200+ File Codebases in Minutes, Not Hours

Using m1f's Intelligent Bundles to Speed Up Code Analysis by 10x

Oh, a new tool - let's deploy it right away! Oh, 3 security vulnerabilities found in your codebase!?

Last week, I analyzed a 224-file Python codebase in 15 minutes. Here’s exactly how I did it using m1f’s intelligent bundles to give AI precisely the context it needs for fast, comprehensive analysis.

Here’s the brutal truth about analyzing large codebases: You’re either spending hours navigating through hundreds of files, trying to understand how everything connects, or you’re feeding your AI random snippets and getting shallow, incomplete answers. Most developers are stuck with this lose-lose situation, burning precious time on what should be a straightforward task.

Enter m1f (make 1 file) - a tool that transforms how we analyze codebases by creating intelligent, topic-specific bundles. Instead of drowning in file navigation or overwhelming your AI with millions of tokens, m1f helps you go from “I need to understand this codebase” to “I have actionable insights” in minutes, not hours.

In this post, I’ll walk you through my exact workflow using the Marker project - a 224-file Python tool for PDF conversion - as a real-world example. You’ll see every step and how long each one takes.

My 15-Minute Analysis Workflow

The Complete Timeline

Here’s exactly how I analyze a 200+ file codebase:

  • Minutes 1-2: Run m1f-init to create initial bundles
  • Minutes 3-5: Run m1f-claude --setup for intelligent bundle creation
  • Minutes 5-6: Generate the bundles with m1f-update
  • Minutes 7-15: Upload specific bundles to Claude and get insights

Total time: 15 minutes from git clone to actionable insights

The Challenge: Why Manual Analysis Fails

When you try to analyze a codebase manually, you’re fighting multiple battles:

  • Navigation Overhead: Jumping between 200+ files takes forever
  • Mental Context Switching: Each file jump breaks your flow
  • Missing Connections: You can’t hold the entire dependency graph in your head
  • AI Token Limits: You end up cherry-picking files, missing crucial context
  • Repetitive Work: Every new analysis starts from scratch

Getting Started with m1f

Let’s clone the Marker project and initialize m1f:

bash
git clone git@github.com:datalab-to/marker.git
cd marker
m1f-init

Here’s what m1f does automatically:

text
🚀 m1f Project Initialization
==================================================

📋 Setting up m1f documentation...
✅ Created symlink: m1f/m1f.txt -> /home/m1f-user/git/m1f/m1f/m1f/87_m1f_only_docs.txt
✅ Git repository detected in current directory
⚠️   No m1f configuration found - will create one

📊 Project Analysis
==============================
Analyzing project structure...
✅ Found 224 files in 43 directories
📁 Project Type: Python Project
💻 Programming Languages: Python (208 files)

📦 Creating Initial Bundles
==============================
Creating complete project bundle...
✅ Created: m1f/marker_complete.txt
Creating documentation bundle...
✅ Created: m1f/marker_docs.txt

✏️ Creating .m1f.config.yml...
✅ Configuration created: .m1f.config.yml

✅ Quick Setup Complete!

📁 Here are your files:

 • m1f/m1f.txt (symlink)
 • m1f/marker_complete.txt
 • m1f/marker_docs.txt
 • .m1f.config.yml

📌 Next Steps:
1. Use 'm1f-update' to regenerate bundles after changes
2. Reference @m1f/m1f.txt in AI tools for m1f documentation
3. Preview your bundle: cat m1f/marker_complete.txt | head -50

🚀 Additional Setup Available!
For topic-specific bundles (components, API, tests, etc.), run:
m1f-claude --setup

This will:
• Analyze your project structure in detail
• Create focused bundles for different aspects
• Optimize configuration for your project type

Within seconds, m1f has analyzed the entire project structure and created two initial bundles. But how large are these files? Let’s check:

Oh, the codebase is only 10 MB - let's read it right away!

Understanding Token Limits: Why 90% Gets Left Behind

Here’s why you can’t simply hand the complete bundle to an AI. Models have strict token limits, and most codebases far exceed them:

bash
m1f-token-counter marker_complete.txt
text
The file 'marker_complete.txt' contains approximately 3385753 tokens (using 'cl100k_base' encoding).

Token Limit Reality Check

That’s 3.4 million tokens! Here’s what each AI can actually see:

  • Google Gemini 2.5 Pro (1M tokens): 31% of the codebase - still missing 69%!
  • Claude Opus 4 Code (256k): Only 7.5% of the codebase
  • Claude Web (5MB): ~15-20% at best
  • ChatGPT o3 (200k): Just 5.9% of the codebase
  • Grok 4 (256k): Only 7.5% of the codebase

Even the model with the largest context window misses about 70% of this codebase. Most miss over 90%!
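If you want to redo this arithmetic for your own project, here is a minimal sketch. The token total comes from the m1f-token-counter output above; the window sizes are the ones quoted in this post, and I’m assuming “1M” means 1,048,576 tokens and “256k” means 256,000. Rounding differs slightly from the list above (256k comes out at about 7.6%).

```python
# Token count reported by m1f-token-counter for marker_complete.txt
TOTAL_TOKENS = 3_385_753

# Context windows as quoted in this post (assumed sizes, not checked against vendor docs)
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro (1M)": 1_048_576,
    "Claude Opus 4 Code (256k)": 256_000,
    "ChatGPT o3 (200k)": 200_000,
    "Grok 4 (256k)": 256_000,
}


def coverage(window_tokens: int, total_tokens: int = TOTAL_TOKENS) -> float:
    """Fraction of the bundle that fits in one context window, capped at 1.0."""
    return min(1.0, window_tokens / total_tokens)


def coverage_report() -> dict[str, float]:
    """Percentage of the bundle each model could see, rounded to one decimal."""
    return {
        name: round(coverage(size) * 100, 1)
        for name, size in CONTEXT_WINDOWS.items()
    }


if __name__ == "__main__":
    for name, pct in coverage_report().items():
        print(f"{name}: {pct}% visible, {round(100 - pct, 1)}% missing")
```

Swap in your own token count and the windows of the models you actually use.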

The Hidden Problem: AI Prioritizes Without Your Context

But here’s an even bigger issue that token limits don’t solve: Even if an AI could process your entire codebase, it will prioritize based on its own logic - which rarely aligns with your specific needs.

The Prioritization Challenge

Imagine dumping 3.4 million tokens on an AI and asking “Find the security issues.” The AI will:

  • Apply its own prioritization criteria without explaining why
  • Might focus on minor style issues while missing critical auth flaws
  • Analyze test files and documentation with the same intensity as core logic
  • Produce a response based on patterns it thinks are important - not what YOU need

The AI doesn’t tell you how or why it prioritized certain areas - and worse, its priorities might lead you in the wrong direction. That’s exactly what m1f solves by letting YOU control the focus.

Let’s check the documentation bundle:

bash
m1f-token-counter marker_docs.txt
text
The file 'marker_docs.txt' contains approximately 162861 tokens (using 'cl100k_base' encoding).

Pro Tip

Still 163K tokens - quite large. This is where m1f’s intelligent bundling comes into play.

The Solution: Intelligent, Prioritized Bundles

This is where m1f’s true power shines. Instead of dumping everything on the AI, m1f creates intelligent, topic-specific bundles that group related code together. You maintain control over what the AI sees and in what context.

Automatic Bundle Creation

Let’s use Claude to intelligently split the project into focused bundles:

bash
m1f-claude --setup
text
🤖 m1f Setup with Claude
==================================================

This command adds topic-specific bundles to your existing m1f setup.

✅ Prerequisites:
• Run 'm1f-init' first to create basic bundles
• Claude Code must be installed
• .m1f.config.yml should exist


📝 Project Information
==================================================
Please provide some information about your project to help create better bundles.

📋 Project Description
Describe your project briefly (what it does, main technologies):
> its a tool for converting pdfs to markdown

🎯 Project Priorities
What's important for this project? (e.g., performance, security, maintainability, documentation):
> documentation

✅ Git repository detected: /home/m1f-user/tmp/marker
✅ m1f documentation available
✅ m1f configuration found: .m1f.config.yml
✅ Claude Code is available

📊 Project Analysis
==============================
Analyzing project structure...
📄 Created file list: project_analysis_filelist.txt
📁 Created directory list: project_analysis_dirlist.txt
✅ Found 224 files in 43 directories
📁 Project Type: Not specified
💻 Languages: Python (208 files)

📝 User Description: its a tool for converting pdfs to markdown
🎯 User Priorities: documentation

🤖 Creating Topic-Specific Bundles
──────────────────────────────────────────────────
Claude will analyze your project and create focused bundles.

🤖 Sending to Claude Code...
⏳ Claude will now analyze your project and create topic-specific bundles...

⚠️   IMPORTANT: This process may take 1-3 minutes as Claude:
 • Reads and analyzes all project files
 • Understands your project structure
 • Creates intelligent bundle configurations

🔄 Please wait while Claude works...

[ 92.7s] I've successfully created topic-specific bundles for the Marker PDF to Markdown conversion project. Here's what I've created:
[ 92.7s]
[ 92.7s] ## 📦 Created Bundles (27 total):
[ 92.7s]
[ 92.7s] ### 🚀 Quick Start & Reference
[ 92.7s] - **quick-reference.txt** - README and main conversion scripts
[ 92.7s]
[ 92.7s] ### 🔧 Core API Bundles
[ 92.7s] - **api-01-core.txt** - Core settings, models, logger, utilities
[ 92.7s] - **api-02-builders.txt** - Document builders (layout, OCR, structure)
[ 92.7s] - **api-03-converters.txt** - Format converters (PDF, table, extraction)
[ 92.7s] - **api-04-processors-core.txt** - Core text processors (non-LLM)
[ 92.7s] - **api-05-processors-llm.txt** - LLM-powered processors
[ 92.7s] - **api-06-providers.txt** - Input providers (PDF, HTML, image)
[ 92.7s] - **api-07-renderers.txt** - Output renderers (Markdown, HTML, JSON)
[ 92.7s] - **api-08-schema.txt** - Data structures and document models
[ 92.7s] - **api-09-services.txt** - External service integrations (OpenAI, Claude, Gemini)
[ 92.7s] - **api-10-config.txt** - Configuration system
[ 92.7s]
[ 92.7s] ### 📜 Scripts & Applications
[ 92.7s] - **scripts-01-cli.txt** - CLI conversion scripts
[ 92.7s] - **scripts-02-apps.txt** - Web applications (Streamlit, extraction app)
[ 92.7s]
[ 92.7s] ### 🧪 Test Bundles
[ 92.7s] - **tests-01-builders.txt** - Builder module tests
[ 92.7s] - **tests-02-processors.txt** - Processor module tests
[ 92.7s] - **tests-03-other.txt** - Tests for converters, providers, renderers
[ 92.7s]
[ 92.7s] ### 📊 Benchmarks
[ 92.7s] - **benchmarks-01-overall.txt** - Overall performance benchmarks
[ 92.7s] - **benchmarks-02-methods.txt** - Conversion methods and scorers
[ 92.7s] - **benchmarks-03-other.txt** - Table conversion and throughput tests
[ 92.7s]
[ 92.7s] ### 📁 Other Bundles
[ 92.7s] - **config-files.txt** - Configuration files (pyproject.toml, etc.)
[ 92.7s] - **examples.txt** - Example markdown and JSON outputs
[ 92.7s] - **common-errors.txt** - Placeholder for error documentation
[ 92.7s] - **best-practices.txt** - Placeholder for best practices guide
[ 92.7s]
[ 92.7s] Each bundle is designed to be under 180KB for optimal Claude Code performance. The bundles are organized by logical modules, making it easy to prov...
[ 92.7s]
[ 92.7s] To generate these bundles, run:
[ 92.7s]   m1f-update
[ 92.7s]
[ 92.7s] Or to generate a specific bundle:
[ 92.7s]   m1f auto-bundle api-01-core

✅ Claude completed in 92.9s

✅ Phase 1 complete: Topic-specific bundles added!
📝 Claude has analyzed your project and updated .m1f.config.yml

The Power of Intelligent Bundling

Look at what just happened! m1f didn’t just split files randomly. It:

  • Grouped related functionality (all builders together, all processors together)
  • Separated concerns (tests separate from implementation, benchmarks isolated)
  • Created logical entry points (quick-reference.txt for getting started)
  • Maintained relationships (API bundles numbered to show dependencies)

Now you can tell your AI: “Review how external LLM services are configured and called” and upload just api-01-core.txt and api-09-services.txt - giving it exactly what it needs, nothing more.
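To make the grouping idea concrete, here is a deliberately naive sketch that buckets files by their parent directories. This is not how m1f-claude or Claude actually decides on bundles (that step is AI-driven and I haven’t inspected it); the function name and the sample paths are purely illustrative.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_directory(paths: list[str], depth: int = 2) -> dict[str, list[str]]:
    """Group file paths by up to `depth` leading directory components.

    Files that sit directly in the project root land in a "root" group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        parent_parts = PurePosixPath(p).parts[:-1]  # drop the file name
        key = "/".join(parent_parts[:depth]) or "root"
        groups[key].append(p)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative paths only, not a listing of the real repository
    sample = [
        "marker/builders/layout.py",
        "marker/builders/ocr.py",
        "marker/processors/table.py",
        "tests/builders/test_layout.py",
        "README.md",
    ]
    for bundle, files in group_by_directory(sample).items():
        print(bundle, "->", files)
```

Even this crude version keeps builders with builders and tests away from implementation, which is the property that makes a bundle useful as AI context.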

Let’s continue setting up:

text
🔄 Phase 2: Generating bundles and verifying configuration...
⏳ Running m1f-update to generate bundles...
✅ Bundles generated successfully!

🤖 Phase 2: Claude will now verify and improve the configuration...
⏳ This includes checking bundle quality and fixing any issues...

[100.7s] Perfect! The configuration has been optimized. Here's a summary of the improvements made:
[100.7s]
[100.7s] ## ✅ Configuration Verified and Improved
[100.7s]
[100.7s] ### Changes Made:
[100.7s]
[100.7s] 1. **Removed empty bundles** (52 bytes each):
[100.7s]    - Removed 'common-errors' bundle (files don't exist)
[100.7s]    - Removed 'best-practices' bundle (files don't exist)
[100.7s]
[100.7s] 2. **Split large examples bundle** (was 9.4M):
[100.7s]    - 'examples-markdown.txt' (628K) - Only markdown examples
[100.7s]    - 'examples-json-meta.txt' (336K) - Only JSON metadata
[100.7s]
[100.7s] ### Final Bundle Organization:
[100.7s]
[100.7s] **✅ Task-focused bundles (all under 180KB ideal size):**
[100.7s] - API modules: 15K-81K each (perfectly sized)
[100.7s] - Test modules: 5K-15K each
[100.7s] - Scripts: 8K-34K each
[100.7s] - Benchmarks: 17K-26K each
[100.7s] - Config files: 7K
[100.7s] - Quick reference: 37K
[100.7s]
[100.7s] **✅ Reference bundles (under 5MB):**
[100.7s] - examples-markdown: 628K
[100.7s] - examples-json-meta: 336K
[100.7s] - marker_docs: 660K
[100.7s]
[100.7s] **✅ Complete bundle:**
[100.7s] - marker_complete: 10M (expected for full project)
[100.7s]
[100.7s] ### Key Improvements:
[100.7s] 1. All bundles generate without errors
[100.7s] 2. No bundles under 10KB (configuration error indicator)
[100.7s] 3. Task-focused bundles are ideally sized for Claude Code usage
[100.7s] 4. The large JSON example files (5.9M, 1.9M) are excluded from task bundles
[100.7s] 5. Clear, logical organization by module and purpose
[100.7s] 6. Each bundle serves a distinct purpose without excessive overlap
[100.7s]
[100.7s] The M1F configuration is now optimal for the Marker project! All bundles are appropriately sized and organized for efficient use with Claude Code a...

✅ Claude completed in 100.9s

✅ Phase 2 complete: Configuration verified and improved!

🚀 Next steps:
• Your .m1f.config.yml has been created and verified
• Run 'm1f-update' to regenerate bundles with any improvements
• Use topic-specific bundles with your AI tools

Real-World Usage: From Hours to Minutes

Let me show you exactly how intelligent bundles transform analysis time:

Example 1: Security Review (10 minutes)

text
# Upload these focused bundles to Claude:
- api-01-core.txt (core settings, models, utilities)
- api-09-services.txt (external API integrations)
- api-10-config.txt (configuration handling)

# Time taken: 10 minutes
# Result: Complete security analysis with actionable findings

Example 2: Performance Optimization (15 minutes)

text
# Upload performance-focused bundles:
- benchmarks-01-overall.txt
- api-04-processors-core.txt
- api-02-builders.txt

# Time taken: 15 minutes
# Result: Identified all bottlenecks with benchmark context

Example 3: Adding New Features (20 minutes)

text
# For adding new document format support:
- quick-reference.txt (understand project structure)
- api-03-converters.txt (existing converters)
- api-06-providers.txt (input handling)

# Time taken: 20 minutes
# Result: Complete implementation plan following existing patterns

The Human-AI Partnership

This is the key insight: m1f doesn’t replace human judgment - it amplifies it.

You decide:

  • Which bundles to upload based on your task
  • What context the AI needs
  • How to prioritize the AI’s attention

The AI gets:

  • Complete, coherent code sections
  • Related files grouped together
  • Just enough context to be helpful

Result: Instead of overwhelming the AI with 3.4M tokens or limiting it to random snippets, you provide exactly what’s needed for the task at hand.
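Before uploading a handful of bundles, it helps to check that they fit the model’s window. The sketch below estimates tokens from file size alone. The 4-bytes-per-token ratio is my assumption, derived from marker_docs.txt (660K on disk, roughly 163K tokens); for exact numbers, run m1f-token-counter on each file instead. The function names are hypothetical.

```python
import os

# Assumption: about 4 bytes per token, based on marker_docs.txt
# (660K on disk, ~163K tokens). Real ratios vary with content.
BYTES_PER_TOKEN = 4.0


def estimate_tokens(size_bytes: int, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Rough token estimate from a file size in bytes."""
    return int(size_bytes / bytes_per_token)


def fits_budget(bundle_paths: list[str], budget_tokens: int) -> tuple[int, bool]:
    """Return (estimated total tokens, whether the total fits the budget)."""
    total = sum(estimate_tokens(os.path.getsize(p)) for p in bundle_paths)
    return total, total <= budget_tokens
```

For example, `fits_budget(["api-01-core.txt", "api-09-services.txt"], 200_000)` tells you whether that pair is safe for a 200k-token model, leaving you to decide how much headroom to keep for the conversation itself.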

Back in the setup run, Phase 2 tidied the configuration in three ways:

  • Optimized bundle sizes for AI consumption
  • Removed empty placeholders
  • Split oversized bundles

Let’s see the results:

bash
ls -lah --time-style='+%Y-%m-%d %H:%M' | awk 'NR>1 {print $5, $6, $7, $8}' | column -t
text
4.0K  2025-07-23  11:03  .
4.0K  2025-07-23  11:03  ..
38K   2025-07-23  11:03  api-01-core.txt
54K   2025-07-23  11:03  api-02-builders.txt
17K   2025-07-23  11:03  api-03-converters.txt
81K   2025-07-23  11:03  api-04-processors-core.txt
80K   2025-07-23  11:03  api-05-processors-llm.txt
42K   2025-07-23  11:03  api-06-providers.txt
31K   2025-07-23  11:03  api-07-renderers.txt
70K   2025-07-23  11:03  api-08-schema.txt
22K   2025-07-23  11:03  api-09-services.txt
15K   2025-07-23  11:03  api-10-config.txt
17K   2025-07-23  11:03  benchmarks-01-overall.txt
26K   2025-07-23  11:03  benchmarks-02-methods.txt
22K   2025-07-23  11:03  benchmarks-03-other.txt
6.8K  2025-07-23  11:03  config-files.txt
336K  2025-07-23  11:03  examples-json-meta.txt
628K  2025-07-23  11:03  examples-markdown.txt
51    2025-07-23  10:59  m1f.txt
10M   2025-07-23  11:03  marker_complete.txt
660K  2025-07-23  11:03  marker_docs.txt
797   2025-07-23  11:00  project_analysis_dirlist.txt
7.0K  2025-07-23  11:00  project_analysis_filelist.txt
764   2025-07-23  11:00  project_analysis.log
37K   2025-07-23  11:03  quick-reference.txt
34K   2025-07-23  11:03  scripts-01-cli.txt
7.9K  2025-07-23  11:03  scripts-02-apps.txt
15K   2025-07-23  11:03  tests-01-builders.txt
13K   2025-07-23  11:03  tests-02-processors.txt
5.3K  2025-07-23  11:03  tests-03-other.txt

Fine-Tuning the Configuration

Notice that while individual bundles are small (5K-81K), the complete bundle is still 10MB. Let’s investigate and optimize further using m1f-claude’s interactive mode:

bash
m1f-claude
text
🤖 m1f-claude Interactive Mode
==================================================
I'll enhance your prompts with m1f knowledge!
Commands: 'help', 'context', 'examples', 'quit', '/e'

💡 Tip: Run 'm1f-link' first for better assistance!


You: check the .m1f.config.yml - the complete project has 10mb, but the bundles are very small, so I guess important files are missing. also look in m1f/

🤖 Claude is thinking...
I'll check the existing .m1f.config.yml file and the m1f directory to understand why the bundles are so small.
Now let me check what's in the m1f directory:
Let me check the size of the complete bundle and see what might be missing:
The complete bundle is 10MB. Let me check some of the smaller bundles:
Let me check what's actually in the project to understand what might be missing:
The data directory is 11MB! Let me check what's in there:
Now let me check if there are PDF files or other binary files that should be excluded but aren't:
The issue is that the 'data' directory contains 11MB of files including images and PDFs that are being included in the complete bundle. Let me update the .m1f.config.yml to exclude these binary files:
Now let me run m1f-update to regenerate the bundles:
Now let me check the new size of the complete bundle:
Still 10MB. Let me check what large files are still being included:
Let me check if poetry.lock is being included:
Let me check the bundle structure and find large files:
Let me analyze which files in the bundle are taking up the most space:
Found it! The example JSON files are huge - 'thinkpython.json' alone is 6MB. Let me update the config to exclude these:
Excellent! The complete bundle is now 1.6MB instead of 10MB. Let me verify the examples-json-meta bundle doesn't include the large files:
Good, that's also much smaller now. Let me check all bundle sizes:
Perfect! The issue has been resolved. The large example JSON files (thinkpython.json, switch_trans.json, multicolcnn.json) were being included in the bundles, making them unnecessarily large. After adding these files to the global_excludes in .m1f.config.yml:

- 'marker_complete.txt' reduced from 10MB to 1.6MB
- 'examples-json-meta.txt' reduced from 336KB (previously much larger)
- All other bundles remain appropriately sized

The bundles now contain only the actual source code and documentation, excluding:
- Large example JSON files
- Binary files (PDFs, images, etc.)
- Lock files
- Other unnecessary large files
[✅ Response complete]
Claude:

Claude automatically identified and fixed the issue: large JSON example files were bloating the bundles. After optimization, the complete bundle shrank from 10MB to 1.6MB!
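If you’d rather hunt for the bloat yourself, the same diagnosis boils down to “list the biggest files in the project”. Here is a small sketch of that idea; it is my own helper, not part of m1f, and skipping `.git` and symlinks is an assumption about what you’d want excluded.

```python
import os


def largest_files(root: str, top_n: int = 5) -> list[tuple[int, str]]:
    """Return the `top_n` largest files under `root` as (size_bytes, relative_path)."""
    sizes: list[tuple[int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk does not descend into it
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks (like m1f/m1f.txt) are not real content
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    # Largest first; ties fall back to path order
    return sorted(sizes, reverse=True)[:top_n]
```

Running `largest_files(".")` in the Marker checkout should surface the oversized example JSON files near the top, which are then candidates for global_excludes.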

Creating Custom Bundles

Sometimes you need specific views of the codebase. Let’s create a code-only bundle:

text
You: Great. can you also generate a bundle where all program files are included? (only program files no documentation)

🤖 Claude is thinking...
I'll create a new bundle that includes only program files (Python code) without documentation. Let me add this to the .m1f.config.yml:
Now I'll add the code-only bundle:
Now let me regenerate the bundles:
Let me check the size and content of the new code-only bundle:
Let me verify what files are included in the code-only bundle:
Perfect! The code-only bundle has been created with:
- Size: 473KB
- Contains all Python source files from the main codebase
- Excludes tests, benchmarks, data, docs, and examples directories
- Includes the main conversion scripts, marker package modules, and web apps

This bundle contains only the program files (Python code) without any documentation, test files, or example data.
[✅ Response complete]

Let’s check the token count of our new code-only bundle:

bash
m1f-token-counter marker_code_only.txt
text
The file 'marker_code_only.txt' contains approximately 101844 tokens (using 'cl100k_base' encoding).

Perfect Size

Now we have a manageable 102K token bundle containing just the code - ideal for AI analysis!
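To see what “code-only” means in practice, here is a rough sketch of the selection rule Claude described: keep Python files, skip anything under tests, benchmarks, data, docs, or examples. It only lists the files that would qualify; the real bundle is built by m1f from .m1f.config.yml, and this helper is hypothetical.

```python
from pathlib import Path

# Directory names Claude reported excluding from the code-only bundle
EXCLUDED_DIRS = {"tests", "benchmarks", "data", "docs", "examples"}


def code_only_files(root: str) -> list[str]:
    """List Python files under `root`, skipping the excluded directories."""
    root_path = Path(root)
    selected = []
    for path in sorted(root_path.rglob("*.py")):
        rel = path.relative_to(root_path)
        # rel.parts[:-1] are the directories above the file
        if EXCLUDED_DIRS.isdisjoint(rel.parts[:-1]):
            selected.append(rel.as_posix())
    return selected
```

Comparing this list against the files in marker_code_only.txt is a quick sanity check that nothing important was left out.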

Putting It All Together: Real-World Analysis

With our optimized bundles, we can now perform meaningful analysis. Here are practical examples of questions developers actually ask:

Understanding Core Features

bash
claude -p "Read marker_code_only.txt - How does the LLM integration work with Gemini/Claude for table merging across pages? Show me the implementation details."

Finding Specific Implementations

bash
claude -p "Read marker_code_only.txt - Where is the table detection algorithm implemented? I need to understand how it identifies table boundaries in PDFs."

Extending the Codebase

bash
claude -p "Read marker_code_only.txt - How would I add support for a new document format like RTF? What interfaces need to be implemented?"

Security Analysis

bash
claude -p "Read marker_code_only.txt - Are there security risks using this program? What are the strengths and weaknesses?"

The result of the security query:

Security Analysis Result

The marker codebase is a well-architected PDF-to-markdown conversion tool with good structural design but several security vulnerabilities that need addressing, particularly around input validation, resource management, and protection against malicious PDF files.
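The queries above all follow the same pattern: `claude -p` plus a prompt that names the bundle and asks one question. If you ask the same set of questions about every new codebase, a tiny helper can generate the commands. This is a sketch; the helper name and the shortened question list are mine, and it only prints the commands rather than running them.

```python
import shlex

BUNDLE = "marker_code_only.txt"

# Shortened versions of the questions used in this post
QUESTIONS = [
    "Where is the table detection algorithm implemented?",
    "How would I add support for a new document format like RTF?",
    "Are there security risks using this program?",
]


def build_claude_command(bundle: str, question: str) -> list[str]:
    """Build the argument list for one non-interactive `claude -p` query."""
    return ["claude", "-p", f"Read {bundle} - {question}"]


if __name__ == "__main__":
    for q in QUESTIONS:
        # Print a copy-pasteable shell command instead of executing it
        print(shlex.join(build_claude_command(BUNDLE, q)))
```

If you do want to run them from Python, the argument list can go straight into `subprocess.run(cmd, capture_output=True, text=True)`, assuming the `claude` CLI is installed and on your PATH.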

The Power of m1f: Key Benefits

  1. Intelligent Organization: Claude automatically understands your project structure and creates logical bundles
  2. Token Optimization: Bundles are sized appropriately for AI model limits
  3. Flexible Views: Create custom bundles for specific analysis needs (code-only, docs-only, tests, etc.)
  4. Iterative Refinement: Use m1f-claude to fine-tune configurations and fix issues
  5. AI-Ready: Output files are perfectly formatted for consumption by Claude, GPT-4, or other LLMs

Real-World Applications

Beyond security reviews, m1f enables numerous use cases:

  • Code Understanding: Quickly grasp how unfamiliar codebases work
  • Technical Documentation: Generate comprehensive docs from code
  • Architecture Analysis: Understand system design and dependencies
  • Migration Planning: Assess effort required to port or upgrade projects
  • Code Quality Review: Identify patterns, anti-patterns, and improvement opportunities
  • Onboarding: Help new team members understand existing codebases faster

Conclusion: Get Your Time Back

The 224-file Marker project? Fully analyzed in 15 minutes. That includes understanding the architecture, identifying key components, and getting actionable insights.

Here’s what m1f enables:

  • Complete codebase analysis: 15-20 minutes
  • Security reviews: 10 minutes
  • Performance audits: 15 minutes
  • Architecture understanding: 10 minutes
  • Feature implementation planning: 20 minutes

The key is intelligent bundling. Instead of overwhelming AI with 3.4 million tokens or cherry-picking random files, m1f creates focused bundles that give AI exactly the context it needs. You control what goes where, and the AI delivers targeted insights fast.

Next time you face a new codebase, you know the drill: m1f-init, create bundles, analyze. 15 minutes later, you’re ready to code.

Get Started with m1f

m1f Documentation

Learn all about m1f's features and capabilities

GitHub Repository

Explore the source code and contribute

Quick Start Guide

Get up and running with m1f in minutes

Advanced Bundling

Master custom bundle configurations

Speaking of Security - Is m1f Itself Secure?

Curious about m1f’s security? Just ask Claude:

bash
claude -p "Read m1f/m1f/99_m1f_complete.txt - this is the complete codebase of m1f in one file. Analyze it and provide a concise review (1-2 paragraphs) covering: Key strengths and good practices | Major weaknesses or areas for improvement | Critical security vulnerabilities or risks | Focus on the most significant findings and their potential impact."

The beauty of m1f is that you can always verify its security yourself - the entire codebase is transparent and AI-analyzable. Try it with your preferred AI assistant!
