HTML2MD Test Suite Documentation
A comprehensive test suite for validating the html2md converter (v2.0.0) with challenging real-world HTML structures.
Overview
The HTML2MD test suite provides a robust testing framework consisting of:
- A Flask-based web server serving complex HTML test pages
- Comprehensive pytest test cases covering all conversion features including async operations
- Real-world documentation examples with challenging HTML structures
- Automated test runner with coverage reporting
- Full support for testing async/await patterns and parallel processing
Architecture
tests/
├── html2md_server/
│   ├── server.py              # Flask test server
│   ├── requirements.txt       # Test suite dependencies
│   ├── run_tests.sh           # Automated test runner
│   ├── README.md              # Test suite documentation
│   ├── static/
│   │   ├── css/
│   │   │   └── modern.css     # Modern CSS with dark mode
│   │   └── js/
│   │       └── main.js        # Interactive features
│   └── test_pages/
│       ├── index.html         # Test suite homepage
│       ├── m1f-documentation.html
│       ├── html2md-documentation.html
│       ├── complex-layout.html
│       ├── code-examples.html
│       └── ...                # Additional test pages
└── test_html2md_server.py     # Pytest test cases
Test Server
Features
- Modern Flask-based web server
- RESTful API endpoints for test page discovery
- CORS enabled for cross-origin testing
- Dynamic page generation support
- Static asset serving with proper MIME types
Running the Server
# Start server on default port 8080
python tests/html2md_server/server.py
# Server provides:
# - http://localhost:8080/ # Test suite homepage
# - http://localhost:8080/page/{name} # Individual test pages
# - http://localhost:8080/api/test-pages # JSON API
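The endpoints listed above can be reached from Python with the standard library alone. The sketch below is illustrative: the helper names `page_url` and `list_test_pages` are hypothetical, page names are assumed to be passed without the `.html` extension, and the shape of the JSON returned by `/api/test-pages` is not specified here, so it is returned unmodified.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8080/"  # default port from the section above


def page_url(name, base_url=BASE_URL):
    """Build the URL of an individual test page (hypothetical helper)."""
    return urljoin(base_url, "page/" + quote(name))


def list_test_pages(base_url=BASE_URL, timeout=5):
    """Fetch the JSON API and return the parsed payload as-is.

    The payload structure is an assumption left open on purpose;
    inspect the result before relying on specific keys.
    """
    with urlopen(urljoin(base_url, "api/test-pages"), timeout=timeout) as response:
        return json.load(response)
```

With the server running, `list_test_pages()` should return whatever the API exposes, and `page_url("code-examples")` gives the address of one page.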
Test Pages
1. M1F Documentation (m1f-documentation.html)
Tests real documentation conversion with:
- Complex heading hierarchies
- Code examples in multiple languages
- Nested structures and feature grids
- Command-line documentation tables
- Advanced layout with inline styles
2. HTML2MD Documentation (html2md-documentation.html)
Comprehensive documentation page testing:
- Multi-level navigation structures
- API documentation with code examples
- Complex tables and option grids
- Details/Summary elements
- Sidebar navigation
3. Complex Layout Test (complex-layout.html)
CSS layout challenges:
- Flexbox layouts: Multi-item flex containers
- CSS Grid: Complex grid with spanning items
- Nested structures: Up to 4 levels deep
- Positioning: Absolute, relative, sticky elements
- Multi-column layouts: CSS columns with rules
- Masonry layouts: Pinterest-style card layouts
- Overflow containers: Scrollable areas
4. Code Examples Test (code-examples.html)
Programming language support:
- Languages tested: Python, TypeScript, JavaScript, Bash, SQL, Go, Rust
- Inline code: Mixed with regular text
- Code with special characters: HTML entities, Unicode
- Configuration files: YAML, JSON examples
- Edge cases: Empty blocks, long lines, whitespace-only
5. Additional Test Pages (Planned)
- Edge Cases: Malformed HTML, special characters
- Modern Features: HTML5 elements, web components
- Tables and Lists: Complex nested structures
- Multimedia: Images, videos, iframes
Test Suite Features
Content Selection Testing
# Test CSS selector-based extraction (v2.0.0 async API)
from tools.html2md.api import HTML2MDConverter
import asyncio
converter = HTML2MDConverter(
    outermost_selector="article",
    ignore_selectors=["nav", ".sidebar", "footer"],
)
# Async conversion
result = asyncio.run(converter.convert_file("test.html"))
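To make the idea behind `outermost_selector` and `ignore_selectors` concrete, here is a deliberately simplified illustration using only the standard library. It is not how html2md implements selection: it matches plain tag names rather than CSS selectors, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SelectiveTextExtractor(HTMLParser):
    """Collect text inside one outer tag while skipping ignored tags.

    Simplification: only tag names are matched, not CSS selectors.
    """

    def __init__(self, outermost_tag, ignore_tags):
        super().__init__()
        self.outermost_tag = outermost_tag
        self.ignore_tags = set(ignore_tags)
        self.outer_depth = 0   # > 0 while inside the outermost tag
        self.ignore_depth = 0  # > 0 while inside any ignored tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.outermost_tag:
            self.outer_depth += 1
        elif tag in self.ignore_tags:
            self.ignore_depth += 1

    def handle_endtag(self, tag):
        if tag == self.outermost_tag and self.outer_depth:
            self.outer_depth -= 1
        elif tag in self.ignore_tags and self.ignore_depth:
            self.ignore_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside the outer tag and outside ignored tags.
        if self.outer_depth and not self.ignore_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, outermost_tag="article", ignore_tags=("nav", "footer")):
    parser = SelectiveTextExtractor(outermost_tag, ignore_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Content outside the outer element is dropped entirely, and ignored elements are removed even when they are nested inside it, which is the behaviour the selection tests are meant to confirm.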
Code Block Detection
- Automatic language detection from class names
- Preservation of syntax highlighting hints
- Special character handling in code
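Language detection from class names can be sketched as follows. The prefixes `language-`, `lang-`, and `highlight-` are common conventions used by syntax highlighters and are assumed here; the converter's actual rules may differ, and `detect_language` is a hypothetical name.

```python
import re

# Assumed class-name conventions, e.g. "language-python" or "lang-js".
_LANG_CLASS = re.compile(r"^(?:language|lang|highlight)-([\w+#-]+)$")


def detect_language(class_attr):
    """Return the language hinted at by a class attribute, or "" if none."""
    for name in class_attr.split():
        match = _LANG_CLASS.match(name)
        if match:
            return match.group(1).lower()
    return ""
```

A test page can then pair each `<code class="...">` element with the fence language expected in the Markdown output.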
Layout Preservation
- Nested structure maintenance
- List hierarchy preservation
- Table structure conversion
- Heading level consistency
Edge Case Handling
- Empty HTML documents
- Malformed HTML structures
- Very long lines
- Unicode and special characters
- Missing closing tags
Running Tests
Quick Start
# Run all tests with the automated script
./tests/html2md_server/run_tests.sh
# This will:
# 1. Install dependencies
# 2. Start the test server
# 3. Run all pytest tests
# 4. Generate coverage report
# 5. Clean up processes
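Steps 2, 3, and 5 of that sequence (start the server, run the tests, clean up) can be approximated in Python. This is a sketch of the general pattern, not a transcription of `run_tests.sh`; the function names are hypothetical and the commands are passed in by the caller.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Poll until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def run_suite(server_cmd, test_cmd, host="localhost", port=8080):
    """Start the server, run the tests, and always stop the server."""
    server = subprocess.Popen(server_cmd)
    try:
        if not wait_for_port(host, port):
            raise RuntimeError("test server did not start in time")
        return subprocess.call(test_cmd)  # exit code of the test run
    finally:
        server.terminate()
        try:
            server.wait(timeout=5)
        except subprocess.TimeoutExpired:
            server.kill()
```

For example, `run_suite([sys.executable, "tests/html2md_server/server.py"], [sys.executable, "-m", "pytest", "tests/test_html2md_server.py", "-v"])` would mirror the manual procedure described in the next section.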
Manual Testing
# Install dependencies
pip install -r tests/html2md_server/requirements.txt
# Start server in one terminal
python tests/html2md_server/server.py
# Run tests in another terminal
pytest tests/test_html2md_server.py -v
# Run with coverage
pytest tests/test_html2md_server.py --cov=tools.html2md_tool --cov-report=html
Test Options
# Run specific test
pytest tests/test_html2md_server.py::TestHTML2MDConversion::test_code_examples -v
# Run with detailed output
pytest tests/test_html2md_server.py -vv -s
# Run only fast tests
pytest tests/test_html2md_server.py -m "not slow"
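The `-m "not slow"` filter only has an effect when slow tests carry a `slow` marker (for example `@pytest.mark.slow`). Whether this suite already registers that marker is not stated here; if it does not, a `conftest.py` along these lines would register it and avoid unknown-marker warnings.

```python
# conftest.py (sketch; assumes the marker is not already registered elsewhere)


def pytest_configure(config):
    """Register the custom 'slow' marker with pytest."""
    config.addinivalue_line(
        "markers",
        'slow: marks tests as slow (deselect with -m "not slow")',
    )
```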
Test Coverage
Core Features Tested
- ✅ Basic HTML to Markdown conversion
- ✅ Async I/O operations with aiofiles
- ✅ CSS selector content extraction
- ✅ Element filtering with ignore selectors
- ✅ Complex nested HTML structures
- ✅ Code block language detection
- ✅ Table conversion (simple and complex)
- ✅ List conversion (ordered, unordered, nested)
- ✅ Special characters and HTML entities
- ✅ Unicode support
- ✅ YAML frontmatter generation
- ✅ Heading level offset adjustment
- ✅ Parallel processing with asyncio
- ✅ Configuration file loading (YAML/TOML)
- ✅ CLI argument parsing
- ✅ API mode for programmatic access
- ✅ HTTrack integration (when available)
- ✅ URL conversion from lists
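As an illustration of one item in this list, heading level offset adjustment, the following sketch shifts ATX headings in Markdown text by a fixed amount and clamps the result to levels 1 through 6. It is a stand-in written for this document, not the converter's implementation, and `offset_headings` is a hypothetical name.

```python
import re

_HEADING = re.compile(r"^(#{1,6})(\s+.*)$")
_FENCE = "`" * 3  # code fence delimiter, built this way to keep the sketch readable


def offset_headings(markdown, offset):
    """Shift ATX heading levels by `offset`, leaving fenced code untouched."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(_FENCE):
            in_fence = not in_fence
        match = None if in_fence else _HEADING.match(line)
        if match:
            level = min(6, max(1, len(match.group(1)) + offset))
            line = "#" * level + match.group(2)
        out.append(line)
    return "\n".join(out)
```

A test for this feature would convert the same page with two different offsets and compare the resulting heading levels.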
Performance Testing
- Parallel conversion of multiple files
- Large file handling
- Memory usage monitoring
- Conversion speed benchmarks
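The parallel conversion pattern exercised by these tests can be sketched with `asyncio.gather` and a semaphore that caps concurrency. Here `convert_one` is a stand-in that only simulates I/O latency; a real test would call the converter instead.

```python
import asyncio


async def convert_one(path):
    """Stand-in for a real conversion call; only simulates I/O latency."""
    await asyncio.sleep(0.01)
    return path.rsplit(".", 1)[0] + ".md"


async def convert_many(paths, max_concurrent=4):
    """Convert all paths concurrently, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(path):
        async with semaphore:
            return await convert_one(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(bounded(p) for p in paths))
```

Wrapping `asyncio.run(convert_many(paths))` in `time.perf_counter()` calls gives a simple speed measurement for different `max_concurrent` values.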
Writing New Tests
Adding Test Pages
- Create HTML file in tests/html2md_server/test_pages/
- Register in server.py: TEST_PAGES = { 'your-test': { 'title': 'Your Test Title', 'description': 'What this tests' } }
- Add corresponding test case
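Because a page has to exist both on disk and in `TEST_PAGES`, a small consistency check can catch a forgotten step. The sketch below assumes each key (such as `'your-test'`) maps to a file named `<key>.html` and that `index.html` is not registered; both are assumptions, and `check_registration` is a hypothetical helper.

```python
from pathlib import Path


def check_registration(test_pages, pages_dir, exclude=("index",)):
    """Compare registered page keys with the HTML files on disk.

    Returns (missing_files, unregistered_files) as sorted lists of names
    without the .html extension.
    """
    on_disk = {p.stem for p in Path(pages_dir).glob("*.html")} - set(exclude)
    registered = set(test_pages)
    return sorted(registered - on_disk), sorted(on_disk - registered)
```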
Test Case Structure
import pytest


class TestYourFeature:
    @pytest.mark.asyncio  # async tests need the pytest-asyncio plugin
    async def test_your_feature(self, test_server, temp_output_dir):
        """Test description."""
        from tools.html2md.api import HTML2MDConverter

        converter = HTML2MDConverter(
            outermost_selector="main",
            ignore_selectors=["nav", "footer"],
            add_frontmatter=True,
        )
        # Perform async conversion
        results = await converter.convert_directory(
            f"{test_server.base_url}/page",
            temp_output_dir,
        )
        # Assert expected results
        assert len(results) > 0
Continuous Integration
GitHub Actions Integration
# .github/workflows/test.yml
- name: Run HTML2MD Tests
  run: |
    cd tests/html2md_server
    ./run_tests.sh
Local Development
# Watch mode for development
pytest-watch tests/test_html2md_server.py
# Run with debugging
pytest tests/test_html2md_server.py --pdb
Troubleshooting
Common Issues
Server won’t start
- Check if port 8080 is already in use
- Ensure Flask dependencies are installed
- Check Python version (3.9+ required)
Tests fail with connection errors
- Ensure server is running
- Check firewall settings
- Verify localhost resolution
Coverage report issues
- Install pytest-cov: pip install pytest-cov
- Ensure tools.html2md module is in Python path
- For async tests, use pytest-asyncio: pip install pytest-asyncio
Future Enhancements
Additional Test Pages
- SVG content handling
- MathML equations
- Microdata and structured data
- Progressive web app features
- WebAssembly integration tests
- Shadow DOM content extraction
Test Automation
- Visual regression testing
- Performance benchmarking
- Memory leak detection
- Cross-platform testing
Enhanced Reporting
- HTML test reports with screenshots
- Conversion diff visualization
- Performance metrics dashboard
Contributing
To contribute to the test suite:
- Identify untested scenarios
- Create representative HTML test pages
- Write comprehensive test cases
- Document the test purpose
- Submit PR with test results
The test suite aims to cover all real-world HTML conversion scenarios to ensure robust and reliable Markdown output.