AI Intelligence // signal over noise
Simon Willison

HTML table extractor

context
What happened
Simon Willison built a browser-based utility that extracts tables from pasted rich text or from Wikipedia pages (fetched through a CORS-enabled API) and converts them into clean HTML, Markdown, CSV, TSV, or JSON.
Why it matters
Getting tables out of web pages in a clean form is a frequent friction point when preparing data for LLM context windows.
The take

This is a handy utility for context engineering. LLMs generally handle compact Markdown tables better than raw HTML cluttered with tags and attributes, which also spends tokens on markup instead of data. Clean, open-source tools for pre-processing tabular data before it goes into a prompt are highly practical for RAG pipelines.
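To make the Markdown point concrete, here is a minimal sketch that renders a table as a GitHub Flavored Markdown pipe table. It assumes the table has already been extracted as a list of rows of strings with the header row first. `to_markdown` is a hypothetical helper, not code from Willison's tool. It pads short rows and escapes `|` so a cell cannot split a column; it says nothing about how well any particular model reads the result.

```python
# Hypothetical helper for illustration; not taken from the HTML table extractor.
def to_markdown(rows: list[list[str]]) -> str:
    """Render rows (first row = header) as a GFM pipe table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(text: str) -> str:
        # Escape pipes and flatten newlines so a cell cannot break its row.
        return str(text).replace("|", "\\|").replace("\n", " ")

    def line(row: list[str]) -> str:
        # Pad short rows with empty cells so every line has `width` columns.
        padded = list(row) + [""] * (width - len(row))
        return "| " + " | ".join(cell(c) for c in padded) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator] + [line(row) for row in body])


rows = [["Item", "Qty"], ["Widget | large", "3"], ["Gadget"]]
print(to_markdown(rows))
```

Pipe tables are a GFM extension rather than core CommonMark; when the text goes straight into a prompt, no renderer is involved and the model simply reads the raw pipes.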

Do this
Bookmark the tool or review its source code to implement similar clean table-parsing logic in your data ingestion pipelines.
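A minimal sketch of similar table-parsing logic, assuming the input is an HTML string such as the rich-text payload a browser puts on the clipboard. It uses Python's standard-library `html.parser`, `csv`, and `json` modules, which is not necessarily how the browser tool works; `TableExtractor`, `extract_tables`, `to_delimited`, and `to_json` are hypothetical names. It ignores `rowspan`/`colspan` and does not support nested tables.

```python
import csv
import io
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical extractor: collects the cell text of each <table>.

    Ignores rowspan/colspan; nested tables are not supported.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.tables = []    # one list of rows per <table>
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace so each cell is a single line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # skip rows that never had a cell
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._end_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str) -> list:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def to_delimited(rows, delimiter=","):
    # csv.writer quotes any field that contains the delimiter or a quote.
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def to_json(rows):
    # Assumption: the first row is the header; later rows become objects.
    if not rows:
        return "[]"
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)


sample = (
    "<table><tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Widget, large</td><td>3</td></tr>"
    "<tr><td>Gadget &amp; case</td><td>12</td></tr></table>"
)
for table in extract_tables(sample):
    print(to_delimited(table))
    print(to_delimited(table, delimiter="\t"))
    print(to_json(table))
```

Treating the first row as JSON keys is an assumption; tables with multi-row headers or duplicate column names need extra handling.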
