Simon Willison
HTML table extractor
context
What happened
Simon Willison built a browser-based utility that extracts tables from pasted rich text or from Wikipedia pages (fetched through a CORS-enabled API) and converts them into clean HTML, Markdown, CSV, TSV, or JSON.
Why it matters
Parsing tables cleanly is a frequent friction point when preparing web data for LLM context windows.
The take
This is a handy utility for context engineering. LLMs generally handle structured Markdown tables better than raw, messy HTML, where the cell values are buried in tags and attributes. Clean, open-source tools that pre-process tabular data before it enters a prompt are highly practical for RAG pipelines.
Do this
Bookmark the tool or review its source code to implement similar clean table-parsing logic in your data ingestion pipelines.
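As a rough starting point for that kind of ingestion step, here is a minimal Python sketch that turns the first HTML table in a fragment into a Markdown table using only the standard library. It is not the tool's own code (the tool runs in the browser). The names `TableExtractor`, `to_markdown`, and `html_table_to_markdown` are hypothetical, and the sketch assumes explicit closing tags, treats the first row as the header, and ignores `rowspan` and `colspan`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of the first top-level <table> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row under construction, or None
        self._cell = None   # text pieces of the current cell, or None
        self._depth = 0     # table nesting depth
        self._done = False  # True once the first table has closed

    def _close_cell(self):
        # Collapse runs of whitespace so multi-line cells fit on one row.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag == "table" and self._depth > 0:
            self._depth -= 1
            self._done = self._depth == 0
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self._close_cell()  # tolerate a cell left open at the end of a row
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tables is flattened into the enclosing cell.
        if self._cell is not None and not self._done:
            self._cell.append(data)


def to_markdown(rows):
    """Render rows as a pipe table, using the first row as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes and pad short rows so every line has the same column count.
        cells = [c.replace("|", "\\|") for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def html_table_to_markdown(html):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return to_markdown(parser.rows)


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Format</th><th>Delimiter</th></tr>"
        "<tr><td>CSV</td><td>comma</td></tr>"
        "<tr><td>TSV</td><td>tab</td></tr></table>"
    )
    print(html_table_to_markdown(sample))
```

Real-world pages often omit closing tags or rely on merged cells, so a production pipeline would likely want a more forgiving HTML parser in front of the same row-to-Markdown step.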