We are currently testing this tool and will let you know when it is ready for full use!
This web scraper runs entirely in your browser and is designed for creating training data for AI models. It works by reading the website’s sitemap.xml file, which makes it well suited to platforms such as Squarespace and Shopify that generate sitemaps automatically.
The scraper preserves the structure of your content, including headings, paragraphs, lists, and tables, while removing unnecessary elements like navigation menus and footers. It also captures metadata, images, and PDF documents.
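To illustrate what preserving structure while dropping navigation and footers can look like, here is a minimal Python sketch. It is not the tool's own code (the tool runs in the browser), and the class name `MarkdownExtractor`, the function `html_to_markdown`, and the set of skipped tags are assumptions made for this example. It handles only headings, paragraphs, and list items; tables, images, and metadata are left out to keep it short.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Collects headings, paragraphs, and list items as Markdown lines."""

    # Assumed set of containers to drop; a real scraper may use a longer list.
    SKIPPED = {"nav", "footer", "script", "style"}
    # Block-level tags we keep, mapped to their Markdown prefix.
    BLOCKS = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}

    def __init__(self):
        super().__init__()
        self.lines = []
        self.skip_depth = 0   # greater than 0 while inside a skipped container
        self.prefix = None    # Markdown prefix of the block currently being read
        self.buffer = []      # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in self.BLOCKS:
            self.prefix = self.BLOCKS[tag]
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in self.BLOCKS and self.prefix is not None:
            # Collapse runs of whitespace so the output is one clean line.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        # Keep text only when inside a kept block and outside skipped containers.
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html):
    """Return the kept blocks of an HTML string as Markdown text."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Inline tags such as `<b>` are ignored and their text is folded into the surrounding block, so a paragraph stays a single line of output.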
More technical details
This scraper uses a CORS proxy to access websites. Before using it:
                    Open the CORS Anywhere demo page in a new tab
                    Click the button to temporarily enable the demo server
                    Return to this page and start scraping
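For readers curious about what the proxy step involves: a CORS Anywhere style proxy is addressed by appending the full target URL to the proxy's base URL. The Python sketch below shows only that URL construction. The base URL of the public demo server and the helper name `proxied_url` are assumptions for illustration, not values taken from this tool.

```python
# Assumed base URL of the public CORS Anywhere demo server.
DEMO_PROXY = "https://cors-anywhere.herokuapp.com/"


def proxied_url(target_url, proxy_base=DEMO_PROXY):
    """Return the URL that asks the proxy to fetch target_url on our behalf."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must be an absolute http(s) URL")
    # The proxy expects the complete target URL appended after its own base.
    return proxy_base.rstrip("/") + "/" + target_url


sitemap_request = proxied_url("https://example.com/sitemap.xml")
```

The demo server only serves requests after the temporary opt-in described in the steps above, so a request built this way may be refused until the button has been clicked.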
The scraper will:
                    Read the website’s sitemap.xml to find all pages
                    Process each page while preserving content structure
                    Generate a markdown file with all content
                    Allow you to preview each page’s content before saving
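The first and third steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tool's implementation: the function names `sitemap_urls` and `combine_pages` are hypothetical, the layout of the combined file (one section per URL, separated by horizontal rules) is a guess, and sitemap index files that point to further sitemaps are not handled.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the page URLs listed in a sitemap.xml document, in order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def combine_pages(pages):
    """Join per-page Markdown into one document.

    pages maps each URL to the Markdown already extracted for that page.
    """
    sections = [f"## {url}\n\n{body.strip()}" for url, body in pages.items()]
    return "\n\n---\n\n".join(sections) + "\n"


sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)
urls = sitemap_urls(sample_sitemap)
```

Keeping the per-page Markdown in a mapping keyed by URL also makes a preview step straightforward, since each page's content can be shown on its own before the combined file is written.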
            
            

