
The MindMeld API: Introducing the Crawl Manager

By Expectlabs @ExpectLabs

Learn how you can surface contextually relevant content to your users with the power of the MindMeld API Crawl Manager. 

When you’re finished, crawl on over to this screencast that explores the many capabilities of our API Explorer tool.

TRANSCRIPT:

In this video, I will explain how to use the Crawl Manager, which is a browser-based tool that we’ve created to make it easy for you to crawl and index pages on your website or any place that’s accessible on the web. To use it, go to your Management Console, and scroll down to the button that says “Launch Crawl Manager.” Click on the button, which will then open the tool.

To get started crawling your content, go to the “My Domains” section in the upper left-hand corner and click on the plus sign to the right. This will let you add a new domain, which is the base URL of the website that you want our crawlers to start indexing. I’m going to enter the Expect Labs company blog, http://blog.expectlabs.com. You can optionally add a white list or a black list. The white list includes additional pages whose URLs match a pattern that you specify and that the crawler wouldn’t normally find if it crawled from the base URL; the black list excludes URLs that match specific patterns. Click “Add,” and that’s all you have to do to start the crawl. You’ll know it’s working by looking at the status message next to the crawl, which says “Crawl in Progress.” Below the “My Domains” section, you’ll see the crawler log, which shows events generated in real time by our crawling infrastructure so that you can keep track of the progress of the crawl on the back end.
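To make the white list and black list idea concrete, here is a minimal Python sketch of how a crawler might decide whether to index a URL. It is not the Crawl Manager's actual implementation: the function name `should_crawl`, the use of regular expressions as the pattern syntax, the rule that a black list match wins over a white list match, and the example.com URLs are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def should_crawl(url, base_url, whitelist=(), blacklist=()):
    """Decide whether a hypothetical crawler should index `url`.

    Assumed rules (illustrative only, not taken from MindMeld):
    - a black list match always excludes the URL;
    - a white list match includes it even if it lies outside the base URL;
    - otherwise the URL is included only if it sits under the base URL.
    """
    # Black list: exclude any URL that matches one of the patterns.
    if any(re.search(pattern, url) for pattern in blacklist):
        return False
    # White list: include additional pages the crawl would not reach
    # by starting from the base URL.
    if any(re.search(pattern, url) for pattern in whitelist):
        return True
    # Default: same host as the base URL, and a path under the base path.
    base = urlparse(base_url)
    target = urlparse(url)
    return target.netloc == base.netloc and target.path.startswith(base.path)


BASE = "http://blog.expectlabs.com"

# A page under the base URL is crawled by default.
print(should_crawl(BASE + "/post/1", BASE))
# A page on another host is skipped unless a white list pattern matches it.
print(should_crawl("http://example.com/press/launch", BASE))
print(should_crawl("http://example.com/press/launch", BASE,
                   whitelist=[r"^http://example\.com/press/"]))
# A black list pattern excludes a page even though it is under the base URL.
print(should_crawl(BASE + "/tagged/api", BASE, blacklist=[r"/tagged/"]))
```

Checking the black list first is a design choice in this sketch; the real tool may resolve a URL that matches both lists differently.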

Now on the right-hand side, in the Crawled Documents section, you’ll see all of the documents that our crawler is finding on your website in real time. As long as the crawl goes on, you’ll see new pages added to that list. And that is the Crawl Manager.

