Crawling Dynamic Database Content

lcherry0
Posts: 3
Joined: Wed Jan 03, 2007 1:06 pm

Post by lcherry0 »

Is it possible to crawl a dynamic web site that uses .aspx pages with Webinator? If so, how is this done? Do you need a particular version of Webinator to do this?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Yes. Nothing special needed. Just make sure .aspx is in the allowed "extensions" list, remove ? from the default "exclusions", and turn off "strip queries" so that URLs with query strings are kept.

If you have a perpetual calendar or anything similar that keeps returning more pages forever, you should add an exclusion or set max depth to keep the crawl from going too deep into those.
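
To make the effect of those settings concrete, here is a rough sketch of the kind of URL filtering they describe. It is not Webinator's code or configuration format; the names ALLOWED_EXTENSIONS, EXCLUSIONS, STRIP_QUERIES, MAX_DEPTH, normalize and should_crawl are all made up for illustration, and "depth" is assumed to mean link hops from the start page.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical settings, loosely named after the options mentioned above.
ALLOWED_EXTENSIONS = {".html", ".htm", ".aspx"}  # .aspx added to the allowed list
EXCLUSIONS = ["/calendar/"]   # "?" is not listed, so query URLs are allowed
STRIP_QUERIES = False         # keep "?zip=..." and similar query strings
MAX_DEPTH = 5                 # assumed limit on link hops from the start page


def normalize(url):
    """Drop the query string only when STRIP_QUERIES is turned on."""
    if STRIP_QUERIES:
        return url.split("?", 1)[0]
    return url


def should_crawl(url, depth):
    """Return True if this sketch's rules would allow fetching the URL."""
    if depth > MAX_DEPTH:
        return False
    if any(pattern in url for pattern in EXCLUSIONS):
        return False
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    # Paths with no extension (such as "/") are allowed in this sketch.
    return ext == "" or ext in ALLOWED_EXTENSIONS
```

With "?" left in the exclusion list, or with query stripping turned on, a URL such as find.aspx?zip=44107 would be skipped or collapsed to plain find.aspx, which is why both settings need changing for dynamic pages.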
lcherry0
Posts: 3
Joined: Wed Jan 03, 2007 1:06 pm

Post by lcherry0 »

Yes, I see that it crawls aspx pages with those settings. What if I want to take this a step further and crawl an underlying database? Information from this database is typically returned on the web site when a user enters a zip code using a web form.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Where the data comes from doesn't really matter much. It's all in how it's presented on a web page. You can put a query string into a URL to supply form inputs. Something like
http://somesite/find.aspx?zip=44107
or, if it insists on method POST,
http-post://somesite/find.aspx?zip=44107

The first URL above will work in a browser so you can experiment. The second is a Webinator-specific syntax.
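
If there are many form values to cover, a list of such URLs could be generated with a small script. This is only a sketch: the host and page come from the example URLs above, while the function name form_url and the list of zip codes are hypothetical, and how the resulting URLs get fed to the crawler is not shown.

```python
from urllib.parse import urlencode

# Host and page taken from the example URLs above.
BASE = "somesite/find.aspx"


def form_url(zip_code, post=False):
    """Build a URL that submits the zip code as a form input.

    With post=True the http-post:// scheme described above is used
    instead of a plain GET URL.
    """
    scheme = "http-post" if post else "http"
    query = urlencode({"zip": zip_code})
    return f"{scheme}://{BASE}?{query}"


# Illustrative zip codes only; substitute the values your form accepts.
zip_codes = ["44107", "44114"]
get_urls = [form_url(z) for z in zip_codes]
post_urls = [form_url(z, post=True) for z in zip_codes]
```

The GET form can be pasted into a browser to check that the page returns the expected results before handing the list to the crawler.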

If making the database searchable is the primary purpose of your index, you should be using the Texis product instead of Webinator.
lcherry0
Posts: 3
Joined: Wed Jan 03, 2007 1:06 pm

Post by lcherry0 »

If I used Texis, how would I configure it to index a database?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

There are various ways to load data into a Texis database: timport (command-line and Vortex versions), the C API, Perl DBD, and Java JDBC.

Importing the data into Texis is more precise than scraping web pages and allows specialized schemas and more flexible multi-field searches with various grouping and ordering options. Using Texis gives you full control of the application instead of trying to shoehorn something into Webinator.
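
As a generic illustration of that point (not Texis itself, and not its SQL dialect or import tools), the sketch below uses Python's built-in sqlite3 module as a stand-in. The table name, columns and sample rows are invented for the example; the idea is only that once records sit in a table with their own fields, multi-field matching, grouping and ordering become ordinary queries rather than something inferred from scraped page text.

```python
import sqlite3

# In-memory database standing in for a structured import target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (name TEXT, city TEXT, zip TEXT)")

# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [
        ("Store A", "Lakewood", "44107"),
        ("Store B", "Lakewood", "44107"),
        ("Store C", "Cleveland", "44114"),
    ],
)

# Grouping and ordering: number of records per zip code.
by_zip = conn.execute(
    "SELECT zip, COUNT(*) FROM locations GROUP BY zip ORDER BY zip"
).fetchall()

# Multi-field search: match on two fields at once, sorted by name.
matches = conn.execute(
    "SELECT name FROM locations WHERE zip = ? AND city = ? ORDER BY name",
    ("44107", "Lakewood"),
).fetchall()
```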