Question re limiting what links are walked...

Posted: Thu Apr 03, 2008 11:14 pm
by legedza.henry
I want to walk a series of different domains, most of which link to documents in our document management system, which has an address in the form: www.site.edu.au/dms/docman/.....

I have entered the individual domains in the BASE URL setting and have limited Webinator to STAY UNDER each of those domains.

What do I need to do for Webinator to also index any links that go to the DMS site, so that I end up with the indexed content for each domain plus any DMS documents they link to?

Thanks
Henry

Posted: Fri Apr 04, 2008 10:31 am
by mark
Add www.site.edu.au to "Extra Domains" or add
>>=http://www\.site\.edu\.au/dms/docman/
to "Extra URLs REX".
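
To illustrate the intent of that setting, here is a rough Python sketch of the filtering rule: a link is walked if it stays under one of the base URLs or falls under the DMS prefix. This is not Webinator's actual implementation, and the base URLs and the should_walk helper are hypothetical names made up for the example.

```python
# Hypothetical base URLs for illustration; substitute the domains in your profile.
BASE_URLS = [
    "http://dept-a.example.edu.au/",
    "http://dept-b.example.edu.au/",
]

# Prefix of the document management system, as given in the thread.
DMS_PREFIX = "http://www.site.edu.au/dms/docman/"


def should_walk(url: str) -> bool:
    """Return True if the URL is under a base URL or under the DMS prefix."""
    if url.startswith(DMS_PREFIX):
        return True
    return any(url.startswith(base) for base in BASE_URLS)
```

With these assumed values, a link into /dms/docman/ is accepted while any other page on www.site.edu.au is rejected, which is the narrower behaviour of the "Extra URLs REX" option compared with adding the whole host to "Extra Domains".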

Posted: Sat Apr 05, 2008 1:26 am
by legedza.henry
I want to index this page: http://www.decs.sa.gov.au/policy/pages/ ... icy_index/

When I set it to STAY UNDER, with your Extra URLs REX set to our document management system (www.decs.sa.gov.au/docs), it only indexes about 10 pages.

When I turn STAY UNDER off, it proceeds to index everything.

All I want is to index the links in the A-Z section. I don't want it to wander off into other areas such as Home, About DECS, Staff Info, etc. and start walking them.

Posted: Mon Apr 07, 2008 10:06 am
by jason112
If you do the walk with a high verbosity (4) and then look at your Base URL in List/Edit URLs and click the "Children" link, it should list all the child URLs with the reasons that they weren't walked.

Glancing at a few of the URLs, it appears that they are under many different prefixes/sites:

http://www.decs.sa.gov.au/docs/documents/...
http://www.drugstrategy.sa.edu.au/aboutdrugstrategy/...
http://www.decs.sa.gov.au/docs/files/...

If these documents are the only things you're indexing for this profile, you could set "Offsite Pages" to "Y" and "Max Depth" to 1. That would have it "wander" only 1 page into the other links you mentioned ("About DECS", "Staff", etc).
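
As a rough illustration of what a depth limit of 1 does, here is a small Python sketch of a breadth-first walk over a made-up link graph. The page names, the LINKS table, and the walk function are all hypothetical, and how Webinator itself counts depth is an assumption here; check the "Children" listing for the real behaviour.

```python
from collections import deque

# Hypothetical link graph: each page maps to the links found on it.
LINKS = {
    "index": ["policy-a", "policy-b", "about"],
    "policy-a": ["deep-1"],
    "about": ["staff", "history"],
    "deep-1": ["deep-2"],
}


def walk(start, max_depth):
    """Breadth-first walk that visits pages at most max_depth links from start."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth >= max_depth:
            continue  # page is visited, but its links are not followed
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


walked = walk("index", 1)
```

In this sketch the "about" page is fetched because the start page links to it directly, but "staff" and "history" are never reached, which is the "wander only 1 page" effect described above.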