Question re limiting what links are walked...
Posted: Thu Apr 03, 2008 11:14 pm
by legedza.henry
I want to walk a series of different domains, most of which link to documents in our document management system, which has an address of the form:
www.site.edu.au/dms/docman/.....
I have entered the individual domains as Base URLs and have limited Webinator to STAY UNDER each of those domains.
What do I need to do for Webinator to also index any links that go to the DMS site, so that all I end up with is the indexed content for each domain plus any linked DMS documents?
Thanks
Henry
Posted: Fri Apr 04, 2008 10:31 am
by mark
Add www.site.edu.au to "Extra Domains", or add
>>=http://www\.site\.edu\.au/dms/docman/
to "Extra URLs REX".
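To illustrate the effect of that setting, here is a minimal Python sketch of the decision being described: a link is walked if it stays under one of the base URLs or falls under the DMS prefix. This uses Python's `re` module as an analogue, not REX syntax, and it is not Webinator's actual code; the function names and the example base URL are hypothetical.

```python
import re

# Python analogue of the "Extra URLs REX" entry (an approximation, not REX):
# the pattern must match from the start of the URL.
EXTRA_URL_PATTERN = re.compile(r"http://www\.site\.edu\.au/dms/docman/")


def is_extra_url(url: str) -> bool:
    """Return True if the URL falls under the DMS prefix."""
    return EXTRA_URL_PATTERN.match(url) is not None


def should_walk(url: str, base_urls: list[str]) -> bool:
    """Walk a URL if it stays under a base URL or matches the extra pattern."""
    stays_under = any(url.startswith(base) for base in base_urls)
    return stays_under or is_extra_url(url)


# Hypothetical base URL, for illustration only.
BASE_URLS = ["http://dept.example.edu.au/"]

print(should_walk("http://dept.example.edu.au/policies/", BASE_URLS))
print(should_walk("http://www.site.edu.au/dms/docman/file1", BASE_URLS))
print(should_walk("http://www.site.edu.au/news/", BASE_URLS))
```

The first two URLs would be walked and the third would not, since it is on the DMS host but outside the /dms/docman/ prefix.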
Posted: Sat Apr 05, 2008 1:26 am
by legedza.henry
I want to index this page
http://www.decs.sa.gov.au/policy/pages/ ... icy_index/
When I set it to STAY UNDER, with Extra URLs REX set to our document management system (www.decs.sa.gov.au/docs) as you suggested, it only indexes about 10 pages.
When I turn STAY UNDER off, it proceeds to index everything.
All I want is to index the links in the A-Z section. I don't want it to wander off into other areas such as Home, About DECS, Staff Info, etc. and start walking them.
Posted: Mon Apr 07, 2008 10:06 am
by jason112
If you do the walk with a high verbosity (4) and then look at your Base URL in List/Edit URLs and click the "Children" link, it should list all the child URLs with the reasons that they weren't walked.
Glancing at a few of the URLs, it appears that they are under many different prefixes/sites.
http://www.decs.sa.gov.au/docs/documents/...
http://www.drugstrategy.sa.edu.au/aboutdrugstrategy/...
http://www.decs.sa.gov.au/docs/files/...
If these documents are the only things you're indexing for this profile, you could set "Offsite Pages" to "Y" and "Max Depth" to 1. That would have it "wander" only 1 page into the other links you mentioned ("About DECS", "Staff", etc).
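As a rough illustration of what a depth limit of 1 does, here is a small Python sketch of a breadth-first walk over an in-memory link graph. It is not how Webinator is implemented, and the way Webinator counts depth is assumed here (the starting page is depth 0); the page names in the graph are hypothetical.

```python
from collections import deque


def walk(start: str, links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Breadth-first walk that follows links at most max_depth hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # page is indexed, but its links are not followed
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order


# Hypothetical link graph: an index page linking to documents and to "about".
LINKS = {
    "index": ["docs/a", "docs/b", "about"],
    "about": ["staff", "contact"],
    "docs/a": ["docs/a-appendix"],
}

print(walk("index", LINKS, max_depth=1))
# ['index', 'docs/a', 'docs/b', 'about']
```

With a depth of 1, the "about" page itself is picked up, but the pages it links to ("staff", "contact") are not.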