Controlled Walking -- problems


Post by Thunderstone »



Hello - I have been testing Webinator (free version) and am having some
problems with controlled walking. I use a batch file to walk a series of
sites automatically, as shown below.

actual batch file:
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wbr.com/nashville/anitacochran/ http://www.wbr.com/nashville/anitacochran/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wbr.com/nashville/michaelpeterson/ http://www.wbr.com/nashville/michaelpeterson/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wbr.com/nashville/travistritt/ http://www.wbr.com/nashville/travistritt/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wbr.com/steveearle/ http://www.wbr.com/steveearle/index.html
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wbr.com/tishhinojosa/ http://www.wbr.com/tishhinojosa/index.html
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wbr.com/travistritt/ http://www.wbr.com/travistritt/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wealthcreations.com/tracy/ http://www.wealthcreations.com/tracy/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.webtickets.com/ http://www.webtickets.com/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.westworld.com/~garthbrk/ http://www.westworld.com/~garthbrk/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.westworld.com/~garthbrk/ http://www.westworld.com/~garthbrk/GB.html
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wf.quik.com/cmg/ http://www.wf.quik.com/cmg/jerryc.htm
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wildatheart.com/ http://www.wildatheart.com/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wildcountry.com/ http://www.wildcountry.com/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wilma.com/ http://www.wilma.com/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.womenofcountry.com/ http://www.womenofcountry.com/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.worldaccess.nl/~jsomers/ http://www.worldaccess.nl/~jsomers/
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wp.com/MillerCreek/ http://www.wp.com/MillerCreek/birthday.html
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wp.com/MillerCreek/ http://www.wp.com/MillerCreek/cyrus.html
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wsu.edu/~kyoshi/ http://www.wsu.edu/~kyoshi/shania.html
START /WAIT gw -dSE -w5 -noindex -jhttp://www.wu-wien.ac.at/usr/h92/h9225291/kris/ http://www.wu-wien.ac.at/usr/h92/h92252 ... /kris.html

(note: START /WAIT was utilized so each gw command would completely execute
before starting the next command)

problems:
-- gw doesn't always stay within the -j parameter:
http://www.wbr.com/madonna/ and many other (non-requested) pages were
walked
-- these non-requested pages were walked during a gw command nowhere
near what the -j parameter requested for that run. For example, according
to the log file, http://www.wbr.com/madonna/ and all connected pages were
walked during the "gw -dSE -w5 -noindex -jhttp://www.wf.quik.com/cmg/
http://www.wf.quik.com/cmg/jerryc.htm" command (to me this makes no sense;
if anything, you would think they would be walked during
"gw -dSE -w5 -noindex -jhttp://www.wbr.com/travistritt/
http://www.wbr.com/travistritt/")

please HELP :- (

thanks,
David Repas
SoundMarket (our new name) at http://soundmarket.net






Post by Thunderstone »



Between invocations, gw remembers every URL specified for a given
database and treats it as a valid site for walking. So if a subsequent
walk encounters a link to one of those sites, it will be accepted unless
some other restriction applies.

You need to specify all of your -j options on each invocation. Place
them all in an option file:

jhttp://www.wbr.com/nashville/anitacochran/
jhttp://www.wbr.com/nashville/michaelpeterson/
jhttp://www.wf.quik.com/cmg/

You can also place all of the urls to walk into a list file:

http://www.wbr.com/nashville/anitacochran/
http://www.wbr.com/nashville/michaelpeterson/
http://www.wf.quik.com/cmg/jerryc.htm

You can then use a command line like this to walk everything:
gw -dSE -w5 -mOPTIONSFILE "&LISTFILE"
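
One way to keep the two files in step is to generate both from a single table. The Python sketch below is only an illustration of that idea: the helper names and the file names `options.txt` and `list.txt` are made up, and the option-file format (one `j...` entry per line, no surrounding whitespace) is copied from the example above rather than checked against the gw documentation.

```python
# Hypothetical helper: derive the gw option file and URL list file contents
# from one table of (allowed prefix, start URL) pairs.

SITES = [
    ("http://www.wbr.com/nashville/anitacochran/",
     "http://www.wbr.com/nashville/anitacochran/"),
    ("http://www.wbr.com/nashville/michaelpeterson/",
     "http://www.wbr.com/nashville/michaelpeterson/"),
    ("http://www.wf.quik.com/cmg/",
     "http://www.wf.quik.com/cmg/jerryc.htm"),
]


def option_lines(sites):
    """Return one 'j<prefix>' line per distinct prefix, in first-seen order."""
    seen = []
    for prefix, _start in sites:
        prefix = prefix.strip()
        if prefix not in seen:
            seen.append(prefix)
    return ["j" + prefix for prefix in seen]


def list_lines(sites):
    """Return the start URLs, one per line, with stray whitespace removed."""
    return [start.strip() for _prefix, start in sites]


def gw_command(options_path, list_path):
    """Build the argument list for the command line shown in the post above."""
    return ["gw", "-dSE", "-w5", "-m" + options_path, "&" + list_path]


if __name__ == "__main__":
    print("\n".join(option_lines(SITES)))   # contents for options.txt
    print()
    print("\n".join(list_lines(SITES)))     # contents for list.txt
    print()
    print(" ".join(gw_command("options.txt", "list.txt")))
```

Two start URLs that share a prefix (such as the two MillerCreek pages in the original batch file) produce a single `j` line but two list entries.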

..
..



Post by justin »

I am using this technique, but apparently not correctly. Any insight you could give would be appreciated.

My command is:
/usr/local/morph3/bin/gw -d/usr/local/morph3/import/WALKING/TPDB -noindex -m/usr/local/morph3/htdocs/option_files/option_tpdb.txt "&/usr/local/morph3/htdocs/option_files/list_tpdb.txt"

My option file is:
y
jhttp://www.aats.org/
jhttp://www.acc.org/
jhttp://www.americanheart.org/
jhttp://www.ctsnet.org/
jhttp://www.naspe.org/
jhttp://www.sts.org/
jhttp://navigator.tufts.edu/

My URL List file is:
http://www.aats.org/
http://www.acc.org/
http://www.americanheart.org/
http://www.ctsnet.org/
http://www.naspe.org/
http://www.sts.org/
http://navigator.tufts.edu/

The error/output is:
Getting http://209.207.155.98/robots.txt...Not there...Ok.
Getting http://64.58.70.195/robots.txt...Not there...Ok.
Getting http://216.27.8.199/robots.txt...Got it...Ok.
Getting http://209.207.155.118/robots.txt...Got it...Ok.
Getting http://38.15.67.68/robots.txt...Not there...Ok.
Getting http://209.207.155.97/robots.txt...Not there...Ok.
http://www.aats.org/: Disallowed path(i)
http://www.acc.org/: Disallowed path(i)
http://www.americanheart.org/: Disallowed path(i)
http://www.ctsnet.org/: Disallowed path(i)
http://www.naspe.org/: Disallowed path(i)
http://www.sts.org/: Disallowed path(i)
http://navigator.tufts.edu/: Disallowed path(i)
Visited 0 pages total

Post by mark »

Make sure you don't have any extra leading or trailing spaces in your options file.
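
If the stray whitespace is hard to spot by eye, a small script can find and remove it. This is a minimal Python sketch, not a Thunderstone tool; the function names are made up, and it assumes the options file is plain text with one option per line.

```python
# Hypothetical helpers for checking a gw options file for stray whitespace.

def find_padded_lines(text):
    """Return 1-based numbers of lines with leading or trailing whitespace."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.strip()
    ]


def clean_option_text(text):
    """Strip leading and trailing whitespace from every line."""
    return "\n".join(line.strip() for line in text.splitlines()) + "\n"


if __name__ == "__main__":
    # Sample with a trailing space on line 2 and a leading space on line 3.
    sample = "y\njhttp://www.aats.org/ \n jhttp://www.acc.org/\r\n"
    print(find_padded_lines(sample))
    print(clean_option_text(sample), end="")
```

Because `splitlines()` also splits on `\r\n`, a file saved with Windows line endings comes out with plain `\n` endings.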