Page Size

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



Hi All,

Have a question.

I'm seeing a number of these lines in my gw.log:

2000/03/07 02:32:31 Retrieving http://www.mydomain.com/search/jobs/Feb-14-2000/
2000/03/07 02:32:31 Max page size exceeded (truncated) for http://www.mydomain.com/search/jobs/Feb-14-2000/

I'm noticing that some of the files in these dirs aren't being
indexed. It isn't just this one directory; I'm indexing a number of
them and get the same error in each. I am not using the -z switch, nor
does any single file in any of the dirs exceed 10k. Note that all my
files are PHP.

Here's the gw command I used for the initial walk:

/bin/gw -d/www/domains/recruitersonline.com/html/webinator/jobs -L -D2
-fphtml -Fhtml -Fhtm -Ftxt -r -v3 -jhttp://216.234.235.38/search/jobs http://216.234.235.38/search/jobs/

Here's the command that I rewalk with:

/bin/gw -d/www/domains/recruitersonline.com/html/webinator/jobs -rewalk

Thanks much,

Andy Lewis
RON Sys Admin
972-398-0225




Post by Thunderstone »




But the index page for a _directory_ is exceeding 100k (the default
page size): that's what this error is from, not a file. Since it
gets truncated, the remaining links in that directory are lost, and hence
not walked.

Use -z to increase the page size limit so these large directory indexes
will fit with all their links.
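
To see why truncation costs you files, here is a rough Python sketch (not
anything gw itself runs). It builds a synthetic directory listing, cuts it
at a byte limit, and counts the links before and after. The 100 * 1024
limit, the file names, and the listing layout are all assumptions for
illustration; the only figure taken from this thread is the 100k default.

```python
import re

# Hypothetical limit: the thread says the default page size is 100k.
# Whether gw counts that as 100 * 1024 or 100000 bytes is an assumption.
LIMIT = 100 * 1024


def make_listing(n_files: int) -> str:
    """Build a fake server-generated directory listing with n_files links."""
    rows = "".join(
        f'<li><a href="job{i:05d}.php">job{i:05d}.php</a></li>\n'
        for i in range(n_files)
    )
    return f"<html><body><ul>\n{rows}</ul></body></html>\n"


def count_links(html: str) -> int:
    """Count complete href attributes (crude, but enough for this listing)."""
    return len(re.findall(r'<a\s+href="([^"]*)"', html))


listing = make_listing(3000)
# Cut the page at LIMIT bytes, the way an oversized page would be truncated.
truncated = listing.encode("ascii")[:LIMIT].decode("ascii")

total = count_links(listing)
kept = count_links(truncated)
print(f"listing size: {len(listing)} bytes, limit: {LIMIT} bytes")
print(f"links in full page: {total}, links surviving truncation: {kept}")
print(f"links lost (never walked): {total - kept}")
```

Every link past the cut is invisible to the walker, so the files behind
those links never get fetched even though each one is small.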

-Kai



Post by Thunderstone »



Not at all.

You need to understand how your webserver works. When there is no
"index.html" or similar file, it generates a directory listing and
returns that as a "document". The directory listing "document" at
http://www.mydomain.com/search/jobs/Feb-14-2000/ is larger than the
default max document size. Try downloading it with your browser and
saving it to see how big it is.
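
If you'd rather not use a browser, a few lines of Python can report the
size. This is a minimal sketch, assuming the listing can be fetched with a
plain GET; `page_size`, `exceeds_limit`, and the 100 * 1024 figure are
illustrative names and values, not part of Webinator.

```python
import sys
import urllib.request

# Hypothetical default limit (100k per this thread; exact byte count assumed).
DEFAULT_LIMIT = 100 * 1024


def page_size(url: str, timeout: float = 30.0) -> int:
    """Fetch url and return the number of bytes in the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def exceeds_limit(size: int, limit: int = DEFAULT_LIMIT) -> bool:
    """True if a page of this size would be larger than the limit."""
    return size > limit


# Only fetch when a URL is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]
    size = page_size(url)
    verdict = "over" if exceeds_limit(size) else "within"
    print(f"{url}: {size} bytes ({verdict} the assumed {DEFAULT_LIMIT}-byte limit)")
```

Run it with the directory URL from the log line as the only argument. If
the size comes back over the limit, that listing is the page being
truncated.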


