Max Page Size Question

dorban
Posts: 31
Joined: Fri Sep 24, 2004 12:39 pm

Max Page Size Question

Post by dorban »

Excuse all the questions, but we recently received the search appliance, and I am working through the settings trying to figure out why the results are not coming back as I expect.

Current issue...

I have been changing one setting at a time, then re-indexing my server. The most results I've been able to get are 627 files (it's still not going into the option lists, but I'll work on that later). Setting Max Page Size to "-1" (without quotes), which is supposed to allow any file size if I'm reading the instructions correctly, cuts my result set down to 173.

When I set it to "20000000" (adding a zero to the default) it also returns 173 results.

Shouldn't increasing this option allow it to index more pages?

TIA for any info.
D.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Max Page Size Question

Post by John »

For HTML pages it will typically just truncate the larger pages. Depending on what you have Max Process Size set to, you might want to try a refresh crawl; if the walk stopped because of a memory size issue, the refresh will resume it.
John Turnbull
Thunderstone Software
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Max Page Size Question

Post by mark »

That setting should only have an effect if there were pages getting truncated. The change in your result count is probably caused by something else. Make sure you're doing a new walk each time while playing with settings, rather than a refresh.

The thing to do is set verbosity to 4 and do a new walk. Then go to List/Edit urls. Click submit to get everything. Click on a page to get info about it. Then click on the "Children" link on the info page. That will show which urls were found on the page, indicate which are in the database and which are not, and give the reason any are not in the database. That should clue you in fairly quickly about what settings you might need to change.

Another approach is to pick a page you think should be in the database but isn't. Determine which page on your site links to that page (its parent). Look up that parent page in List/Edit urls and check the Children links.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Max Page Size Question

Post by mark »

Regarding the process size: just set it to large if you only have one profile walking at any one time.
dorban
Posts: 31
Joined: Fri Sep 24, 2004 12:39 pm

Max Page Size Question

Post by dorban »

I've changed the 2000000 to -1, 2500000, 3000000, 1000000, 20, etc., and the results are not what I'd expect.

I still get the most results with 2000000. When I look at the error file for vortex, I see a lot of PDF files that have been truncated. That's why I've tried increasing the size. I also set process size to "large" since I'm only running this one at a time.

I also have a ton of /dir/dir/post and /dir/dir/undefined errors in the log. I'm not sure what it's looking for since it's only listing the directory root. Any suggestions?

Thanks.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Max Page Size Question

Post by mark »

The walker will descend into any directories it discovers under your specified directory. We'd need more specifics on the errors encountered to comment on them.
dorban
Posts: 31
Joined: Fri Sep 24, 2004 12:39 pm

Max Page Size Question

Post by dorban »

Mark, please let me know what you'd like me to send. If you'd like to email with me directly, my address is dorban@firstindustrial.com.

Thanks,
Don
dorban
Posts: 31
Joined: Fri Sep 24, 2004 12:39 pm

Max Page Size Question

Post by dorban »

mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Max Page Size Question

Post by mark »

Those urls were found on one of the pages it walked. They don't exist on your server. Use the techniques described above to find which page links to those urls. Then examine that page to see why it links to them.
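
Once the parent page is known, checking its source by hand can be tedious. As a rough illustration only (this is not part of the appliance; the function name, the sample page, and the guess that the stray urls come from href, src, or action attributes are all assumptions), a small Python script could list the attributes on a saved copy of the page that resolve to a url ending in "undefined" or "post":

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href, src, and action attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without a value
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def find_suspect_links(page_url, html, suspect_names=("undefined", "post")):
    """Return (raw value, resolved url) pairs whose last path segment
    matches one of suspect_names (hypothetical helper, for illustration)."""
    collector = LinkCollector()
    collector.feed(html)
    hits = []
    for raw in collector.links:
        # Resolve relative references against the parent page's url
        resolved = urljoin(page_url, raw)
        last_segment = urlparse(resolved).path.rstrip("/").rsplit("/", 1)[-1]
        if last_segment.lower() in suspect_names:
            hits.append((raw, resolved))
    return hits


if __name__ == "__main__":
    # Made-up sample page; substitute the real parent page's url and source.
    sample = ('<a href="undefined">x</a>'
              '<form action="post"></form>'
              '<a href="ok.html">ok</a>')
    for raw, resolved in find_suspect_links(
            "http://example.com/dir/dir/index.html", sample):
        print(raw, "->", resolved)
```

A relative value such as "undefined" resolves against the parent page's directory, which would be consistent with the log showing only the directory root in front of it. Whether that is what is happening on this site can only be confirmed by looking at the actual page.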