Not finding words

pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Hello,

I just took over our Thunderstone implementation. Any idea how to solve this problem? I have a small test directory containing a page that has the word "smarty" on it. When I search for the term "smarty", the page does not come up. I looked in live url: that word was not indexed for the page, although other words on the page were. I did a default walk.

Pete
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Do you mean you looked it up in list/edit urls? If not, do so to see what text was extracted from that page.

Are you using any ignore or keep tags or remove common? Perhaps one of those caused it to be stripped. Or is the missing word in the middle of the indexed text?

Is this a public page we can examine? What's the url?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

p.s.
If you can't give the url, maybe you can give the html from the page that includes the word smarty and some words around it that are indexed.
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

Right, the word was not extracted. I am running this walk on defaults, just to see why other people are not finding what they want.

In list/edit urls I bring up that page, and none of the extracted words is "smarty", which is right there on the page.

Body: Index of /test/inside2
Parent Directory

css

department.shtml

first.php

homepage.shtml

includes

index.html

killer_menus.html

my_portal.shtml

new_template.shtml

utilities
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

It is a very simple index.html with lists of links in ULs. Our intranet Apache server is set to index directories if they don't have an index file in them. But in this case it's just an index.html with links, and Thunderstone didn't slurp the word "Smarty", though it did pick up others on that page. People here are complaining that, even with direct hits, pages they know exist are not being found, and in my simple test I am finding the same thing.
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Are you doing a file: walk or http:? What's the url of the page (even if it's not reachable) as displayed in the info? What's the HTML of that page? Does that same url look like a directory listing as above in your browser, or like something else, such as the contents of the index.html file?
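
The Body text posted above can be checked mechanically for the usual signs of a server-generated directory listing. This is a minimal sketch in Python, assuming Apache-style autoindex output (an "Index of /..." heading plus a "Parent Directory" link). The function name is hypothetical and not part of Webinator.

```python
def looks_like_directory_listing(text: str) -> bool:
    """Heuristic only: assumes Apache-style autoindex wording.

    Such pages are normally headed 'Index of /some/path' and contain a
    'Parent Directory' link. Other servers may word this differently.
    """
    lowered = text.lower()
    return "index of /" in lowered and "parent directory" in lowered


if __name__ == "__main__":
    extracted = "Index of /test/inside2\nParent Directory\ncss\nindex.html"
    print(looks_like_directory_listing(extracted))
```

If this returns True for the text the crawler stored, the crawler was served the directory index rather than the contents of index.html.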
pete.smith
Posts: 73
Joined: Tue May 17, 2005 2:08 pm

Post by pete.smith »

The Baseurl of the walk is:
http://www-internal.mathworks.com/test/

The url in question is:
http://www-internal.mathworks.com/test/inside2/

It does not look like a dir listing; it's just a good old-fashioned page that I made, with ULs, As and H2s.

PAGE:

Inside2
Inside2 is the next generation of the MathWorks Intranet.


Initial Pilot Pages
First - A Smarty Page with data from Template
Department/Team Template
Homepage Template
Killer Menus

Utilities in Action
Utilities
Dadabik in Action
Smarty in Action
PhpMyAdmin in Action
PhpWiki in Action
Documentation
EzSql
PhpMyAdmin
Smarty Syntax

Objectives
Environments / Dev - live
Scripting Language that fits
Whip up database stuff
STandard libs of stuff ->Componenes as a monarch like
Indentify good projects->bus case rapid proto requirements
Problems
Security-permissions
QE
Pages->controller-prepend
Page Onwership
Templating (smarty)
Vision
Integration in our Apps (Supp/Rlease/Selling) | Collaboration (project page) | Wiki (blog) | Content Frameworks | Rapid Database Apps | Search

Quickees
Gost

Component List
Javascript Rollovers
The Pieces
PHP
Smarty (template engine)
Dadabik (rapid database dev)
Ezsql (database connectivity classes)
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

It would appear that your webserver is returning different results for the crawler vs. a browser. Is the index.html file readable by everyone, even if they're not logged in or whatever? Try setting the user agent in Webinator to something more like a browser:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
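
One way to confirm this outside of Webinator is to request the same url with two different User-Agent headers and see whether the word shows up in each response. The following is a rough sketch in Python using only the standard library. `CRAWLER_UA` is a made-up placeholder (substitute whatever user agent your walk actually sends), and the helper names are hypothetical.

```python
import re
import urllib.request

# Placeholder only: NOT Webinator's real user-agent string.
CRAWLER_UA = "ExampleCrawler/1.0"
# Browser-like string taken from the suggestion above.
BROWSER_UA = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
              ".NET CLR 1.0.3705; .NET CLR 1.1.4322)")


def fetch(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent header and return the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def contains_word(html, word):
    """Case-insensitive whole-word match on the text left after stripping tags."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def compare_user_agents(url, word, fetcher=fetch):
    """Return {user_agent: True/False} for whether word appears in each response."""
    return {ua: contains_word(fetcher(url, ua), word)
            for ua in (CRAWLER_UA, BROWSER_UA)}


if __name__ == "__main__":
    url = "http://www-internal.mathworks.com/test/inside2/"
    for ua, found in compare_user_agents(url, "smarty").items():
        print(found, ua)
```

If the word is found with the browser string but not with the crawler string, the server is varying its response by user agent. If both agree, the difference lies elsewhere (for example, login state or access permissions).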