Thunderstone Support Forums

Posted: **Fri Oct 19, 2007 3:34 pm**

Hi, I am creating a site that is served from two URLs, e.g.
http://www.mysite.com/myApplication/ -- this contains all application pages.
http://www.mysite.com/NR/Resource/ -- this contains all the resource documents that are linked from the application pages, e.g. PDF and DOC files.

I entered both URLs as Base URLs. There are several levels of subdirectories under http://www.mysite.com/NR/Resource/, and they contain no index pages, only PDF files, so I had to enable the Directory Browsing option for http://www.mysite.com/NR/Resource/. The problem now is that when a user searches for "index of", the results list all the directory index pages. How can I avoid this?
I did a little searching and found that modifying the dowalk script would do the trick, but is there a way to do this with configuration alone, such as Exclude by Field?

Posted: **Fri Oct 19, 2007 3:50 pm**

If you're generating the pages, you can add a standard "robots" command to the page that tells Webinator (and other search engines) not to use the page's contents.

Add this in the <head> of the HTML pages:

<meta name="robots" content="noindex"/>
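If you want to confirm that a page actually carries the tag before re-walking the site, a small check like the following may help. It is only a sketch using Python's standard library; the names `RobotsMetaParser` and `has_noindex` are made up for illustration and are not part of Webinator.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html_text):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives
```

Calling `has_noindex(page_source)` on the HTML of a page returns `True` when the tag above is present.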

Posted: **Fri Oct 19, 2007 3:58 pm**

Re-reading, I realize you're not _generating_ the index pages yourself. Exclude By Field, matching on the URL, would probably be the best way to do it.

The logic is that we want to exclude everything under http://www.mysite.com/NR/Resource/ that ends with a slash.

The REX expression for this should be

http://www.mysite.com/NR/Resource=!http ... urce*/=>>=

Set the field to "url" and the exclude setting to "pages only"; that should do the trick.
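To make the intended rule concrete (everything under http://www.mysite.com/NR/Resource/ whose URL ends with a slash), here is a rough sketch in plain Python. It uses Python's `re` syntax, not REX or Metamorph syntax, so it cannot be pasted into the Exclude by Field setting; the names `RESOURCE_PREFIX` and `is_directory_url` are hypothetical and only illustrate which URLs the rule is meant to catch.

```python
import re

# Assumed prefix, taken from the URLs in this thread.
RESOURCE_PREFIX = "http://www.mysite.com/NR/Resource"

# Prefix, then "/", then optionally any deeper path that itself ends in "/".
DIRECTORY_URL = re.compile(re.escape(RESOURCE_PREFIX) + r"/(?:.*/)?")


def is_directory_url(url):
    """True for directory listings under the resource tree, False for files."""
    return DIRECTORY_URL.fullmatch(url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/NR/Resource/",
        "http://www.mysite.com/NR/Resource/manuals/2007/",
        "http://www.mysite.com/NR/Resource/manuals/guide.pdf",
        "http://www.mysite.com/myApplication/",
    ):
        print(url, is_directory_url(url))
```

Running a handful of your own URLs through a check like this is a quick way to see which ones the rule should exclude and which PDFs should stay in the index.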

Posted: **Fri Oct 19, 2007 3:59 pm**

Thanks for your quick response, but those are not real pages; they are generated by the web server to list the subdirectories and files in them.

Posted: **Fri Oct 19, 2007 4:02 pm**

They're still "pages" even if they're dynamically generated by the server. Jason's suggestion still applies.

Posted: **Fri Oct 19, 2007 4:28 pm**

Thanks guys, I tried Jason's REX, but it didn't work. I found that the following setting excludes most of the index pages, except the top-level directory (http://www.mysite.com/NR/Resource/).
Query: http://www.mysite.com/NR/Resource/
Field: URL
Exclude to: Links Only

The only index page still showing is the top directory, and the PDFs show up fine.

Posted: **Fri Oct 19, 2007 5:17 pm**

Since Exclude by Field is a Metamorph query, not just REX, you'd have to put a leading / on Jason's expression:

/http://www.mysite.com/NR/Resource=!http ... urce*/=>>=

Or you could look at the HTML of those pages and pick out something unique to match on.
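As a rough way to find such a unique marker, you could check a few saved listing pages for a common string in the title. The sketch below assumes the listings have a title containing "index of" (a guess based on the search term mentioned earlier in this thread; check your own server's output). `TitleParser` and `looks_like_directory_listing` are hypothetical names, and the code is plain Python, not something Webinator runs.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Extracts the text of the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def looks_like_directory_listing(html_text, marker="index of"):
    """True if the title contains the marker (assumed, not verified)."""
    parser = TitleParser()
    parser.feed(html_text)
    return marker in parser.title.lower()
```

If the marker turns out to be reliable on your pages, the same string is a candidate for the exclude query.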

How to exclude the directory browsing page?
