Duplicate pages and frames

jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke

One of our sites is built with frames. The problem seems to be that if the appliance does not find any information in the <body> </body> area, it essentially treats that page as blank. The next frames page it finds is then recorded as a duplicate of it. These pages are correctly coded for our site. There is no need for any <noframes> content, as our corporate standard browser is IE 5.5.
What can we do to get these URLs into the database without turning off duplicate checking? I need to attach best bets to them.

Here is the code from two of the pages:

First page:

<html>
<head>
<title>IT</title>

</head>

<frameset rows="68,*" cols="*" frameborder="NO" border="0" framespacing="0">
<frame src="../../site_architecture/top_services.html" name="top" scrolling="NO" noresize >
<frameset rows="*" cols="190,*" framespacing="0" frameborder="NO" border="0">
<frame src="../../site_architecture/left_services_telecom.html" name="left" noresize>
<frame src="../../site_architecture/main_services_telecom.html" name="main">
</frameset>
</frameset>
<noframes>
<body>

</body>
</noframes>
</html>

Second page:

<html>
<head>
<title>human resources</title>

</head>

<frameset rows="68,*" cols="*" frameborder="NO" border="0" framespacing="0">
<frame src="../../human_resources/top.html" name="top" scrolling="NO" noresize >
<frameset rows="*" cols="190,*" framespacing="0" frameborder="NO" border="0">
<frame src="../../human_resources/left.html" name="left" noresize>
<frame src="../../human_resources/main.html" name="main">
</frameset>
</frameset>
<noframes>
<body>

</body>
</noframes>
</html>
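The symptom above can be reproduced with a minimal sketch, assuming (hypothetically) that the frame contents are never fetched and the duplicate check hashes only the text found inside <body>. The thread does not say how the appliance computes its duplicate key; `BodyTextExtractor` and `body_fingerprint` are illustrative names only. Both framesets yield no body text, so their fingerprints collide.

```python
import hashlib
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects only the text that appears inside <body>...</body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def body_fingerprint(html):
    """Hash of the whitespace-normalized body text (hypothetical dedup key)."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Abbreviated versions of the two framesets quoted above.
it_page = """<html><head><title>IT</title></head>
<frameset rows="68,*"><frame src="../../site_architecture/top_services.html" name="top">
</frameset><noframes><body>
</body></noframes></html>"""

hr_page = """<html><head><title>human resources</title></head>
<frameset rows="68,*"><frame src="../../human_resources/top.html" name="top">
</frameset><noframes><body>
</body></noframes></html>"""

print(body_fingerprint(it_page) == body_fingerprint(hr_page))  # True: no body text in either page
```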
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark

Unless you set Max Frames to 0, the frames of a page should be treated as one big page, and the <noframes> section should be ignored.
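Treating a frameset "as one big page" first requires resolving each <frame src> against the frameset's own URL. Here is a minimal sketch of that step, assuming Max Frames acts as a per-page cap on how many frames are followed (the thread only confirms that 0 disables merging). `FrameSrcCollector`, `frame_urls`, and the intranet URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcCollector(HTMLParser):
    """Collects the src attribute of every <frame> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def frame_urls(page_url, html, max_frames):
    """Absolute URLs of the frames whose text would be merged into page_url.

    With max_frames set to 0, frames are not followed at all.
    """
    if max_frames <= 0:
        return []
    collector = FrameSrcCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(page_url, src) for src in collector.srcs[:max_frames]]


# Hypothetical location of the "human resources" frameset page.
page = "http://intranet.example.com/departments/hr/index.html"
html = """<frameset rows="68,*">
<frame src="../../human_resources/top.html" name="top" scrolling="NO" noresize >
<frameset cols="190,*">
<frame src="../../human_resources/left.html" name="left" noresize>
<frame src="../../human_resources/main.html" name="main">
</frameset></frameset>"""

for url in frame_urls(page, html, max_frames=20):
    print(url)
# http://intranet.example.com/human_resources/top.html
# http://intranet.example.com/human_resources/left.html
# http://intranet.example.com/human_resources/main.html
```

Note that the nested <frameset> needs no special handling: every <frame> tag is collected regardless of depth.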

If you go to list/edit urls for the page that's considered the original of the duplicates, what text does it show?

You can change which fields are considered when checking for duplicates. You might add Title to the list.
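Why adding Title helps can be seen in a small sketch, assuming the duplicate key is a hash over whichever fields are selected. The field names `title` and `body` and the `dup_key` function are hypothetical, not the appliance's actual setting names. With only the (empty) body selected, the IT and human resources pages collide; once the title is included, they differ.

```python
import hashlib


def dup_key(record, fields):
    """Fingerprint of only the selected fields of a page record."""
    h = hashlib.sha256()
    for name in fields:
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        h.update(record.get(name, "").encode("utf-8") + b"\x00")
    return h.hexdigest()


it_record = {"title": "IT", "body": ""}
hr_record = {"title": "human resources", "body": ""}

print(dup_key(it_record, ["body"]) == dup_key(hr_record, ["body"]))                    # True
print(dup_key(it_record, ["title", "body"]) == dup_key(hr_record, ["title", "body"]))  # False
```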
jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke

Max Frames is set at 20.

Title: Top Index
Description: -None-
Keywords: -None-
Meta data: -None-
Body: -None-

It does not seem to be getting the text from the frames into the index. I just noticed that.

The fields-for-duplicates option does not seem to be in my list. I have version 5.4.4.
jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke

I just checked, and none of the frame content pages show up in the list:
<frame src="../../human_resources/top.html" name="top" scrolling="NO" noresize >
<frame src="../../human_resources/left.html" name="left" noresize>
<frame src="../../human_resources/main.html" name="main">
Searching for text from these pages returns zero results, which tells me they have not been indexed.
Thanks
John
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark

The frame links themselves wouldn't appear in the database. The main page's URL would be in the database, and its content should be that of all of the sub-pages.
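The record layout described above (one entry under the frameset's URL, holding the text of all its frames) can be sketched as follows. This is a minimal illustration under assumptions: `merge_frameset`, the URLs, and the sample frame text are all hypothetical, and the real appliance's record structure is not described in this thread.

```python
def merge_frameset(page_url, frame_texts):
    """Build one record, stored under the frameset page's own URL.

    frame_texts maps each frame URL to the text extracted from it, in frame
    order. The frame URLs are consulted only for their text; they do not get
    records of their own.
    """
    body = " ".join(text for text in frame_texts.values() if text)
    return {page_url: {"body": body}}


# Hypothetical URLs and text for the "human resources" frameset.
record = merge_frameset(
    "http://intranet.example.com/departments/hr/index.html",
    {
        "http://intranet.example.com/human_resources/top.html": "Human Resources",
        "http://intranet.example.com/human_resources/left.html": "Benefits Payroll",
        "http://intranet.example.com/human_resources/main.html": "Welcome to HR",
    },
)
print(record)
# {'http://intranet.example.com/departments/hr/index.html': {'body': 'Human Resources Benefits Payroll Welcome to HR'}}
```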

Is this a page you can point us to?

You could try creating a new profile with default settings, setting that URL as the base URL, setting max pages to 1 or 2, and starting the walk. Then look under list/edit urls to see how the page was processed with the default settings.
jgdoke
Posts: 167
Joined: Wed Jul 14, 2004 10:52 am

Post by jgdoke

Sorry, no. The main page URL is deemed a duplicate of the "blank" first page, even though this frameset page has a lot of content.
We are in the process of updating our software from 5.4.4 to the latest version. After that, I will test this again.

John
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark

"main" is somewhat ambiguous. The crawler has no idea which page you consider "main". It keeps the first copy of a page it encounters and discards all subsequent ones.

In any further discussion, please provide URLs, or portions of them, to clarify your meaning.
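Mark's point that the first copy encountered is kept can be sketched as a first-seen-wins filter. This is a minimal sketch, assuming pages are compared by a hash of their extracted text; `walk_order_dedup` and the URLs are hypothetical. It shows that which page survives depends only on walk order, not on which page the site owner regards as "main".

```python
import hashlib


def walk_order_dedup(pages):
    """Keep the first URL seen for each content fingerprint; later ones are dups.

    pages: iterable of (url, text) pairs in the order the crawler fetched them.
    Returns (kept, duplicates), where duplicates maps dup URL -> kept URL.
    """
    first_seen = {}
    kept = []
    duplicates = {}
    for url, text in pages:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in first_seen:
            duplicates[url] = first_seen[key]
        else:
            first_seen[key] = url
            kept.append(url)
    return kept, duplicates


pages = [
    ("http://intranet.example.com/departments/it/index.html", ""),
    ("http://intranet.example.com/departments/hr/index.html", ""),
]
kept, dups = walk_order_dedup(pages)
print(kept)  # ['http://intranet.example.com/departments/it/index.html']
print(dups)  # {'http://intranet.example.com/departments/hr/index.html': 'http://intranet.example.com/departments/it/index.html'}
```

Reversing the order of `pages` would keep the HR page and mark the IT page as the duplicate instead.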