Using commercial (enterprise) Webinator on Red Hat 9, we have a program that drops and recreates all the indexes on the html table. Somewhere along the way, duplicate pages appear to have gotten into the table, because we get thousands of the following error in the Vortex log when we try to rebuild the unique index on Hash:
178 Aug 27 06:47:11 /webinator/dev/idx_rebuild:67: Trying to insert duplicate value (000000000) in index (temp RAM DBF)
and before that we get one error:
100 Aug 27 06:47:11 /webinator/dev/idx_rebuild:67: Creating Unique index on Non-unique data
If I am correct that some duplicate pages got in somehow (maybe the unique index was not on for one of the walks), how can I fix this? I thought of doing "create new_table as select distinct * from html" and then copying the new table over html, but the duplicate pages likely differ in some other field (Url, etc.), so that won't work. Any suggestions, or could this be some other problem?
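To make the distinction concrete: "select distinct *" only collapses rows that are identical in every column, while what's needed here is to keep one row per Hash value regardless of the other fields. A minimal sketch of that keep-first-row-per-Hash logic, in Python with made-up rows and the field names (Hash, Url) taken from the post:

```python
# Sketch: deduplicate rows by their Hash field, keeping the first row
# seen for each Hash. Unlike SELECT DISTINCT *, rows that share a Hash
# but differ in Url (or any other field) still collapse to one row.
def dedup_by_hash(rows):
    seen = set()
    kept = []
    for row in rows:
        if row["Hash"] not in seen:
            seen.add(row["Hash"])
            kept.append(row)
    return kept

# Illustrative data: two rows for the same page that differ only in Url.
rows = [
    {"Hash": "000000000", "Url": "http://example.com/a"},
    {"Hash": "000000000", "Url": "http://example.com/a/"},
    {"Hash": "111111111", "Url": "http://example.com/b"},
]
# dedup_by_hash(rows) keeps the first "000000000" row and the "111111111" row
```

Whether this is best done in SQL or with a small cleanup script against the html table is the part I'm unsure about, since it depends on what the Texis SQL dialect supports.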
Thanks,
Marcos