My web site includes navigation and privacy statements on most of its pages. What is the best way to exclude this repeated text from the database?
You could perform a search and replace (sandr) on the records after the crawl, using Vortex: http://www.thunderstone.com/site/vortexman/node107.html
Or use a scripted crawler, which lets you strip the information out before it is inserted into the database. See ftp://ftp.thunderstone.com/pub/dowalk_beta for an example scripted crawler.
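To illustrate the "strip before insert" idea, here is a rough sketch in Python rather than Vortex, so it is not taken from the dowalk script. The snippet texts and the function name strip_boilerplate are made up for the example; you would substitute the navigation and privacy text your own pages repeat.

```python
import re

# Hypothetical boilerplate snippets; replace with the text your pages repeat.
BOILERPLATE = [
    "Home | Products | Support | Contact",
    "Privacy Statement: we do not share your data.",
]

def _pattern(snippet):
    # Match the snippet's words in order, allowing any run of whitespace
    # between them, so line wrapping in the page does not defeat the match.
    words = [re.escape(w) for w in snippet.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)

PATTERNS = [_pattern(s) for s in BOILERPLATE]

def strip_boilerplate(text):
    # Remove every occurrence of each known boilerplate snippet.
    for pat in PATTERNS:
        text = pat.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

if __name__ == "__main__":
    page = (
        "Home | Products |  Support | Contact\n"
        "Widget manual page.\n"
        "Privacy Statement: we do not share your data."
    )
    print(strip_boilerplate(page))
```

The same approach carries over to either option above: run the removal on each page's text before the insert in a scripted crawler, or apply the equivalent search and replace to the stored records after the crawl.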