Using schema.org tags in search results

Post by **rgwin0** » Mon Jun 30, 2014 1:53 pm

Our Webinator appliance supports search across hundreds of sites, all managed from our proprietary CMS, which defines a bunch of standard "content types" like photos, videos, news releases, events, etc. We would like to train Webinator to recognize these content types so that search results can be filtered or sorted by type. So the general plan is to add some sort of type-identifying markup to the webpages, configure Webinator to recognize and store this info as a "type" field, and then use that field as a parameter in the search interface.

At the moment we are considering marking up our content with schema.org tags because (a) it is an accepted standard with an extensive taxonomy of object types, and (b) it allows multiple object types per page -- e.g. a news release with two embedded photos and a video = four objects on one HTML page.

Has anyone integrated schema.org markup into a Webinator crawl before? Any tips for how we might go about it? Or perhaps some completely different way to accomplish this goal?

Thanks,
Rob

Post by **mark** » Tue Jul 01, 2014 4:57 pm

You'll want to use Data From Field to populate an Additional Field (or parametric field on parametric appliances). If you use <meta> tags you can just grab the data from those. If you use schema.org you'll have to write expressions to extract the desired info from the HTML.
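
To make the extraction step concrete, here is a rough sketch in Python of the information such an expression would need to pull out of a page. This is not Webinator configuration or its expression syntax; it only illustrates the idea. It assumes the schema.org markup is written as microdata (`itemtype` attributes) rather than JSON-LD or RDFa, and the sample page, the `ItemTypeCollector` class and the `extract_types` function are all made up for illustration.

```python
from html.parser import HTMLParser


class ItemTypeCollector(HTMLParser):
    """Collect schema.org type names from microdata itemtype attributes."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemtype" and value:
                # An itemtype attribute may hold several space-separated URLs.
                for url in value.split():
                    # Keep only the last path segment, e.g. "NewsArticle".
                    self.types.append(url.rstrip("/").rsplit("/", 1)[-1])


def extract_types(html):
    """Return the type names found in the page, in document order."""
    collector = ItemTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.types


# Hypothetical page: one news release with two photos and one video.
SAMPLE = """
<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Example release</h1>
  <div itemprop="image" itemscope itemtype="http://schema.org/ImageObject"></div>
  <div itemprop="image" itemscope itemtype="http://schema.org/ImageObject"></div>
  <div itemprop="video" itemscope itemtype="http://schema.org/VideoObject"></div>
</article>
"""

print(extract_types(SAMPLE))
# → ['NewsArticle', 'ImageObject', 'ImageObject', 'VideoObject']
```

Note that a page like this yields several types, so you would have to decide whether the field should hold only the outermost type or all of them. With a single <meta> tag per page that question doesn't come up, which is part of why the <meta> route is simpler.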

See the manual for how to restrict the search by the Additional or Parametric fields you populated with Data From Field.