Clarify "Data From Fields" Option

velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Post by velevi »

Within "All Walk Settings" there is an option "Data from Field", which is a little obscure.

The description that is in the documentation is not that clear either:

"Data From Field

Syntax: Metamorph query, field to search, what to exclude

This provides more flexible control of what to include and how to include it. It allows getting page information from NON-DEFAULT places by searching and optionally replacing the data. One inclusion per row of controls may be entered; new blank rows will be provided as rows are used. The Search column is where a Metamorph query (ie. a typical search on Webinator) is entered: eg. several keywords or a regular expression. The Replace column is used (optionally) to replace the data obtained from the search. The Meta and Field columns determine what the Query searches: if Meta is non-blank, that named meta field is searched, otherwise the field selected in Field is searched. "
http://www.thunderstone.com/site/webina ... 0000000000

Can you please clarify what "what to include and how to include" refers to? Also, can you clarify what "information from NON-DEFAULT places" means?

So, it seems that this feature takes a search query (i.e. metamorph query; these are equivalent right?) and then manipulates what is returned? How exactly does that affect the search results? Can you give an example?

Thank you for your help!!
Kai
Site Admin
Posts: 1272
Joined: Tue Apr 25, 2000 1:27 pm

Post by Kai »

The docs contain a typo: `what to include and how to include it' should read `setting the Modify Date, Title or Description fields for searching'. `Information from non-default places' means, for example, obtaining the Modify Date from a date string in a <META> tag instead of from the Last-Modified HTTP header (the default place).

This setting does not affect user searches; it merely uses the same query syntax (Metamorph) as user searches. The difference is that instead of whole documents being returned as search results for the user, just the match itself is returned and placed in the appropriate field (e.g. Modify Date, Title or Description). This happens at crawl time, not search time.

For example a Search field of `>><\!--Description:=!-->+' (a REX expression in Metamorph syntax), a Field of `HTML', and a Which Field of `Description' would cause the HTML of pages to be scanned for comments like:

<!--Description: washing instructions -->

and the contents used for the Description of the page, instead of looking in the default place, a <META> Description tag.

This can be used to obtain a Description for searching from documents that tag their descriptions in a non-standard (i.e. non-<META>) way.
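To make the idea concrete outside of Webinator, here is a rough Python analog of that rule. It is only a sketch: it uses Python's `re` module rather than REX, the function name `description_from_comment` is made up for illustration, and it strips the comment delimiters from the result, which is an assumption about what ends up in the field rather than something the walker is confirmed to do.

```python
import re

# Python-regex analog of the REX search above (not REX syntax itself):
# capture whatever sits between "<!--Description:" and "-->".
DESCRIPTION_COMMENT = re.compile(r"<!--Description:(.*?)-->", re.DOTALL)


def description_from_comment(html):
    """Return the text of the first Description comment, or None if absent."""
    match = DESCRIPTION_COMMENT.search(html)
    return match.group(1).strip() if match else None


page = "<html><body><!--Description: washing instructions --><p>...</p></body></html>"
print(description_from_comment(page))  # washing instructions
```

A page without such a comment yields `None`, which corresponds to the rule simply not matching.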
velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Post by velevi »

OK, that makes it much much clearer.

So, in summary, this search/replace takes place during the walk. The specified portion of the document (all of the HTML, a Meta field, text only, etc.) is searched for the specified "pattern"; that match is then optionally manipulated (via the "Replace" field) to put it in the right format, and finally inserted into the appropriate field of the database record for that page (like "Title", "Description", etc.).

Otherwise, if none of these search/replacements are used, the Description field is usually filled in with the body text, right? And Title comes from the <title> tag, I assume? Or are these strictly taken from the document's metadata?

Thanks!
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

Data from field applies to the walk, not the search. A number of attributes of a page get stored in the database, such as modified date, title, and description. Those come from standardized places by default. You can specify non-standard places to look for them. "Search" says how to find the value in the downloaded page. "Replace" lets you use a subset or rearranged version of what was found. "Meta" lets you name a meta field to get it from (one that's not listed in "Field"). "Field" lets you tell it where to do the search. An empty search means use the whole field.

The default place for Title is the document's <title>, and so on.
You could tell it to use the meta description for Title instead if you like.
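
As a rough model of how one row of those controls behaves, here is a Python sketch. Everything in it is hypothetical (the `Rule` class, `apply_rule`, and the `page`/`record` dictionaries are invented for illustration, and Python regexes stand in for Metamorph/REX); it only mirrors the behavior described above: Meta wins over Field when non-blank, an empty search uses the whole source, and Replace is optional.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    search: str   # regex; empty means "use the whole source text"
    replace: str  # re.Match.expand template; empty means "keep the match as-is"
    meta: str     # name of a meta field to search; empty means "use `field`"
    field: str    # which part of the page to search, e.g. "HTML"
    target: str   # record field to fill, e.g. "Title"


def apply_rule(rule, page, record):
    """Fill record[rule.target] from page; leave record untouched on no match."""
    if rule.meta:
        source = page["meta"].get(rule.meta, "")
    else:
        source = page["fields"].get(rule.field, "")
    if not rule.search:
        value = source
    else:
        match = re.search(rule.search, source)
        if match is None:
            return record
        value = match.expand(rule.replace) if rule.replace else match.group(0)
    if value:
        record[rule.target] = value
    return record


# Hypothetical page: use the meta description as the Title.
page = {
    "meta": {"description": "How to wash wool"},
    "fields": {"HTML": "<title>Untitled</title>"},
}
record = {"Title": "Untitled"}
rule = Rule(search="", replace="", meta="description", field="HTML", target="Title")
print(apply_rule(rule, page, record))  # {'Title': 'How to wash wool'}
```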

One of the more common uses is with dynamic pages that don't have a modified date as far as HTTP is concerned (no usable Last-Modified header), but where the page itself includes a publish date for the article. You could search the HTML of the page for the date and use that as the modify date for the page.

So, if the page had
<td>Published:</td><td>3-5-2005</td>
you could use
Search: >><td>Published:</td><td>=\digit+-=\digit+-=\digit+
Replace: \6-\2-\4
Meta: --leave-blank--
Field: HTML
Which Field: Modify Date
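
For readers who think in ordinary regexes, the same search-and-rearrange can be sketched in Python. This is an analog, not REX: it assumes that \2, \4 and \6 in the Replace string refer to the three digit runs in order (the thread does not spell out how REX numbers its subexpressions), so that 3-5-2005 becomes 2005-3-5. The function name `modify_date_from_html` and the group names are made up for illustration.

```python
import re

# Python-regex analog of the Search row; the three digit runs are captured by name.
PUBLISHED = re.compile(
    r"<td>Published:</td><td>(?P<first>\d+)-(?P<second>\d+)-(?P<third>\d+)"
)


def modify_date_from_html(html):
    """Return the date rearranged as third-first-second, or None if not found."""
    match = PUBLISHED.search(html)
    if match is None:
        return None
    # Mirrors the assumed meaning of the Replace string \6-\2-\4.
    return "{third}-{first}-{second}".format(**match.groupdict())


print(modify_date_from_html("<td>Published:</td><td>3-5-2005</td>"))  # 2005-3-5
```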
velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Post by velevi »

OK! Thanks a lot. I understand it very well now. In the previous posting, I just summarized basically what you had (I guess it's hard to match all of the terminology we individually use; but I feel like we meant the same thing; therefore I have understood everything right).

Your example is good (and a very effective use of the function), but I do not understand how you did the REX groupings and printed the date using \6, \4 and \2 in the replacement string. Wouldn't you need to use parentheses to group the date's digits?
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Post by mark »

velevi
Posts: 42
Joined: Thu Sep 08, 2005 12:21 pm

Post by velevi »

Can data be inserted into page database fields other than Title, Description and Date Modified? Is this customization possible by hand?
John
Site Admin
Posts: 2622
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Post by John »

Yes, you can edit the dowalk script to manipulate the data however you want.
John Turnbull
Thunderstone Software