Problem with data from field

Posted: Wed Nov 28, 2007 4:08 pm
by michel.weber
Hi

I have tried to define an additional field to hold the publication date of documents, but so far no luck.

For testing purposes I have defined 3 additional fields:
<FullTitle>    Text   Searchable   Sortable   Output
<DateSelect>   Text   Searchable   Sortable   Output
<Extra>        Date   Searchable   Sortable   Output

The fields are filled through Data from Field, like this:
REX Search   Replace   From Field      From Meta Field          To Field
.+           <empty>   Meta Field ->   date-creation-yyyymmdd   DateSelect
.+           <empty>   Meta Field ->   Last-modified            Extra
.+           <empty>   Title           <empty>                  FullTitle

For some reason only the FullTitle one works; the others are empty, although most documents have the specified meta tags filled in. For example:
<title>DG of Administration and Logistics - Manuel de catalogage Cataloguing Guide (only available in French)</title>
<meta name="robots" content="all" >
<meta name="description" content="Manuel de catalogage Cataloguing Guide (only available in French)" >
<meta name="keywords" content="ADMIN/TI(2005)9, " >
<meta name="author" content="Council of Europe, DG of Administration and Logistics, Department of Information and Technology" >
<meta name="dimSector" content="secDGAL=DG of Administration and Logistics">
<meta name="dimSectorLevel" content="levDGAL-DIT=Department of Information and Technology">
<meta name="dimDocType" content="docTraining=Guide, Manual, Training material">
<meta name="dimLanguage" content="lanEnglish=English, lanFrench=Français">
<meta name="dimTheme" content="thmTI=Information management and technologies">
<meta name="dimFilingPlan" content="fplHB-libraries=Libraries">

<meta name="date-creation-yyyymmdd" content="20050804" >
<meta name="date-revision-yyyymmdd" content="20070608" >
<meta name="Last-modified" content="Fri, 8 Jun 2007 11:32:56 CEST" >

What's wrong?

Posted: Wed Nov 28, 2007 5:44 pm
by mark
"Works for me", as they say. One exception: if the server provides "Last-modified" as well, there will be two values and the result won't parse correctly into the date field.
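
To spot the duplicate case described above, a small check along these lines might help. It is only a sketch: the function name `names_with_multiple_values` is made up for illustration, the HTTP header value is invented, and comparing names case-insensitively is an assumption, not a statement about how the appliance matches them.

```python
def names_with_multiple_values(http_headers, meta_pairs):
    """Group values by lower-cased name and keep names seen more than once."""
    values = {}
    for name, value in list(http_headers) + list(meta_pairs):
        values.setdefault(name.strip().lower(), []).append(value)
    return {name: vals for name, vals in values.items() if len(vals) > 1}


# Illustrative HTTP response header (made-up value, not taken from the site).
http_headers = [("Last-Modified", "Fri, 08 Jun 2007 09:32:56 GMT")]

# Meta tags as quoted in the sample page earlier in this thread.
meta_pairs = [
    ("date-creation-yyyymmdd", "20050804"),
    ("Last-modified", "Fri, 8 Jun 2007 11:32:56 CEST"),
]

duplicates = names_with_multiple_values(http_headers, meta_pairs)
print(duplicates)
```

Any name that shows up in the result has two candidate values, which is the situation that would not fit into a single date field.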

Where are you looking to see those fields? I checked in List/Edit URLs and in the XML output and saw them in both places.

Under Maintenance -> Tech Support Info, what are your Version and Scripts Version?

Is there a public page where I can see the meta fields you describe above?

Posted: Thu Nov 29, 2007 1:50 pm
by michel.weber
The pages are from a public site:
http://www.coe.int/t/congress/wcd/simplesearch_en.asp#
Type 'meeting', for example.

I looked in the XML output and the fields are there all right, but except for FullTitle they are empty.

As for the scripts, we have the latest version.

Posted: Thu Nov 29, 2007 2:51 pm
by mark
The pages from there work well for me with your settings and scripts version 6.2.16. The page I indexed is https://wcd.coe.int/ViewDoc.jsp?id=1216 ... ged=FFC679

Make sure you don't have any typos in your settings and that there are no extraneous spaces in them.
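
One way to check for typos and stray spaces is to compare the configured meta names against what a page actually contains. The following is a rough sketch using only Python's standard `html.parser`; `MetaCollector` and `check_meta_names` are hypothetical helper names, and whether the appliance itself matches names case-sensitively is not something this check assumes or proves.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if attr.get("name"):
                self.pairs.append((attr["name"], attr.get("content") or ""))


def check_meta_names(html, configured_names):
    """Classify each configured name as 'exact', 'near' or 'missing'.

    'near' means the name only matches after trimming spaces and ignoring case.
    """
    collector = MetaCollector()
    collector.feed(html)
    page_names = [name for name, _ in collector.pairs]
    normalized = {name.strip().lower() for name in page_names}
    report = {}
    for wanted in configured_names:
        if wanted in page_names:
            report[wanted] = "exact"
        elif wanted.strip().lower() in normalized:
            report[wanted] = "near"
        else:
            report[wanted] = "missing"
    return report


# Meta tags copied from the sample page quoted in the first post.
SAMPLE = """
<meta name="date-creation-yyyymmdd" content="20050804" >
<meta name="date-revision-yyyymmdd" content="20070608" >
<meta name="Last-modified" content="Fri, 8 Jun 2007 11:32:56 CEST" >
"""

# The second name has a trailing space and the third is truncated, on purpose.
report = check_meta_names(
    SAMPLE, ["date-creation-yyyymmdd", "Last-modified ", "date-creation"]
)
print(report)
```

A 'near' result points at exactly the kind of extraneous space or case difference worth fixing in the settings.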

Posted: Thu Nov 29, 2007 2:53 pm
by mark
P.S.
Also make sure those pages have been visited since your settings changes, either by being due for refresh during a refresh walk or through a full new walk.

Posted: Thu Nov 29, 2007 3:05 pm
by Kai
The latest released scripts are texisScripts 6.2.16; is this the version you have? I've run a test crawl with your fields against https://wcd.coe.int/ViewDoc.jsp?id=1216 ... ged=FFC679 and all 3 were populated (as viewed from List/Edit URLs).

Open a tech support ticket (top menu on message board) describing the issue, and attach an (HTML) copy of All Walk Settings, Maintenance -> Tech Support Info, and an example (public) URL, and we'll investigate further.

Posted: Mon Dec 03, 2007 9:14 am
by michel.weber
Hi

I'll be away the whole week, so you might not get any feedback for a while. I'll ask Cyril to open a ticket in the meantime.

Just to answer your questions:

The walk I was running was a test, and I always do a 'new' type walk.

The scripts are the latest:

Thunderstone Information
Version: Search Appliance Server Version 5.01.1191613536 20071005 (i686-unknown-linux2.4.9-64-32)
Scripts Version: 6.2.16
Details: dowalk: 6.2.16/2.509 dowalk: 6.2.16/2.423 dowalk: /1.9 appliance: 6.2.16/1.217 search: 6.2.16/2.389 DB: /1.6
Serial number: 60463

Posted: Mon Dec 10, 2007 5:20 pm
by michel.weber
Hi

I think I have found the 'bug'.

We have a couple of 'keep tags' which enclose part of the HTML body. They enclose neither the 'meta' tags nor the 'title' tag.
When I take them away, my additional fields are suddenly filled in.

This behaviour strikes me as odd. Since the meta tags are outside the 'keep tags', they should either all be excluded or all be included, but not just 'some'.
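
To confirm which tags sit outside the kept part of a page, something like the sketch below could be used. It is purely a diagnostic on the raw HTML: the `<!-- KEEP-START -->` and `<!-- KEEP-END -->` markers and the helper names are placeholders chosen for illustration, not the appliance's keep-tag syntax, and the script says nothing about how the appliance processes keep tags internally.

```python
import re


def meta_names(html):
    """Return the name attribute of every <meta name="..."> tag found."""
    return re.findall(r'<meta\s+name="([^"]+)"', html, flags=re.IGNORECASE)


def kept_region(html, start_marker, end_marker):
    """Return only the text between each start/end marker pair."""
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    return "".join(re.findall(pattern, html, flags=re.DOTALL))


# Reduced test page: meta and title in the head, markers around part of the body.
PAGE = """<html><head>
<title>Cataloguing Guide</title>
<meta name="date-creation-yyyymmdd" content="20050804" >
<meta name="Last-modified" content="Fri, 8 Jun 2007 11:32:56 CEST" >
</head><body>
<div>navigation</div>
<!-- KEEP-START -->
<p>Document text</p>
<!-- KEEP-END -->
</body></html>"""

all_names = meta_names(PAGE)
kept = kept_region(PAGE, "<!-- KEEP-START -->", "<!-- KEEP-END -->")
inside = meta_names(kept)
outside = [name for name in all_names if name not in inside]

print("meta inside kept region: ", inside)
print("meta outside kept region:", outside)
print("title inside kept region:", "<title" in kept.lower())
```

On a page laid out like ours, both meta names and the title land outside the kept region, which is the situation described above.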