Hello,
I am using Webinator 5.0.2-Unix-w/plugin.
I have scoured the manual.
My search is not indexing some numbered directories on a web server that contain various legacy mainframe logs. I want all content indexed; every file is plain ASCII, nothing binary. The mainframe developers need some love.
The full path I am trying to index is:
http://servername/CENTRALLOGS/00000136/LOGS/JF9999.TXT
Indexing stops at 00000136: the search result shows a plain-text listing of that directory's contents. I need clickable URLs, and I need JF9999.TXT itself to be indexed.
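One thing worth checking for a symptom like this is whether the server returns the directory listing as HTML with real links or as plain text. The sketch below is not Webinator's logic; it assumes only that a generic crawler extracts links from text/html responses and treats anything else as a leaf document. The names `LinkCollector` and `followable_links` are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def followable_links(content_type, body):
    """Return the links a generic crawler could follow from one response.

    Assumption: only text/html responses are parsed for links, so a
    listing served as text/plain yields none, whatever its body contains.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/html":
        return []
    parser = LinkCollector()
    parser.feed(body)
    return parser.links
```

To try it against the real server, fetch the directory URL with `urllib.request.urlopen` and pass in the response's Content-Type header and decoded body. An empty result for http://servername/CENTRALLOGS/00000136/ would suggest the listing is being served in a form that gives the walker nothing to follow.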
My complete settings follow.
Thanks in advance for any help.
Best regards,
Bert
All Walk Settings
Current Profile: legacylogs Webinator 5.0.2-Unix-w/plugin
Database: /usr/local/morph3/texis/legacylogs/db2
Walk Summary: Last complete walk: 2006-10-05 16:03:47 (took 2 seconds)
  Success. 85 pages (324,055 bytes)
Base URL: http://prdcvs02/CENTRALLOGS/
Enterprise: Yes
Domain
Robots:
  robots.txt: Y
  Meta: Y
Extensions: .html .htm .txt .pdf .doc .xls .swf .TXT
Exclusions:
  /cgi-bin/
  ~
  ?
Crawl Delay: 0
Parallelism: Threads: 5 Servers: 2
Verbosity: 2
Rewalk Type: New
Rewalk Schedule: Frequency Daily 2AM
Watch URL: (none)
Notify: (none)
Categories:
  Category: (none)
  URL Pattern: (none)
URL File: (none)
URL URL: (none)
Single Page: (none)
Page File: (none)
Page URL: (none)
Strip Queries: N
Ignore Case: Y
Extra Domains: (none)
Extra Networks: (none)
Extra URLs REX: (none)
Exclusion REX: (none)
Exclusion Prefix: (none)
Required REX: (none)
Required Prefix: (none)
Max Page Size: -1
Max Pages: -1
Max Bytes: -1
Max Depth: -1
Page Timeout: 60
Meta Tags: (none)
Standard Meta: Y
All Meta: N
Keep HTML:
  ALT Text: Y
  <STRIKE>: Y
  <DEL>: Y
  <FORM>: Y
Remove Common: N
Ignore Tags:
  Begin: (none)
  End: (none)
Keep Tags:
  Begin: (none)
  End: (none)
Plugin Split:
  Depth: 0
  Bytes: 0
  AtPage: (not checked)
  Pages: 0
Word Definition:
  [\alnum\x80-\xff]{1,70}
  [\alnum\x80-\xff.]{1,70}>>[.&']=[\alnum\x80-\xff.]{1,70}
Login Info: (none)
  Password: (none)
Proxy:
Proxy Login Info:
  Name: (none)
  Password: (none)
Cookie Source Path: (none)
Temporary Dir: (none)
Off-site Pages: N
Stay Under: Y
Prevent Duplicates: Y
All Extensions: Y
Store Refs: Y
Inline Iframes: Y
Max Frames: 20
Execute JavaScript: N (Note: This feature not enabled by current license)
Fetch JavaScript: N
Debug JavaScript: N
Protocols: HTTP FTP
Embedded Security: Any
Entropy Source: Standard
Max Redirects: 12
Index Name: index.html index.htm
DNS Mode: Internal
Net Mode: Internal
User Agent: Mozilla/4.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)
Mime Types: */*
Default Refresh Time: 1 hour
Minimum Refresh Time: 1 minute
Maximum Refresh Time: 90 days
Maximum Process Size: Small