Trying to count character entity as one character

Thunderstone
Site Admin
Posts: 2504
Joined: Wed Jun 07, 2000 6:20 pm

Post by Thunderstone »



In the default Webinator Vortex scripting, in the section "showlinks",
the title is truncated to 70 characters. For example, there's this
among others:

<!-- Truncate the title in case it's really long: -->
<substr $Title 0 70>

This makes sense to me. But what scripting would I need to use to be
sure that this doesn't truncate right in the middle of a character
entity? For example, if characters 68 through 73 of the title are
"&cent;", I wouldn't want to cut this down to "&ce". Besides, the
cent sign is only one character, and if this is the end of the
title (and this is the only character entity in the title), then there
are only 68 displayed characters, in which case the title ought not to
be truncated at all.

Or perhaps the better (and much more direct!) question is: how can I
truncate the title to 70 characters but count each character entity
as one character?

-John Koch - - - __o
Knowledge Systems, Inc. - - - - _ \<,_
<John.ksi@webplus.net> - - (_)/ (_)
(A NET-FRIENDLY SIG. http://www.ncsa.uiuc.edu/Edu/ICG/pt1.ch2.Etiquette.html )



Post by Thunderstone »




The character entities are decoded before they are placed into the database,
so if you're seeing things that look like entities, they are coming from
a badly coded HTML page. You should look at the original page source
from the web server to see what's there.
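
To illustrate why decoding first makes the count come out right, here is a small Python sketch. It is not Vortex and not a description of what Webinator does internally; the function name truncate_decoded is made up for this example. It decodes entities with the standard html.unescape and only then truncates, so "&cent;" counts as one character:

```python
import html


def truncate_decoded(title: str, limit: int = 70) -> str:
    """Decode character entities, then keep at most `limit` characters."""
    decoded = html.unescape(title)  # "&cent;" becomes the single character "¢"
    return decoded[:limit]


# The case from the question: 67 ordinary characters followed by one entity.
title = "x" * 67 + "&cent;"
print(len(title))                    # 73: raw length with the entity spelled out
print(len(truncate_decoded(title)))  # 68: nothing is cut, the entity counts once
```

With the entity decoded up front, a plain 70-character cut can never land inside it.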

In the general case of truncating strings on a word boundary, you would need
to use <rex> to find a better break point. Perhaps something like:
<rex ">>=.{,65}[^\space]{,10}" $Title>
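
The same idea outside of REX, as a rough Python sketch. This illustrates the logic only and is not a translation of the expression above; the helper name truncate_on_word and the 70-character default are assumptions. It cuts at the limit and then backs up to the last whitespace so that no word is split:

```python
def truncate_on_word(text: str, limit: int = 70) -> str:
    """Return at most `limit` characters, ending on a word boundary if possible."""
    if len(text) <= limit:
        return text  # short enough, leave it untouched
    if text[limit].isspace():
        # The cut already falls between two words.
        return text[:limit].rstrip()
    # Drop the partial last word by splitting once from the right on whitespace.
    parts = text[:limit].rsplit(None, 1)
    if len(parts) < 2:
        # No earlier break point (one long word): fall back to a hard cut.
        return text[:limit].rstrip()
    return parts[0]


print(truncate_on_word("The quick brown fox jumps", 12))  # prints "The quick"
```

A single word longer than the limit still gets a hard cut, since there is no better break point to back up to.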


Post by Thunderstone »





Mark Willson's answer was correct; entity characters are pre-decoded
in such a manner as to prevent them from being split. Only a syntax
error in the original HTML will cause this behavior.

There is another way, however, to avoid truncating in the middle of a
word: just replace the call to <substr> with a call to <abstract>.

E.g.: <abstract $Title 70 0>

For more info on <abstract> see:

http://www.thunderstone.com/vortexman/node119.html

I'm not exactly sure why we didn't use this in the original script.

