Cluster server

rjshelq
Posts: 82
Joined: Thu Nov 17, 2005 3:25 pm

Cluster server

Post by rjshelq »

Hi,

My web host has installed a Linux cluster server, where a central load balancer receives all the HTTP/HTTPS/FTP requests and then routes each request to any one of a few dozen servers for processing. This system uses a massive central hard drive array where all of the files reside, from which the assigned server grabs all the necessary files and performs the required processing to fulfil the user's request.

Apparently this cluster server scheme is becoming quite common in the web-hosting business, and even in large corporate applications... so you're likely to run into it again... and again.

In this cluster arrangement, there is no way to predict which server a particular http request will be processed on (although there may be a few minutes of persistence for SSL and FTP traffic). But, in general, each HTTP request will be served by any machine that the load balancer chooses.

Consequently, the idea of having a continuously running process such as texis/monitor is not practical, because the HTTP request for a Webinator search can be routed to any one of dozens of available servers for processing.

All of the Webinator files and databases are on the central hard drive array, and are grabbed by whatever server is assigned to run the search. But of course there is no way to have the shared memory already set up on any arbitrary server, so Webinator would have to set up its shared memory, and then run the search.

So... armed with that background, is there any way to install and/or modify Webinator such that it can be run on such a server cluster?

Thanks,
Richard
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Cluster server

Post by mark »

You're basically saying that there are multiple machines accessing the same database files over NFS or the equivalent. That is not supported because record locking won't work and database corruption will ensue.

Webinator would have to be installed locally on each server. If you're not using query logging, the database, once crawled, could be configured for single-user (no locking) read-only search access so any of the servers could talk to it. Or the search script could be replaced on all but one server to proxy all the searches to one machine.
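
As a rough illustration of the proxy idea only, here is a minimal Python sketch of what a replacement search script might do: pass the incoming query string on to the one machine that runs Webinator and return whatever comes back. The host name, the URL path, and the function names are all hypothetical, and this is not a Thunderstone-supplied script.

```python
from urllib.parse import parse_qsl, urlencode
from urllib.request import urlopen

# Hypothetical address of the single machine that actually runs Webinator.
SEARCH_NODE = "http://search-node.example.com/texis/search"


def build_upstream_url(query_string, base=SEARCH_NODE):
    """Re-encode the incoming query string onto the search node's URL."""
    params = parse_qsl(query_string, keep_blank_values=True)
    return base + ("?" + urlencode(params) if params else "")


def proxy_search(query_string, timeout=10):
    """Fetch the results page from the search node and return its bytes."""
    with urlopen(build_upstream_url(query_string), timeout=timeout) as resp:
        return resp.read()
```

The point of the design is that only the search node ever opens the database, so the other servers never touch its files or shared memory.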
bart
Posts: 251
Joined: Wed Apr 26, 2000 12:42 am

Cluster server

Post by bart »

Cluster serving with ANY database application must be done with an OS that knows how to treat the filesystem and shared memory as one large SMP machine. Examples of this on Linux are OpenMosix and OpenSSI. A Webinator app should work just fine in either of these cluster environments.
rjshelq
Posts: 82
Joined: Thu Nov 17, 2005 3:25 pm

Cluster server

Post by rjshelq »

Hi Mark,

It would really be neat to find a way.. some way.. any way... to use Webinator in this cluster, even if it involves some significant compromises. For example, giving up the query logging.

There is no other search engine that even comes close to giving me the quality of search results that I get with Webinator. The free service at master.com lacks the ability to use the customized search script that I've developed... hence my request to find some way to use Webinator in this cluster environment.

There is no way that I could install Webinator on every machine... the number of machines is not even possible to predict, and there is no place other than the central file system to install anything.

A major feature of the cluster system is that the machines are just processing units, units that can be taken off-line for repair, or new units can be added, all without disrupting the overall system. The file server and the load balancer are the focal points of the system, while the individual computers are just processing resources.

So there's no way to proxy to one machine, since in this cluster arrangement, machines may come and go, such as during service or repairs, and further, there is no way for a user to specify which computer handles any given task, since the load balancer is in charge of assigning the system tasks.

Any file locking is done at the central file server (which in this case is apparently a BlueArc Titan http://bluearc.com), since the individual machines operate entirely from that central file server.

It's a whole different way of looking at a server! More and more web hosts and corporate servers are changing to such a cluster arrangement, and over the next few years many people predict that the cluster will be the dominant scheme for large scale servers. (see for example http://www.linuxvirtualserver.org )

So, please let the concept ruminate in the back of your clever mind for a few days, and then, maybe next week, let me know if you have any further insights.

thanks,
Richard
bart
Posts: 251
Joined: Wed Apr 26, 2000 12:42 am

Cluster server

Post by bart »

>More and more web hosts and corporate servers are changing to such a cluster arrangement, and over the next few years many people predict that the cluster will be the dominant scheme for large scale servers. (see for example http://www.linuxvirtualserver.org )

Richard,

No corporate server that has non-static database driven content or server-side user data _can_ever_ use a "cluster" of this type.

What they're calling a cluster in this case is really an HA (high-availability) load-balanced content server, and isn't good for anything much more than reliably serving a bunch of static content or totally autonomous applications. The minute you need to do any interprocess communication this kind of cluster becomes a lot of trouble. All it really is is a bunch of cloned server blades attached to a network storage array with no interprocess communication. This technique was invented more as a way to help ISPs do their thing cheaply and reliably than as a way to provide a scalable virtual HA processor to their customers.

I know all this junk because I'm in the middle of building a cluster driven application with Thunderstone's stuff. Doing so requires that I use a more sophisticated cluster system than something like LVS. The same thing would be true regardless of my choice of database vendors.

Ask your ISP if you can lock your application to a single virtual server node. This will solve your problem.

Bart
rjshelq
Posts: 82
Joined: Thu Nov 17, 2005 3:25 pm

Cluster server

Post by rjshelq »

I only offered the simple example of linuxvirtualserver.org as an introduction to the general idea.

My web host will not divulge the actual scheme that they are using, but whatever it is, it works fine with PostgreSQL and MySQL databases.

They say they cannot (or perhaps will not) lock any application to a single node, saying that such a lock would defeat the system-wide load balancing that they are trying to achieve.
rjshelq
Posts: 82
Joined: Thu Nov 17, 2005 3:25 pm

Cluster server

Post by rjshelq »

In order to continue to try to find a solution, could you explain which resources Webinator requires to remain constant, and which ones it can "restart"?

That is, when Webinator is running normally I see that there are two memory segments, one semaphore, and that processes called monitor and texis are running.

If the processes are killed, but the memory segments are still in place, will Webinator "restart"?

What I'm trying to understand, is: What is the minimal amount of "state" information that must be preserved?
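
For anyone wanting to check what is left behind on a given node, here is a small Python sketch that parses the output of `ipcs -m` and picks out shared memory segments with no attached processes. It assumes the usual Linux (util-linux) column layout of key, shmid, owner, perms, bytes, nattch; the function names are made up for illustration, and nothing here is specific to Webinator.

```python
import subprocess


def parse_ipcs_shm(text):
    """Parse `ipcs -m` output (Linux util-linux layout assumed) into dicts."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; header and banner lines do not.
        if len(fields) >= 6 and fields[0].startswith("0x"):
            segments.append({
                "key": fields[0],
                "shmid": int(fields[1]),
                "owner": fields[2],
                "bytes": int(fields[4]),
                "nattch": int(fields[5]),
            })
    return segments


def unattached(segments):
    """Segments that still exist but have no process attached."""
    return [s for s in segments if s["nattch"] == 0]


def read_live_segments():
    """Run `ipcs -m` on this machine and parse the result."""
    out = subprocess.run(["ipcs", "-m"], capture_output=True, text=True)
    return parse_ipcs_shm(out.stdout)
```

Calling `unattached(read_live_segments())` on a node would list segments that survived after the processes using them went away.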

thanks for your help,
Richard
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Cluster server

Post by mark »

The minimum is the license.key file in the installation directory. If the monitor and the memory segment are gone, the memory can be reloaded from license.key. There is also a shared memory segment per database; that one need not persist, but no two machines should be allowed to access the database at the same time unless it's read-only.
rjshelq
Posts: 82
Joined: Thu Nov 17, 2005 3:25 pm

Cluster server

Post by rjshelq »

Great.. now suppose that someone does a search, and this first search is assigned to server #1. The monitor happily starts up, creates shared memory segments, and delivers the search results.

Then, suppose that there are no searches for a few hours (or a few days), and when a new search request arrives, this second job is assigned to server #2. Again, the monitor happily starts up, creates shared memory segments, and delivers the search results.

However, after another period of inactivity, a third search request arrives and it is assigned again to server #1. When Webinator begins its tasks, it finds that there is already a monitor process running, there are already environment variables set, and there is already shared memory... but all of those are from several days ago. What happens then? Will the old "set-up" be somehow "out of sync" and cause a failure?

My current theory is that this sort of scenario is happening and causing an error message that says: "Could not create license segment".
mark
Site Admin
Posts: 5519
Joined: Tue Apr 25, 2000 6:56 pm

Cluster server

Post by mark »

The monitors running on different machines are each updating the on-disk license.key periodically. They are probably stepping on each other and making a mess.
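
To picture how that kind of stepping on each other can happen, here is a toy Python model. It is not the real monitor and says nothing about the actual license.key format: the JSON file, the `ToyMonitor` class, and the counter are all invented for illustration. Each toy monitor reads the shared file once, keeps its own state in memory, and periodically writes the whole state back, so the last writer silently discards the other's updates.

```python
import json
import os
import tempfile


class ToyMonitor:
    """Toy stand-in for a per-machine monitor; not the real Texis monitor."""

    def __init__(self, name, path):
        self.name = name
        self.path = path
        # Each monitor reads the shared file once at startup ...
        with open(path) as fh:
            self.state = json.load(fh)

    def record_search(self):
        # ... then tracks activity only in its own local memory.
        self.state["searches"] += 1
        self.state["last_writer"] = self.name

    def flush(self):
        # Periodic write-back of the whole in-memory state.
        with open(self.path, "w") as fh:
            json.dump(self.state, fh)


def simulate():
    """Run two toy monitors against one shared file and return its contents."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w") as fh:
            json.dump({"searches": 0, "last_writer": None}, fh)
        server1 = ToyMonitor("server1", path)
        server2 = ToyMonitor("server2", path)
        for _ in range(3):
            server1.record_search()
        for _ in range(2):
            server2.record_search()
        server1.flush()
        server2.flush()  # overwrites everything server1 just wrote
        with open(path) as fh:
            return json.load(fh)
    finally:
        os.remove(path)
```

In this model five searches happen in total, but the file ends up reflecting only the two that server2 saw, because neither writer knows about the other's in-memory state.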