I did a search and didn't find any pertinent info concerning our current issue. I'm trying to find folks who have done the following:
We have a NetApp set up to receive the (Solaris) audit files from a few hundred servers. The praudit command can output the files in XML format; however, can the appliance handle this on the fly, or will I have to write some code to convert these files to XML before the appliance scans them?
The file size will obviously increase fourfold when converted to XML, so I would like to keep them in their native format and let the appliance handle the conversion when walking the filesystem.
It's not something the search appliance can do itself; however, the XML conversion doesn't have to happen *prior* to the appliance's walk.
It would be possible for you to set up a CGI application that converts the praudit output to XML when the file is requested. That way, XML output will be served whether the search appliance or a real user requests the file, without needing to store the XML.
I figured that the appliance was not set up to handle audit files. Once the files land on the NetApp, I will probably rip through them with a script and convert them all to XML, then have the appliance walk the converted XML files. Due to security issues, CGI is a no-no here. However, to let security quickly review/search the files, converting them to XML via praudit will be the simplest method. And sadly, some of the converted XML files can be 100MB+ in size. Multiply that by a couple hundred machines and you have quite a bit of data to rip through.
Am I correct in thinking that the appliance cannot walk the binary files that the audit daemon produces? And once the XML audit files are created and then compressed, would the appliance be able to walk them at that point?
The search appliance essentially acts like a text-only web browser such as w3m. It can't handle binary data, just as a web browser can't, unless the binary data is in a format it knows how to read, like Flash or PDF.
Similarly, the search appliance will see the large XML file the way a web browser sees it: as a large chunk of data, not a pretty HTML page (unless something else renders it).
The "no CGI" mandate really ties your hands; you have to store the data in a presentable format.
I do have the DTD and XSL files associated with the XML-formatted audit files, so presentation isn't going to be an issue. What is going to be interesting is how the appliance handles the large XML files.
I'll let you know how it goes once I get it set up and running. I have a feeling that I may have to do more work so folks won't pull back the entire 100MB+ file.
The appliance should be OK with the 100MB XML file; but you're right, the issue is when users click on a search result and wonder why the page takes 45 seconds to load while it sucks down the 100MB file for local transformation.
Breaking the larger files up into smaller XML files will definitely help the situation.
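One way to sketch that split, assuming the praudit XML is a flat list of <record> elements under a single root (adjust the tag names if the DTD differs):

```python
"""Sketch of breaking one large audit XML file into smaller chunks so
users don't pull down a 100MB page from a search result."""
import xml.etree.ElementTree as ET

def split_records(xml_text, per_file=1000):
    """Return a list of smaller XML documents, each holding at most
    `per_file` child elements under a copy of the original root."""
    root = ET.fromstring(xml_text)
    records = list(root)
    chunks = []
    for start in range(0, len(records), per_file):
        piece = ET.Element(root.tag, root.attrib)
        piece.extend(records[start:start + per_file])
        chunks.append(ET.tostring(piece, encoding="unicode"))
    return chunks
```

Each chunk is a well-formed document with the same root element, so the same XSL stylesheet should apply to every piece. For files too large to parse in memory, the same idea can be done with ET.iterparse instead of fromstring.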
Ideally, I'd like to create a profile specific to the audit files so that the search results return only the blocks that are found during the search. Since I have the DTD and XSL files for audit, could it be possible to create a profile for this purpose?
Better yet, a profile such that when the found item is presented and its link is clicked, the preceding 5 blocks and trailing 5 blocks of information are presented to the user. That would mean the entire XML file would have to be stored with each block representing a 'page.' Is that correct?
> Ideally, I'd like to create a profile specific to the
> audit files so that the search results return only the
> blocks that are found during the search.
By "blocks" I'm guessing you're referring to subsections of the XML document? The search appliance won't do anything to break the XML file down into smaller sections for you; it operates on web pages as its smallest granularity. The search appliance is meant to be a website crawler, not a fully customizable search webapp that happens to be pre-built for websites (which is what our Webinator product is, although that would fall under the definition of "CGI" for the corporate overlords).
What you _really_ need in this situation is a webapp. The situation you describe in the second line could easily be handled by a CGI app that reads the XML 11 blocks at a time and puts some <!-- noindex --> comments around the first 5 and last 5 blocks, and *POOF*! A search for a record would return one page: that record with its 5 surrounding blocks on either side.
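A toy version of that page builder, assuming the records have already been parsed into a list of strings; the exact noindex comment pair the appliance honors should be checked against its documentation, and the HTML here is a placeholder:

```python
"""Sketch of the context page described above: the matching record is
indexable, while the 5 records on either side are wrapped in noindex
comments so they are visible to the user but hidden from the crawler."""

def context_page(records, hit, window=5):
    """Build an HTML page for records[hit] plus `window` records of
    context on each side, clamped at the ends of the file."""
    lines = ["<html><body>"]
    lo = max(0, hit - window)
    hi = min(len(records), hit + window + 1)
    for i in range(lo, hi):
        if i == hit:
            lines.append("<p>%s</p>" % records[i])
        else:
            # context record: rendered for the reader, skipped by the indexer
            lines.append("<!-- noindex --><p>%s</p><!-- index -->" % records[i])
    lines.append("</body></html>")
    return "\n".join(lines)
```

The net effect is what Jason describes: each record gets its own "page," but only the record itself contributes to the index, so a search hit lands the user on exactly one page with surrounding context.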
I don't mean to sound like a broken record, but what you're trying to do is essentially build a house without using a hammer. To make the kind of feature-rich, dynamic site you're envisioning, you need to be able to DO something on the server.
The CLOSEST the search appliance could get you is if you can put this data in a database and expose a JDBC interface for it. The search appliance could then use its "DB Walker" functionality to create a browsable HTML interface for the DB's contents, and walk that. It wouldn't provide the "previous 5 & next 5" links you mentioned, but it'd be something.
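Loading the records into a database for that route might look like the following sketch, using SQLite purely for illustration (a JDBC-reachable server would stand in for it); the schema and the `event` attribute extraction are my assumptions about the praudit record format, not a known layout:

```python
"""Sketch of the database route: one row per audit record, so a DB
walker or any reporting tool can browse and query them."""
import sqlite3
import xml.etree.ElementTree as ET

def load_audit(db, xml_text):
    """Insert one row per <record> element; stores a hypothetical
    `event` column for filtering plus the record's serialized XML."""
    db.execute("""CREATE TABLE IF NOT EXISTS audit_record (
                    id    INTEGER PRIMARY KEY,
                    event TEXT,
                    body  TEXT)""")
    root = ET.fromstring(xml_text)
    for rec in root.iter("record"):
        db.execute("INSERT INTO audit_record (event, body) VALUES (?, ?)",
                   (rec.get("event"), ET.tostring(rec, encoding="unicode")))
    db.commit()
```

Once the records are rows, "previous 5 / next 5" is a trivial query on the id column, which is the part the appliance alone can't give you.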
Thanks Jason. Yes, by block I meant each section of data.
I'm just trying to push the limits of this thing within the limited confines of our environment. I told security long ago to purchase some security-audit software with built-in algorithms tailored to audit logs (not that they would understand them anyway).
In all honesty, it's rarely a question of whether a system (in this case the Thunderstone Appliance) can do it, and more a question of its presentation capabilities.