Relational database vendors stepped up to the plate
DEC RDB (Relational DataBase, I know, creative) had long since provided SQLMOD. These were source modules you compiled into object form which could then be added to your site specific library and used by any application. This is one of the reasons service oriented architecture was our first solution. It was very easy to whip up a server that accepted data streams in known formats from raw TCP/IP ports and message queues then chewed on it and returned a result.
The Java and Web developers began decrying, not unjustifiably, that the remote services were getting overloaded and taking too long to respond. Part of this was the volume being fed to them and the other part was “just add it here” syndrome. Many players are guilty of allowing “just add it here” syndrome to overtake SOA. Upper management did not want to pay for all of the time and people it would take to create, in a large corporate environment, a new service for each and every thing the Web needed. A systems analyst had to modify system start and stop procedures. The networking team had to assign and make open a network port. If you were using a major messaging handler like Websphere/MQ Series/insert-your-favorite-here as your external interface, you were almost forced to “just add it here” because it became a hassle to set up a completely new message route for each and every message. If there were only a handful of backend services receiving messages it was a simple matter to setup the message translation and just send it to one of the previously defined services.
Remember what I said in part 1?
The correct place to validate data is the last step/process which actually writes it to the database. Once it gets into the database it is too late.
Commercial database vendors and several OpenSource projects, created the stored procedure concept. Most of the database and business world started to realize that quote of mine was right. Their solution, however, was hideously flawed.
A stored procedure is much like an SQLMOD. It is some database specific code callable by any language able to access the database and invoke said procedure. Unlike SQLMOD it is not stored in some external library, but the database itself. In order to execute it your application has to connect directly to the database. This bypasses any and all safeguards. The database has to protect itself. In the SOA world if someone sends in data that your service cannot handle you either dump it to a log or your service crashes. While Denial of Service (DOS) attacks can cripple a business’ cash flow, they do not expose sensitive customer data to evil doers.
There are many different SQL injection techniques out there. If anybody actually knew all of them databases would be able to defend against all of them. Boiling it down to a thimble’s worth of content, here are the general classes I know about:
- A massive amount of data is sent in for a single, poorly handled field/column causing some form of overflow error giving the following text the ability to act directly on the database.
- A series of highly specialized “escape” or “control” characters are included in the data stream which exploit a back door to admin or other user level allowing the following text to operate directly on the database.
- A stored procedure is poorly written and tested only for the “happy path” of good data so when bad data is sent the procedure itself gives access to the database in ways not intended.
Directly connecting a Web or external service to a database is always bad.
That was my belief when stored procedures were first introduced. If Java or any other Web language was allowed to directly access a database not owned by the Web service itself and outside of the firewall demons aren’t just at the gate, they are inside of your kitchen eating all of your food. Just ask Yahoo or any of the other Web companies having to fess up to breaches and identity theft. If you think that is old news read up on the HBO hack.
Your relational database, at least at the enterprise level, is supposed to be the repository of low level business logic and all of your data. It must be protected as sacred and access must be restricted. Stored procedures were a good idea. Making them callable from the Web was not.
I hear some of you asking “low level?” This is where it gets interesting and where a System Architect either shines or buries a company. Let’s take a look at some of the low level stuff.
Referential Integrity – Relational databases have provided this for decades. It is what stops you from deleting a customer master record when there are invoices/orders/whatever tied to that customer.
Column Level Validation – Say you are on a Web page about to order a pair of blue jeans. The jeans only come in certain waist sizes. Far too many Web pages code that logic in the page when it belongs in the database. Column level validation means you have a reference table and only values found in that reference table can be placed in the column. Referential integrity stops someone from deleting an entry in the reference table if there are any rows in this table containing that value.
Range and Duration rules – Many times range and duration rules get handled via column level validation, but other times they do not. Can a person weigh -256 pounds? Can blood pressure be less than zero? Some databases allow you to set numeric ranges directly on the column and others only let you restrict a column to positive numeric values and others allow you to assign a list or range of legal values. For those which don’t you have to create a stored procedure. If you are a wholesaler allowing stores to purchase soda by the 6, 12 or 24 bottle packages, you order system had best not allow someone to enter 3.
Duration rules are a bit more interesting. If you are a retailer who has sales, an item’s price is good from a certain time on a certain day to a different day and potentially different time. Trying to hack entry screens for this each and every week would be insane. Instead you have an item pricing table and a stored procedure. The procedure accepts an item code and timestamp then looks through your pricing table returning either an error code or a the price which was active for that item at that date and time. Some procedures get really complex involving customer specific discounts and quantity break pricing, which, of course, means additional parameters.
Oh come on, you’ve all been to a Web site and seen something like:
0-5 $12.95 6+ $9.95
That pricing logic needs to reside within the database so every interface shows the same values. Forget that Trivago commercial about the same hotel room having different prices on different sites. That is a completely different situation.