
Normalization for web development



bipper
07-28-2008, 09:34 PM
I am kind of going around in circles in my head here with regard to database normalization. I know strict database normalization is generally a good idea, and I had it brutally pounded into my face throughout college and most of my career.

My question now is, how much normalization is too much when it comes to web development? I tend to love third normal form, but even that seems excessive at times, when stored information has to be calculated from data spread throughout the database. I know it is a sin, but I am really starting to feel the rebel in me just wanting to paint a few extra columns onto select tables to hold expensive-to-compute data that will never change. An example might be the length of time of a multi-staged event. Instead of piecing together four tables and taking the earliest time to the latest to work out the length that way, it just seems to make sense to me to have it saved in the event table.
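Something like this is what I am wrestling with, by the way - table and column names are invented, and I am assuming MySQL with PDO (the real schema joins four tables, not two):

<?php
// Invented names; assumes MySQL + PDO. The real schema joins four tables.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Normalized: piece the stages back together and compute the length on the fly.
$stmt = $pdo->prepare(
    "SELECT TIMESTAMPDIFF(SECOND, MIN(s.started_at), MAX(s.finished_at))
     FROM events e
     JOIN event_stages s ON s.event_id = e.id
     WHERE e.id = :id"
);
$stmt->execute(['id' => 42]);
$lengthSeconds = (int) $stmt->fetchColumn();

// The "rebel" version: the length gets written once when the event finishes and
// never changes, so reading it back is trivial.
$stmt = $pdo->prepare("SELECT length_seconds FROM events WHERE id = :id");
$stmt->execute(['id' => 42]);
$lengthSeconds = (int) $stmt->fetchColumn();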

Normally, I would not foresee this being an issue at all. However, with web programming and corporations *finally* grasping some OOP in web development (outside of JEE), these queries are getting seriously taxing.

Maybe this is a can of worms that has already been opened, but I am curious what others think about this. Maybe I just need new DB material to read, or maybe web applications are to be treated differently as far as database design goes. My thoughts are structured, but before I put effort into a DB redesign, I thought I would ask the EoFF prodigy programmers about it.

What do you all think?

Dr Unne
07-29-2008, 01:17 AM
(I say this not being a prodigy of any sort.) How big is your website? Unless your site is really getting hammered with traffic, I can't imagine the performance of one table vs. four will really affect you. It's arguably worth having a cleaner design even if it results in a performance hit. Computers and networks are so fast today that you should write the easiest-to-read-and-maintain code possible until such time as you discover a performance bottleneck. Then fix the bottlenecks until things are fast enough again. Premature optimization being the root of all evil, etc.

Is the concern clunkiness? Abstract away the clunkiness until it's no longer clunky. Abstract it so you can change the database later if you need to, without changing all the code that relies on the database. If you have raw SQL splattered all over your code, then it might be a hassle to constantly JOIN a billion tables, but you probably have bigger problems than that if you're writing the same SQL over and over. If you have to join four tables to compute some value, so what? Write it once and package the code so you can re-use it.
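Something like this, in other words - the names are hypothetical, but the point is that the JOIN lives in exactly one place:

<?php
// Hypothetical names; the expensive JOIN is written once and reused everywhere.
// If the schema gets denormalized later, only this one function has to change.
function getEventLengthSeconds(PDO $pdo, $eventId)
{
    $stmt = $pdo->prepare(
        "SELECT TIMESTAMPDIFF(SECOND, MIN(s.started_at), MAX(s.finished_at))
         FROM events e
         JOIN event_stages s ON s.event_id = e.id
         WHERE e.id = :id"
    );
    $stmt->execute(['id' => $eventId]);
    $length = $stmt->fetchColumn();
    return ($length === false || $length === null) ? null : (int) $length;
}

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
echo getEventLengthSeconds($pdo, 42);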

If you're designing the next Google, then yeah, I've read articles about some super-huge sites running on flat text files, or simplistic databases with fields full of enormous serialized strings of crap. It's not a sin to have a database that isn't normalized. There's no single right way to do anything.

bipper
07-29-2008, 02:26 PM
The site I am working on is actually expected to handle a very large flow of traffic. The part that is actually causing me to have doubts about normalization is simply that we need developers and analysts to be able to pick up on the design fairly quickly. However, I am in the same boat as you, in that I believe normalization is worth it even if you take a slight performance hit.

I actually stumbled on a few stranger answers from a few people in my network who work on large-volume commercial sites. They basically informed me that normalization and complex joins are more of a pain than they tend to be worth. When a query takes a solid 5 seconds to run on a development platform, the effects on a live server can be devastating. They also recommend an object database approach for transactional speed, though that is getting outside of my experience. The flat file thing was also suggested, at which point my head imploded.

The site I came into is simply suffering problems because the application leans on the database to run complex operations. For example, a subquery that checks for disqualifying times within a range of qualified times it was handed, placed inside a for loop that executes the query in rapid succession. Now, I guess it is up to me to decide how we are going to tackle these problems without straying from a very tight development schedule.
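Roughly what it looks like now versus what I am leaning towards - simplified, with invented table and column names:

<?php
// Simplified sketch; table and column names are invented.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$raceId = 7;
$qualifiedTimes = [/* rows of ['id' => ..., 'start' => ..., 'end' => ...] fetched earlier */];

// What it does now: the disqualification check runs once per qualified time,
// so N qualified times means N round trips to the database.
$disqualified = [];
foreach ($qualifiedTimes as $t) {
    $stmt = $pdo->prepare(
        "SELECT COUNT(*) FROM disqualifying_times
         WHERE race_id = :race AND recorded_at BETWEEN :start AND :end"
    );
    $stmt->execute(['race' => $raceId, 'start' => $t['start'], 'end' => $t['end']]);
    $disqualified[$t['id']] = $stmt->fetchColumn() > 0;
}

// Set-based alternative: hand the whole job to the database in one query.
$stmt = $pdo->prepare(
    "SELECT q.id, COUNT(d.id) > 0
     FROM qualified_times q
     LEFT JOIN disqualifying_times d
       ON d.race_id = q.race_id AND d.recorded_at BETWEEN q.start_at AND q.end_at
     WHERE q.race_id = :race
     GROUP BY q.id"
);
$stmt->execute(['race' => $raceId]);
$disqualified = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);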

Rantz
07-29-2008, 05:04 PM
I was taught to carry out another phase after normalization: denormalization. This is the phase where you weigh the pros and cons of every part of your normalization and use common sense to arrive at the final design. There are often several issues with normalization; you can even decrease performance through excessive normalization, if the complexity of the database becomes too great and the problems normalization solved were not that big to begin with. Denormalization is also where you weigh space requirements vs. performance according to what the server can handle.

Of course, you are often able to do the denormalization in your head during normalization and simply skip applying the rule at that point, but if you have to write documentation you need to do the normalization part first, so you can give a reason later on for why you deviated from it.

bipper
07-29-2008, 06:33 PM
Yeah, half my headache is the extremely normalized database with a lack of documentation. I had never heard of denormalization, and I have been scolded by other programmers before for not following strict normalization. However, to me, normalization just does not always make sense, much like you implied.

Thanks for the help, guys! Still, I think this is kinda interesting, 'cause databases have to be the most structured and yet wholly unorthodox technology ever... oh wait...

Renmiri
07-30-2008, 03:26 AM
Heh, at my consulting company we used denormalization all the time :D

You gotta weigh pros and cons. And seriously, if your two or three most-used tables are not in 3rd normal form but cheat a little for better performance (and simpler joins to write ;) ), then you would be like 90% of the databases out there.

bipper
08-18-2008, 06:02 PM
A different approach: because this site is rather large, and the concept of normalization as a programmer pisses me off, I think I may have come up with an interesting side solution...


If I ran one database set up as a completely denormalized temp database (to lessen the strain on database A and keep the calls from the AJAX layer lighter), would that be considered poor practice in any case, really? The temp database stays local for quick retrieval, while the main database, which essentially feeds the AJAX calls (which in turn populate PHP objects), would sit on a local but separate server.
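Very rough idea of the plumbing, anyway - the DSNs and table names are placeholders:

<?php
// Very rough sketch - DSNs and table names are placeholders.
// The main database (separate server) holds the normalized, sensitive data;
// the local "temp" database holds a flattened copy that the AJAX layer reads.
$main = new PDO('mysql:host=db-main.internal;dbname=app', 'user', 'pass');
$temp = new PDO('mysql:host=localhost;dbname=app_temp', 'user', 'pass');

// Flatten the multi-table event data once...
$rows = $main->query(
    "SELECT e.id, e.name,
            TIMESTAMPDIFF(SECOND, MIN(s.started_at), MAX(s.finished_at)) AS length_seconds
     FROM events e
     JOIN event_stages s ON s.event_id = e.id
     GROUP BY e.id, e.name"
)->fetchAll(PDO::FETCH_ASSOC);

// ...and dump it into the local denormalized table the AJAX calls hit.
$temp->exec("TRUNCATE event_summary");
$insert = $temp->prepare(
    "INSERT INTO event_summary (id, name, length_seconds) VALUES (:id, :name, :length_seconds)"
);
foreach ($rows as $row) {
    $insert->execute($row);
}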

Half the reason I am scratching at this is because a lot of this information is legally sensitive, and I am taking the most secure approach I can really think of. That, and I love thinking out side the box.

Flying Mullet
08-18-2008, 06:07 PM
Summary tables are always worth a look when it comes to performance. We have some data that's presented to users at a summary level, and rather than process the data down to a summary every time a user requests it, we have some cron jobs that run the summary processing code once and load the summary tables for fast retrieval. Of course, the users have to understand that the data isn't real-time, but since our customer looks at data in a monthly format up to the previous month, real-time isn't a concern for them.
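The gist of it, with invented table names - a script along these lines gets kicked off by cron overnight:

<?php
// Invented table names; cron runs this nightly so user requests only ever hit
// the small summary table instead of re-aggregating the detail rows.
$pdo = new PDO('mysql:host=localhost;dbname=reporting', 'user', 'pass');

$pdo->beginTransaction();
$pdo->exec("DELETE FROM monthly_sales_summary");
$pdo->exec(
    "INSERT INTO monthly_sales_summary (sales_month, customer_id, total_amount, order_count)
     SELECT DATE_FORMAT(ordered_at, '%Y-%m'), customer_id, SUM(amount), COUNT(*)
     FROM orders
     WHERE ordered_at < DATE_FORMAT(CURDATE(), '%Y-%m-01')  -- only complete months
     GROUP BY DATE_FORMAT(ordered_at, '%Y-%m'), customer_id"
);
$pdo->commit();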

I don't know if that helps you at all but that's what popped to mind when I read your post.

bipper
08-18-2008, 06:18 PM
We need real time... well, let's call it Hollywood real time - it can be a bit fake. Up to 5 minutes of lag would not be bad. Summary tables, though, remind me of something I read about in a previous life. I like it, Mulley.

I guess I am just looking at dumping PHP objects into these temp tables so that they do not have to suffer the clunkiness and insecurity of an AJAX GET or jQuery's POST.

Samuraid
08-19-2008, 07:48 AM
Being forced to use normalization while you learn is great, but once you have learned, you are free to (and should) break out of the mold when necessary and proper.

Some things are just not meant to be normalized (such as OLAP databases), and some things would just run too slowly normalized. Plus, once you factor in partitioning (if your database is just that big), then denormalization can make a lot of sense.

Often the best mix of things involves a little denormalization and storing "expensive" calculated data sets using memcached.
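For example (sketch only - the query and cache key are invented, and this assumes the PECL Memcached extension):

<?php
// Sketch only: the query and cache key are invented; assumes the PECL Memcached extension.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$key = 'event_lengths_v1';
$lengths = $cache->get($key);

if ($lengths === false) {
    // The expensive multi-table calculation only runs on a cache miss...
    $lengths = $pdo->query(
        "SELECT e.id, TIMESTAMPDIFF(SECOND, MIN(s.started_at), MAX(s.finished_at))
         FROM events e
         JOIN event_stages s ON s.event_id = e.id
         GROUP BY e.id"
    )->fetchAll(PDO::FETCH_KEY_PAIR);

    // ...and the result is cached for five minutes.
    $cache->set($key, $lengths, 300);
}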

bipper
09-08-2008, 09:28 PM
Though I know it is not exactly in the scope of the original quandary: Object-relational impedance mismatch - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Object-Relational_impedance_mismatch)

This has been a very interesting subject to read up on for the last couple of weeks. It really describes what my gut was pulling me towards.

I stopped to think and thought this through. Now I am writing an ORM, which will more than handle what I need. Though my reading has me leaning towards this not being the most proper answer, it gets the results I need atm.
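The core of the idea, stripped way down - class and table names are just for illustration:

<?php
// Stripped-down illustration; class and table names are made up.
// Rows from an "events" table get mapped onto plain PHP objects, so the rest of
// the code deals with objects instead of raw result arrays.
class Event
{
    public $id;
    public $name;
    public $lengthSeconds;
}

class EventMapper
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function find($id)
    {
        $stmt = $this->pdo->prepare(
            "SELECT id, name, length_seconds FROM events WHERE id = :id"
        );
        $stmt->execute(['id' => $id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if (!$row) {
            return null;
        }
        $event = new Event();
        $event->id = (int) $row['id'];
        $event->name = $row['name'];
        $event->lengthSeconds = $row['length_seconds'] === null ? null : (int) $row['length_seconds'];
        return $event;
    }
}

$mapper = new EventMapper(new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'));
$event = $mapper->find(42);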