Thinking about Design Patterns
Posted: November 3, 2009 Filed under: books, design
I have this friend who is an ambitious young corporate climber. When I started out as a team lead, I was totally overwhelmed by the office politics I was suddenly exposed to. Naturally, I turned to my friend for advice. She told me to read Machiavelli’s “The Prince”. So I did. It’s an interesting read, but not that amazing as far as management advice goes. When I later tried to discuss the book with my friend, I found out that she never actually read the book herself – she just thought it was good advice.
Six years later, I still believe people when they tell me that I have to read a book. I’m naive like that.
So I spent the last two weeks reading “The Timeless Way of Building”. It’s an architecture book. Architecture as in cities and buildings. The reason I spent two weeks reading a professional book intended for a different profession is that at some point in history (1994, I believe), some people thought that the ideas in the book are relevant to software development. These days you can’t really be a Java developer without being fully fluent in the design patterns language.
And of course, everyone was saying “You have to read The Timeless Way of Building. It will change the way you think about software.” From my days in development, I still remember quite a bit about design patterns, and I never really liked that particular approach to software development, but I could never really figure out why. After reading the authoritative source on patterns, I can say the following:
- Christopher Alexander had some good ideas about patterns. The book is readable to non-architects and is very enlightening. I recommend reading it if you are interested in what makes some cities and buildings feel better than others.
- I am pretty sure that his advice on how to design good buildings is not really applicable to the software development field.
A lot of his ideas are based on the fact (which he did the research to prove) that people intuitively know how buildings should be designed, and that when you ask a large number of people “How do you imagine you will feel in such a room?”, you’ll get an overwhelming consensus. This is far from being true in the software field.
- What is commonly known today as software design patterns is so far removed from what Christopher Alexander recommended for architects that software developers should really go and find a different name for what they are doing.
I’m still rather shocked by the differences between Christopher Alexander’s patterns and what design patterns look like today. These are not a few tiny differences of the kind that occur whenever ideas are translated from one domain to another. Some of the changes are profound.
First of all, Christopher Alexander says that patterns describe the way people already do things. “Night Life” and “Parallel Streets” existed before the book “A Pattern Language” was written. I’m not at all sure this is the case for design patterns. People buy design pattern books to learn the patterns themselves, not just the language or which patterns are better than others.
Second, patterns should have an intuitive meaning and an intuitive name. Again, you don’t need a book to know what “Bus Stop” or “Small Parking Lots” means. You may want to read the book to find out why they are a good idea, or how to make a good bus stop, but you know what it is. I don’t believe that anyone knew what an “Abstract Factory” was before reading a document about design patterns. Even patterns that have been used for decades got a fancy name. It can take a while to figure out that a Singleton is a global variable. One of the simplest and most common patterns in software development, “a function that does exactly one task”, is missing from software design patterns. “A Loop” is another pattern that is missing in action. All this gives the wrong impression that patterns are very complicated and something that can be mastered only by experts – which is exactly the opposite of what Christopher Alexander intended.
Third, patterns are abstract concepts. They are always implemented in a different way, because the entire idea is to be sensitive to the context, which is never the same twice. There is a pattern called “Six-foot balcony”, but it would be wrong to mass-manufacture six-foot balconies and start attaching them to buildings. The six-foot balcony is the idea; the exact shape of the balcony will be designed to match the building, the view, the trees, the sun, etc.
So it is rather annoying to discover that all patterns have “implementation examples”, which developers enjoy copying into their code. I’m all for code reuse, but this is not what patterns mean. Wikipedia has a decent description of what defines a pattern, and “being implementable in one or two simple classes that can be copy-pasted” is not part of it.
Executive summary: “The Timeless Way of Building” is an interesting book on architecture, with some good insights about how humans like to live and a bit of a Zen feel. You will not learn anything about software development from reading it. If you already know software design patterns, you will be struck by how different the ideas in the book seem.
The Small Difference Between XML and RDBMS
Posted: December 14, 2008 Filed under: design, musing
OracleNerd posted a rather heated discussion between himself and an application developer, and a few people jumped in to comment. I can’t recommend reading this discussion; it is full of unsupported assertions and rude language. But something that was mentioned in it did catch my attention.
MCohen wrote “So for example, it shouldn’t really matter to me that we’re using an Oracle database over say, an object database or xml files, or Amazon S3. We’re not quite there yet, but we’re making progress.”
I worked on a few projects that stored data in XML files. Usually it starts rather innocently. The program needs to store and retrieve a few configuration parameters, and using a database for that seems like overkill. So they put it in an XML file.
As the project grows, more configuration parameters are added. Some of them may be dependent on each other, so if you update one and fail to update the other, you need to undo your first update. Congratulations, you just reinvented transactions.
A few weeks later the project has several components and maybe even threads. Sometimes, several of those need to access the data at once. Maybe you serialize all access to the data, completely locking the file while it is updated. Maybe you want to allow reads while an update is going on, but how will you prevent reading partially updated data? Different RDBMSs have different solutions for concurrency, but Java’s XML library will let you write your own. Enjoy.
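For contrast, here is a minimal sketch of what the same dependent update looks like when the configuration lives in a database table. The table and parameter names are made up; the point is that atomicity comes for free, and readers never see the half-updated state:

BEGIN
   -- Two parameters that must stay in sync: either both updates stick,
   -- or neither does. No hand-rolled undo logic needed.
   UPDATE app_config SET value = '50'  WHERE name = 'pool_size';
   UPDATE app_config SET value = '200' WHERE name = 'max_sessions';
   COMMIT;
EXCEPTION
   WHEN OTHERS THEN
      ROLLBACK;
      RAISE;
END;
/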
I’m not saying XML is always a bad idea. Sometimes, you need your data store to be human-readable. Sometimes, your data is just not very well structured and would not fit in an RDBMS. But using XML has a cost, and you should make sure it is worth it. To make the right decision, you should know what the difference between an XML file and an RDBMS is.
If you don’t know whether your data is in XML or in RDBMS, how will you know whether or not you need to implement your own transactions?
New Look
Posted: May 15, 2008 Filed under: design
I wanted the fancy code formatting to look better on my blog. Besides, the design was too monochromatic and the links on the side-bar were difficult to read.
So I’m trying out a new design, and as an added bonus, my blog now has flowers. Flowers are usually an improvement.
Feedback on the new look is welcome. If my blog is less readable than before, let me know and I’ll continue searching for a good theme. Those of you reading me from a feed shouldn’t care about the change at all.
Why Application Developers Think Differently Than DB Developers
Posted: January 29, 2008 Filed under: design, development, musing
It is all about programming practices. Very early in their training, application developers learn three important paradigms:
- Generality – Solutions should work in as many new situations as possible
- Premature optimization is the root of all evil
- Goto considered harmful
These are very good programming practices that work great in many development situations. Except when it comes to databases. With databases, the first two practices are trouble. Here’s how:
- Application developers want their code to be very general and adaptable. So, they can’t depend on any vendor-specific DB features. This mostly means that the developers will be forced to reinvent the wheel at every step of the way, or write a huge, expensive database layer, or even worse – use Hibernate.
- More generality: Often application developers think that the idea of having separate tables for different conceptual objects and different columns for the specific attributes of these objects is a form of evil hard coding. Their solution will be to use a few “generic” tables, like this one (a more specific alternative is sketched right after this list):
create table things (
    thing_id           number,
    thing_name         varchar(4000),
    thing_value        varchar(4000),
    thing_data_type_id number
);
Or sometimes, they will add columns named userdata_1 … userdata_n to an otherwise well designed table. Just in case.
- Of course, if someone points out that their design will never scale and will cause the application to run very slowly, the application developer will scoff: “This is premature optimization! Knuth said it is evil! If it turns out to be slow we will get the DBA to add a few indexes and buy more memory.”
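To make the contrast concrete, here is a rough sketch of what the specific alternative to the “things” table might look like. The table and column names are made up; the point is that real columns, real types and real constraints let the database guard consistency and give the optimizer something to work with:

create table customers (
    customer_id   number        not null,
    customer_name varchar2(50)  not null,
    signup_date   date          not null,
    status        varchar2(10)  not null
                  check (status in ('ACTIVE', 'CLOSED')),
    constraint customers_pk primary key (customer_id),
    constraint customers_uq unique (customer_name)
);

-- A simple question against this table is one readable predicate:
--    select count(*) from customers where status = 'ACTIVE';
-- Against the generic "things" table, the same question needs one
-- self-join per attribute, with every value stored as a varchar
-- regardless of its real data type.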
DB developers have a different view of things. For us, performance and scalability are not features number 18 and 27 on the list. For DBs, performance and scalability are everything. There is no other reason to even have a DB, not to mention an expensive one like Oracle.
No optimization is ever premature on a database, and we prefer to start optimizing while we do the design and modeling. The question “which way is faster” is usually asked when we try to decide between several design ideas or different ways to write the same query. On the other hand, we optimize on an entirely different level than C developers, and only very rarely resort to inlining functions and embedding assembly code.
DB developers remember very well that if the data model is good, all other optimizations are much easier. Often great queries will simply write themselves on a good model, but even when they don’t, it is easy to see which indexes should be added and how queries should be tuned when the underlying model is clear.
We also know that DBs work best on specifics. The more information the DB has about your data, the better you can get it to perform, and the better you can guard the consistency of your data. While application developers like to “generalize” their programs by pretending they don’t have information that they actually have, DB developers prefer to use every piece of information they can get their hands on, and then force more information and promises out of their customers.
Good DB developers are vendor specific. They know one database very well, they know all the features, limitations, concepts and behaviors. They know how to make it run very fast and how to avoid problems that can only happen on this specific DB. They can tune it to death and get maximum value out of every last feature the vendor put in. Being good engineers, we will not let this knowledge go to waste.
No wonder it is very difficult for application developers and DB developers to get along!
What the application developer considers a best practice, DB developers call a complete waste of resources and a hopelessly slow application. When DB developers talk about their best practices, application developers think this is the root of all evil.
Of course, I didn’t do justice to the topics I covered. Tom Kyte has several best-selling books about how to approach database development and design. But I think I had a new insight about the source of the problem: application developers are not stupid, they just had a different indoctrination. I thought it was worth sharing.
Exceptions Should Not Contain Business Logic
Posted: January 11, 2008 Filed under: design, development
Steven Feuerstein’s PL/SQL book has a remarkably good chapter about collections. In addition to showing the different kinds of collections, their methods and how to use them, he also shows some cool non-trivial uses for collections, a taste of the great things you can do.
A few of the examples involve the use of collections for caching. Associative arrays are perfect for caching, and caching is a useful programming technique. One of the examples is as follows (the code is available on O’Reilly’s site, so I don’t think I’m breaking copyright here):
FUNCTION description (code_in IN hairstyles.code%TYPE)
   RETURN hairstyles.description%TYPE
IS
   return_value hairstyles.description%TYPE;

   FUNCTION desc_from_database RETURN hairstyles.description%TYPE
   IS
      CURSOR desc_cur IS
         SELECT description FROM hairstyles WHERE code = code_in;
      desc_rec desc_cur%ROWTYPE;
   BEGIN
      OPEN desc_cur;
      FETCH desc_cur INTO desc_rec;
      RETURN desc_rec.description;
   END;
BEGIN
   RETURN descriptions (code_in);
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      descriptions (code_in) := desc_from_database;
      RETURN descriptions (code_in);
END;
This code is short, simple to understand, can be adapted for many different uses, and is an excellent demonstration of the power of associative arrays.
I have just one problem with this code: Most of its logic is implemented inside the exception clause.
Technically speaking, exceptions are very similar to “goto” and historically goto has been used for exactly the same purpose – get out of the current block when things have gone wrong. Exceptions (and goto) make the code easier to read by putting all the clean up work in one place and jumping to this one place whenever needed, but other uses of exceptions can make the code more confusing.
The development standards I’m working with specify that exceptions should be used for handling exceptional conditions, not situations that will occur during the normal flow of the program. The advantage of this standard is that when someone reads the code, she can understand the main logic without jumping back and forth between different blocks. Then, when she gets to the exception block, she can see which problems you anticipated and how you dealt with them.
How do we know if we did this right? We look at the code and ask ourselves what would happen if we deleted the exception block and then ran it in ideal circumstances (enough memory, disk space, smart users, etc.) – would it behave the same from a business logic perspective? It should.
So, I would rewrite the function above as follows:
BEGIN
   IF NOT descriptions.EXISTS (code_in) THEN
      descriptions (code_in) := desc_from_database;
   END IF;
   RETURN descriptions (code_in);
END;
I’m not saying that Steven’s original example is confusing. It is not. I enjoyed the smart use of exceptions. But I know from bitter experience that examples have ways of propagating, and beginners that see this example in the book may copy this bit without thinking about the finer points of code readability.
If you are not sure what belongs in the exception clause and what doesn’t, keep your business logic out of it. Even if an example in your favorite book is doing the opposite. 99% of the time, your code will be more readable.
Exception Safety for PL/SQL
Posted: December 8, 2007 Filed under: design, development, plsql
One of the things I like about C++ is that there are tons of theory around the language. There are many smart people who sit around and write articles about how different language mechanisms are best used. We have a bit of that in PL/SQL, but not nearly enough.
One of the most important articles about exception safety in C++ is Abrahams’ “Exception Safety in Generic Components”. 90% of it is totally irrelevant to PL/SQL developers, because the smart people from Oracle chose not to inflict templates and generic components on us (at least not yet; who knows what will happen in the next version). However, in this article Abrahams defines the different levels of exception safety that a code component can achieve, and this directly relates to PL/SQL development.
The levels are described in terms of a PL/SQL block that is called by another block, and what the caller can expect after the called block has thrown an exception, when each level is implemented.
Exception safety levels, as I translate them from C++ to PL/SQL:
- No safety: You don’t handle exceptions. When your code breaks you have no idea what the state of your structures and data is, and all kinds of resources, like cursors, may be left open. Note that you can achieve this state by completely ignoring the possibility of errors, or by using “when others then null”. Be glad that you are developing in PL/SQL and not in C++, because at least Oracle keeps some semblance of data consistency, and keeps you from memory leaks and crashes.
- Basic safety: You take care to close cursors and other resources when an exception occurs, and you leave the system in a valid state. Raise_application_error does a bit of this – cleanup of out variables, if relevant. Note that you still have no idea what the block actually did – inserting one row out of three is a valid state, and so are unknown changes to variables in the calling block. In fact, in PL/SQL there is very little difference between basic safety and no safety, because Oracle works hard to make sure that truly invalid states don’t occur even if you do no exception handling at all. But they can happen if you have integrity rules that you expect your code to keep and that are not modeled into the database (foreign keys and similar constraints are a good example of this – if you didn’t model them into the tables, you will need to take care of them as part of basic exception safety).
- Strong safety: This is what we usually aim for in exception handling – complete success or full rollback. Either the block did what it was supposed to do, or it left the system as if it was never called, leaving the caller with all options open when deciding how to proceed. It is difficult to achieve this state in C++, but in PL/SQL it should be almost trivial – scope your transaction correctly so that Oracle’s built-in rollback can take care of most of the cleanup for you, release all opened resources, clean up changes to global variables (this can be tricky, so avoid using global variables), and you should be good. A small sketch of this level follows the list.
- No-throw Safety: The operation will always succeed and never throw an exception. Usually both impossible and unnecessary. Except maybe for code you use in exception handling and similar cleanup operations.
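Here is a minimal sketch of what strong safety can look like in PL/SQL. The procedure and table names are made up; the pattern of savepoint, rollback and re-raise is what matters:

CREATE OR REPLACE PROCEDURE transfer_batch (p_batch_id IN NUMBER) AS
BEGIN
   -- Mark the state we promise to restore if anything goes wrong.
   SAVEPOINT before_transfer;

   INSERT INTO target_table (id, payload)
      SELECT id, payload FROM staging WHERE batch_id = p_batch_id;

   DELETE FROM staging WHERE batch_id = p_batch_id;
   -- No COMMIT here: the transaction is scoped by the caller.
EXCEPTION
   WHEN OTHERS THEN
      -- Strong safety: undo everything this block did, then hand the
      -- original error back to the caller to decide what to do next.
      ROLLBACK TO before_transfer;
      RAISE;
END transfer_batch;
/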
Knowing all this may not cause you to write safer code, but hopefully you can at least think about how unsafe the code you write is, and what that will mean to those who use it.
***
If you are looking for something nice to read on the weekend, here are my recommendations:
Charles Schultz has a DB with high hopes for Christmas.
Jon Emmons explains about holiday decorations.
Dizwell released Doris, the queen of Oracle-on-Linux installations.
Small shortcut in a large report
Posted: November 22, 2007 Filed under: design, plsql
I came back from Oracle OpenWorld to a flurry of business demands. End of quarter is here, and with it new demands for reports and data.
The main report that I’m writing is rather large; generating the data requires about 30 different queries on about 10 databases. Some run in a few seconds, while others, despite my best tuning efforts, can take an hour or more.
So, I wrote a PL/SQL procedure that runs all the queries on all the DBs and collects the results into a single table that I’ll use in the report. That’s pretty cool, but suppose there is an error in query number 23? After I fix the error, I have to run the report again which takes hours. Quite annoying.
So, I created a small table:
CREATE TABLE progress
( stat_name   VARCHAR2(30),
  schema_name VARCHAR2(100),
  time_done   DATE
);
And now, before each query runs, I check in the progress table whether it was already done or not. If it was already done, I don’t need to do it again. If it wasn’t, I run it, and after it runs successfully, I add it to the table.
Here’s the procedure that takes care of this process:
create or replace PROCEDURE conditional_exec (
   in_stat_name   IN VARCHAR2,
   in_schema_name IN VARCHAR2,
   sql_stmt       IN VARCHAR2) AS
   stat_status int;
BEGIN
   -- Has this statistic already been collected for this schema?
   select count(*)
     into stat_status
     from progress
    where schema_name = in_schema_name
      and stat_name = in_stat_name;

   if stat_status = 0 then
      dbms_output.put_line ('About to run: ' || sql_stmt);
      execute immediate sql_stmt;
      -- Record the successful run so it is skipped on the next attempt.
      insert into progress (stat_name, schema_name, time_done)
      values (in_stat_name, in_schema_name, sysdate);
      commit;
   end if;
END conditional_exec;
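Calling it looks something like this (the statistic name, schema and query here are made up for illustration):

BEGIN
   conditional_exec (
      in_stat_name   => 'active_customers',
      in_schema_name => 'SALES',
      sql_stmt       => 'insert into report_results
                            select ''active_customers'', count(*), sysdate
                              from sales.customers
                             where status = ''ACTIVE''');
END;
/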
Advantages of this approach:
- I save time by not running the same stuff twice
- I can always “undo” something that already ran by deleting from that table
- I can watch the progress of my report in the table
- The table contains the finish time of each query. This means I have rough information on how long each query took to run and I can use this information to decide which queries need more work.
- I can also use the time information to show the business how much DB time each statistic they request costs them. Maybe they’ll decide that some statistics are not that important? Especially those that are carefully crafted so we can’t use any existing aggregated data to produce them. Unlikely, but who knows?
Not bad for 5 minutes of work.
C developers with SQL code
Posted: September 14, 2007 Filed under: design, development, mysql, plsql
Jeff Hunter posted an amusing story about how he optimized a query and what the developer did with his suggestions. The story is so cool not because it is one of those horrible WTF stories that are unlikely to happen in any but the most moronic IT organizations, but because it is a rather harmless oddity that happens all the time.
Most developers started their career by studying CS. This means about three years of writing code in Pascal, C, Java or whatever language CS departments prefer these days. After three years of C development you think of every problem in terms of arrays and loops, after three years of Java you think about objects and patterns. You are not even aware that you are doing this, but everything you look at instantly becomes an array with a loop, or a virtual private class.
At least in my day, databases were a highly optional course in the CS department, so the new developer on the team is probably seeing SQL for the first time in her life, after years of converting problems to arrays and loops. Unfortunately, a database table looks exactly like an array. The developer’s automatic response is to write a loop that goes over the array. Imagine her horror when she discovers that SQL doesn’t do loops. With few exceptions, even after learning SQL, the developer will continue thinking of it in terms of loops and arrays. For the procedural developer, everything is a full table scan and nested loop joins.
At this point, her code will do funny things like fetch an entire table and iterate over the results to count the number of rows. Maybe not that funny, but close enough. In a good development organization, there will be code reviews, and her manager or peers will turn her to the path of efficient database development.
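A small, hedged example of what I mean (the table name is made up): the procedural instinct drags every row out of the database just to count them, while the set-based version asks the database for the answer.

-- The loop-and-array instinct: fetch the whole table, count in a loop.
DECLARE
   l_count NUMBER := 0;
BEGIN
   FOR r IN (SELECT * FROM orders) LOOP
      l_count := l_count + 1;
   END LOOP;
   dbms_output.put_line (l_count);
END;
/

-- The set-based version: one statement, no rows dragged to the client.
SELECT COUNT(*) FROM orders;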
In my experience, it is best if, at this fragile stage, no one tells the developer about PL/SQL. The longer the developer has to work only with set operations within the database, the more she will get the ideas of database development and the better DBA she will become. Developers need to be broken of their loop addiction early and brutally. If they find out about PL/SQL too soon, they will always have the crutch of going back to their loops and will never really get DBs, or at least the process will be much delayed.
DBAs working with development teams really need to take care in educating their developers. This prevents performance issues before they are even written, and it is much easier to solve problems at this stage.
Meaningless Foreign Keys
Posted: September 5, 2007 Filed under: design, plsql
Jonathan Lewis wrote an article about Meaningless Keys. It’s an old article, but I just read it today and I found it very inspiring. So I immediately started thinking about my latest project.
For the project, I created a customers table. Each customer has a unique name, but also a meaningless ID, generated from a sequence, and this meaningless ID is used as a foreign key in other tables.
Why do I have this meaningless ID? Well, by now it is a habit, but I can still remember the reason – you don’t want to duplicate the same piece of data in multiple tables. That’s a known best practice for developers. But why? Because you don’t want to waste space – but in this case it is just a varchar(50) column in a table that will never reach 100,000 rows, so space is not the issue. Also because if we ever want to change the customer name, we will have to change it in many places. We will have to hunt down its usage in all the child tables and update them as well. That’s the real bummer – it will complicate the application and make it more bug prone.
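To make the two options concrete, here is a rough sketch (tables simplified, names made up) of the surrogate-key design I described versus using the unique customer name itself as the foreign key:

-- Option 1: meaningless ID as the key. Every report that needs the
-- customer name has to join back to the customers table.
create table customers (
    customer_id   number       primary key,   -- generated from a sequence
    customer_name varchar2(50) not null unique
);
create table orders (
    order_id    number primary key,
    customer_id number not null references customers (customer_id)
);

-- Option 2: the natural key. Child rows carry the name directly and the
-- joins disappear, but renaming a customer means updating the child
-- tables as well, which is where "on update cascade" would come in.
create table customers_nk (
    customer_name varchar2(50) primary key
);
create table orders_nk (
    order_id      number       primary key,
    customer_name varchar2(50) not null references customers_nk
);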
So, maybe Jonathan Lewis got it wrong? Or maybe I got him wrong? Maybe my meaningless key is not such a bad idea after all? Too bad, it would have been nice using the customer name as the key and avoiding all these join operations. If I could just solve the update problem. Why can’t I create a foreign key constraint with “on update cascade”?
Actually, I can.
Oracle didn’t provide this feature as a built-in, but Tom Kyte wrote an update cascade package that does exactly what I just described – it updates all the child references when you update a key on the parent table. I used it; it works great.
I would have given an example here, but this is Tom Kyte we are talking about, and of course he gave a great example that is much clearer than what I can put here. So just go check it out.
How to automate tasks
Posted: August 17, 2007 Filed under: design, maintenance, musing, plsql
Automation has always been the holy grail of information technology departments everywhere. Sysadmins and DBAs always prefer to automate the routine and boring tasks to make more time for interesting projects. Managers also like automation, mostly because this means fewer employees can do more work, and because automated tasks usually involve fewer human errors.
Some tasks are automated to the point where they run automatically at a specific time every day, without a DBA present. Backups are the most common example of this. More often, the DBA writes a script that automatically does what used to be a series of manual steps. For example, every DBA has a “create schema” script that will create a user with the permissions and objects it is going to need.
After years of automating different tasks, I’ve come up with a method that usually brings good results, and that I always regret not following:
1. Automate the task only after you have done it manually at least three times. This gives you a better understanding of the different scenarios you need to prepare for, and also prevents you from wasting time automating tasks that turn out not to be routine after all.
2. Write down all the steps you do when you perform the task manually. This helps when writing the script, and also serves as good documentation later on.
3. Write a script that does the task based on the steps you wrote earlier.
4. To verify that the script indeed does the right thing, run it in debug mode – step by step, verifying that each step behaves as expected. If step n is supposed to shut down the listener, verify that after step n the listener is really down.
5. The script should output a log with all the actions taken, responses and errors. You will need this when something goes wrong.
6. Run the script several times and verify the results and log.
7. Ask someone else to review the script with you. This can help catch bugs, makes sure the script is readable to other people, and has the added benefit that someone else can maintain the script if needed.
8. If you schedule it using cron, the Oracle job scheduler or a similar system (a small example follows), add a change to your change tracking system saying that this script now runs automatically every day at 3am.
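For the scheduling step, here is a minimal sketch using the Oracle job scheduler. The job name, procedure and schedule are made up for illustration:

BEGIN
   dbms_scheduler.create_job (
      job_name        => 'NIGHTLY_CLEANUP',
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'BEGIN my_cleanup_procedure; END;',
      start_date      => SYSTIMESTAMP,
      repeat_interval => 'FREQ=DAILY; BYHOUR=3',
      enabled         => TRUE,
      comments        => 'Runs the automated cleanup script nightly');
END;
/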