VMWare Hires Redis Key Developer – But Why?

My friend MosheZ alerted me to the fact (which a few hours later appeared all over the net) that VMWare hired the key developer of Redis. Which is as close to an acquisition as you can get with an open source project.

What is Redis? Redis is yet-another-NoSQL. A key-value store, somewhat similar to Tokyo Cabinet. Except that Redis does persistence differently, which makes it faster in many cases. Redis started as a Memcached replacement, so a lot of the documentation describes it as follows: Redis is like Memcached, except it supports more data types, it is persistent to some degree and it is not distributed.

But the more interesting question is – Why does VMWare need Redis?

VMWare says: “As VMware continues its investments in the context of cloud computing, technologies such as Redis become key for future cloud based apps, whether private or public cloud, and the cloud infrastructure itself.”

So Redis is cloud and VMWare is a major cloud player, therefore VMWare needs redis. Two discrepancies stand out in this story:

  1. Redis is not a distributed system. Unlike Cassandra, where you can scale by quickly adding more Cassandras to the party, Redis is just one (very fast) server, only supporting master-slave replication. VMWare is all about adding new machines quickly. Something doesn’t fit.
  2. While key-value stores are cloudy and VMWare is cloudy, there is no natural match between their cloudiness. VMWare itself can’t use Redis – because Redis technology is a natural match for big-data websites, which VMWare clearly isn’t. Some VMWare customers can benefit from Redis, but most can’t. What’s going on here?

Clearly, the place to look is not in existing value but in the future. So here are my predictions:

  1. Redis will become distributed. It can certainly be done. Perhaps it can even be done without losing its performance edge.
  2. VMware will announce an Amazon-like, cloud-for-rent service. They have the technology for this, and Redis will help them manage the “huge website” part of it.
  3. They may also offer Redis on top of the virtual servers, as something built in. Like Amazon’s Oracle servers.
  4. VMWare can also offer storage for rent. They can do it with EMC storage (since VMWare is an EMC company), but I’m betting that they’ll do it with Netapp – their favorite cloud partner. I can totally imagine a near-future Netapp-Vmware offering that is similar to Amazon’s EC2+S3+AWS.

Predicting is very difficult (especially about the future) and I’m very much ready to regret ever posting my day dreams in public, but these are exciting possibilities. I wonder if they make sense to anyone else.

*********************************

And speaking of MosheZ, he is a prolific song writer, and he wrote a song about DBAs! I’m thinking of performing it live during one of my presentations. Actually I’m thinking of writing a presentation “How to win arguments or influence users” just to have an excuse to sneak this song in 🙂


Lessons From OOW09 #1 – Shell Script Tips

During OpenWorld I went to a session about shell scripting. The speaker, Ray Smith, was excellent. Clear, got the pace right, educating and entertaining.

His presentation was based on the book “The Art of Unix Programming” by one Eric Raymond. He recommended reading it, and I may end up doing that.

The idea is that shell scripts should obey two important rules:

  1. Shell scripts must work
  2. Shell scripts must keep working (even when Oracle takes BDUMP away).

Hard to object to that 🙂

Here’s some of his advice on how to achieve these goals (He had many more tips, these are just the ones I found non-trivial and potentially useful. My comments in italics.)

  1. Document dead ends, the things you tried and did not work, so that the next person to maintain the code won’t try them again.
  2. Document the script purpose in the script header, as well as the input arguments
  3. Be kind – try to make the script easy to read. Use indentation. It’s 2009, I’m horrified that “please indent” is still a relevant tip.
  4. Clean up temporary files you will use before trying to use them:

    function CleanUpFiles {
        # Quote the variables so empty values or paths with spaces are handled safely
        [ -n "$LOGFILE" ] && rm -f "$LOGFILE"
        [ -n "$SPOOLFILE" ] && rm -f "$SPOOLFILE"
    }
  5. Revisit old scripts. Even if they work. Technology changes. This one is very controversial – do we really need to keep chasing the latest technology?
  6. Be nice to the users by working with them – verify before taking actions and keep user informed of what the script is doing at any time. OPatch is a great example.
  7. Error messages should explain errors and advise how to fix them
  8. The same script can work interactively or in cron by checking for a terminal: if tty -s; then …
  9. When sending email notifying of success or failure, be complete. Say which host, which job, what happened, how to troubleshoot, when is the next run (or what is the schedule).
  10. Dialog/Zenity – tools that let you easily create cool dialog screens
  11. Never hardcode passwords, hostnames, DB names or paths. Use ORATAB, command line arguments or parameter files. I felt like clapping here. This is so obvious, yet we are now running a major project to modify all scripts to be like that.
  12. Be consistent – try to use same scripts whenever possible and limit editing permissions
  13. Use version control for your scripts. Getting our team to use version control was one of my major projects this year.
  14. Subversion has HTTP access, so the internal knowledge base can point at the scripts. Wish I knew that last year.
  15. Use deployment control tool like CFEngine. I should definitely check this one out.
  16. Use getopts for parameters. Getopts looked too complicated when I first checked it out, but I should give it another try.
  17. Create everything you need every time you need it. Don’t fail just because a directory does not exist. Output what you just did.
  18. You can have common data files with things like hosts list or DB lists that are collected automatically on regular basis and that you can then reference in your scripts.
  19. You can put comments and descriptions in ORATAB
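
To make tips 8, 16 and 17 a bit more concrete, here is a minimal bash sketch of how they might fit together. This is my own illustration, not code from the session: the function name run_job, the option letters, the default log directory and the log file name are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run_job, the -s/-d options and /tmp/dba_logs are hypothetical names.

usage() {
    echo "Usage: run_job [-s ORACLE_SID] [-d LOG_DIR]" >&2
}

run_job() {
    local sid=${ORACLE_SID:-} log_dir=/tmp/dba_logs opt
    local OPTIND=1    # reset so the function can be called more than once

    # Tip 16: parse parameters with getopts instead of positional guessing
    while getopts "s:d:h" opt; do
        case $opt in
            s) sid=$OPTARG ;;
            d) log_dir=$OPTARG ;;
            h) usage; return 0 ;;
            *) usage; return 2 ;;
        esac
    done

    # Tips 7 and 11: nothing hardcoded, and the error says how to fix it
    if [ -z "$sid" ]; then
        echo "Error: no SID given. Pass -s SID or set ORACLE_SID." >&2
        return 2
    fi

    # Tip 17: create what you need, and say what you just did
    if [ ! -d "$log_dir" ]; then
        mkdir -p "$log_dir" || return 1
        echo "Created log directory $log_dir"
    fi

    # Tip 8: tty -s succeeds only when stdin is a terminal
    if tty -s; then
        echo "Interactive run for $sid; logging to $log_dir"
    else
        echo "$(date) batch run for $sid" >> "$log_dir/run.log"
    fi
}
```

A real script would end with a call such as run_job "$@", so that the same file behaves sensibly both from a terminal and from cron.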

Visualization Session – The Slides

The “Visualization Session” at OOW Unconference was great. Thanks to everyone who showed up for the lively discussion. It was probably the most fun I’ve ever had at a presentation.
Also thanks to the fine folks whom I later met at the OTN lounge and who explained that they wanted to attend my presentation but the OTN lounge had free beer and I did not. I’ll see what I can do about the beer next year.

For those who missed the presentation whether due to beer or to distance from OpenWorld, you can get my slides here. As usual for my presentations, I’m not sure if my slides are meaningful without me standing next to them. It is just a bunch of graphs without the stories. If you really want to hear the stories, you can invite me to speak at your usergroup 🙂


Adventures Installing 11g Clusterware

Let me start by announcing that 11g clusterware is easy to install. Really. Straightforward, simple, no issues at all. A crazy Dane can do it with both hands tied behind his back. The adventures I’ll describe below are 100% my own fault and have nothing to do with the quality of the product, which is excellent.

Before describing my adventures, I want to talk a bit about mountain biking. Mountain biking is a fine hobby, but riding on rocks and roots requires some skill. As a beginner, you find yourself riding very slowly and walking a lot around difficult sections. It is frustrating, but this way, you rarely crash at all. Experts, of course, ride very fast and rarely walk at all, and they still rarely crash. On the way from beginner to expert, there is a time where you gain some confidence in your skills, so you start riding faster. Unfortunately, this confidence often arrives before you actually have the skills you need to ride fast. The result is about 6 to 12 months of frequent crashes – until skills improve and confidence is reduced to the point that they match again.

I think I just hit this dangerous stage in DBAing. I now have some confidence in my understanding of how Oracle works, so I do not constantly refer to the docs. Which means that I make more mistakes than I did as a newbie.

Back to 11g clusterware:

Installation went fine. About 2 minutes of “next-next-next install” and 5 minutes of waiting for the install to finish. It is nearly identical to 10g installation, except that the automatic configuration of the VIP actually works, even if your public IP happened to be 192.168.X.X, so no need to run VIPCA manually after the install. Nice.

But then I discovered that I installed clusterware in the wrong directory. Not a big deal, but I dislike non-standard installations, and since it was so easy to install, I decided to take another 15 minutes to uninstall and install it again.

How do I uninstall?

I assumed that you uninstall clusterware just like you uninstall the database software – just run the installer UI, select the right product and click on uninstall. Why bother checking  the docs when you can make convenient assumptions?

Click-click-click and the product should be uninstalled. There was some error message about files it could not remove. I decided to ignore it – the new installation will be in a different directory, and I can always remove extra files later.

When I tried to install it again, the installer complained that VIP is taken.

Strange. Didn’t I uninstall the clusterware? I ran crs_stat to check, and was somewhat worried that it actually worked. Returning all resources with status “unknown”.

I decided that I needed to reboot the nodes. At least this should get rid of the VIP.

10 minutes later I found out that the nodes can’t stop rebooting. They start, and 30 seconds later they crash again. Those of you who have some experience with clusterware can already guess what was wrong. /etc/init.d/init.crs – the script that starts clusterware on boot was still there, attempting to start a partially uninstalled cluster, and failing. I did not even bother checking the logs, but I assume they’d show either that the VD is no longer there or that the interconnect is not configured, which leads each node to decide on a split brain and crash.

Over and over again. Thanks RedHat for interactive boot, which allowed me to stop this madness.

When the servers came back up, at least VIP was gone. So I decided to try another install. This time it ran all the way until the point it attempted to configure the notification services. This failed in a rather unhelpful fashion. The log error just said “configuration failed”. Thanks.

I decided to go for extreme cleanup, and simply delete every related file I could find on the servers – in /etc, $ORACLE_BASE, $CRS_HOME, VD, OCR. Everything I could think of.

Attempting to install again. Again Notification Services fail. At least I know enough not to ignore this error. My redeeming virtue, I guess.

When all else fails, read the docs. Which was not as easy as you would believe. I kind of fell out of practice with the documentation, and 11g did move things around a bit. I could not find the RAC installation guide. Looking under “Grid”, I found RAC administration guide and Clusterware administration guide. Both contained advice on how to remove a node from the cluster, but nothing about how to remove the entire cluster.

Searching for “clusterware uninstall” led to Overview of Deinstallation Process, which seemed promising. It contains this good advice: “Refer to Oracle Clusterware Installation Guide for your platform for Oracle Clusterware deinstallation procedures.”, but it did not link anywhere. I did find the installation guide, under “Installation” (duh), and it did contain uninstall instructions. I’m still a bit annoyed that searching for “uninstall clusterware” did not come up with this document.

Following the documentation turned out to be the best idea I’ve had that day. It reminded me that I should run rootdelete.sh, and then rootdeinstall.sh and only then run ./runInstaller -deinstall -removeallfiles.
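
For my own future reference, the order of operations can be written down as a tiny script. It only prints the three steps and runs nothing. The default CRS_HOME value and the install/ and oui/bin/ locations under it are assumptions on my part, so check the installation guide for your platform for the real paths, any script arguments, and which user and which nodes each step applies to.

```shell
#!/usr/bin/env bash
# Prints the deinstall sequence in order; it does not execute any of it.
# Assumptions: the CRS_HOME default, and that the scripts live in
# $CRS_HOME/install and the installer in $CRS_HOME/oui/bin.
CRS_HOME=${CRS_HOME:-/u01/app/crs}

deinstall_steps() {
    cat <<EOF
$CRS_HOME/install/rootdelete.sh
$CRS_HOME/install/rootdeinstall.sh
cd $CRS_HOME/oui/bin && ./runInstaller -deinstall -removeallfiles
EOF
}

deinstall_steps
```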

Since I caused significant manual damage prior to following the documentation, I was not surprised by a long list of complaints that each of these scripts had for me.

But after following the uninstall documentation, I was finally able to install clusterware 11g, successfully, in the right directory.

5 hours after I decided on a small 15 minute solution. It was time to go home.

BTW. Now that I think of it, it is quite possible that in 10.2, it was impossible (or at least undocumented) to uninstall clusterware on Linux. I cannot find the instructions in 10.2 documentation at all (The OpenVMS docs do contain uninstall instructions). Our internal procedure was always just “reimage the servers”.


Latches, Spinning and Queues

You know that you care a lot about a topic, if you find yourself thinking about it again and again, each time hoping to gain enough insights to stop this cycle.

Turns out, I care a lot about what my CPUs are doing. Last time I came up with the epiphany that 100% CPU utilization is a bad idea. During the discussion that followed, Jonathan Lewis and Noons took the time to explain to me the difference between waiting and spinning.

The topic came up again as I’m digging for the worst ways concurrency can go wrong.

Concurrency becomes interesting when the concurrent processes attempt to access shared resources, and since Oracle has shared memory, the shared resources tend to be areas in the memory.

We are in 11g now, so we have 3 Oracle ways to protect memory structures – locks, latches and mutexes (Oracle mutexes, which should not be confused with OS mutexes). Below, I’ll summarize the main differences between them. Nearly everything I wrote (and a lot more including cool examples) is covered by Tom Kyte’s Expert Oracle Database Architecture book. I’m just summarizing the important points below for my (and your) convenience.

When you read about latches, the first thing you hear is that “Latches are lightweight locks”. Lightweight in this case means “Takes less bits in memory”. Why do we care about our locking structures being small? Small memory footprint of the locking mechanism will translate to faster checks and changes to it. Latches are smaller than locks, and the new 10g mutexes are even smaller than latches.

Locks work by queueing. One process holds the lock for a resource, everyone else who tries to access the lock queues up and goes to sleep (i.e. off the CPU). When the current process finishes, the next in line becomes runnable and now owns the lock on the resource. Queuing is nice, because we have a bunch of queueing theory that lets us predict response times, and waits and such. It is also nice, because while Oracle manages the locks and queues it gives us tons of information about who is blocking and who is waiting. And as one last nice, while all those processes are waiting for locks, they are not using CPU, nor represent any cpu scheduling overhead.

Latches work by spinning (mostly). Think of a case when you know a process will need the memory structure for a very short amount of time. Do you really want to maintain queues, waste time on context switching, lose your CPU cache all for just a few milliseconds of waiting? Latches exist for this reason. If a process tries to access the latch and it’s busy, it will keep on retrying for a while, still using the CPU. If during this “retry time” the latch became free, the process can take the latch and we are saved from the need to context switch. If it didn’t get the latch after several retries, the process goes off the CPU.

The important thing to note is that there is no queue. So there is a lot of uncertainty around when your process will finally get its latch. It is entirely possible that processes that started spinning on the latch later will get the latch first due to a fluke of luck. Because there is no queue, it seems that there is no good way to find a list of processes that are waiting for a latch (maybe by looking at statistics and wait tables?). You do have good information about how many requests, misses, spins and sleeps per latch, which is very useful information.

It is interesting to see how Oracle attempts to prevent the situation where a process waits forever for a latch, and keeps missing it because newer processes keep snatching the latch away as soon as it is freed. When reading about “latch free” wait events, the documentation says: ” The wait time increases exponentially and does not include spinning on the latch (active waiting). The maximum wait time also depends on the number of latches that the process is holding. There is an incremental wait of up to 2 seconds.” It is nearly the same mechanism ethernet uses to avoid machines starving for network connections (“truncated binary exponential backoff“) . Incrementally increasing the wait times reduces the probability of collision.
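
The spin-then-sleep idea is easy to play with outside the database. Below is a toy bash analogy, and it is emphatically not how Oracle implements latches: a directory stands in for the latch (mkdir either creates it or fails, atomically), the process retries in a tight loop first, and only then sleeps for exponentially growing intervals. The function names and the SPIN_COUNT and MAX_SLEEPS values are made up for the illustration.

```shell
#!/usr/bin/env bash
# Toy analogy only: a directory plays the latch, because mkdir is atomic.
# SPIN_COUNT and MAX_SLEEPS are hypothetical tuning knobs.
SPIN_COUNT=${SPIN_COUNT:-200}   # retries per spin phase, staying on the CPU
MAX_SLEEPS=${MAX_SLEEPS:-5}     # sleeps allowed before giving up

try_latch() {
    # One non-blocking attempt: succeeds only if nobody holds the latch
    mkdir "$1" 2>/dev/null
}

get_latch() {
    local latch=$1 nap=1 sleeps=0 i
    while :; do
        # Spin phase: retry without leaving the CPU
        for ((i = 0; i < SPIN_COUNT; i++)); do
            try_latch "$latch" && return 0
        done
        # Spinning did not help: give up, or go off the CPU for a while
        (( sleeps >= MAX_SLEEPS )) && return 1
        sleep "$nap"
        sleeps=$((sleeps + 1))
        nap=$((nap * 2))        # exponential backoff: 1, 2, 4, ... seconds
    done
}

release_latch() {
    rmdir "$1"
}
```

Note that nothing here remembers who asked first. Whoever happens to call mkdir right after a release wins, which is exactly the no-queue behavior described above.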

Mutexes are best covered by Tanel Poder.  They are even smaller and faster to take than latches, they also work as a cursor pin (signifying shared or exclusive ownership of a cursor), and they give you even less information about who is waiting for what and for how long. You have information about sleep times, but not number of requests and misses.


Troubleshooting Streams @ Openworld Unconference

Thanks to everyone who attended the session – you were an attentive, intelligent and supportive audience, I couldn’t have hoped for better. I was especially stoked to see Lewis Cunningham in the audience, since he is an expert on the topic. Thank you Lewis for giving encouraging nods throughout the session 🙂
I definitely got an appetite for speaking a bit more, and I’m now furiously scribbling and sending abstracts. I hope to see you in my future presentations. 

I promised to upload my material, so here we go:

 

  • Powerpoint (including the component diagram)
  • Script for creating the replication environment (intentionally buggy!) and also (working) script for removing the replication. This creates replication from HR schema to a new schema called MYHR, but only  replicates one table.
  • Script with troubleshooting queries used in the session.

Good Times in Oracle Openworld

I came back from Openworld to the office and for two consecutive hours I could not shut up about how wonderful it was, how much I’ve learned, new troubleshooting methods, new features, new hardware!

“You are the only person in the world who can enjoy this kind of thing” my manager said when I finally closed my mouth. The senior DBA, senior sysadmin and storage admin agreed.
“What can be so great about listening to boring marketing sessions for an entire week?” They asked.

That is the big secret, I think. During Oracle Openworld I spent only three hours listening to boring marketing sessions. 14 more hours were spent on non-boring, non-marketing sessions.
And according to my accounting I spent over 20 hours that week talking to some of the most brilliant and interesting people I’ve had the pleasure to meet.

Of course I had a great time. It was better than most of my vacations.

In many ways it was better than the previous Openworld. Last year was magical, like falling in love. This time, it was a bit like visiting an old friend. I felt more at home. I knew many people, and was very happy to talk with old friends whom I’ve never met. I felt more comfortable introducing myself to people I did not know and just chatting with anyone who happened to sit next to me in OTN lounge. I also knew better which sessions to attend and what can be happily skipped. With a little help from my friends I even worked up the nerve to give an unconference session.

The highlights: Alex Gorbachev’s Clusterware Internals, walking tour of San Francisco with Rob van-Wijk, Blogger Meetup, Amazon’s session, chatting with Nicolas and Rob, RAT session by Jim Czuprynski, complaining about streams and clusterware issues to product teams in Demogrounds, 11gR2 Beta briefing, meeting Frits Hoogland and Jacco Landlust, Andrew Holdsworth sessions, Tom Kyte sessions, Larry Ellison’s keynote, seeing Mogens Norgaard naked (video only), Tim Hall’s Spore demo, Tanel Poder’s Advanced Troubleshooting, Greg Marsden’s Linux Tuning, talking with Fuadar, Justin Cave and Lewis Cunningham, and my own unconference sessions.

I only regret not getting Tom Kyte to sign my chest 🙂