See Me at Oracle OpenWorld 2013

I’ll be in San Francisco next week, presenting about Hadoop and Big Data at the biggest conference. If you want to say “hi”,

you can attend one of my sessions:

  • Big Data Panel Discussion (Sunday, 8am at Moscone West room 3003)
  • Data Wrangling with Oracle Big Data Connectors (Sunday 3:30 pm at Moscone West room 3003)
  • Women in Technology roundtable (Monday 10 am at Oak Table World )
  • ETL on Hadoop (Monday 11 am at Oak Table World – I presented the same content at Surge last week, you can peek at the slides).
  • It’s a Solid-State World: How Oracle Exadata X3 Leverages Flash Storage (Wednesday 3:30 pm at Westin San Francisco, Metropolitan I –  With Marc Fielding)
  • Data Wrangling with Oracle Big Data Connectors (Thursday 2 pm at Moscone South room 300)

I also plan to attend few sessions by other people:

  • Monday 12am – Oracle Database 12c for Data Warehousing and Big Data [CON8710]
  • Monday 6:30pm – Oracle’s Big Data Solutions: NoSQL, Connectors, R, and Appliance Technologies [BOF11057]
  • Tuesday 3:30pm – Big Data Deep Dive: Oracle Big Data Appliance [CON8646]
  • Wednesday 10am – In-Database MapReduce for DBAs and Database Developers Using SQL or Hadoop [CON8601]
  • Thursday 12:30 – Hadoop Your ETL: Using Big Data Technologies to Enhance ‘s Data Warehouses [CON8732]

And few social events: Oracle’s ACE dinner, Friends of Pythian dinner, Blogger meetup, OTN event and possibly few more.

You may also find me helping Oracle and Cloudera demonstrate the Big Data Appliance at the Engineered Systems demo booth.

See you there!

Hadoop Summit 2013 – The Schedule

90% of winning is planning. I learned this as a kid from my dad, and I validated this through many years of work in operations. This applies to everything in life, including conferences.

So in order to maximize fun, networking and learning in Hadoop Summit, I’m planning my schedule in advance. Even if only few hours in advance. Its the thought that counts.

In addition to social activities such as catching up with my former colleagues from Pythian, dining with my distributed solutions architecture team in Cloudera and participating in the Hadoop Summit bike ride, I’m also planning to attend few sessions.

There are tons of good sessions at the conference, and it was difficult to pick. It is also very possible that I’ll change my plans in the last minute based on recommendations from other attendees. For the benefit of those who would like soume recommendations, or to catch up with me at the conference, here’s where you can find me:


11:20am: Securing the Hadoop Ecosystem – Security is important, but I’ll admit that I’m only attending this session because I’m a fan of ATM. Don’t judge.

12am: LinkedIn Member Segmentation Platform: A Big Data Application – LinkedIn are integrating Hive and Pig with Teradata. Just the type of use case I’m interested in, from my favorite social media company.

2:05pm: How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million – I’m a huge believer in offloading ETL to Hadoop and I know companies who saved big bucks that way. But $30M is more than even I’d expect, so I have to hear this story.

2:55pm: HDFS – What is New and Future – Everything Hadoop relies on HDFS, so keeping updated with new features is critical for Hadoop professionals. This should be a packed room.

4:05pm: Parquet: Columnar storage for the People – Parquet is a columnar data store for Hadoop. I’m interesting to learn more about Parquet as it should enable smoother transition of data warehouse workloads to Hadoop.


11:50pm: Mahout and Scalable Natural Language Processing – This session is promising so much data science content in one hour, that I’m a bit worried that my expectations are too high. Hope it doesn’t disappoint.

1:40pm:  Hadoop Hardware @Twitter: Size does matter  – I’m expecting operations level talk about sizing of Hadoop clusters. Something nearly every company adopting Hadoop is struggling with.

2:30pm: Video Analysis in Hadoop – A Case Study – My former colleagues at Pythian are presenting a unique Hadoop use-case they developed. I have to be there.

3:35pm: Large scale near real-time log indexing with Flume and SolrCloud – I’m encountering this type of architecture quite often now. Will be interesting to hear how Cisco are doing it.

5:15pm: Building a geospatial processing pipeline using Hadoop and HBase and how Monsanto is using it to help farmers increase their yield – Using Hadoop to analyze large amounts of geospatial data and help farmers. Sounds interesting. Also, Robert is a customer and a very sharp software architect, so worth attending.

Expect me to tweet every 5 minutes from one of the sessions. Its my way of taking notes and sharing knowledge.  If you are at the conference and want to get together, ping me through here or twitter. Also ping me if you think I’m missing a not-to-be-missed session.

See you at Hadoop Summit!



“A beginning is the time for taking the most delicate care that the balances are correct.”

It is spring. Time for planting new seeds. I started on a new job last week, and it seems that few of my friends and former colleagues are on their way to new adventures as well. I’m especially excited because I’m starting not just a new job – I will be working on a new product, far younger than Oracle and even MySQL. I am also making first tiny steps in the open-source community, something I’ve been looking to do for a while.

I’m itching to share lessons I’ve learned in my previous job, three challenging and rewarding years as a consultant. The time will arrive for those, but now is the time to share what I know about starting new jobs. Lessons that I need to recall, and that my friends who are also in the process of starting a new job may want to hear.

Say hello
I’m usually a very friendly person and after years of attending conferences I’m very comfortable talking to people I’ve never met before. But still, Cloudera has around 200 people in the bay area offices, which means that I had to say “Hello, I’m Gwen Shapira the new Solutions Architect, who are you?” around 200 times. This is not the most comfortable feeling in the world. Its important to go through the majority of the introductions in the first week or two, later on it becomes a bit more awkward. So in the first week it will certainly seem like you are doing nothing except meeting people, chatting a bit and franctically memorizing names and faces. This is perfectly OK.

Get comfortable being unproductive
The first week in a new job feels remarkably unproductive. This is normal. I’m getting to know people, processes, culture, about 20 new products and 40 new APIs. I have incredibly high expectations of myself, and naturaly I’m not as fast installing Hadoop cluster as I am installing RAC cluster. It takes me far longer to write Python code than it does to write SQL. My expectations create a lot of pressure, I internally yell at myself for taking an hour or so to load data into Hive when it “should” have taken 5 minutes. But of course, I don’t know how long it “should” take, I did it very few times before. I’m learning and while learning has its own pace, it is an investment and therefore productive.

Have lunch, share drinks
The best way to learn about culture is from people, and the best way to learn about products is from the developers who wrote them and are passionate about how they are used. Conversations at lunch time are better than tackling people in the corridor or interrupting them at their desk. Inviting people for drinks are also a great way to learn about a product. Going to someones cube and asking for an in-depth explanation of Hive architecture can be seen as entitled and bothersome. Sending email to the internal Hive mailing list and saying “I’ll buy beer to anyone who can explain Hive architecture to me” will result in a fun evening.

If its not overwhelming, you may be in the wrong job
I’m overwhelmed right now. So many new things to learn. First there are the Hadoop ecosystem products, I know some but far from all of them, and I feel that I need to learn everything in days. Then there is programming. I can code, but I’m not and never have been a proficient programmer. My colleagues are sending out patches left and right. It also seems like everyone around me is a machine learning expert. When did they learn all this? I feel like I will never catch up.

And that is exactly how I like it.

Make as many mistakes as possible
You can learn faster by doing, and you can do faster if you are not afraid of failing and making mistakes. Mistakes are more understandable and forgivable when you are new. I suggest using this window of opportunity and accelerate your learning by trying to do as much as possible. When you make a mistake smile and say “Sorry about that. I’m still new. Now I know what I should and shouldn’t do”

Take notes
When you are new a lot of things will look stupid. Sometimes just because they are very different from the way you are used to things in a previous job. Don’t give in to the temptation to criticise everything, because you will look like a whiner. No one likes whiner. But take note of them, because you will get used to them soon and never see things with “beginner mind” again. In few month take a look at your list, if things still look stupid, it will be time to take on a project or two to fix them.

I may be new at this specific job, but I still have a lot to contribute. I try hard to look for opportunities and I keep finding out that I’m more useful than I thought. I participate in discussions in internal mailing lists, I make suggestions, I help colleagues solve problems. I participate in interviews and file tickets when our products don’t work as expected. I don’t wait to be handed work or to be sent to a customer, I look for places where I can be of use.

I don’t change jobs often. So its quite possible that I don’t know everything there is to know about starting a new job. If you have tips and suggestions to share with me and my readers, please comment!

Environment Variables in Grid Control User Defined Metrics

This post originally appeared at the Pythian blog.

Emerson wrote: “Foolish consistency is the hobgoblin of small minds”. I love this quote, because it allows me to announce a presentation titled “7 Sins of Concurrency” and then show up with only 5. There are places where consistency is indeed foolish, while other times I wish for more consistency.

Here is a nice story that illustrates both types of consistency, or lack of.

This customer Grid Control installed in their environment. We were asked to configure all kinds of metrics and monitors for several databases, and we decided to use the Grid Control for this. One of the things we decided to monitor is the success of the backup jobs.

Like many others, this customer runs his backup jobs from cron and the cron job generates an RMAN logfile. I thought that a monitor that will check the logfile for RMAN- and ORA- errors will be just the thing we need.

To be consistent, I could have moved the backup jobs to run from Grid Control scheduler instead of cron. But in my opinion, this would have been foolish consistency – why risk breaking perfectly good backups? Why divert my attention from the monitoring project to take on side improvements?

To have Grid Control check the log files, I decided to use OS UDM: Thats a “User Defined Metric” that is defined on “host” targets and allows to run a script on the server. I wrote a very small shell script that finds the latest log, greps for errors and counts them. The script returns the error count to Grid Control. More than 0 errors is a critical status for the monitor. I followed the instructions in the documentation to the letter – and indeed, everything easily worked. Hurray!

Wait. There’s a catch (and a reason for this blog post). I actually had two instances that are backed up, and therefore two logs to check. I wanted to use the same script and just change the ORACLE_SID in the environment.

No worries. The UI has a field called “Environment” and the documentation says: “Enter any environmental variable(s) required to run the user-defined script.”

One could imagine, based on the field name and the documentation, that if I type: “ORACLE_SID=mysid” in this field, and later run “echo $ORACLE_SID” in my script, the result would be “mysid”.

Wrong. What does happen is that $ORACLE_SID is empty. $1, on the other hand, is “{ORACLE_SID=mysid}”.

To get the environment variable I wanted, I had to do: tmp=(`echo $1 | tr ‘{}’ ‘  ’`); eval $tmp

It took me a while to figure this out as this behavior is not documented and I found no usage examples that refer to this issue.

Consistency between your product, the UI and the documentation is not foolish consistency. I expect the documentation and field descriptions to help me do my job and I’m annoyed when it doesn’t.

At least now this behavior is documented somewhere so future googlers may have easier time.

It’s the End of the World As We Know It (NoSQL Edition)

This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

Everyone knows that seminal papers need a simple title and descriptive title. “A Relational Model for Large Shared Data Banks” for example. I think Michael Stonebraker overshot the target In a 2007 paper titled, “The End of an Architectural Era”.

Why is this The End? According to Michael Stonebraker “current RDBMS code lines, while attempting to be ‘one size fits all’ solution, in face, excel at nothing. Hence, they are 25 years old legacy code lines that should be retired in favor of a collection of ‘from scratch’ specialized engined”.

He makes his point by stating that traditional RDBM design is already being replaced for a variety of specialized solutions: Data-warehouses, streams processing, text and scientific databases. The only uses left for RDBMS is OLTP and hybrid systems.

The provocatively named paper is simply a description of a system, designed from scratch for modern OLTP requirements and the demonstration that this system gives better performance than traditional RDBMS on OLTP type load. The conclusion is that since RDBMS can’t even excel at OLTP – it must be destined for the garbage pile. I’ll ignore the fact that hybrid systems are far from extinct and look at the paper itself.

The paper starts with a short review of the design considerations behind traditional RDBMS, before proceeding to list the design considerations behind the new OLTP system, HStore.:

  1. The OLTP database should fit entirely in-memory. There should be no disk writes at all. Based on TPC-C size requirements this should be possible, if not now then within few years.
  2. The OLTP database should be single threaded – no concurrency at all. This should be possible since OLTP transactions are all sub-millisecond. In an memory-only system they should be even faster. This will remove the need for complex algorithms and data structures and will improve performance even more. Ad-hoc queries will not be allowed.
  3. It should be possible to add capacity to an OLTP system without any downtime. This means incremental expansion – it should be possible to grow the system by adding nodes transparently.
  4. The system should be highly available, with a peer-to-peer configuration – the OLTP load should be distributed across multiple machines and inter-machine replication should be used for availability. According to the paper, in such a system redo and undo logging becomes unnecessary. This paper references another paper that argues that rebuilding a failed node over the network is as efficient as recovering from redo log. Obviously, eliminating redo logs eliminates one of the worse OLTP bottlenecks where data is written to disk synchronously.
  5. No DBAs. Modern systems should be completely self tuning.

In other sections Stonebraker describes few more properties of the system:

  1. With persistent redo logs gone, and locks/latches gone, the overhead of JDBC interface is likely to be the next bottleneck. Therefore the application code should be in form of stored procedures inside the data store. The only command ran externally should be “execute transaction X”.
  2. Given that the DB will be distributed and replicated, and network latencies still take milliseconds, the two-phase commit protocol should be avoided
  3. There will be no ad-hoc queries. The entire workload will be specified in advance.
  4. SQL is an old legacy language with serious problems that were exposed by Chris Date two decades ago. Modern OLTP systems should be programmable in a modern light-weight language such as Ruby. Currently the system is queried with C++.

The requirements seem mostly reasonable and very modern – use replication as a method of high availability and scalabilty, avoid disks and their inherent latencies, avoid the complications of concurrency, avoid ad-hoc queries, avoid SQL and avoid annoying DBAs. If Stonebraker can deliver on his promise, if he can do all of the above without sacraficing the throughput and durability of the system, this sounds like a database we’ll all enjoy.
In the rest of the paper, the authors describe some special properties of OLTP work loads, and then explains how HStore utilizes the special properties to implement a very efficient distributed OLTP system. In the last part of the paper, the authors use HStore to run a TPC-C like benchmark and compare the results with an RDBMS.

Here are in very broad strokes the idea:
The paper explains in some detail how things are done, while I only describe what is done:

The system is distributed, with each object partitioned over the nodes. You can have specify how many copies of each row will be distributed, and this will provide high availability (if one node goes down you will have all the data available on other nodes).

Each node is single threaded. Once SQL query arrives at a node, it will be performed to the end without interruptions. There are no physical files. The data objects are stored as Btrees in memory, Btree block is sized to match L2 cache line.

The system will have a simple cost-based optimizer. It can be simple because OLTP queries are simple. If multi-way joins happen they always involve identifying a single tuple and then tuples to join to that record in a small number of 1-to-n joins. Group by and aggregation don’t happen in OLTP systems.

The query plans can either run completely in one of the nodes, can be decomposed to a set of independent transactions that can run completely in one node each, or require results to be communicated between nodes.

The way to make all this efficient is by using a “database designer” – Since the entire workload is known in advance, the database designer’s job is to make sure that most queries in the workload can run completely on a single node. It does this by smartly partitioning the tables, placing parts that are used together frequently on the same node and copying tables (or just specific columns) that are read-only all over the place.

Since there are at least two copies of each row and each table, there must be a way to consistently update them. Queries that can complete on a single node, can just be sent to all relevant nodes and we can be confident that they will all complete them with identical results. The only complication is that each node must wait a few milliseconds before running the latest transaction to allow for recieving prior transactions from other nodes. The order in which transactions run is identified by timestamps and node ids. This allows for identical order of execution on all nodes and is responsible for consistent results.

In case of transactions that span multiple sites and involve changes that affect other transactions (i.e. The order in which they execute in relation to other transactions matter), one way to achieve consistency could be locking the data sources for the duration of the transaction. The HStore uses another method – each worker node recieves its portion of the transaction from a coordinator. If there are no conflicting transactions with lower timestamps, the transaction runs and the worker sends the coordinator an “ok”, otherwise the worker aborts and notifies the coordinator. The transaction failed and its up to the application to recover from this. Of course, some undo should be used to rollback the successfull nodes.

The coordinator monitors the number of aborts and if there are too many unsuccessfull transactions, it starts waiting longer between the time a transaction arrives at a node until the node attempts to run it. If there are still too many failures, a more advanced strategy of aborting is used. In short, this is a very optimistic database where failure is prefered to locking.

I’ll skip the part where a modified TPC-C proves that HStore is much faster than a traditional RDBMS tuned for 3 days by an expert. We all know that all benchmarks are institutionalized cheating.

What do I think of this database?

  1. It may be too optimistic in its definition of OLTP. I’m not sure we are all there with the pre-defined workload. Especially since adding queries can require a complete rebuild of the data-store.
  2. I’m wondering how he plans to get a consistent image of the data stored there to another system to allow querying. ETL hooks are clearly required, but it is unclear how they can be implemented.
  3. Likewise, there is no clear solution on how to migrate existing datasets into this system.
  4. HStore seems to depend quite heavily on the assumption that networks never fail or slow down. Not a good assumption from my experience.
  5. If Stonebraker is right and most datasets can be partitioned in a way that allows SQL and DML to almost always run on a single node, this can be used to optimize OLTP systems on RAC.
  6. I like the idea of memory only systems. I like the idea of replication providing recoverability and allowing us to throw away the redo logs. I’m not sure we are there yet, but I want to be there.
  7. I also like the idea of a system allowing only stored procedures to run.
  8. I’m rather skeptical about systems without DBAs, I’ve yet to see any large system work without someone responsible for it to keep working.
  9. I’m even more skeptical about systems without redo logs and how they manage to still be atomic, durable and just plain reliable. Unfortunately this paper doesn’t explain how redo-less systems can be recovered. It references another paper as proof that it can be done.
  10. Stonebraker deserves credit for anticipating the NoSQL boom 3 years in advance. Especially the replication and memory-only components.

I hope that in the next few month I’ll add few more posts reviewing futuristic systems. I enjoy keeping in touch with industry trends and cutting-edge ideas.

Notes on Learning MySQL (as an Oracle DBA)

This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

I spent some time last month getting up to speed on MySQL. One of the nice perks of working at Pythian is the ability to study during the workday. They could have easily said “You are an Oracle DBA, you don’t need to know MySQL. We have enough REAL MySQL experts”, but they didn’t, and I appreciate.

So how does an Oracle DBA goes about learning MySQL?
Obviously you start by reading the docs. Specifically, I looked for the MySQL equivalent of the famous Oracle “Concepts Guide”.
Unfortunately, it doesn’t exist. I couldn’t find any similar overview of the architecture and the ideas behind the database. The first chapter of “High Performance MySQL” had a high level architecture review, which was useful but being just one chapter in a book, it lacked many of the details I wanted to learn. Peter Zaitsev’s “InnoDB Architecture” presentation had the kind of information I needed – but covered just InnoDB.

Thats really too bad because I definitely feel the lack – which I can easily tell you what Oracle does when you connect to a database, run a select, an update, commit or rollback – I can’t say the same about MySQL. So far I managed without this knowledge, but I have a constant worry that this will come back and bite me later.

Lacking a concepts guide, I read the documentation I had access to: Sheeri has nice presentations available for Pythian employees (and probably customers too. I’m not sure if she ever released them to the whole world). The official documentation is not bad either – it covers syntax without obvious errors and serves as a decent “how do I do X?” guide.

But reading docs is only half the battle. The easier half too. So I installed MySQL 5.1 on my Ubuntu from ready packages. Then I installed MySQL 5.5 from the tarball – which was not nearly as much fun, but by the time this worked I know much more about where everything is located and the various ways one can mis-configure MySQL.

Once the installation was successfull, I played a bit with users, schemas and databases. MySQL is weird – Schemas are called databases, users have many-to-many relation with databases. If a user logs in from a differnet IP, it is almost like a different user. If you delete all the data files and restart MySQL – it will create new empty data files instead. You can easily start a new MySQL server on the same physical box by modifying one file and creating few directories.

MySQL docs make a very big deal about storage engines. There are only 2 things that are important to rememeber though: MyISAM is non-transactional and is used for mysql schema (the data dictionary), it doesn’t have foreign keys or row level locks. InnoDB is transactional, has row level locks and is used everywhere else.

There are a confusing bunch of tools for backing up MySQL. MySQLDump is the MySQL equivalent of Export. Except that it creates a file full of the SQL commands required to recreate the database. These files can grow huge very fast, but it is very easy to restore from them, restore any parts of the schema or even modifying the data or schema before restoring.
XTRABackup is a tool for consistent backups of InnoDB schema (remember that in MyISAM there are no transactions so consistent backups is rather meaningless). It is easy to use – one command to backup, two commands to restore. You can do PITR of sorts with it, and you can restore specific data files. It doesn’t try to manage the backup policies for you the way RMAN does – so cleaning old backups is your responsibility.

Replication is considered a basic skill, not an advanced skill like in the Oracle world. Indeed once you know how to restore from a backup, setting up replication is trivial. It took me about 2 hours to configure my first replication in MySQL. I think in Oracle Streams it took me few days, and that was on top of years of other Oracle experience.

Having access to experienced colleagues who are happy to spend time teaching a newbie is priceless. I already mentioned Sheeri’s docs. Chris Schneider volunteered around 2 hours of his time to introduce me to various important configuration parameters, innoDB secrets and replication tips and tricks. Raj Thukral helped me by providing step by step installation and replication guidance and helping debug my work. I’m so happy to work with such awesome folks.

To my shock and horror, at that point I felt like I was done. I learned almost everything important there was to know about MySQL. It took a month. As an Oracle DBA, after two years I still felt like a complete newbie, and even today there are many areas I wish I had better expertise. I’m sure it is partially because I don’t know how much I don’t know, but MySQL really is a rather simple DB – there is less to tweak, less to configure, fewer components, less tools to learn.

Jonathan Lewis once said that he was lucky to learn Oracle with version 6, because back then it was still relatively simple to learn, but the concepts didn’t change much since so what he learned back then is still relevant today. Maybe in 10 years I’ll be saying the same about MySQL.

Cloning Oracle Home from RAC to Stand-Alone

This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

This should have been the easiest task on my todo list: Install Oracle EE standalone on a new Linux RHEL 5 server, later to be used as a standby for a production RAC system. This means 2 lines of “runinstall -silent …”, less than 5 minutes of DBA work and maybe 20 minutes of waiting. I did not expect to spend over 5 hours doing this.

Problems started when I discovered that I don’t have the patchset and another patch that exists on production and should be installed on the standby. I had to wait for my Metalink credentials to be approved for this customer CSI before I could download the patches for them.

“Why don’t you just clone the software from production?” asked a helpful colleague.

Sounds like a great suggestion. I cloned Oracle software before and it is a simple process: tar $ORACLE_HOME, copy the tar file to the new server, untar, run the cloning script which will register the new home with the inventory, and you are done!

In theory, at least.

Here is what actually happened:

  1. Tar, copy, untar, script
  2. Ran OPatch to verify that the new oracle home is in the inventory and that I see the correct version and patches.
  3. OPatch is showing two nodes. Oops. I didn’t realize oracle home has information about the cluster – didn’t Oracle move the inventory elsewhere? Spend an hour looking for the cause of this.
  4. Found that the two nodes are mentioned in  $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
  5. Removed this file.
  6. Deattached Oracle Home to clean inventory without deleting the software.
  7. Ran the clone script again
  8. Yay! OPatch looks good now.
  9. Decided to create test database  to be extra sure everything is fine
  10. NETCA failed with linking error. Spent an hour figuring out why. Cursed a bit.
  11. Had to install libc-devel, 32 bit version. Too bad RDA didn’t catch this.
  12. Created test database, but SQLPLUS now fails with linking error.  More cursing. Wondered what I did to deserve this.
  13. was missing so I had to install the 64 bit version of libaio. Too bad RDA was silent about this as well.
  14. Couldn’t start the database because the database couldn’t find the cluster. Why was it even looking for a cluster? Spent an hour figuring out why. Ah, because I copied the software from a RAC server and it was linked as RAC database.
  15. Relinked everything with RAC_OFF option.
  16. Finally things are working. Too bad it is 8pm already.

What I should have done: (I’m not sure if it is supported by Oracle, but at least it works)

  1. Double check that we have all RPMs.
  2. Tar, copy, untar
  3. remove $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
  4. run clone/bin/ ORACLE_HOME=/appl/oracle/product/10.2.0/db_1 ORACLE_HOME_NAME=OraDb10g_home1
  5. Relink as non-RAC:  make -f rac_off
  6. Verify with OPatch.
  7. Create test DB:
    netca /silent /responsefile ~/netca.rsp
    dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName TST -sid TST -SysPassword xxx -SystemPassword xxxx -emConfiguration NONE -datafileDestination /appl/oracle/oradata  -storageType FS -characterSet WE8ISO8859P1 -nationalcharacterSet AL16UTF16 -memoryPercentage 40
  8. Go for a nice afternoon ride.

I hope that I’m not the only DBA who always have to find the most difficult way to accomplish a task, and that this post will be useful to others. Perhaps the best piece of advice I can offer is to avoid this type of cloning in the first place.

BAAG, Best Practices and Multiple Choice Exams

(This post originally appeared at the Pythian blog)

I’ve been following the discussion in various MySQL blogs regarding the sort_buffer_size parameters. As an Oracle DBA, I don’t have an opinion on the subject, but the discussion did remind me of many discussions I’ve been involved in. What’s the best size for SDU? What is the right value for OPEN_CURSORS? How big should the shared pool be?

All are good questions. Many DBAs ask them hoping for a clear cut answer – Do this, don’t do that! Some experts recognize the need for a clear cut answer, and if they are responsible experts, they will give the answer that does the least harm.

Often the harmless answer is “Don’t touch anything, because if you have to ask this question you don’t have the experience to make the correct decision”. As Sheeri noted, it is a rather patronizing answer and it is stands in the way of those who truly want to learn and become experts.

But I can appreciate that it comes from long and bitter experience. Many users read random bits of information off the web and then rush to modify the production database without fully digesting the details. They end up tuning their database in a way no database should ever be tuned. Not even MySQL.

I used to think that users search for those “best practices” out of laziness, or maybe a lack of time. I used to laugh at the belief that there are best practices and clear answers, because if there were – we wouldn’t have a parameter. But now I think the problem is in the way most institutions evaluate intelligence, which affects the way many people approach any problem.

Even though all of us DBAs come from a wide variety of cultures, I’m willing to bet that every one of us had to work his way through a multiple choice test. If you ever took an Oracle certification exam, you know what I mean:

How do you find the name of the database server?
D) none of the above

You run into those in certification exams, job interviews and in slightly different variation when you try to get accepted to a university. You had to learn to succeed at those multiple choice tests at a very early age, or you would be labled “less intelligent”.

Yet those questions are absurd. In the question above, the answer could be A, but A would be wrong if my database is a RAC cluster. Besides a much better way would be to use /etc/oratab because there may be more than one DB on the machine.

But you can’t have a discussion with the exam author. You can’t ask for assumptions and clarifications  and you can’t even explain your assumptions in the test. What’s more, these tests also check for speed, so you don’t have much time to contemplate the options.

What these exams teach you is that every question has a single solution and one that is so obvious that once you see it, you recognize its rightness in less than 30 seconds. They also teach you that someone else knows the right answer (the person grading the test). So finding the right answer can be a matter of getting the “expert” to give you the one and obvious correct answer.

If our society teaches this point of view from a very young age, why are we surprised that DBAs keep looking for the one obvious answer?

I fully support the BAAG cause, and I believe that DBAs should be given a full explanation of the problem involved, the different choices that exist, their meaning, the trade-offs involved and in general give them the tools to make the right decision themselves time after time. But we should realize that undoing years of faulty teaching can take a while.

There is an even worse side effect to those multiple-choice-should-be-obvious tests. You may learn to never ask for clarifications. That asking for clarifications is “bad” or “wrong” in some way. In any real problem, asking for clarifications such as “why are you asking this?”, “what is the real issue you are trying to solve?” and  “How will you use this script?” is the most important part of finding a solution. It is not cheating – it is doing a professional job.

It is a scary thought that the very way we train and evaluate the future DBAs is something that will prevent them from doing a good job of being DBAs.

On the difficulties of Migrations – Especially to new Blogs

I haven’t posted here in a log while. That’s because I’ve been posting all my stories and ideas over at the Pythian blog.

I knew that migrations are one of the most difficult tasks in IT operations, but I did not realize this also applies to blogs. Yesterday, Alex helped me look at the blog statistics over at the Pythian blog and it turns out that over there I have about 10% of the readers that I had over here. While I’m just as brilliant in the Pythian blog as I was here, I guess that with all the old links, google ranks and people not changing their RSS subscriptions – blog locations have a lot more momentum than I suspected.

Anyway, to the 90% of my readers who apparently only read me at this address, in the next few days I’ll copy over the blog posts that I neglected to post here. I’ll try to post new articles here in the future, but they will always appear in the Pythian blog first, so you really should add my new address to whatever it is you use to follow blogs.

On the Difficulty of Data Migrations (Especially to NoSQL Databases)

(Originally posted at Pythian Blog)

I’ve been reading a lot of NoSQL blogs recently, and one thing that bothers me is that many of the leading NoSQL bloggers seem to have very different experience in operations that I’ve had.

Here’s an example:
Over at the O’Reilly community blogs, Andy Oram interviewed two MongoDB experts about migrating from a relational databases to MongoDB.

Here’s what the experts said:

” 1. Get to know MongoDB. Download it, read the tutorials, try some toy projects.
2. Think about how to represent your model in its document store.
3. Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
4. Rewrite your application code to query MongoDB through statements such as insert() or find().

OK, so which step do you think takes the longest? And the answer is…step 2. Design is critical, and there are trade-offs that provide no simple answers but require a careful understanding of your application. Migrating the data and rewriting the application are straightforward by comparison. “

I’ve never migrated anything to MongoDB, but I was involved in the migration of a large application from SQLServer to Oracle. Both are relational databases so there was almost no need to rethink the data model. The rewrite and the migration took over two years, with significant bugs discovered and fixed up to the last week. The majority of the time spent on migration. None of it was done by “simply by writing a bunch of SELECT * FROM statements against the database”.

We did not lack expertise – we had plenty SQLServer and Oracle developers and DBAs with 10+ years of experience. Note that no one has 10 years of MongoDB experience.

I don’t doubt that modeling is critical and the trade-offs are always difficult, but I’ve yet to see a modeling phase that took more than rewrite + migration of large applications with big data. Note that large applications and big data are the target customers of NoSQL databases, so I’m not inventing irrelevant issues here.

I’ve experienced two major difficulties with migrations:
The first one is that you normally have large number of users, and you may be reluctant to migrate everyone to a new system at once. No matter how good your load testing skills are, you will still not be 100% certain your new system will have perfect performance under peak load. So you do phased migration. Start by moving 5% of the users, then another 15%, then another 30%, and then if everything goes well, you may migrate the rest.

Why is this a difficulty? First, the users may share data with users that have not yet migrated. There could be dependencies. You’ll need to figure these out and write temporary code to solve those that will be used only during the migration phase. But before that, you need to find a way to migrate specific parts of your data. This requires figuring out how to tear things apart carefully within and across tables. A mini modeling project in its own right. This complicates the “bunch of SELECT * FROM statements” quite a bit.

Oh, and the migration may fail. Spectacularly. At 3am. You now need to migrate all the users back. With the new data they inserted into the new DB. I hope you prepared a script in advance to do that.

And that is just the first difficulty. The second major problem is that you may have large amounts of data arriving at high rates. You could declare 3 days downtime to move all the data, but I can see some reasons not to do that.

The alternative is to move the data in increments. First select and copy all the data inserted until today at 8am. Once this is done, select and copy all the data inserted between 8am and now. Then all the data between the previous now and the now-now. All in ever shrinking deltas of data that will eventually converge to a point where you can switch the users over. This requires that all large tables will have timestamps, preferably indexed, hopefully partitioned. Even with timestamps it is not a trivial application to write, and it has to take care of dependencies – you can’t migrate comments on a document without migrating the document itself.

During the incremental migration and the data streaming phase, you have to support two systems with the same one operational group. The same operational group that now have to learn to support a new database and a lot of new code rewritten for it. Not impossible, but far from “straightforward”.

I always thought that the biggest misconception developers have about operations is the “just add a bunch of servers to solve the performance issue” myth. I can add “migration to a new system is straighforward” as another dangerous myth.

I’m not blaming them, they are architects and developers. Solving difficult operational problems is not their job. The “migration is straightforward” attitude is a problem only when you ask your developers to support your operations. Something that seems depressingly common when NoSQL databases arrive to operations. Operations have no NoSQL experience and management asks the developers to help out until the ops teams learn to support the new beast. Problem is that NoSQL developers without operations experience are likely to cause just as much damage as operations without NoSQL experience.