Wednesday, June 23, 2010

Dad to the third power

Welcome to the world Miss Alison. We couldn't possibly love you more.
You are an answer to our deepest prayers, and a blessing to our family.
We cannot wait to bring you home.

Love, Dad3

Alison Rose, 6/22/10. 7lbs, 11oz. All girl!
Posted by Picasa

Monday, June 21, 2010

Friday, June 18, 2010

SQL Server Job performance

I've been fighting a mis-perception at work that ETL (data transformation and loads) is one of our company's threats to database scalability.  It certainly was when I started here a few years ago, but I've worked hard to change all that.  At the risk of sounding like I'm tooting my own horn, my ETL processes are really well optimized.  I'm cranking through 50+ different large file loads a night in well under an hour.  As far as I'm concerned, I've got each individual ETL process smokin' fast, while occasionally making conscious trade-offs between performance and maintainability.

But the perception is still there in part because I've not made my SQL job stats transparent enough.  Stuff just works, and so outdated opinions never got challenged.  The reality today is that the jobs that are causing us the most trouble aren't the ones where we're loading external vendor files, but the ones where some developer is doing internal data processing and they're doing it BADLY.  It's the linked servers and the cursors and the UDFs and the over-architected single-object-inflate-then-commit patterns that threaten our scalability as our business grows.  So I set out to prove it.

Job reports are available for everyone to see in SSMS, but you have to consciously go after it and interpret the results.  That wasn't working for me, because as the DBA I'm the only one who was doing that regularly.  It's only in my face, and my jobs run fine.  So I developed the "SQL Server Job Report", which is a daily e-mail report to my whole dev team highlighting the following info:
  • Job name
  • Number of job runs in the past month
  • Average duration during past month
  • Success rate during past month
  • Most recent run date
  • Most recent run duration
  • Most recent run status
While everyone has been concerned with what happened during last night's runs, I wanted to put the spotlight on two other key metrics:
  1. What's the average job duration?  Is a job taking over an hour consistently?  That's a threat when it fails and we have to re-run it during business hours.
  2. What's the success rate for a job?  Jobs that aren't robust or well tested sap developer time.  Do we have jobs that fail more than 25% of the time?  Why?
This did the trick!  I got the responses I needed which were, "I didn't know that job was failing that often!" and the "I didn't know that job always took that long!".  Progress...

Using this as empirical evidence, I was able to devote some time today to the job that was biggest offender.  It happened to be a series of stored procs with linked servers, UDFs, and cursors that went against our largest tables and consistently took 3+ hours a night.  With less than 1/2 a day's work, I was able to help the developer get the job down to 20 minutes!  That's pretty satisfying.  We'll see how much traction I can maintain on the other problem children...

Friday, June 11, 2010

Goodbye Gallbladder, Hello Home

Goodbye gallbladder,
goodbye pain
and those midnight wake-ups
that made me insane
A sneak attack
no time to react
and now staples from my toes to my brain

Hello home
hello future
with my two little boys
and a new baby girl to nurture
least not my wife
faithful through the strife
and my constant comfort through all the torture

By His grace
I've gone from gallbladder
to home