But the perception is still there in part because I've not made my SQL job stats transparent enough. Stuff just works, and so outdated opinions never got challenged. The reality today is that the jobs that are causing us the most trouble aren't the ones where we're loading external vendor files, but the ones where some developer is doing internal data processing and they're doing it BADLY. It's the linked servers and the cursors and the UDFs and the over-architected single-object-inflate-then-commit patterns that threaten our scalability as our business grows. So I set out to prove it.
Job reports are available for everyone to see in SSMS, but you have to consciously go after it and interpret the results. That wasn't working for me, because as the DBA I'm the only one who was doing that regularly. It's only in my face, and my jobs run fine. So I developed the "SQL Server Job Report", which is a daily e-mail report to my whole dev team highlighting the following info:
- Job name
- Number of job runs in the past month
- Average duration during past month
- Success rate during past month
- Most recent run date
- Most recent run duration
- Most recent run status
- What's the average job duration? Is a job taking over an hour consistently? That's a threat when it fails and we have to re-run it during business hours.
- What's the success rate for a job? Jobs that aren't robust or well tested sap developer time. Do we have jobs that fail more than 25% of the time? Why?
Using this as empirical evidence, I was able to devote some time today to the job that was biggest offender. It happened to be a series of stored procs with linked servers, UDFs, and cursors that went against our largest tables and consistently took 3+ hours a night. With less than 1/2 a day's work, I was able to help the developer get the job down to 20 minutes! That's pretty satisfying. We'll see how much traction I can maintain on the other problem children...