Midlands PASS July Meeting – July 10

The Midlands PASS Chapter will hold its next meeting on July 10. We meet at MicroStaff IT in Cayce, SC. Here is the main presentation:

Statistics, Indexes, and their Impact

Speaker: Brian Kelley, SQL Server MVP

Statistics. Indexes. Clustered Indexes. Non-Clustered Indexes. Covering Indexes. Bookmark Lookups. Perhaps you’ve heard these terms. They determine how well or poorly your queries run. In this session, we’ll look at what these things are, how they impact your queries, what to do to maximize their use, and when you should consider making changes.

This is a 100-200 level presentation.

You can RSVP for the meeting (it helps us plan for food) at the EventBrite Event Page for this meeting.


PASS Summit Session Selection

Let’s make it democratic. Let’s ensure we get solid sessions from key people. And let’s save a ton of work in the process.

Spotlight Sessions:

There are certain folks who are extremely knowledgeable in their areas of expertise and who also happen to be excellent presenters. Keep the spotlight sessions and invite them to present. Limit the number of these, obviously, but this ensures top speakers are presenting.

Let the Community vote:

Folks already had to update their community profiles in order to participate in voting, so let’s go down that same road, this time for session selection. It doesn’t matter if it’s a pre-con, a regular session, or a lightning talk. Put the abstracts up, complete with who is giving them, and give the community a chance to vote for a set number. Perhaps for each track you get to vote for your top 10.

Does this skew things in favor of those who are more popularly known? Yes. But it also means the community is seeing who they want. So what about those who don’t have as solid a reputation? Let them build a reputation via the following:

  • local user groups
  • SQL Saturdays
  • virtual chapters

That’s what’s effectively being done by having a speaker rating score, anyway.

Set the deadline, tabulate the votes, and then take the top presentations per track and schedule them.

What if there’s a tie? Use a random number generator to make the selection.
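The tabulation and tie-break described above could be sketched as follows. This is a minimal illustration, not anything PASS uses: the session names, the top-N cutoff per track, and the use of Python's standard `random` module are all assumptions.

```python
import random
from collections import Counter

def select_sessions(votes, top_n, seed=None):
    """Tally community votes and return the top_n sessions,
    breaking any tie at the cutoff with a random draw."""
    rng = random.Random(seed)
    tally = Counter(votes)                     # session -> vote count

    # Group sessions by vote count so ties are visible as groups.
    by_count = {}
    for session, count in tally.items():
        by_count.setdefault(count, []).append(session)

    selected = []
    for count in sorted(by_count, reverse=True):
        group = by_count[count]
        remaining = top_n - len(selected)
        if remaining <= 0:
            break
        if len(group) <= remaining:
            selected.extend(sorted(group))     # whole group fits; no tie to break
        else:
            # More tied sessions than open slots: random draw settles it.
            selected.extend(rng.sample(group, remaining))
    return selected
```

Run per track against that track's ballots and you have a schedule; the `seed` parameter just makes a draw reproducible if anyone wants to audit it.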

But what if there are too many tracks?

Limit the number of submissions. Perhaps:

  • 1 pre-con
  • 2 regular session talks
  • 2 lightning session talks

Speakers try to game the system today because they don’t know what the selection committee for the Summit or for a particular SQL Saturday will want, so they submit more sessions than they actually want to give. Limiting the number of submissions fixes that. What about panels? If a panel discussion is that important to you, then it takes one of your slots. No apologies, because you know that ahead of time. This forces a speaker to focus on the topics he or she really wants to speak on and thinks will go over well with the audience.

Will there be issues?

Of course there will. But this is more transparent than having selection committees behind the scenes. PASS, after all, is a community organization. It also eliminates any board influence (and there has been board influence in the past). So let’s keep this simple.

But that’s not how XYZ Conference does it!

No, it’s not. But XYZ Conference is probably not run by a community organization. If it is, perhaps they should follow the same model.

What is an “operational” DBA?

On Facebook last night, I posted the following:

An operational DBA isn’t just a manager of a traditional RDBMS, transactional system. An operational DBA manages the data platform, whatever it is, when it hits production. Their goals are not traditionally the same as someone focused on development. They are looking to keep the system secure, performing well, and ensure it is recoverable in case of failure/disaster. This could be over a traditional system, a no-SQL solution, a warehousing solution built around star/snowflake schemas, or anything that has to do with data storage, management, and retrieval. If you think an operational DBA is a manager of just a transactional system, please update your definition. Operations has changed greatly in the last decade to keep pace with development and business.

The reason I posted this is because I saw some indications that there is a misunderstanding of what an operational DBA is nowadays. Not surprisingly, this misunderstanding came from folks who aren’t in an operational role. Here’s the gist of what an operational DBA is typically concerned with:

  • Production security
  • Production performance / reliability
  • Production recovery

I’m not ranking those in order of importance. The order of those three items depends on the environment. I’ve been in environments where security was more important than performance/reliability or recovery (military). I’ve also been in environments where performance/reliability was king.

Note that I didn’t include a particular type of platform. That’s on purpose. I also didn’t indicate whether production was on-premises or hosted. That’s also on purpose. If you’re still thinking that an operational DBA is only concerned with the care and feeding of an on-premises transactional SQL Server, Oracle, DB2, MySQL/MariaDB, etc. platform, it’s time to update your thinking. An operational DBA takes over when a system transfers from Dev/QA/UAT to production, when it becomes operational (hence the name). Platform, type, where it’s hosted, and details like those are irrelevant if the system is about data management and it just rolled to production. While there are some operational DBAs who focus solely on transactional systems, that’s not true of all operational DBAs. And that’s not what we operational types mean when we say, “operations.”

“A good DBA is a lazy DBA”

I’m borrowing from Andy Leonard (blog | twitter) who says all the time, “Good engineers are lazy.”

If you’re thinking, “Why would I want (to be) a lazy DBA?” let me explain. There’s a lot to be said for hard work. However, have you ever seen someone who is always busy but seems to get very little done? Hard work, in and of itself, isn’t the goal. The goal is to get things done. This is where laziness comes in.

If I have to repeat a task, I should look at automating it. I don’t want to have to repeat those steps each time. I want to be lazy. For instance, as an operational person, there are a lot of things I need to review within my production environment on a periodic basis to ensure my systems are running as they should. Case in point: ensuring the SQL Server drives don’t run out of free space. This is something that I should monitor regularly (daily).  There are different ways I could handle this:

  • I could log on to each server and check each drive.
  • I could use a tool like TreeSize to hit the drives one by one.
  • I could automate the checks, which results in a report in my inbox.

I prefer the last option. If I’m smart, I’ll even schedule it to run so it’s ready for me when I get in each morning. But why stop there? I could not only automate gathering the data, but also automate some of the analysis. Let’s say I set thresholds and the automation bubbles up anything crossing a threshold to the top of my report, meaning the most important data gets seen first. I don’t want to just be lazy, I want to be very lazy. By being this lazy and automating, I free up the one resource I can never get more of: time.
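The gather-plus-analyze automation above could be sketched like this. It's a minimal illustration, not a production monitor: the drive paths and the 15% threshold are assumptions, and in practice you'd schedule it (SQL Agent, Task Scheduler, cron) and e-mail the result rather than print it.

```python
import shutil

# Assumed threshold: flag any drive under 15% free space.
FREE_PCT_WARN = 15.0

def check_drives(paths):
    """Gather free-space stats for each drive and bubble anything
    crossing the threshold to the top of the report."""
    rows = []
    for path in paths:
        usage = shutil.disk_usage(path)
        free_pct = usage.free / usage.total * 100
        rows.append({"drive": path,
                     "free_pct": round(free_pct, 1),
                     "alert": free_pct < FREE_PCT_WARN})
    # Alerts first, then lowest free space first, so the most
    # important data gets seen first.
    return sorted(rows, key=lambda r: (not r["alert"], r["free_pct"]))

if __name__ == "__main__":
    for row in check_drives(["/"]):          # hypothetical drive list
        flag = "ALERT" if row["alert"] else "ok"
        print(f'{flag:5} {row["drive"]} {row["free_pct"]}% free')
```

The sort key is the "lazy" part: the analysis is baked in, so the report needs no reading order decisions each morning.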

What can you automate? Anything and everything you can successfully automate frees up time for you to spend tackling other things. The more your organization sees and understands what you do, the more valuable you are. If you are in IT but don’t happen to be a DBA, this is still a solid approach. Let me generalize and say that being a lazy IT pro is being a good IT pro.


Service Packs coming for SQL Server 2008/2008 R2

I’m not prophetic, I promise. However, some good news on the service pack front with regards to SQL Server 2008 and 2008 R2. There have been rumblings about a last service pack for these versions of SQL Server for a while, but nothing official had been said. However, an official announcement was made yesterday afternoon:

“We are planning to ship one last Service Pack for both SQL Server 2008 and SQL Server 2008 R2. Because of the maturity of SQL Server 2008 and 2008 R2, these Service Pack(s) will be an exception in terms of timing and will ship after mainstream support of these releases ends on July 8th 2014.”

The good news is that we will be seeing one final service pack for both SQL Server 2008 and SQL Server 2008 R2. If you’re not familiar with the support dates, you can see them at the Microsoft Product Lifecycle site which includes a search by product. For instance, you can see the support dates for all the versions of SQL Server from SQL Server 2000 on.


Do you apply SQL Server Cumulative Updates?

I think Steve Jones makes a great point here with respect to cumulative updates:

“This is one reason I’ve been hesitant to remain current with Cumulative Updates (CUs). Microsoft doesn’t stand behind them, with the text on each CU page that users should only apply the patch if they are experiencing specific problems. Otherwise users are told to wait for the next Service Pack, which seem to be coming less and less often.”

When you look at the fact that service packs for SQL Server (and most Microsoft products) have been few and far between, this presents a problem. There aren’t a lot of bug fixes for SQL Server specifically, but there are important ones, like fixes for data corruption, inaccurate result sets, and an infinite loop condition against certain dynamic management views. However, if you consider applying a cumulative update, here’s the text Steve was referring to:

“A supported cumulative update package is now available from Microsoft. However, it is intended to correct only the problems that are described in this article. Apply it only to systems that are experiencing these specific problems. This cumulative update package may receive additional testing. Therefore, if you are not severely affected by any of these problems, we recommend that you wait for the next SQL Server 2008 R2 service pack that contains the hotfixes in this cumulative update package. “

When was the last time a service pack was released for SQL Server 2008 R2 (to take just one supported version)? It was July 26, 2012. In other words, we’re approaching the two-year point. So is it wise to wait for that service pack? According to Microsoft, you should unless you are “severely affected.” But what is meant by “severely”? If I don’t get accurate result sets back because I implement a FULL JOIN with CROSS APPLY, that’s a problem. If I have data corruption because of LOB data, that’s a problem. If I try to query what’s executing and lock up my SQL Server, that’s a problem. In my view, all three of those qualify as severe. Great, but will I get the kind of support I should? If I take that text at face value, it basically says, “Installer beware.” That’s a terrible position to be in as a customer.

Which leads me to the conclusion that either (a) Microsoft should step up support on the cumulative updates and reflect this in their language or (b) Microsoft should release service packs more regularly. I don’t foresee either happening in the near future, but as a customer, I believe it’s a reasonable request.

Minimize permissions for file locations

When we talk about security, we often point to the Principle of Least Privilege. I write a lot about applying this to SQL Server, but it’s important to handle this outside of SQL Server as well, especially at the file / share level. Why would we care about this as DBAs / DB Pros? Here are a few good reasons:

Too Much Read Access:

By having these locations exposed with greater than required read access, folks can potentially get to the data and abuse it. You have an information disclosure / data exposure issue. It’s not just about trusting the individual. Anyone can fall prey to a malicious email that deploys malware onto the system. From there the attacker hops from system to system using the credentials of that user. Therefore, it’s important to lock down read access as much as possible.

Too Much Write Access:

This one is more of a concern with ETL processes, as there is typically minimal validation done on the files. For instance, do you perform an MD5 hash check on the files you import into SQL Server? If someone understands what’s in the files, it would be trivial to tamper with their contents. This is why reconciliation checks and the like are so important. But even they can be beaten.
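If you did want that MD5 check before an import, a minimal sketch looks like this. The function names are my own and the expected hash would have to come from whoever produced the file; this only proves the file wasn't altered in transit or at rest, not that the source was honest.

```python
import hashlib

def md5_of(path, chunk_size=65536):
    """Compute the MD5 hash of a file in chunks (safe for large ETL extracts)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_file(path, expected_md5):
    """Return True only if the file matches the hash the sender published."""
    return md5_of(path) == expected_md5.lower()
```

Gate the import step on `verify_file()` returning True and a silently modified extract never reaches the table.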

Another problem, though, is that if someone has write access in both production and development, a write intended for development can happen in production. If it’s not caught, you could have a big problem with respect to that ETL process. Again, this isn’t a trust issue. All it takes is someone who has too much work to do (that describes just about everyone in IT), who hasn’t had enough sleep lately (ditto), or who is distracted for some other reason.

Typically, when someone has write access, that person also has delete access. If someone wanted to be malicious, you can imagine the kind of damage that could be done by deleting the snapshot replication files or the backup files. The same is true for the ETL files, especially ones that take significant processing cycles to produce or that are the result of manual processes (like Excel spreadsheets that are filled out and then imported).

The Bottom Line:

That’s why we have the Principle of Least Privilege. Applying it, regardless of how much you trust your personnel, is important. And as I described to someone recently, people change. Someone who is trustworthy but finds himself or herself underwater financially could consider committing an act he or she would normally avoid. How fast can this happen? Divorce, unexpected medical bills, being ripped off, totaling a brand new car, etc., are all ways that a person’s finances can suddenly do a 180. Therefore, seek to lock down anywhere files are used as part of any of your processes.
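Finding the spots that need locking down can itself be automated. Here's a minimal sketch that walks a directory tree and flags world-writable files; it assumes POSIX permission bits, so on a Windows file share you'd inspect NTFS ACLs instead, and "world-writable" is just one example of too much access.

```python
import os
import stat

def find_loose_files(root):
    """Walk a directory tree and flag files that anyone can write to:
    the 'too much write access' case described above."""
    loose = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if mode & stat.S_IWOTH:        # world-writable bit is set
                loose.append(path)
    return sorted(loose)
```

Point it at your ETL drop folders and backup shares on a schedule, and over-permissive files surface before someone abuses them.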
