Posted by Deepak Kumar | Posted in SQL BI/DW/SSRS, SQL DBA, SQL Dev | Posted on 21-03-2011
On any given production environment a SQL DBA must follow set of best practices to ensure smooth and highly optimized databases availability. Here are few common mistakes that I have identified and would recommend to fix if you have encountered or seeing anything similar.
1. Not preparing and validating a reliable backup and recovery policy with backup files retention period for databases as per agreed data loss and application downtime it may cause while recovering databases.
Very often some DBA’s rely heavily on adding a database maintenance plan only for regular backups and considering it done. I believe it’s not enough; a DBA must frequently check how long it takes to recover databases and should verify backup files for point in time recovery. Also, depending on database size it may take longer than expected time to do database recovery. Therefore, taking backups may not solve your requirements and it’s advised to use standby server with log shipping or database mirroring for quick database DR solutions.
2. Not knowing server hardware limits and utilization. It’s must to check System resource utilization & identifying bottlenecks on regular intervals for DISKIO, memory, CPU, Locks & Blocking etc counters. This information can give insight about how much server hardware resources are utilized/ available for current and near future workload generated by application(s). This information can help stabilizing the performance for a longer period, as well as knowing its limitations.
3. Not verifying SQL Server security, and not applying latest service pack/ version. It’s highly recommended to patch and apply bug fixes as soon as possible. An obsolete SQL server build/version is not recommended to keep running for production servers. Do check that you have carefully selected sysadmin role members and removed excessive permission from normal database users. It’s a general practice to enable only SQL option/settings that are required like – xp_cmdshell, CLR etc and make use of encryption & auditing to safeguard critical data.
4. Not defragging indexes or running Database consistency checks. Also not running regular maintenance activities like update stats on production databases. Or in some cases executing maintenance and other high resource consuming activities during production working hours.
5. Not checking how much free space is available in data or log files and shrinking databases again and again. Or setting ‘Auto Shrink’ database option to true is equivalent harmful.
6. Creating databases for Dev, Staging, UAT, Test, QA etc on Production SQL Server or Instance. This majorly causes manual/human mistakes by deleting/modifying objects or incorrectly assigning permissions.
7. Proactive Monitoring: Not setting SQL Alerts with email notification for Severity errors from 17 to 25; Disk free space check, CPU utilization, SQL/Agent/SSRS/SSAS services restart etc.
It’s highly advisable to set up automated alerts on SQL server for critical events. So that you are fully aware and feel secure that all events are reaching to you/team even though not sitting in front of computer screen. Going forward you may use Policy based management to capture events and streamlining various SQL instances in your environment.
8. Not using/applying High availability techniques (clustering, Mirroring, Log shipping) for critical databases. A DBA must identify business requirements and apply high availability techniques as per already defined SLA targets. It ensures timely availability of databases for applications in case of failure.
9. Running SQL profiler on production with no filters and end time. This is among most frequently happening mistakes. A DBA must understand that running profiler with no filters will capture all events and T-SQL queries thus generating extra workload on production. Its highly advised to use profiler with caution.
10. Setting database recovery option to ‘Full’ and never taking transaction log backups. I have seen cases where transaction log file size have increased 100 times to actual data due to backup negligence. Make sure you select database recovery model according to requirements and use backup policy accordingly. Also, carefully watch your transaction log backup jobs in logs hipping scenario to avoid large size database transaction log files.