A few basic notes on Oracle Database Administration.
Saturday, October 18, 2008
Lazy Log Writer - Vindication & thank you Fitaloon
I crawled out of bed, bleary eyed, signed in, and tried to start it. I got:
ALTER DATABASE MOUNT
Sat Oct 11 00:35:05 2008
ORA-00202: control file: '/u01/control02.ctl'
ORA-27037: unable to obtain file status
That woke me up. OK, Who/what messed with the control file?
I tried to look at the file and got a message that told me that not only was the control file inaccessable, the entire drive /u01 was inaccessable.
I called across the great divide to the company who controls the hardware and eventually got, "The backup battery for the write cache needs replacing in the SAN".
Monday morning I resent my email to management from a week and a half before. I highlighted my paraphrasing of Fitaloon's comment to my first posting on this subject, "Could our problem be caused by the write cache on the disk having some sort of problem? For example, could it be something as simple as failed backup batteries for the write cache?" Thank you, Fitaloon. You hit the nail on the head.
Since the replacement of the battery, our log writer has been zipping right along, good performance, at last!
Sunday, September 28, 2008
Delete vs. Truncate - graphically
I had not been expecting such a graphical re-enforcement of the rule. I had found the full table delete in some executing code and was curious how much redo it was generating. I ran René Nyffenegger's script 'how_much_redo' .
Assuming that it is giving accurate results, here is what I got for the 3 operations.
SQL> INSERT INTO T1_DEL (SELECT * FROM T1);
1161874 rows created.
SQL> exec how_much_redo;
New Redo Bytes Written: 408985600 (390 MB of redo TO INSERT)
SQL> delete from T1_DEL;
1161874 rows deleted.
Elapsed: 00:06:30.90 (6 MINUTES TO DELETE)
New Redo Bytes Written: 661496320 (630 MB of redo TO DELETE)
SQL> truncate table T1;
Table truncated.
Elapsed: 00:00:06.80 (6 SECONDS TO TRUNCATE)
New Redo Bytes Written: 815616 (less than 1Mb of redo to TRUNCATE)
Saturday, September 27, 2008
Lazy Log Writer - The non Resolution
1. Management asked that the entire database be moved to the NAS disks because they initially seemed faster. I had to let them know, that, no, the writes there are not faster.
2. The efforts to cross the great divide and get the storage manager in the other company to actually look at his configuration resulted in, "Since the writes are slow on two different pieces of our machinery, it can't be our fault - it must be oracle. I'm debugging nothing."
3. The other company informed us, "By the way, we will be installing new hardware in a couple of weeks at the same time that you are making a major application upgrade." How does that sound for a prospect of a smooth transition? This was followed by an email, "Claudia, are you working with the other company on this?" Not only am I not working with them, I never heard of it!
4. Management informed me, "Since there will be a hardware change soon, don't bother to follow up on this problem."
So they will install the hardware, we will deploy the new application version, and there will be storm and drama about the excessive waits for redo log writes. I think that I should change my name to Cassandra.
Thursday, September 25, 2008
A lazy log writer
What was going on at work? Not much - as in not what should have been. I like looking at the performance monitor on Enterprise Manager for a quick glance at what is going on. It wasn't a pretty picture.
Clicking on the 'Blocking Sessions' tab I saw that that the log writer session was blocking various other sessions.
I went into the alert log and was pointed to a log writer trace file. Inside the trace file I found
*** 2008-09-25 15:28:24.239
Warning: log write time 15460ms, size 6999KB
*** 2008-09-25 15:28:24.836
Warning: log write time 590ms, size 6898KB
*** 2008-09-25 15:28:29.852
Warning: log write time 5020ms, size 6999KB
I looked at metalink and got
" The above warning messages has been introduced in 10.2.0.4 patchset. This warning message will be generated only if the log write time is more than 500 ms and it will be written to the lgwr trace file .
"These messages are very much expected in 10.2.0.4 database in case the log write is more than 500 ms. This is a warning which means that the write process is not as fast as it intented to be . So probably you need to check if the disk is slow or not or for any potential OS causes. "
We just upgraded to 10.2.0.4. Our storage is across the great divide at another company. We are often short of answers other than, "Everything is configured correctly". With quite a bit of work we have gotten a pair of LUNS allocated for redo logs.
As a test, I moved the redo logs from the SAN to a NSF drive - NOT one that should be allocated to redo. Here was the immediate result:
The log writer waits stopped. Compliments from management. A request from management to storage management to move the entire database to this kind of storage, everyone is happy
almost.
Tonight I looked at the trace file:
*** 2008-09-25 22:53:34.154
Warning: log write time 750ms, size 0KB
*** 2008-09-25 22:53:35.943
Warning: log write time 1770ms, size 28KB
*** 2008-09-25 22:53:39.889
Warning: log write time 940ms, size 0KB
Log writer is taking forever, and it isn't even doing anything!
To be continued..... and detective suggestions welcome!
Monday, September 22, 2008
OOW - Day 2 - Simple things
I then went to a hands on on Oracle Advanced Compression in 11g. The class really was just working through a couple of OBE examples with the comfort of people who actually know the a LOT about Oracle Compression there to help with the silly little problems that people have working through examples. I wish I had had some sort of deep questions to ask about compression. All I learned was, it works. There is some time, and some space savings. It will be useful when we move up to 11g.
When I say that the Advanced Compression was 'Hands On', I mean it literally. Braille would have been more accurate. For some reason, the whole demo was available in a box slightly larger than a youtube video. I get annoyed with myself when I have difficulties with the parts that I think should be the simple parts.. Here were a whole group of Oracle employee experts who couldn't make the VM window expand to the whole screen. There was more compression in the demo than anyone intended!
Sunday, September 21, 2008
OOW - Day 1 - Blogger's perks.
I happened to be seated next to Fuad Arshad, who apparently does work with Oracle 11g and is ready for the exam. I wish him all the best on the exam.
Fuad's conference badge had a large 'Blogger' written in bright orange on the case. I know that I signed up for one. I asked at the registration desk, and they gave me a 'Blogger' badge too. I lined up for the Keynote Address. Immediately, the ushers pushed me out of the main line and into a smaller line for press & bloggers. I was led to a special area of the auditiorium saved for bloggers - a long table set up so that we could record every golden world of Mary Matalin & James Carville. For Pete's sake! I think this 'Blogger' badge is going to have some strange side effects.
Then this evening I was able to get together with all of the other Oracle Bloggers who I have been following for a year. That was really special. What a nice bunch of people!
Thursday, August 28, 2008
travelling trace file
We had an error logged in the alert log. It informed me that details were in
db_ckpt_1234.log. Fine, but there was no such log in the bdump directory.
The dba who was here before me wrote a nice script to run every Sunday to clean up old trace files. What he didn't think of is that the trace files seem to be written per session, and that the checkpoint 'session' continues from instance start up.
His script came along and happily moved the trace file to a backup_bdump directory. Oracle couldn't find it and didn't write the details.
One more fix to make.
Wednesday, August 20, 2008
NoCOUG conference 8/21/08
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2008 NoCOUG. All rights reserved. |
Thursday, July 31, 2008
Make it go away! (rman delete)
I am doing perfectly lovely incremental compressed backupsets of my database. Then one day, playing with EM, I in experimented with the 'Oracle recommended backup stategy'. I may have missed something in my set up, but I ended up with an un needed image copy backup of my database. Help! This thing is devouring my flashback area. So what is the proper way to remove it.
First step, of course, is to kill the Oracle job, so it doesn't do that again!
Then in rman:
RMAN> list copy;
gives me a nice long list - one line of which was:
List of Datafile Copies
Key File S Completion Time Ckp SCN Ckp Time Name
30428 4 A 29-JUL-08 224478 29-JUL-08 /bkup/flashbk/datafile/o1_mf_users_44y8x123_.dbf
I double check on the file system:
$> ls o1_mf_users_48y8x123_.dbf
o1_mf_users_44y8x123_.dbf
Yes, the copy is there.
To delete it in RMAN I enter
RMAN> delete datafilecopy 30428;
and get:
List of Datafile Copies
Key File S Completion Time Ckp SCN Ckp Time Name
30428 4 A 29-JUL-08 224478 29-JUL-08 /bkup/flashbk/datafile/o1_mf_users_44y8x123_.dbf
Do you really want to delete the above objects (enter YES or NO)? y
deleted datafile copy
datafile copy filename=/bkup/flashbk//datafile/o1_mf_users_44y8x123_.dbf recid=30428 stamp=1334123
Deleted 1 objects
I check in the file system and this time my ls returns
ls: 0653-341 The file o1_mf_users_44y8x123_.dbf does not exist.
Good bye, nusance.
Thursday, July 10, 2008
Something is VERY busy
This is a solaris box. I read that there is a 2G file limit on Solaris. (Maybe that means a limit on log files. Obviously, my data files are much larger than that.).
I issued the lsnrctl command:
lsnrctl >set log_file listener2.log
Quickly it resumed recording connections, now into listener2.log. It recorded 2 of them EVERY second!!! OK. my suspicion was right. Now to track down exactly WHAT is connecting and disconnecting so much.
Sunday, July 6, 2008
I'm beginning to distrust Oracle
Now I am learning that ASMM has been allocating our shared pool too large, and has been adjusting sizes not once an hour, but once a minute.
I am not happy with the statistics that the 'wonderful' automatic GATHER_STATS_JOB has been collecting. Also the job has been causing contention with some of our production jobs.
The one that seems really strange to me is Log_buffer. I allocated 28M to log_buffer. According to Oracle, LGWR should be writing, at a maximum, when log_buffer is 1M full. Instead, I see that it is writing 2M at a time. I am going to have to figure that one out.
I expect that when Oracle offers to automate these processes, that this is their own product, so the calculations will be well done. I expected, but I am wrong.
Thursday, June 19, 2008
Q.E.D.
DBA, "You are the only user on the DB, but you have a dozen sessions open. Can you close some sessions?"
User, "I can't see my sessions through the application. Bounce the database"
DBA, " I can see the sessions just fine. I'll kill your excess sessions."
User, "No, don't kill my sessions. Bounce the database. I'm bouncing the database."
We don't need no stinking controls around here.
Everyone can do everything.
He bounced the database.
User, "I bounced the database. My [one] session is running just fine."
Q.E.D. - bouncing the database improves performance.
==========================================================
An update to the post above:
The same user has informed me that I should always shutdown the database with "Shutdown abort" . "It works much better."
I have always assumed that everyone else knows more than I do.
Maybe that isn't true. Maybe I do understand more than some people...
Monday, June 16, 2008
Lessons Learned after a Hell Weekend
- A necessary part of any database project plan is a fall back plan. What is planned if there is a failure at any particular step.
- Backup - this includes enough space allocated for additional backups as needed.
- Part of any upgrade script should be rollback scripts.
I should have know that I was in trouble when on Friday afternoon when I was given a time line which was called 'optimistic' with no what-if forseen.
Wednesday, May 14, 2008
Northern California Oracle Users' Group Conference
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2008 NoCOUG. All rights reserved. |