DBPedias

Your Database Knowledge Community

John Piwowar

  1. NOW they’re fixed…

    A few of our test EBS instances had developed a well-deserved reputation of taking a long time to start up. Here’s what we saw:

    1. Database ground to a near-halt shortly after running adstrtal.sh
    2. Most concurrent manager processes were being marked as “dead” in the internal manager log not long after startup, and the internal manager attempted to restart them.
    3. Consequence of #2: Three to four times as many FNDLIBR processes running on the app tier server as expected.
    4. Consequence of #2 and #3: The “Active sessions” graph in Grid Control resembled Mt. Kilimanjaro, and all three database server load average numbers (1, 5, and 15-minute) on were over 100.

    Needless to say, neither users (who could not connect to the test instance), developers (who could not connect to the Apps database), nor DBAs (who had to answer endless “when will the instance be up?” emails) were very happy.

    The culprit? This little query, executed for each FNDLIBR process as it started up, generating a ridiculous number of “control file sequential read” waits along the way:

    SELECT count(*)
      FROM v$thread;
    

    When I took a closer look at the query, this is what I found:

    SQL> select /*+ gather_plan_statistics */ count(*)
      2  from v$thread;
    
      COUNT(*)
    ----------
             2                                                                                                              
    
    Elapsed: 00:00:11.17
    SQL>  select *
      2  from table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    ------------------------------------------------------------------------------------------------------------------------
    SQL_ID  057v054v2svhp, child number 0
    -------------------------------------
    select /*+ gather_plan_statistics */ count(*) from v$thread                                                             
    
    Plan hash value: 3150894624                                                                                             
    
    ------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                 | Name            | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
    ------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT          |                 |      1 |        |      1 |00:00:11.16 |       |       |          |
    |   1 |  SORT AGGREGATE           |                 |      1 |      1 |      1 |00:00:11.16 |       |       |          |
    |   2 |   NESTED LOOPS            |                 |      1 |      1 |      2 |00:00:00.37 |       |       |          |
    |   3 |    MERGE JOIN CARTESIAN   |                 |      1 |      1 |   3969 |00:00:00.17 |       |       |          |
    |*  4 |     FIXED TABLE FULL      | X$KCCTIR        |      1 |      1 |     63 |00:00:00.09 |       |       |          |
    |   5 |     BUFFER SORT           |                 |     63 |      1 |   3969 |00:00:00.08 | 73728 | 73728 |          |
    |   6 |      FIXED TABLE FULL     | X$KCCCP         |      1 |      1 |     63 |00:00:00.05 |       |       |          |
    |*  7 |    FIXED TABLE FIXED INDEX| X$KCCRT (ind:1) |   3969 |      1 |      2 |00:00:11.00 |       |       |          |
    ------------------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------                                                                     
    
       4 - filter("TR"."INST_ID"=USERENV('INSTANCE'))
       7 - filter(("RT"."INST_ID"=USERENV('INSTANCE') AND "RTNLF"<>0 AND "TIRNUM"="RTNUM" AND "CPTNO"="RTNUM"))
    

    That’s a lot of work to do for 2 rows. Clearly the internal manager was expecting a faster start time from its children, which explains why it kept attempting to start new ones. Repeatedly. Until we had over 200 FNDLIBR processes running instead of our expected 80-ish.

    After gathering fixed object statistics, the query behaved a bit better, and we therefore expect that our “slow to awaken” instance should be a bit more speedy:

    SQL> exec dbms_stats.gather_fixed_objects_stats
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:03:11.45
    SQL> select /*+ gather_plan_statistics */ count(*) from v$thread
      2  ;
    
      COUNT(*)
    ----------
             2                                                                                                              
    
    Elapsed: 00:00:00.03
    SQL>  select *
      2  from table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    ------------------------------------------------------------------------------------------------------------------------
    SQL_ID  f1pmbmcstp1rj, child number 0
    -------------------------------------
    select /*+ gather_plan_statistics */ count(*) from v$thread                                                             
    
    Plan hash value: 93051267                                                                                               
    
    -----------------------------------------------------------------------------------------------
    | Id  | Operation                  | Name             | Starts | E-Rows | A-Rows |   A-Time   |
    -----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT           |                  |      1 |        |      1 |00:00:00.02 |
    |   1 |  SORT AGGREGATE            |                  |      1 |      1 |      1 |00:00:00.02 |
    |   2 |   NESTED LOOPS             |                  |      1 |      2 |      2 |00:00:00.02 |
    |   3 |    NESTED LOOPS            |                  |      1 |      2 |      2 |00:00:00.01 |
    |   4 |     FIXED TABLE FULL       | X$KCCRT          |      1 |      2 |      2 |00:00:00.01 |
    |   5 |     FIXED TABLE FIXED INDEX| X$KCCCP (ind:1)  |      2 |      1 |      2 |00:00:00.01 |
    |   6 |    FIXED TABLE FIXED INDEX | X$KCCTIR (ind:1) |      2 |      1 |      2 |00:00:00.02 |
    -----------------------------------------------------------------------------------------------
    
  2. Logging for non-existent listeners

    While working on an 11gR1 database server today, I fat-fingered the name of a secondary listener (you know, practicing my stock-trading skills), and in the process I noticed something that I hadn't considered before. Here's the setup:

    [oracle@11gr1srv ~]$ cd $DIAG_DIR/tnslsnr/11gr1srv
    [oracle@11gr1srv 11gr1srv]$ ls
    listener  listener_old
    [oracle@11gr1srv 11gr1srv]$ lsnrctl start LISTENER_ODL
    
    LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 16-MAY-2010 13:09:10
    
    Copyright (c) 1991, 2008, Oracle.  All rights reserved.
    
    Starting /opt/oracle/app/oracle/product/11.1.0/dbhome_1/bin/tnslsnr: please wait...
    
    TNSLSNR for Linux: Version 11.1.0.7.0 - Production
    System parameter file is /opt/oracle/app/oracle/product/11.1.0/dbhome_1/network/admin/listener.ora
    Log messages written to /opt/oracle/app/oracle/diag/tnslsnr/11gr1srv/listener_odl/alert/log.xml
    TNS-01151: Missing listener name, LISTENER_ODL, in LISTENER.ORA
    
    Listener failed to start. See the error message(s) above...

    It looks like there are more consequences than brief embarrassment for that typo, though. My failed attempt to start the listener created a full Automatic Diagnostic Repository (ADR) directory structure for the non-existent listener:

    [oracle@11gr1srv 11gr1srv]$ ls
    listener  listener_odl  listener_old
    [oracle@11gr1srv 11gr1srv]$ ls -R listener_odl
    listener_odl:
    alert  cdump  incident  incpkg  lck  metadata  stage  sweep  trace
    
    listener_odl/alert:
    log.xml
    
    listener_odl/cdump:
    
    listener_odl/incident:
    
    listener_odl/incpkg:
    
    listener_odl/lck:
    AM_1096102193_3488045378.lck  AM_1744845641_3861997533.lck
    AM_1096102262_3454819329.lck  AM_3216668543_3129272988.lck
    
    listener_odl/metadata:
    ADR_CONTROL.ams       INC_METER_IMPT_DEF.ams
    ADR_INVALIDATION.ams  INC_METER_PK_IMPTS.ams
    
    listener_odl/stage:
    
    listener_odl/sweep:
    
    listener_odl/trace:
    listener_odl.log

    I looked through the Net Services docs, but wasn't able to find a way to disable this behavior, and my Google-fu failed me as well. This only seems to be happening for the 'start' command; other lsnrctl commands (reload, status, stop, etc) just return the expected "TNS-01101: Could not find service name" response.

    This sort of thing happens in 10g, too (and probably in older versions, I'm just too lazy to fire up my 9i test system), but the impact is much smaller: a single log file, not all of the ADR stuff that comes with 11g. Even in 11g, it's not a big deal; this just generates some light housecleaning work. At the extreme, I suppose it could be possible to launch the lamest DOS attack ever by chewing up lots of inodes, but there have to be more entertaining (and faster) ways to do that.

  3. Resolving ORA-4023 during a 10gR2->11gR1 upgrade

    Here's one for a hypothetical frustrated Googler. I just had something goofy happen to me while testing preparations for a database upgrade from 10gR2 to 11gR1. I'd already been through the process a few times on this server, so I wasn't expecting any problems. Since production upgrades usually happen during time windows that guarantee reduced mental capacity, however, I'd decided to perform one last dry run to verify my documentation and to test my "upgrade day" checklist. I cleaned out the database files and related configurations from the previous successful upgrade, and started over from step 0.

    Everything was going as expected until I actually ran DBUA. The first step after selecting the database to upgrade, "Gathering database information," was failing with the message:

    ORA-04023: Object SYS.STANDARD could not be validated or authorized

    At first, I thought I might have fallen prey to the conditions described in My Oracle Support Note 729909.1: Upgrading to 11.1.0 and DBUA reports ORA-4023 On SYS.STANDARD, but the diagnostic steps in the note did not match what I was seeing. Next stop: the DBUA logs in $ORACLE_BASE/cfgtoollogs/dbua/logs. The answer became obvious when I saw the following lines in the trace.log from the failed dbua session (some lines omitted for brevity):

    [main] [2:35:45:170] [DatabasesPage.initializePage:351]  Loading databases instance found in OraTab/Registry
    [main] [2:35:45:170] [OsUtilsUnix.enumerateSIDs:350]  checking sid: xxxx
    [main] [2:35:45:179] [Database.getStepSQLInterface:690]  OH in database: /opt/oracle/app/oracle/product/11.1.0/dbhome_1
    [main] [2:35:45:186] [Database.getStepSQLInterface:718]  sqlplus created with home:=/opt/oracle/app/oracle/product/11.1.0/dbhome_1 and sid:=xxxx
    [Thread-4] [2:35:49:44] [CompManager.setSelectedDatabase:1339]  setSelectedDatabase::oracleHome=/opt/oracle/app/oracle/product/11.1.0/dbhome_1 sid=xxxx
    [Thread-4] [2:35:49:45] [CompManager.setSelectedDatabase:1424]  Old home=/opt/oracle/app/oracle/product/11.1.0/dbhome_1
    [Thread-4] [2:35:49:47] [CompManager.setSelectedDatabase:1425]  New home=/opt/oracle/app/oracle/product/11.1.0/dbhome_1

    Sure enough, when I checked the same log file from a previous successful upgrade, the corresponding lines in the trace.log file all referred to the old (10.2.0) ORACLE_HOME. Despite all my careful cleanup, I had somehow forgotten to reset the value of the ORACLE_HOME for the database in /etc/oratab:

    [oracle@testsrv logs]$ grep product /etc/oratab
    xxxx:/opt/oracle/app/oracle/product/11.1.0/dbhome_1:N
    

    To make matters worse, I had also failed to notice that the ORACLE_HOME value was wrong in the DBUA interface. This, I suppose, is why we test. :-P

    After setting the ORACLE_HOME in /etc/oratab back to the 10gR2 value, I was able to complete the upgrade successfully.

  4. Friday mumblings: VanOUG, 11gR2, and EBS

    At the first meeting of the newly-reconstituted Vancouver Oracle Users Group this past week, we were treated to three great presentations by Caleb Small and Dan Morgan. They've made the content of their presentations available on the VanOUG web site (these links go to PDFs, if that sort of thing bugs you, consider yourself warned):

    I'm not going to go into a full recap of the presentations, but they were all full of really cool information. This post is an attempt to collect some of my mental notes, mostly cast in the context of one of my favorite topics, Oracle Applications.

    11gR2 HA Best Practices

    Caleb's presentation was very thorough and well-constructed. Dan gave him grief for boring the audience, but I think there was just so much new content to absorb that people were too busy processing to ask many questions on the fly. ;) Here are some one-liners from my notebook (anything that looks like an opinion is my commentary/interpretation, not Caleb's):

    • Lots more "moving parts" in 11gR2 Grid Infrastructure, clear "separation of duties" across three privileged OS accounts.
    • Service startup order is a little different now
    • Cluster status utilities show a lot more information, but need to learn to not rely upon crsstat as much
    • Proper networking configuration of 11gR2 GI not for the faint of heart. ;-)
    • Increased memory requirements will make this tougher to virtualize; I'm going to need a bigger laptop.
    • ACFS looks interesting; I wonder if it will be a valid option for an (shared application tier filesystem) for Oracle Applications. (Turns out the answer is "not currently planned," based on this exchange I had w/ Steven Chan on his blog later in the week).

    11g New Features

    There's a lot of really neat stuff going on in this presentation. I'd like to call out small nugget that, while far from the most important, is still pretty interesting on the surface: "deferred segment creation." When a table is created, no extents are actually allocated until rows are inserted. Seems like an odd feature, but one touted benefit is for large ERP systems like SAP and Oracle Applications, where lots of tables are created that may never be used, depending on what products are implemented. Those thousands of initial extents can certainly add up to real storage, and a more cluttered data dictionary. I can't speak to SAP implementations, but I don't see it as a huge win for EBS customers, given that:

    1. This feature is available only when tables are created, which means the benefit will only really be available when Oracle starts shipping Oracle Applications install media with an 11gR2 database. Anyone upgrading to the 11gR2 database will still be stuck with those empty extents.
    2. Given the overall footprint of an EBS database, the storage savings isn't such a big deal. For example, here's the potential savings from eliminating "empty" tables from an R12 Vision database:
        SYSTEM@R12VIS(11.1.0.7)>select sum(bytes)/1024/1024 potential_savings
      2  from dba_segments s
      3  where exists (select table_name
      4                  from dba_tables
      5                 where num_rows = 0
      6                   and table_name = s.segment_name
      7               )
      8  /
      
      POTENTIAL_SAVINGS
      -----------------
      3850.36719
      

      3.5(ish) GB out of 200GB is okay, I guess, but not a huge deal for a system that's only going to keep growing. FWIW, I'm going to wave my hands and pretend that the fact that a Vision database has way more populated tables than a "fresh-install" EBS database is balanced by the fact that my quick query doesn't account for the possibility that table stats are stale and some of those tables are actually populated. ;-)

    Of course, it's possible that I'm missing the point. It wouldn't be the first time! Maybe it really comes down more to a less-cluttered data dictionary. I mean, it can't be about tablespace fragmentation, since we're not supposed to care about that anymore, right?

    Edition Based Redefinition (EBR)

    This seemed like an interesting feature when I first heard about it last autumn, but I'll confess that I didn't quite comprehend the power of EBR until seeing Dan's demo (parts 1, 2, and 3 are on his Morgan's Library site, with part 4 still in the works). Setting aside the obvious benefits for home-grown applications, the potential benefits in an Oracle Applications environment are huge. Consider:

    1. There's already an option to create a staged Applications System to shorten patch downtime windows, allowing administrators to run the "copy" and "generate" portions of large EBS patches prior to applying the patch to production. With EBR, it could be possible to stage the "database" portion of a patch as well, and switch to a new default edition at patch time. You'd probably still want to do the database staging at a quiet time in the database, of course, but daring souls could accomplish "almost-no-downtime" patching if EBR were worked into the Oracle Applications patching framework. Wicked.
    2. EBR might even make it possible to truly have EBS patches that could be rolled back. The current patching process already backs up files that are replaced. Thoughtful application of cross-edition triggers might make it possible to revert to a previous edition without loss of data if a patch needs to be backed out. Granted, the process would have to be demonstrated to be pretty bullet-proof before I'd try it in production, but it could save restoring test and dev systems from backup in the event that a patch doesn't work out as expected.

    Just as I might be missing the point about deferred segment creation's advantages in EBS, I might be guilty of over-extended enthusiasm with respect to edition based redefinition. Or maybe I've decided to turn this into a science fiction blog. ;-) It's sure to be far more complicated to implement EBR in an Oracle Applications context than I'm implying above, and this is only speculation on my part, not anything that's actually promised by Oracle. Still, a nerd can dream...

    Thanks again to Caleb and Dan for the great presentations, and for your continued support in getting the user group launched!

  5. 32-bit to 64-bit database migration tips: OLAP upgrade

    A while ago, I had the opportunity to migrate an E-Business Suite database (Apps version 12.0.4, database version 10.2.0.4) from 32-bit Linux to 64-bit Linux. It's a straightforward process, outlined in My Oracle Support Note 471566.1: Migrating Oracle E-Business Suite R12 from Linux 32-bit to Linux 64-bit. Performing one of the critical migration steps, upgrading OLAP analytical workspaces (AWs), requires some careful reading, starting with the primary migration document for the database tier: Note 456197.1: Using Oracle E-Business Suite Release 12 with a Database Tier Only Platform on Oracle 10g Release 2. This document directs you to Note 352306.1: Upgrading OLAP from 32 to 64 bits, which covers the migration process for OLAP workspaces: export and delete from the the 32-bit system, then recreate on the 64-bit system and import the contents. The remainder of this blog post includes some embellishment of those four steps, from my migration notes. Examples were recreated on my test database; please don't look for these workspace names in an actual EBS database.

    Please note that I'm by no means an OLAP expert; if you have your own observations or experiences to share, including corrections to any errors I might have made, please leave a comment. The last thing I want to do is spread misinformation! And, as always, remember: test systems exist for a reason, and instructions from Oracle Support should trump anything you read in this blog entry :-)

    "No objects to export" error when exporting AWs

    The export process is explained thoroughly in Note 352306.1. You may encounter the following error, however, when attempting to export an empty workspace:

    BEGIN dbms_aw.execute('export all to eif file ''EXPORT_DIR/AWTEST.eif'''); END;
    *
    ERROR at line 1:
    ORA-33390: There are no objects to export.
    ORA-06512: at "SYS.DBMS_AW", line 93
    ORA-06512: at "SYS.DBMS_AW", line 122
    ORA-06512: at line 1
    

    An export file will not be created, since there's no data in the workspace. Nonetheless, you will still need to recreate the AW in the 64-bit database, which leads us to the next section...

    Before deleting AWs

    In addition to gathering the OLAP workspace's name, schema, and tablespace, make sure that you make a note of how the AW is partitioned. This will allow you to more faithfully reconstruct the AW in the 64-bit database. Again, the basics can be found in Note 352306.1, except for a discussion of workspace partitioning. According to the documentation for DBMS_AW.AW_CREATE, by default, analytic workspaces are created with 8 partitions. Querying dba_segments seemed to tell a different story:

    SYSTEM@mactest(10.2.0.4)>select segment_name
    2    , segment_type
    3    , count(*)
    4   from dba_segments
    5   where segment_name= 'AW$TESTDEFAULT'
    6   group by segment_name
    7   , segment_type;
    
    SEGMENT_NAME                   SEGMENT_TYPE           COUNT(*)
    ------------------------------ -------------------- ----------
    AW$TESTDEFAULT                 TABLE SUBPARTITION           16
    

    This initially confused me, until I found that the table created for the default workspace is actually comprised of two partitions, each comprised of 8 subpartitions. Apparently, "partition" means different things to different people:

    SYSTEM@mactest(10.2.0.4)>select table_name
    2  , partition_name
    3  , subpartition_count sub
    4  from all_tab_partitions
    5  where table_name = 'AW$TESTDEFAULT';
    
    TABLE_NAME           PARTITION_NAME        SUB
    -------------------- -------------- ----------
    AW$TESTDEFAULT       PTN1                    8
    AW$TESTDEFAULT       PTNN                    8
    

    So, before you delete the AWs in the 32-bit database, be sure to consult the data dictionary. In most cases, you'll probably see segment count of 16 in dba_segments (implying a default partitioning scheme). But there are exceptions...

    There's always one goofball

    One of those exceptions came when my query to get a count of AW segments returned a 1. Naturally, I was expecting an even number, so this came as a surprise. At first, I thought this might be a special case when specifying partnum=>1 when creating the workspace:

    SYSTEM@mactest(10.2.0.4)>exec dbms_aw.aw_create('JPTEST.TEST1PART','USERS',1);
    PL/SQL procedure successfully completed.
    
    SYSTEM@mactest(10.2.0.4)>select segment_name
    2  , segment_type
    3  ,count(*)
    4  from dba_segments
    5  where segment_name = 'AW$TEST1PART'
    6  group by segment_name
    7  , segment_type
    8  /
    
    SEGMENT_NAME                   SEGMENT_TYPE           COUNT(*)
    ------------------------------ -------------------- ----------
    AW$TEST1PART                   TABLE SUBPARTITION            2
    

    Then it occurred to me that zero is also a number... ;-)

    SYSTEM@mactest(10.2.0.4)>exec dbms_aw.aw_create('JPTEST.TEST0PART','USERS',0);
    PL/SQL procedure successfully completed.
    
    SYSTEM@mactest(10.2.0.4)>select segment_name
    2  , segment_type
    3  ,count(*)
    4  from dba_segments
    5  where segment_name = 'AW$TEST0PART'
    6  group by segment_type
    7  , segment_name
    8  /
    
    SEGMENT_NAME                   SEGMENT_TYPE           COUNT(*)
    ------------------------------ -------------------- ----------
    AW$TEST0PART                   TABLE                         1
    

    Please recall: Note 352306.1 recommends using the Analytic Workspace Manager (AWM) tool to recreate the AWs in the 64-bit database. If you want to create the AWs manually, I suggest engaging with Oracle Support to get their approval. The preceding examples are provided only for illustration of what's going on when the AW is created.

    Importing AWs

    I don't really have much to add here, other than:

    1. If you had any empty AWs in the 32-bit system, you won't have anything to import for those workspaces, though hopefully you recreated them in the 64-bit system
    2. Isn't this process the sort of thing that cries out to be scripted? Would I really resort to such cheap, obvious devices to foreshadow my next post? I would indeed!
  6. Keeping OPatch up-to-date: an object lesson

    I just did something silly while applying the January 2010 PSU to an 11gR1 ORACLE_HOME, and thought I'd share, in case someone else is Googling for the error message.

    [oracle@dbserv patches]$ cd 9209238
    [oracle@dbserv 9209238]$ opatch apply
    
    Invoking OPatch 11.1.0.6.2
    Oracle Interim Patch Installer version 11.1.0.6.2
    Copyright (c) 2007, Oracle Corporation. All rights reserved.
    
    (some OPatch output snipped)
    
    ApplySession failed: Patch ID is null.
    System intact, OPatch will not attempt to restore the system
    
    OPatch failed with error code 73
    

    A closer look at the installation prereqs reveals the following statement: "You must use the OPatch utility version 11.1.0.6.7 or later," which I confess that I missed the first time around. D'oh. After installing the latest version of OPatch, the "opatch apply" command worked as expected.

    Lessons reinforced:

    1. Even if you think you're up-to-date on prerequisites, triple-checking is a good idea
    2. Even humble tools like the OPatch utility can change pretty frequently
    3. Test systems are useful for testing your patching process, not just patches themselves

    (Everyone join in, now: "Thanks, Mr. Obvious, you're a life saver!") :-P

  7. Oracle 10gR2 RDBMS for Mac OS X (Intel) has arrived!

    This post was written before Snow Leopard (OS X 10.6) shipped. If you're interested in an installation guide for Oracle 10gR2 on Snow Leopard, Raimonds Simanovskis has provided instructions on his blog.

    In the wake of a small flurry of announcements yesterday about the release of Oracle 10gR2 for Mac OS X Intel (almost two weeks after April Fools' Day, so knock it off with the jokes already ;-) ), I decided to try a test installation on my Macbook Pro. The good news is that it works, even if you aren't running OS X Server 10.5.4 as specified in the release notes:

    OraDBCASuccess.jpg

    For the record, this is Mac OS X Leopard, desktop version 10.5.6. I'm still considering whether or not to post an "install guide," since everything more or less worked as advertised. And really, does the Internet need another dozen screenshots of OUI and DBCA doing their usual thing? I feel guilty enough for the screenshot above.

    Install guide updates, 12-13 April 2009:

    So now what?

    Apart from the novelty of it all, is this a worthwhile exercise? I'm not sure yet. I've been using the "Oracle in a VM" approach for a really long time, and I really like the flexibility of being able to play around with system configurations, etc. without worrying about doing any damage to my host system. With snapshots enabled, I don't even have to worry very much about permanently damaging the VM itself. Also, as stated in the release notes  (and called out in this thread on the OTN Apple forum, where hope springs eternal and no one is ever bitter or cranky), the list of supported features for the OS X version is a bit shorter than one might expect. On the other hand, "unsupported" doesn't always mean "won't work," and the memory footprint of a native database is a bit less than that of a full OS + database running in a virtualized environment. My expectation is that for quick testing and tinkering, the native database install could fit a niche, but VM still rules for more advanced stuff like RAC, ASM, 11g, etc.

    What's encouraging is that it appears that Oracle hasn't abandoned OS X as a platform. Feature set aside, I suspect they'll need to do more frequent releases before many people would be very comfortable using this in an enterprise setting. But this also opens the possibility of different licensing options...Express Edition for OS X, anyone? OSXXE? ;-)

    If someone manages to port this to iPhone, though, let me know. That would rock.

  8. Fixing file permission problems on Grid Control targets

    When installing Grid Control agents on a Linux or Unix platform, a recommended practice is to install as a user that doesn't own the ORACLE_HOMEs to be monitored. This poses a challenge when configuring monitoring for some targets, particularly those based on Oracle Application Server 10g. For security purposes, file and directory permissions on some ORACLE_HOMEs are fairly restrictive, and the Grid Control agent can't read all of the files necessary for target discovery and metric collection. The Grid Control release notes mention this complication, and Metalink Note 437078.1 provides additional suggestions for resolving discovery and metric collection errors related to file permission restrictions on a target ORACLE_HOME. I can't reveal the content of the Metalink My Oracle Support Note, and the information in the publicly-available Release Notes provides inadequate coverage, but the overall problem-solving method reduces to:

    1. Make sure that the owner of the agent software is in the same group as the owner of the monitored target software
    2. Check the agent log files for file permission errors, and fix them.

    My experience so far indicates that fixing some errors tends to reveal others. Who doesn't love a rousing game of log file whack-a-mole? In the interest of saving others a few rounds of this, I'm offering up a few additional permission change steps that could make your lives easier. For ease of navigation, I've broken things into the following sections:

    Disclaimer: When considering making the changes I describe below, please consider the trade-off you are making: more robust monitoring at the expense of some level of security. The perceived degree of sacrifice will vary from person to person and organization to organization. I would argue that the benefits outweigh the risks, and since Oracle itself suggests similar changes, this is clearly not a cut-and-dried issue. Hey, if doing the right thing were always easy, this job wouldn't be any fun, right? ;-)

    Please feel free to leave any similar tips (or links to your own posts about this topic) in the comments.

    Oracle Application Server

    Note 437078.1 is a good start, but there are a few additional items I found to ease metric collection and discovery errors for Oracle 10g App Server. ORACLE_HOME refers to the location of the 10gAS software. Depending on the application server components installed on your system, you may need to change more or fewer permissions.

    chmod g+rx $ORACLE_HOME/webcache
    chmod g+r $ORACLE_HOME/webcache/*.xml
    chmod g+rx $ORACLE_HOME/portal
    chmod g+rx $ORACLE_HOME/portal/conf
    chmod g+rx $ORACLE_HOME/discoverer
    chmod g+rx $ORACLE_HOME/discoverer/config
    chmod g+rx $ORACLE_HOME/forms
    chmod g+rx $ORACLE_HOME/forms/server
    chmod g+rx $ORACLE_HOME/ldap
    chmod g+rx $ORACLE_HOME/ldap/das
    chmod g+rx $ORACLE_HOME/uix
    chmod g+rx $ORACLE_HOME/ultrasearch
    chmod g+rx $ORACLE_HOME/ultrasearch/webapp
    chmod g+rx $ORACLE_HOME/ultrasearch/webapp/config
    chmod g+rx $ORACLE_HOME/jpi
    chmod g+rx $ORACLE_HOME/jpi/doc
    chmod g+rx $ORACLE_HOME/Apache/oradav
    chmod g+rx $ORACLE_HOME/Apache/oradav/conf
    chmod g+rx $ORACLE_HOME/Apache/jsp
    chmod g+rx $ORACLE_HOME/Apache/jsp/conf
    chmod g+rx $ORACLE_HOME/Apache/modplsql
    chmod g+rx $ORACLE_HOME/Apache/modplsql/conf
    chmod g+rx $ORACLE_HOME/sso
    chmod g+rx $ORACLE_HOME/sso/conf

    Database targets

    Some environments (for example, the 10gAS infrastructure database) will require permission changes to effectively monitor database and listener targets. You may find the scope of these changes to be a bit broad; at some point I got tired of chasing down individual files and just hit them all with the same hammer. :-) In this case, ORACLE_HOME refers to the ORACLE_HOME of the RDBMS software.

    find $ORACLE_HOME/admin -type d -exec chmod g+rx {} \;
    find $ORACLE_HOME/admin -type f -exec chmod g+r {} \;
    
    find $ORACLE_HOME/network -type d -exec chmod g+rx {} \;
    find $ORACLE_HOME/network -type f -exec chmod g+r {} \;
    
    find $ORACLE_HOME/rdbms -type d -exec chmod g+rx {} \;
    find $ORACLE_HOME/rdbms -type f -exec chmod g+r {} \;
    
    ## The following are needed for database health checks to work properly...
    chmod g+rx $ORACLE_HOME/dbs
    chmod g+rw $ORACLE_HOME/dbs/hc*
    

    Another general configuration note for the database listener, which I stumbled upon when setting up monitoring for a Secure Enterprise Search repository database: If sqlnet.ora is configured with TCP.VALIDNODE_CHECKING=yes, then the server that hosts the Grid Control OMS needs to be added to the list of hosts defined by TCP.INVITED_NODES. Otherwise, you'll get messages like this when attempting to add the database target, even if all monitoring credentials are correct:

    Failed to connect to the database: Io exception: Got minus one from a read call
    The Connect Descriptor was (description=(address=(host=ses.myorg.com)(protocol=tcp)(port=xxxx))
    (connect_data=(service_name=SESdb)(instance_name=SESdb)(UR=A)))</pre>
    And corresponding messages in the listener log:
    <pre>TNS-12546: TNS:permission denied
     TNS-12560: TNS:protocol adapter error
      TNS-00516: Permission denied
    

    Of course, after adding the OMS host to TCP.INVITED_NODES in sqlnet.ora, you need to tell the listener that you've made a change before the OMS will be able to connect to the database and add it as a target:

    oracle@ses:~> lsnrctl  reload
    
    LSNRCTL for Linux: Version 10.1.0.5.0 - Production on 05-DEC-2008 10:58:45
    
    Copyright (c) 1991, 2004, Oracle.  All rights reserved.
    
    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ses.myorg.com)(PORT=xxxx)))
    The command completed successfully

    Oracle Collaboration Suite Calendar Server

    These commands will resolve some issues with metric collection for OCS Calendar Server. I suspect that the intersection of Collaboration Suite administrators (a small set) and readers of this blog (a vanishingly small set) is probably zero, but Google works in mysterious ways. ORACLE_HOME refers to the OCS software location, not the Grid Control agent home.

    chmod g+rx $ORACLE_HOME/ocal/bin
    chmod g+rx $ORACLE_HOME/ocal/bin/uniwho
    chmod g+rx $ORACLE_HOME/ocal/bin/unireqdump
    chmod g+rx $ORACLE_HOME/ocal/bin/uniping
    chmod g+rx $ORACLE_HOME/ocal/bin/unistatus
    chmod g+rx $ORACLE_HOME/ocal/sbin
    chmod g+rx $ORACLE_HOME/ocal/sbin/who
    chmod g+rx $ORACLE_HOME/ocal/sbin/reqdump
    chmod g+rx $ORACLE_HOME/ocal/sbin/ping
    chmod g+rx $ORACLE_HOME/ocal/sbin/status
    chmod g+rx $ORACLE_HOME/ocal/oem
    chmod g+rx $ORACLE_HOME/ocal/oem/scripts
    chmod g+rx $ORACLE_HOME/ocal/oem/scripts/ocal_ps.pl
    chmod g+rx $ORACLE_HOME/ocal/oem/scripts/ocal_dbsize.pl
    chmod g+rx $ORACLE_HOME/ocas
    chmod g+rx $ORACLE_HOME/ocas/linkdb
    chmod g+rx $ORACLE_HOME/ocas/sessiondb
    chmod g+rx $ORACLE_HOME/lib
    chmod g+rx $ORACLE_HOME/ocal/lib
    chmod g+r $ORACLE_HOME/ocal/lib/*.so
    chmod g+rx $ORACLE_HOME/ocal/db
    chmod g+rx $ORACLE_HOME/ocal/db/nodes
    find $ORACLE_HOME/ocal/db/nodes -type d -exec chmod g+rx {} \;
    find $ORACLE_HOME/ocal/db/nodes -type f -exec chmod g+r {} \;
    

    E-Business Suite

    Setting these permissions in an E-Business Suite R12 instance resolved some metric collection errors of the form

    Couldn't open INST_TOP/ora/10.1.3/j2ee/oafm/application-deployments/oafm/orion-application.xml: Permission denied at AGENT_HOME/sysman/admin/scripts/ias/simpleXPath.pm line 116

    These errors may not manifest in 11i, since that version's tech stack is not based on 10gAS. INST_TOP is the "Instance TOP" environment variable for the R12 Apps installation, and serves as a reminder to connect as the Apps owner (not database software owner) to run these commands.

    chmod g+r $INST_TOP/ora/10.1.3/j2ee/forms/config/jms.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/forms/application-deployments/forms/orion-application.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/forms/application-deployments/forms/formsweb/orion-web.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oacore/config/jms.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oacore/application-deployments/oacore/orion-application.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oacore/application-deployments/oacore/html/orion-web.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oafm/config/jms.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oafm/application-deployments/oafm/orion-application.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oafm/application-deployments/oafm/webservices/orion-web.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oafm/application-deployments/mapviewer/orion-application.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oafm/application-deployments/mapviewer/web/orion-web.xml
    chmod g+r $INST_TOP/ora/10.1.3/j2ee/oafm/application-deployments/ascontrol/orion-application.xml
    
  9. Two OCP exam mini-reviews: 1Z0-040 (10g DBA New Features) and 1Z0-238 (R12 Install, Patch and Maintain)

    Here are my impressions of two Oracle certification exams I've had the opportunity to take in the past few months: 1Z0-040 (Oracle Database 10g: New Features for Administrators), and the beta version of 1Z0-238 (Oracle EBS R12: Install, Patch and Maintain Applications).

    1Z0-040: Less painful than upgrading to Vista (cheaper, too)

    For the most part, the upgrade path to 10g DBA OCP seems easier than starting from scratch. Compared to the two (soon to be three) exams and shelling out for classroom instruction required for the regular certification path, a single exam with no classroom instruction requirement seems like a piece of cake. Of course, to take the upgrade exam, you have to have a prior OCP DBA certification, which means you've already been down that "class plus several exams" road.

    What evens out the effort even further is this: It's an upgrade exam. You aren't going to be tested on just "standard DBA material" as you would be in the regular exam track. You have to care about the new stuff, and you have to care a lot. Don't use VPD or audit features? Still using normal filesystems and regular files to store your data, instead of ASM? Doesn't matter! For this exam, you need to concentrate on new features. You also need to care about the differences between versions of features you might not have been using in 9i or 10g.

    There's nothing wrong with that. It's only fair that reinforcing and testing knowledge of new features is the primary purpose of an upgrade exam. If you're already a 9i OCP DBA, the upgrade exam is an opportunity to demonstrate your skills as a 10g DBA, not a 9i DBA who uses 10g as if it were 9i. Otherwise, you're not using all the tools at your disposal, or, even worse, you could be using them wrong.

    Although the exam only covers new features, it's not a trivia quiz. You're much more likely to see questions about ASM, new RMAN features, and install/upgrade procedures than questions about obscure initialization parameters and system catalog views. There is a little bit of everything on the exam, though, so it's still useful to be prepared. Besides, there's a benefit to preparing for an exam that tests you exclusively on new features: you're more likely to learn new things. I even learned a thing or two about RMAN during my exam prep, a tool I thought I already knew pretty well. Speaking of prep material, here's what I used to get ready for the exam:

    1Z1-238: Oh, hey, just one more question...

    In a few days, the beta period for this exam expires, and 1Z1-238 becomes 1Z0-238. Some of the following commentary may therefore have a very short "relevance half-life." Update: As of 21 January 2009, the production version of this exam is now live. The Oracle Certification blog has more details.

    In some television game shows, there's the concept of a "lightning round": answer as many questions/perform as many tasks as rapidly as you can in a short period of time, with lots of dramatic buildup. Not much time to think, and the more correct completions you get, the greater the reward.

    Imagine a "lightning round" that goes on for three hours, and you have a rough idea of what it's like to take an Oracle Certification Beta Exam. Non-beta exams have a smaller time allotment, but also substantially fewer questions, such that one usually has an average of 1.3-1.5 minutes to spend on each question. Beta exams allow an average of less than a minute per question, and even though many of the questions can be answered very quickly, even a few "slow questions" can make the experience feel a bit rushed. Since the point of a beta exam is to "test the test," it's not surprising that there were a handful of questions that would rate as fiendishly difficult. There were also a few that were not difficult, just genuinely incomprehensible. In those cases, I used the comment feature to point out where a question could benefit from some editing.

    In contrast to the 10g Database New Features exam, the R12 Install, Patch, and Maintain exam is not an upgrade exam. If you've taken the 11i version of this exam, a lot of the material will be familiar. The twist, of course, is that there *are* differences between R12 and 11i. Naturally, some of the questions covered material that was new in R12, and others of the questions were clearly crafted to highlight the differences between the versions. There were a few times that I had to change my answers when I realized that I'd answered a question in a way that was correct for 11i, but not for R12. As the title of the exam suggests, you can expect lots of questions about patching procedures, Apps directory structures, and using the various AD utilities.

    One last odd thing about taking a beta exam is the lack of instant feedback on the exam. Though I feel pretty confident about my performance, I have no idea if I passed this exam. I certainly have less experience with R12 than with 11i, and with 200+ questions I quickly lost track of any "gut feel" for how many questions I was getting wrong. Regardless, with a 10-week wait after the close of the beta period, it's looking like I won't know the verdict until sometime next year. Ah well, good experience either way.

    That's a wrap

    If you have questions about these exams, please ask in the comments. Also, here are some excellent resources for those interested in a broader range of Oracle certification topics:

  10. Unravelling TNS-03505 with database lookups in Oracle Internet Directory

    I was playing around with using Oracle Internet Directory to resolve database service names recently, and it turned into an adventure. Not exactly a memoir-inspiring adventure, but at least one worth blogging about. Maybe my little voyage of discovery here will help someone else who gets a TNS-03505 when using LDAP to look up database service names.

    The Setup

    My "lab" in this case consisted of two VM servers, running Oracle Enterprise Linux 5. Both Oracle installations are "plain vanilla," no additional patching other than what was required to get the non-OID database server from 10.2.0.1 to 10.2.0.4.

    OID server "normal" database server
    Hostname garibaldi vir
    Software & version 10gAS 10.1.2.0.2 (OIM 10.1.4) RDBMS 10.2.0.4
    SID OIM JPTEST

    The tests

    First, I registered the JPTEST database with OID in the custom Oracle Context dc=jep,dc=com. Based on that statement alone, you may already know where this is going. :-) In order to test the registration, I confirmed that that the client software on the database server was set up to consult OID first for name resolution. Then, I confirmed that I could ping the JPTEST database from its own host:

    oracle@vir ~]$ grep NAMES $TNS_ADMIN/sqlnet.ora
    NAMES.DIRECTORY_PATH= (LDAP,TNSNAMES,EZCONNECT)
    [oracle@vir ~]$ tnsping JPTEST
    
    TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 05-OCT-2008 15:12:11
    
    Copyright (c) 1997,  2007, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/rdbms/10gR2/network/admin/sqlnet.ora
    
    Used LDAP adapter to resolve the alias
    Attempting to contact (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=vir.local)(PORT=1563))(CONNECT_DATA=(SERVICE_NAME=JPTEST.dbdomain)))
    OK (0 msec)

    So far, so good. Next, I checked to see if I could resolve other database service names. I only had 1 other database server running, the OID repository, so:

    [oracle@vir ~]$ tnsping OIM
    
    TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 05-OCT-2008 15:14:02
    
    Copyright (c) 1997,  2007, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/rdbms/10gR2/network/admin/sqlnet.ora
    
    TNS-03505: Failed to resolve name

    ...Huh.

    Hopping over to the OID server, I saw the following:

    [oracle@garibaldi ~]$ tnsping OIM
    
    TNS Ping Utility for Linux: Version 10.1.0.5.0 - Production on 05-OCT-2008 15:28:22
    
    Copyright (c) 1997, 2003, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/OIM/network/admin/sqlnet.ora
    
    Used LDAP adapter to resolve the alias
    Attempting to contact (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=garibaldi.local)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=OIM.local)))
    OK (10 msec)

    Okay, that looks good, but:

    [oracle@garibaldi ~]$ tnsping JPTEST
    
    TNS Ping Utility for Linux: Version 10.1.0.5.0 - Production on 05-OCT-2008 15:28:29
    
    Copyright (c) 1997, 2003, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/OIM/network/admin/sqlnet.ora
    
    TNS-03505: Failed to resolve name

    Okay, so these servers can see themselves, but not each other. Neato!

    The explanation

    The error message implies a lookup problem, not a connectivity problem. Clearly, both clients can connect to the OID server...they're just not finding their targets. That suggests the targets are not defined similarly in the LDAP directory. What does ldapsearch say?

    [oracle@vir ~]$ ldapsearch -h garibaldi.local -p 389 -s sub  orclservicetype=DB | egrep ^cn
    cn=OIM,cn=OracleContext
    cn=OIM
    cn=JPTEST,cn=OracleContext,dc=jep,dc=com
    cn=JPTEST
    

    Aha. The entries are in different Oracle contexts. Had I not decided to create a custom Oracle context, and just registered my JPTEST database in the default context, I wouldn't have this situation (or this blog post). Just for fun, let's confirm that the clients on the two servers are configured to look in different places.

    On the database server:

    [oracle@vir ~]$ grep CONTEXT $TNS_ADMIN/ldap.ora
    DEFAULT_ADMIN_CONTEXT = "dc=jep,dc=com"

    And the OID server:

    [oracle@garibaldi ~]$ grep CONTEXT $TNS_ADMIN/ldap.ora
    grep: /u01/app/oracle/product/OIM/network/admin/ldap.ora: No such file or directory

    Oops. That's not where ldap.ora lives in the default OID install...

    [oracle@garibaldi ~]$ grep CONTEXT $ORACLE_HOME/ldap/admin/ldap.ora
    DEFAULT_ADMIN_CONTEXT = ""
    

    There we go. The client on the database server is searching in my custom context, and lookups on the OID server start at the root context.

    Solutions and closing rambling

    Here's my quick and dirty solution to my problem. Note that I'm working in a test system, and this may be too dirty for use in a production configuration. :-)

    Since I plan to register the rest of my databases in the custom context, rather than dumping them in the root context, an easy way to make it possible to connect to the OIM database via LDAP naming from non-local clients is to create an alias to the OIM entry in my custom context:

    [oracle@garibaldi ~]$ ldapadd -p 389 -h garibaldi.local -f OIM_alias.ldif -q -v -D cn=orcladmin
    Please enter bind password:
    add objectclass:
    	alias
    add aliasedObjectName:
    	cn=OIM, cn=OracleContext
    adding new entry cn=OIM, cn=OracleContext, dc=jep, dc=com
    modify complete

    Now, from my database server:

    [oracle@vir ~]$ tnsping OIM
    
    TNS Ping Utility for Linux: Version 10.2.0.4.0 - Production on 06-OCT-2008 10:36:57
    
    Copyright (c) 1997,  2007, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/rdbms/10gR2/network/admin/sqlnet.ora
    
    Used LDAP adapter to resolve the alias
    Attempting to contact (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=garibaldi.local)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=OIM.local)))
    OK (0 msec)
    

    Yay, half a problem solved! Still, I'm more or less stuck when performing searches from the OID server. One option is to use the full DN to access the server:

    [oracle@garibaldi ~]$ tnsping cn=JPTEST,cn=OracleContext,dc=jep,dc=com
    
    TNS Ping Utility for Linux: Version 10.1.0.5.0 - Production on 07-OCT-2008 14:33:52
    
    Copyright (c) 1997, 2003, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/OIM/network/admin/sqlnet.ora
    
    Used LDAP adapter to resolve the alias
    Attempting to contact (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=vir.local)(PORT=1563))(CONNECT_DATA=(SERVICE_NAME=JPTEST.dbdomain)))
    OK (20 msec)

    That works, but if I wanted to type that much, I'd just use EZConnect. I could also create a second ldap.ora in $ORACLE_HOME/network/admin on the OID server, with a non-root value (dc=jep,dc=com) for DEFAULT_ADMIN_CONTEXT, and set the LDAP_ADMIN environment variable to point to the alternate file as needed. That would be okay (barely) for a one-user test setup like this, but in the real world it's a recipe for unacceptable levels of confusion.

    It seemed reasonable to expect (to me, anyway...maybe I just expect too much) that specifying the root context as the default for LDAP searches would make it possible to find a database further down the tree, but apparently not. After some looking around on Metalink My Oracle Support, I found that I may be hitting a bug that's fixed in the 11.1.0.7 RDBMS patchset. But either I'm reading the note wrong, or I don't have the problem as described. Here's lookup attempt from a different server with a patched 11gR1 client...still no joy:

    [oracle@lyta ~]$ grep CONTEXT $TNS_ADMIN/ldap.ora
    DEFAULT_ADMIN_CONTEXT = ""
    [oracle@lyta ~]$ tnsping JPTEST
    
    TNS Ping Utility for Linux: Version 11.1.0.7.0 - Production on 07-OCT-2008 14:55:18
    
    Copyright (c) 1997, 2008, Oracle.  All rights reserved.
    
    Used parameter files:
    /u01/app/oracle/product/client/11gR1/network/admin/sqlnet.ora
    
    TNS-03505: Failed to resolve name
    

    I'd welcome any comments with alternate solutions, explanations, or a gentle "Hey, idiot, you're doing it wrong!" for what's going on here. I can live with the setup I have now in my test environment, but I'm always ready to learn something new.