Friday, April 17, 2009

Re-install Grid Control Agent (10.2.0.4)

Here are steps to re-install Grid Control agent on db_server:

0. Have a blackout on the host level to avoid all kinds of GC alerts.
1. Remove Host db_server from the GC site at http://gc_server:7777/em/console/logon/logon , use "Super Administrator" account.
Note: remove the listerner, databases first, then remove the Agent and Host as the last step after the Agent is stopped.

2. On db_server, login as oracle. Stop the Agent and make sure "ps -ef | grep emagent" does not return anything. Then run .../runInstaller to just remove agent10g Home from the server.

3. Remove or rename folder $ORACLE_BASE/product/agent10g. Create a new folder, say, $ORACLE_BASE/product/Agent204download to hold the script.

4. Go to the new folder, and run wget http://gc_server:4889/agent_download/10.2.0.4.0/aix/agentDownload.aix

5. The easiest way is to make a shell script to launch the installation. Here is the content of my file agentinstall.sh

#!/bin/sh
export ORACLE_BASE=/u01/app/oracle

export PATH=$PWD:$PATH
export ORACLE_HOME=$ORACLE_BASE/product/agent10g
export AGENT_HOME=$ORACLE_BASE/product/agent10

./agentDownload.aix -b $ORACLE_BASE/product > agentdownload.log

6. Run the script to fire the installation process
./agentinstall.sh

If the screen is not moving (hanging), hit “Enter” key once during the entire installation!! You may open a different Putty (OS) session to view the log for monitoring the process.

Troubleshootings:

1. Make sure the db_server name is defined in /etc/hosts file.
2. "$ nslookup db_server" to check the IP address and hostname.
3. "$ netstat -an | grep 3872" to verify port is not used (Note 443524.1)
4. On the server where the entire $ORACLE_BASE (Oracle binary) was copied from a different server by SAN COPY on disk level, the Agent installation keeps getting the wrong hostname. I have to add ORACLE_HOSTNAME=db_server to the end of the "InstallCmd=" line in the agentDownload.aix script to force the right hostname (after reading Note 370300.1).

Friday, March 20, 2009

Errors from Using DATAPUMP

Datapump moves data on block level and is faster than Export/Import. It took about 4 hours to export 360GB dump data, and took 13 and half hours to import it on AIX servers with 4 physical CUPs.

Here are some errors I ran into, and the fixes:

- The OS folder can not be recognized by datapump:

ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/images/DB_full.dmp"
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

Fix (Note 370513.1): Mount the disk by using the recommended mounting options which are: (hard,bg,proto=tcp,suid,rsize=32768,wsize=32768,noac)

- Datapump can not locate the directory:

ORA-39002: invalid operation
ORA-39070: Unable to open the log file.
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 475
ORA-29283: invalid file operation

Fix: Make sure to include "/" at the beginning (or the end) of the path, such as '/home/oracle/dumdir' when creating the DIRECTORY in the database.

- On a RAC environment, got errors when running datapump on one of the nodes:

ORA-31626: job does not exist
ORA-31650: timeout waiting for master process response

Wed Mar 18 19:58:06 2009
kwqbcsrmsg: Timeout contacting inst 2 for buffered oper 1,
IPC ret status=36
queue = SYS KUPC$S_1_20090318195629
kupprdp: master process DM00 started with pid=38, OS id=799160
to execute - SYS.KUPM$MCP.MAIN('SYS_EXPORT_FULL_01', 'SYSTEM', 'KUPC$C_1_20090318195629', 'KUPC$S_1_20090318195629', 0);
Wed Mar 18 19:59:00 2009

Fix (Note 454639.1): "alter system set aq_tm_processes=1"

SQL> select value from v$parameter
where name = 'aq_tm_processes';
VALUE
------
0
SQL> alter system set aq_tm_processes=1 scope=both;
SQL> select value from v$parameter
where name = 'aq_tm_processes';
VALUE
------
1

- Got below errors from datapump, and ORA-600 in alert log:

UDE-00008: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 2745
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3712
ORA-06512: at line 1

ORA-00600: internal error code, arguments: [kwqbgqc: bad state], [1], [1], [], [], [], [], []

Fix (Note 754401.1): Re-create the SYS.KUPC$DATAPUMP_QUETAB queue.

SQL> startup restrict
SQL> exec dbms_aqadm.drop_queue_table(queue_table=>'SYS.KUPC$DATAPUMP_QUETAB',force=>TRUE);

PL/SQL procedure successfully completed.

SQL> BEGIN
2 dbms_aqadm.create_queue_table(queue_table => 'SYS.KUPC$DATAPUMP_QUETAB',
3 multiple_consumers => TRUE,
4 queue_payload_type =>'SYS.KUPC$_MESSAGE',
5 comment => 'DataPump Queue Table',
6 compatible=>'8.1.3');
7 EXCEPTION
8 WHEN OTHERS THEN
9 IF SQLCODE = -24001 THEN NULL;
10 ELSE RAISE;
11 END IF;
12 END;
13 /

PL/SQL procedure successfully completed.

Thursday, March 19, 2009

Paralle Degree in Creating MVW

When using below statement to create a materialized view, I find that it not only creates the materialized view with a DEFAULT degree, but also uses parallelism of DEFAULT degree to create the materialized view.

CREATE MATERIALIZED VIEW DSS_DIM_MVW
TABLESPACE PDW_DATA
NOCACHE
NOLOGGING
NOCOMPRESS
PARALLEL ( DEGREE DEFAULT INSTANCES 1 )
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
WITH PRIMARY KEY
AS
SELECT "......query from multiple tables of DOP 1"
;

On the server with 4 LCPU, the database uses 2 slave sets of each 8 (4 * 2, default) parallel processes to run the statement. So totally, 16 slaves are recuited to run the creation. The two slaves in the first slave set could be scanning a table and passing the results to the slaves in slave set 2 for sorting.

After the query finishes, the materialized view gets DEFAULT as its DOP.

Tuesday, March 17, 2009

GC Agent Installation Folder

After the Grid Control server 10.2.0.4.0 is installed, it creates two folders under $OMS_HOME/sysman/agent_download:
10.2.0.1.0
10.2.0.4.0

The Agent installation file AIX_xxxx_10_2_0_4_0.zip file should be downloaded from Oracle site and saved to folder 10.2.0.4.0. After the file is upzipped, it creates folder aix with all installation files there.

The needed modifications are

1. In /10.2.0.4.0/agent_download.rsp:

s_OMSHost="gc_server"
s_OMSPort="4889"
# s_encrSecurePwd=""

2. In /10.2.0.4.0/aix/agentDownload.aix

s_OMSHost="gc_server"
s_OMSPort="4889"

Then, run below line on the database server to start the Agent installation:

wget http://gc_server:4889/agent_download/10.2.0.4.0/aix/agentDownload.aix

Sunday, March 15, 2009

Grid Control 10.2.0.4 Installation

Make sure environment variables $ORACLE_BASE and $ORACLE_HOME do not exist in the OS account before any installation. Steps on installing Grid Control 10.2.0.4 on AIX (OS level 5300-08-03-0831):

1. Install Oracle 11g (11.1.0.6) database on the server for the repository.

- Do not include anything related to Emterprise Manager in the database. (No SYSMAN account exists).
- Temporarily set the SYS password to be "default".

2. Make sure the initialization parameters meet the requirements, such as 8k block size.

3. Run @?/rdbms/admin/dbmspool.sql as SYSDBA to create a package in the database.

4. Edit the em_using_existing_db.rsp file.

- Note 763347.1 provides the best sample on necessary parameters.
- Note 602750.1 describes the worst "Invalid Username/Password" bug. The smart way is to have parameter s_reposDBAPwd="default" in the file. I believe this parameter is only used by ConfigureGC.pl

5. Install Grid Control 10.2.0.1 on the server with "software-only" option.

./runInstaller -noconfig -silent -responseFile /path_to_res_filr/response/em_using_existing_db.rsp -force

6. Stop all the OPMN processes, plus "ps -ef | grep opmn".

$OMS_HOME/opmn/bin/opmnctl stopall

7. Apply the 10.2.0.4 patch set (patch 3731593) to OMS. First, edit the patchset.res with

FROM_LOCATION="/file_path/Disk1/stage/products.xml"
ORACLE_HOME="/oracle_base/product/oms10g"
b_softwareonly=true
s_sysPassword="default"

8. Run command to apply the patch set to the OMS home.

./runInstaller -noconfig -silent -responseFile /file_path/Disk1/response/patchset.rsp -force

9. Apply the 10.2.0.4 patch set (patch 3731593) to Agent. First, backup and edit the patchset.res with

FROM_LOCATION="/file_path/Disk1/stage/products.xml"
ORACLE_HOME="/oracle_base/product/agent10g"
b_softwareonly=true
s_sysPassword="default"

10. Run command to apply the patch to the Agent home.

./runInstaller -noconfig -silent -responseFile /file_path/Disk1/response/patchset.rsp -force

11. Make PERL5LIB to the environment variable.

export PERL5LIB=/oracle_base/product/oms10g/perl/lib/5.6.1

12. Run the perl script to configure Grid Control.

$ cd /oracle_base/product/oms10g/perl/bin
$ ./perl /oracle_base/product/oms10g/sysman/install/ConfigureGC.pl /oracle_base/oracle/product

13. After the Agent is started and the some targets on the Host do not show up on the page http://gc_server:7777/em/console/home, run agentca in the Agent home (/oracle_base/product/agent10g/bin) and then remove/add them on the page if necessary.

./agentca -d

14. Run asked root.sh .

Troubleshooting:

Note 434858.1: Remove a failed GC installation.
Note 435329.1: Agent does not start: Malformed URL: http://:%EM_SERVLET_PORT%/emd/main/
Note 730308.1: How To Troubleshoot Oms Startup Issue. (Not really helps).

UPDATES:

To turn off  Performance Tuning Pack in 11G database to make it unavailable in Grid Control:


SQL> alter system SET CONTROL_MANAGEMENT_PACK_ACCESS='NONE' SCOPE=BOTH sid='*';


To turn it on, run below statement and then wait about one or two minutes in Grid Control:


SQL> alter system SET ONTROL_MANAGEMENT_PACK_ACCESS='DIAGNOSTIC+TUNING' SCOPE=BOTH;


View DBA_FEATURE_USAGE_STATISTICS gives more information on the status, such as


SQL> select * from DBA_FEATURE_USAGE_STATISTICS where lower(name) like '%sql%';