2007.01.23 Kernel panic or system hang: how do you find out what happened?

Source: http://tunelinux.pe.kr/gboard/bbs/board.php?bo_table=tip&wr_id=75&page=

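RHN has a knowledge base.
Only users registered with RHN can access it.

This article looks worth trying out
when troubleshooting.

There is also a program called sysreport for collecting system information. You can install it through RHN or similar, and simply running that command gathers the key information about the system.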

Solution Found
Issue:
My system had a kernel panic, an oops message, or is freezing for no apparent reason. How can I find out what is causing this?
Resolution:         Last update: 08-17-04
Resolving a kernel panic or a kernel oops is not a simple task. To understand the cause, Red Hat needs to see the panic or oops message in its entirety. Below you will find our "Profiling" document, which contains the information that Red Hat requires to best troubleshoot a kernel panic or kernel oops related to a system crash.

We recommend that you run the latest kernel available for your release version and keep your system completely updated.

To further debug this problem we will need the following information:
The output from the following commands:

    * sysreport
    * lspci -vv
    * lsmod
    * cat /proc/meminfo
    * cat /proc/cpuinfo
    * uname -a
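
The commands above (other than sysreport, which writes its own archive) could be captured in one pass with a small wrapper. This is a minimal sketch, not part of the Red Hat document: the function names run_to_file and collect_diag, the output file names, and the output directory are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: run_to_file, collect_diag and the file names are hypothetical.

# run_to_file OUTDIR NAME COMMAND [ARGS...]
# Save the command's stdout and stderr to OUTDIR/NAME.txt.
# If the command is missing or fails, record that in the same file.
run_to_file() {
    dir=$1; name=$2; shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$dir/$name.txt" 2>&1 ||
            echo "(exited with status $?)" >>"$dir/$name.txt"
    else
        echo "$1: command not found" >"$dir/$name.txt"
    fi
}

# collect_diag OUTDIR
# Run each command requested in the list above and store its output.
collect_diag() {
    mkdir -p "$1" || return 1
    run_to_file "$1" lspci   lspci -vv
    run_to_file "$1" lsmod   lsmod
    run_to_file "$1" meminfo cat /proc/meminfo
    run_to_file "$1" cpuinfo cat /proc/cpuinfo
    run_to_file "$1" uname   uname -a
}

# Collect only when an output directory is given on the command line.
if [ $# -gt 0 ]; then
    collect_diag "$1"
fi
```

Saved under a name of your choosing, it would be run as root with a single argument, the directory that should receive the output files.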


Please note that sysreport is an application that may not be installed on your system. If it is not installed, please install the sysreport RPM in one of the following ways:

    * If your system is registered with RHN, run: up2date sysreport. This will download and install the package for you.
    * Locate the sysreport package on your installation CDs and install it with: rpm -ivh sysreport-version#.rpm, where version# matches the file's version number on your installation CD.


If possible, please run these commands while the slowdown is occurring, or as close as possible to a reproducible crash. We recognize that this is not always possible, but the information is still needed.

    * OOPS messages:

      If your machine crashes with an OOPS message, similar to the following:

Unable to handle kernel NULL pointer dereference at virtual address 00000018
*pde = 0f992001
Oops: 0000
CPU:    1
EIP:    0010:[]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010207
eax: 00000000   ebx: c87a1ed0   ecx: c02de5e0   edx: f3de3b00
esi: c87a1eb4   edi: 00000000   ebp: 00000007   esp: c3f5bfa0
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 11, stackpage=c3f5b000)
Stack: 00000000 fffffe5d 00000245 00085992 00000001 00000000 000000c0 000000c0
       0008e000 c0136c51 000000c0 00000000 c3f5a000 00000006 c0136ce5 000000c0
       00000000 00010f00 c3ff1fb8 c0105000 c0105866 00000000 c0136c90 c02f5fc0
Call Trace: [] do_try_to_free_pages [kernel] 0x11
[] kswapd [kernel] 0x55
[] stext [kernel] 0x0
[] kernel_thread [kernel] 0x26
[] kswapd [kernel] 0x0
Code: f7 40 18 06 00 00 00 75 f0 8b 40 28 39 d0 75 f0 31 d2 85 d2


>>EIP; c0136177    <=====

Trace; c0136c51
Trace; c0136ce5
Trace; c0105000 <_stext+0/0>
Trace; c0105866
Trace; c0136c90
Code;  c0136177
00000000 <_EIP>:
Code;  c0136177    <=====
   0:   f7 40 18 06 00 00 00      testl  $0x6,0x18(%eax)   <=====
Code;  c013617e
   7:   75 f0                     jne    fffffff9 <_EIP+0xfffffff9>
c0136170
Code;  c0136180
   9:   8b 40 28                  mov    0x28(%eax),%eax
Code;  c0136183
   c:   39 d0                     cmp    %edx,%eax
Code;  c0136185
   e:   75 f0                     jne    0 <_EIP>
Code;  c0136187
  10:   31 d2                     xor    %edx,%edx
Code;  c0136189
  12:   85 d2                     test   %edx,%edx


      We will need the full output from the OOPS message, which can be obtained in one of the following ways:

          o Copied down by hand (or from a digital picture). Please remember that we need the complete message; this may sometimes be the only way to get the oops message.
          o Setting up a serial console to capture the message. This can be accomplished by connecting a null modem cable to the serial port of the machine and adding:

            console=ttyS0,115200 console=tty0

            to either the kernel line of grub or an "append=" statement for lilo. Once this is done, run a terminal emulator such as "minicom" (Linux) or "HyperTerminal" (Windows) on the other machine that the null modem cable is attached to.
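
After rebooting with the new boot arguments, it may help to confirm that the running kernel actually received them before waiting for the next oops. The following is a minimal sketch, assuming the kernel command line is readable from /proc/cmdline; the function name console_args is made up for this example.

```shell
#!/bin/sh
# Sketch only: console_args is a hypothetical helper name.

# console_args "KERNEL COMMAND LINE"
# Print the value of every console= argument, one per line, in order.
# The body runs in a subshell so that "set -f" (no globbing) stays local.
console_args() (
    set -f
    for arg in $1; do          # unquoted on purpose: split on whitespace
        case $arg in
            console=*) echo "${arg#console=}" ;;
        esac
    done
)

# Show what the running kernel was booted with
# (prints nothing if no console= argument was given).
if [ -r /proc/cmdline ]; then
    console_args "$(cat /proc/cmdline)"
fi
```

With the two arguments shown above on the kernel line, the output would be ttyS0,115200 followed by tty0 on the next line.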

    * Mysterious Hangs, Freezes and Slowdowns:

      For hangs and freezes, we would like you to capture some information by enabling the sysrq key. Enable it by editing the file /etc/sysctl.conf and changing the kernel.sysrq line to read:

      kernel.sysrq = 1

      Enable it immediately by saving the file and running:

      # sysctl -p

      Once this is enabled, we will need the output from the following key combinations:
          o alt-sysrq-t
          o alt-sysrq-p
          o alt-sysrq-m
      * Please note that sysrq is the PrintScreen key.

      Please run alt-sysrq-p multiple times so that we get at least one alt-sysrq-p output from each CPU on the machine. Each CPU is denoted by a CPU: # line in the output, and the first CPU is number 0. Also, run alt-sysrq-m last, as it has a possibility of locking the box up harder than it already is. You may wish to use a serial console to capture the information.
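
Before relying on the key combinations, the current setting can be read back from /proc/sys/kernel/sysrq, the file behind the kernel.sysrq sysctl. The following is a minimal sketch; the function name sysrq_state is an assumption, and it treats any non-zero value as enabled. Newer kernels interpret the value as a bitmask of allowed functions, so a non-zero value does not guarantee that every key is permitted.

```shell
#!/bin/sh
# Sketch only: sysrq_state is a hypothetical helper name.

# sysrq_state [FILE]
# Print "enabled", "disabled" or "unknown" for the sysrq setting stored
# in FILE (default: /proc/sys/kernel/sysrq).
sysrq_state() {
    file=${1:-/proc/sys/kernel/sysrq}
    if [ ! -r "$file" ]; then
        echo unknown
    elif [ "$(cat "$file")" = 0 ]; then
        echo disabled
    else
        echo enabled
    fi
}

echo "sysrq is $(sysrq_state)"
```

If the machine is still responsive enough to run commands, the output of the key combinations also lands in the kernel log, so redirecting dmesg to a file can serve as a fallback when no serial console is available.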

    * Slowdowns:

      For general slowdowns we will first need to know the following:
          o What kind of load is the box under?
          o Are you running anything to produce this load?
          o If you stop running whatever may cause the load, does the slowness immediately go away?

            Next, we would like you to take the following steps to gather some data for our engineers:

            1. Enable kernel profiling by turning on nmi_watchdog and allocating the kernel profile buffer. For example, if you use grub, add the following two items to the "kernel" line of /boot/grub/grub.conf:

                  profile=2 nmi_watchdog=1

            as in the following example:

                  kernel /vmlinuz-2.4.9-e.27smp ro profile=2 nmi_watchdog=1

            If using LILO, add the following to the global section (before the first image= line) of lilo.conf:
                  append="profile=2 nmi_watchdog=1"
            and run lilo -v as root.
            Now you should be able to reboot.

            2. Create a shell script containing the following lines:

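#!/bin/sh
# Every 5 seconds, print the 15 busiest kernel functions from the profile buffer.
while /bin/true; do
  echo; date
  # -k3 sorts numerically from the third field; it replaces the obsolete "+2" key syntax.
  /usr/sbin/readprofile -v -m /boot/System.map | sort -nr -k3 | head -15
  # Reset the profile buffer so the next sample covers only the next interval.
  /usr/sbin/readprofile -r
  sleep 5
done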

            3. Make the system demonstrate the aberrant behavior.

            4. Run the following three commands simultaneously:

                  Execute the readprofile shell script above, redirecting its output to a file.
                  Execute vmstat 5 and redirect its output to a second file.
                  Execute top -d5 and redirect its output to a third file.

            5. Attach the output files (preferably as a gzipped tar file) to a web ticket that either you or a Red Hat Engineer has opened.
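
The three captures described above could be started together and bundled afterwards with a wrapper along these lines. This is a sketch under stated assumptions, not the procedure from the Red Hat document. It assumes the readprofile loop was saved as ./readprofile.sh (a hypothetical name) and uses /tmp/slowdown-data as an arbitrary output location. It also runs top with -b (batch mode), which the original text does not mention, because interactive top output does not redirect cleanly to a file.

```shell
#!/bin/sh
# Sketch only: the helper names, file names and ./readprofile.sh are hypothetical.

OUT=${OUT:-/tmp/slowdown-data}   # directory that receives the three output files
PIDS=                            # process IDs of the running captures

# start_capture NAME COMMAND [ARGS...]
# Run COMMAND in the background with its output redirected to $OUT/NAME.out.
start_capture() {
    name=$1; shift
    "$@" >"$OUT/$name.out" 2>&1 &
    PIDS="$PIDS $!"
}

# stop_and_pack
# Stop all captures and bundle the output directory as $OUT.tar.gz.
stop_and_pack() {
    kill $PIDS 2>/dev/null || true
    wait
    tar -czf "$OUT.tar.gz" -C "$(dirname "$OUT")" "$(basename "$OUT")"
}

# Usage: sh capture.sh run [SECONDS]
if [ "${1:-}" = run ]; then
    mkdir -p "$OUT" || exit 1
    start_capture readprofile sh ./readprofile.sh
    start_capture vmstat vmstat 5
    start_capture top top -b -d 5   # -b is an addition so top can write to a file
    sleep "${2:-300}"
    stop_and_pack
fi
```

Start it as root once the system shows the aberrant behavior; the capture length defaults to 300 seconds here, which is an arbitrary choice. The resulting $OUT.tar.gz is the file to attach to the ticket.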


      You can open a web ticket with Red Hat support by logging into your www.redhat.com account in the Support and Docs section and selecting the Web Support button located under the "Active Support Entitlements" section.