IBM Power Systems S822LC Server Firmware

Applies to: S822LC (8335-GTB)

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.

Contents

1.0 Systems Affected

1.1 Minimum ipmitool Code Level

1.2 Minimum Browser levels for BMC ASM (Advanced System Management) Console

1.3 Fix level Information on IBM Open Power Components and Operating systems

1.4 OPAL Bare Metal (EC16) only on 8335-GTB systems

1.5 NVMe flash adapter cards with feature codes #EC54 and #EC56 Supported

1.6 Minimum xCAT level 2.13.3 for use in firmware updates

2.0 Important Information

3.0 Firmware Information

3.1 Firmware Information and Description

4.0 Operating System Information

4.1 Linux Operating System

4.2 How to Determine the Version of a Linux Operating System

4.3 How to Determine if the opal-prd (Processor Recovery Diagnostics) package is installed

5.0 How to Determine The Currently Installed Firmware Level

6.0 Downloading the Firmware Package

7.0 Installing the Firmware

7.1 IBM Power Systems Firmware maintenance

7.2 Updating the System Firmware with ipmitool

7.2.1 Return codes from the ipmitool "hpm upgrade" command

7.3 Installing ipmitool on Ubuntu

7.4 Updating the System Firmware using the BMC Advanced System Management (ASM)

7.5 Getting I/O Adapter fixes

8.0 Troubleshooting

8.1 Beginning troubleshooting and problem analysis

8.2 Supporting diagnostics

9.0 System Management

9.1 BMC Service Processor IPMI and ASM Access

9.2 Open Power Abstraction Layer (OPAL)

9.3 Intelligent Platform Management Interface (IPMI)

9.4 Petitboot bootloader

10.0 Quick Start Guide for Installing Linux on LC servers

11.0 Change History

1.0 Systems Affected

This package provides firmware for Power Systems LC S822LC (8335-GTB) server only.

The firmware level in this package is:

OP820.30 - PNOR OP8_v1.12_2.96 / BMC 2.13.104548

1.1 Minimum ipmitool Code Level

This section specifies the minimum ipmitool code level that the System Firmware requires to perform firmware installations and manage the system. Open Power requires ipmitool level v1.8.17 to execute correctly on the OP820 firmware, specifically for the ipmitool code update function.

Although the recommended level of ipmitool is 1.8.17 (for out-of-band and in-band firmware updates), the ipmitool levels shipped in the supported Linux distributions are lower than this level. These lower levels have been tested and no problems were found, so they may be used at the user's discretion. However, if ipmitool problems are encountered, the first debug step is to load the minimum supported level of ipmitool.

The following are the supported distribution levels and their associated ipmitool levels:

Verify the ipmitool level on your Linux workstation using the following command:

bash-4.1$ ipmitool -V

 version 1.8.17
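The level check above can also be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function names are hypothetical, it assumes that `ipmitool -V` prints the version number as the last word of its output, and it assumes a `sort` that supports version ordering with `-V` (as GNU coreutils does).

```shell
#!/bin/sh
# Recommended ipmitool level from this section.
MIN_IPMITOOL="1.8.17"

# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_ipmitool_version: prints the last word of `ipmitool -V`.
# Assumption: the version number is the last word of that output.
installed_ipmitool_version() {
    ipmitool -V 2>/dev/null | awk '{print $NF; exit}'
}

# check_ipmitool: reports whether the installed level meets the recommendation.
check_ipmitool() {
    ver=$(installed_ipmitool_version)
    if [ -z "$ver" ]; then
        echo "ipmitool not found"
        return 2
    fi
    if version_ge "$ver" "$MIN_IPMITOOL"; then
        echo "ipmitool $ver meets the recommended level $MIN_IPMITOOL"
    else
        echo "ipmitool $ver is below the recommended level $MIN_IPMITOOL"
        return 1
    fi
}
```

Calling check_ipmitool on the workstation prints one line and returns a non-zero status when ipmitool is missing or below 1.8.17. A lower distribution level may still be acceptable, as noted above.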

If you need to update or add ipmitool on your Linux workstation, you can compile ipmitool (current level 1.8.17) for Linux from the SourceForge source as follows:

1.1.1 Download the ipmitool tar file from http://sourceforge.net/projects/ipmitool/ to your Linux system

1.1.2 Extract the tarball on the Linux system

1.1.3 cd to top-level directory

1.1.4 ./configure

1.1.5 make

1.1.6 ipmitool will be under src/ipmitool

You may also install the ipmitool package directly from your workstation's Linux distribution packages, for example on Ubuntu 16.04 (the packaged level will be lower than the recommended minimum level):

 sudo apt-get install ipmitool

1.2 Minimum Browser levels for BMC ASM (Advanced System Management) Console

The BMC ASM is a web-based application that works within a browser. Supported browser levels are shown below with Chrome being the preferred browser:

Note: The BMC Dashboard shows an incorrect level for the BIOS because of an improper translation of the level subfields. The BIOS number should reflect the PNOR level for the system, for example "IBM-garrison-ibm-OP8_v1.11_2.19". In this case, the BIOS version should be 1.11_2.19 but is shown as 1.17.19 instead, with the "11_2" converted into "17".

The Firmware Revision for the BMC firmware shows correctly as "2.13.58".

Here is an example output of the Dashboard with an errant BIOS Version:

Dashboard gives the overall information about the status of the device and remote server.

Device Information

Firmware Revision: 2.13.58

Firmware Build Time: Oct 26 2016 11:40:55 CDT

BIOS Version: 1.17.19

The correct BIOS (also known as PNOR) version can be displayed by selecting the System Firmware FRU, 47, on the BMC GUI and looking at the "Product Version" field under "Product Information".

1.3 Fix level Information on IBM Open Power Components and Operating systems

For specific fix level information on key components of IBM Power Systems LC and Linux operating systems, please refer to the documentation in the IBM Knowledge Center for the S822LC (8335-GTB):

http://www.ibm.com/support/knowledgecenter/P8DEA/p8hdx/8335_gtb_landing.htm

1.4 OPAL Bare Metal (EC16) only on 8335-GTB systems

For the 8335-GTB, PowerKVM is not supported. Ubuntu or Red Hat Enterprise Linux distributions may be installed on OPAL Bare Metal (EC16).

1.5 NVMe flash adapter cards with feature codes #EC54 and #EC56 Supported

The NVMe flash adapter card with Feature Code #EC54 or #EC56 is supported on the 8335-GTB.

Booting from the NVMe flash adapter is not supported on the 8335-GTB.

For information about service tools for a PCIe3 1.6 TB NVMe Flash adapter, see PCIe3 1.6 TB NVMe Flash Adapter (FC EC54; CCIN 58CB):

http://www.ibm.com/support/knowledgecenter/POWER8/p8hcd/fcec55.htm

For information about service tools for a PCIe3 3.2 TB NVMe Flash adapter, see PCIe3 3.2 TB NVMe Flash Adapter (FC EC56; CCIN 58CC):

www.ibm.com/support/knowledgecenter/POWER8/p8hcd/fcec57.htm

1.6 Minimum xCAT level 2.13.3 for use in firmware updates

If you use xCAT on the host OS to do firmware updates, the minimum xCAT level that should be used is 2.13.3 because it has stability improvements for the firmware update process. See the xCAT 2.13.3 release notes for more information: https://github.com/xcat2/xcat-core/wiki/XCAT_2.13.3_Release_Notes

2.0 Important Information

Downgrading firmware from any given release level to an earlier release level is not recommended.

If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.

Filename: 8335GTB_820.1923.20190613n_update.hpm

Size (bytes): 67109473

Checksum: ed66653d81608d8489ae8228e413e0cf

Concurrent Firmware Updates not available for LC servers.

Concurrent system firmware update is not supported on LC servers.

3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For the LC server systems, the installation of system firmware is always disruptive.

3.1 Firmware Information and Description

The update.hpm file updates the primary side of the PNOR and the primary side of the BMC only, leaving the golden sides unchanged.

Note: The checksum can be found by running the Linux/Unix/AIX md5sum command against the Hardware Platform Management (hpm) file (all 32 characters of the checksum are listed), for example: md5sum <filename>
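The comparison against the published checksum can be scripted. The following is a minimal sketch with a hypothetical helper name, verify_md5. It assumes the md5sum command is available and that the expected value is the 32-character checksum listed in section 2.0.

```shell
# verify_md5 FILE EXPECTED: succeeds when the MD5 checksum of FILE
# matches EXPECTED (hex letters are compared case-insensitively).
verify_md5() {
    file=$1
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # md5sum prints "<checksum>  <filename>"; keep only the checksum.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)"
        return 1
    fi
}
```

For this package the call would be: verify_md5 8335GTB_820.1923.20190613n_update.hpm ed66653d81608d8489ae8228e413e0cf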

After a successful update to this firmware level, the PNOR components and BMC should be at the following levels:

 The ipmitool "fru" command can be used to display FRU ID 47: "ipmitool -H bmc_ip_ipaddress -I lanplus -U ipmi_user -P ipmi_password fru print 47".

And the BMC command line command "cat" can be used to display the BMC level file: "cat /proc/ractrends/Helper/FwInfo".

Note: FRU information for the PNOR level does not show the updated levels via the fru command until the system has been booted once at the updated level.

PNOR firmware level: driver content

Display the PNOR FW level using this ipmitool command:

# ipmitool -H bmc_ip_ipaddress -I lan -U ipmi_user -P ipmi_password fru print 47

System Firmware:

Product Name : OpenPOWER Firmware

Product Version : IBM-garrison-OP8_v1.12_2.96
Product Extra : op-build-v2.3-7-g99a6bc8
Product Extra : buildroot-2019.02.1-16-ge01dcd0
Product Extra : skiboot-v6.3.1
Product Extra : hostboot-p8-c893515-pd6f049d
Product Extra : occ-p8-a2856b7
Product Extra : linux-5.0.7-openpower1-p8e31f00
Product Extra : petitboot-v1.10.3
Product Extra : machine-xml-c5c35cb
Product Extra : hostboot-binaries-hw041519a.opv23
Product Extra : capp-ucode-p9-dd2-v4

Display the BMC FW level via an ssh session on the BMC using this command:

# cat /proc/ractrends/Helper/FwInfo

FW_VERSION=2.13.104548

FW_DATE=Jun 14 2019

FW_BUILDTIME=10:04:45 CDT

FW_DESC=8335-GTB SRC BUILD RR9 06142019

FW_PRODUCTID=1

FW_RELEASEID=RR9

FW_CODEBASEVERSION=2.X
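Because the FwInfo file is a list of KEY=VALUE lines, a single field such as FW_VERSION can be extracted for comparison with the expected BMC level. The following is a minimal sketch. The helper name fwinfo_get is hypothetical, and it assumes the file has the format shown above.

```shell
# fwinfo_get KEY: reads KEY=VALUE lines on stdin and prints the value
# of the first line whose key equals KEY.
fwinfo_get() {
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}
```

In an ssh session on the BMC, the level could then be read with: cat /proc/ractrends/Helper/FwInfo | fwinfo_get FW_VERSION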

OP820

For Impact, Severity and other Firmware definitions, please refer to the below ‘Glossary of firmware terms’ url:

https://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

8335GTB_820.1923 / OP820.30

07/01/2019

New features and functions

  • In response to recently reported security vulnerabilities, this firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754.  Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754.  This replaces an earlier firmware update for the same problem which was found to not be effective.  

  • Support was added for a Self-Boot Engine (SBE) validation during the IPL to verify that the firmware images are the shipped versions. 

  • Added BMC support to be able to detect Self Boot Engine (SBE) SEEPROM corruption 

  • Support has been removed from XIVE interrupt controller for the store EOI operation.  Hardware has limitations which would require a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This would be performance prohibitive and the PCI Host Bridges (PHBs) do not support the synchronization. 

  • Support was added to recognize a port parameter in the URL path for the Preboot eXecution Environment (PXE) in the ethernet adapters.  Without the fix, there could be PXE discovery failures if a port was specified in the URL for the PXE.   

System firmware changes that affect all systems

  • A security problem was fixed to prevent host programs from being able to corrupt the BMC using the internal software bridges between the host and BMC.  The Common Vulnerabilities and Exposures issue number is CVE-2019-6260. 

  • A security problem was fixed to detect and prevent Self Boot Engine (SBE) SEEPROM corruption.   The Common Vulnerabilities and Exposures issue number is CVE-2018-8931. 

  • A problem was fixed for system hangs for early fails that occur in Hostboot.  With the fix, the early fails are handled and recovery attempted to allow the IPL to succeed. 

  • A problem was fixed for the power capping range allowed for the user.  The OCC provides two limits for the minimum powercap: a hard powercap minimum, which is guaranteed by the OCC, and a soft powercap minimum, which is lower than the hard minimum and may or may not be asserted due to various power-thermal reasons.  To allow users to access the entire powercap range, this fix exports the soft powercap minimum as the “powercap-min” DT property and adds a new DT property called “powercap-hard-min” to export the hard minimum powercap limit.

  • A problem was fixed for lost output on the console when the OS is stopping or rebooting.  With the fix, the console output is always flushed before stopping the system. 

  • A problem was fixed for the AST VGA device which could sometimes fail to initialize when the vendor ID for the device was parsed incorrectly.   

  • A problem was fixed for a system hang that could occur while printing with system debug options and having an active user on the console.

  • A problem was fixed for an intermittent opal-prd crash that can happen on the host OS.  This is the fault signature:  "opal-prd[2864]: unhandled signal 11 at 0000000000029320 nip 0000000102012830 lr 0000000102016890 code 1"

  • A problem was fixed for diagnostic code trying to read sensor values for PCI Host Bridge (PHB) entries that are unused, which causes debug output to have incorrect values for the unused entries.  With the fix, only the used entries are processed by the diagnostic code. 

  • A problem was fixed for Petitboot exiting to the shell with xCAT genesis in the menu when trying to do a network boot.  Petitboot was timing out when trying to access the ftpserver but it was not doing the network re-queries necessary for a proper retry.  If this error happens on a system, it can be made to boot with the following two steps: 

1) Type the word "exit" and press the enter key.  This returns to the Petitboot menu.

2) Press the enter key again to start the boot of the xCAT image.

  • A problem has been fixed for a slow start up of a process that can occur when the system had been previously in an idle state. 

  • A problem has been fixed for a TOD error that can cause a soft lockup of the kernel. A 'soft lockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds, without giving other tasks a chance to run. The current stack trace is displayed upon detection and, by default, the system will stay locked up.

  • A problem has been fixed to add part and serial numbers to the processors when accessed through the device tree. 

  • A problem has been fixed to make the OS aware of the DARN random number generator at 0x00200000 (PPC_FEATURE2_DARN) and the SCV syscall at 0x00100000 (PPC_FEATURE2_SCV).  Without this fix, these service constants are not defined in the OS userspace.

  • A problem was fixed for Coherent Accelerator Processor Proxy (CAPP) mode for the PCI Host Bridge (PHB) to improve DMA write performance by enabling channel tag streaming for the PHB.  With this enabled, the DMA write does not have to wait for a response before sending a new write command on the bus.  

  • A problem was fixed for the Open-Power Flash tool "pflash" failing with a blocklevel_smart_erase error during a pflash.  This problem is infrequent and is triggered if pflash detects a smart erase fits entirely within one erase block. 

  • A problem was fixed in the Petitboot user interface to handle cursor mode arrow keys for the VT100 'application' cursor to prevent mis-interpreting an arrow key as an escape key in some situations.  For more information on the VT100 cursor keys, see http://www.tldp.org/HOWTO/Keyboard-and-Console-HOWTO-21.html. 

  • A problem was fixed in the Petitboot user interface to cancel the autoboot if the user has exited the Petitboot user interface.  This prevents the user dropping to the shell and then having the machine boot on them instead of waiting until the user is ready for the boot.  

  • A problem was fixed in the Petitboot parsing of manually-specified configuration files that caused the parser to create file paths relative to the downloaded file's path, not the original remote path. 

  • A problem was fixed for a flood of OPAL error messages that can occur for a processor fault.  The message "CPU ATTEMPT TO RE-ENTER FIRMWARE" appears as a large group of messages and precedes the relevant error messages for the processor fault.  A reboot of the system is needed to recover from this error.

  • A problem was fixed for a skiboot hang that could occur rarely for an I2C request if the I2C bus is in error or locked by the On-Chip Controller (OCC).

  • A problem was fixed for an OS reboot after a shutdown that intermittently fails after the shutdown.  This can happen if the BMC is not ready to receive commands.  With the fix, the messages to the BMC are validated and retried as needed.  To recover from this error, the system can be rebooted from the BMC interface. 

  • A problem was fixed for a kernel hard lock up that could occur if IPMI synchronous messages were sent from the OS to the BMC while the BMC was rebooting.  For these types of messages, a processor thread remains waiting in OPAL until a response is returned from the BMC.

8335GTB_820.1742 / OP820.21

01/12/18

New features and functions for MTM 8335-GTB:

    • In response to recently reported security vulnerabilities, this firmware update is being released to address Common Vulnerabilities and Exposures issue numbers CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754. Operating System updates are required in conjunction with this FW level for CVE-2017-5753 and CVE-2017-5754.

System firmware changes that affect all systems

    • A problem was fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from over-heating and excessive power consumption). This happened because of an On-Chip Controller (OCC) internal queue overflow. The problem has only been observed for systems running heavy workloads with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), but this may not be required to encounter the problem. Recovery from Safe mode back to normal performance can be done with a re-IPL of the system.

    • A problem was fixed for correctly showing the Chassis Serial number in the "lshw" command output instead of incorrectly showing the Board Serial number. This regression error was introduced in the OP820.10 service pack.

    • A problem was fixed in OPAL skiroot for systems, configured to boot from a Non-Volatile Memory express (NVMe) adapter, failing to reboot after the xCAT rpower reset command. Some of the failed systems will present a mon prompt showing a Program Check; other systems will complete the boot process but have no usable network interfaces. The systems can be recovered by doing a power off and an IPL.

    • A problem was fixed on the BMC for java applet failures when using the BMC JViewer. To resolve the problem, the BMC JDK was updated. The applet failures had the following message: "Error: Unsigned application requesting unrestricted access to system. The following resource is signed with a weak signature algorithm MD5withRSA and is treated as unsigned: http://lc-pls1605c-con.wellsfargo.com:80/Java/release/JViewer.jar"

8335GTB_820.1642 / OP820.10

04/28/17

New features and functions for MTM 8335-GTB:

    • Support was added for HTTP(S) proxies when downloading resources during the Petitboot. For example, this support allows the user to set HTTP(S) proxies for use when loading configuration or boot files.

    • Support was added for expanded timeout options for the BMC web gui/KVM sessions. The old timeout range (300 - 1800 seconds) has been changed as follows: 1) The maximum timeout was increased from 1800 seconds to 1296000 seconds (15 days). 2) The minimum timeout was decreased from 300 seconds to 0. A timeout value of 0 is considered an infinite timeout: KVM will not time out, which in turn also keeps the BMC web gui running.

    • Support for 2200w redundant power supply.

    • Support was added for booting from the PCIe3 2-port 100GB Ethernet ConnectX-4 EN (NIC and RoCE) QSFP28 adapter with feature code #EC3L. The #EC3L feature adapter can be obtained using RPQ #8A2399 for the 8335-GTB system.

    • Support was added to allow booting from a PCIe Non-Volatile Memory express (NVMe) adapter. The adapters affected are Feature code #EC54 and #EC56 NVMe flash adapters with CCIN 58CB and 58CC respectively.

    • Support was added for detecting and logging a SEL for power supply fan faults and turning on the system attention LED.

    • Support was enhanced in the BMC LDAP configuration: 1) Added support for allowing the hostname in any of LDAP fields that currently only allow IP addresses. 2) Added support for BIND names greater than 64 characters.

    • Support was enhanced for the configuration of the System and Audit Log settings: 1) Log settings were enhanced to support "local Logs" and "remote Logs" at the same time. 2) The "Server Address" field was enhanced to have a custom port number in addition to the default 514 port. 3) The Syslog System log was enhanced to support a TCP configuration in addition to the UDP configuration that was already supported.

    • Support was changed for the BMC SMTP configuration to remove the "machine name" field since an SMTP relay server cannot be configured.

    • Support was enhanced for the BMC SMTP configuration to allow the "Server Address" field to have either a symbolic host name with a domain name or an IP address as was previously supported.

    • Support was enhanced for the BMC DNS configuration to allow host names to have a "." included in the name and to allow the host names to be greater than 15 characters.

System firmware changes that affect all systems

    • A problem was fixed for error handling in complete resets for the PCI Host Bridge (PHB). During a complete reset, there can be a timeout waiting for a pending transaction, resulting in the PHB being marked as broken and the reset not being completed, which leaves the adapters in an error state. With the fix, the PHB is fenced and the Linux kernel can retry the complete reset.

    • A problem was fixed for a missing device discovery message and overly verbose output messages during the boot. The output is now less verbose during the boot - only error-level messages are printed during Petitboot bootloader initialization. This means that there will be fewer messages printed as the system boots. Additionally, the Petitboot user interface is started earlier in the boot process. This means that the user will be presented with the user interface sooner, but it may still take time, potentially up to 30 seconds, for the user interface to be populated with boot options as storage and network hardware is being initialized. During this time, Petitboot will show the status message "Info: Waiting for device discovery". When Petitboot device discovery is completed, the following status message will be shown "Info: Connected to pb-discover!".

    • A problem was fixed for Java error messages being displayed when logging into the BMC web gui. The Remote Console Preview window is no longer displayed on the dashboard which was causing all the extra Java error messages.

    • A problem has been fixed for the system MAC address being cleared with zeroes on an AC power cycle.

    • A problem was fixed for a GPU hang where NVLink (#EC4C, #EC4D, or #EC4F) is being used.

    • A problem was fixed for SEL "Progress | Unknown Progress |Asserted" during the firmware boot. This error can occur during any boot and can be ignored.

    • A problem was fixed for "Preserve All Configuration" not working for the BMC web gui HPM update.

    • A problem was fixed for not being able to change the IP address settings via Petitboot. Fixed a timing issue with the BCM5421 Controller. Without this fix, the timing issue would lead to BMC not getting any network even though all the network parameters/configuration are proper.

    • A problem was fixed for PCIe Non-Volatile Memory express (NVMe) adapters not showing up after a power on of the system. The adapters affected are Feature code #EC54 and #EC56 NVMe flash adapters with CCIN 58CB and 58CC respectively. These flash adapters are not supported on the IBM Power System S822LC (8335-GTA).

    • A problem has been fixed for Fault LEDs returning to the off state after an AC cycle when a FRU had failed and was guarded. With the fix, after an AC power cycle and next power on, the Fault LED will turn on again if the FRUs are guarded.

    • A problem was fixed for a missing SEL log for a fault on the second power supply that caused OCC 1 and 2 "Active Device Disabled" and Quick Pwr Drop 1 and 2 "State Asserted" console messages.

    • A problem has been fixed for systems losing performance and going into safe mode because of On-Chip Controller (OCC) timeouts collecting Analog Power Subsystem Sweep (APSS) data. This data is used by OCC to tune the processor frequency. This problem occurs more frequently on systems with large configurations that are running heavy workloads.

    • A problem has been fixed for processor checkstops that occur after a processor has gone into sleep mode. The fix for this problem required that nap mode be substituted for sleep mode. While this significantly reduces the power savings that might be achieved on idle systems, it is not expected to make much of a power consumption difference on systems running HPC workloads.

    • A problem was fixed for a security issue on the BMC login.

    • A problem was fixed for GPU power faults preventing the IPL of the system. With the fix, the system will IPL with GPUs that have power faults. From the OS, the unavailable GPUs can be identified for repair.

8335GTB_820.1636 / OP820.01

11/02/16

New features and functions for MTM 8335-GTB:

    • Support for the Red Hat Enterprise Linux 7.3 OS as an OPAL bare-metal install. For more information on the features delivered with RHEL7.3, see the Red Hat information portal: https://access.redhat.com/documentation/en/red-hat-enterprise-linux/.

    • Support was added to allow booting from a PCIe Non-Volatile Memory express (NVMe) adapter. The adapters affected are Feature code #EC54 and #EC56 NVMe flash adapters with CCIN 58CB and 58CC respectively.

    • Support for an OPAL raw console to receive output from the PowerPC boot EPAPR (Embedded Power Architecture Platform Requirements) wrapper. This allows decompression failures inside the wrapper caused by data corruption to be reported to the user.

System firmware changes that affect all systems

    • A problem was fixed for ipmitool "mc reset cold" failing to reset the BMC service processor. The ipmitool "mc reset cold" command stops working after the user issues a power on when the system is already on, or a power off when the system is already powered off. The circumvention is to run the ipmitool "mc reset warm" command to the BMC, which restarts the service processor IPMI process and clears the internal flags that prevent "mc reset cold" from working.

    • A problem was fixed for slow IPMI Serial Over LAN (SOL) console connection to the server on the BMC. The problem was triggered by an incorrect handling of the definition bits for VGA and serial console output.

    • A problem was fixed for the IPMI Serial Over LAN (SOL) console to the Petitboot user interface for the left and right arrow movements. When editing the command line for the kernel, the user could not go to the start of the line and then go forward one character at a time.

    • A problem was fixed for the BMC integrated ethernet adapter BCM5421 connection speed being downgraded to 100 Mb/s instead of running at the expected 1000 Mb/s for the petitboot and the Linux OS. After the fix is applied, an A/C power cycle of the system is needed to activate the fix.

    • A problem was fixed for a kexec-hardboot reboot of the system that caused USB devices to be lost. A system power cycle is needed to recover the USB devices when this error occurs.

    • A problem was fixed for the shutdown of PCI devices that was causing spurious reboots of the system for a power off. The logical PCI devices are now removed during the shutdown.

    • A problem was fixed for failures that happen when multiple Hypervisor Virtual Consoles (HVCs) are active at the same time. On machines with more than one HVC console, any console after the first failed to register an interrupt handler since all consoles shared the same IRQ number.

    • A problem was fixed for fundamental PCI resets at boot time causing the PCI adapters to not be usable in the Linux OS. No errors occur in skiboot but the adapters are not configurable once the OS is reached.

    • A problem was fixed for time-out errors during the power off of PCI slots with " Timeout powering off slot ... FIRENZE-PCI: Wrong state 00000000 on slot" error message during a power off of the system.

    • A problem was fixed for the system remaining in "safe" mode after an On-Chip Controller (OCC) reset. In "safe" mode, the system is running at reduced processor frequencies, affecting system performance. The OCC reset is an error recovery command that can be requested by the BMC or OPAL for certain OCC errors.

    • A problem was fixed for a false kernel error message during the boot when configuring the Nvidia GPUs. This message can be ignored: "[ 0.000000] pnv_ioda_reserve_pe: Invalid PE 4 on PHB#4 [ 0.000000] pnv_ioda_reserve_pe: Invalid PE 4 on PHB#5".

8335GTB_820.1634 / OP820.00

09/26/16

Impact:  New      Severity:  New

New features and functions for MTM 8335-GTB:

GA Level

NOTE:

This firmware release only supports OPAL firmware. It does not support running the PowerVM hypervisor.

System firmware changes that affect all systems

    • A problem was fixed for looping error processing for some hardware failures where OPAL-PRD becomes unresponsive. The loop has been fixed to prevent repeated error messages and system slow-down. The error message "0xdeadbeefdeadbeef" was added so it is known when the error handling in OPAL-PRD has failed.

    • An error message was changed for a problem where the SLW (Sleep Winkle) timer gets stuck and the firmware falls back to OPAL pollers. The previous error message was "Stuck with odd generation !". The new message to the SOL console is "SLW : timer stuck, falling back to OPAL pollers. You will likely have slower I2C and may have experienced increased jitter." These messages can be safely ignored at this time until a future firmware release resolves the issue of the stuck timer. The error only occurs when running test procedures that stress the hardware.

4.0 Operating System Information

IBM Power LC servers support Linux, which provides a UNIX-like implementation across many computer architectures. Linux supports almost all of the Power System I/O, and the configurator verifies support on order. For more information about the software that is available on IBM Power Systems, see the Linux on IBM Power Systems website: http://www.ibm.com/systems/power/software/linux/index.html

 

4.1 Linux Operating System

The Linux operating system is an open source, cross-platform OS. It is supported on every Power Systems server IBM sells. Linux on Power Systems is the only Linux infrastructure that offers both scale-out and scale-up choices. The minimum supported version of Linux on the 8335-GTB server is Ubuntu Server 16.04 LTS for IBM POWER8, with 16.04.1 being the recommended level.

For more information about Ubuntu Server for POWER8, see the following website:

http://www.ubuntu.com/download/server/power8

Other supported versions of Linux on the 8335-GTB server are Red Hat Enterprise Linux 7.2, 7.3, and later releases, in LE mode. For additional questions about the availability of these releases and supported Power servers, consult the Red Hat Hardware Catalog:

https://hardware.redhat.com

For information about the PowerLinux Community, see the following website:

https://www.ibm.com/developerworks/group/tpl

For information about the features and external devices that are supported by Linux, see this website:

http://www.ibm.com/systems/power/software/linux/index.html

4.2 How to Determine the Version of a Linux Operating System

Use one of the following commands at the Linux command prompt to determine the current Linux version:  

cat /proc/version

 uname -a

The output string from the command will provide the Linux version level.

4.3 How to Determine if the opal-prd (Processor Recovery Diagnostics) package is installed

The opal-prd package on the Linux system collects the OPAL Processor Recovery Diagnostics messages and writes them to the syslog log file. It is recommended that this package be installed if it is not already present, as it helps with maintaining the system processors by alerting users to processor maintenance when needed.

On Ubuntu Linux, perform command dpkg -l "opal-prd". The output shows whether the package is installed on your system by marking it with ii (installed) and un (not installed).

This package provides a daemon to load and run the OpenPower firmware's Processor Recovery Diagnostics binary. This is responsible for run-time maintenance of Power hardware.

If the package is not installed on your system, the following command can be run on Ubuntu to install it:

sudo apt-get install opal-prd

On Red Hat Linux, run the command "rpm -qa | grep -i opal-prd". The package is installed on your system if the rpm for opal-prd is found and displayed in the command output.

If the package is not installed on your system, the following command can be run on Red Hat to install it:

sudo yum install opal-prd
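
The Ubuntu check described in this section can be scripted. The following is a minimal sketch that reads "dpkg -l" output and succeeds only when the named package carries the ii status; the function name dpkg_installed is hypothetical.

```shell
# Read "dpkg -l" output on stdin; succeed only if package $1 has status "ii".
dpkg_installed() {
    awk -v pkg="$1" '
        { split($2, name, ":") }                 # drop an optional ":arch" suffix
        $1 == "ii" && name[1] == pkg { found = 1 }
        END { exit !found }
    '
}

# Example use on Ubuntu:
#   dpkg -l opal-prd 2>/dev/null | dpkg_installed opal-prd \
#       || sudo apt-get install opal-prd
```

Passing the "dpkg -l" text on stdin keeps the check separate from the install step, so the same function can be used to verify the result after installation.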

5.0 How to Determine The Currently Installed Firmware Level

Use the ipmitool "fru" command or the BMC Advanced System Management (ASM) FRU option to look at the product details of FRU 47.

ipmitool -I lanplus -H <bmc host IP address> -P admin -U ADMIN fru print 47
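
The "fru print" output is a list of "Label : value" lines, so a single field can be extracted in a script. The following is a rough sketch; the function name fru_field is hypothetical, and which label carries the firmware level on a given system is an assumption that should be confirmed by inspecting the full output first.

```shell
# Read "ipmitool ... fru print 47" output on stdin and print the value of the
# field whose label matches $1 exactly. The label is supplied by the caller.
fru_field() {
    awk -v label="$1" '
        {
            pos = index($0, ":")                 # split at the first colon only
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)     # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)     # trim padding around the value
            if (key == label) { print val; exit }
        }
    '
}

# Example use ("Product Version" is an assumed label):
#   ipmitool -I lanplus -H <bmc host IP address> -P admin -U ADMIN fru print 47 \
#       | fru_field "Product Version"
```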

6.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

7.0 Installing the Firmware

7.1 IBM Power Systems Firmware maintenance

The updating and upgrading of system firmware depends on several factors, such as the current firmware that is installed and which operating system is running on the system.

These scenarios and the associated installation instructions are comprehensively outlined in the firmware section of Fix Central, found at the following website:

http://www.ibm.com/support/fixcentral/

Any hardware failures should be resolved before proceeding with the firmware updates to help ensure that the system will not be running degraded after the updates.

Run the "hpm check" command before starting an update to understand what is going to be updated, where <xxxxx.hpm> is the firmware update file name:

ipmitool -H <BMC IP> -U ADMIN -I lanplus -P admin hpm check <xxxxx.hpm>

7.2 Updating the System Firmware with ipmitool

Firmware update steps for the LC servers can be managed from the command line with the ipmitool command.

Note: You can find the latest steps for this procedure at the IBM Knowledge Center:

https://www.ibm.com/support/knowledgecenter/POWER8/p8ei8/p8ei8_update_firmware_ipmi.htm

1) Power off the machine - install code from Standby Power state:

 ipmitool -H <hostname> -I lanplus -U ADMIN -P admin chassis power off

2a) Issue a BMC reset to establish a stable starting point:

 ipmitool -H <BMC IP> -I lanplus -U ADMIN -P admin mc reset cold

2b) Wait for 2 to 3 minutes after the "mc reset cold" and check that the BMC is ready before continuing with step 3. Run the following command and look for a response of 00h (booting complete, also known as the BMC ready state). If the response is C0h, the BMC is still booting. Wait for the 00h response before continuing with step 3:

 ipmitool -H <BMC IP> -I lanplus -U ADMIN -P admin raw 0x3a 0x0a

3) Run the following command (a) to preserve IPMI and network settings. Alternatively, some users will need to use command (b) instead to do a full save of settings (IPMI, network, SSH, and authentication):

(a) ipmitool -H <BMC IP> -I lanplus -U ADMIN -P admin raw 0x32 0xba 0x18 0x00

OR

(b) ipmitool -H <BMC IP> -I lanplus -U ADMIN -P admin raw 0x32 0xba 0x98 0x02

Here is the definition of the raw 0xba command bytes for configuring preservation bits in case further customization is needed for this command:

# ipmitool -H <BMC IP> -I lanplus -U ADMIN -P admin raw 0x32 0xba 0x98 0x02 where

Byte[1] = 0x98

Byte[2] = 0x02

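4) Update the firmware with the "hpm upgrade" command, where <xxxxx.hpm> is the firmware update file name:

 ipmitool -H <BMC IP> -U ADMIN -I lanplus -P admin -z 15000 hpm upgrade <xxxxx.hpm> force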

5) If the BMC is using STATIC IP network settings and these get lost, it is possible to restore them with the following command line steps (DHCP settings should auto-recover):

 /usr/local/bin/ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin lan set 1 ipsrc static

 /usr/local/bin/ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin lan set 1 ipaddr x.x.x.x

 /usr/local/bin/ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin lan set 1 netmask 255.255.x.x

 /usr/local/bin/ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin lan set 1 defgw ipaddr x.x.x.x

 6) Power on and IPL the machine:

 ipmitool -H <hostname> -I lanplus -U ADMIN -P admin chassis power on
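
The wait in step 2b can be scripted rather than timed by hand. The following is a minimal sketch that polls the BMC until it returns 00 and gives up after a fixed number of attempts. The function names bmc_state and wait_for_bmc, the attempt count, and the delay are all hypothetical choices, and the sketch assumes that ipmitool prints the single response byte as two hexadecimal digits.

```shell
# Classify the response to "raw 0x3a 0x0a": 00 = ready, c0 = still booting.
bmc_state() {
    case "$(printf '%s' "$1" | tr -d ' \t\r\n' | tr 'A-F' 'a-f')" in
        00) echo ready ;;
        c0) echo booting ;;
        *)  echo unknown ;;
    esac
}

# Poll until the BMC reports ready.
# $1 = maximum attempts, $2 = seconds between attempts,
# remaining arguments = the command that queries the BMC.
wait_for_bmc() {
    attempts=$1; delay=$2; shift 2
    while [ "$attempts" -gt 0 ]; do
        if [ "$(bmc_state "$("$@" 2>/dev/null)")" = ready ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1                                     # BMC never reported ready
}

# Example use (placeholders as in the steps above):
#   wait_for_bmc 30 10 ipmitool -H <BMC IP> -I lanplus -U ADMIN -P admin raw 0x3a 0x0a
```

A failed query (for example, while the BMC network interface is still down) is treated the same as a not-ready response, so the loop keeps waiting instead of aborting.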

7.2.1 Return codes from the ipmitool "hpm upgrade" command

The "hpm upgrade" command returns a "0" return code on success and a "-1" return code for any type of failure. For more error information, check /var/log/notice.log on the system where ipmitool was run; it contains an error message that corresponds to the "-1" returned by ipmitool.

Below are possible error messages that can be generated for a failure in the command:

HpmfwupgValidateImageIntegrity: Validate Image failure = "Invalid MD5 signature" or "Invalid image signature" or "Unrecognized image version" or "Invalid header checksum"

HpmfwupgPreparationStage: Performing Preparation Stage = "Invalid image file for manufacturer" or "Invalid image file for product" or "Invalid device ID"

Version not compatible for upgrade ="Version: Major x1, Minor: y1 Not compatible with Version: Major: x2 Minor: y2"

HpmfwupgPreUpgradeCheck: Pre-upgrade check

HpmfwupgUpgradeStage: Upgrade Stage - Activation = "Self test failed: Result1 = xx, Result2 = yy"
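
When the update is run from a script, the return code can be checked directly. The following is a minimal sketch; the wrapper name run_hpm_upgrade is hypothetical. A shell reports exit statuses in the range 0 to 255, so a process that exits with -1 is seen as status 255; the sketch therefore treats any nonzero status as a failure.

```shell
# Run the given command (for example the "hpm upgrade" line from section 7.2)
# and report the outcome based on its exit status.
run_hpm_upgrade() {
    if "$@"; then
        echo "hpm upgrade succeeded"
        return 0
    else
        status=$?
        echo "hpm upgrade failed (exit status $status); check /var/log/notice.log" >&2
        return "$status"
    fi
}

# Example use (placeholders as in section 7.2):
#   run_hpm_upgrade ipmitool -H <BMC IP> -U ADMIN -I lanplus -P admin \
#       -z 15000 hpm upgrade <xxxxx.hpm> force
```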

7.3 Installing ipmitool on Ubuntu

Open Power requires SourceForge ipmitool level v1.8.17 to execute on the OP820 firmware. The level of ipmitool shipped with Ubuntu 16.04 is lower than this, but no problems were found when testing it. This lower level may be used at the user's discretion, knowing that v1.8.17 is the supported level. The following command installs ipmitool from an Ubuntu distribution:

sudo apt-get install ipmitool
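
To see whether the installed ipmitool meets the supported level, its reported version can be compared against 1.8.17. The following is a minimal sketch; the function names are hypothetical, and it assumes that "ipmitool -V" prints a line of the form "ipmitool version x.y.z" and that "sort -V" (GNU coreutils version sort) is available.

```shell
# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from "ipmitool -V" style text on stdin.
ipmitool_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example use:
#   installed=$(ipmitool -V | ipmitool_version)
#   version_ge "$installed" 1.8.17 || echo "ipmitool $installed is below 1.8.17"
```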

7.4 Updating the System Firmware using the BMC Advanced System Management (ASM)

One method to update the System Firmware on the LC server is to use the Advanced System Management browser GUI. The Chrome browser must be used for this method because, in this release, the firmware update fails when Firefox or IE is used.

Note: You can find the latest steps for this procedure at the IBM Knowledge Center:

http://www.ibm.com/support/knowledgecenter/POWER8/p8ei8/p8ei8_update_firmware_bmc.htm

  1. First, connect to the BMC Service Processor Interface. Use your browser to access the BMC service processor at its configured IP address.

  2. After the successful login, the "Advanced System Management Dashboard" is displayed. This is the common screen for multiple activities that can be performed, such as configuration, FRU information, and firmware updates. General information regarding the current power consumption, sensor monitoring, and event logs is displayed.

  3. Select the Firmware Update Menu.

  4. Select the correct firmware update image type. Select the HPM type for firmware updates. This is the only type provided by the IBM Fix Central site, from which the image would have been downloaded to your workstation earlier.

  5. Select the firmware update file from the location where it was stored when it was downloaded by the web browser.

  6. When the correct firmware image is selected, the GUI shows a list of components that will be updated. By default, all the components are selected. When the Proceed button is pressed, the firmware update is performed.

  7. After the firmware update is completed, the system performs a reboot.

7.5 Getting I/O Adapter fixes

Use the IBM Knowledge Center to learn how to obtain and apply I/O adapter firmware fixes.

https://www.ibm.com/support/knowledgecenter/POWER8/p8ei8/p8ei8_update_io_kick.htm

8.0 Troubleshooting

Use the IBM Knowledge Center to learn how to troubleshoot LC server problems:

http://www.ibm.com/support/knowledgecenter/POWER8/p8hdx/p8_troubleshootingsystem_8335_gtb.htm

8.1 Beginning troubleshooting and problem analysis

Use the IBM Knowledge Center as a starting point for analyzing problems:

http://www.ibm.com/support/knowledgecenter/POWER8/p8ei3/p8ei3_kickoff.htm

8.2 Supporting diagnostics

Use the IBM Knowledge Center to supplement your problem analysis skills with resources that help you identify, diagnose, and report system issues:

 http://www.ibm.com/support/knowledgecenter/POWER8/p8ei8/p8ei8_diags_kickoff.htm

9.0 System Management

The service processor, or baseboard management controller (BMC), provides an operating system-independent layer that uses the robust error detection and self-healing functions that are built into the POWER8 processor and memory buffer modules. Open Power Abstraction Layer (OPAL) is the system firmware that provides this service in the stack of POWER8 processor-based Linux-only servers.

9.1 BMC Service Processor IPMI and ASM Access

The service processor, or baseboard management controller (BMC), is the primary control for autonomous sensor monitoring and event logging features on the 8335-GTB server. The BMC supports the Intelligent Platform Management Interface (IPMI) for system monitoring and management. The BMC monitors the operation of the firmware during the boot process and also monitors OPAL for termination. The firmware code update is supported through the BMC and Intelligent Platform Management Interface (IPMI) and the Advanced System Management (ASM) console. The ASM console is accessed using a web browser with an "http:" connection to port. See section 1.2 for the supported browsers that can be used with ASM.

For more information on using the BMC ASM, see the IBM Redbook PDF file for the IBM Power System S822LC Technical Overview and Introduction, section 2.3 "Reliability, availability, and serviceability": http://www.redbooks.ibm.com/redpieces/pdfs/redp5405.pdf

9.2 Open Power Abstraction Layer (OPAL)

The Open Power Abstraction Layer (OPAL) provides hardware abstraction and run time services to the running host Operating System.

For the 8335-GTB, Ubuntu or Red Hat Linux may be installed to run on OPAL Bare Metal (F/C #EC16).

Find out more about OPAL skiboot here:

https://github.com/open-power/skiboot

9.3 Intelligent Platform Management Interface (IPMI)

The Intelligent Platform Management Interface (IPMI) is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. It is the default console to use. The 8335-GTB provides one 10M/100M baseT IPMI port.

The ipmitool is a utility for managing and configuring devices that support IPMI. It provides a simple command-line interface to the service processor. You can install the ipmitool from the Linux distribution packages in your workstation, sourceforge.net, or another server (preferably on the same network as the installed server). For example, in Ubuntu, use this command:

$ sudo apt-get install ipmitool

For installing ipmitool from sourceforge, please see section 1.1 "Minimum ipmitool Code Level".

For more information about ipmitool commands, use the built-in help:

# ipmitool help

# ipmitool channel help

For a list of common ipmitool commands and help on each, you may use the following link:

https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabpcommonipmi.htm

To connect to your host system with IPMI, you need to know the IP address of the server and have a valid password. To power on the server with the ipmitool, follow these steps:

  1. Open a terminal program.

  2. Power on your server with the ipmitool: ipmitool -I lanplus -H fsp_ip_address -P ipmi_password power on

  3. Activate your IPMI console: ipmitool -I lanplus -H fsp_ip_address -P ipmi_password sol activate

For more help with configuring Linux on a Power Systems server, see the following: https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabpusingipmi.htm

9.4 Petitboot bootloader

Petitboot is a kexec-based bootloader used by IBM POWER8 systems that run the OPAL firmware.

After the POWER8 system powers on, the petitboot bootloader scans local boot devices and network interfaces to find boot options that are available to the system. Petitboot returns a list of boot options that are available to the system. If you are using a static IP or if you did not provide boot arguments in your network boot server, you must provide the details to petitboot.

You can configure petitboot to find your boot options with the following instructions:

https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootadvanced.htm

You can edit petitboot configuration options, such as the amount of time before petitboot automatically boots, with these instructions:

https://www.ibm.com/support/knowledgecenter/linuxonibm/liabp/liabppetitbootconfig.htm

You can read more about the petitboot bootloader program here:

https://www.kernel.org/pub/linux/kernel/people/geoff/petitboot/petitboot.html

10.0 Quick Start Guide for Installing Linux on LC servers

This guide helps you install Ubuntu on a Linux on Power Systems server.

Overview

Use the information found in http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwkickoff.htm to install Linux (Ubuntu) on a non-virtualized or bare metal IBM Power LC server.

The Ubuntu installer is available for download for specific Linux on Power Systems.

For information about which systems support Ubuntu, see Supported Linux distributions for POWER8 Linux on Power systems: http://www.ibm.com/support/knowledgecenter/linuxonibm/liaam/liaamdistros.htm

11.0 Change History

Date          Description

07/01/2019    Update for OP820.30
01/08/2018    Update for OP820.21
04/28/2017    Update for OP820.10
11/03/2016    Update for OP820.01
09/26/2016    New for LC server 8335-GTB – OP820.00 release