TOLIS Group Knowledge Base

BRU Server Troubleshooting - Tape-Related Guidance

Posted: 29 Feb, 2008

BRU Server Identifies Hardware Issues

  • During Installation
    • Determines available tape drives and library via kernel interface
      • If the system knows your device, BRU Server knows your device
      • No device drivers required - future-proof compatibility
      • Automatically matches drives with libraries - supports multiple devices without extra drivers or licenses
  • During Operation
    • Communicates with devices
      • During initialization
      • During backup, verify, or restore
      • During media changes
      • ... but NOT when idle
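
As an illustration of "if the system knows your device, BRU Server knows your device", the following Python sketch lists the tape drives and libraries that a Linux kernel reports in /proc/scsi/scsi. It is not BRU Server's own detection code. The function name find_tape_devices and the sample listing (vendor, model, and IDs) are made up for illustration, and the parser assumes the usual Host/Vendor/Type layout of that file.

```python
import re

# Illustrative sample only; real vendor, model, and ID values will differ.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: Disk 1000        Rev: 0001
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 04 Lun: 00
  Vendor: EXAMPLE  Model: LTO-4 Drive      Rev: 0002
  Type:   Sequential-Access                ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 05 Lun: 00
  Vendor: EXAMPLE  Model: Autoloader 24    Rev: 0003
  Type:   Medium Changer                   ANSI  SCSI revision: 05
"""

# SCSI device types of interest, as printed in the "Type:" field.
TAPE_TYPES = {"Sequential-Access": "tape drive", "Medium Changer": "library"}

HOST_RE = re.compile(r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(.*)")
TYPE_RE = re.compile(r"Type:\s*(.*?)\s+ANSI")


def find_tape_devices(proc_text):
    """Return one dict per tape drive or library found in /proc/scsi/scsi text."""
    devices = []
    current = None
    for raw in proc_text.splitlines():
        line = raw.strip()
        host = HOST_RE.match(line)
        if host:
            # A "Host:" line starts a new device entry.
            current = {
                "host": host.group(1),
                "channel": int(host.group(2)),
                "id": int(host.group(3)),
                "lun": int(host.group(4)),
            }
            continue
        if current is None:
            continue
        vendor = VENDOR_RE.match(line)
        if vendor:
            current["vendor"] = vendor.group(1)
            current["model"] = vendor.group(2)
            continue
        dev_type = TYPE_RE.match(line)
        if dev_type:
            # The "Type:" line ends the entry; keep it only if it is tape-related.
            kind = TAPE_TYPES.get(dev_type.group(1))
            if kind:
                current["kind"] = kind
                devices.append(current)
            current = None
    return devices


if __name__ == "__main__":
    try:
        with open("/proc/scsi/scsi") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when the file is not available
    for dev in find_tape_devices(text):
        print(dev["kind"], dev.get("vendor"), dev.get("model"),
              "id", dev["id"], "lun", dev["lun"])
```

If the listing is empty on a real system, the operating system itself does not see the device. The cause is then more likely power-on order, cabling, termination, or zoning than BRU Server.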

Important Considerations

  • Incompatible competitor products
    • Both RetroSpect® and BakBone® take exclusive locks on the tape devices while running, which prevents BRU Server from accessing them.
    • BRU Server will recognize the devices, but the initial device scan or an access attempt will fail.
    • If you install BRU Server onto a system that is currently running either product, that product must be removed and its daemon processes stopped.
  • Low-quality SCSI or Fibre-Channel cables
    • You might save money on low-cost cables, but they will usually fail during the high-demand I/O that tape access requires.
    • Be sure to match your cables with your environment
      • Ultra SCSI II/III - 3 meter max cable length, active or passive termination.
      • Single Ended (SE) - 3 meter max cable length, active termination required.
      • Low Voltage Differential (LVD) - 12 meter max cable length, active termination required.
      • If you mix device types, you must use the shortest cable limit and the strictest termination requirement among them.
      • Usually dictated by your SCSI HBA
    • Attach tape devices to their own SCSI bus/channel
      • Because modern tape drives stream data at over 130 MB/sec, a single LTO-4 drive will saturate an Ultra 320 SCSI channel during maximum I/O.
      • Discs on the same channel will get priority and actually cause a backup to run slower.
    • Use dedicated Fibre-Channel zoning
      • Tape devices are generally not multi-initiator aware or capable.
      • A dedicated zone keeps your backup traffic away from normal SAN disc activity.
    • Running a partitioned library in a multi-vendor environment
      • While partitioning a library allows it to be shared with multiple backup operations, be sure that the non-BRU Server software releases the robot when it is not in use.
      • Do not share slots.  Dedicate sections of the library to each software package and do not mix media.
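
The cable rule for mixed device types can be sketched as a small lookup. This is only an illustration: the table repeats the limits listed above, the names BUS_LIMITS and mixed_bus_limits are hypothetical, and the sketch assumes that the mixed-bus rule means the shortest cable limit combined with active termination whenever any device type requires it.

```python
# Limits as listed in this article:
# bus type -> (max cable length in meters, active termination required?)
BUS_LIMITS = {
    "Ultra": (3.0, False),  # Ultra SCSI II/III: active or passive termination
    "SE": (3.0, True),      # Single Ended: active termination required
    "LVD": (12.0, True),    # Low Voltage Differential: active termination required
}


def mixed_bus_limits(device_types):
    """Return (max cable length in meters, termination) for a bus with these types."""
    if not device_types:
        raise ValueError("at least one device type is required")
    max_length = min(BUS_LIMITS[t][0] for t in device_types)
    needs_active = any(BUS_LIMITS[t][1] for t in device_types)
    return max_length, "active" if needs_active else "active or passive"


if __name__ == "__main__":
    # An LVD tape drive sharing a bus with an SE device drops to the SE limits.
    print(mixed_bus_limits(["LVD", "SE"]))
```

Under these assumptions, putting a single SE device on an LVD chain reduces the usable cable length from 12 meters to 3 meters, which is one more reason to give tape devices their own bus.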

Common Symptoms & Solutions

  • BRU Server doesn't identify any tape or library devices
    • Check System Profiler on Mac OS X, or /proc/scsi/scsi on Linux, to confirm that the operating system itself sees the device.
    • Be sure the device was powered on BEFORE the system was powered up.
    • Check the cable and terminator if SCSI and check for proper zoning in Fibre-Channel setups.
  • Backups fail with write errors after a few MB
    • This is usually caused by a bad SCSI or FC environment.
      • Check cables and terminators - terminators do age and can start to fail after three or four years of use.
      • Improper cable lengths - double-check the SCSI HBA for bus type (Ultra, SE, LVD/SE).
    • Less often, can be caused by bad media
      • Try a different tape.
      • BRU Server may report a TapeAlert condition.
    • Dirty drive assembly
      • Run cleaning cycle. May require more than one pass.
      • BRU Server will usually report a TapeAlert condition.
    • Reusing tape previously written by another software package
      • Some backup applications use a partitioning scheme to segment data on tape.
      • Use a different tape.
      • Use the original software to erase the tape.
      • Use an LTO/VXA tool to erase the tape.
  • Verification fails and asks for tape n+1 when only n tapes were used in the backup
    • This is usually caused when the BRU I/O engine runs into a hard read error on a previously written tape.  Because no end of archive (EOA) block was seen, the I/O engine assumes there is additional media.  BRU Server will report "Unexpected tape requested."
      • You may rerun the verify pass to double-check the failure.
      • Initially suspect the media and the transmission layer (cables, terminators, connectors).
      • Check the system log for bus or device resets.
      • Check power to the library.
    • Can be caused by unexpected human intervention
      • Library or drive disconnected, reset, or powered off.
      • Tape manually ejected from the drive during the verify operation.
  • E-mail is not sent when interaction is required
    • Interaction e-mail is sent to the Admin user's e-mail account.  Check the preferences in the Console for the Admin e-mail address.
  • E-mail to listed users only goes to the first user listed
    • Check that the addresses are separated by a comma and a space.
      • account@server.com, another@server.com
    • Check that additional e-mail addresses are valid.
    • Check the job completion report for server connection failures.
  • No e-mail is sent on job completion or error
    • Check the job completion report for server connection failures.
      • Usually caused by the MX record not being available for the listed mail server. Check the DNS assignments.
      • Can be caused when the mail server refuses the connection from the BRU Server system because it is unknown. Check DNS or configure /etc/hosts on the mail server to recognize the BRU Server system's IP address.
  • Backups fail with message "hardware has changed since last scan"
    • This is usually caused when devices are not powered on when the BRU Server daemon is started.  Be sure to power on all tape drives and libraries before starting the server daemon.
    • Can also be caused by the assigned Fibre-Channel ID changing since the last time the device was scanned.  This is usually because of zoning problems or new devices being added to the SAN zone.
    • Sometimes caused if a device's hardware ID (SCSI ID, LUN, serial number, or inquiry string) has changed.
    • Most of these can be resolved by rescanning the hardware with the BRU Server Config Tool application or from the Text Console (bru-server.cmd). The rescan cannot be run from the GUI Console.
  • Interaction note indicates that the tape cannot be returned to its original slot
    • This happens either when the library doesn't know which slot the tape came from, or when every slot is full and a tape is still in the drive (one more tape than slots).
      • The first usually occurs if there is a tape in a drive when the library is power cycled or reset.  When this occurs, the library will report the tape in the drive as coming from non-existent slot "0".
      • The second occurs when there is a tape in a drive and the slots are refilled.
      • Both require human intervention before BRU Server can continue.
  • Previously written tapes are reported as non-BRU tapes
    • When reading the header of the tape, the process of loading the drive can take so long that the read attempt times out.  This results in a read error.
      • It is very important that the Online, Eject, and SCSI timeout values be set properly for the environment. Check our web site for examples for different library and drive types.
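
For the e-mail symptom above, where only the first listed user receives mail, a recipient list can be checked for separator mistakes before it is entered in the Console. The following Python sketch is a rough illustration only. The function name check_recipient_list is hypothetical, and the address check is deliberately simple (exactly one "@" with text on both sides), not a full validation of e-mail syntax.

```python
def check_recipient_list(recipients):
    """Return a list of problems found in a comma-and-space separated address list."""
    problems = []
    for entry in recipients.split(", "):
        if not entry or any(ch in entry for ch in " \t,;"):
            # Leftover separators or whitespace mean the list was not
            # written as "address, address".
            problems.append("bad separator near: %r" % entry)
            continue
        local, at, domain = entry.partition("@")
        if not (local and at and domain) or "@" in domain:
            problems.append("does not look like an address: %r" % entry)
    return problems


if __name__ == "__main__":
    # The form shown in this article passes; a missing space is flagged.
    print(check_recipient_list("account@server.com, another@server.com"))
    print(check_recipient_list("account@server.com,another@server.com"))
```

An empty result only means that the list is well formed. Delivery problems caused by DNS or by the mail server refusing the connection still show up in the job completion report.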

 
