Emergency Procedures and Trouble-Shooting Tips


Campus Computing · University of Missouri - Columbia · 200 Heinkel Building · 314-882-2000

Update 16 August 1994 by ccpaulh@monad.missouri.edu. Subject to change!

Abstract
"Trouble-Shooting" is intended as a quick reference for problems that may occur at the MU Campus Computing NeXT Lab. See the document "Procedures" for more detailed references. As always, when all else fails, RTFM. If you don't know what RTFM means, you will probably make matters worse by trying to fix the serious problems described here: get help!

Sources of Information for Trouble-Shooting

See document Procedures!

See also files /LocalLibrary/Questions and especially /LocalLibrary/NextAnswers. These are indexed by Digital Librarian. Drag their Icons to the Digital Librarian Shelf if you want to add them to your search, or try double-clicking:

/LocalLibrary/Bookshelves/Everything.bshlf.

Note that /LocalLibrary/NextAnswers contains both the general-interest NextAnswers and the more hardware-related "bulletin".

One station is not responding, but others seem ok

To try to save unsaved files, approach the problem gradually. There are several degrees of hung-ness.

Symptom: One application, for example Mathematica, is not responding, although other windows work ok.

Try this: In order of increasing damage...

Symptom: All applications are frozen, cursor won't move.

Try this: In order of increasing damage...

No station except server can do anything useful: Network down

This is generally caused by a problem in the ethernet cable connections. Ethernet cable must not be bent sharper than about a six-inch radius. Ethernet is a bus network, and is terminated by a resistor "terminator" at each end of the network. Each station must be connected via a T-connector, not plugged directly in. If either of the two terminators is off, or there is any gap at any point in the cable, then all network communication will fail! Cables and connections are susceptible to kicking, crimping, and general unintentional abuse by users and those who clean the room. Try to keep the cables where this is least likely to occur.

Some indications of cable problems: Only the server seems to be fully functional. The loginwindow of a client shows "localhost" rather than the individual hostname.

To locate a faulty point in the cable, divide the network into smaller sets by moving the terminators. You may need to start with both terminators on the server, and spread out, or try a binary search along the length of the network. Use the "pinger" command to quickly determine when the network is functioning. This issues audible pings as long as the given workstation is connected to working ethernet. If pinger is not in your search path, enter the Unix command:

/usr/local/etc/pinger [workstationname]

Since you can't ping any workstation unless you can also ping your own, it is usually not necessary to specify a workstationname.
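The binary search along the cable can be sketched as a small shell helper. This is only an illustration, not an existing lab command: the names find_fault and segment_ok are hypothetical, and it assumes the stations are numbered 1..N in cable order with the server as station 1, that the network works with the server alone, and that it fails with all N stations attached.

```shell
# Ask the operator whether the shortened network works.
# Assumption: stations are numbered 1..N along the cable, server first.
segment_ok() {
    printf 'Terminate after station %s and run pinger. Does it ping? [y/n] ' "$1" >&2
    read answer
    [ "$answer" = y ]
}

# find_fault TOTAL
# Prints the number of the first station whose addition breaks the network.
find_fault() {
    lo=1        # assumed good: the server alone
    hi=$1       # known bad: the full network
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if segment_ok "$mid"; then
            lo=$mid     # fault lies beyond station mid
        else
            hi=$mid     # fault lies at or before station mid
        fi
    done
    echo "$hi"
}
```

For a lab of 12 stations, "find_fault 12" asks about four terminator positions at most. The faulty cable or connector is then between the station it prints and the one before it.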

Floppy Disk Problems

There have been several unresolved problems that seem to share a common cause: the use of DOS- or Macintosh-formatted diskettes. Perhaps these will be addressed in the next system release. The symptoms are that the system does not eject a floppy disk that was already inserted, or does not recognize a floppy disk when it is inserted. The cleanest way to fix this and minimize risk to the files on the floppy is to log off and reboot the machine (Command-~). Then, if necessary, log in and drag the disk to the recycler to eject it. If that doesn't work, use a paper clip to eject the disk, then reboot the machine so the next person won't have the same problem.

DOS-format filenames are limited to an eight-character name and a three-character extension. This can affect some long Unix and NeXT filename conventions. For example, a WriteNow file that contains graphics is stored as a directory whose members on the NeXT have names like WNDocument.wn and WNGraphic.123456.eps. If these are copied to a DOS-format diskette, the names (but not the contents) of the files will be truncated, and the document will not open properly. Move them back to the hard disk and fix up the names with the Unix mv command.
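Before copying to a DOS diskette, it can help to check which names will not survive. The following is a minimal sketch, not a lab-supplied tool: the function name fits_dos_8_3 is hypothetical, and it checks only the 8.3 length rule, not the other DOS restrictions on characters or case.

```shell
# fits_dos_8_3 NAME
# Succeeds if NAME has at most one dot, a base of 1 to 8 characters,
# and an extension of at most 3 characters.
fits_dos_8_3() {
    name=$1
    case $name in
        *.*.*) return 1 ;;          # more than one dot cannot be 8.3
    esac
    base=${name%%.*}                # text before the first dot
    case $name in
        *.*) ext=${name#*.} ;;      # text after the dot
        *)   ext= ;;                # no extension at all
    esac
    [ ${#base} -ge 1 ] && [ ${#base} -le 8 ] && [ ${#ext} -le 3 ]
}
```

Used in a loop such as: for f in *; do fits_dos_8_3 "$f" || echo "will be truncated: $f"; done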

Using Macintosh disks is somewhat easier since the filenames are not so restricted. Macintosh filenames can be at most 31 characters long.

If one creates files on a PC and transports them via diskette to a NeXT, each line may contain an extra carriage return character (^M, octal 015). There may also be a ^Z at the end. These won't appear in Edit but may in vi and emacs. To remove all ^M and ^Z characters from a file, open a Unix shell (for example by double-clicking the Terminal application), then enter:

flip -u filename

Enter "man flip" for information.
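If flip is not available on a given machine, the standard tr command can do roughly the same job. This is a sketch of an alternative, not a description of how flip works; the function name strip_dos_chars is hypothetical. It deletes every carriage return (octal 015) and every ^Z (octal 032) in its input.

```shell
# Filter: copy stdin to stdout, dropping all ^M (015) and ^Z (032) bytes.
strip_dos_chars() {
    tr -d '\015\032'
}
```

Unlike "flip -u filename", this does not edit the file in place. Write to a new file and then rename it, for example: strip_dos_chars < report.txt > report.unix (the file names here are only examples).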

Users can't create disk files: Hard Disk Full

A hard disk partition on the server or a hard disk on one of the clients can fill.

Quick Checklist:

Preventative and General Corrective Measures:

Start worrying if the Unix "df" command indicates a disk is over 90% full. Users are more likely to notice the File Viewer message "1.0MB available". When the system reaches 100% disk usage, ordinary users can't log in. Root can log in and will have some free disk space reserved for root's use that is not counted by "df".
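The 90% rule of thumb can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name disks_over is hypothetical, and it assumes df prints one filesystem per line with the capacity in the fifth column and the mount point in the sixth, as in the sample df output elsewhere in this document.

```shell
# disks_over [LIMIT]
# Reads df output on stdin and prints "mountpoint capacity" for every
# filesystem whose capacity is above LIMIT percent (default 90).
disks_over() {
    limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {                    # skip the header line
            pct = $5
            sub(/%/, "", pct)       # "82%" -> "82"
            if (pct + 0 > limit) print $6, $5
        }'
}
```

Typical use is "df | disks_over 90". On current Unix systems "df -P" gives the most predictable column layout.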

The simplest expedient is to just reboot the server and/or clients. Press the Command and ~ (keypad) keys at the same time, and reply "r" to the prompt. This will reset the virtual storage swapping area (/private/vm/swapfile) to its usual size (16M or 20M). The swapfile can get large if someone allocates a large amount of virtual memory on a given machine, say by trying to edit a 30M file, or often by Mathematica's normal operation. Rebooting also clears /tmp. It may also clear out hanging printer-spool files, but don't count on this. Of course, rebooting won't solve problems caused by users taking too much space, and a few other things.

The server disk is currently divided into two partitions:

sd0a for system files primarily, and

sd0b for user files primarily.

Most sd0a files and directories are read-only with respect to ordinary users, but there are some notable exceptions. For example, /private/spool/mail is write-exported to all clients so each user can get his or her mail. Other items in /private/spool may also be written by server daemons such as the line-printer daemon. Of course, to confuse things further, there may be hard or symbolic links.

Finding Users With Excessive Disk Usage

User home directories are currently on the server. Corrective measures should be executed from the server machine.

Look at /LocalLibrary/Misc/Avid_Disk_Users. This file (formerly named DiskPigs) summarizes the home directory files of the top 10 users of disk space, as determined by the nightly maintenance procedure (/usr/local/etc/daily.maint.server). To see the complete summary of all users, see file /LocalLibrary/Misc/DiskUsage.

Avid_Disk_Users summarizes usage at the level homedirectory/*. For detail, you might need to do something like:

# cd ~c543210

# ls -lR

A common problem caused by naive users is a large system folder or file copied into their own directory. They probably meant to put it on their shelf. Look for files like /NextApps or Mathematica.app in the user's directories, and remove them.

If Mailboxes contains large attachments, consider removing them outright, warning the user, or both. Sound attachments in a mailbox have the filename form "VoiceMail_useridxx.vox".

Watch for .gif, .tiff, .snd, .vox, and other files that have a needfulness/size ratio approaching zero!
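A find command can do this watching for you. The following is a sketch, not one of the lab's maintenance scripts: the function name big_media and the default 100K threshold are assumptions chosen for illustration. It lists files with the extensions named above that exceed a given size.

```shell
# big_media DIR [MIN_KB]
# Lists .gif, .tiff, .snd and .vox files under DIR larger than MIN_KB
# kilobytes (default 100). find's -size counts 512-byte blocks, hence "* 2".
big_media() {
    dir=$1
    min_kb=${2:-100}
    find "$dir" -type f \
        \( -name '*.gif' -o -name '*.tiff' -o -name '*.snd' -o -name '*.vox' \) \
        -size +$((min_kb * 2)) -print
}
```

For example, "big_media ~c543210 500" lists that user's image and sound files larger than about 500K.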

Users who are simply greedy or lazy can be encouraged to copy their files to diskette, or to use the Unix "compress" command to at least economize on hard disk space. We have some local commands "pack" and "unpack" (in /usr/local/bin) that will compress a file or the contents of a directory and preserve permissions, etc. Note the policies stated in /LocalLibrary/MizzouInfo. One user cannot be allowed to prevent others from using the system, even if their disk usage is relatively academic, but especially if the stored files have no academic merit. In a really bad case, I could consider copying files to a diskette (in compressed form), removing them from disk, and changing the user's password to force them to come negotiate.

/usr/local/etc/

/LocalLibrary/Misc.

Typical File Sizes on a NeXT WorkStation

13 April 1992

Use these numbers for comparison with a client whose disk is full. This applies not only to a client NeXT, but also to the sd0a "system files" partition of the server!

The "Usual Suspects"

All of these apply to the server as well, but then you also must check for users abusing space.

To generate a report of disk usage on all clients, log on to the server as a wheel group user other than root, and do:

muebnx1> ClientDiskUsage

This updates /LocalLibrary/Misc/ClientDiskUsage.

Inspecting a single client for disk usage

# How much space do we have on this 105MB NeXT workstation?

muebnx18# df /dev/sd0a

Filesystem kbytes used avail capacity Mounted on

/dev/sd0a 98442 77075 16444 82% /

# Take off devices that are NFS mounted. Remount them with "mount -a" when done!

# umount -a and mount -a can be done only from root.

muebnx18# umount -a

/bin: Device busy
/lib: Device busy
/usr/bin: Device busy
/usr/lib: Device busy
/usr/ucb: Device busy
/NextLibrary: Device busy

muebnx18# du -s /*

1 /Net
1682 /NextAdmin
7248 /NextApps
3365 /NextDeveloper
99427 /NextLibrary
67 /NextTour
3 /Users
2221 /bin
1 /cores
0 /dev
0 /etc
3578 /lib
8 /lost+found
0 /mach
44 /me
704 /odmach
21342 /private
704 /sdmach
0 /tmp
37218 /usr

muebnx18# cd /private

muebnx18# du -s *

1 Net
93 adm
10 dev
561 etc
1 preserve
spool/mail/.NextTrash/..: Permission denied (mail is not universally readable)
51 tftpboot
29 tmp
20505 vm

muebnx18# cd /private/spool

muebnx18# du -s *

10 NeXT
1 NeXTFaxes
1 appkit
3 at
1 lpd
1 lpd.lock
mail/.NextTrash/..: Permission denied
29 mqueue
10 uucp
1 uucppublic

muebnx18# cd /private/adm

muebnx18# du -s *

0 aculog
0 aculog.old
1 daily
0 daily.log
0 daily.log.old
18 lastlog
21 lastlog.old
2 lpd-errs
2 lpd-errs.old
16 messages
16 messages.old
1 monthly
0 monthly.log
0 monthly.log.old
0 msgbuf
4 psout
0 rcs.log
0 software_version
1 weekly
0 weekly.log
0 weekly.log.old
2 wtmp
8 wtmp.old
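Long du listings like the ones above are easier to read when sorted by size. The following is a minimal sketch; the function name top_usage is hypothetical. It keeps only the largest entries of whatever du output is piped into it.

```shell
# top_usage [COUNT]
# Reads "size name" lines (as printed by du -s) on stdin and prints the
# COUNT largest entries (default 5), biggest first.
top_usage() {
    sort -rn | head -n "${1:-5}"
}
```

For example, "du -s * | top_usage 3" run in /private on the client shown here would put vm (the swapfile directory) at the top.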

Users can't access a Public File

The default permission for a new file is read-write by user only, no one else. Thus, even if a new file is placed in or copied to a public directory, it may not be readable. To fix the file, log in as the Instructor-owner or as superuser, and use "chmod -R a+r filename". If the instructor seems to have trouble remembering to do this, he or she can reset the default file creation mask via Preferences "Unix Expert" and by setting the umask in ~/.cshrc (for example, "umask 022"); then they'll have to be cautious that files that are not to be public are appropriately protected.
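The effect of the file creation mask can be seen with a short experiment. This is only an illustrative sketch; the function name mode_under_umask is hypothetical. It creates an empty file under a given umask and prints the resulting permission string.

```shell
# mode_under_umask MASK FILE
# Creates FILE (which must not already exist) under umask MASK and prints
# its permission string as shown by ls -l.
mode_under_umask() {
    mask=$1
    file=$2
    # Subshell, so the caller's own umask is left unchanged.
    ( umask "$mask" && : > "$file" )
    ls -l "$file" | cut -c1-10
}
```

With a mask of 077 the new file comes out as -rw------- (private, the lab default described above). With 022 it comes out as -rw-r--r-- (readable by everyone).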

Check also that the directories in the path to a public file are all searchable (x), even if they are not readable (r).
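Checking each directory in a long path by hand is tedious. The following is a rough sketch, not a lab-supplied command: the function name unsearchable_dirs is hypothetical, and it looks only at the "other" search bit in the ls -ld mode string, ignoring group membership and any access control beyond the basic mode bits.

```shell
# unsearchable_dirs FILE
# Prints every directory on the path to FILE that users other than the
# owner and group cannot search.
unsearchable_dirs() {
    dir=$(dirname "$1")
    while :; do
        # Character 10 of the mode string is the search bit for "other".
        bit=$(ls -ld "$dir" | cut -c10)
        case $bit in
            x|t) ;;                 # searchable (t = sticky and searchable)
            *) echo "$dir" ;;
        esac
        case $dir in
            /|.) break ;;           # reached the top of the path
        esac
        dir=$(dirname "$dir")
    done
}
```

Any directory it prints can be opened up with "chmod a+x directory", provided it really is meant to be public.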

If the network or file server is out (or slow) this would also make files unavailable (or slow coming).


Questions and comments concerning these policies should be directed to the NeXT Lab site consultant or the MU Campus Computing "NeXT Lab Co-ordinator", 882-5000.