The nicinfo utility is usually the first debug tool that you'll use (aside from ifconfig) when problems with networking occur. It tells you whether the driver has properly negotiated at the link layer and whether it's sending and receiving packets.
Ensure that the slogger2 daemon is running, and then after the problem occurs, run the slog2info utility to see if the driver has logged any diagnostic information. You can increase the amount of diagnostic information that a driver logs by specifying the verbose command-line option to the driver. Many drivers support various levels of verbosity; you might even try specifying verbose=10.
Let's look at the output from the nicinfo utility. Here's a typical example:
Physical Node ID ........................... 000102 C510D4
Current Physical Node ID ................... 000102 C510D4
Current Operation Rate ..................... 100.00 Mb/s full-duplex
Active Interface Type ...................... MII
Active PHY Address ......................... 3
Power Management State ..................... Active
Maximum Transmittable data Unit ............ 1514
Maximum Receivable data Unit ............... 1514
Receive Checksumming Enabled ............... TCPv6
Transmit Checksumming Enabled .............. TCPv6
Hardware Interrupt ......................... 0x5
DMA Channel ................................ 0
I/O Aperture ............................... 0xd400 - 0xd47f
ROM Aperture ............................... 0
Memory Aperture ............................ 0xe6000000 - 0xe6000FFF
Promiscuous Mode ........................... Off
Multicast Support .......................... Enabled
Packets Transmitted OK ..................... 104
Bytes Transmitted OK ....................... 10067
Broadcast Packets Transmitted OK ........... 6
Multicast Packets Transmitted OK ........... 1
Memory Allocation Failures on Transmit ..... 0
Packets Received OK ........................ 1443
Bytes Received OK .......................... 168393
Broadcast Packets Received OK .............. 427970
Multicast Packets Received OK .............. 37596
Memory Allocation Failures on Receive ...... 0
Single Collisions on Transmit .............. 0
Multiple Collisions on Transmit ............ 0
Deferred Transmits ......................... 0
Late Collision on Transmit errors .......... 0
Transmits aborted (excessive collisions) ... 0
Transmits aborted (excessive deferrals) .... 0
Transmit Underruns ......................... 0
No Carrier on Transmit ..................... 0
Jabber detected ............................ 0
Receive Alignment errors ................... 0
Received packets with CRC errors ........... 0
Packets Dropped on receive ................. 0
Ethernet Headers out of range .............. 0
Oversized Packets received ................. 0
Frames with Dribble Bits ................... 0
Total Frames experiencing Collision(s) ..... 0
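If you want to check these counters from a script, output of this shape can be read into a dictionary. The following is a minimal sketch, assuming that each field arrives on its own line as a label, a run of dots, and a value; the function name parse_nicinfo and the sample text are illustrative and aren't part of nicinfo itself.

```python
import re

# A label, a space, a run of at least three dots, then the value.
FIELD_RE = re.compile(r"^\s*(?P<label>.+?)\s\.{3,}\s*(?P<value>.*?)\s*$")


def parse_nicinfo(text):
    """Return {label: value} for every 'Label ..... value' line in text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("label")] = match.group("value")
    return fields


# A few lines copied from the example output above.
sample = """\
Current Operation Rate ..................... 100.00 Mb/s full-duplex
Packets Transmitted OK ..................... 104
Transmits aborted (excessive collisions) ... 0
Received packets with CRC errors ........... 0
"""

stats = parse_nicinfo(sample)
# Values stay as strings; convert counters with int() where needed.
crc_errors = int(stats["Received packets with CRC errors"])
```

Keeping the values as strings avoids guessing at the type of fields such as the node ID or the aperture ranges.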
When you're dealing with a network problem, start with these:
The information includes the following:
If the value represented is FFFFFF FFFFFF or 000000 000000, there's likely something wrong with the setup of the hardware, or you need to assign a MAC address to the card. Check the hardware manual to see whether or not this is the case.
The first six digits of the MAC address are the vendor ID. Check the entries against the list at http://www.cavebear.com/archive/cavebear/Ethernet/vendor.html to see if the vendor ID is valid. Then check the card ID (the last six digits). The card ID should be something semi-random. A value such as 444444 is likely incorrect.
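The checks on the node ID described above can be sketched as a small helper. This is only an illustration: check_node_id is a hypothetical name, the "single repeated digit" test is one simple stand-in for "not semi-random", and the vendor ID lookup against a vendor list is left out.

```python
def check_node_id(node_id):
    """Return a list of warnings for a node ID such as '000102 C510D4'."""
    digits = node_id.replace(" ", "").replace(":", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        return ["not a 12-digit hexadecimal MAC address"]

    if digits in ("FFFFFFFFFFFF", "000000000000"):
        return ["all ones or all zeros: check the hardware setup "
                "or assign a MAC address"]

    warnings = []
    card_id = digits[6:]  # last six digits; the first six are the vendor ID
    if len(set(card_id)) == 1:
        # Assumption: a single repeated digit (e.g. 444444) is suspicious.
        warnings.append("card ID %s repeats one digit and is likely incorrect"
                        % card_id)
    return warnings
```

For the example output, check_node_id("000102 C510D4") returns an empty list.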
Another way of thinking about this is to compare it to a postal system, where if we wanted to pretend to be someone else, we would accept all mail from the Post Office. However, we would then have to sort all the mail. This would take a much longer time compared with the amount of time the Post Office would take to presort the mail, and give us only the mail addressed to us. For more information, see Promiscuous Mode, below.
The easiest way to illustrate this is to think of a road. If the road has two lanes, it's full-duplex, because cars can drive in both directions at the same time without obstructing the other lane. If the road has only a single lane, it's half-duplex, because there can be only one car on the road at a time.
When you examine the media rate, check the speed, the form of duplex, and what the hub supports. Not all hubs support full-duplex.
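As a rough illustration of that check, the sketch below splits a media rate string of the form shown in the example output ("100.00 Mb/s full-duplex") and flags a full-duplex link on a hub that doesn't support it. Both function names and the hub_supports_full_duplex parameter are hypothetical; you have to supply the hub's capability yourself, for example from its manual.

```python
def parse_media_rate(rate):
    """Split e.g. '100.00 Mb/s full-duplex' into (100.0, 'Mb/s', 'full')."""
    speed, unit, duplex = rate.split()
    return float(speed), unit, duplex.split("-")[0]


def duplex_mismatch(rate, hub_supports_full_duplex):
    """True if the NIC reports full-duplex but the hub can't do it."""
    _, _, duplex = parse_media_rate(rate)
    return duplex == "full" and not hub_supports_full_duplex
```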
Also, when a card is placed in promiscuous mode, a network MAC address can be spoofed (i.e., the card accepts all packets, whether they're addressed to it or not). Then on a higher (software) level, you can accept packets addressed to whomever you please. Promiscuous mode is disabled by default.
When a memory-allocation error occurs, the system is likely very low on memory. Make sure that there's sufficient memory on the system; if you continuously get this error, consider adding more memory. Another thing to check for is memory leaks on the system, which may be slowly consuming system memory.
The NIC uses carrier sense to determine that the network hasn't been used for a while, and then starts to transmit a frame of data. The problem occurs when two network cards check for the carrier sense and start to transmit data at the same time. This error is more common on busy networks.
When the NICs detect a collision, they stop transmitting and wait for a random period of time. The time periods are different for each NIC, so in theory, when the wait time has expired, the other NIC will have already transmitted or will be still waiting for its time to expire, thus avoiding further collisions.
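The text doesn't name the scheme used to pick the random wait. On half-duplex Ethernet, IEEE 802.3 specifies truncated binary exponential backoff, which the following sketch models. The function name backoff_slots is illustrative, and the result is a count of slot times, not a real hardware delay.

```python
import random

BACKOFF_LIMIT = 10   # the exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16   # the frame is dropped after 16 failed attempts


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the nth collision on a frame.

    After collision n, the wait is a random integer in
    [0, 2**min(n, 10) - 1]. At the 16th collision the transmit is aborted,
    which is what an "excessive collisions" counter records.
    """
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: transmit aborted")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)
```

Because each NIC draws its own random number, two NICs that collided once pick different waits about half the time, and the range doubles with every further collision.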
You can reduce this type of problem by introducing a full-duplex network.
Depending on the protocol, these types of errors can be detrimental to the protocol's overall throughput. For example, a 1% packet loss on the NFS protocol using the default retransmission timers is enough to slow the speed down by approximately 90%. If you experience low throughput with your networking, check to make sure that you aren't getting these types of errors. Typically, Ethernet adapters don't retransmit frames that have been lost to a late collision.
These errors are a sign that the time to propagate the signal across the network is longer than the time it takes for a network card to place an entire packet on the network. Thus, the offending system doesn't know that the network is currently in use, and it proceeds to place a new frame on the network.
The nodes that are trying to use the network at the same time detect the error after the first slot time of 64 bytes. This means that the NIC detects late collisions only when transmitting frames that are longer than 64 bytes. The problem with this is that, with frames smaller than 64 bytes, the NIC can't detect the error. Generally, if you experience late collisions with large frames on your network, you're very likely also experiencing late collisions with small frames.
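To put numbers on the slot time, the sketch below converts the 64-byte slot into a duration and classifies a collision by how many bytes had been sent when it was detected. It assumes 10 Mb/s or 100 Mb/s Ethernet, where the slot time is 64 bytes (512 bit times); the function names are illustrative.

```python
SLOT_TIME_BYTES = 64                    # slot time on 10/100 Mb/s Ethernet
SLOT_TIME_BITS = SLOT_TIME_BYTES * 8    # 512 bit times


def slot_time_us(rate_mbps):
    """Duration of one slot time in microseconds at the given link rate."""
    # 1 Mb/s is one bit per microsecond, so bits / (Mb/s) gives microseconds.
    return SLOT_TIME_BITS / rate_mbps


def is_late_collision(bytes_sent_when_detected):
    """True if the collision was detected after the first slot time."""
    return bytes_sent_when_detected > SLOT_TIME_BYTES
```

At 10 Mb/s the slot time works out to about 51.2 microseconds, so a signal that needs longer than that to cross the network and return can produce collisions that the sender sees only after the slot has passed.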
These types of errors are generally caused by Ethernet cables that are longer than the IEEE 802.3 specification allows or than the maximum length permitted for the particular type of cable, or by an excessive number of repeaters on the network between the two nodes.
Another thing to note is that these errors may actually be caused by a node on the network that has faulty hardware and is sending damaged frames that look like collision fragments. These damaged frames can sometimes appear to a network card to be a late collision.
If you're experiencing these sorts of errors, see if the network can be reduced in size, or introduce a strategically placed switch into the network to reduce the number of packets that are placed on the entire network. Switching to a full-duplex network also resolves these problems.
These errors are caused by plugging and unplugging cables on the network and by poor optical power supplied to the Fiber Optic Transceiver (FOT).
These errors are commonly due to faulty wiring, cable runs that are outside the IEEE 802.3 specification, a faulty NIC, or possibly a faulty hub or switch. To narrow down this problem, do a binary division of the network to help isolate the source.
The best way to try to solve Cyclic Redundancy Check (CRC) errors is to do a binary division of the systems on the network to determine which system is sending bad data. Once you've done that, you can start replacing the hardware piece by piece. Because this error is on the receiving end, it's difficult to determine if the CRC is bad on a sent packet.
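The binary division described above is a bisection search over the systems on the network. The sketch below shows the idea under two stated assumptions: exactly one system is sending bad data, and you have some way to test a group of systems in isolation. That test is represented by segment_has_errors, a hypothetical callback that you'd implement by hand (for example, by connecting only that group and watching the CRC error counter).

```python
def find_faulty_node(nodes, segment_has_errors):
    """Narrow a list of nodes down to the one that causes errors.

    segment_has_errors(group) must return True if errors appear when only
    the nodes in 'group' are connected. Assumes a single faulty node.
    """
    candidates = list(nodes)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still shows the errors.
        candidates = first if segment_has_errors(first) else second
    return candidates[0] if candidates else None
```

Each round halves the number of suspects, so a network of 16 systems needs at most four rounds of testing before you start replacing hardware.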