Troubleshooting a driver

Aside from ifconfig, the nicinfo utility is usually the first debugging tool that you'll use when networking problems occur. It tells you whether the driver has properly negotiated at the link layer and whether it's sending and receiving packets.

Ensure that the slogger2 daemon is running, and then after the problem occurs, run the slog2info utility to see if the driver has logged any diagnostic information. You can increase the amount of diagnostic information that a driver logs by specifying the verbose command-line option to the driver. Many drivers support various levels of verbosity; you might even try specifying verbose=10.

Let's look at the output from the nicinfo utility. Here's a typical example:

Physical Node ID ........................... 000102 C510D4
Current Physical Node ID ................... 000102 C510D4
Current Operation Rate ..................... 100.00 Mb/s full-duplex
Active Interface Type ...................... MII
Active PHY Address ......................... 3
Power Management State ..................... Active
Maximum Transmittable data Unit ............ 1514
Maximum Receivable data Unit ............... 1514
Receive Checksumming Enabled ............... TCPv6
Transmit Checksumming Enabled .............. TCPv6
Hardware Interrupt ......................... 0x5
DMA Channel ................................ 0
I/O Aperture ............................... 0xd400 - 0xd47f
ROM Aperture ............................... 0
Memory Aperture ............................ 0xe6000000 - 0xe6000FFF
Promiscuous Mode ........................... Off
Multicast Support .......................... Enabled

Packets Transmitted OK ..................... 104
Bytes Transmitted OK ....................... 10067
Broadcast Packets Transmitted OK ........... 6
Multicast Packets Transmitted OK ........... 1
Memory Allocation Failures on Transmit ..... 0

Packets Received OK ........................ 1443
Bytes Received OK .......................... 168393
Broadcast Packets Received OK .............. 427970
Multicast Packets Received OK .............. 37596
Memory Allocation Failures on Receive ...... 0

Single Collisions on Transmit .............. 0
Multiple Collisions on Transmit ............ 0
Deferred Transmits ......................... 0
Late Collision on Transmit errors .......... 0
Transmits aborted (excessive collisions) ... 0
Transmits aborted (excessive deferrals) .... 0
Transmit Underruns ......................... 0
No Carrier on Transmit ..................... 0
Jabber detected ............................ 0
Receive Alignment errors ................... 0
Received packets with CRC errors ........... 0
Packets Dropped on receive ................. 0
Ethernet Headers out of range .............. 0
Oversized Packets received ................. 0
Frames with Dribble Bits ................... 0
Total Frames experiencing Collision(s) ..... 0

Note: The output from nicinfo depends on what the driver supports; not all fields are included for all drivers. However, the output always includes information about the bytes and packets that were transmitted and received.
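When you need to compare several of these fields, or watch them over time, it can help to turn the text into a lookup table. The following is a minimal sketch in Python, assuming that every field has the "label, run of dots, value" layout shown in the example above. The function name parse_nicinfo is made up for this illustration and isn't part of nicinfo.

```python
import re

# Assumed layout: a label, a run of two or more dots, then the value.
_FIELD = re.compile(r"^(?P<label>.+?)\s*\.{2,}\s*(?P<value>.*)$")

def parse_nicinfo(text):
    """Return a dict that maps each field label to its value (as a string)."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:  # blank lines and lines without a dot run are skipped
            fields[match.group("label").strip()] = match.group("value").strip()
    return fields

# A few lines copied from the example output above.
sample = """\
Physical Node ID ........................... 000102 C510D4
Current Operation Rate ..................... 100.00 Mb/s full-duplex
Packets Transmitted OK ..................... 104
Received packets with CRC errors ........... 0
"""

info = parse_nicinfo(sample)
print(info["Physical Node ID"])
print(int(info["Packets Transmitted OK"]))
```

Values are kept as strings because some fields hold text (such as the operation rate) and others hold counters; convert a counter with int() when you need to do arithmetic on it.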

When you're dealing with a network problem, start with these:

Physical Node ID
Hardware Interrupt
I/O Aperture
Packets Transmitted OK
Total Packets Transmitted Bad
Packets Received OK
Received packets with CRC errors

The information includes the following:

Physical Node ID

The physical node ID is also known as the Media Access Control (MAC) address. This value is unique to every network card, although some models do let you assign your own address. However, this is rare and generally found only on embedded systems.

If the value represented is FFFFFF FFFFFF or 000000 000000, there's likely something wrong with the setup of the hardware, or you need to assign a MAC address to the card. Check the hardware manual to see whether or not this is the case.

Note: If the hardware didn't get set up correctly, the MAC address may not always appear as shown above.

The first six digits of the MAC address are the vendor ID. Check the entries against the list at http://www.cavebear.com/archive/cavebear/Ethernet/vendor.html to see if the vendor ID is valid. Then check the card ID (the last 6 digits). The card ID should be something semi-random. A display similar to 444444 is likely incorrect.
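The checks described above can be sketched as a small Python function. This is an illustration only: the name check_node_id and the warning texts are invented here, and the "single repeated digit" rule is just one rough way to flag a card ID that doesn't look semi-random. It doesn't look up the vendor ID.

```python
def check_node_id(node_id):
    """Return a list of warnings for a node ID such as '000102 C510D4'."""
    digits = node_id.replace(" ", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        return ["not a 12-digit hexadecimal value"]

    warnings = []
    if digits in ("F" * 12, "0" * 12):
        warnings.append("all ones or all zeros: check the hardware setup "
                        "or assign a MAC address")

    card_id = digits[6:]  # the last six digits; the first six are the vendor ID
    if len(set(card_id)) == 1:
        # Assumption: one digit repeated six times (e.g. 444444) is suspicious.
        warnings.append("card ID %s repeats a single digit" % card_id)
    return warnings

print(check_node_id("000102 C510D4"))  # no warnings for the example above
print(check_node_id("000102 444444"))
```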

Current Physical Node ID

The current physical node ID is shown if a card has been set up to “spoof” the ID of another card. Basically, a parameter is passed to the driver telling it that the node's ID is actually the value that appears. Only some drivers accept this, depending on the card. With spoofing, the packets that were meant for this node ID are picked out at a higher (software) level. This method is considerably slower than letting the card filter the packets at the hardware level: because the card is set in promiscuous mode, it has to accept all packets that come in and sort them in software.

Another way of thinking about this is to compare it to a postal system, where if we wanted to “pretend” to be someone else, we would accept all mail from the Post Office. However, we would then have to sort all the mail. This would take a much longer time compared with the amount of time the Post Office would take to presort the mail, and give us only the mail addressed to us. For more information, see Promiscuous Mode, below.

Current Operation Rate

The media rate is the speed at which the network card operates. On most cards, it's either 10 Mb/s or 100 Mb/s. This display also shows what form of duplex the card uses. Most cards run at half or full-duplex transmission:

Full-duplex transmission means that data can be transmitted in both directions simultaneously.
Half-duplex data transmission means that data can be transmitted in both directions, but not at the same time.

The easiest way to illustrate this is to think of a road. If the road has two lanes, it's full-duplex, because cars can drive in both directions at the same time without obstructing the other lane. If the road has only a single lane, it's half-duplex, because traffic can flow in only one direction at a time.

When you examine the media rate, check the speed, the form of duplex, and what the hub supports. Not all hubs support full-duplex.

Active Interface Type

This is the type of interface used on the Ethernet adapter. This is usually UTP (unshielded twisted pair), STP (shielded twisted pair), Fiber, AUI (Attachment Unit Interface), MII, or BNC (coaxial).

Active PHY Address

This is an identifier that tells you which PHY was used to interface to the network. The number ranges from 0 to 31, and depends on whether you specified a particular PHY or let the driver select the default (which varies from card to card).

Power Management State

This value tells you the NIC's current power status: Off, Standby, Idle, or Active. If you can't send or receive packets, make sure the status is Active; if it isn't, there may be a problem with power management on your system.

Maximum Transmittable data Unit (MTU)

The Maximum Transmittable data Unit (MTU) is the length of the largest frame that can be sent on a physical medium. This isn't commonly used for debugging; however, it may be useful for optimizing a network application. A value of 0 is invalid and is a good indicator that the card isn't set up correctly. The default value is 1514.

Maximum Receivable data Unit (MRU)

This is the MTU's complement; it affects the largest frame length that can be received. The default value is 1514.
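To relate these frame lengths to the amount of data an application can send per frame, subtract the Ethernet header. The sketch below assumes that the reported value counts the 14-byte Ethernet header (two 6-byte addresses plus the 2-byte type/length field) but not the 4-byte frame check sequence; that assumption is consistent with the default of 1514, but check your driver's documentation before relying on it. The name max_payload is made up for this example.

```python
ETHERNET_HEADER_BYTES = 14  # 6 destination + 6 source + 2 type/length

def max_payload(frame_limit):
    """Return the largest payload, in bytes, that fits in one frame.

    frame_limit is the MTU or MRU value reported by nicinfo. A limit that
    leaves no room for a payload (including 0) is treated as invalid.
    """
    if frame_limit <= ETHERNET_HEADER_BYTES:
        raise ValueError("invalid frame limit: %d" % frame_limit)
    return frame_limit - ETHERNET_HEADER_BYTES

print(max_payload(1514))  # 1500
```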

Receive Checksumming Enabled, Transmit Checksumming Enabled

Not all cards support these options. If your adapter supports them, they tell your card which checksumming method to use: IPv4, TCPv4, UDPv4, TCPv6, or UDPv6.

Hardware Interrupt

The hardware interrupt is the network card's interrupt request line (IRQ). How an IRQ is assigned depends on the card; in the case of a PCI card, pci-server assigns the IRQ.

DMA Channel

This is the DMA channel used for the card. This varies, depending on the card and on the channels it has available.

I/O Aperture

The I/O aperture is a hexadecimal value that shows the address in I/O space where the card resides. The I/O aperture uses the I/O address between the given values to locate and map the I/O ports. The range depends on the platform.

Memory Aperture

The memory aperture is a hexadecimal value that shows the address in memory where the card's memory is located. The memory aperture uses the memory address between the given values to locate and map memory. The range depends on the platform.
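When you compare an aperture against the hardware manual, the size of the range is often easier to check than the two end addresses. Here's a minimal sketch that computes it, assuming the "start - end" layout shown in the example output; the name aperture_size is made up for this illustration.

```python
def aperture_size(value):
    """Return the size in bytes of a range such as '0xd400 - 0xd47f'.

    Returns None if the value isn't a range (for example, a plain '0').
    """
    if "-" not in value:
        return None
    start_text, end_text = (part.strip() for part in value.split("-", 1))
    start, end = int(start_text, 16), int(end_text, 16)
    if end < start:
        raise ValueError("end address is below start address: " + value)
    return end - start + 1  # both end addresses are inclusive

print(aperture_size("0xd400 - 0xd47f"))          # 128
print(aperture_size("0xe6000000 - 0xe6000FFF"))  # 4096
```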

ROM Aperture

The ROM aperture is a hexadecimal range that shows the address of the card's ROM. The ROM aperture uses the memory address between the displayed values to locate and map memory.

Promiscuous Mode

When a card is placed in promiscuous mode, the card accepts every Ethernet packet sent on the network. This is quite taxing on the system but is a common practice for debugging purposes.

Also, when a card is placed in promiscuous mode, a network MAC address can be spoofed (i.e., the card accepts all packets, whether they're addressed to it or not). Then, at a higher (software) level, you can accept packets addressed to whomever you please. Promiscuous mode is disabled by default.

Multicast Support

When you enable multicast mode, you can mark a packet with a special destination, so that multiple nodes on the network may receive it. Multicast packets are also accepted.

Packets Transmitted OK

Before you look at this value, determine that some form of network transfer (ping, telnet, file transfer) was attempted. If a card isn't set up properly, the number of sent packets shown here is either very small or zero. If the card isn't displaying any sent packets, the cause is probably a driver problem. Check all the options you're passing to the driver; one or more may be incorrect.
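Because the counter is cumulative, the useful number is how much it changed while you attempted the transfer. The following sketch compares two snapshots taken before and after the attempt. It assumes that each snapshot is a dictionary of field labels to values that you've already extracted from the nicinfo output; the function names and messages are made up for this example.

```python
def transmit_delta(before, after, label="Packets Transmitted OK"):
    """Return how much a counter grew between two snapshots."""
    return int(after[label]) - int(before[label])

def diagnose_transmit(before, after):
    """Summarize whether any packets were sent between the snapshots."""
    sent = transmit_delta(before, after)
    if sent == 0:
        return "no packets sent: check the options passed to the driver"
    return "%d packets sent" % sent

# Hypothetical snapshots taken around a ping attempt.
before = {"Packets Transmitted OK": "104"}
after = {"Packets Transmitted OK": "104"}
print(diagnose_transmit(before, after))
```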

Bytes Transmitted OK

This is the number of bytes of data sent on the network. This value increases with the number of packets transmitted on the network.

Total Packets Transmitted Bad

You can use this statistic to determine if you have faulty hardware. If all the sent packets are reported as bad, there's likely a hardware problem, although you might instead be using the wrong driver. Check the hardware for compatibility. If the problem looks hardware-related, try swapping the hardware to see if the problem disappears.

Broadcast Packets Transmitted OK

This is the number of broadcast packets transmitted from the NIC.

Multicast Packets Transmitted OK

This is the number of multicast packets transmitted from the NIC.

Memory Allocation Failures on Transmit

Before transmitting data, the driver reserves system memory for a buffer to hold the data to be transmitted. Once the card is ready, the buffer is sent to it.

When a memory-allocation error occurs, the system is likely very low on memory. Make sure that there's sufficient memory on the system; if you continuously get this error, consider adding more memory. Another thing to check for is memory leaks on the system, which may be slowly consuming system memory.

Packets Received OK

This value states how many packets the network card successfully received. If a card is having problems receiving data, check the cables and the hub connection. Problems receiving data might also be related to the driver: it's possible for the driver to be set up well enough to send data, yet be unable to receive. Usually when data is sent but doesn't get received, the driver is the cause. Check the driver's setup to make sure it's initialized correctly. Use slog2info to check the system log for clues.

Bytes Received OK

This is the number of bytes of data received from the network. This value increases with the number of packets received.

Single Collisions on Transmit

This is the number of collisions that were encountered while trying to transmit frames.

Before it transmits, the NIC uses carrier sense to check that the network is idle, and then starts to transmit a frame of data. The problem occurs when two network cards both sense an idle network and start to transmit data at the same time. This error is more common on busy networks.

When the NICs detect a collision, they stop transmitting and wait for a random period of time. The time periods are different for each NIC, so in theory, when the wait time has expired, the other NIC will have already transmitted or will be still waiting for its time to expire, thus avoiding further collisions.

You can reduce this type of problem by introducing a full-duplex network.
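The random wait described above is normally chosen by the truncated binary exponential backoff scheme of half-duplex Ethernet: after each collision, the range of possible delays doubles, up to a limit. The Python sketch below illustrates how the delay is picked. It's a simplified model rather than the code of any particular NIC or driver, and the function name is made up.

```python
import random

MAX_ATTEMPTS = 16   # the NIC gives up on the frame at the 16th collision
BACKOFF_LIMIT = 10  # the delay range stops doubling after 10 collisions

def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the given collision.

    collision_count is the number of collisions this frame has had so far
    (1 for the first collision). Raises RuntimeError once the attempt
    limit is reached, which corresponds to an aborted transmit.
    """
    if collision_count >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: transmit aborted")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1

for n in (1, 2, 3, 10, 15):
    print(n, backoff_slots(n))
```

Because each NIC draws its own random number, two NICs that collided are unlikely to pick the same delay again, and the chance of a repeat collision drops as the range widens.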

Multiple Collisions on Transmit

This error is due to an attempted transmission that has had several collisions, despite backing off several times. This occurs more frequently on busy half-duplex networks. If there are a lot of these errors, try switching to a full-duplex network, or if the network is TCP/IP based, try introducing a few switches instead of hubs.

Deferred Transmits

Commonly found on half-duplex networks, this value doesn't mean that there are problems. It means that the card tried to send data on the network cable, but the network was busy with other data on the cable. So, it simply waited for a random amount of time. This number can get high if the network is very busy.

Late Collision on Transmit errors

Late-collision errors occur when a card has transmitted enough of a frame that the rest of the network should be aware that the network is currently in use, yet another system on the network still started to transfer a frame onto the line. They're the same as regular collision errors, but were detected too late.

Depending on the protocol, these types of errors can be detrimental to the protocol's overall throughput. For example, a 1% packet loss on the NFS protocol using the default retransmission timers is enough to slow the speed down by approximately 90%. If you experience low throughput with your networking, check to make sure that you aren't getting these types of errors. Typically, Ethernet adapters don't retransmit frames that have been lost to a late collision.

These errors are a sign that the time to propagate the signal across the network is longer than the time it takes for a network card to place an entire packet on the network. Thus, the offending system doesn't know that the network is currently in use, and it proceeds to place a new frame on the network.

The nodes that are trying to use the network at the same time detect the error after the first slot time of 64 bytes. This means that the NIC detects late collisions only when transmitting frames that are longer than 64 bytes. The problem with this is that, with frames smaller than 64 bytes, the NIC can't detect the error. Generally, if you experience late collisions with large frames on your network, you're very likely also experiencing late collisions with small frames.
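To put the 64-byte slot time into units of time, divide the number of bits by the media rate. This small calculation is a sketch that applies to the 10 Mb/s and 100 Mb/s rates discussed in this section; other rates may define the slot time differently, so don't extend it without checking the specification.

```python
SLOT_BYTES = 64  # the slot time quoted above, expressed as a frame length

def slot_time_us(rate_mbps):
    """Return the slot time in microseconds for a 10 or 100 Mb/s network."""
    if rate_mbps not in (10, 100):
        raise ValueError("sketch only covers 10 and 100 Mb/s")
    slot_bits = SLOT_BYTES * 8    # 512 bits
    return slot_bits / rate_mbps  # bits / (Mb/s) gives microseconds

print(slot_time_us(10))   # 51.2
print(slot_time_us(100))  # 5.12
```

A collision that's detected later than this interval after the start of the frame is a late collision.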

These types of errors are generally caused by Ethernet cables that are longer than the IEEE 802.3 specification allows for the particular type of cable, or by an excessive number of repeaters on the network between the two nodes.

Another thing to note is that these errors may actually be caused by a node on the network that has faulty hardware and is sending damaged frames that look like collision fragments. These damaged frames can sometimes appear to a network card to be a late collision.

Transmits aborted (excessive collisions)

This error occurs if there are excessive collisions on the network. The network card gives up on transmitting the frame after 16 collisions. This generally means that the network is jammed and is too busy.

Note: Routers also give up on transmitting a frame if they experience excessive collisions, but instead of alerting the original transmitter, routers simply discard the frame.

If you experience these sorts of errors, see if the network can be reduced, or introduce a strategically placed switch into the network to help reduce the number of packets that are placed on the entire network. Switching to a full-duplex network also resolves these problems.

Transmits aborted (excessive deferrals)

Aborted transmissions due to excessive deferrals mean that the NIC gave up trying to send the frame, due to an extremely busy network. You can resolve this type of problem by switching to a full-duplex network.

Transmit Underruns

Chips with a DMA engine may see this error. The DMA engine copies packet data into a FIFO, from which the transmitter puts the data on the wire. On lower-grade hardware, the DMA might not be able to fill the FIFO as fast as the data is going on the wire, so an underrun occurs, and the transmit is aborted.

No Carrier on Transmit

When the NIC is about to transfer a frame, it checks first to make sure that it has carrier sense (much like checking for a dial tone before you dial the phone). While the NIC is transmitting the frame, it listens for possible collisions or any errors. These errors occur when a NIC is transmitting a frame on the network and notices that it doesn't see its own carrier wave (much like dialing a number on the phone and not hearing the tones as you press the keys).

These errors are caused by plugging and unplugging cables on the network and by poor optical power supplied to the Fiber Optic Transceiver (FOT).

Jabber detected

You typically see this error only on a 10 Mb/s network. It means that a network card is continuing to transmit after a packet has been sent. This error shouldn't occur on faster networks, because they allow a larger frame size.

Receive Alignment errors

A receive-alignment error means that the card has received a damaged frame from the network. When one of these errors occurs, it also triggers an FCS (Frame Check Sequence) error. These errors occur if the received frame size isn't a multiple of eight bits (one byte).

These errors are commonly due to faulty wiring, cable runs that are out of the IEEE 802.3 specification, a faulty NIC, or possibly a faulty hub or switch. To narrow down this problem, do a binary division of the network to help eliminate the source.
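The binary division mentioned above is the same idea as a binary search: test half of the network, keep whichever half still shows the errors, and repeat. The sketch below shows the logic under two simplifying assumptions: exactly one faulty component, and a way to test a group of components in isolation. The callback segment_has_errors is hypothetical; in practice it stands for a manual step, such as disconnecting the other half of the network and watching the error counters.

```python
def find_faulty(nodes, segment_has_errors):
    """Narrow a list of nodes down to the single one that causes errors.

    segment_has_errors(group) must return True if errors still occur when
    only the nodes in group are connected (a hypothetical manual test).
    """
    candidates = list(nodes)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if segment_has_errors(half):
            candidates = half                    # the fault is in this half
        else:
            candidates = candidates[len(half):]  # the fault is in the rest
    return candidates[0] if candidates else None

# Simulated network in which node "e" is the faulty one.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
print(find_faulty(nodes, lambda group: "e" in group))  # e
```

With this approach, the number of tests grows with the logarithm of the number of nodes instead of with the number of nodes.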

Received packets with CRC errors

An entry in this field indicates the number of times, on a hardware level, the card received corrupt data. This corruption could be caused by a faulty hub, cable, or network card.

The best way to try to solve Cyclic Redundancy Check (CRC) errors is to do a binary division of the systems on the network to determine which system is sending bad data. Once you've done that, you can start replacing the hardware piece by piece. Because this error is on the receiving end, it's difficult to determine if the CRC is bad on a sent packet.
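To see what the card is checking, it can help to reproduce the calculation in software. Ethernet's frame check sequence uses the same CRC-32 as Python's zlib.crc32(). The sketch below appends a check value to a frame and then verifies it. It assumes that the four check bytes are stored least-significant byte first, and it's purely illustrative: the NIC normally does this in hardware, and the driver usually never sees the check bytes. The function names are made up.

```python
import struct
import zlib

def append_fcs(frame_without_fcs):
    """Return the frame with a 4-byte CRC-32 check value appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)  # assumed byte order

def fcs_ok(frame):
    """Return True if the last 4 bytes match the CRC-32 of the rest."""
    if len(frame) < 4:
        return False
    body = frame[:-4]
    (received,) = struct.unpack("<I", frame[-4:])
    return (zlib.crc32(body) & 0xFFFFFFFF) == received

# Made-up frame contents, padded to 60 bytes for the illustration.
good = append_fcs(b"example frame contents".ljust(60, b"\x00"))
damaged = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in transit

print(fcs_ok(good))     # True
print(fcs_ok(damaged))  # False
```

The receiver can tell that the frame is damaged, but not where the damage happened, which is why you have to isolate the sender by dividing the network.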

Packets Dropped on receive

This usually means you got an overrun while receiving a packet. This has to do with DMA and the FIFO, like a Transmit Underrun, except in this case, the DMA engine can't copy the packet into memory as fast as the data is coming from the network, and the packet gets dropped. Like the Transmit Underrun, this is generally due to poor hardware.

Ethernet Headers out of range

This entry indicates the number of packets whose Ethernet type/length field isn't valid.

Oversized Packets received

An oversized packet is simply a received packet that was too big to fit in the driver's Receive buffer.

Frames with Dribble Bits

Dribble bits are extra bits of data that were received after the Ethernet CRC. They're commonly caused by faulty hardware or by Ethernet cabling that doesn't conform to the 802.3 specifications.

Total Frames experiencing Collision(s)

This is the total number of frames that have experienced a collision while trying to transmit on the network. This can sometimes be high, depending on how busy the network is. A busy network experiences these types of errors more often than a quiet one.