Traceroute

From SonicWiki

Introduction

(From the Linux man page) Traceroute tracks the route that packets take across an IP network on their way to a given host. It uses the IP protocol's time to live (TTL) field and attempts to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to the host.

Traceroute-ttl.png

Each router through which a packet travels is shown on its own line: a hop number, followed by the router's PTR record (if it has one), its IP address, and the latency to that router. The routers a packet passes through on the way to its destination are commonly called "hops".
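The TTL mechanism can be sketched as a small simulation. This is not how a real traceroute sends packets (that requires raw sockets); it only models the idea that each router decrements the TTL and answers when it reaches zero. The function names and the example addresses are illustrative assumptions.

```python
def probe(path, ttl):
    """Simulate one probe. `path` lists the routers in order, ending with the destination."""
    for router in path[:-1]:          # every router in front of the destination
        ttl -= 1                      # each router decrements the TTL by one
        if ttl == 0:
            return router, "TIME_EXCEEDED"   # this router answers instead of forwarding
    return path[-1], "REACHED"        # the probe survived all routers


def simulated_traceroute(path, max_hops=30):
    """Send probes with TTL 1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_hops + 1):
        address, reply = probe(path, ttl)
        hops.append((ttl, address))
        if reply == "REACHED":
            break
    return hops


# Hypothetical path: two routers, then the destination host.
example_path = ["192.0.2.1", "198.51.100.1", "203.0.113.9"]
for hop_number, address in simulated_traceroute(example_path):
    print(hop_number, address)
```

Each increase in TTL reveals one more router, which is why the output lists the hops in order.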

Not all traceroute tools use the same protocol to elicit a response. Unix-based traceroute programs use UDP by default, the native Windows client (tracert) uses ICMP, and some other tools use TCP. Depending on which protocol is used, the destination host may or may not respond. Getting no response from the last hop (seeing "Request timed out") is therefore normal and not a cause for concern. If you specifically want a response from the destination host, use ping.

Traceroute can be used to detect:

  • Network Congestion
  • Routing Loops
  • No connectivity to specific internet hosts

Syntax

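On Windows, the command from a shell prompt is "tracert". For troubleshooting potential network latency issues, pathping is useful because it probes each hop repeatedly over an extended period (typically a few minutes, depending on the number of hops) and reports loss and latency per hop.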

On Apple and Unix-based systems, the command is "traceroute", run from a terminal window. For troubleshooting latency issues, mtr is ideal because it performs a continuous traceroute, but it is not installed by default.
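For scripting, the choice of command can be sketched as follows. This is a minimal helper, assuming only the two command names given above; the function name is hypothetical and no extra flags are added.

```python
import platform


def traceroute_command(host, system=None):
    """Return the argument list for the platform's native traceroute tool."""
    system = system or platform.system()   # "Windows", "Linux", "Darwin", ...
    if system == "Windows":
        return ["tracert", host]
    return ["traceroute", host]


print(traceroute_command("example.com"))
```

The returned list can be passed to subprocess.run on a machine where the tool is installed.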

Naming Info in Traceroutes

Not all routers have a PTR record, but those that do can tell you a lot about where your route is going and what equipment it's being routed through. For example, if you saw:

xe-11-1-0.edge1.NewYork1.Level3.net

You can tell right away that this is a Level3 edge router located in New York. As a loose guideline, the following abbreviations are usually applied to these types of routers:

  • Core routers - CR, Core, GBR, BB, CCR, EBR
  • Peering routers - BR, Border, Edge, IR, IGR, Peer
  • Customer routers - AR, Aggr, Cust, CAR, HSA, GW
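The abbreviations above can be turned into a rough lookup. This is only a sketch of the loose guideline, not a reliable classifier: naming schemes vary by network, and the function name and matching rule (split on dots and hyphens, ignore trailing digits) are assumptions made for illustration.

```python
import re

# Abbreviations taken from the list above, lowercased.
ROLE_HINTS = {
    "core": ["cr", "core", "gbr", "bb", "ccr", "ebr"],
    "peering": ["br", "border", "edge", "ir", "igr", "peer"],
    "customer": ["ar", "aggr", "cust", "car", "hsa", "gw"],
}


def guess_role(hostname):
    """Return 'core', 'peering', or 'customer' if a label matches, else None."""
    for token in re.split(r"[.\-]", hostname.lower()):
        label = token.rstrip("0123456789")    # "edge1" -> "edge", "cr1" -> "cr"
        for role, abbreviations in ROLE_HINTS.items():
            if label in abbreviations:
                return role
    return None


print(guess_role("xe-11-1-0.edge1.NewYork1.Level3.net"))
```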

Latency

Distance

Some latency between routers, even on fiber-optic networks, will always be present because of the speed of light. Light travels through fiber at about 200,000 km/s, meaning that it takes roughly 200 milliseconds for light to travel once around the earth (about 40,000 km) through fiber with no additional delay. The further, geographically, that your data must travel, the higher its latency will be. So if you run traceroute and see this:

3 xe-3-0-0.cr1.nyc3.us.nlayer.net (69.22.142.74) 6.570ms
4 xe-0-0-0.cr1.lhr1.uk.nlayer.net (69.22.142.10) 74.144ms

The hostnames show the packet going from New York (nyc) to London (lhr). Your data just went across the Atlantic Ocean and back (remember that the latency is for the round trip), so that latency is normal.
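A quick calculation shows why a jump of this size is expected. The sketch below uses the fiber speed quoted above; the New York to London distance of about 5,570 km is an assumed great-circle figure, and real cable routes are longer.

```python
FIBER_SPEED_KM_PER_S = 200_000   # speed of light in fiber, as quoted above


def min_round_trip_ms(distance_km):
    """Lowest possible round-trip time over fiber for a one-way distance in km."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


NYC_TO_LONDON_KM = 5_570              # assumption: approximate great-circle distance
observed_ms = 74.144 - 6.570          # latency added between hop 3 and hop 4
floor_ms = min_round_trip_ms(NYC_TO_LONDON_KM)

print(f"physical minimum: {floor_ms:.1f} ms, observed increase: {observed_ms:.1f} ms")
```

The observed increase sits only a little above the physical minimum, which is what a healthy transatlantic link looks like.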

Congestion

In some cases (this is rare to find in day-to-day troubleshooting) a router may be overloaded by an increase in traffic, causing delays in data being sent through that router. If this occurs in a traceroute, you will see increased latency in every hop after the affected router.

For example:

Tracert-congestion.png

Here, congestion between the second and third hop would cause the end user to see a very slow connection. You may notice that the latency occurs somewhere between the local network and the ISP's WAN. In some cases, this sort of latency is caused by issues other than network congestion; those cases are covered in the sections below.

Rate Limiting and De-prioritization

Two reasons why latency through a specific router may artificially increase are rate-limiting and de-prioritization. In these cases, though it takes longer for the router to respond to your traceroute, packets traveling through this router to other hosts are not delayed, meaning your traceroute will not show similar latency to hops beyond this router.

  • Rate-limiting: Most routers limit the number of ICMP responses they will send within a given period of time to limit their susceptibility to denial of service attacks.
  • De-prioritization: Most routers handle an incredible amount of data per second and understand that ICMP is not crucial data. They will therefore process other internet traffic before handling ICMP. Traceroute traffic is almost always the last in line.
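Rate limiting is often implemented with something like a token bucket. The simulation below is a sketch under that assumption; the rate and burst values are made up for illustration and do not describe any particular router.

```python
def replies_sent(probe_times, rate_per_sec=1.0, burst=2):
    """Return, for each probe arrival time in seconds, whether the router replies.

    Token bucket: tokens refill at `rate_per_sec` up to `burst`,
    and each ICMP reply spends one token.
    """
    tokens = float(burst)
    last_time = None
    answered = []
    for now in probe_times:
        if last_time is not None:
            tokens = min(burst, tokens + (now - last_time) * rate_per_sec)
        last_time = now
        if tokens >= 1.0:
            tokens -= 1.0
            answered.append(True)
        else:
            answered.append(False)    # shows up as "*" in the traceroute output
    return answered


# Three probes in quick succession, then one after a pause.
print(replies_sent([0.0, 0.1, 0.2, 2.0]))
```

The third probe goes unanswered even though the router is forwarding traffic normally, which is why a single missing reply is not evidence of a fault.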

Latency False Alarm

In this example, latency increases significantly on the 5th hop but is fine after that. This indicates that the router de-prioritized its ICMP response but did not delay traffic routed on to other hosts. If congestion were the issue, we would expect to see roughly equal or higher latency on every hop after hop 5.

Different routers are listed on hop 5 because the traffic took multiple paths to its destination. This is okay.

$ traceroute google.com
traceroute to google.com (74.125.239.132), 30 hops max, 60 byte packets
1  fxp6.fw.noc.sonic.net (64.142.23.33)  0.290 ms  0.269 ms  0.473 ms
2  64-142-122-41.static.sonic.net (64.142.122.41)  1.547 ms  2.319 ms  2.872 ms
3  2.ge-1-1-0.gw.sr.sonic.net (209.204.191.36)  1.146 ms  1.144 ms  1.134 ms
4  265.ge-7-1-0.gw.pao1.sonic.net (64.142.0.198)  3.185 ms  3.181 ms  3.170 ms
5  0.ge-6-1-6.gw.equinix-sj.sonic.net (64.142.0.206)  68.973 ms 0.xe-6-0-0.gw.equinix-sj.sonic.net (64.142.0.185)  68.930 ms 0.ge-6-1-6.gw.equinix-sj.sonic.net (64.142.0.206)  68.953 ms
6  eqixsj-google-gige.google.com (206.223.116.21)  5.868 ms  5.466 ms  8.422 ms
7  216.239.49.170 (216.239.49.170)  5.709 ms  5.701 ms  6.426 ms
8  66.249.95.31 (66.249.95.31)  7.087 ms  8.512 ms  8.510 ms
9  nuq05s02-in-f4.1e100.net (74.125.239.132)  7.052 ms  6.995 ms  6.980 ms
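The reasoning used for hop 5 can be sketched as a simple check: a latency jump only suggests congestion if later hops stay high too. The 20 ms threshold and the function name are arbitrary assumptions; the sample values are the first latency reported for each hop in the output above.

```python
def classify_spikes(latencies_ms, jump_ms=20.0):
    """Return (hop_number, kind) for each hop whose latency jumps by at least jump_ms.

    kind is "isolated" if some later hop is clearly faster again,
    "persistent" if every later hop stays high, "last hop" if nothing follows.
    """
    findings = []
    for i in range(1, len(latencies_ms)):
        if latencies_ms[i] - latencies_ms[i - 1] < jump_ms:
            continue
        later = latencies_ms[i + 1:]
        if not later:
            kind = "last hop"
        elif min(later) < latencies_ms[i] - jump_ms:
            kind = "isolated"       # likely rate limiting or de-prioritization
        else:
            kind = "persistent"     # consistent with congestion or distance
        findings.append((i + 1, kind))
    return findings


first_probe_ms = [0.290, 1.547, 1.146, 3.185, 68.973, 5.868, 5.709, 7.087, 7.052]
print(classify_spikes(first_probe_ms))
```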

Packet Loss

When a computer running a traceroute does not receive a response from a specific router, that is represented as an asterisk.

Tracert-1pl.jpg

In this example, the first packet to the second hop was dropped, and the next two packets were received with very reasonable latency for a broadband connection. Keep in mind that every packet to a hop further down the route must have successfully passed the 2nd hop, because there was no further packet loss. We can therefore assume that the single dropped packet was likely due to de-prioritization or rate limiting and does not indicate a problem.
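The same check can be sketched in code: count the asterisks on each hop, then see whether any later hop also lost packets. The sample lines are hypothetical tracert-style output, not taken from the screenshot, and the function names are illustrative.

```python
def lost_probes_per_hop(lines):
    """Map hop number -> number of '*' (unanswered probes) on that line."""
    losses = {}
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue                       # skip headers and blank lines
        losses[int(fields[0])] = fields.count("*")
    return losses


def loss_continues(losses, hop):
    """True if any hop after `hop` also shows lost probes."""
    return any(count > 0 for later, count in losses.items() if later > hop)


sample = [
    "  1     1 ms     1 ms     1 ms  192.168.1.1",
    "  2     *       12 ms    11 ms  10.0.0.1",
    "  3    13 ms    12 ms    14 ms  10.0.0.2",
]
losses = lost_probes_per_hop(sample)
print(losses, loss_continues(losses, 2))
```

Loss on hop 2 that does not continue on later hops matches the benign case described above.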

In the next example you can see repeated instances of packet loss after a particular hop, accompanied by latency as well. However, the latency occurs as the information travels from South Africa to Great Britain, so most of this latency is likely due to the long distance.

Tracert-pl.png


Some routers simply do not respond to traceroute requests. This is often the case with customers' computers. When the destination does not respond, traceroute keeps probing until it reaches its maximum of 30 hops. Usually the route ends long before this, but because the destination host is not responding to ICMP requests, the program cannot tell and will keep going until it reaches 30 hops.

In this example, you can see problematic packet loss (in yellow), and the traceroute running out of responsive hosts (in white). The traceroute would have kept going but the user aborted the task using control-c.

Tracert-timedout.jpg

Tools for Troubleshooting Latency & Loss

Because a traceroute sends only three packets to each hop in the route, you are limited by a small sample size that does not always give you an accurate picture of a line's overall health. You may miss loss or latency that happens only intermittently or, more likely, you may see some packet loss or latency caused by router prioritization where a repeated traceroute would show no consistent problem. To get around this, we suggest using either mtr (not installed by default) or pathping (Windows only) when troubleshooting network latency or loss.

Mtr-pl.png

In the screenshot above of mtr running on an Apple machine, you can see significant packet loss after the first hop. You can also see that the program has sent 135 packets to each hop, making us much more certain that the packet loss is not a simple statistical aberration. The consistent packet loss at the 2nd hop and beyond indicates that the problem is either with the equipment at that second hop (97.75.128.1) or on the line between the 1st and 2nd hops.
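The rule of thumb here, that loss matters when it starts at one hop and continues to the end of the route, can be sketched as below. The loss percentages are invented for illustration and are not the values from the screenshot; the 1% threshold and the function name are also assumptions.

```python
def first_persistent_loss(loss_percent, threshold=1.0):
    """Return the 1-based hop where loss begins and continues through the last hop.

    Returns None if the final hop shows no meaningful loss, since loss that
    does not reach the destination is usually rate limiting or de-prioritization.
    """
    start = None
    for hop in range(len(loss_percent), 0, -1):    # walk back from the last hop
        if loss_percent[hop - 1] > threshold:
            start = hop
        else:
            break
    return start


print(first_persistent_loss([0.0, 20.0, 18.5, 22.0, 19.3]))
print(first_persistent_loss([0.0, 33.0, 0.0, 0.0]))
```

In the first case the fault lies at hop 2 or on the link leading to it; in the second, the loss at hop 2 does not affect traffic passing through.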

In DSL and Fusion networks there is a lot of equipment, and therefore many potential points of failure, between those two hops.

Traceroutes and ISP Networks

In the example traceroute from the Sonic.net DSL customer shown below, the customer has their own Linksys router and a bridged ZTE modem. The customer runs the traceroute from their computer. The first hop is their Linksys router, which has a LAN IP address of 192.168.1.1:

1 4 ms 2 ms 5 ms 192.168.1.1 
2 10 ms 8 ms 9 ms 173-228-18-1.dsl.dynamic.sonic.net [173.228.18.1]
3 9 ms 6 ms 6 ms gig1-6.cr1.lsatca11.sonic.net [70.36.243.21]
4 7 ms 8 ms 17 ms 0.xe-5-1-0.gw.pao1.sonic.net [69.12.211.1]
5 8 ms 10 ms 11 ms xe-1-0-6.ar1.pao1.us.nlayer.net [69.22.130.85]
6 38 ms 8 ms  8 ms ae0-90g.cr1.pao1.us.nlayer.net [69.22.153.18]

What you cannot see in the traceroute is that after the 1st hop (the Linksys router), the information must travel via ethernet to the customer's bridged ZTE modem, through the customer's internal wiring, out their MPOE, through the phone lines to the AT&T Central Office, and through the DSLAM, where it is finally routed to the Sonic.net gateway, the 2nd hop (173.228.18.1).

So, if you see a Fusion customer complaining of latency or loss starting at their WAN gateway IP (the .1 address in most cases), you should check for physical issues, defective modems, etc. In 99 out of 100 of these cases, the issue is not with our gateway router.

LAN Troubleshooting

If the first hop in a traceroute is an internal IP address (e.g. 192.168.1.1), and the packet loss or latency starts there, the problem is likely between your computer and your router. This could be a problem with your computer, a poor wireless connection, a bad ethernet cable, or a failing router.
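Whether the first hop is an internal address can be checked with Python's standard ipaddress module. This is a minimal sketch with a hypothetical function name; note that is_private also covers loopback, link-local, and other reserved ranges, not just the usual home-router ranges.

```python
import ipaddress


def first_hop_is_internal(address):
    """True if the address falls in a private (non-routable) range."""
    return ipaddress.ip_address(address).is_private


print(first_hop_is_internal("192.168.1.1"))    # typical home router LAN address
print(first_hop_is_internal("173.228.18.1"))   # the WAN gateway from the example
```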

Saturated Connections and Traceroute

If your local connection is saturated because of usage (for example, your roommate using BitTorrent), you will see increased latency starting at the first hop. For accurate traceroute results, you should always instruct the customer to pause any internet activity while they run the traceroute.


Routing Loops

Routing loops occur when data packets are routed continuously through the same routers in an infinite loop. They are relatively rare to see now because many routing protocols build in protections against routing loops.

Routing-loop.png

In the example above, notice how hops 10 through 30 are the exact same two IP addresses. Such loops are rare and may be indicative of a serious problem.
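A loop like this can be spotted mechanically, because a healthy route never visits the same router twice. The sketch below uses a hypothetical function name and made-up addresses, with None standing for a hop that did not answer.

```python
def repeated_hops(hop_addresses):
    """Return addresses that appear more than once, in order of first repetition."""
    seen = set()
    repeated = []
    for address in hop_addresses:
        if address is None:            # unanswered hop ("*"), nothing to compare
            continue
        if address in seen and address not in repeated:
            repeated.append(address)
        seen.add(address)
    return repeated


trace = ["192.0.2.1", "198.51.100.1", "203.0.113.5", "203.0.113.6",
         "203.0.113.5", "203.0.113.6", "203.0.113.5"]
print(repeated_hops(trace))
```

An empty result means no loop was seen; two addresses alternating to the hop limit is the pattern described above.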

Outside Resources

A Practical Guide to (Correctly) Troubleshooting with Traceroute