Article provided by Wikipedia

Window scaling

Main article: TCP window scale option

For more efficient use of high-bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data, and because the field is 16 bits wide its value cannot exceed 65,535 bytes.

Since the size field cannot be expanded, a scaling factor is used. The TCP window scale option, as defined in RFC 1323, is an option used to increase the maximum window size from 65,535 bytes to 1 gigabyte. Scaling up to larger window sizes is part of what is necessary for TCP tuning.

The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14 for each direction independently. Both sides must send the option in their SYN segments to enable window scaling in either direction.
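
The shift arithmetic can be sketched in Python as follows. This is a minimal illustration, not any stack's code: the helper names are hypothetical, and picking the smallest shift that covers the receive buffer is one plausible policy assumed here for the example.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value the 16-bit window field can hold
MAX_SHIFT = 14             # largest window scale value allowed (RFC 1323)


def effective_window(window_field: int, shift: int) -> int:
    """Window in bytes implied by the 16-bit field and the negotiated shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale must be between 0 and 14")
    return window_field << shift


def smallest_shift_for(buffer_bytes: int) -> int:
    """Hypothetical policy: smallest shift whose largest window covers the buffer."""
    shift = 0
    while shift < MAX_SHIFT and (MAX_WINDOW_FIELD << shift) < buffer_bytes:
        shift += 1
    return shift


def advertised_field(buffer_bytes: int, shift: int) -> int:
    """Value written into the 16-bit field; the low `shift` bits are lost."""
    return min(buffer_bytes >> shift, MAX_WINDOW_FIELD)


if __name__ == "__main__":
    print(effective_window(65535, 14))  # 1073725440 bytes, about 1 GiB
    shift = smallest_shift_for(1_048_576)  # a 1 MiB receive buffer
    field = advertised_field(1_048_576, shift)
    print(shift, field, effective_window(field, shift))  # 5 32768 1048576
```

Because the field is shifted, a scaled window can only be advertised in multiples of 2^shift bytes; that loss of granularity is the price of the larger range.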

Some routers and packet firewalls rewrite the window scaling factor during a transmission. This causes sending and receiving sides to assume different TCP window sizes. The result is unstable traffic that may be very slow. The problem is visible on some sites behind a defective router.[18]

TCP timestamps

TCP timestamps, defined in RFC 1323, can help TCP determine in which order packets were sent. TCP timestamps are not normally aligned to the system clock and start at some random value. Many operating systems will increment the timestamp for every elapsed millisecond; however, the RFC only states that the ticks should be proportional.

There are two timestamp fields:

a 4-byte sender timestamp value (my timestamp)
a 4-byte echo reply timestamp value (the most recent timestamp received from you).
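
On the wire the two fields travel together in a single TCP option. The sketch below packs and unpacks that option with Python's struct module; the option kind 8 and length 10 come from RFC 1323, while the function names are hypothetical.

```python
import struct
from typing import Tuple

TS_KIND = 8     # TCP option kind for timestamps
TS_LENGTH = 10  # kind (1) + length (1) + TSval (4) + TSecr (4)


def encode_timestamps(tsval: int, tsecr: int) -> bytes:
    """Build the option: my timestamp first, then the echoed peer timestamp."""
    return struct.pack("!BBII", TS_KIND, TS_LENGTH,
                       tsval & 0xFFFFFFFF, tsecr & 0xFFFFFFFF)


def decode_timestamps(option: bytes) -> Tuple[int, int]:
    """Return (TSval, TSecr) from a 10-byte timestamps option."""
    kind, length, tsval, tsecr = struct.unpack("!BBII", option)
    if kind != TS_KIND or length != TS_LENGTH:
        raise ValueError("not a TCP timestamps option")
    return tsval, tsecr


if __name__ == "__main__":
    # A host whose clock reads 52000 replies to a segment that carried TSval 1000.
    opt = encode_timestamps(52_000, 1_000)
    print(opt.hex())               # 080a0000cb20000003e8
    print(decode_timestamps(opt))  # (52000, 1000)
```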

TCP timestamps are used in an algorithm known as Protection Against Wrapped Sequence numbers, or PAWS (see RFC 1323 for details). PAWS is used when the receive window crosses the sequence number wraparound boundary. In the case where a packet was potentially retransmitted, it answers the question "Is this sequence number in the first 4 GB or the second?", and the timestamp is used to break the tie.
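
The core PAWS comparison can be sketched as follows. This is a simplification built around a hypothetical PawsFilter that remembers the most recent peer timestamp; RFC 1323 adds conditions (for example, only updating that value for in-window segments and handling long idle periods) that are omitted here. Timestamps are compared modulo 2^32 so that the check itself survives wraparound.

```python
from typing import Optional


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is earlier than b, allowing for wraparound."""
    return ((a - b) & 0xFFFFFFFF) > 0x7FFFFFFF


class PawsFilter:
    """Hypothetical, simplified PAWS state for one connection."""

    def __init__(self) -> None:
        self.ts_recent: Optional[int] = None  # newest TSval accepted from the peer

    def accept(self, tsval: int) -> bool:
        if self.ts_recent is not None and ts_before(tsval, self.ts_recent):
            return False  # older timestamp: treat the segment as an old duplicate
        self.ts_recent = tsval
        return True


if __name__ == "__main__":
    paws = PawsFilter()
    print(paws.accept(100), paws.accept(150), paws.accept(120))  # True True False
```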

Also, the Eifel detection algorithm (RFC 3522) uses TCP timestamps to determine if retransmissions are occurring because packets are lost or simply out of order.
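
The idea behind the Eifel check can be reduced to a single comparison, sketched below under simplifying assumptions (RFC 3522 specifies further conditions, and the function name is hypothetical). If the first acknowledgment that arrives after a retransmission echoes a timestamp older than the one carried by the retransmission, that acknowledgment must have been triggered by the original segment, so the retransmission was unnecessary.

```python
def retransmission_was_spurious(retransmit_tsval: int, ack_tsecr: int) -> bool:
    """Simplified Eifel-style test using a 32-bit wraparound-safe comparison.

    retransmit_tsval: TSval the sender put on the retransmitted segment.
    ack_tsecr: TSecr echoed by the first ACK covering that segment.
    """
    return ((ack_tsecr - retransmit_tsval) & 0xFFFFFFFF) > 0x7FFFFFFF


if __name__ == "__main__":
    print(retransmission_was_spurious(2000, 1500))  # True: ACK echoes the original
    print(retransmission_was_spurious(2000, 2000))  # False: ACK echoes the retransmission
```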

Out-of-band data

It is possible to interrupt or abort the queued stream instead of waiting for the stream to finish. This is done by specifying the data as urgent. This tells the receiving program to process it immediately, along with the rest of the urgent data. When finished, TCP informs the application and resumes processing the stream queue. An example is when TCP is used for a remote login session: the user can send a keyboard sequence that interrupts or aborts the program at the other end. These signals are most often needed when a program on the remote machine fails to operate correctly. The signals must be sent without waiting for the program to finish its current transfer.[2]

TCP OOB data was not designed for the modern Internet. The urgent pointer only alters the processing on the remote host and doesn't expedite any processing on the network itself. When it gets to the remote host there are two slightly different interpretations of the protocol, which means only single bytes of OOB data are reliable. That assumes it is reliable at all, since it is one of the least commonly used protocol elements and tends to be poorly implemented.[19][20]
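
With Berkeley sockets, urgent data is sent and received with the MSG_OOB flag. The sketch below assumes a Linux or BSD-style stack and a usable loopback interface; it sends ordinary data followed by a single urgent byte (a hypothetical Ctrl-C) and reads that byte separately from the stream. Keeping the urgent data to one byte follows the reliability caveat above, and the function name is hypothetical.

```python
import select
import socket
from typing import Tuple


def urgent_byte_demo() -> Tuple[bytes, bytes]:
    """Send ordinary data plus one urgent byte over loopback; return both as received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _ = listener.accept()
            with server:
                client.sendall(b"ls -R /")            # ordinary queued stream data
                client.send(b"\x03", socket.MSG_OOB)  # one urgent byte (Ctrl-C)

                # Pending urgent data is reported as an "exceptional" condition.
                _, _, exceptional = select.select([], [], [server], 5.0)
                if not exceptional:
                    raise TimeoutError("urgent data was not signalled")
                urgent = server.recv(1, socket.MSG_OOB)

                # The normal stream is still delivered, minus the urgent byte.
                normal = b""
                while len(normal) < len(b"ls -R /"):
                    chunk = server.recv(1024)
                    if not chunk:
                        break
                    normal += chunk
                return urgent, normal


if __name__ == "__main__":
    print(urgent_byte_demo())
```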

Forcing data delivery

Normally, TCP may hold back a small final segment for up to about 200 ms (Nagle's algorithm tries to group small messages into a single packet, and in practice the wait often lasts until the receiver's delayed acknowledgment arrives). This wait creates small, but potentially serious, delays if repeated constantly during a file transfer. For example, a typical send block would be 4 KB and a typical MSS is 1460 bytes, so two packets go out on a 10 Mbit/s Ethernet taking about 1.2 ms each, followed by a third carrying the remaining 1176 bytes after a 197 ms pause because TCP is waiting for a full buffer.
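
The arithmetic in this example can be checked with a few lines of Python. The helper names are hypothetical, and the timing counts payload bits only, ignoring TCP/IP headers and Ethernet framing, which is why it gives about 1.17 ms rather than exactly 1.2 ms.

```python
from typing import List


def split_into_segments(block_bytes: int, mss: int) -> List[int]:
    """Payload sizes of the segments needed to carry one application write."""
    full, remainder = divmod(block_bytes, mss)
    return [mss] * full + ([remainder] if remainder else [])


def serialization_ms(payload_bytes: int, link_bits_per_s: float) -> float:
    """Time to clock the payload onto the link, in milliseconds."""
    return payload_bytes * 8 / link_bits_per_s * 1000


if __name__ == "__main__":
    print(split_into_segments(4096, 1460))         # [1460, 1460, 1176]
    print(round(serialization_ms(1460, 10e6), 3))  # 1.168
```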

In the case of telnet, each user keystroke is echoed back by the server before the user can see it on the screen, so this delay would become very annoying.

Setting the socket option TCP_NODELAY disables Nagle's algorithm and so overrides the default 200 ms send delay. Application programs use this socket option to force output to be sent after writing a character or line of characters.
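
In Python's socket module, which wraps the Berkeley sockets setsockopt call, the option is set as shown below. This is a minimal sketch with a hypothetical helper name; the exact nonzero value that getsockopt reports back can differ between platforms.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1: send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    with make_nodelay_socket() as s:
        enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
        print("TCP_NODELAY enabled:", enabled)
```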

The RFC defines the PSH push bit as "a message to the receiving TCP stack to send this data immediately up to the receiving application".[2] There is no way to indicate or control it in user space using Berkeley sockets; it is controlled by the protocol stack only.[21]

Vulnerabilities

TCP may be attacked in a variety of ways. The results of a thorough security assessment of TCP, along with possible mitigations for the identified issues, were published in 2009,[22] and the work is currently being pursued within the IETF.[23]

Denial of service

By using a spoofed IP address and repeatedly sending purposely assembled SYN packets, followed by many ACK packets, attackers can cause the server to consume large amounts of resources keeping track of the bogus connections. This is known as a SYN flood attack. Proposed solutions to this problem include SYN cookies and cryptographic puzzles, though SYN cookies come with their own set of vulnerabilities.[24] Sockstress is a similar attack that might be mitigated with system resource management.[25] An advanced DoS attack involving the exploitation of the TCP Persist Timer was analyzed in Phrack #66.[26]

Connection hijacking

Main article: TCP sequence prediction attack

An attacker who is able to eavesdrop on a TCP session and redirect packets can hijack a TCP connection. To do so, the attacker learns the sequence number from the ongoing communication and forges a false segment that looks like the next segment in the stream. Such a simple hijack can result in one packet being erroneously accepted at one end. When the receiving host acknowledges the extra segment to the other side of the connection, synchronization is lost. Hijacking might be combined with Address Resolution Protocol (ARP) or routing attacks that allow taking control of the packet flow, so as to get permanent control of the hijacked TCP connection.[27]
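
Why a learned sequence number is enough can be seen from the receiver's acceptability test. The sketch below is a simplification of the RFC 793 check that only looks at the first octet of a segment, and the function name is hypothetical. A forged segment whose sequence number falls inside the receive window passes it exactly like a genuine one.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def seq_in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if rcv_nxt <= seq < rcv_nxt + rcv_wnd, using modular arithmetic."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


if __name__ == "__main__":
    rcv_nxt, rcv_wnd = 1_000_000, 65_535
    print(seq_in_window(1_000_000, rcv_nxt, rcv_wnd))  # True: the next expected byte
    print(seq_in_window(999_999, rcv_nxt, rcv_wnd))    # False: already received
    print(seq_in_window(5, 0xFFFFFFF0, 100))           # True: window spans the wrap
```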

Impersonating a different IP address was not difficult prior to RFC 1948, when the initial sequence number was easily guessable. That allowed an attacker to blindly send a sequence of packets that the receiver would believe to come from a different IP address, without the need to deploy ARP or routing attacks: it is enough to ensure that the legitimate host of the impersonated IP address is down, or to bring it to that condition using denial-of-service attacks. This is why the initial sequence number is now chosen at random.
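
RFC 1948 proposes deriving the initial sequence number from a clock that ticks every 4 microseconds plus a keyed hash of the connection's addresses and ports, so that an off-path attacker cannot predict it. The sketch below follows that structure but swaps in SHA-256 for the MD5 that RFC 1948 suggests; the secret handling and function name are assumptions for illustration, not any particular operating system's implementation.

```python
import hashlib
import os
import time
from typing import Optional

SECRET = os.urandom(16)  # hypothetical per-boot secret, never sent on the wire


def initial_sequence_number(local_ip: str, local_port: int,
                            remote_ip: str, remote_port: int,
                            now_us: Optional[int] = None) -> int:
    """RFC 1948-style ISN: ISN = M + F(connection id, secret) modulo 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4  # timer that advances once every 4 microseconds
    conn_id = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hashlib.sha256(conn_id + SECRET).digest()
    f = int.from_bytes(digest[:4], "big")  # 32 bits of the keyed hash
    return (m + f) % (1 << 32)
```

Two connections with different port pairs get unrelated offsets F, while a single connection identifier still sees its ISN advance steadily with the clock, which is what keeps old duplicates from colliding with new incarnations.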

TCP veto

An attacker who can eavesdrop and predict the size of the next packet to be sent can cause the receiver to accept a malicious payload without disrupting the existing connection. The attacker injects a malicious packet with the sequence number and a payload size of the next expected packet. When the legitimate packet is ultimately received, it is found to have the same sequence number and length as a packet already received and is silently dropped as a normal duplicate packet—the legitimate packet is "vetoed" by the malicious packet. Unlike in connection hijacking, the connection is never desynchronized and communication continues as normal after the malicious payload is accepted. TCP veto gives the attacker less control over the communication, but makes the attack particularly resistant to detection. The large increase in network traffic from the ACK storm is avoided. The only evidence to the receiver that something is amiss is a single duplicate packet, a normal occurrence in an IP network. The sender of the vetoed packet never sees any evidence of an attack.[28]
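
A toy receiver makes the veto mechanism concrete. It is a deliberate simplification, assuming that a segment is dropped as a duplicate whenever a segment with the same sequence number and length has already been accepted; the class is hypothetical and real TCP reassembly is more involved.

```python
from typing import Dict


class ToyReceiver:
    """Hypothetical receiver that keeps the first copy of each (seq, length)."""

    def __init__(self) -> None:
        self.accepted: Dict[int, bytes] = {}  # sequence number -> payload

    def receive(self, seq: int, payload: bytes) -> bool:
        previous = self.accepted.get(seq)
        if previous is not None and len(previous) == len(payload):
            return False  # looks like an ordinary retransmission: silently dropped
        self.accepted[seq] = payload
        return True


if __name__ == "__main__":
    rx = ToyReceiver()
    print(rx.receive(5000, b"EVIL"))  # True: injected segment arrives first
    print(rx.receive(5000, b"GOOD"))  # False: legitimate segment is "vetoed"
    print(rx.accepted[5000])          # b'EVIL'
```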

Another vulnerability is the TCP reset attack.

TCP ports

TCP and UDP use port numbers to identify sending and receiving application end-points on a host, often called Internet sockets. Each side of a TCP connection has an associated 16-bit unsigned port number (0–65535) reserved by the sending or receiving application. Arriving TCP packets are identified as belonging to a specific TCP connection by its sockets, that is, the combination of source host address, source port, destination host address, and destination port. This means that a server computer can provide several clients with several services simultaneously, as long as a client takes care of initiating any simultaneous connections to one destination port from different source ports.
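
Demultiplexing by these four values can be pictured as a table lookup keyed on the full tuple. A minimal sketch with hypothetical names and documentation-range addresses:

```python
from typing import Dict, NamedTuple, Optional


class ConnectionKey(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


connections: Dict[ConnectionKey, str] = {}


def register(key: ConnectionKey, handler: str) -> None:
    """Record which connection (here just a label) owns this 4-tuple."""
    connections[key] = handler


def demultiplex(src_addr: str, src_port: int,
                dst_addr: str, dst_port: int) -> Optional[str]:
    """Find the connection an arriving segment belongs to, if any."""
    return connections.get(ConnectionKey(src_addr, src_port, dst_addr, dst_port))


if __name__ == "__main__":
    # One client opens two simultaneous connections to the same server port
    # from different source ports, so the 4-tuples stay distinct.
    register(ConnectionKey("198.51.100.7", 50001, "192.0.2.10", 80), "download A")
    register(ConnectionKey("198.51.100.7", 50002, "192.0.2.10", 80), "download B")
    print(demultiplex("198.51.100.7", 50002, "192.0.2.10", 80))  # download B
```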

Port numbers are categorized into three basic categories: well-known, registered, and dynamic/private. The well-known ports are assigned by the Internet Assigned Numbers Authority (IANA) and are typically used by system-level or root processes. Well-known applications running as servers and passively listening for connections typically use these ports. Some examples include: FTP (20 and 21), SSH (22), TELNET (23), SMTP (25), HTTP over SSL/TLS (443), and HTTP (80). Registered ports are typically used by end user applications as ephemeral source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic/private ports can also be used by end user applications, but less commonly. Dynamic/private ports carry no meaning outside of any particular TCP connection.
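
The text does not give the numeric boundaries, but IANA (RFC 6335) defines them as 0–1023 for well-known (system) ports, 1024–49151 for registered (user) ports and 49152–65535 for dynamic/private ports. A small classifier using those ranges follows; the function name is hypothetical, and operating systems may draw ephemeral ports from their own configured ranges.

```python
def port_category(port: int) -> str:
    """Classify a TCP/UDP port number using the IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit unsigned values")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


if __name__ == "__main__":
    for p in (22, 443, 8080, 50000):
        print(p, port_category(p))
```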

Network Address Translation (NAT) typically uses dynamic port numbers on the ("Internet-facing") public side to disambiguate the flow of traffic that is passing between a public network and a private subnetwork, thereby allowing many IP addresses (and their ports) on the subnet to be serviced by a single public-facing address.

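The port-translation idea can be illustrated with a toy table. This is a sketch under simplifying assumptions (a single public address, sequential port allocation starting at a hypothetical 49152, mapping keyed only on the private endpoint, and no timeouts or port exhaustion handling); real NAT devices differ in all of these details.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ToyNat:
    """Hypothetical port-translating NAT with one public address."""

    def __init__(self, public_ip: str, first_port: int = 49152) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)  # no reuse or exhaustion handling
        self._outbound: Dict[Endpoint, int] = {}       # private endpoint -> public port
        self._inbound: Dict[int, Endpoint] = {}        # public port -> private endpoint

    def translate_out(self, private_ip: str, private_port: int) -> Endpoint:
        """Rewrite the source of an outgoing segment to the public address."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            port = next(self._next_port)
            self._outbound[key] = port
            self._inbound[port] = key
        return self.public_ip, self._outbound[key]

    def translate_in(self, public_port: int) -> Endpoint:
        """Map a reply arriving on a public port back to the private host."""
        return self._inbound[public_port]


if __name__ == "__main__":
    nat = ToyNat("203.0.113.7")
    print(nat.translate_out("192.168.1.10", 50000))  # ('203.0.113.7', 49152)
    print(nat.translate_out("192.168.1.11", 50000))  # ('203.0.113.7', 49153)
    print(nat.translate_in(49153))                   # ('192.168.1.11', 50000)
```

Two private hosts using the same source port are kept apart only by the distinct public ports the table hands out, which is the disambiguation described above.
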
Development

TCP is a complex protocol. However, while significant enhancements have been made and proposed over the years, its most basic operation has not changed significantly since its first specification RFC 675 in 1974, and the v4 specification RFC 793, published in September 1981. RFC 1122, Host Requirements for Internet Hosts, clarified a number of TCP protocol implementation requirements. A list of the 8 required specifications and over 20 strongly encouraged enhancements is available in RFC 7414. Among this list is RFC 2581, TCP Congestion Control, one of the most important TCP-related RFCs in recent years, which describes updated algorithms that avoid undue congestion. In 2001, RFC 3168 was written to describe Explicit Congestion Notification (ECN), a congestion avoidance signaling mechanism.

The original TCP congestion avoidance algorithm was known as "TCP Tahoe", but many alternative algorithms have since been proposed (including TCP Reno, TCP Vegas, FAST TCP, TCP New Reno, and TCP Hybla).

TCP Interactive (iTCP) [29] is a research effort into TCP extensions that allows applications to subscribe to TCP events and register handler components that can launch applications for various purposes, including application-assisted congestion control.

Multipath TCP (MPTCP) [30][31] is an ongoing effort within the IETF that aims at allowing a TCP connection to use multiple paths to maximize resource usage and increase redundancy. The redundancy offered by Multipath TCP in the context of wireless networks enables the simultaneous utilisation of different networks, which brings higher throughput and better handover capabilities. Multipath TCP also brings performance benefits in datacenter environments.[32] The reference implementation[33] of Multipath TCP is being developed in the Linux kernel.[34] Multipath TCP is used to support the Siri voice recognition application on iPhones, iPads and Macs.[35]

TCP Cookie Transactions (TCPCT) is an extension proposed in December 2009 to secure servers against denial-of-service attacks. Unlike SYN cookies, TCPCT does not conflict with other TCP extensions such as window scaling. TCPCT was designed to meet the needs of DNSSEC, where servers have to handle large numbers of short-lived TCP connections.

tcpcrypt is an extension proposed in July 2010 to provide transport-level encryption directly in TCP itself. It is designed to work transparently and not require any configuration. Unlike TLS (SSL), tcpcrypt itself does not provide authentication, but provides simple primitives down to the application to do that. As of 2010, the first tcpcrypt IETF draft has been published and implementations exist for several major platforms.

TCP Fast Open is an extension to speed up the opening of successive TCP connections between two endpoints. It works by using a cryptographic "cookie" that lets a returning client send data in its initial SYN, saving one round trip of the three-way handshake. It is similar to an earlier proposal called T/TCP, which was not widely adopted due to security issues.[36] As of July 2012, it is an IETF Internet draft.[37]

Proposed in May 2013, Proportional Rate Reduction (PRR) is a TCP extension developed by Google engineers. PRR ensures that the TCP window size after recovery is as close to the slow-start threshold as possible.[38] The algorithm is designed to improve the speed of recovery and is the default loss-recovery algorithm in Linux 3.2+ kernels.[39]

TCP over wireless networks

TCP was originally designed for wired networks. Packet loss is considered to be the result of network congestion, and the congestion window size is reduced dramatically as a precaution. However, wireless links are known to experience sporadic and usually temporary losses due to fading, shadowing, hand off, interference, and other radio effects that are not strictly congestion. After the (erroneous) back-off of the congestion window size due to wireless packet loss, there may be a congestion avoidance phase in which the window grows back only conservatively. This causes the radio link to be underutilized. Extensive research on combating these harmful effects has been conducted. Suggested solutions can be categorized as end-to-end solutions, which require modifications at the client or server,[40] link layer solutions, such as Radio Link Protocol (RLP) in cellular networks, or proxy-based solutions, which require some changes in the network without modifying end nodes.[40][41]

A number of alternative congestion control algorithms, such as Vegas, Westwood, Veno, and Santa Cruz, have been proposed to help solve the wireless problem.[citation needed]

Hardware implementations

One way to overcome the processing power requirements of TCP is to build hardware implementations of it, widely known as TCP offload engines (TOE). The main problem of TOEs is that they are hard to integrate into computing systems, requiring extensive changes in the operating system of the computer or device. One company to develop such a device was Alacritech.

Debugging

A packet sniffer, which intercepts TCP traffic on a network link, can be useful in debugging networks, network stacks, and applications that use TCP by showing the user what packets are passing through a link. Some networking stacks support the SO_DEBUG socket option, which can be enabled on the socket using setsockopt. That option dumps all the packets, TCP states, and events on that socket, which is helpful in debugging. Netstat is another utility that can be used for debugging.

Alternatives

For many applications TCP is not appropriate. One problem (at least with normal implementations) is that the application cannot access the packets coming after a lost packet until the retransmitted copy of the lost packet is received. This causes problems for real-time applications such as streaming media, real-time multiplayer games and voice over IP (VoIP), where it is generally more useful to get most of the data in a timely fashion than it is to get all of the data in order.
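
The effect of a single loss on an in-order receiver can be simulated in a few lines. The function below is a hypothetical sketch: after each packet arrives, it reports which packets a TCP-like receiver could hand to the application at that moment.

```python
from typing import List


def in_order_delivery(arrivals: List[int]) -> List[List[int]]:
    """After each arrival, list the packets an in-order receiver can release."""
    buffered = set()   # packets received but waiting for a gap to be filled
    next_expected = 0  # lowest packet number not yet delivered
    deliveries = []
    for pkt in arrivals:
        if pkt >= next_expected:
            buffered.add(pkt)
        ready = []
        while next_expected in buffered:
            buffered.remove(next_expected)
            ready.append(next_expected)
            next_expected += 1
        deliveries.append(ready)
    return deliveries


if __name__ == "__main__":
    # Packet 1 is lost and its retransmission arrives after packets 2, 3 and 4.
    arrivals = [0, 2, 3, 4, 1]
    for arrived, released in zip(arrivals, in_order_delivery(arrivals)):
        print(f"arrived {arrived}: application receives {released}")
```

Packets 2, 3 and 4 sit in the buffer until the retransmission of packet 1 arrives; a datagram-based application could instead use them as soon as they show up.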

For historical and performance reasons, most storage area networks (SANs) use Fibre Channel Protocol (FCP) over Fibre Channel connections.

Also, for embedded systems, network booting, and servers that serve simple requests from huge numbers of clients (e.g. DNS servers) the complexity of TCP can be a problem. Finally, some tricks such as transmitting data between two hosts that are both behind NAT (using STUN or similar systems) are far simpler without a relatively complex protocol like TCP in the way.

Generally, where TCP is unsuitable, the User Datagram Protocol (UDP) is used. This provides the application multiplexing and checksums that TCP does, but does not handle streams or retransmission, giving the application developer the ability to code them in a way suitable for the situation, or to replace them with other methods like forward error correction or interpolation.

Stream Control Transmission Protocol (SCTP) is another protocol that provides reliable stream oriented services similar to TCP. It is newer and considerably more complex than TCP, and has not yet seen widespread deployment. However, it is especially designed to be used in situations where reliability and near-real-time considerations are important.

Venturi Transport Protocol (VTP) is a patented proprietary protocol that is designed to replace TCP transparently to overcome perceived inefficiencies related to wireless data transport.

TCP also has issues in high-bandwidth environments. The TCP congestion avoidance algorithm works very well for ad-hoc environments where the data sender is not known in advance. If the environment is predictable, a timing-based protocol such as Asynchronous Transfer Mode (ATM) can avoid TCP's retransmission overhead.

UDP-based Data Transfer Protocol (UDT) has better efficiency and fairness than TCP in networks that have a high bandwidth-delay product.[42]

Multipurpose Transaction Protocol (MTP/IP) is patented proprietary software that is designed to adaptively achieve high throughput and transaction performance in a wide variety of network conditions, particularly those where TCP is perceived to be inefficient.

Checksum computation

TCP checksum for IPv4

When TCP runs over IPv4, the method used to compute the checksum is defined in RFC 793:

The checksum field is the 16 bit one's complement of the one's complement sum of all 16-bit words in the header and text. If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment. While computing the checksum, the checksum field itself is replaced with zeros.

In other words, after appropriate padding, all 16-bit words are added using one's complement arithmetic. The sum is then bitwise complemented and inserted as the checksum field. Besides the TCP header and data, the sum also covers a pseudo-header that mimics the IPv4 packet header; the pseudo-header used in the checksum computation is shown in the table below.

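TCP pseudo-header for checksum computation (IPv4)
Each row is 32 bits wide; bit positions within the row are given in parentheses.
Bit offset 0: Source address (bits 0–31)
Bit offset 32: Destination address (bits 0–31)
Bit offset 64: Zeros (bits 0–7), Protocol (bits 8–15), TCP length (bits 16–31)
Bit offset 96: Source port (bits 0–15), Destination port (bits 16–31)
Bit offset 128: Sequence number (bits 0–31)
Bit offset 160: Acknowledgement number (bits 0–31)
Bit offset 192: Data offset (bits 0–3), Reserved (bits 4–7), Flags (bits 8–15), Window (bits 16–31)
Bit offset 224: Checksum (bits 0–15), Urgent pointer (bits 16–31)
Bit offset 256: Options (optional)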

The source and destination addresses are those of the IPv4 header. The protocol value is 6 for TCP (cf. List of IP protocol numbers). The TCP length field is the length of the TCP header and data (measured in octets).
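
The procedure above can be sketched in Python. The helper names are hypothetical and the addresses come from documentation ranges; the sketch builds the 12-byte pseudo-header described in the table, appends the TCP segment (with its checksum field zeroed), and applies the one's complement sum with end-around carry.

```python
import ipaddress
import struct


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad the last octet on the right with zeros
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ipv4(src: str, dst: str, segment: bytes) -> int:
    """Checksum of pseudo-header + TCP segment (checksum field at offset 16 zeroed)."""
    pseudo = struct.pack("!4s4sBBH",
                         ipaddress.IPv4Address(src).packed,
                         ipaddress.IPv4Address(dst).packed,
                         0, 6, len(segment))  # zeros, protocol 6 (TCP), TCP length
    return ones_complement_checksum(pseudo + segment)


if __name__ == "__main__":
    # Minimal 20-byte SYN header: ports, seq, ack, data offset 5, SYN flag, window.
    header = struct.pack("!HHIIBBHHH", 50000, 80, 1, 0, 5 << 4, 0x02, 65535, 0, 0)
    csum = tcp_checksum_ipv4("192.0.2.1", "198.51.100.2", header)
    filled = header[:16] + struct.pack("!H", csum) + header[18:]
    # Re-running the sum over a segment with a correct checksum yields zero.
    print(hex(csum), tcp_checksum_ipv4("192.0.2.1", "198.51.100.2", filled))
```

A receiver can validate a segment the same way: summing the pseudo-header and the segment, checksum included, yields 0 after complementing when the data arrived intact.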

TCP checksum for IPv6

When TCP runs over IPv6, the method used to compute the checksum is changed, as per RFC 2460:

Any transport or other upper-layer protocol that includes the addresses from the IP header in its checksum computation must be modified for use over IPv6, to include the 128-bit IPv6 addresses instead of 32-bit IPv4 addresses.

A pseudo-header that mimics the IPv6 header for computation of the checksum is shown below.

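TCP pseudo-header for checksum computation (IPv6)
Each row is 32 bits wide unless noted; bit positions within the row are given in parentheses.
Bit offset 0: Source address (128 bits, spanning offsets 0–127)
Bit offset 128: Destination address (128 bits, spanning offsets 128–255)
Bit offset 256: TCP length (bits 0–31)
Bit offset 288: Zeros (bits 0–23), Next header (bits 24–31)
Bit offset 320: Source port (bits 0–15), Destination port (bits 16–31)
Bit offset 352: Sequence number (bits 0–31)
Bit offset 384: Acknowledgement number (bits 0–31)
Bit offset 416: Data offset (bits 0–3), Reserved (bits 4–7), Flags (bits 8–15), Window (bits 16–31)
Bit offset 448: Checksum (bits 0–15), Urgent pointer (bits 16–31)
Bit offset 480: Options (optional)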
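
Only the pseudo-header changes for IPv6: 128-bit addresses, a 32-bit length, three zero bytes and the next-header value (6 for TCP). The self-contained sketch below mirrors that layout; the function names are hypothetical and the addresses come from the 2001:db8::/32 documentation prefix.

```python
import ipaddress
import struct


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad the last octet on the right with zeros
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ipv6(src: str, dst: str, segment: bytes) -> int:
    """Checksum of the IPv6 pseudo-header + TCP segment (checksum field zeroed)."""
    pseudo = struct.pack("!16s16sI3xB",
                         ipaddress.IPv6Address(src).packed,
                         ipaddress.IPv6Address(dst).packed,
                         len(segment),  # 32-bit TCP length
                         6)             # three zero bytes, then next header = 6
    return ones_complement_checksum(pseudo + segment)


if __name__ == "__main__":
    header = struct.pack("!HHIIBBHHH", 50000, 443, 1, 0, 5 << 4, 0x02, 65535, 0, 0)
    csum = tcp_checksum_ipv6("2001:db8::1", "2001:db8::2", header)
    filled = header[:16] + struct.pack("!H", csum) + header[18:]
    print(hex(csum), tcp_checksum_ipv6("2001:db8::1", "2001:db8::2", filled))
```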

Checksum offload

Many TCP/IP software stack implementations provide options to use hardware assistance to automatically compute the checksum in the network adapter prior to transmission onto the network or upon reception from the network for validation. This frees CPU cycles that the OS would otherwise spend calculating the checksum, which can increase overall network performance.

This feature may cause packet analyzers that are unaware of, or uncertain about, the use of checksum offload to report invalid checksums in outbound packets that have not yet reached the network adapter.[43] This will only occur for packets that are intercepted before being transmitted by the network adapter; all packets transmitted by the network adapter on the wire will have valid checksums.[44] This issue can also occur when monitoring packets being transmitted between virtual machines on the same host, where a virtual device driver may omit the checksum calculation (as an optimisation), knowing that the checksum will be calculated later by the VM host kernel or its physical hardware.

See also


References

  1. Vinton G. Cerf; Robert E. Kahn (May 1974). "A Protocol for Packet Network Intercommunication" (PDF). IEEE Transactions on Communications. 22 (5): 637–648. doi:10.1109/tcom.1974.1092259. Archived from the original (PDF) on March 4, 2016.
  2. Comer, Douglas E. (2006). Internetworking with TCP/IP: Principles, Protocols, and Architecture. 1 (5th ed.). Prentice Hall. ISBN 0-13-187671-6.
  3. "TCP (Linktionary term)".
  4. "RFC 791 – section 2.1".
  5. "RFC 793".
  6. "RFC 1323, TCP Extensions for High Performance, Section 2.2".
  7. "RFC 2018, TCP Selective Acknowledgement Options, Section 2".
  8. "RFC 2018, TCP Selective Acknowledgement Options, Section 3".
  9. "RFC 1323, TCP Extensions for High Performance, Section 3.2".
  10. RFC 793, section 3.1.
  11. RFC 793, section 3.2.
  12. Tanenbaum, Andrew S. (2003-03-17). Computer Networks (Fourth ed.). Prentice Hall. ISBN 0-13-066102-3.
  13. "TCP Definition". Retrieved 2011-03-12.
  14. Mathis; Mathew; Semke; Mahdavi; Ott (1997). "The macroscopic behavior of the TCP congestion avoidance algorithm". ACM SIGCOMM Computer Communication Review. 27.3: 67–82.
  15. Paxson, V.; Allman, M.; Chu, J.; Sargent, M. (June 2011). "The Basic Algorithm". Computing TCP's Retransmission Timer. IETF. p. 2. sec. 2. RFC 6298. https://tools.ietf.org/html/rfc6298#section-2. Retrieved October 24, 2015.
  16. Stone; Partridge (2000). "When The CRC and TCP Checksum Disagree". Sigcomm.
  17. "RFC 879".
  18. "TCP window scaling and broken routers [LWN.net]".
  19. Gont, Fernando (November 2008). "On the implementation of TCP urgent data". 73rd IETF meeting. Retrieved 2009-01-04.
  20. Peterson, Larry (2003). Computer Networks. Morgan Kaufmann. p. 401. ISBN 1-55860-832-X.
  21. Richard W. Stevens (2006). TCP/IP Illustrated. Vol. 1, The Protocols. Addison-Wesley. Chapter 20. ISBN 978-0-201-63346-7.
  22. Security Assessment of the Transmission Control Protocol (TCP) at the Wayback Machine (archived March 6, 2009).
  23. Security Assessment of the Transmission Control Protocol (TCP).
  24. Jakob Lell. "Quick Blind TCP Connection Spoofing with SYN Cookies". Retrieved 2014-02-05.
  25. Some insights about the recent TCP DoS (Denial of Service) vulnerabilities.
  26. "Exploiting TCP and the Persist Timer Infiniteness".
  27. "Laurent Joncheray, Simple Active Attack Against TCP, 1995".
  28. John T. Hagen; Barry E. Mullins (2013). "TCP veto: A novel network attack and its application to SCADA protocols". Innovative Smart Grid Technologies (ISGT), 2013 IEEE PES.
  29. TCP Interactive (iTCP).
  30. RFC 6182.
  31. RFC 6824.
  32. Raiciu; Barre; Pluntke; Greenhalgh; Wischik; Handley (2011). "Improving datacenter performance and robustness with multipath TCP". Sigcomm.
  33. "MultiPath TCP - Linux Kernel implementation".
  34. Raiciu; Paasch; Barre; Ford; Honda; Duchene; Bonaventure; Handley (2012). "How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP". USENIX NSDI.
  35. Bonaventure; Seo (2016). "Multipath TCP Deployments". IETF Journal.
  36. Michael Kerrisk (2012-08-01). "TCP Fast Open: expediting web services". LWN.net.
  37. Y. Cheng, J. Chu, S. Radhakrishnan, A. Jain (2012-07-16). TCP Fast Open. IETF. I-D draft-ietf-tcpm-fastopen-01. https://tools.ietf.org/html/draft-ietf-tcpm-fastopen-01.
  38. "RFC 6937 - Proportional Rate Reduction for TCP". http://tools.ietf.org/html/rfc6937.
  39. Grigorik, Ilya (2013). High-performance browser networking (1. ed.). Beijing: O'Reilly. ISBN 1449344763.
  40. "TCP performance over CDMA2000 RLP". Retrieved 2010-08-30.
  41. Muhammad Adeel & Ahmad Ali Iqbal (2004). "TCP Congestion Window Optimization for CDMA2000 Packet Data Networks". International Conference on Information Technology (ITNG'07): 31–35. doi:10.1109/ITNG.2007.190. ISBN 978-0-7695-2776-5.
  42. Yunhong Gu, Xinwei Hong, and Robert L. Grossman. "An Analysis of AIMD Algorithm with Decreasing Increases". 2004.
  43. "Wireshark: Offloading". Wireshark captures packets before they are sent to the network adapter. It won't see the correct checksum because it has not been calculated yet. Even worse, most OSes don't bother initialize this data so you're probably seeing little chunks of memory that you shouldn't. New installations of Wireshark 1.2 and above disable IP, TCP, and UDP checksum validation by default. You can disable checksum validation in each of those dissectors by hand if needed.
  44. "Wireshark: Checksums". Checksum offloading often causes confusion as the network packets to be transmitted are handed over to Wireshark before the checksums are actually calculated. Wireshark gets these “empty” checksums and displays them as invalid, even though the packets will contain valid checksums when they leave the network hardware later.

Further reading

External links