D. J. Bernstein
TCP/IP

Two file descriptors for sockets

Consider a typical UNIX pipeline:
     generate-data | process-data | consume-data
The process-data program has two descriptors: one reading from the first pipe and one writing to the second pipe. When generate-data finishes and closes the first pipe, process-data sees EOF; it then finishes its processing and closes the second pipe, at which point consume-data sees EOF and finishes consuming the data.
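
To make this concrete, here is a minimal sketch of a process-data-style filter in C. It merely copies bytes rather than doing any real processing, but the shape is the point: read descriptor 0 until EOF, write to descriptor 1, then exit, so that descriptor 1 is closed and the next stage sees EOF.

     #include <unistd.h>

     int main(void)
     {
       char buf[4096];
       ssize_t r;

       /* Read from descriptor 0 until EOF (read() returns 0). */
       while ((r = read(0, buf, sizeof buf)) > 0) {
         char *p = buf;
         /* Write out everything that was read. */
         while (r > 0) {
           ssize_t w = write(1, p, r);
           if (w <= 0) return 1;
           p += w;
           r -= w;
         }
       }
       /* Exiting closes descriptor 1; the next stage then sees EOF. */
       return 0;
     }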

What if, instead of running a process-data program on this machine, we want to run it on another machine? No problem. We make a TCP connection to a process-data server on the remote machine. On the local machine, we have generate-data writing to the TCP connection, and consume-data reading from the TCP connection. When generate-data closes the socket, the local machine sends FIN through TCP to the remote machine; process-data sees EOF; it finishes its processing and closes the connection; the remote machine sends FIN through TCP to the local machine; consume-data sees EOF and finishes consuming the data.
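
On the remote machine, a process-data server could be sketched roughly as follows, with error handling and the actual processing omitted and the port number (7001) made up. After accept() the connection is handled just like the pipe case: read until the client's FIN shows up as EOF, finish writing, then close.

     #include <sys/socket.h>
     #include <netinet/in.h>
     #include <arpa/inet.h>
     #include <unistd.h>

     int main(void)
     {
       struct sockaddr_in sa = { 0 };
       int s = socket(AF_INET, SOCK_STREAM, 0);
       int c;
       char buf[4096];
       ssize_t r;

       sa.sin_family = AF_INET;
       sa.sin_port = htons(7001);               /* made-up port number */
       sa.sin_addr.s_addr = htonl(INADDR_ANY);
       bind(s, (struct sockaddr *) &sa, sizeof sa);
       listen(s, 16);

       c = accept(s, 0, 0);
       /* Read until the client's FIN shows up as EOF. */
       while ((r = read(c, buf, sizeof buf)) > 0) {
         /* ... process the data, write results back on c ... */
       }
       /* No other process holds this descriptor, so close() really does
          make the kernel send FIN; the reader on the other side sees EOF. */
       close(c);
       return 0;
     }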

All of this would have worked perfectly if a TCP connection, just like a UNIX pipe, had been represented as two UNIX file descriptors: a reading descriptor and a writing descriptor. (Technically, a reading ofile and a writing ofile.) The kernel sends FIN on the TCP connection when the writing descriptor is closed; when it receives FIN on the TCP connection, it reports EOF on the reading descriptor.
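
For comparison, here is how those pipe semantics look in practice: once every copy of the writing descriptor is closed, read() on the reading descriptor returns 0. A two-descriptor TCP connection would have behaved the same way, with FIN playing the role of the closed write end.

     #include <unistd.h>

     int main(void)
     {
       int fds[2];                        /* fds[0] reads, fds[1] writes */
       char buf[16];

       pipe(fds);
       write(fds[1], "data", 4);
       close(fds[1]);                     /* last writing descriptor closed */

       read(fds[0], buf, sizeof buf);     /* returns 4: the data */
       read(fds[0], buf, sizeof buf);     /* returns 0: EOF, the pipe analogue
                                             of receiving FIN */
       close(fds[0]);
       return 0;
     }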

But the BSD socket designers instead decided to merge the two obvious fds into one read-write fd. When the generate-data program finishes and closes its copy of the fd, the same fd is still open in the consume-data program, so the kernel has no idea that it should send a FIN.

The BSD socket designers introduced a shutdown(fd,1) system call to send FIN through a TCP connection. This is exactly the sort of device-specific garbage that UNIX fds were designed to avoid. The generate-data program works with pipes but doesn't work with TCP connections.
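
For illustration only, here is roughly what a writer that has been handed such a merged socket descriptor has to do; SHUT_WR is the modern spelling of the 1 in shutdown(fd,1), and the helper name is made up.

     #include <sys/socket.h>
     #include <unistd.h>

     /* Hypothetical helper: fd is a connected TCP socket that may also
        be open in another process (here, consume-data). */
     void finish_writing(int fd)
     {
       /* close(fd) alone would only drop this process's reference; the
          kernel would not send FIN while consume-data still holds the
          descriptor.  shutdown() acts on the connection itself. */
       shutdown(fd, SHUT_WR);             /* the 1 in shutdown(fd,1) */
       close(fd);
     }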