A Brief History of Voice over IP
Voice over IP was introduced in the mid-1990s and has become the predominant method for speech communications over both mobile and fixed networks. Although VoIP itself dates from the 1990s, it has its roots in packet voice systems developed in the early 1970s and in work on sampling, digitizing and compressing speech that dates back to the early 1900s. In this brief history, we explore the early beginnings of packet voice and its path to today’s VoIP technology.
Earliest Beginnings: Sampled Speech
Methods for time division multiplexing of telegraph systems were developed in the mid-1800s by Moses Farmer as a way of sending multiple telegraph conversations over a single wire at the same time. In 1903, Willard M. Miner experimented with using the same kind of commutator to carry multiple telephone conversations over a single wire by sending samples of the speech waveform. He found that about 4,000 samples per second were needed to recover intelligible speech; we now know that this gives an audio bandwidth of a little less than 2 kHz, half the sampling rate.
During the 1920s, Harry Nyquist worked on the more general problem of transmitting sampled signals or pulses and determined that the highest frequency that can be recovered from a sampled signal is half the sampling rate. For telephony we generally want an audio bandwidth of at least 3,500 Hz, so we typically use 8,000 samples per second, which allows up to 4,000 Hz in theory and leaves a margin for practical filtering. Wideband codecs used on today’s mobile networks use 16,000 samples per second, extending the audio bandwidth to roughly 7 kHz and giving significantly better speech quality.
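The Nyquist relationship described above is simple arithmetic: the highest recoverable frequency is half the sampling rate. The following minimal Python sketch applies it to the three sampling rates mentioned in this section; the function name `nyquist_limit_hz` is hypothetical and chosen only for illustration.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency (Hz) that can in theory be recovered from samples taken at sample_rate_hz."""
    return sample_rate_hz / 2


# Rates from the text: Miner's experiments, narrowband telephony, wideband mobile codecs.
for rate in (4_000, 8_000, 16_000):
    print(f"{rate} samples/s -> at most {nyquist_limit_hz(rate):.0f} Hz")
```

Real systems stay somewhat below this theoretical limit because practical anti-aliasing filters need a transition band, which is why 8,000 samples per second carries around 3,500 Hz of speech rather than the full 4,000 Hz.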
Digitizing Speech: PCM
In 1937, Alec Reeves developed Pulse Code Modulation (PCM), a method for converting analog speech samples into digital pulses. Because digital pulses can be regenerated at each repeater, noise and distortion do not accumulate along the line, so speech can be transmitted over much greater distances without degradation. PCM took another 20 years to take hold in telephony and became widely used in the 1960s and 1970s. Early PCM codecs were implemented in the 1950s using vacuum tubes, which were replaced with semiconductor devices in the mid-1970s.
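To make the idea of turning samples into digital codes concrete, here is a minimal Python sketch of uniform PCM quantization, assuming input samples normalized to the range -1.0 to 1.0. Telephone PCM as standardized in G.711 uses logarithmic (µ-law or A-law) companding rather than the uniform steps shown here; the function names are hypothetical and for illustration only.

```python
import math


def pcm_quantize(samples, bits=8):
    """Uniformly quantize samples in [-1.0, 1.0] to signed integer codes of the given width."""
    max_code = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit codes
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip out-of-range input
        codes.append(round(x * max_code))   # nearest quantization level
    return codes


def pcm_bit_rate(sample_rate_hz, bits):
    """Bits per second produced by PCM at the given sampling rate and code width."""
    return sample_rate_hz * bits


# One millisecond of a 1 kHz tone sampled at 8,000 samples/s (8 samples).
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8)]
print(pcm_quantize(tone))
print(pcm_bit_rate(8000, 8))  # 64000 bits/s
```

Eight bits per sample at 8,000 samples per second yields the familiar 64 kbit/s telephone channel, which is the baseline that the compression techniques in the next paragraph try to reduce.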
If the number of bits required to transmit speech can be reduced, more telephone calls can be carried over the same network. Differential Pulse Code Modulation (DPCM), developed in 1952, transmits the difference between each sample and a prediction of it (in the simplest case, the previous sample) rather than the sample itself, and was able to compress speech by a factor of 2 to 4. Adaptive Differential Pulse Code Modulation (ADPCM), introduced in 1973, improved on this by adapting the quantizer step size to the signal, and formed the basis of many speech compression systems. Other early speech compression algorithms included Linear Predictive Coding (LPC) (1955) and Continuously Variable Slope Delta (CVSD) modulation (1970).
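The core DPCM idea can be sketched in a few lines of Python. This is a toy illustration, not any standardized codec: it assumes integer samples, uses the previous reconstructed sample as the predictor, and uses a fixed step size. ADPCM would additionally adapt `step` to the signal. All function names are hypothetical.

```python
def dpcm_encode(samples, step=1, bits=4):
    """Encode integer samples as quantized differences from the previous reconstructed sample."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # range of a signed code of `bits` bits
    codes, prediction = [], 0
    for s in samples:
        q = round((s - prediction) / step)   # quantize the prediction error
        q = max(lo, min(hi, q))              # clamp to the code width (slope overload if hit)
        codes.append(q)
        prediction += q * step               # track the decoder's output so errors do not accumulate
    return codes


def dpcm_decode(codes, step=1):
    """Rebuild samples by accumulating the decoded differences."""
    out, prediction = [], 0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out


samples = [0, 3, 5, 6, 6, 4, 1, -2]
codes = dpcm_encode(samples)
print(codes)               # small differences fit in 4-bit codes
print(dpcm_decode(codes))
```

Because neighboring speech samples are strongly correlated, the differences are usually much smaller than the samples themselves and need fewer bits; the clamp shows where this breaks down when the signal changes faster than the code range allows.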
Packet Voice: Origins
In the early 1970s, a packet voice system was developed for use with ARPANET (the forerunner of today’s Internet). It used the Network Voice Protocol (NVP) to communicate “Voice Parcels” and was first demonstrated in 1974 between USC/ISI and MIT Lincoln Laboratory. NVP incorporated a header with a timestamp and sequence number and was the precursor to the Real-time Transport Protocol (RTP), which is used for Voice over IP and IP videoconferencing today.
There was extensive research into packet voice technology during the mid-to-late 1970s, covering the design of packet voice systems and detailed analysis of many aspects of system design, for example the optimum packet size, the effects of delay and echo, and the choice of speech codec.
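The packet size question comes down to a trade-off: longer packets spend less bandwidth on headers but add delay, because the sender must collect that much speech before sending. The Python sketch below quantifies this for a 64 kbit/s PCM stream. It assumes a modern IPv4/UDP/RTP header of 40 bytes and ignores link-layer framing; the 1970s systems used different headers, and the function name is hypothetical.

```python
def packet_tradeoff(codec_bps, packet_ms, header_bytes=40):
    """Return (payload bytes, on-the-wire bit rate, payload efficiency) for one voice stream.

    header_bytes=40 assumes IPv4 (20) + UDP (8) + RTP (12) headers, ignoring link-layer framing.
    The packetization delay is simply packet_ms: the speech must be collected before sending.
    """
    payload_bytes = codec_bps * packet_ms / 1000 / 8
    packets_per_s = 1000 / packet_ms
    wire_bps = (payload_bytes + header_bytes) * 8 * packets_per_s
    efficiency = payload_bytes / (payload_bytes + header_bytes)
    return payload_bytes, wire_bps, efficiency


for ms in (10, 20, 40):
    payload, wire, eff = packet_tradeoff(64_000, ms)
    print(f"{ms} ms packets: {payload:.0f}-byte payload, {wire / 1000:.0f} kbit/s, {eff:.0%} efficient")
```

Doubling the packet duration from 20 ms to 40 ms cuts the bandwidth overhead but adds 20 ms of delay at the sender, which is exactly the kind of balance that this research set out to optimize.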
Packet Voice Systems of the 1980s and Early 1990s
The 1980s saw a proliferation of packet voice systems and applications. These included PCME (Packet Circuit Multiplication Equipment), Voice over ATM, Fast Packet, Voice over Frame Relay and desktop phones using packet voice.
In the second half of the 1980s there was a major international standardization effort focused on Broadband ISDN and the ATM (Asynchronous Transfer Mode) protocol. ATM was intended to support both data and voice, and “Voice over ATM” was a specific area of focus. ATM used short, fixed-size packets, or cells, of 53 bytes with a 48-byte payload, intended as a compromise between efficiency and low delay. Early pre-standard implementations included StrataCom’s Fast Packet system.
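The efficiency and delay compromise behind the 48-byte payload can be checked with a short Python calculation: the 5-byte cell header costs under 10% of the link, and a 64 kbit/s PCM stream fills one cell in 6 ms. This sketch ignores ATM Adaptation Layer (AAL) overhead, which in practice uses part of the 48 bytes; the constant and function names are hypothetical.

```python
ATM_CELL_BYTES = 53      # fixed cell size: 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48


def cell_efficiency():
    """Fraction of each cell that carries payload rather than header."""
    return ATM_PAYLOAD_BYTES / ATM_CELL_BYTES


def cell_fill_delay_ms(codec_bps):
    """Time needed to accumulate one full cell payload of speech at the given codec rate."""
    return ATM_PAYLOAD_BYTES * 8 / codec_bps * 1000


print(f"Header overhead: {1 - cell_efficiency():.1%}")
print(f"Fill delay at 64 kbit/s PCM: {cell_fill_delay_ms(64_000):.0f} ms")
print(f"Fill delay at 32 kbit/s ADPCM: {cell_fill_delay_ms(32_000):.0f} ms")
```

Halving the codec rate doubles the fill delay, which is one reason small cells were attractive for compressed voice even though they carry proportionally more header overhead than large data frames.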
By the mid-to-late 1980s, researchers were studying the effects of lost cells or packets on speech quality and methods for concealing the impact of lost packets. Techniques included repeating the last received packet, using punctured forward error correction (FEC) codes, and inserting silence.
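Two of the concealment techniques mentioned above, repeating the last received packet and inserting silence, can be sketched in Python as follows. The sketch assumes that each packet carries a fixed number of integer samples and that lost packets have already been identified (for example, from gaps in sequence numbers) and marked as `None`. The function name and this representation are hypothetical; FEC-based recovery is not shown.

```python
from typing import List, Optional


def conceal_losses(frames: List[Optional[List[int]]], frame_len: int, strategy: str = "repeat") -> List[int]:
    """Rebuild a sample stream in which lost frames are marked as None.

    strategy="repeat" replays the last received frame; "silence" inserts zero-valued samples.
    """
    if strategy not in ("repeat", "silence"):
        raise ValueError(f"unknown strategy: {strategy}")
    silence = [0] * frame_len
    last_good = silence          # nothing received yet: fall back to silence
    out: List[int] = []
    for frame in frames:
        if frame is not None:
            last_good = frame
            out.extend(frame)
        elif strategy == "repeat":
            out.extend(last_good)
        else:
            out.extend(silence)
    return out


received = [[1, 2], [3, 4], None, [5, 6]]   # third frame lost in transit
print(conceal_losses(received, frame_len=2, strategy="repeat"))   # [1, 2, 3, 4, 3, 4, 5, 6]
print(conceal_losses(received, frame_len=2, strategy="silence"))  # [1, 2, 3, 4, 0, 0, 5, 6]
```

Repetition works because speech changes relatively slowly over a single short packet, so replaying the previous one is usually less noticeable than a sudden gap of silence.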
Early work on telephones and desktop applications that used packet voice started in the mid-1980s. The Etherphone, developed by Xerox PARC around 1987/88, was an Ethernet-connected desktop phone that sent packet voice over a local area network. RASCAL (1989) supported voice over Ethernet for gaming applications. NEVOT, a desktop application developed by Henning Schulzrinne in the early 1990s, supported packet voice for both local phone calls and conference calls.
The late 1980s and early 1990s saw the development of Frame Relay, which used larger, variable-length frames that carried data much more efficiently than ATM. Dowty Communications and others applied Frame Relay to packet voice as Voice over Frame Relay (VoFR), which was standardized by the Frame Relay Forum.
NetFone (1991) was a voice and meeting communications system that sent 32 kbit/s compressed voice over UDP between Sun workstations.
Voice over IP
The IETF Real-time Transport Protocol (RTP) was developed by Henning Schulzrinne and Steve Casner and first published by the IETF in draft form in 1992. RTP was based on the earlier Network Voice Protocol (NVP) and other experimental packet voice protocols.
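Like NVP before it, RTP carries a sequence number and a timestamp in every packet. The following Python sketch packs the 12-byte RTP fixed header as laid out in RFC 3550, assuming version 2, no padding, no header extension and no contributing sources. Payload type 0 (PCMU, G.711 µ-law) follows the RFC 3551 audio profile. The function name is hypothetical.

```python
import struct


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack a 12-byte RTP fixed header (RFC 3550): version 2, no padding, no extension, no CSRCs."""
    version, padding, extension, csrc_count = 2, 0, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # "!" = network byte order: two single bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


# Three consecutive 20 ms G.711 packets: the timestamp advances by 160 samples at 8,000 samples/s.
for i in range(3):
    hdr = rtp_header(seq=1000 + i, timestamp=160 * i, ssrc=0x12345678)
    print(hdr.hex())
```

The sequence number lets the receiver detect loss and reordering, while the timestamp, counted in samples rather than packets, lets it schedule playout and handle silence suppression correctly.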
In the mid-1990s, numerous Voice over IP developers and products emerged. One example is Selsius, which developed an early IP phone and was acquired by Cisco in 1998. Selsius also developed the Skinny Client Control Protocol (SCCP) for call signaling, which Cisco devices still use, although support for SIP has grown over the last 20 years.
The IETF published early drafts of the Session Initiation Protocol (SIP) in December 1996, and SIP became an RFC (RFC 2543) in March 1999. Over the last 20 years there have been numerous updates and extensions to the protocol, including its revision as RFC 3261 in 2002.
For VoIP to be used in mission-critical networks and services, it is important that service quality can be measured reliably and accurately. The key protocols for achieving this are RTCP XR (RFC 3611) and the SIP Voice Quality Reporting protocol (RFC 6035), which enable voice quality to be directly measured inside VoIP endpoints (typically using VQmon) and reported to central management systems.
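Among other metrics, the RTCP XR VoIP Metrics report block in RFC 3611 carries an R factor and MOS estimates. This document does not describe how an endpoint such as VQmon derives those values internally. As a sketch of one published step, the ITU-T G.107 E-model defines a conversion from a transmission rating R to an estimated MOS, shown here in Python; the function name is hypothetical.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model transmission rating R to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (50, 70, 80, 93.2):
    print(f"R = {r:5.1f} -> MOS ~ {r_to_mos(r):.2f}")
```

Reporting both R and MOS is useful because R is additive across impairments, which makes it convenient for analysis, while MOS is the scale that most operators and users recognize.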
Voice over IP is now widely adopted and is at the core of most modern telecommunications networks. VoIP is typically implemented in the mobile or desktop phone itself, so speech travels as packets for the entire connection. It took about 100 years from Willard Miner’s early experiments with sampled speech to the early 2000s, when VoIP adoption started to grow. As with many “revolutionary” technologies that seem to appear suddenly, this has been a migration involving many incremental steps.