The Universal Serial Bus (USB) standard has been around for quite some time now. Considering how quickly things change in the tech world, it’s a marvel to see how USB has flourished and maintained its universality with most mobile and host devices. And yet, making USB devices that fit the specifications for the standard still manages to be quite a difficult procedure.
The USB specification is made up of thousands of pages spread over dozens of distinct documents, and although there are great tutorials on the subject (including our own specification summaries on USB Tips), the process is still unbelievably long and difficult to undertake successfully. To make matters worse, the application programming interface (API) offered for programming USB devices is often extremely complex and detailed.
Here we describe how to program your own software-based USB devices. The approach is not limited to standard class devices; it presents a way to implement any device, whether it complies with a standard class or not.
The Concept of Universal Serial Bus (USB)
In order to really understand what USB is all about, you first have to learn its enormous glossary of terminology. Essentially, USB separates the host from the device. There is only one host, connected to multiple devices. The host starts all traffic and schedules it on the USB bus. A “device” is essentially just the physical unit at the end of the USB cable that identifies itself to the host by passing it a device descriptor and a configuration descriptor. These “descriptors” are just binary data that describe the capabilities of the USB device. For the most part, the configuration descriptor describes one or more interfaces, where each interface is a specific function of the device. A device can have multiple interfaces. As an example, a USB device that is made up of a keyboard with a built-in speaker will offer an interface for playing audio and an interface for key presses.
Each interface is made up of a series of endpoints that are the communication channels between the host and the device. Endpoints are numbered between 0 and 15 and may be IN endpoints or OUT endpoints. These terms are always relative to the host: OUT endpoints transport data to the device, and IN endpoints transport data to the host. There are four types of endpoints:
- Bulk endpoints reliably transport data whenever it is required. Bulk data is acknowledged and therefore fault-tolerant.
- Isochronous endpoints are for transporting real-time data. A fixed bandwidth is allocated to them. The host allocates this bandwidth and will not allow an isochronous endpoint to be created if no bandwidth is available. In contrast, bulk endpoints have no guaranteed bandwidth.
- Interrupt endpoints are polled occasionally by the host and enable a device to report status changes.
- The control endpoint (endpoint 0) is used to perform general operations, such as obtaining descriptors, or performing a control-operation such as “change the volume” or “set the baud rate” on any of the interfaces.
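The endpoint number, direction, and type described above are carried in two bytes of each endpoint descriptor. The following is a minimal sketch of decoding them. The field names `bEndpointAddress` and `bmAttributes` and their bit layout come from the USB 2.0 specification rather than from this article, and the helper name `decode_endpoint` is made up for illustration.

```python
# Transfer-type codes held in bits 1..0 of bmAttributes (USB 2.0 specification).
TRANSFER_TYPES = {0: "control", 1: "isochronous", 2: "bulk", 3: "interrupt"}


def decode_endpoint(b_endpoint_address: int, bm_attributes: int) -> dict:
    """Split the two descriptor bytes into number, direction, and type."""
    return {
        # Bits 3..0 hold the endpoint number (0 to 15).
        "number": b_endpoint_address & 0x0F,
        # Bit 7 is the direction, seen from the host: 1 = IN, 0 = OUT.
        "direction": "IN" if b_endpoint_address & 0x80 else "OUT",
        "type": TRANSFER_TYPES[bm_attributes & 0x03],
    }


# Hypothetical endpoints: an interrupt IN endpoint 1 and a bulk OUT endpoint 2.
keyboard_ep = decode_endpoint(0x81, 0x03)
data_ep = decode_endpoint(0x02, 0x02)
```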
Traffic over the USB bus is bidirectional and is organized into frames. The frames are marked by the host sending a start of frame (SOF) every 125 µs (for high-speed USB) or every 1 ms (for full-speed USB). Isochronous endpoints are allocated a transfer in every frame. Interrupt endpoints are polled once every so many frames, and bulk transfers may happen any time the bus is not in use.
Let’s look at the keyboard and built-in speaker example again. This particular setup has at least two endpoints: an isochronous OUT endpoint to transfer audio data to the speaker, and an interrupt IN endpoint to poll the keyboard. Let’s say that the speaker is a mono speaker with a 48 kHz sample rate. The host then will send 6 samples of data every 125 µs (6 samples / 0.000125 seconds = 48,000 samples/second). If a sample occupies 16 bits, the host will reserve enough bandwidth to send a 96-bit OUT packet in every 125 µs frame. This consumes around 0.5% of the USB bandwidth. The remaining 99.5% is free for other interfaces or other USB devices on the same bus.
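The per-frame arithmetic in the speaker example can be written out as a short sketch. The function names are made up for illustration, and only the payload is computed here; packet overhead is not modelled.

```python
def samples_per_frame(sample_rate_hz: int, frame_us: int) -> int:
    """Number of samples the host must send in one (micro)frame."""
    return sample_rate_hz * frame_us // 1_000_000


def payload_bits_per_frame(sample_rate_hz: int, frame_us: int,
                           bits_per_sample: int, channels: int = 1) -> int:
    """Audio payload, in bits, carried by one isochronous packet."""
    return samples_per_frame(sample_rate_hz, frame_us) * bits_per_sample * channels


# Mono speaker at 48 kHz, 16-bit samples, 125 µs high-speed frames.
mono_samples = samples_per_frame(48_000, 125)
mono_bits = payload_bits_per_frame(48_000, 125, 16)
```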
Determining and Finding New Device Capabilities
The host is responsible for initiating all USB traffic. When a device is plugged in, the host first requests the device descriptor. This descriptor contains two pieces of information that inform the host of the basic function of the device: the device class and the vendor ID/product ID (VID/PID).
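To make "just binary data" concrete, here is a minimal sketch that packs an 18-byte device descriptor. The field order follows the standard device descriptor layout in the USB 2.0 specification, which this article does not spell out. The VID/PID values, the device release number, and the function name are placeholders chosen for illustration.

```python
import struct

# Standard device descriptor: 18 bytes, little-endian.
# bLength, bDescriptorType, bcdUSB, bDeviceClass, bDeviceSubClass,
# bDeviceProtocol, bMaxPacketSize0, idVendor, idProduct, bcdDevice,
# iManufacturer, iProduct, iSerialNumber, bNumConfigurations
DEVICE_DESCRIPTOR_FORMAT = "<BBHBBBBHHHBBBB"


def build_device_descriptor(vendor_id: int, product_id: int,
                            device_class: int = 0xFF) -> bytes:
    """Pack a device descriptor; class 0xFF means vendor specific."""
    return struct.pack(
        DEVICE_DESCRIPTOR_FORMAT,
        18,            # bLength
        0x01,          # bDescriptorType: DEVICE
        0x0200,        # bcdUSB: USB 2.0
        device_class,  # bDeviceClass
        0x00,          # bDeviceSubClass
        0x00,          # bDeviceProtocol
        64,            # bMaxPacketSize0 for endpoint 0
        vendor_id,     # idVendor
        product_id,    # idProduct
        0x0100,        # bcdDevice (placeholder release 1.00)
        0, 0, 0,       # no string descriptors in this sketch
        1,             # bNumConfigurations
    )


# Placeholder VID/PID, not assigned to any real product.
descriptor = build_device_descriptor(0x1234, 0x5678)
```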
The class and subclass can be used to specify a device with generic capabilities. A USB speaker advertises itself as class Audio 2.0. A keyboard advertises itself as an HID (human interface device) class device. The earlier example of a device with both a speaker and a keyboard advertises itself as a Composite device. Devices that follow a specific USB class enable cross-vendor and cross-platform compatibility. The USB specification defines many device classes that allow the generic implementation of things like Ethernet dongles, mixing desks, or flash disks, and allow operating systems to provide generic drivers for these classes.
There are quite a few cases where the USB device does not fit a specific class or where the class specification is too constrained for a particular device. In that case, the class of the device must be described as vendor specific. The operating system (OS) will then use the VID and PID to find a vendor-specific driver. Once the device descriptor has been dealt with, the operating system assigns the USB device a number, informs the USB device of the number (this is called enumeration), and requests the configuration descriptor that specifies each interface in detail. In the example mentioned earlier, the configuration descriptor will specify two interfaces: one interface of class USB Audio 2.0 with a single-channel output endpoint running at 48 kHz only, and another interface of class HID that specifies a single keyboard with a specific keymap.
There are many cases where the USB device does not have any operating system support and should interact with a user program directly. In that case, a generic driver such as the open source libusb driver, which allows an application program to communicate with any USB device, can be used. Usually, the device will be advertised as vendor specific. Through the libusb interface the user program can detect a device with the VID and PID that it wants to interact with, claim an interface, open an endpoint, and send IN and OUT requests to that endpoint.
What Happens to the Data?
The enumeration of the device usually needs static descriptors to be sent to the host. The difficult bit is making the descriptors themselves. Serving them is easy, as that is the only real task required of the device at the time. After enumeration, data may arrive or be requested on all endpoints in quick succession. This requires an interface between the software that deals with the function of the USB device and the low level USB protocol. Prior to designing this interface, let’s take a look at how to deal with data on the various types of endpoints available.
Bulk endpoints are the easiest to deal with. Since each data transfer is acknowledged, it is possible to send a negative acknowledge (NAK) stating that the device is not yet ready to deal with the endpoint. For example, if software is dealing with some other part of the device, or if data is just not yet available (for example, a read from NAND flash memory is not yet completed), the low-level USB driver can send a NAK.
However, sending NAKs has a downside. The only sensible option for the host is to retry the request, potentially creating a long series of requests that are aborted by NAKs. Ultimately, this wastes USB bandwidth that could have been used by other endpoints or devices. Also, the host software is blocked until the device answers. In other words, NAKs should be a last resort. It may be more appropriate to send partial data than to NAK an IN request. In the case of an OUT request, little can be done. If there is no room to accept the data, then a NAK is the only answer. However, it may be more appropriate to introduce a higher-level protocol in which the host does not issue an OUT request until the device has space.
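The policy above (prefer partial data over a NAK on IN, and NAK an OUT only when there is no room) can be sketched as two small handlers. This is an illustrative model rather than a real USB device library: `NAK`, `answer_bulk_in`, and `accept_bulk_out` are hypothetical names, and a real library would signal a NAK in its own way.

```python
NAK = None  # stand-in for "answer this transaction with a NAK"


def answer_bulk_in(pending: bytearray, max_packet_size: int):
    """Answer an IN request with whatever data is ready, NAK only if none."""
    if not pending:
        return NAK
    packet = bytes(pending[:max_packet_size])
    del pending[:max_packet_size]  # the sent bytes leave the queue
    return packet


def accept_bulk_out(buffer: bytearray, capacity: int, data: bytes) -> bool:
    """Store OUT data if it fits; returning False models sending a NAK."""
    if len(buffer) + len(data) > capacity:
        return False
    buffer.extend(data)
    return True


# Only 3 bytes are ready, so the IN request gets a short packet, not a NAK.
ready = bytearray(b"abc")
first = answer_bulk_in(ready, 64)
second = answer_bulk_in(ready, 64)  # nothing left: NAK
```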
Isochronous endpoints are a lot harder to deal with because they are not acknowledged. The transmitter (in either direction) assumes that the data arrives. Since there is no acknowledgement on an isochronous endpoint, there is absolutely no possibility to send a NAK. So if the device is not ready, then the only thing left to do is drop the data from an OUT packet or to send no data for an IN packet.
Although this could seem a bit harsh at first, remember that the main reason behind an isochronous endpoint is to transmit real time data in a guaranteed time slice of the USB bus. If the device does not have room to store the OUT data, data is probably not dealt with in real time. Dropping is the next logical course of action. If no data is available to answer an IN request, then the device has not collected enough data. A sensible course of action is to transmit whatever data is present or possibly no data at all.
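The drop-or-send-less behaviour described above can be modelled in a few lines. This is a sketch under simplifying assumptions: the class and function names are hypothetical, and the buffer is counted in packets rather than bytes.

```python
from collections import deque


class IsoOutBuffer:
    """Holds received isochronous OUT packets; drops when full (no NAK exists)."""

    def __init__(self, capacity_packets: int):
        self.capacity = capacity_packets
        self.packets = deque()
        self.dropped = 0

    def on_out(self, data: bytes) -> None:
        if len(self.packets) >= self.capacity:
            self.dropped += 1  # not consumed in real time, so drop
            return
        self.packets.append(data)


def answer_iso_in(available: bytearray, wanted: int) -> bytes:
    """Send whatever is present for an IN request, possibly nothing at all."""
    packet = bytes(available[:wanted])
    del available[:wanted]
    return packet


out_buffer = IsoOutBuffer(capacity_packets=2)
for packet in (b"p1", b"p2", b"p3"):
    out_buffer.on_out(packet)  # the third packet is dropped
```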
Assuming that the data can be processed or produced in real time, it is easy to compute the buffer requirements for an isochronous endpoint:
- For an OUT endpoint, the worst possible case is that the host posts one OUT request right at the end of a USB frame, and then immediately after the start of frame (SOF) it posts a second OUT request. This means that two OUT requests, carrying 250 µs of data, are received in quick succession. Hence, the buffering scheme must be able to buffer at least 250 µs worth of data. As long as the program does not consume data from this buffer until the SOF following the first packet, the buffer will never empty, providing a continuous data stream from host to device.
- For an IN endpoint, the worst case is similar. The host could perform two IN transfers in short succession just before and immediately after a SOF. This means the IN buffer needs to be at least 250 µs too, and the buffer should contain 125 µs at the start of each frame.
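The worst case in both bullets comes down to holding two frames' worth of data. A minimal sketch of that sizing rule follows; the function name and the byte-based units are choices made for this illustration.

```python
def min_iso_buffer_bytes(sample_rate_hz: int, frame_us: int,
                         bytes_per_sample: int, channels: int = 1) -> int:
    """Smallest buffer that survives two back-to-back transfers around a SOF."""
    samples = sample_rate_hz * frame_us // 1_000_000
    bytes_per_frame = samples * bytes_per_sample * channels
    return 2 * bytes_per_frame  # two frames, e.g. 250 µs at high speed


# Mono 48 kHz, 16-bit samples, 125 µs frames.
mono_buffer = min_iso_buffer_bytes(48_000, 125, 2)
```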
It is also important to compare bulk and isochronous transfers from the perspective of coping with errors. In bulk transfers, the data itself is critical. The host and device can retry and slow down, as long as the data is transferred correctly, and this transfer must be acknowledged. For an isochronous transfer, the timing is everything. Either side can throw data away in this type of transfer, as long as the real-time characteristics of data further along in the stream are adhered to.
The data-centric versus time-centric approach has a knock-on effect on the consequences of bit errors. A cyclic redundancy code (CRC) for error detection protects all USB traffic. A corrupted bulk transfer must be retried until the data is transferred without error. On the other hand, a corrupted isochronous transfer will simply be dropped. The transmitting side will be unaware that data was dropped. The receiving side may know that the transfer was dropped (if the header with the endpoint was not corrupted), but even then how many bytes the transfer contained may not be determined. When streaming real-time video or audio this is important, since there will be an unknown gap in the stream that has to be filled with best effort.
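As an illustration of the error detection mentioned above, here is a sketch of the 16-bit CRC that USB applies to data packets. The parameters (polynomial 0x8005 processed least significant bit first, initial value 0xFFFF, result inverted) are taken from the USB 2.0 specification, not from this article. On real hardware the CRC is normally generated and checked by the USB controller, so device software rarely computes it.

```python
def crc16_usb(data: bytes) -> int:
    """CRC-16 as used on USB data packets (bit-reflected polynomial 0x8005)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0xA001 is 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF  # the final value is inverted


def is_corrupted(payload: bytes, received_crc: int) -> bool:
    """A receiver drops (isochronous) or causes a retry of (bulk) such packets."""
    return crc16_usb(payload) != received_crc


good_crc = crc16_usb(b"audio samples")
```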
Interrupt endpoints ask about the current state. This may be data that is not too time-critical (such as a key press), or it may be time-critical data (such as the X and Y location of a mouse or other pointing device). In the first case, a few microseconds of delay between typing the key and reporting it won’t hurt. However, when reporting mouse locations, irregular reporting may lead to unintended results.
Programming USB Devices
After analyzing how to deal with the different kinds of endpoints on the USB, we can then develop a programming model for software based USB devices. It is helpful to keep in mind how USB operates:
- There are one or more endpoints, for one or more interfaces, where traffic may arrive or depart at any time.
- Transfers on isochronous endpoints are time-critical.
- At most one transfer happens at a time.
The first two points suggest a multithreaded programming structure, especially if more than a single interface is concerned, or if isochronous endpoints are being used. The basic software architecture assumes that there is some kind of USB device library and that for each of the endpoints we implement a thread that deals with USB transfers on that endpoint. Other parts of the system, not directly connected to the USB device library, are implemented using additional threads.
The USB software architecture is specifically designed for handling multiple endpoints. Keep in mind that one thread per endpoint may not be required and probably isn’t the best method to go about it. Seeing as only one transaction happens at a time (the third point), we can create a version of the system that relies on fewer threads. Let’s say that we want to implement a synchronous protocol over two endpoints where the host will always transmit data over a bulk OUT endpoint prior to receiving data on an associated IN endpoint. This protocol requires only a single thread that handles OUT and IN transactions in order on those endpoints.
This sort of optimization isn’t without its inherent risks. Consider first the design with a single thread per endpoint: the host program may abort and restart between the OUT and IN transactions. In this particular case, the sequence of transactions seen on the device will be:
- OUT, IN, OUT, IN, OUT, OUT, IN
The thread dealing with the OUT transactions must then swallow the extra OUT. When the design is optimized into a single thread that consumes OUT and IN transactions sequentially, that thread must be written so that it can expect the protocol to reset at any time.
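A single-threaded handler that tolerates such a reset might look like the sketch below. The class name, the callback names, and the "uppercase the request" processing are placeholders invented for this illustration. The point is only that a second OUT arriving before the IN discards the stale state instead of stalling.

```python
class SyncProtocol:
    """One handler for an OUT-then-IN protocol that may restart at any time."""

    def __init__(self):
        self.pending_reply = None
        self.swallowed = 0  # OUT requests whose reply was never collected

    def on_out(self, request: bytes) -> None:
        if self.pending_reply is not None:
            # The host restarted between OUT and IN: discard the stale reply.
            self.swallowed += 1
        self.pending_reply = request.upper()  # placeholder processing

    def on_in(self) -> bytes:
        reply, self.pending_reply = self.pending_reply, None
        return reply if reply is not None else b""


# The sequence from the text: OUT, IN, OUT, IN, OUT, OUT, IN.
protocol = SyncProtocol()
replies = []
for kind, data in [("OUT", b"a"), ("IN", b""), ("OUT", b"b"), ("IN", b""),
                   ("OUT", b"c"), ("OUT", b"d"), ("IN", b"")]:
    if kind == "OUT":
        protocol.on_out(data)
    else:
        replies.append(protocol.on_in())
```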
The third point allows a further optimization to take place. A single thread can deal with all bulk traffic on all interfaces, optimizing multiple endpoints into a single thread. The single thread receives a request (IN or OUT) on any endpoint, deals with that request, and then moves on to the next request, possibly on a different endpoint. If the next request arrives before the last request has been dealt with completely, the USB device library sends NAKs, temporarily holding up the host. This optimization has one disadvantage, which is that the single thread must keep state for each endpoint and is effectively context switching on each request.
This same kind of optimization cannot be applied to isochronous endpoints. If we had a single thread dealing with all of the isochronous data, it would involve FIFOs for each endpoint, from which the thread would read data or to which it would post data. These FIFOs will increase latency, which is often undesirable. For the rest of this tutorial, we will discuss the software architecture and optimizations using two examples. One example uses vendor-specific drivers and mostly bulk endpoints (JTAG over USB), and the other shows a standard USB class with mostly isochronous endpoints (Audio over USB).
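The per-endpoint state that a single bulk thread has to carry can be sketched as a small dispatcher. This is a toy model: the class name is hypothetical, each endpoint number simply echoes back what it received, and requests are assumed to arrive one at a time, as the third point states.

```python
class BulkDispatcher:
    """Handles requests for several bulk endpoints from one thread."""

    def __init__(self, endpoint_numbers):
        # One piece of state per endpoint; this is the "context" that the
        # single thread switches between on every request.
        self.state = {ep: bytearray() for ep in endpoint_numbers}

    def handle(self, endpoint: int, direction: str, data: bytes = b""):
        buffer = self.state[endpoint]
        if direction == "OUT":
            buffer.extend(data)  # remember what this endpoint received
            return None
        reply = bytes(buffer)    # IN: hand back this endpoint's data
        buffer.clear()
        return reply


# Requests for endpoints 1 and 2 arrive interleaved.
dispatcher = BulkDispatcher([1, 2])
dispatcher.handle(1, "OUT", b"jtag")
dispatcher.handle(2, "OUT", b"uart")
reply_two = dispatcher.handle(2, "IN")
reply_one = dispatcher.handle(1, "IN")
```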
JTAG over USB
For debugging programs on embedded processors, it is common to use a protocol such as JTAG for accessing the internal state of the processor and to use a program such as gdb to run on a PC to interpret and modify state, set breakpoints, single step, and so on. USB can be used to provide a cross-platform portable transport layer between the PC and JTAG wires.
These devices are often called JTAG keys. In addition to JTAG, they often contain a UART for text I/O from the embedded program. JTAG keys do not follow any standard USB class. Hence, the descriptor labels them as vendor-specific, and it is up to us to define an endpoint structure that is fit for purpose. One endpoint structure would use six endpoints:
- Two endpoints that control the USB device itself (endpoint 0 IN and OUT, required by USB)
- An IN and OUT endpoint for JTAG traffic
- An IN and OUT endpoint for UART traffic
Since there is no USB standard in this case, we can go ahead and define the protocol for the JTAG traffic and decide on a set of commands such as “send a clock with TMS high” or “read the program counter.” For the host, our program can use libusb (an open source USB driver library) to search for a device with our VID and PID, claim the interface, and then use the libusb interface to send IN and OUT transactions to both the JTAG and UART endpoints. Given that all endpoints are for bulk traffic, they can all be mapped onto a single thread, with two separate threads dealing with the state machines for JTAG traffic and UART traffic. Figure 5 shows a sample implementation. JTAG over USB employs multiple endpoints. A JTAG interface can be implemented using USB hardware and a standard 20-pin JTAG connector.
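Because the command set is ours to define, one possible byte encoding is sketched below. Everything in it is hypothetical: the opcode values, the two-byte layout, and the function names are invented for illustration and do not correspond to any existing JTAG key.

```python
import struct

# Hypothetical opcodes for a vendor-specific JTAG protocol.
CMD_CLOCK_TMS_LOW = 0x01
CMD_CLOCK_TMS_HIGH = 0x02
CMD_READ_PC = 0x10


def encode_command(opcode: int, count: int = 1) -> bytes:
    """Encode one command as two bytes: opcode, then a repeat count."""
    return struct.pack("<BB", opcode, count)


def decode_commands(payload: bytes):
    """Split the payload of a bulk OUT transfer into (opcode, count) pairs."""
    return [struct.unpack_from("<BB", payload, offset)
            for offset in range(0, len(payload) - 1, 2)]


# Five clocks with TMS high followed by a program counter read.
out_payload = (encode_command(CMD_CLOCK_TMS_HIGH, 5)
               + encode_command(CMD_READ_PC))
```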
Audio over USB
Finally, let’s discuss an example of a standard USB device. In this case we’ll be talking about Audio over USB. The Audio 2.0 class standard enables interoperability of devices across platforms. So someone could buy a USB microphone or USB speakers (for example) and plug them into any computer that supports Audio over USB. The number of channels, sampling rate, and sample depth can be varied to support anything from low-channel-count consumer devices to high-quality, high-channel-count professional audio.
Devices that are more complex also are supported. The descriptor has a specific syntax for describing mixers, volume controls, equalizers, clocks, resampling, MIDI, and many other functions, although not all of those functions are recognized by all operating systems.
On the host side, all USB traffic carrying audio samples is directed to the USB Audio driver, which in turn interacts through some general kernel sound interface with the program using audio, such as Skype. Other data, such as MIDI, can be handled through a separate interface by a separate driver. The device is designed to use USB Audio Class 2.0, and the standard specifies the endpoints that we need to use. If the application has to support MIDI, stereo in, and stereo out with a clock controlled by the device, then the standard dictates that there will be seven endpoints:
- Two endpoints that control the USB device itself (endpoint 0 IN and OUT, required by USB)
- An isochronous IN endpoint for the I2S analog-to-digital converter (ADC)
- An isochronous OUT endpoint for the I2S digital-to-analog converter (DAC)
- An isochronous IN endpoint for feedback on the clock speed
- A bulk IN endpoint and bulk OUT endpoint for MIDI
The endpoints for the ADC and DAC have one IN and OUT transaction every microframe, every 125 µs. Assuming that the DAC and ADC operate with a 96-kHz sample rate, 12 samples are sent in each direction every 125 µs. Note that there are two independent oscillators: the device controls the 96-kHz sample rate, and the host controls the 125-µs microframe rate.
As these clocks are independent, they will drift relative to each other, and there won’t always be 12 samples in each transfer. The vast majority of the transfers will have 12 samples, but sometimes there will be 13 or 11 samples.
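The occasional 13-sample transfer falls out of simple integer bookkeeping. The sketch below assumes a device clock that runs slightly fast, at 96,008 Hz measured against the host's microframe clock. That figure and the function name are chosen purely for illustration.

```python
def samples_in_each_microframe(actual_rate_hz: int, microframes: int,
                               frame_us: int = 125):
    """Yield how many samples are due in each successive microframe."""
    accumulator = 0  # sample-microseconds carried over between microframes
    for _ in range(microframes):
        accumulator += actual_rate_hz * frame_us
        due = accumulator // 1_000_000
        accumulator %= 1_000_000  # keep the fractional sample for later
        yield due


# Nominal 96 kHz, but the device oscillator runs at 96,008 Hz by host time.
counts = list(samples_in_each_microframe(96_008, 1000))
```

With an exact 96,000 Hz clock every transfer carries 12 samples; with the assumed offset the fractional remainder accumulates until an extra sample is due.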
The device uses the third isochronous endpoint to inform the host of the current speed. It is sampled once every few milliseconds and reports the current sample rate in terms of samples per microframe. The MIDI endpoints carry MIDI data as and when available. The standard provides flexibility, allowing us to easily add more audio channels or audio processing.
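The "samples per microframe" value is a fixed-point number. The sketch below assumes the 16.16 format that the USB 2.0 specification describes for high-speed feedback endpoints, packed as four little-endian bytes. The article itself does not state the format, so treat this as an assumption to check against the specification. The function names are hypothetical.

```python
import struct


def encode_feedback(sample_rate_hz: int, frame_us: int = 125) -> bytes:
    """Samples per microframe as unsigned 16.16 fixed point, little-endian."""
    # rate * frame_us / 1e6 gives samples per microframe; scale by 2**16.
    value = sample_rate_hz * frame_us * 65536 // 1_000_000
    return struct.pack("<I", value)


def decode_feedback(packet: bytes) -> float:
    """Host-side view: convert the 4-byte value back to samples per microframe."""
    (value,) = struct.unpack("<I", packet)
    return value / 65536


nominal = encode_feedback(96_000)  # exactly 12 samples per microframe
```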
Figure 6 shows the software architecture for this device. Unlike the previous example, there is little that can be optimized. The class specification dictates the endpoint structure. With three isochronous endpoints, it is advisable to have three processes ready to accept and provide data on these endpoints. The only optimization that is feasible is for a single thread to handle Endpoint 0 and the MIDI endpoints.
USB devices comprise many interfaces that run concurrently and endpoints that are either bulk or isochronous. Bulk endpoints are for reliable data transport between host and device, whereas isochronous endpoints are for real-time data transport.
When programming USB device endpoints, it is easiest to see those endpoints as individual software threads. Some of those can be mapped onto a single thread, but the programmer has to understand the consequences. In particular, mapping multiple isochronous endpoints onto a single software thread will introduce an (unpredictable) latency in the real-time stream.