Mark Whitis's Website Home Page Linux Book: Linux Programming Unleashed My Resume Genealogical Data Contact Info Security About


JTAG


JTAG is a serial protocol, similar to SPI in some respects, that is used for boundary scan testing, in-circuit emulation, and flash programming. It is standardized in IEEE 1149.1-1990. In boundary scan mode, all the I/O pins on all JTAG devices on a board may be connected together in one giant shift register with one or more bits per pin. There are control registers and ways to bypass the shift register for individual devices (replacing them with a one-bit register).

The picture above shows a slightly fictitious chip: a simple 8-bit D flip-flop register with clock and output enable controls, to which a JTAG TAP (Test Access Port) controller and 18 boundary cells, one on each I/O pin, have been added. The boundary cells are connected together in a big daisy chain loop. I show only the data path in that loop; there are a number of control signals as well. Individual chips are then connected together in a longer loop. I say the chip is fictitious but there are similar parts: the TI SN74BCT8374A is almost identical. The amount of test logic on such a simple device far exceeds the amount of operational logic.

JTAG is not an open standard in the sense that internet RFCs and USB are, and the way all standards should be, where you can download the standard documents for free. Instead, the IEEE funds the standards process by selling copies of the standards documents. It would cost you around $400 to purchase a copy of the standards, plus possibly a couple hundred more for books to explain them. This is prohibitively expensive for hobbyists, open source developers, some small businesses, and lots of other people who can't justify forking over that kind of money for a standard when they won't know whether it will help them until they have read it. Add $100-$400 or so for a USB JTAG pod or cable. And fancy commercial software can cost close to $3000 a copy. There is some free software with limited functionality, no-frills parallel cables can be had for around $20, and there is almost enough information on the net to do some serious stuff with JTAG, though you can spend weeks finding it. This page will help you in your quest.

A long time ago, I worked for a company specializing in built-in test. I wrote the software that analyzed the reliability of synthetic aperture radar systems, software that determined in what order to perform tests, built-in diagnostics software for sonar systems, and the hardware device that actually injected simulated faults into the system. (I did other things as well: I wrote the billing software, the timecard system, and the purchase order system, administered the network and the PBX, and reverse engineered the proprietary network stack (including the firmware on the NIC card) so I could write network utilities.) We were among those who pushed for inclusion of boundary scan in ICs and the creation of the JTAG standard in the first place, and one of my coworkers was on the standards committee. Unfortunately, I didn't get to work with JTAG at the time because the standard didn't exist and there weren't actually any parts that implemented boundary scan. 25 years later, JTAG is available on far fewer chips than it should be. It is found on most FPGAs, CPLDs, better microcontrollers and processors, and some peripherals. It still isn't available on gates and most other discrete logic parts; there are a few overpriced registers and bus transceivers with JTAG, similar to the one pictured above. But it is several times cheaper to replace those parts with a CPLD, and instead of needing glue logic to drive the logic chip, you can reprogram the CPLD to be compatible with your signals and maybe suck in some extra glue logic as well. Now that some of the programmable logic vendors provide free (as in beer) software, small logic parts are mostly obsolete. And to take best advantage of JTAG for testability, you will want to use programmable logic and microcontrollers instead of discrete logic. And JTAG is widely used, beyond its original intended purpose, for programming those microcontrollers and programmable logic devices.

What can JTAG do?

Here are things JTAG can do or potentially do:

Limitations

Attempts to use JTAG in the ways described above can be hampered by:

BSDL

A supplement to 1149.1-1990 defined the boundary scan description language (BSDL). BSDL is a subset of VHDL. IEEE 1532 extended BSDL to have information on flash/FPGA/CPLD programming. Another important improvement in 1532 is a list which maps pin numbers to pin names. Note that for many programmable logic devices, the BSDL files distributed by the vendor describe the device before configuration, but the configuration may change circuitry between the boundary scan registers and the pins. Xilinx has a bsdlanno program to generate a BSDL file for a specific design that describes the device post-configuration. Xilinx has no-cost software with source, called J Drive, that reads 1532 files and programs devices (registration required). Altera's equivalent is JAM.

The PIN_MAP_STRING tends to be rather haphazardly split into lines with the string concatenation operator "&".
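Because the pin map usually arrives as several quoted pieces joined with "&", a parser has to stitch the pieces back together before it can look at the map itself. The following is a minimal sketch of that one step in C, not a BSDL parser. The function name bsdl_join_string is made up for illustration, and the sketch assumes the input is only the right-hand side of the declaration, with no VHDL comments and no doubled-quote escapes inside the strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Concatenate every double-quoted piece found in src into dst, dropping
   the quotes, the "&" operators, and the whitespace between the pieces.
   Output is truncated (never overflowed) to fit dst_size bytes including
   the terminating NUL.  Returns the number of characters stored.
   (Hypothetical helper; assumes no comments or escaped quotes.) */
size_t bsdl_join_string(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    const char *open;
    const char *close;

    assert(src != NULL && dst != NULL && dst_size > 0);

    while ((open = strchr(src, '"')) != NULL &&
           (close = strchr(open + 1, '"')) != NULL) {
        size_t len = (size_t)(close - open - 1);   /* text between quotes */
        if (len > dst_size - 1 - used)
            len = dst_size - 1 - used;             /* truncate to fit */
        memcpy(dst + used, open + 1, len);
        used += len;
        src = close + 1;                           /* continue after piece */
    }
    dst[used] = '\0';
    return used;
}
```

Given the two pieces "CLK:1, OE:2," and "D:(3,4)" joined by "&" across a line break, the function yields the single string CLK:1, OE:2,D:(3,4), which can then be split on commas and colons.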

BSDL parsers

BSDL was defined in an IEEE supplement and extended in IEEE STD 1532. Kenneth P. Parker and Stig Oresjo at HP (now Agilent) distributed the source to what was probably the original BSDL parser. It can also be found as a listing in an appendix of the Boundary Scan Handbook containing a copy of the paper A language for describing boundary scan devices (1991). The license is a little vague but appears permissive in intent. The language has changed since then. IEEE 1149.1-2001 lists some changes and IEEE 1532 lists some additional changes.

BSDL files include standard packages written in VHDL, depending on the version of BSDL used: Package STD_1149_1_2001, also found in an Actel BSDL format application note; Package STD_1149_1_1994, also available as a chapter from the Boundary Scan Handbook, 2nd ed: Analog and Digital for $25; Package STD_1149_1_1990, also in appendix B of the first edition of The Boundary Scan Handbook; and Package STD_1532_2002. The license on those files is unclear. They seem to be used as if there were no restrictions on copying, and the standard pretty much says "thou shalt use exactly this text", which could be taken as an implied license.

Mentor Graphics has published a paper on ABSDL, an analog extension to BSDL for mixed-signal test. The optional package declaration is "use STD_1149-4_version.all", where "version" refers to an updated version of the standard that hasn't been released yet.

HSDL

HSDL is a BSDL derived syntax for defining board level details.

Test Description Language (TDL)

SVF, XSVF, JAM, STAPL

SVF (ASCII) and XSVF (binary) are files created by Xilinx tools that contain JTAG instructions for how to program a particular device. "Player" software source code is available (see links below). They are pretty limited: they let you do SHIFT-IR and SHIFT-DR, specify the data to be transferred and the response to expect, and specify fixed bit strings to send and match before and after, but they don't let you get creative with other TAP controller states, which you might need in a boundary scan application. JAM evolved into STAPL, which is a JEDEC standard, available from JEDEC as JESD71 (direct linking not permitted). STAPL is an acronym for Standard Test And Programming Language and is designed for programming and testing devices over JTAG. It also supports devices with human interfaces. It was designed in such a way, though, that it is not practical for devices with limited memory. There is also apparently a JEDEC standard transfer format that isn't very popular.

STV and SRV

STV is an acronym for Sequential Test Vector and SRV is an acronym for Sequential Response Vector.

Standards

Links to the actual standard documents can be found in the books section, below.

IEEE STD 1149.4 Analog JTAG

There are extensions to JTAG that allow signals to be routed to and from two analog test pins. The National STA400EP is an analog mux with analog boundary scan built in but it appears it is being discontinued.

The two internal analog buses, AB1 and AB2, are connected by a 2x2 crosspoint switch to the two test pins, AT1 and AT2 (switching pins is used for calibration?). Switch impedance is not necessarily low. "4 wire" measurements of an external resistor are done by taking two measurements, one on each side. There is a typical practical limit of 30 nodes per internal bus, which can be increased to 900 by a two level hierarchy. Parts with incompatible voltage ranges may need a separate bus or perhaps a voltage clamp.

IJTAG

Work is afoot to provide faster TAP access to internal logic by providing a higher speed TAP based on SERDES (such as used by gigabit ethernet, PCI express, SATA, etc.). This is oriented more towards factory test rather than board level test. IEEE P1687.

Topologies

Single Chip

[single chip connection]

The TAP connector connects directly to the JTAG pins on a single chip. A lot of microcontrollers require this since they also use the TAP for a non-JTAG-compliant on-chip debug mode.

Single Scan Chain

[4 chips connected in a scan chain]

Multiple chips are connected with TDO of one chip driving TDI of the next chip. TAP TDI connects to the first chip and TAP TDO connects to the last chip. TMS and TCK are connected in parallel to each device.
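One practical consequence of a single chain is that talking to one chip means padding for all the others. As a rough sketch, assume every device except the target has been put in bypass, so each contributes the one-bit register mentioned in the introduction, and that index 0 is the chip nearest TAP TDI. The struct and function names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Padding needed around the target's data register bits when all other
   devices in the chain are bypassed (one bit each). */
struct dr_padding {
    size_t lead_bits;   /* shifted before the target data (and seen first
                           on TDO): devices between the target and TDO */
    size_t trail_bits;  /* shifted after the target data: devices between
                           TDI and the target */
    size_t total_bits;  /* total TCK cycles spent in Shift-DR */
};

/* chain_len: number of devices in the chain
   target_index: 0 = device whose TDI is driven by the TAP connector
   target_dr_bits: length of the selected data register in the target */
struct dr_padding jtag_bypass_padding(size_t chain_len, size_t target_index,
                                      size_t target_dr_bits)
{
    struct dr_padding p;

    assert(chain_len > 0 && target_index < chain_len);

    /* Bits shifted in first travel farthest, so they end up in the
       devices nearest TDO. */
    p.lead_bits = chain_len - 1 - target_index;
    p.trail_bits = target_index;
    p.total_bits = p.lead_bits + target_dr_bits + p.trail_bits;
    return p;
}
```

For the four chip chain pictured above, addressing the second chip (index 1) with an 18 bit boundary register gives 2 leading pad bits, 1 trailing pad bit, and 21 bits shifted in total.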

Separate chain for microcontroller

[4 chips connected in a scan chain]

Another configuration is to have a completely separate chain for a microcontroller. This is because the microcontroller often supports on-chip debug via the JTAG pins but it may use the port in non-standard ways or the software may be confused by other chips on the bus. The microcontroller has a pin that selects whether the TAP port is in JTAG mode or OCD mode.

Multiple chips with separate TDI and TDO

[4 chips connected in 4 scan chains]

TAP port has separate TDI and TDO pins for each chip but TMS and TCK are shared. Allows faster access to each chip.

Multiple scan chains sharing TMS and TCK

[2 chips each in 2 scan chains]

Similar to above but more than one chip on each TDI/TDO pair.

JTAG muxes and multidrop

[multidrop: two boards with three chips each connected to mux]

There are devices that provide access to multiple scan chains. These typically provide two levels of multiplexing. First, a number of multidrop chips are wired in parallel on the backplane. Second, each device typically provides a number of TAP ports that can be switched into the chain. See National, Fairchild, and TI below. An article at embedded.com on scaling JTAG has more info on multi-drop multiplexors. TI uses a "linking shadow protocol" in which data on TDI in certain states other than Shift-IR/Shift-DR is used to control the mux. Some of the others basically appear in the JTAG chain, and control data is shifted into the instruction register. Until a device on the backplane bus has been selected, TDO is tristate. The linking shadow protocol is less consistent, perhaps, with JTAG philosophy, but it could be advantageous when using software that doesn't understand port multiplexors: a utility program can be used to configure the multiplexor(s), and then the presence of the multiplexor is invisible to the software (provided it does not wiggle TDI in states where it is inappropriate to do so). Some multiplexors also have the ability for one of the slave ports to take over as master. This allows an embedded CPU to perform built-in self test or reconfiguration while still allowing external test. Many of the multiplexors also allow the JTAG slave ports to be tristated, allowing an external JTAG tester to be connected. This could be used, for example, to perform on-chip debug on a microcontroller that normally appears in the JTAG chain.

JTAG Pinouts

In the descriptions, pins will be described as being connected to pins on a PC parallel port; this involves a buffer or at least a resistor, and often both. All connectors are 0.025" square pin headers with 0.1" spacing unless otherwise noted.

Xilinx

This pinout is used on the Spartan3, Spartan3E, and CPLD development kits sold by Xilinx (and designed by Digilent). It is the pinout used by the Digilent JTAG3 cable (software compatible with Parallel III but with a different pinout). The schematic for the JTAG3 can be found in figure A-9 (page 66) of Xilinx UG13- Spartan-3 Starter Kit Board User Guide (now deleted). This cable claims to work from 1.5 to 5V, but at lower voltages the driver running off the JTAG VDD may have trouble driving a logic 1 on the buffered TDO line to the PC, and it probably relies on a pullup resistor on the parallel port. It uses an NL37WZ17 to drive signals (100 ohm series resistor) and an NL17SZ126 to read back TDO.

      TMS  1     
      TDI  2
      TDO  3
      TCK  4
      GND  5
      VDD  6
The schematic for the Parallel Cable III can be found here. It has another JTAG pinout:
      JTAG        Parallel port
      ----        ------------------------
      VCC   1        15  (vcc sense)
      GND   2          
      (key) 3
      TCK   4        3   (tristated by pin 5 high)      
      (key) 5  
      TDO   6       pin 6 drives ???? (open collector), read back on pin 13
      TDI   7       pin 2 drives  (tristated by pin 5 high)
      (key) 8
      TMS   9       4    (tristated by pin 5 high)

Pins 8 (D6), 11 (BUSY), and 12 (PE) are jumpered together on the parallel port and cable detect probably comes from this and by wiggling pin 6 and seeing if pin 13 responds.
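To make the table above concrete, here is a small sketch of how the Parallel Cable III wiring might translate into bit masks for bit-banging. It assumes the standard PC parallel port register layout, where DB-25 pins 2 to 9 are data bits D0 to D7, pin 13 (SELECT) is status bit 4, and pin 15 (ERROR) is status bit 3. The macro and function names are invented for illustration, and no port I/O is performed here.

```c
#include <assert.h>

/* Data register bits (DB-25 pins 2..9 are D0..D7) */
#define PC3_TDI        (1u << 0)  /* pin 2 drives TDI */
#define PC3_TCK        (1u << 1)  /* pin 3 drives TCK */
#define PC3_TMS        (1u << 2)  /* pin 4 drives TMS */
#define PC3_TRISTATE   (1u << 3)  /* pin 5: high tristates TCK/TDI/TMS */
#define PC3_TDO_DRIVE  (1u << 4)  /* pin 6: the open collector TDO driver */

/* Status register bits */
#define PC3_STAT_VCC   (1u << 3)  /* pin 15: VCC sense */
#define PC3_STAT_TDO   (1u << 4)  /* pin 13: TDO read back */

/* Build the data register value for the given pin levels, with the
   output drivers enabled (pin 5 low). */
unsigned pc3_data_byte(int tck, int tms, int tdi)
{
    assert((tck == 0 || tck == 1) && (tms == 0 || tms == 1) &&
           (tdi == 0 || tdi == 1));
    return (tck ? PC3_TCK : 0u) | (tms ? PC3_TMS : 0u) |
           (tdi ? PC3_TDI : 0u);
}

/* Extract the TDO level from a status register value. */
int pc3_tdo_from_status(unsigned status)
{
    return (status & PC3_STAT_TDO) != 0;
}
```

On a real system the data byte would be written to the port's data register and the status value read from the status register, using whatever port access mechanism the operating system provides.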

The Parallel Cable III also has another header used for non-jtag modes. Some of the features like driving TDO and tristate enables probably have more to do with the other connector.

      VCC   1
      GND   2
      CCLK  3
      (key) 4
      (key) 5
      D/P   6
      DIN   7
      PROG  8
      (key) 9

Note that the two connectors are side by side, effectively making a 2x9 header, and that numerous pins have been removed for keying purposes. The Parallel Cable III comes with flying leads.


The Xilinx Parallel Cable IV


2mm connector
                 JTAG  Slave Serial  SPI
                 ----  ------------  ----
       GND  1  2 Vref  Vref          Vref
       GND  3  4 TMS   PROG          SS
       GND  5  6 TCK   CCLK          SCK
       GND  7  8 TDO   DONE          MISO
       GND  9 10 TDI   DIN           MOSI
       GND 11 12 NC    NC            NC
       GND 13 14 NC    INIT          NC
"Pin 1 is not  a true digital ground.  It must be connected to digital ground at the target system."
Note that this cable has logic on board with an internal 40 MHz clock and can send data at up to 5 MHz; it is not a simple buffer. It can also emulate a Parallel Cable III.
(From Xilinx DS097)

Old text: The Spartan 3 board has another JTAG header for use with the Xilinx Parallel Cable IV and MultiPro Desktop Tool. This pinout for this 2mm connector is:

                JTAG        Serial
     GND  1  2  VCCAUX       Vref
     GND  3  4  TMS          PROG
     GND  5  6  TCK          CCLK
     GND  7  8  TDO-A        DONE
     GND  9 10  TDI          DIN
     GND 11 12  (NC)         (NC)
     GND 13 14  (NC)         INIT

Altera

     TCK  1 2 GND
     TDO  3 4 ND
     TMS  5 6 N/C
     N/C  7 8 N/C
     TDI  9 10 GND 

Note that this pinout is a subset of the JTAG10PIN pinout used on Atmel AVR micros.

ARM

Official 20 pin pinout

      VTref   1 2  Vsupply
      nTRST   3 4  GND
      TDI     5 6  GND
      TMS     7 8  GND
      TCK     9 10 GND
      RTCK   11 12 GND
      TDO    13 14 GND
      nSRST  15 16 GND
      DBGRQ  17 18 GND
      DBGACK 19 20 GND

All ground pins must be connected. DBGRQ and DBGACK are rarely used; they connect to lines on the ARM core that are rarely brought out to physical pins and are manipulated via other means (as bits in the JTAG chain?). nSRST is the system reset (active low). It is optional but recommended, as Multi-ICE and RealView ICE use it for debug of XScale processors, reset on startup, reset from the debugger, and reset detection, and it is needed for Auto Configuration on some processors. nTRST is optional; it resets the JTAG TAP controller state machine and the embedded ICE logic. Resetting the TAP controller can also be accomplished by 5 TCK cycles with TMS high. AT91SAM7 CPUs don't bring nTRST to a pin. OpenOCD can be configured to work with or without nTRST and nSRST. Both reset signals are open drain. RTCK is an optional readback on the TCK pin and is used for adaptive clocking. Vsupply is used to power the JTAG dongle and VTref is used to set the thresholds. ARM's official JTAG interfaces put a 10K pulldown resistor on these lines. ARM recommends 10K pullups/pulldowns, with pullups on TMS, TDI, TDO, nSRST, and nTRST and a pulldown (odd) on TCK. Atmel's AT91SAM7 eval board uses a pullup on TCK. Multi-ICE, RealView ICE, and OpenOCD support other devices on the JTAG chain; however, manual configuration may be required. Multi-ICE uses analog switches (between ground and VTref) to drive the JTAG signals with 100 ohm series resistors.

K9JTAG

K9JTAG (schematic): software compatible with wiggler. Note that some people have reported that newer versions of the Macraigor software don't work with many homebrew wigglers. Someone reported that adding a jumper between pins 8 and 15 on the DB-25 fixed this. It was also reported that DB25 pin 6 needs to be connected through a buffer to IDC20 pin 3 to drive the nTRST signal (this appears to refer to yet another ARM JTAG connector pinout). The k9spud schematic linked here appears to incorporate both. On the Atmel AT91RM9200, the JTAG pins are used for either ICE or JTAG depending on the setting of JTAGSEL; the chip must be reset using NRST and NTRST after JTAGSEL is changed. There may be issues with other devices in the JTAG chain in ICE mode. It also appears that NTRST needs to be asserted during power on reset; perhaps a diode will suffice.
    TRST   1  2   TDO
    RTCK   3  4   TDI
    GND    5  6   TCK
    GND    7  8   TMS
    +3V    9  10   RESET

Altera USB blaster

Altera makes a cable called the USB Blaster. Someone made a USB JTAG adapter with an EZ-USB FX2, partly emulating an FT245, that switched to using the USB Blaster protocol (or Xilinx USB cable?); it works with openwince and OpenOCD. It looks like this software is actually running on the Xilinx Platform Cable USB on the starter kit, so this would have info on Altera's protocol.

some protocol reverse engineering. IXO usb_jtag Byteblaster clone.

TI DSP

     TMS      1   2 nTRST
     TDI      3   4 GND
     Vcc      5   6 N/C
     TDO      7   8 Vcc
     TCK      9   10 GND
     TCK_ret 11   12 GND
     EMU0    13   14 EMU1

TI MSP430 Microcontrollers

On the Olimex JTAG for MSP430xx CPUs we see a different pinout:

    TDO      1  2  VCC_IN
    TDI      3  4  VCC_OUT
    TMS      5  6  NC
    TCK      7  8  TEST/VPP
    GND      9 10  NC
    RST/NMI 11 12  NC
    NC      13 14  NC

ARM

Multi-ICE (Amontec Jtagkey):

         VREF   1 2 VREF
         TRST_N 3 4 GND
         TDI    5 6 GND
         TMS    7 8 GND
         TCK    9 10 GND
         n/c   11 12 GND
         TDO   13 14 GND
         SRST_N 15 16 GND
         n/c   17 18 GND
         n/c   19 20 GND
(0.1" pitch)
JTAGKey to FTDIchip pin map (channel A):
  ADBUS0   TCK
  ADBUS1   TDI
  ADBUS2   TDO
  ADBUS3   TMS
  ADBUS4   JTAG output enable
  ADBUS5   VREF sense, negated (vref>1.26V?)
  ADBUS6   nSRST IN
  ADBUS7
  ACBUS0   nTRST
  ACBUS1   nSRST
  ACBUS2   nTRST driver enable
  ACBUS3   nSRST driver enable
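The low byte of that pin map can be written down as bit masks. The sketch below is only an illustration: the names are made up, the active level of the output enable is not stated in the map and is left to the caller, and the direction mask assumes the usual FTDI MPSSE convention that a 1 bit marks a pin as an output.

```c
#include <assert.h>

/* ADBUS (channel A low byte) bit assignments from the map above */
#define JK_TCK      0x01u  /* ADBUS0, output */
#define JK_TDI      0x02u  /* ADBUS1, output */
#define JK_TDO      0x04u  /* ADBUS2, input  */
#define JK_TMS      0x08u  /* ADBUS3, output */
#define JK_JTAG_OE  0x10u  /* ADBUS4, JTAG output enable */
#define JK_VREF_N   0x20u  /* ADBUS5, VREF sense, negated */
#define JK_SRST_IN  0x40u  /* ADBUS6, nSRST readback */

/* Direction mask for the low byte, assuming 1 = output */
#define JK_LOW_DIRECTION (JK_TCK | JK_TDI | JK_TMS | JK_JTAG_OE)

/* Input signals decoded from a low byte read back from the chip */
struct jk_inputs {
    int tdo;            /* level on TDO */
    int vref_present;   /* 1 when the negated VREF sense bit reads low */
    int srst_in_level;  /* raw level of the nSRST readback pin */
};

struct jk_inputs jk_decode_low_byte(unsigned low_byte)
{
    struct jk_inputs in;

    assert(low_byte <= 0xFFu);
    in.tdo = (low_byte & JK_TDO) != 0;
    in.vref_present = (low_byte & JK_VREF_N) == 0;
    in.srst_in_level = (low_byte & JK_SRST_IN) != 0;
    return in;
}
```

With this map the direction mask for the low byte works out to 0x1B.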
Amontec also makes a smaller, less expensive version, the JTAGKey-Tiny, that plugs right into ARM 20 pin headers but is more difficult if you need to connect
This pinout has been reported to be used on some other ARM systems:
 (VTref) Vcc    1  2   GND
         nTRST  3  4   GND
         TDI    5  6   GND
         TMS    7  8   GND
         TCK    9  10  GND
         TDO   11  12  N/C (nSRST)
         Vcc   13  14  GND
There are apparently two different incompatible 14 pin pinouts, one for Embedded ICE and one for TI. The Embedded ICE pinout is shown above, the TI pinout below:
         TMS    1  2   nTRST
         TDI    3  4   GND
         VCC    5  6   N/C
         TDO    7  8   GND
         RTCK   9  10  GND
         TCK   11  12  GND
         EMU0  13  14  EMU1

MIPS

Many MIPS processor based systems use the EJTAG pinout.


nTRST  1  2 GND
TDI    3  4 GND
TDO    5  6 GND
TMS    7  8 GND
TCK    9 10 GND
nSRST 11 12 (key)
DINT  13 14 VCC

EJTAG

EJTAG refers to MIPS JTAG extensions for on chip debugging. "EJTAG provides run control, breakpoints on both data and instructions, real-time Program Counter trace". On some chips it provides direct memory access as well. EJTAG at hardwarehacking.com

Linksys/broadcom Routers

Note that some appear to use the standard ARM 20 pin, others use a 12 or 14 pin connector (looks MIPS EJTAG style).


 nTRST  1   2 GND
 TDI    3   4 GND
 TDO    5   6 GND
 TMS    7   8 GND
 TCK    9  10 GND
 nSRST 11  12 GND

 nTRST  1   2 GND
 TDI    3   4 GND
 TDO    5   6 GND
 TMS    7   8 GND
 TCK    9  10 GND
 nSRST 11  12 n/a
   n/a 13  14 Vcc

The 14 pin connector is based on the MIPS EJTAG standard. Notice there is no VDD on the 12 pin so this must be picked up somewhere if you are using a buffered cable. Also, the Hairydairymaid/lightbulb linksys de-brick utility fails with a buffered cable (or even an unbuffered cable with all pins connected) as it fails to set the nTRST line high (fixed now?).

Atmel AVR

Some AVR boards use this JTAG pinout (JTAGICE mkII)

  TCK  1  2 GND
  TDO  3  4 VTref
  TMS  5  6 nSRST
       7  8 (nTRST)
  TDI  9 10 GND
Some AVR controllers may have only debugWIRE (see below). The Elmicro AVR-JTAG-USB (EUR$45) also uses this pinout; FT2232 based?

Flying Leads

Flying leads can be used to connect to different JTAG connectors without having a bunch of adapters. Amp-modu or c-grid contacts can be used to mate with 0.100" contacts to add flying leads. They are a pain to connect and disconnect, though. A milmax SSQ style connector (long tail version) has female contacts and wirewrap (0.100") posts; flying leads can be connected to such a connector to ease connecting and disconnecting. The part number for a 2x10 would probably be SSQ-110-03-G-D (gold tail) or SSQ-110-03-S-D (tin); available at digikey. If anyone knows where to get packages of wires with bare amp-modu or c-grid contacts precrimped on both ends, let me know; they are very handy for making custom JTAG cables and many other custom cables using these ubiquitous connectors. With insulation (heat shrink) suitable for use as flying leads, they are available at schmartboard ($10/5 or $38/100); seetron.com, linkins4.com, superdroidrobots.com, and digikey (924964-ND) sell similar ones but are more expensive.

PCI/PCI express

PCI/PCI express slots have some pins set aside so a board can be inserted and get access to the motherboard JTAG chain. Apparently some motherboards don't connect it. And most POST display diagnostic cards aren't smart enough to bring the JTAG out to a header; I have yet to find one that does. That includes www.uxd.com, which makes a wide variety of POST cards, some with RS-232 serial output, one with I2C, and even cards which self boot. Your best bet might be to tack solder some wires onto a PCI card. SMBus may be accessible as well?

JTAG Details

A JTAG Test Access Port (TAP) consists of 4 signals (TCK, TMS, TDI, and TDO), with an optional fifth signal (TRST).

Some JTAG ports have some additional non-standard signals. Microcontrollers often have a debug enable signal that switches between normal JTAG mode and OCD mode. Some ports have an RTCK signal which echoes the TCK signal. In many cases TCK is tied to RTCK at the JTAG connector and this may be used to adjust for cable delay. On some ARM7TDMI based chips, this is used to synchronize TCK to the internal system clock. Thus the idea is for the JTAG pod to assert an edge on TCK and wait for the processor to return that edge on RTCK before proceeding. The Philips LPC2468 microcontroller has RTCK but the Atmel AT91SAM7XC256 does not, though both use the ARM7TDMI core. Many ports have an SRST signal to reset the processor and other logic. It may also indicate to a debugger that the processor has been manually reset. Some processors have a separate reset input and reset output. Other processors will pull their reset line low when an internal reset is generated. This is a smarter design, as it not only saves a pin but also allows one or more chips to hold the system reset low. On the LPC2400 family, RSTOUT is a 1.8V output while the JTAG pins are 5V tolerant 3.3V signals, presenting level translation issues if one were to attempt to read this signal.
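The RTCK handshake described above (drive an edge on TCK, then wait for the same edge to come back on RTCK) can be sketched as a small polling routine. Everything here is hypothetical: the callback types stand in for whatever pin access a given pod provides, and the simulated target that echoes TCK after a fixed number of polls exists only so the routine can be tried without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pin access callbacks supplied by the pod driver */
typedef void (*tck_write_fn)(void *ctx, int level);
typedef int (*rtck_read_fn)(void *ctx);

/* Drive TCK to 'level' and poll RTCK until it follows.
   Returns the number of polls that did not yet match, or -1 if RTCK
   never followed within max_polls reads. */
int jtag_adaptive_edge(tck_write_fn write_tck, rtck_read_fn read_rtck,
                       void *ctx, int level, unsigned max_polls)
{
    unsigned polls;

    assert(write_tck != NULL && read_rtck != NULL);

    write_tck(ctx, level);
    for (polls = 0; polls < max_polls; polls++) {
        if (read_rtck(ctx) == level)
            return (int)polls;
    }
    return -1;  /* target did not return the edge in time */
}

/* Toy target: RTCK copies TCK after 'delay' reads have gone by. */
struct sim_rtck {
    int tck;
    int rtck;
    unsigned delay;
    unsigned pending;
};

void sim_write_tck(void *ctx, int level)
{
    struct sim_rtck *s = (struct sim_rtck *)ctx;
    s->tck = level;
    s->pending = s->delay;
}

int sim_read_rtck(void *ctx)
{
    struct sim_rtck *s = (struct sim_rtck *)ctx;
    if (s->pending > 0)
        s->pending--;
    else
        s->rtck = s->tck;
    return s->rtck;
}
```

A pod built this way slows down automatically when the target's clock is slow, at the cost of stalling (here, returning -1) when the target has no RTCK or is held in reset.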

Most JTAG interface connectors have a Vcc/Vdd/Vref pin that is used to adjust the level translators to the appropriate voltage level for the system. In addition, many JTAG interfaces have a system reset line; you may want to sense the state of that line to signal a debugger when a chip has been reset. You may also want to measure the presence of Vref. Since some systems use the JTAG pins for other purposes as well, you may want to tristate the outputs. The voltage level varies depending on the technology being used. Multiple chips (or even boards) with JTAG support can be chained together with the TDO of each chip connected to the TDI of the next. The remaining signals are connected in parallel. Timing is based on the TCK pin, so a PC parallel port (with appropriate voltage translation if needed) can be used as a controller. TDI/TMS data is latched on the rising edge of TCK. TDI/TDO/TMS typically change on the falling edge of TCK. You could probably do a fully synchronous implementation where the data changes on the rising edge, but you would have to be careful with cable skew and hold times.
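The edge timing described above can be sketched as a bit-banged clock cycle: change TMS and TDI while TCK is low, sample TDO, then raise TCK so the target latches. The pin structure, the update hook, and the one-bit toy target below are all assumptions made for illustration; a real controller would replace the hook with actual port writes and reads.

```c
#include <assert.h>
#include <stddef.h>

/* Pin levels as seen at the JTAG connector (hypothetical model) */
struct jtag_pins {
    int tck, tms, tdi, tdo;
};

/* Hypothetical hook, called after every pin change so that real hardware
   access or a simulation can react to it */
typedef void (*jtag_update_fn)(struct jtag_pins *pins, void *ctx);

/* Clock one bit.  Returns the TDO level sampled before the rising edge. */
int jtag_clock_bit(struct jtag_pins *pins, jtag_update_fn update, void *ctx,
                   int tms, int tdi)
{
    int tdo;

    assert(pins != NULL && update != NULL);

    pins->tck = 0;      /* falling edge: the target changes TDO here */
    pins->tms = tms;    /* controller changes TMS/TDI while TCK is low */
    pins->tdi = tdi;
    update(pins, ctx);
    tdo = pins->tdo;    /* TDO is valid now; sample it */
    pins->tck = 1;      /* rising edge: the target latches TMS/TDI */
    update(pins, ctx);
    return tdo;
}

/* Toy target: a single one-bit register between TDI and TDO */
struct sim_one_bit {
    int prev_tck;
    int bit;
};

void sim_one_bit_update(struct jtag_pins *pins, void *ctx)
{
    struct sim_one_bit *s = (struct sim_one_bit *)ctx;
    if (!s->prev_tck && pins->tck)
        s->bit = pins->tdi;     /* latch on the rising edge */
    if (s->prev_tck && !pins->tck)
        pins->tdo = s->bit;     /* drive TDO on the falling edge */
    s->prev_tck = pins->tck;
}
```

With the toy target, each bit clocked in on TDI shows up on TDO one cycle later, which is the behaviour expected of a one-bit register in the chain.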

Note that the point of having a TRST line may be less to let the pod reset the TAP controller, which is unnecessary since 5 TCKs with TMS high will accomplish the same result, and more to let the board design pull TRST low through a low value pull-down resistor which the pod can override. Thus, it may be necessary to implement TRST even if you have no need to use it.

In addition, some microcontrollers have a trace port (called ETM on ARM7TDMI). Since the internal address and data buses are not usually brought out to pins, the trace port can be used to monitor the flow of program execution. On the ARM7TDMI, this is about 8 signals clocked out at the CPU clock rate. By decoding those signals, you can tell when the next instruction in sequence has been executed and when a branch has been taken. A debugger with access to a copy of the code loaded into the micro, and which thus knows the size of each instruction and the destination of each branch, can map the flow. Interrupts, however, could complicate this. On ARM7TDMI, a low on RTCK during reset may enable the ETM port. The ETM is configured through the JTAG interface. Use of ETM involves giving up some I/O lines.

JTAG TAP Controller

Each JTAG enabled chip has a TAP controller with a state machine with 16 states. The state of the TMS pin at each TCK pulse dictates the transitions between states. These states help control whether you are shifting through the instruction register or one of the data registers, and control the capture of data into the shift registers and the latching of data from them.

Here is a diagram of the TAP controller state machine, produced using graphviz (requires browser/plugin SVG support):

[JTAG State machine diagram]

Each ellipse denotes a state and each arrow denotes a transition from one state to another. There are two transitions from each state, one taken when TMS=0 and one taken when TMS=1. The shift IR and Shift DR states are emphasized because that is where you will spend the most time (other than when idle). These are the states where data is shifted in on TDI and out on TDO and the machine will stay in those states as long as TMS=0. The Exit* and Pause* states are not terribly interesting. For either the data or instruction registers, you usually start by passing through the Capture_* state which loads data from the chip into the register, proceed to the shift* state where the captured data is shifted out while the new data from the PC is shifted in, and then make your way through a couple minor states to the update state where the new data is latched into the chip.

Here is the TAP controller state machine as a table, which is easily converted into C, VHDL, Verilog, or other languages.

Current State      TMS=0            TMS=1
-----------------  ---------------  ----------------
Test-Logic-Reset   Run-Test/Idle    (no change)
Run-Test/Idle      (no change)      Select-DR-Scan
Select-DR-Scan     Capture-DR       Select-IR-Scan
Capture-DR         Shift-DR         Exit1-DR
Shift-DR           (no change)      Exit1-DR
Exit1-DR           Pause-DR         Update-DR
Pause-DR           (no change)      Exit2-DR
Exit2-DR           Shift-DR         Update-DR
Update-DR          Run-Test/Idle    Select-DR-Scan
Select-IR-Scan     Capture-IR       Test-Logic-Reset
Capture-IR         Shift-IR         Exit1-IR
Shift-IR           (no change)      Exit1-IR
Exit1-IR           Pause-IR         Update-IR
Pause-IR           (no change)      Exit2-IR
Exit2-IR           Shift-IR         Update-IR
Update-IR          Run-Test/Idle    Select-DR-Scan

Note that aside from the first two states, the remaining states are divided into two sets of 7 states which are almost identical except that one set deals with the instruction register and one deals with the data register. These two sets of states deal with three operations: capture, shift, and update. Capture loads data from the chip into the corresponding register. Data is latched on the rising edge of TCK when exiting the capture state. Shift moves the data out through TDO while new data is shifted in on the TDI pin. The update-xx states cause the data from the shift register to be transferred to the selected register. The update occurs on the falling edge of TCK while still in the update state.

The state machine's main states are reset, run/idle, shift DR, and shift IR.

Data is shifted in on the rising edge of TCK and data out becomes valid on the falling edge of TCK (to be latched on the rising edge). That is, the rising edge is the active edge for input and output, and the transitions occur at the falling edge. Hold TMS=0 to keep the controller in the shift-xx state until all but the last bit has been shifted; set TMS=1 when the last bit is shifted.
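Putting the state table and the shifting rule together, the TMS values for one complete data register scan can be written out mechanically. The sketch below assumes the controller starts and ends in Run-Test/Idle and skips the Pause state; the function name and the choice of returning the sequence as a string of '0' and '1' characters (one per TCK, in order) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the TMS sequence for an nbits long DR scan into out as a
   NUL-terminated string, one character per TCK cycle.
   out_size must be at least nbits + 6.  Returns the number of cycles. */
size_t jtag_dr_scan_tms(size_t nbits, char *out, size_t out_size)
{
    size_t n = 0;

    assert(out != NULL && nbits > 0);
    assert(out_size >= nbits + 6);  /* 3 in + nbits + 2 out + NUL */

    memcpy(out, "100", 3);          /* Select-DR-Scan, Capture-DR, Shift-DR */
    n = 3;
    memset(out + n, '0', nbits - 1);/* stay in Shift-DR for all but one bit */
    n += nbits - 1;
    out[n++] = '1';                 /* last bit is shifted on the way to Exit1-DR */
    memcpy(out + n, "10", 2);       /* Update-DR, then back to Run-Test/Idle */
    n += 2;
    out[n] = '\0';
    return n;
}
```

For a 4 bit register this produces 100000110, nine TCK cycles in all. An instruction register scan differs only in the lead-in, which would be 1100 to pass through Select-DR-Scan and Select-IR-Scan first.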

Five TCKs with TMS high will reset the state machine to the Test-Logic-Reset state from any other state.

C implementation


   #define TAP_TEST_LOGIC_RESET 0
   #define TAP_RUN_TEST_IDLE 8
   #define TAP_SELECT_DR_SCAN 1
   #define TAP_CAPTURE_DR  2
   #define TAP_SHIFT_DR 3
   #define TAP_EXIT1_DR 4
   #define TAP_PAUSE_DR 5
   #define TAP_EXIT2_DR 6
   #define TAP_UPDATE_DR 7
   #define TAP_SELECT_IR_SCAN 9
   #define TAP_CAPTURE_IR 10
   #define TAP_SHIFT_IR 11
   #define TAP_EXIT1_IR 12
   #define TAP_PAUSE_IR 13
   #define TAP_EXIT2_IR 14
   #define TAP_UPDATE_IR 15

   /* Simulate one step of the TAP controller.  Call once per rising edge
      of TCK with the TMS level sampled at that edge; returns the new state. */
   int jtag_tap_controller_state_machine(int tms)
    {
       static int state=TAP_TEST_LOGIC_RESET;
       int nextstate=state;   /* default: (no change) */
       switch(state) {
            case TAP_TEST_LOGIC_RESET: if(!tms) {nextstate=TAP_RUN_TEST_IDLE;} break;
            case TAP_RUN_TEST_IDLE: if(tms) {nextstate=TAP_SELECT_DR_SCAN;} break;
            case TAP_SELECT_DR_SCAN: if(!tms) {nextstate=TAP_CAPTURE_DR;} else {nextstate=TAP_SELECT_IR_SCAN;} break;
            case TAP_CAPTURE_DR: if(!tms) {nextstate=TAP_SHIFT_DR;} else {nextstate=TAP_EXIT1_DR;} break;
            case TAP_SHIFT_DR: if(tms) {nextstate=TAP_EXIT1_DR;} break;
            case TAP_EXIT1_DR: if(!tms) {nextstate=TAP_PAUSE_DR;} else {nextstate=TAP_UPDATE_DR;} break;
            case TAP_PAUSE_DR: if(tms) {nextstate=TAP_EXIT2_DR;} break;
            case TAP_EXIT2_DR: if(!tms) {nextstate=TAP_SHIFT_DR;} else {nextstate=TAP_UPDATE_DR;} break;
            case TAP_UPDATE_DR: if(!tms) {nextstate=TAP_RUN_TEST_IDLE;} else {nextstate=TAP_SELECT_DR_SCAN;} break;
            case TAP_SELECT_IR_SCAN: if(!tms) {nextstate=TAP_CAPTURE_IR;} else {nextstate=TAP_TEST_LOGIC_RESET;} break;
            case TAP_CAPTURE_IR: if(!tms) {nextstate=TAP_SHIFT_IR;} else {nextstate=TAP_EXIT1_IR;} break;
            case TAP_SHIFT_IR: if(tms) {nextstate=TAP_EXIT1_IR;} break;
            case TAP_EXIT1_IR: if(!tms) {nextstate=TAP_PAUSE_IR;} else {nextstate=TAP_UPDATE_IR;} break;
            case TAP_PAUSE_IR: if(tms) {nextstate=TAP_EXIT2_IR;} break;
            case TAP_EXIT2_IR: if(!tms) {nextstate=TAP_SHIFT_IR;} else {nextstate=TAP_UPDATE_IR;} break;
            case TAP_UPDATE_IR: if(!tms) {nextstate=TAP_RUN_TEST_IDLE;} else {nextstate=TAP_SELECT_DR_SCAN;} break;
            default: nextstate=TAP_TEST_LOGIC_RESET; break;
       }
       state=nextstate;
       return(state);
    }

This code is an example intended for illustration; it hasn't been tested. "enum" types may be used instead of "#define". This function should be called once on each rising edge of TCK. While you probably don't want to implement a TAP controller in software on an actual hardware device (it would be very slow), simulating one is useful for keeping track of the state of a controller you are talking to, debugging, educational purposes, etc. This example can also be pretty easily converted to VHDL or Verilog for synthesis or simulation. Note that the #defined values for the various states were chosen so that the similarly named DR and IR states (which have similar state transitions) share 3 bits in common to facilitate logic minimization.
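
A table-driven variant of the same transitions makes it easy to check TMS sequences such as the five-TCK reset. This is only a sketch: it reuses the state numbers from the #defines above, and the names tap_next and tap_walk are made up for this example.

```c
#include <stddef.h>

/* tap_next[state][tms], indexed by the state numbers used in the #defines above. */
static const unsigned char tap_next[16][2] = {
    /*  0 Test-Logic-Reset */ { 8,  0},
    /*  1 Select-DR-Scan   */ { 2,  9},
    /*  2 Capture-DR       */ { 3,  4},
    /*  3 Shift-DR         */ { 3,  4},
    /*  4 Exit1-DR         */ { 5,  7},
    /*  5 Pause-DR         */ { 5,  6},
    /*  6 Exit2-DR         */ { 3,  7},
    /*  7 Update-DR        */ { 8,  1},
    /*  8 Run-Test/Idle    */ { 8,  1},
    /*  9 Select-IR-Scan   */ {10,  0},
    /* 10 Capture-IR       */ {11, 12},
    /* 11 Shift-IR         */ {11, 12},
    /* 12 Exit1-IR         */ {13, 15},
    /* 13 Pause-IR         */ {13, 14},
    /* 14 Exit2-IR         */ {11, 15},
    /* 15 Update-IR        */ { 8,  1},
};

/* Apply n TMS values (one per TCK rising edge) starting from 'state';
   returns the state the controller ends up in. */
int tap_walk(int state, const unsigned char *tms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        state = tap_next[state & 15][tms[i] ? 1 : 0];
    return state;
}
```

Walking from state 0 with TMS = 0,1,1,0,0 ends in state 11 (Shift-IR), and five 1s from any of the 16 states ends in state 0.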

IEEE 1149 has the combinatorial form of the state machine logic expressed as 4 sums of 17 products, each product having at most 4 inputs. Thus 21 4-input LUTs could be used to implement that logic (about double that when you include the sequential portion).

The instruction Register (required)

The size of the instruction register varies from chip to chip. Minimum is 2 bits. It contains a shift register, a storage register, and decode logic. The value 01 is loaded into the two least significant bits of the instruction shift register during the capture-ir state. The instruction register also functions as a status register. What is loaded into the other bits depends on the chip.

Required Instructions

BYPASS

The bypass instruction is all 1s. While the bypass instruction is active, the bypass register is loaded with a 0 in the Capture-DR state, just before the chip enters the Shift-DR state. The contents of the bypass register are ignored during an Update-DR. With multiple devices in the chain, you can selectively put some devices into bypass mode by loading a bypass instruction into some instruction registers and another instruction, such as sample/preload, into the others.
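
When only one device in a chain is of interest, the others get all 1s (BYPASS) in their instruction registers and then each contribute one bit to every data scan. The sketch below counts the padding bits needed around the target's own bits. It assumes device 0 is the one nearest TDI, that target is a valid index, and that the instruction register lengths are already known; the names jtag_pad, ir_padding, and dr_padding are made up for this example.

```c
#include <stddef.h>

/* Padding bits to shift before and after the target device's own bits. */
struct jtag_pad { size_t before; size_t after; };

/* IR scan: every other device receives an all-1s BYPASS instruction.
   Bits shifted first travel furthest, so they land in devices nearer TDO. */
struct jtag_pad ir_padding(const unsigned *ir_len, size_t ndev, size_t target)
{
    struct jtag_pad p = {0, 0};
    for (size_t i = 0; i < ndev; i++) {
        if (i > target)
            p.before += ir_len[i];   /* devices between the target and TDO */
        else if (i < target)
            p.after += ir_len[i];    /* devices between TDI and the target */
    }
    return p;
}

/* DR scan: each bypassed device is a single bit in the chain. */
struct jtag_pad dr_padding(size_t ndev, size_t target)
{
    struct jtag_pad p = { ndev - 1 - target, target };
    return p;
}
```

For a three device chain with instruction register lengths 8, 4, and 6, talking to the middle device means shifting six 1s, then the 4 bit instruction, then eight 1s.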

SAMPLE/PRELOAD

This instruction loads the state of each input pin into the boundary scan register and also the internal logic state of each output pin. This instruction does not interfere with normal functions. Sample and Preload may be separate or combined.

EXTEST
The EXTEST instruction is all zeros.

External test. This does interfere with normal function. It performs an external boundary scan test to test things like chip to chip connections. In EXTEST mode, each Update-DR state causes data in the boundary scan register to be driven to the output pins. The Capture-DR state causes data from the input pins and internal states for the output pins to be loaded into the boundary shift register (similar to sample/preload). Data is shifted on the rising edge of TCK.

Optional Public Instructions

IDCODE (Optional in 1149.1, Required in 1532)

This instruction allows the chip ID code to be read.

USERCODE (Optional in 1149.1, Required in 1532)
INTEST
RUNBIST
ISC_ENABLE
ISC_DISABLE
ISC_PROGRAM
ISC_NOOP
ISC_READ
ISC_ERASE
ISC_DISCHARGE
ISC_PROGRAM_USERCODE
ISC_PROGRAM_DONE
ISC_ERASE_DONE
ISC_PROGRAM_SECURITY
ISC_READ_INFO
ISC_DATA_SHIFT
ISC_ADDRESS_SHIFT
ISC_INCREMENT
ISC_SETUP
PROBE (1149.4)
EXTEST_PULSE(1149.6)
EXTEST_TRAIN(1149.6)
Other

Additional part specific instructions can be defined. These can be used for things like built in self test, programming flash memory, and On Chip Debug.

Private Instructions

Presumably used for things like factory test. If a chip has, for example, a CPU, RAM, FLASH, FPGA, IO, and bus interconnect blocks, there may be separate or combined scan chains around each block. With a block, there may be internal cells used to further partition the logic into testable domains.

Registers

[Internal register multiplexing]

A JTAG device contains multiple registers or chains accessible via the TAP port. If the TAP controller is in the Shift-IR state, the instruction register will be used. In the Shift-DR state, the register inserted into the chain will depend on the currently active instruction in the instruction register.

Standard Data Registers

In addition to the instruction register, each device has more than one data register. At any given time, data is shifted through either the instruction register or one of the data registers depending on the state of the TAP controller and the last instruction loaded. Registers are shifted LSB first.

The boundary-scan register (required)

The boundary-scan register typically contains one or more bits for each logic pin on the device. More than one bit is needed to deal with output enables and direction controls for some pins. All device inputs must be observable and all device outputs controllable through the boundary scan register. The register may provide access to internal states associated with each pin.

The bypass Register (required)

This is a single bit wide shift register connecting TDI and TDO. It is selected in bypass mode and is used to minimize the number of bits that need to be shifted when accessing other chips. Why a one bit delay? Without it, you would end up with an analog delay in TDI proportional to the number of chips and under some conditions, TCK would get there before TDI would.

Device ID register (optional)

This is a 32 bit register that contains information that can be used to identify the specific chips in the chain.

User-Defined (optional)

Boundary scan cell types

[Basic boundary scan cell]

A basic boundary scan cell contains two flip flops and some multiplexors. Boundary scan cells are chained together to make a long shift and store type register with preload. The first mux and the two flip flops shown above make up a basic cell of your garden variety shift and store register. The first flip flop is part of the shift register and the second part of the store register. The last MUX selects between normal operation (PI -> PO) and the boundary scan system taking over control of the signal. The input signals to a cell may be internal or external signals, depending on the cell function. 1 to 3 basic cells are tied together to make the cell used on each I/O pin. The SI and SO pins (shift in and shift out) of adjacent cells are tied together in a long daisy chain. The PI and PO pins contain "parallel" data to be captured in Capture-DR state or output in Update-DR state. Depending on the state of the Mode signal, PI may be passed directly through to PO (normal operating mode for the chip). The mux on the left selects between shift mode (Shift-DR state) and preload (Capture-DR state). The UpdateDR signal is clocked when the TAP controller reaches Update-DR state, latching the data from the shift register into the store register. The first flip flop is sometimes called "CAP" and the second "UPD".

"Cell" will be used somewhat interchangeably here to refer to either the simple cell or the cluster of 1, 2, or 3 simple cells associated with a pin. The standard neglects to give the two separate names.
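
The behaviour of the basic cell described above can be modeled in a few lines of C. This is a rough sketch for illustration, not taken from the standard: the names bs_cell, bs_cell_clock, bs_cell_update, and bs_cell_po are made up, and the control signals are reduced to a shift flag and a mode flag.

```c
/* One basic boundary scan cell: the shift ("CAP") and store ("UPD") flip flops. */
struct bs_cell {
    unsigned cap;   /* first flip flop, part of the shift register */
    unsigned upd;   /* second flip flop, part of the store register */
};

/* Clock the first flip flop once. With shift set (Shift-DR) it takes SI,
   otherwise (Capture-DR) it takes PI. Returns the old CAP value, which is
   what the next cell in the chain sees on its SI input at this clock edge. */
unsigned bs_cell_clock(struct bs_cell *c, int shift, unsigned si, unsigned pi)
{
    unsigned so = c->cap;
    c->cap = (shift ? si : pi) & 1u;
    return so;
}

/* Update-DR: latch the shift register bit into the store register. */
void bs_cell_update(struct bs_cell *c)
{
    c->upd = c->cap;
}

/* Output mux: with mode clear PI passes straight through to PO (normal
   operation); with mode set the stored UPD value is driven instead. */
unsigned bs_cell_po(const struct bs_cell *c, int mode, unsigned pi)
{
    return (mode ? c->upd : pi) & 1u;
}
```

To model a whole boundary register, call bs_cell_clock on each cell in chain order and feed the value returned by one cell into the si argument of the next.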

Before you can understand the interaction between the boundary scan logic cells and the I/O buffers, you need to understand the nature of the I/O buffers. There are four basic types: output only, input only, tristate output, and bidirectional. We will focus on the bidirectional buffer; the other types are degenerate forms of the bidirectional. Open drain and open source signals are also degenerate forms, with one of the driver FETs missing. The bidirectional pin is not simply a magic buffer on an internal tristate bus. It is very hard to make a bidirectional buffer work without a direction signal provided (see the section on level translation) and attempts to do so will offer poor performance in a variety of circumstances. Tristate buses are rarely used internal to the chip and in many cases they are prohibited. An internal tristate bus can have problems with capacitance and fanout, and the internal drivers, being much smaller than external I/O buffers, may be damaged by bus contention. So tristate buses are typically replaced with some sort of multiplexor arrangement. Typically, a bidirectional signal on an IC pin connects to the internal logic as three separate signals: input, output, and output enable.

Each cell has a general cell type (corresponding to the section headings below and denoting its function) and also references a CELL_INFO structure ("BC_0", "BC_1", "BC_2", etc.), which gives some more info on its structure, from the STD_1149_1_2001 header file or a user defined cell type. CELL_INFO tells you what signals the first mux connects to in various states; see section B.10.2.2 in 1149.1-2001. The comments preceding the cell definitions contain obscure references to figures in the standard, i.e. "f11-18" means "Figure 11-18".

Each pin may have more than one cell type associated with it. An input pin will require 1 cell, a simple output 1 or 2 cells, a tristate output 2 or 3 cells, and a bidirectional pin 2 or 3 cells.

The picture below gives a cluster of three cells for a bidirectional pin. Other forms of pins can be approximated by simply removing the unnecessary logic. Don't forget, though, that even if a pin is output only, you probably still want to be able to capture the state at the pin so when minimizing the logic for an output, you can eliminate the input connection to the internal logic but keep the cell (possibly discarding the second flip flop and mux). Note: this is a zoomable image; click on it. Cells may not necessarily be daisy chained in the order shown. The first mux may vary from the diagram. Possible input sources are PI (parallel in), PO (parallel out OR the pin itself), UPD (the output of the second flip flop), CAP (the output of the first flip flop), X (unknown state), ZERO, or ONE in addition to the SI signal and may vary depending on whether you are in EXTEST, SAMPLE, or INTEST.

[Three

The configuration shown in the diagram above is all you really need for both internal and external test if you are designing a chip or for basic conceptual understanding. However, other configurations are used and may reflect corner cutting or other arbitrary choices. General purpose boundary scan software will need to parse the BSDL file.

OBSERVE_ONLY: Observe only cell

Allows test system to sample the state on that pin but not assert state to either the pin or external logic. It is like the INPUT cell in the diagram above except: PO goes nowhere and the second flip flop and mux can be eliminated.

OUTPUT2: 2 state output

Allows test system to sample the value being driven onto the pin by internal logic (used for internal tests) or drive a value on to the pin (external tests).

OUTPUT3: 3 state output (2 bit)

One bit is used for the signal and one for the enable line. Four states can be read (drive high, drive low, tristate high, tristate low) and "driven". Can sample the output and enable signals from internal logic or drive the output pin (with tristate).

INPUT: Input Only

Can sample the input pin or drive a value into the internal logic.

BIDIR_IN, BIDIR_OUT: Bidirectional

This is a method of cutting corners and reducing shift chain size. The input and output cells are replaced, with the aid of muxes, with a single cell. You can't read the actual logic state on the pin when it is being driven, so you can't test the driver itself, unless the driver output is coupled back to the input instead of the PO pin. This is allowed by newer versions of 1149.1, though the BSDL becomes ambiguous since the pins are still identified as reading back "PO" (normally the input to the driver), not the actual pin output.

CLOCK: observe only for clock pin

CONTROL: output enable or in/out direction control

CONTROLR: control cell with test-logic-reset initiated set/reset

INTERNAL: associated with internal logic

Other combinations

An Enhanced Cell

Here I have sketched up a concept for an advanced cell configuration for improved testability. This is a zoomable image. This cell cluster has programmable pull-up and pull-down and an input comparator with programmable threshold. There are some ways to minimize it. A couple of cells, one output only and one input only, can be combined. Also, instead of a separate pull up and pull down, it is possible to have two enables to allow the main driver to be strong or weak.

[Three

Internal user scan chains for programmable logic

With a trivial amount of extra logic, a programmable logic vendor can provide a way to create internal scan chains. On Spartan 3 and 3E FPGAs, the BSCAN_SPARTAN3 block provides up to two internal scan chains using the USER1 and USER2 instructions. A better design would have simply provided one mux input and provided access to the instruction register bits so that a large number of scan chains, that look like any other scan chains, could be created, thus being able to fully simulate a JTAG enabled device except for the instructions having an offset to account for the built in chains. As it is, the user could use the USER1 instruction to select a chain and the USER2 to access it. If the internal JTAG chains are not used, the programmable logic must select the bypass register. Virtex-II, Virtex-II Pro, and Virtex-4 families also have the USER1 and USER2 access, with the Virtex-4 providing up to four user chains.

A better way to implement this is to provide a single extra input on the TDO mux. If a fuse is not programmed, this input connects to the bypass register to conform to 1149.1. If, however, the fuse is programmed, it connects to user logic. All undecoded instructions select the extra fuse input on the mux. This way, all unused opcodes select the bypass register, as required by 1149.1. All bits of the instruction register are brought out to the user logic, as are the capture and update strobes. With an 8 bit instruction register size, there would be a large number of unused instructions. For example, all instructions with the MSB=1, except 11111111 (BYPASS), could be used. User logic would provide a 1 bit alternative bypass register or provide a signal to the main mux that selects the bypass register for all undecoded instructions. User logic implements an expansion mux and strobe logic for as many registers as needed. Programmable logic must clearly identify instructions which access internal factory test logic (this is normally done in the BSDL file, anyway).

Another method is to have a configuration fuse control a mux on TDO such that, once the device is configured, the user logic is inserted in the chain after the programmable logic's on-board TAP controller. This provides 100% emulation of an ASIC design's TAP controller without a separate TAP port. However, it is likely to confuse the configuration software and it is a nuisance to recover from if your on-chip TAP is defective.

One use for such an internal chain is to provide access to an internal wishbone bus via the normal JTAG pins.

Some CPLDs allow the TAP pins to be used as USER I/O pins with a logic level on one pin selecting the mode. Putting a jumper on the selection pin (or asserting it with a spare signal on the pod) can allow access to an internal user TAP or SPI circuit.

Limitations of PC Parallel Port JTAG

Some newer PCs do not come with a parallel port at all and USB to printer port adapters are generally not suitable for bit banging. A long, highly capacitive cable is likely to be needed to reach the system under test, which slows down the rate you can send data. Both the parallel port interface itself and the JTAG device are too slow for a modern processor to issue output instructions at full speed so you need a delay for each bit clocked (actually two delays). Since this delay is too short to use the system timer, you will need to use delay loops which will waste CPU time. Your JTAG code will consume close to 100% of the CPU time until preempted, at which point there will be a pause in the JTAG bitstream until your process gains control again, so the flow of JTAG data will be sporadic. It is possible to use advanced modes (ECP) of the parallel port (which are often disabled in the BIOS) to send data more efficiently but this would necessitate the use of a CPLD in the parallel pod which is not normally the case (Xilinx Parallel Cable IV appears to be an exception). Suppose you set your delay to be 1us and your process gets 30% of the available timeslots. Then you will be sending 500 kHz bursts (you need to write to the port twice for each TCK) about one third of the time, for an effective data rate of only 150 kHz while wasting 30% of your CPU time. Even the Parallel Cable IV can only do a 2.5 MHz TCK rate.

Boundary Scan Issues

When a JTAG device is driving signals through series resistors and the device on the other side doesn't have JTAG, many tests will not detect shorts between adjacent resistor network pins. If you are actively driving your pins, the shorts will not overpower the driven outputs. Driving adjacent pins with test patterns while a pin is tristated, however, may show that it follows the values of adjacent pins.

One type of pattern is known as a Wagner pattern. This basically involves assigning consecutive binary numbers to each net as a sequential test vector (STV). One bit from each STV makes up a parallel test vector (PTV). This pattern does a good job of detecting shorted nets with far fewer vectors than a walking ones or zeros test but not as good a job of isolating faults. A walking bit test requires of order N^2 TCK cycles, where N is the number of nets, while the number of vectors in the Wagner pattern grows only logarithmically with N. The Wagner pattern is improved by sending out complements of the original vectors. The Parallel Response Vectors (PRVs) are transformed into Sequential Response Vectors (SRVs); pins with the same SRV are probably shorted. The Wagner pattern will detect some opens but additional opens testing is needed. In addition to the Wagner test, an all 1s and all 0s PTV will help distinguish between stuck at 1 and stuck at 0 faults vs shorted nets. The assumption that either Wagner patterns or walking bit patterns will detect shorted nets assumes that one driver wins.
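
Generating the pattern described above takes very little code. The sketch below is an illustration under my own assumptions: net number n is given the STV n+1, so that no net gets the all 0s code, and the width is chosen so that no net gets the all 1s code either. The function names are made up.

```c
#include <stddef.h>

/* Number of parallel test vectors (bits per STV) needed for n nets when
   the all-0s and all-1s codes are left unused. */
unsigned wagner_width(size_t n)
{
    unsigned w = 1;
    while (((size_t)1 << w) < n + 2)
        w++;
    return w;
}

/* Value to drive onto net 'net' in parallel test vector number 'vec'.
   The STV for a net is simply its index plus one. */
unsigned wagner_bit(size_t net, unsigned vec)
{
    return (unsigned)(((net + 1) >> vec) & 1u);
}

/* Scan the sequential response vectors for two nets that read back the
   same code. Returns 1 and sets *a and *b if a pair is found, else 0. */
int find_equal_srv(const unsigned *srv, size_t n, size_t *a, size_t *b)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (srv[i] == srv[j]) {
                *a = i;
                *b = j;
                return 1;
            }
        }
    }
    return 0;
}
```

With these assumptions 1000 nets need only 10 parallel vectors (20 with the complements), compared with 1000 for a walking bit test.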

"Algorithmic Extraction of BSDL from 1149.1-compliant Sample ICS", Ramond, et all, international Test Conference 1995, pg 561-568 explains how to extract BSDL from a sample chip (doesn't identify all the internal instructions).

Various faults in the JTAG chain itself can cause problems and some of these can be diagnosed.

Short circuit testing is normally performed first so you can power down a device quickly to minimize the possibility of damage. The shorter the duration, the lower the probability of immediate or delayed failure. Either the driver transistors or the bond wires may be damaged. In practice, digital ICs often withstand more than you might expect. Investigation of Device Damage Due to Electrical Testing focuses on overvoltage spikes; these are likely to be more of an issue when you are testing interconnected boards.

Beware of test patterns that may enable the output of an IC (such as a RAM OE) onto nets, resulting in driver contention that may interfere with a test or cause damage.

Intermediate-state shorts, where the result of shorting two nets together leads to indeterminate results, are a problem for many logic families. Using a varying number of drivers on a net may tip the mid-state into a detectable region but may also increase the chances of driver damage (no worse than a short to GND/VCC). Tristate/0 and Tristate/1 combinations instead of 0/1 may help detection but may also produce a lot of false positives. onTAP claims to be good at detecting midstate faults. The harder such a fault is to detect, the less likely it is to produce immediate damage. If two 5V drivers shorted together produce a 2.5V level, that indicates that they are sharing the heating evenly; on the other hand, if a strong driver is competing with a weak driver, it is more likely to result in a determinate output. Shorts to power or ground, unless they are resistive, are likely to result in a determinate output.

More BSDL info

Pretty much all the useful information is in strings assigned to constants or attributes. These strings can contain arrays of structures and other complicated forms that require parsing. These strings are split into pieces in potentially random ways and concatenated with "&". So, you can pretty much forget about a single stage parser. You have to parse the VHDLness to get the strings, and their constants, and then parse the contents of the strings.

The CELL_INFO array contains multiple records of type (a,b,c) where a is the context in which the cell is used (INPUT, OUTPUT2, OUTPUT3, INTERNAL, CONTROL, CONTROLR, CLOCK, BIDIR_IN, BIDIR_OUT, or OBSERVE_ONLY), b denotes the capture instruction (EXTEST, SAMPLE, or INTEST), and c denotes the data source captured (PI, PO, CAP, UPD, X, ZERO, or ONE (explained above)).

The BOUNDARY_REGISTER is a string (usually split over multiple lines and concatenated with "&"):


       cell_number "(" cell_type, port_id, function, safe_bit, [ccell disval rslt] "),"
     

PIN_MAP contains a mapping from signal names to pin numbers, separated by commas. "FOO:9" means PIN 9 is FOO. "Q:(1,2,3,4)" means that signal Q1 is on pin 1, Q2 is on pin 2, Q3 is on pin 3, and Q4 is on pin 4. Give or take a little. Q in this case is a bit_vector(1 to 4). If it was a bit_vector(0 to 3) this would probably be Q0 to Q3. The bit vector information comes from the "port" of the entity describing the part and this is one of the few places where you find information in actual VHDL.

cell number is usually sequential. function is one of INPUT, OUTPUT2, OUTPUT3, CONTROL, CONTROLR, INTERNAL, CLOCK, BIDIR, OBSERVE_ONLY. port_id is ???. safe_bit is one of 0,1 or X. The last three have to do with disabling a pin. They denote which cell number is disabled, what value (0 or 1) results in disabling, and what state results from disabling (Z, WEAK0, WEAK1, PULL0, PULL1, or KEEPER).
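
Once the pieces of the BOUNDARY_REGISTER string have been glued back together, a single entry can be picked apart with sscanf. This is a minimal sketch, assuming one entry per call with the fields in the order given above and no stray spaces inside a field; the struct and function names are made up, and a real parser would need to be far more forgiving.

```c
#include <stdio.h>

/* One boundary register entry. The last three fields are only present
   for cells that have an associated disable (control) cell. */
struct bsr_cell {
    int  number;
    char cell_type[16];
    char port_id[32];
    char function[16];
    char safe_bit;
    int  has_disable;
    int  ccell;
    char disval;
    char rslt[16];
};

/* Parse one entry such as  "3 (BC_1, FOO, input, X),"
   Returns 1 on success, 0 if the entry does not match the expected shape. */
int parse_boundary_cell(const char *entry, struct bsr_cell *c)
{
    int n = sscanf(entry,
                   " %d ( %15[^,], %31[^,], %15[^,], %c , %d , %c , %15[^) ]",
                   &c->number, c->cell_type, c->port_id, c->function,
                   &c->safe_bit, &c->ccell, &c->disval, c->rslt);
    c->has_disable = (n == 8);   /* all three optional fields were present */
    return n == 5 || n == 8;
}
```

The field widths in the format string match the array sizes in the struct, so an oversized field stops the match instead of overflowing a buffer.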

Probing an unknown chain

You should be able to connect a JTAG cable to any board and have it automatically determine:

However, there are a number of problems with this. While two bits of the instruction register are required to hold the value of "01" after a Capture-IR, the value of the other bits is undefined. Thus, you can't tell whether the 0 to 1 transition on TDO is an indication of the start/end of a new device or just garbage data. Also, the device identification register is optional. And there is no central database mapping device ID register values to actual part numbers. When the TAP is reset, it connects the 32 bit ID code register to the chain if it exists, otherwise the 1 bit bypass register. Since bypass is always initialized to 0 in Capture-DR and the least significant bit of the ID code is always 1, we can tell whether we have an identified or unidentified device. We still need to know the length of the instruction register. If the ID code matches a known device, we can get that info from the BSDL file (preferred) or the data sheet. We can do some exploring based on the fact that all 1s in the instruction register puts a device in bypass. Thus we can try to probe devices in the reverse order of the chain by sending out a lot of 1s after the instruction being tried. PERL Device::JTAG::PP is one example of this. By manipulating the TAP controller in such a way that we shift data through the chain but never latch the DR data into the device, we can measure the length of the chain for each instruction. If we sequence through instructions until we get a register length of 1 at a possible word boundary, we have probably found the BYPASS instruction (all 1s) and thus the instruction register size.
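
The ID code versus bypass trick can be sketched in C as follows. This assumes the chain was reset, the TAPs were taken straight to Shift-DR, and 1s were fed into TDI so that the real data is followed by all 1s; treating a word of 32 ones as the end marker is an assumption of this sketch, and parse_reset_dr is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* bits[] holds the TDO samples in the order they came out, one bit per
   element. Each device contributes either a single 0 (bypass register, no
   ID code) or a 32 bit ID code whose least significant bit is 1.
   ids[0] is the device nearest TDO; a device without an ID code is stored
   as 0. Returns the number of devices found. */
size_t parse_reset_dr(const unsigned char *bits, size_t nbits,
                      uint32_t *ids, size_t max_ids)
{
    size_t pos = 0, n = 0;

    while (pos < nbits && n < max_ids) {
        if (bits[pos] == 0) {            /* 1 bit bypass register */
            ids[n++] = 0;
            pos += 1;
            continue;
        }
        if (nbits - pos < 32)            /* not enough bits left for an ID */
            break;
        uint32_t id = 0;
        for (unsigned i = 0; i < 32; i++)   /* registers come out LSB first */
            id |= (uint32_t)(bits[pos + i] & 1u) << i;
        if (id == 0xFFFFFFFFu)           /* only our own fill bits remain */
            break;
        ids[n++] = id;
        pos += 32;
    }
    return n;
}
```

This finds the number of devices and any ID codes, but not the instruction register lengths, which still have to come from BSDL files or from probing.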

There are many reasons to probe an unknown chain. A generic program such as a debugger or serial flash loader may support specific chips and not care about other chips. It needs to know the configuration of the chain so it can put the other devices in bypass mode and add the appropriate number of dummy bits to the beginning and end of the data stream. The inability to reliably probe an unknown chain is probably one primary reason a lot of microcontrollers need to be on their own separate chain to support debugging. Another reason may be a nuisance patent. Probing an unknown chain can also be used in a production environment to automatically select the appropriate test or download routines when a device is connected.

Automatic identification of boards which have the same chips on them is possible in a number of ways. You can change the order of the devices on the scan chain. You can use a common set of otherwise unused pins on a CPLD or other JTAG device which are hardwired on the PCB to a unique ID code. You can program an analog voltage on a JTAG ADC voltage monitor chip with a pair of resistors. Also, many JTAG programmable logic devices have a user ID register that is programmed when the programmable logic is programmed; this is used to identify the unique firmware on an otherwise standard chip. Once a chip has been programmed once, using JTAG without auto ID or by programming the chip prior to part placement, the board can then be identified. These forms of auto ID can be used to automatically select the right firmware or guard against operator error during factory in circuit programming and test. They can also be used to identify boards installed into different slots in an embedded system.

SDIO

Note that it would be relatively simple to convert the SDIO slot on PDAs to a JTAG test controller, provided you have documentation on how to program the SD card port on your device.

Simulation of SN74BCT8244A boundary scan

Taken from the TI Scan Educator program. This is an 8 bit tristate bus driver with two enable pins. It contains an 18 cell boundary scan register, a two cell TI SCOPE specific boundary control register, and a one cell bypass register.

Loading the sample/preload instruction


Initial state:
  TDI=0 TMS=1 TCK=0 (TDO=Z), test-logic-reset state

Now move from test-logic-reset to Shift-IR state with TMS={0,1,1,0,0}
  TDI=0 TMS=0 TCK=0 (TDO=Z)
  TDI=0 TMS=0 TCK=1 (TDO=Z)   
  (state now run-test/idle)
  TDI=0 TMS=0 TCK=0 (TDO=Z)

  TDI=0 TMS=1 TCK=0 (TDO=Z)
  TDI=0 TMS=1 TCK=1 (TDO=Z)
  (state now Select-DR-Scan)
  TDI=0 TMS=1 TCK=0 (TDO=Z)
  
  TDI=0 TMS=1 TCK=0 (TDO=Z)
  TDI=0 TMS=1 TCK=1 (TDO=Z)
  (state now Select-IR-Scan)
  TDI=0 TMS=1 TCK=0 (TDO=Z)

  TDI=0 TMS=0 TCK=0 (TDO=Z)
  TDI=0 TMS=0 TCK=1 (TDO=Z)    
  (status value captured into instruction shift register)
  (state now Capture-IR)
  TDI=0 TMS=0 TCK=0 (TDO=Z)

  TDI=0 TMS=0 TCK=0 (TDO=Z)
  TDI=0 TMS=0 TCK=1 (TDO=Z)
  (state now shift-IR)
  TDI=0 TMS=0 TCK=0 (TDO=1)

  SAMPLE/PRELOAD is 10000010  (shifted in LSB first)
  TMS=0 TDI={0,1,0,0,0,0,0}
  TMS=1 TDI={1}   // state transition on last bit
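
The TDI and TMS values at the end of the trace above can be generated mechanically for any register value. Here is a small sketch (shift_sequence is a made-up name) that fills two arrays with one entry per TCK cycle, LSB first, raising TMS only on the last bit so the controller leaves the shift state. It assumes nbits is no larger than the width of unsigned long.

```c
/* Fill tdi[] and tms[] with the per-clock values needed to shift 'nbits'
   bits of 'value' into the currently selected register, LSB first.
   TMS stays 0 to remain in the shift state and is 1 on the final bit. */
void shift_sequence(unsigned long value, unsigned nbits,
                    unsigned char *tdi, unsigned char *tms)
{
    for (unsigned i = 0; i < nbits; i++) {
        tdi[i] = (unsigned char)((value >> i) & 1u);
        tms[i] = (unsigned char)(i == nbits - 1);
    }
}
```

Calling it with the value 0x82 (binary 10000010) and 8 bits gives TDI = 0,1,0,0,0,0,0,1 with TMS high only on the eighth clock, matching the trace.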






Parallel Port Pinouts

Note that this denotes the logical connections, not physical. Normally, there needs to be a buffer/level translator or at least a resistor. This table is concerned with software driver programming.

Parallel Pin Xilinx Parallel 3 Digilent JTAG3 Wiggler Clone
1 - Strobe
2 - Data 0 TDI drive (tristated by pin 5 high) Drives TDI, inverted Drives nSRST (inverted OC)
3 - Data 1 TCK drive (tristated by pin 5 high) Drives TCK, inverted Drive TMS
4 - Data 2 TMS (tristated by pin 5 high) Drives TMS, inverted Drives TCK
5 - Data 3 Tristate TCK, TDI, and TMS TDI drive
6 - Data 4 TDO drive (OC)
7 - Data 5
8 - Data 6 8,11,12 shorted
9 - Data 7 9,11,12 shorted
10 - Ack
11 - Busy 8,11,12 shorted 9,11,12 shorted Read TDO
12 - Paper Out 8,11,12 shorted 9,11,12 shorted
13 - Select TDO/DONE readback TDO readback, not inverted
14 - Auto Feed
15 - Error VCC Sense
16 - Init
17 - Select In
18-25 - Ground Ground (20,25) Ground(18)

Speed Issues

JTAG spec allows up to 25 megabit per second transfers. According to hardwarehacking.org, "With a parallel port cable, however, you will be lucky to achieve more than about 400,000 bits-per-second.". The open source windrvr replacement notes it is limited to about 200 kHz on the parallel cables. Note that there is some confusion in speeds, as they may sometimes refer to the port update rate and sometimes the TCK rate (half that).

Bit banging vs bit stream

An FT232RL chip would probably achieve about the same speed as a parallel cable. Speed could be doubled by running open loop. An FT2232C/D chip in MPSSE mode would probably get around 5.6 megabits per second.

Latency

Many poorly structured JTAG programs are written in such a way that they are very sensitive to latency and will slow down by many orders of magnitude when run over USB or IP. The debrick utility is an example of this. All I/O is done by calling clockin(), once for each bit. It reads the TDO line each time. USB is a high bandwidth link but it is also high latency. You get 1 I/O operation each millisecond at full speed (8 per millisecond at high speed). An efficient USB high speed pod could conceivably pump about 320 megabits per second of data (far more than most chips can handle) but writing two values to TCK and TDI and then reading TDO will take a millisecond or more, thus reducing the speed to under 1 kHz (or 8 kHz for high speed). The solution to this is to delay parsing data which is read back. One approach is to pass a pointer to where to write the results and not expect it to be updated immediately. Then write a bunch of data before reading the results. Even if you delay the read until you have written one word of flash, writing 3MB of flash will take 50 minutes. Higher performance over USB full speed could be achieved using templates stored on the microcontroller.

Sliding window

Use a sliding window approach to reading results. Thus you may send multiple vectors before you check the results of the first vector. Here is some oversimplified code:


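   #define MAX_WINDOW 16
   int i;
   int response_count = 0;   // number of responses checked so far
   bit_vector_t stimulus_vector;
   bit_vector_t response_vector[MAX_WINDOW];
   // num_vectors is the total number of vectors to send
   for(i=0; i<num_vectors; i++) {
      // if the window is nearly full, wait for the oldest response
      while( (i-response_count) >= (MAX_WINDOW-2) && ! vector_ready(response_count) ) {
         // spin
      }
      response_vector[i%MAX_WINDOW] = allocate_response_vector();
      generate_stimulus_vector(&stimulus_vector);
      send_vector(i, stimulus_vector, response_vector[i%MAX_WINDOW]);
      if(vector_ready(response_count)) {
         check_response_vector(response_vector[response_count%MAX_WINDOW]);
         free_response_vector(response_vector[response_count%MAX_WINDOW]);
         response_count++;
      }
   }
   // responses still outstanding here need to be checked the same way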

Also, you can use feed forward to reduce latency issues. In this case, you send the data to be transferred, the data you expect back, and a mask indicating which data to ignore. This is also good to use as a primary programming API as the library can automate checking and it has more options as to what commands to send over the wire. It also lends itself to recording data files that can be played back later.
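
The check at the receiving end of such a feed forward scheme is a masked compare. A minimal sketch, with a made-up function name, assuming the data is packed 8 bits per byte and that a 1 bit in the mask means "ignore this bit":

```c
#include <stddef.h>

/* Compare the data read back against the expected data, skipping any bit
   whose ignore_mask bit is 1. Returns the index of the first byte that
   differs in a bit that matters, or -1 if everything that matters matched. */
long compare_masked(const unsigned char *actual,
                    const unsigned char *expected,
                    const unsigned char *ignore_mask, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned diff = (unsigned)(actual[i] ^ expected[i]);
        if (diff & (unsigned char)~ignore_mask[i])
            return (long)i;
    }
    return -1;
}
```

Because the pod or library can run this itself, only a pass/fail result (and perhaps the failing offset) has to come back over the slow link.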

Compression

Compression could allow higher JTAG speeds without needing USB high speed. In flash programming, only 1 of the 4 address bytes changes 99% of the time. In typical sweeping 1 and 0 test vectors, only 1 or 2 bytes out of the entire vector need to change. In most cases, the compression algorithm will use the last vector as a start. The algorithm needs to be simple enough that it can be implemented on a typical microcontroller without slowing down to less than USB speeds. A very simple algorithm is to transmit 1 extra byte for each block of 8 bytes. The bits in that byte indicate which bytes change (and will be transmitted). Speed improvement up to 8x is possible, depending on the pattern. Another simple algorithm would be to send an offset as well as a length, allowing a contiguous subset of bytes to be transmitted. Run length encoding could be used but that will have more CPU overhead.
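
The one-flag-byte-per-8-bytes scheme can be sketched as an encoder for the host and a decoder for the pod. This is an illustration only; the function names are made up, and the caller is assumed to provide an output buffer of at least len + (len+7)/8 bytes for the worst case.

```c
#include <stddef.h>

/* Encode 'cur' relative to the previous vector 'prev'. For each block of
   8 bytes, emit one flag byte (bit j set means byte j of the block changed)
   followed by only the changed bytes. Returns the encoded length. */
size_t delta_encode(const unsigned char *prev, const unsigned char *cur,
                    size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t base = 0; base < len; base += 8) {
        size_t flag_pos = o++;           /* reserve room for the flag byte */
        unsigned char flags = 0;
        for (size_t j = 0; j < 8 && base + j < len; j++) {
            if (cur[base + j] != prev[base + j]) {
                flags |= (unsigned char)(1u << j);
                out[o++] = cur[base + j];
            }
        }
        out[flag_pos] = flags;
    }
    return o;
}

/* Apply an encoded stream to 'vec', which holds the previous vector on
   entry and the new vector on return. Returns the bytes consumed. */
size_t delta_decode(unsigned char *vec, size_t len, const unsigned char *in)
{
    size_t p = 0;
    for (size_t base = 0; base < len; base += 8) {
        unsigned char flags = in[p++];
        for (size_t j = 0; j < 8 && base + j < len; j++) {
            if (flags & (1u << j))
                vec[base + j] = in[p++];
        }
    }
    return p;
}
```

A 16 byte vector in which two bytes changed is sent as 4 bytes; a vector in which nothing changed costs one flag byte per 8 bytes, which is where the 8x upper bound comes from.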

Flash programming

Flash programming can involve a lot of JTAG traffic. In some cases, such as programming a non-jtag parallel flash chip by manipulating the pins on a CPU, many hundreds of bits may need to be sent over JTAG for every byte programmed so there can be a couple orders of magnitude of inefficiency. For JTAG flash devices, the overhead is probably less than an order of magnitude.

Xilinx CPLD/FPGA programming

Programming these devices appears to be very efficient as Xilinx appears to have set things up so that you just have to dump the entire bitstream into the TDI pin without interruption with a little setup and teardown before and after. This means that the number of JTAG operations is just a little more than the number of bits to be programmed.

Optoisolation

Some applications require opto-isolation. Getting fast enough isolators can be a problem. Magnetic isolators may be preferable. An NVE IL261 ($9 digikey) is a 110Mbps part with 4 channels in one direction and 1 in the opposite. An ADUM1401CRWZ is similar (3 in one direction, 1 in the other). Note: consider an isolator that has inputs which line up with outputs and comes in a package in which low value resistors are also available.

Level Translation

The state of level translation ICs leaves a lot to be desired. Relying on pullup resistors can slow things down. 10K ohm pulling up 20pf gives a time constant of 200ns.

Test vectors

If you have 1000 pins, with 3 bits per pin, walking a 1 through each pin and reading the results would be about 3 million operations. Repeat for 0.

Shift instructions

Shift instructions should not be used to access bits. They are inefficient as they repeat the same calculation with the same results many times. For an 8 bit port:


   unsigned char keepmask;      // port bits to preserve (not JTAG outputs)
   unsigned char tdimask;
   unsigned char tmsmask;
   unsigned char tdomask;
   unsigned char tckmask;
   unsigned char writexormask;  // output bits that are inverted
   unsigned char readxormask;   // input bits that are inverted
   unsigned char value;

   value = inb(port) & keepmask;
   if(tdi) value |= tdimask;
   if(tms) value |= tmsmask;
   if(tck) value |= tckmask;
   value ^= writexormask;
   tdo = !! ((inb(port) ^ readxormask ) & tdomask);
      // !! converts to 0 or 1 type boolean, as opposed to any bit set
      // (optional, depending on how used).
   outb(port, value);

   // Optional, if we are doing a full tck cycle.
   value ^= tckmask;
   outb(port, value);
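
One way to get such masks, sketched here under the assumption of an 8 bit port with one bit per signal: do the shifts once at setup time and keep only AND/OR operations in the inner loop. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask set for an 8 bit port; names are illustrative only. */
typedef struct {
    uint8_t tdi, tms, tck, tdo;  /* one-hot masks, computed once */
    uint8_t keep;                /* bits that are not JTAG outputs */
} jtag_masks_t;

/* Setup time: the only place where shifts are performed. */
jtag_masks_t jtag_make_masks(unsigned tdi_bit, unsigned tms_bit,
                             unsigned tck_bit, unsigned tdo_bit)
{
    assert(tdi_bit < 8 && tms_bit < 8 && tck_bit < 8 && tdo_bit < 8);
    jtag_masks_t m;
    m.tdi = (uint8_t)(1u << tdi_bit);
    m.tms = (uint8_t)(1u << tms_bit);
    m.tck = (uint8_t)(1u << tck_bit);
    m.tdo = (uint8_t)(1u << tdo_bit);
    m.keep = (uint8_t)~(m.tdi | m.tms | m.tck);
    return m;
}

/* Inner loop: builds the byte to write using precomputed masks only. */
uint8_t jtag_out_byte(const jtag_masks_t *m, uint8_t current,
                      int tdi, int tms, int tck)
{
    uint8_t value = current & m->keep;
    if (tdi) value |= m->tdi;
    if (tms) value |= m->tms;
    if (tck) value |= m->tck;
    return value;
}
```

The returned byte would then be XORed with the write inversion mask and written to the port as in the listing above.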
  

Use a bootloader

Instead of trying to burn a flash using inefficient jtag sequences, download a tiny bootloader. The bootloader can load the file using USB, ethernet, or high speed TTL serial. On the ARM SAM7 CPUs, there is a bootloader built into the chip, so no jtag is even needed, though jtag could probably be used to trigger the bootloader without changing a jumper and cycling power. Some processors and FPGAs have an internally accessible register that is also accessible via JTAG instructions. This can be used to communicate with a boot loader, and may work similarly to (but faster than) having a serial port. A JTAG on chip debugger can be used to write to external flash memory locations much faster than boundary scan. And some chips have an on-chip DMA mode (MIPS EJTAG). An FPGA bootloader "program" can be used as well.

Interleaving sequences for multiple devices

Sequences for multiple devices can be interleaved but it can get tricky as one device may need a TAP controller state transition that corrupts the state of another device. Merging can save protocol overhead and it can take advantage of delays (such as during flash programming). External boundary scan for multiple devices can easily be merged; you shift the data into all devices simultaneously instead of sequentially loading each one. They are really parts of a larger test vector, anyway. Programming multiple similar flash devices with the same or different data should be possible if they use the same programming sequence but it gets more complicated if they use different sequences. Merging internal test data could become difficult. Simple vector sequences might be ok, but loading a new instruction, for example, might adversely affect the state of other devices. ISC_ILLEGAL_EXIT in the BSDL file lists instructions that will disrupt operation.

Ethernet

Ethernet pods can be useful in many situations. Ethernet provides 1500V of electrical isolation. Ethernet does not require the computer and the development system to be in the same room (though that is handy for debugging). Ethernet lets you download the software from your office while you are walking to the lab. An ethernet pod with a minimal LCD interface lets people on the production line program parts without a computer; the pod can communicate with a server which will serve images and tests. The pod can be a server (with human interface) or a client. The daemon can interact with the user through the pod.

Protocol and API suggestions

These are my notes. I am considering implementing what you see written here. They may be

There is a need for a new well designed universal cross platform jtag suite (library, drivers, and applications) with a permissive license (NOT GPL) that can be used by any software, commercial or open source, with any pod, on any target, whether it is connected via parallel port, USB, or ethernet. Much of the existing software is infested with the GPL license, even at the library level. An orwellian copyleft license is focused on denying "bad" behavior (without regard to how many good activities are also denied) rather than allowing "good" behavior.

Just a few GPL incompatibilities:

A JTAG library needs to have a license that allows the code to be used in either a commercial pod or a permissively licensed one (and I don't give a damn whether or not it allows use in a GPLed pod), or otherwise embedded in a target system. And a JTAG front end client (command line or gui) needs to allow proprietary extensions (either from a vendor or an end user) and needs to be compatible with permissively licensed software. Even LGPL would be of somewhat limited value to the JTAG community.

I despise the GPL. Back when I started giving away code for free, free code was really free. Then Stallman came along and screwed it up. GPL tries to prevent people from doing "bad" things and it doesn't care who it hurts in the process and how many "good" things are prevented. Permissive licenses allow people to do "good" things and don't care if other people do "bad" things. The latter philosophy is vastly superior. GPL has little effect on psychopathic corporations but it can have an enormous effect on altruistic small businesses. A commercial library may cost a developer $100. A GPL library can cost him his livelihood. And those of us who are trying to give away as much as we can are far more affected by this than the freeloaders. The GPL "commons" is itself a freeloader. It steals from the public domain and permissive license commons but gives nothing back. If you want to make a real contribution to humanity, don't use GPL, use a permissive license (such as three clause BSD).

We all take from the creative commons. We should all try to give something back.

Protocol

Here are some thoughts on an API and protocol structure. The idea is to keep the pod code simple, though some pods may implement more functionality. Most vectors involve a lot of clocking TDI with TMS static, with some TMS stuff at the beginning and end. The TMS stuff can be almost bitbanged, i.e. we send a byte down the wire for each bit clocked (as opposed to two bytes). So, a typical vector could consist of a few JTAGBITBANGs or individual state transitions for setup and a STREAMSERIAL or COMPRESSSERIAL for the vector. At the API level, many of the functions will have a default implementation that calls other simpler functions if an unimplemented function is called. JTAGBITBANGs can optionally be used, with some loss of efficiency, to handle extra TDI bits at the beginning and end of a VECTOR (such as bypass bits) without realigning the data.

Lengths will usually be in bits; stream data will be padded to an even number of bytes; the extra bits will be ignored.

Note that this structure will work for dumb parallel port pods, serial pods, dumb USB pods, smart USB pods, SPI pods, ethernet pods, and other configurations. Code which calls this API can be linked with a stripped down version to embed in a target system for on the fly reprogramming (or field updates). A driver can be written to program a device through the on-chip-debug or serial debugger on any micro, though it will be very slow. The same function names and calling conventions can be used above and below the stream encoder/decoder allowing various functions to be used at various levels (including embedded). This API will work with drivers which are linked as shared libraries/DLL, kernel drivers, standalone programs (pipe stdin/stdout), or TCP/IP daemons. Anywhere you have either a procedure call interface or a binary stream capability. As a demonstration of this, a parallel port driver can be built such that it can be called from a pipe or TCP/IP daemon. In this way, all the non-microcontroller specific code can be debugged without the overhead of running on a target system.

The protocol functions could be divided into three files:

For operations that function on I/O ports, there should be various word typedefs. High level software will always use a 32 bit word, even for 8 bit ports, so that the same code can talk to 8, 16, and 32 bit micros.
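
As a sketch of what those typedefs might look like (the names jtag_word_t, jtag_port_mask, and jtag_narrow are placeholders I made up here, not a settled API): high level code carries every port value in a 32 bit word, and the driver narrows it to the real port width at the last moment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the one word type that high level code always uses. */
typedef uint32_t jtag_word_t;

/* Mask covering the bits that exist on a physical port of 8, 16, or 32 bits. */
jtag_word_t jtag_port_mask(unsigned port_bits)
{
    assert(port_bits == 8 || port_bits == 16 || port_bits == 32);
    return port_bits == 32 ? 0xFFFFFFFFu : ((1u << port_bits) - 1u);
}

/* Narrow a 32 bit word to what the physical port can actually hold. */
jtag_word_t jtag_narrow(jtag_word_t value, unsigned port_bits)
{
    return value & jtag_port_mask(port_bits);
}
```

With this arrangement the same calling code can drive an 8 bit parallel port or a 32 bit GPIO register; only the width passed to the driver changes.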

API should allow concurrent access to multiple JTAG streams so that multiple boards can be tested together.

Stack

Multilayered stack (simplified)

Also, utility functions such as a BSDL parser, SVF/XSVF parser, chip database, etc.

A notation for low level JTAG bit twiddling

Some thoughts on a human readable syntax for jtag commands. What is shown is at a low level, without showing each manipulation of the clock line. This notation could be used for the ASCII level of the protocol above (or equivalent binary commands), for debugging output, and for manual input.


Setup pin assignments
SETUP PIN[0],OUTPUT,TMS
SETUP PIN[1],INPUT,TDO
SETUP PIN[2],OUTPUT,TDI
SETUP PIN[3],OUTPUT,TCK
These lines set a control line without wiggling clock
TDI=0
TDI=1
TMS=0
TMS=1
TCK=0
TCK=1
TRST=0
TRST=1
SRST=0
SRST=1
These next ones clock one or more bits of data on the TMS or TDI line, leaving the other line intact:
TMS{11111}          # JTAG Reset
    # equivalent to TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0
TDI{0101010101010101} # a typical vector for shift DR or shift IR modes
   # equivalent to TDI=0 TCK=1 TCK=0 TDI=1 TCK=1 TCK=0 TDI=0 TCK=1 TCK=0 ...
Thus:
TCK=0              # set TCK to zero, ready for next low to high transition
TDI=0              # Set TDI to known state
TMS{11111}    # Test-Logic-Reset
TMS{01100}         # Test-Logic-Reset -> Run-Test/Idle -> Select-DR-Scan -> Select IR Scan -> Capture-IR -> Shift-IR
TMS=0                  # stay in Shift-IR state  (Redundant)
TDI{00001111}     # shift in instruction 00001111
(response 00000001)
TMS{11100}        # Shift-IR -> Exit1-IR -> Update-IR -> Select-DR-Scan -> Capture-DR -> Shift-DR
TDI{01010101010101010101010101010101}   # shift in data
(response 010001000100101010101010101010001)
TMS{11}           # Shift-DR -> Exit1-DR -> Update-DR
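
A TMS{...} sequence can be checked mechanically by walking it through the 16 state TAP controller diagram from IEEE 1149.1. The sketch below does that; the enum names and the tap_walk function are my own shorthand for this example, and the bit string is the same left-to-right text that appears inside the braces.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TLR, RTI, SEL_DR, CAP_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR, UPD_DR,
    SEL_IR, CAP_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR, UPD_IR
} tap_state_t;

/* tap_next[state][tms] = state after one rising edge of TCK */
static const tap_state_t tap_next[16][2] = {
    [TLR]      = {RTI,      TLR},
    [RTI]      = {RTI,      SEL_DR},
    [SEL_DR]   = {CAP_DR,   SEL_IR},
    [CAP_DR]   = {SHIFT_DR, EXIT1_DR},
    [SHIFT_DR] = {SHIFT_DR, EXIT1_DR},
    [EXIT1_DR] = {PAUSE_DR, UPD_DR},
    [PAUSE_DR] = {PAUSE_DR, EXIT2_DR},
    [EXIT2_DR] = {SHIFT_DR, UPD_DR},
    [UPD_DR]   = {RTI,      SEL_DR},
    [SEL_IR]   = {CAP_IR,   TLR},
    [CAP_IR]   = {SHIFT_IR, EXIT1_IR},
    [SHIFT_IR] = {SHIFT_IR, EXIT1_IR},
    [EXIT1_IR] = {PAUSE_IR, UPD_IR},
    [PAUSE_IR] = {PAUSE_IR, EXIT2_IR},
    [EXIT2_IR] = {SHIFT_IR, UPD_IR},
    [UPD_IR]   = {RTI,      SEL_DR},
};

/* Apply a string of '0'/'1' TMS values, one per TCK cycle, and return
   the state the TAP controller ends up in. */
tap_state_t tap_walk(tap_state_t s, const char *tms_bits)
{
    assert(tms_bits);
    for (; *tms_bits; tms_bits++) {
        assert(*tms_bits == '0' || *tms_bits == '1');
        s = tap_next[s][*tms_bits - '0'];
    }
    return s;
}
```

For example, tap_walk(TLR, "01100") returns SHIFT_IR, and five ones bring any starting state back to TLR, which is why TMS{11111} works as a reset.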

Software Compatibility

Compatibility with existing software can be accomplished by:

Use of network mode

The protocol described here could be used to communicate with an ethernet enabled pod. It could also be used to communicate with a server application running on top of a parallel or USB pod. Thus, you could for example download code from your development station while you walk to the lab. This mode could also assist software developers as you can test software using hardware located at a user's site. And since it is designed to be forgiving of latency, you can do this at a considerable distance. A USB over ethernet gateway could be used to debug some pod issues.

POD documentation

A JTAG pod needs the following level of technical documentation to be considered acceptable:

When you withhold important technical documentation, you are hurting your customers more than your competitors.

Database

We need to start a public domain database with:

Other ISP/On chip debug protocols

Those developing jtag pods may wish to support some of these.

Freescale BDM

This is an open collector one wire protocol used on HCS12, 9S12, and ColdFire CPUs. BDM is documented. Timing is problematic for bit banged software implementations. A brief description (from memory): the speed of the protocol depends on the clock rate (PLL not XTAL) the CPU is using. At 16MHz, the pod generates a 1MHz clock rate which is modulated to 25% or 75% duty cycle to send ones and zeros. When it is time for the processor to send a response, the pod generates pulses with 25% duty cycle and the processor selectively stretches them to 75%. There are timeouts that further complicate matters. The BDM pulse frequency is 1/16th of the CPU clock. When sending, you pull the line low for 3 or 4 cycles to send a 1 or 12 or 13 cycles to send a 0. The pin is sampled on the 10th cycle. You have up to 512 cycles after the start of the last pulse to send the next pulse before the CPU times out so the minimum frequency is about 31.25kHz at 16MHz. The file S12BDMV4.pdf in the 9S12DP512 Zipped Datasheets documents one variant of this protocol. BDM normally uses a 2x3 pin (0.100") male header on the circuit board:


     BKGD  1 2  GND
           3 4  RESET
           5 6  VCC
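
The BDM timing numbers described above all scale with the clock, so they are easy to tabulate. This sketch uses the nominal figures from my from-memory description (16 clocks per bit, 4 clocks low for a 1, 12 for a 0, sample on the 10th clock, 512 clock timeout); treat them as assumptions to be checked against S12BDMV4.pdf. The struct and function names are invented for the example.

```c
#include <assert.h>

typedef struct {
    double bit_rate_hz;        /* one BDM bit every 16 clocks */
    double one_low_ns;         /* low time to send a 1 (4 clocks) */
    double zero_low_ns;        /* low time to send a 0 (12 clocks) */
    double sample_ns;          /* target samples at the 10th clock */
    double min_pulse_rate_hz;  /* slower than this and the 512 clock timeout hits */
} bdm_timing_t;

/* clock_hz is the clock the BDM logic runs from (PLL, not XTAL). */
bdm_timing_t bdm_timing(double clock_hz)
{
    assert(clock_hz > 0.0);
    bdm_timing_t t;
    t.bit_rate_hz = clock_hz / 16.0;
    t.one_low_ns = 4.0 * 1e9 / clock_hz;
    t.zero_low_ns = 12.0 * 1e9 / clock_hz;
    t.sample_ns = 10.0 * 1e9 / clock_hz;
    t.min_pulse_rate_hz = clock_hz / 512.0;
    return t;
}
```

At 16MHz this gives a 1MHz bit rate, 250ns and 750ns low times, and a 31.25kHz minimum pulse rate, matching the figures in the text.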

Note that programming a FT232R in RS-232 serial mode might let you interact with BDM (with an external resistor) at some clock rates. Sending the character FF will send a 1 clock pulse, FE a two clock, FC, 3 clock, F8, 4 clock, etc. So setting the baud rate to 1/4 the cpu frequency and sending FF and FC characters (and receiving similar characters) might be a way to bit bang the data across.
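
The character trick can be written down as a tiny helper. This assumes ordinary 8N1 framing with the line idling high and data sent LSB first, so the start bit plus any leading zero data bits form one continuous low pulse. The function names are made up for the example, and whether a real FT232R and target tolerate the resulting timing is untested.

```c
#include <assert.h>
#include <stdint.h>

/* Byte whose 8N1 frame (start bit + LSB-first data) holds the line low
   for low_bit_times consecutive bit times and high for the rest.
   1 -> 0xFF, 2 -> 0xFE, 3 -> 0xFC, 4 -> 0xF8, ... 9 -> 0x00. */
uint8_t uart_pulse_byte(unsigned low_bit_times)
{
    assert(low_bit_times >= 1 && low_bit_times <= 9);
    return (uint8_t)(0xFFu << (low_bit_times - 1));
}

/* With the baud rate set to 1/4 of the CPU clock, one bit time is 4 clocks:
   a BDM 1 is a 4 clock low pulse (0xFF), a BDM 0 is a 12 clock one (0xFC). */
uint8_t bdm_bit_to_uart_byte(int bit)
{
    return uart_pulse_byte(bit ? 1u : 3u);
}
```

Each character takes 10 bit times (40 clocks) on the wire, so this runs slower than a native 16 clock BDM bit, but that is well inside the 512 clock timeout.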

AVR Debugwire and ISP

debugWIRE is a one pin (RESET) debugging protocol used on some ATMEL 8 bit AVR microcontrollers. It is disabled by default and must be enabled using ISP. Atmel does not publish the debugwire protocol (only the protocol used to talk to their debug pods).


  (TDO) MISO  1 2 VCC  (VTref)
  (TCK) SCK   3 4 MOSI  (TDI)
  (nSRST)RESET 5 6 GND   (GND)

Signal names in parentheses are the lines on the JTAG10PIN connector that these signals are wired to when adapting a JTAGICE MKII to use ISP/debugwire on a 6 pin connector.

Many AVR micros can be reprogrammed using the SPI port while reset is held low. It appears that in ISP programming mode, the on chip SPI is in slave mode and thus the directions of the MOSI and MISO pins are reversed from application mode. Alternatively, RESET may be pulled up to 12V for high voltage programming (faster?). It is likely that these pins are used for other purposes on the board so you might have to fight with other drivers (hopefully with series resistors).

It appears that the ISP protocol is used for programming only and the debugwire protocol is used for debugging only.

Spider programmer page has schematics for AVR ISP

Microchip's PIC protocol

This is apparently a nightmare as the specs are not released and they change drastically from chip to chip.

Dallas One wire protocol

This protocol is not an ISP protocol. It is normally used to communicate with sensors and other one wire devices.

Pods

Xilinx platform cable USB

Some folks have made modified code to run on the version of this cable that is built into some starter kits. Nobody seems to buy the standalone cable, for obvious reasons. xup and the ixo usb_jtag projects are ones that download modified code. The code is pretty primitive and slow.

Chips

This section lists some chips which can be used to make USB JTAG pods.

FTDI FT232RL

Dumb USB-to-serial adapter (full speed) with bit bang mode. Very low total parts count, just a few external caps and the connectors. Internal crystal. Internal USB resistors. Internal configuration eeprom. 1.5V-5V VCCIO. Lacks the multi-protocol serial engine of the FT2232, so you can't send 8 bits per byte and instead need to send two bytes per bit.

FTDI FT2232D

Dumb dual port USB-to-serial adapter (full speed) with bit bang mode. 16 times faster than the FT232RL. Needs external serial configuration eeprom, external crystal, and more caps than the FT232RL. 5V/3.3V IO so level converters are needed.

Atmel AT91SAM7XC256

USB full speed, ethernet MAC (requires external PHY), 48MIPS? ARM7 32 bit core, SPI, JTAG TAP. You could probably achieve around 3.2MHz bidirectional or 6.4MHz single directional (ignoring TDO) without compression (USB bus limited) and maybe faster with compression, depending on the operation. Using ethernet, should be able to achieve 25Mbps. By switching a jumper, you can activate an internal boot program that allows downloading by USB, among other sources. Requires around 32 external components including voltage regulator, crystal, decoupling caps, USB resistors, PLL filter, JTAG connector, JTAG pullup resistors, USB connector, and jumpers (TST, ERASE, JTAGSEL) to make a basic USB micro before you add the application specific stuff (level translation) assuming you want full download/debug capability via JTAG and USB. Additional components, including PHY and connector, required for ethernet.

AVR32 UC3A chips

32 bit CPU, USB on-the-go full speed, ethernet MAC, 512K flash, 80 Dhrystone MIPS, AVR32 core, SPI, peripherals similar to SAM7

Cypress EZ USB FX2LP CY7C68013A

USB high speed, 8051 core, lacks JTAG TAP, lacks on chip debug, 16K RAM, no flash (program has to be downloaded from PC every time unless you use external RAM), internal USB endpoint FIFOs, I/O FIFO with state machine.


Port GPIO Master    Slave    Other
---- ---- ------    -------  ---------
SCL                          (I2C)
SDA                          (I2C)
D+                           (USB)
D-                           (USB)
XTALIN                       (oscillator)
XTALOUT                      (oscillator)
RESET#                       
WAKEUP#
PE0/T0OUT
PE1/T1OUT
IFCLK                        external 5-48MHz FIFO clock (or uses internal 30/48MHz clock)
CLKOUT                       8051 clock (osc PLL divided down)

?         RDY0      SLRD    read strobe for fifo in slave mode
?         RDY1      SLWR    write strobe for fifo in slave mode

?         CTL0      FLAGA
?         CTL1      FLAGB
?         CTL2      FLAGC

PA0  GPIO                    INT0# interrupt
PA1  GPIO                    INT1# interrupt
PA2  GPIO GPIO      SLOE     fifo handshake? (page 113)
PA3  GPIO GPIO      GPIO     WU2 Power management (page 89)
PA4  GPIO GPIO      FIFOADR0 fifo handshake?
PA5  GPIO GPIO      FIFOADR1 fifo handshake?
PA6  GPIO GPIO      PKTEND   fifo handshake?
PA7  GPIO GPIO      PA7/FLAGD/SLCS# fifo handshake?

PB0  GPIO FD[0]     FD0[0]   (fifo data)
PB1  GPIO FD[1]     FD0[1]   (fifo data)
PB2  GPIO FD[2]     FD0[2]   (fifo data)
PB3  GPIO FD[3]     FD0[3]   (fifo data)
PB4  GPIO FD[4]     FD0[4]   (fifo data)
PB5  GPIO FD[5]     FD0[5]   (fifo data)
PB6  GPIO FD[6]     FD0[6]   (fifo data)
PB7  GPIO FD[7]     FD0[7]   (fifo data)

PD0  GPIO FD[8]     FD0[8]   (fifo data)
PD1  GPIO FD[9]     FD0[9]   (fifo data)
PD2  GPIO FD[10]    FD0[10]  (fifo data)
PD3  GPIO FD[11]    FD0[11]  (fifo data)
PD4  GPIO FD[12]    FD0[12]  (fifo data)
PD5  GPIO FD[13]    FD0[13]  (fifo data)
PD6  GPIO FD[14]    FD0[14]  (fifo data)
PD7  GPIO FD[15]    FD0[15]  (fifo data)

100 and 128 pin packages provide two additional ports

Use T0OUT to drive IFCLK? Auto reload division up to divide by 255, otherwise manual reload needed. Prescaled by 4 or 12 giving 12 or 4 Mhz which is a bit slow for full speed jtag. Maybe we can program the GPIF state machine to make a higher clock from the 48Mhz clock? Chip lacks SPI which could be used for fast programmed I/O. Connecting SLWR to tck may be helpful in using pod to snoop a JTAG interface. May not have individual data direction control on the FIFO data lines, which is a real problem. Not sure we can use two fifos, one for input and one for output. Indeed the FIFO looks really limited for such applications. There is no SPI to allow data to be clocked out by hardware 8 bits at a time. You can see why people use a CPLD with this chip.

It looks like 25MHz would be possible with this chip with an external CPLD to drive two internal FIFOs in slave mode. Without the CPLD, you aren't going to get anywhere close. Despite the fancy programmable state machine, this chip is a dog. The fact that it uses an 8051 core was your first clue.

Cypress EZ USB FX1 CY7C64713, CY7C64714

Same as the FX2 only limited to USB full speed.

NXP LPC2888

(sampling) 1MB Flash, USB high speed, 64K RAM, ARM7TDMI 60Mhz, JTAG OCD but no JTAG boundary scan?,

MC9S12UF32

USB high speed, 32Kx8 flash, 3.5Kx8 RAM +1.5K for USB fifos, 64-LQFP or 100-LQFP, about $9 digikey, 8bit HCS12 CPU, similar chip with ethernet but no USB available. No JTAG. BDM used for debugging. Controllers for SD, compact flash, IDE, etc. 3.3V/5V I/O. UART but no SPI, though the SD card port might support SPI. NOPE, SD controller does not support SPI though it may have the 4 bit equivalent. Operation up to 20MHz. Queue controller for transferring data to peripherals. Does not appear to have ability to load firmware over USB. However, once the device has been programmed once with a boot loader, this could be done (I have written such a loader for non-USB versions of this chip).

ST STM32F103VBT6

ARM Cortex 32 bit CPU, 128K Flash, 20K RAM, USB full speed, two ADC, seven timers, 2xI2C, 3xUART, 2 SPI (18MHz), CAN, no ethernet, LQFP100, JTAG, serial wire debug (SWD), $9 digikey, smaller versions available. Does not support flash programming via USB unless you have previously installed a bootloader via JTAG. Five 16 bit GPIO ports with haphazard pinout.

NXP LPC2888

ARM7TDMI microcontroller with 1MB flash, 64KB RAM, USB High Speed $12.64 digikey, no ethernet. Dual DAC, Dual ADC, No SPI? but has SD/MMC card interface. No PWM? Only available in BGA (3 rows of balls). Samples only. Webmaster needs to be taken out and shot: JAVA, FLASH, unauthorized opening of windows, content is a mess.

NXP LPC2468

Ethernet, USB full speed device/host OTG, dual ports. Only one port can be a device port at a time? but either can be a host port. OTG requires external OTG transceiver (eats 11 pins). $13.46 digikey, 208-LQFP. 512KB flash, 64KB RAM + 16KB RAM for ethernet. 2 PWM controllers with 6 each? RTC with 2KB SRAM. 8 channel 10 bit ADC, 10 bit DAC, 4 timer/counters with 8 capture inputs and 10 compare outputs. 2 SSP ports, 1 shared with (legacy) SPI controller though both can function as SPI. 3 I2C. 4 UARTS, 64 GPIO. 2xCANbus. SD/MMC. -40 to +85C. LPC2478 is similar but has LCD interface. 5V tolerant inputs (may not be true on pins that function as A/D). JTAG OCD and boundary scan. Forcing P2[10]/EINT0 low during reset activates on chip bootloader. Three oscillators: Main Osc, RTC osc, and internal 4MHz 1% RC osc (starts using internal osc which is used by bootloader). Can use 32kHz xtal instead of high frequency XTAL for main PLL. Bootloader appears to support RS-232 but not USB or ethernet. USB requires multiple of 48MHz PLL frequency (tight tolerance). LCD function on LPC2478 may lose some USB OTG functions depending on the type of display used. UARTS are 16550 compatible with 16 byte FIFOs.

My Personal JTAG Projects

As time and resources allow, I am working on a ton of JTAG related projects. They are highly interdependent so they are being done in parallel.

Mini JTAG pod

I have designed a very simple compact FT232RL based JTAG adapter. 1.5? to 5V. No PCB yet. Probably be around 400kbps. May be USB flash drive sized.

FPGA based pod

The pod logic can be programmed into a FPGA for very high performance.

JTAG software suite

As described here

A bunch of boards with JTAG TAP ports. Micros, FPGA, CPLD, I/O boards, etc.

Level Translation

One of the big issues in making a JTAG pod is level translation. The state of level translation ICs leaves a LOT to be desired. You would think that there would be a lot of chips that you could apply 1.2-5V power and logic on port A and 1.2-5V (or 0V) power and logic on port B. Well, that isn't the case. Many level translators assume that one power bus will always be higher than the other. You also need to deal with hot plugging where signal lines might get connected before Vref. And you can have any combination of the POD and the target being powered up or down. TI has a voltage translator selection application note that is informative; it would be more useful if there were actually good parts to choose from. Pullup resistors should not be used to pull the output of a driver higher than its supply voltage. Also, many level translators don't work at 5V. Many level translators have output enable or direction signals but don't be surprised if the input is connected to the wrong supply voltage. Many 5V devices have TTL not CMOS levels, which must be taken into account when doing voltage translation. Very few translators go from 1.2 to 5V on either port, let alone both ports with either VCC higher. And good luck finding a suitable part that also has a second source.

Maxim has level translation tips, though they are not adequate for a serious pod.

A typical JTAG pod might have 3.3V logic and need to interface to 1.2 to 5V logic. This means the other side of the translator could have higher or lower supply voltage. If the POD uses 5V logic, conversion may be simpler.

The state of single directional voltage translation leaves enough to be desired; when dealing with bidirectional signals it gets worse. Bit programmable bidirectional signals, such as would be found on a GPIO port or on a JTAG pod that allows flexible pin assignments, are particularly problematic for level translation. It is one thing to translate a bidirectional signal when you have a direction signal to work with and another when the translator has to guess. Suppose you have a high level on a port A input. Is it high because the micro drove it high or is it high because the translator is driving it high because it thinks port B is being driven high? Voltage clamps with passive pullups don't have this problem but they have the usual issues with pullups.

Also consider whether you should have opto-isolators

Nuisance Patents

Nuisance patents can be a problem. A search for "JTAG" at the patent office lists 3136 patents. Many of those may have little to do with JTAG and simply cover a device that happens to include JTAG but there are probably a lot of patents that should never have been issued. All of the patents I have looked at were patently ridiculous. Consider, for example, #7265578, which covers programming a non-jtag SPI memory by manipulating the JTAG addressable pins on an FPGA. Obviously, the patent office didn't ask any test engineers if this was "Sufficiently obvious to anyone trained in the art" or they would have been rolling on the floor laughing. It is also extremely inefficient; if you think programming a parallel flash this way is slow, wait till you try doing an SPI flash.

Which JTAG cable should I get

In the short run, this may be determined by the software you want to use. If you are stuck with a proprietary app, you may have a limited choice of cables, none of which are satisfactory. Using proprietary vendor software, you are pretty much stuck with a separate pod for each brand of chip you use, which is completely unacceptable. If you are using third party proprietary software, you may have a bit more choice. In the long term, you want a cable with schematics, protocol documentation, and preferably micro firmware and CPLD source where applicable. A pinout matching your board is also a consideration if you don't want to make an adapter or use flying leads.

Xilinx Impact software is proprietary trash. The only USB cable it supports doesn't work with other software. If you are using Xilinx Impact, you should probably use a parallel III compatible cable as these are the only ones supported by both Xilinx and open source software. Long term, look at ditching impact or just use it to generate .bit or .svf files. Impact uses the Jungo Windriver which not only causes lots of problems on linux, it is a gigantic security hole (it allows user space access to PCI devices, allowing an intruder to write to protected memory). An open source project has replaced the windriver, with some support from xilinx. But it only is able to emulate a parallel port cable so there are severe latency issues if you were to try to interface a USB cable that way.

Avoid supporting products without technical documentation as this is supporting reprehensible business practices and it will cause you all manner of trouble.

The digilent JTAG3 parallel cable is cheap and has schematics which can be downloaded. There is no micro, programmable logic, or protocol to need documenting. If your next computer doesn't have a parallel port, it will be useless. It does not support TRST or SRST. It supposedly works from 1.5 to 5V, although drive on TDO might be rather weak if you use a low voltage JTAG device. Compatible with a xilinx parallel cable and supported by most open source software.

The digilent USB cable is proprietary crap but the protocol has been reverse engineered.

The AVR ICE and AVR ICE MKII USB pods have documented protocols. There is also an inexpensive clone on ebay. Won't work with impact but may work with open source alternatives.

The Amontec JTAGkey and JTAGkey tiny are almost documented. No schematics (at least on the website) but they are based on the FT2232 chip, come with openocd driver source, etc. Won't work with impact but may work with open source alternatives depending on what you are trying to do.

The xilinx platform cable USB is a train wreck. All technical documentation is withheld. Don't buy it. If you got stuck with the embedded version on the Spartan 3E development board, there is a project that downloads replacement EZ-USB FX2 firmware and CPLD firmware to it so it can be used with other software though this is a skeleton implementation and is at least 4x slower than the vendor supplied firmware.

There are a variety of parallel wiggler or parallel cable III clones. Quality varies. Some are unbuffered.

ASIX PRESTO requires external level translators, lacks sufficient technical documentation online, but does work with openocd and supports a variety of devices.

Olimex ARM-USB-OCD and JTAG USB OCD Tiny (available from sparkfun in the US) work with OpenOCD. Unlike Olimex development board products, schematics are lacking.

At the time of this writing, OpenOCD supports parallel port wigglers, FT2232 based devices, Amontec JTAG Accelerator, Amontec JTAGkey, Amontec JTAGkey tiny, Olimex ARM-USB-OCD, Xverve Signalyzer, ASIX PRESTO, and USBJTAG.

Semiconductor Manufacturer's BSDL files

Ok, guys. There is a right way to distribute BSDL files. If you are a visitor who works for a semiconductor company, check if your site is in compliance. Access should be provided in each of three locations (and no login should be required):

BSDL files should be available as an individual file which can be hyperlinked to and as part of an archive. And please use a sensible extension, i.e. ".bsdl". ".txt" is a bad idea.

There are a lot of defective search engines on semiconductor manufacturer sites that only return about 10 results at a time.

A google search of the form "site:www.actel.com filetype:bsd OR filetype:bsdl OR filetype:bsm" may return useful results. google doesn't seem to handle filetype:zip well at all. "STD_1149_1_2001 OR STD_1149_1_1994 OR STD_1149_1_1990 OR STD_1532_2002" might help Or ' "use STD_1149_1_2001.all" OR "use STD_1149_1_1994.all" OR "use STD_1149_1_1990.all" OR "use STD_1532_2002.all"'.

Xilinx

Home Page. BSDL Files are distributed as Zip files for each family. Xilinx login required. Most Xilinx programmable logic devices support JTAG.

Altera

Home Page. Individual BSDL Files can be downloaded in 1149.1 or 1532 format.

Atmel

Home Page. AVR BSDL files (.zip). BSDL files for ARM CPUs are in individual zip files containing one file for each package listed on the product info page. A search for BSDL on the site is not informative. 32 bit ARM and AVR32 CPUs and programmable logic have JTAG. 8 bit AVR devices vary.

Actel

Home Page. Individual BSDL Files can be downloaded. They appear to be outdated 1149.1 files and at least one was missing the required "use STD_1149..." statement. Power pins are included in pin map but not identified as such.
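A missing "use STD_1149..." statement is easy to check for mechanically before you feed a file to a real BSDL parser. The following is a rough Python sketch, not a BSDL parser: it strips "--" comments naively (it would also cut a "--" that happened to sit inside a string literal) and then looks for use statements naming a STD_ package. The function names are hypothetical.

```python
import re

# BSDL inherits case-insensitivity from VHDL, so match without regard to case.
USE_STATEMENT = re.compile(
    r"^\s*use\s+(STD_\d+(?:_\d+)*)\s*\.\s*all\s*;",
    re.IGNORECASE | re.MULTILINE,
)


def strip_comments(text):
    """Drop everything from '--' to end of line (naive, ignores string literals)."""
    return re.sub(r"--.*", "", text)


def standard_packages(text):
    """Return the STD_ package names referenced by use statements, upper-cased."""
    return [name.upper() for name in USE_STATEMENT.findall(strip_comments(text))]


def has_use_statement(text):
    """True if the file text names at least one STD_ package in a use statement."""
    return bool(standard_packages(text))
```

An empty list from standard_packages() means the file has the defect described above and will probably need a use statement added by hand.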

National Semiconductor

Home Page. General JTAG related stuff here. Includes SCANSTA111 3 port JTAG Multiplexer, SCANSTA112 7 port JTAG multiplexer, SCANSTA476 1149.1 Analog Voltage monitor, and SCANSTA101 Embedded JTAG Master. BSDL files are available as individual files for a couple dozen products in 1149.1 format. National Semiconductor SCANSTA111 is one example ($8.59 digikey). It hangs on a backplane with up to 121 slot addresses and provides 3 local scan chains. It also provides a 16 bit LFSR signature compactor. 3.0-3.6V operation in a 48 pin BGA or TSSOP package. 0 to 3 of the internal scan chains can be scanned. It can also be connected in a hierarchy. They neglected the issue of level translation, however. SCANSTA112 provides 7 ports (TQFP-100 or FBGA-100). National also has the SCANSTA476, which is an 8 input JTAG A/D used to monitor voltage levels on PCBs ($5.35 digikey). The SCAN90CP02 is an LVDS crosspoint switch with JTAG. Check out the writings of Bob Pease.

AMD

Home Page. BSDL files for embedded devices. BSDL files for ATI division chips seem to be missing entirely. And BSDL files for desktop CPUs, if they are there at all, are hard to find. And they have a lousy search engine (only 10 results per page).

Intel

Home Page. Intel appears to list BSDL files on the product page. BSDL files for single chips are distributed in BSDL files and the BSDL files carry the rather unfortunate ".txt" extension. Worse, one I looked at (for an Itanium) appeared to use an external BSDL cell definition file. In other words, their files are a mess and likely to require manual processing.

Freescale

Home Page. BSDL files are strewn about the web site. BSDL files for Power CPUs.

Semtech

Home Page. Makes pin drivers up to 1GHz, some with internal DACs and window comparators. Interfacing issues for Semtech CMOS Pin Drivers.

TI/burr-brown

Home Page. BSDL files are available for a small handful of chips in individual zip files. One I looked at contained a single file in the zip archive with the extension ".bsm". 1840 matches for JTAG. Boundary Scan Logic devices. Includes Multidrop TAP transceivers, scan path linkers, etc. They sell a number of bus transceivers with JTAG access, but CPLDs are considerably cheaper. TI has a variety of level translators.

Maxim/DALLAS

Home Page. BSDL files. One file per device, either .txt or .zip, or bundled together in one zip file. Has FET clamp with booster level translators, signal protector FET clamp, translators, and SPI translators. Lots of matches for JTAG on their site. JTAG enabled T1/E1 chips. DS4550E is a 9 bit JTAG/I2C I/O expander with 64 byte EEPROM ($1.91 qty 74 digikey); I/O pins power up in programmed state. TINI JTAG library mentioned elsewhere.

Analog Devices

Home Page. BSDL files on three web pages, individual .txt files. Level Translators up to 1.5Gbps. Makes pin drivers: AD53040G ($36 each qty 1 digikey).

Linear Technology

Home Page.

No BSDL files found.

Cypress

BSDL files strewn about the site. Search for BSDL returns 600 matches. Usually on a page titled chipnumber-BSDL.

NXP/Philips/VLSI Tech

Home Page. Strewn about site. Generally on Products -> (category) -> Support -> Models -> SPICE, IBIS, and behavioral models

Intersil/Xicor/Harris

Apparently only a couple BSDL files available. Makes pin driver ICs.

Broadcom

Home Page. No search results for BSDL.

Conexant/Brooktree/Rockwell

Home Page. No search results for BSDL.

Via

Home Page. No search results for BSDL.

Fairchild

Home Page. BSDL files would be here if they existed (spice/IBIS models). Have two app notes mentioning BSDL files. Fairchild SCANPSC110F 3 port multidrop JTAG multiplexor ($11.31 digikey in 28SOIC).

Genesys Logic

Home Page. No search results for BSDL or JTAG.

Genesis Microchip

Home Page. Broken search engine.

Avago/Agilent/HP

Home Page. Found one BSDL file, as a PDF!!!!!

Hitachi

Home Page. No search results for BSDL.

Holtek

Home Page. No search results for BSDL.

Hynix/Hyundai/LG

Worthless Home Page.

IBM Microelectronics

Home Page. 79 matches for BSDL. No master page.

IDT

Home Page. 5230 matches for BSDL. No organization immediately apparent. Looks like files are named partnumber.bsdl.txt inside a zipfile for each part.
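When a vendor packs files this way, a few lines of Python will pull the BSDL out of each zip and give it a sensible extension. This is a sketch under stated assumptions: it assumes the members really are named partnumber.bsdl.txt as observed above, the function names are made up for illustration, and it returns the file contents in memory rather than deciding where to write them.

```python
import zipfile
from pathlib import PurePosixPath

# Suffix observed in the per-part archives described above.
BSDL_TXT_SUFFIX = ".bsdl.txt"


def bsdl_name(member_name):
    """Return 'part.bsdl' for 'dir/part.bsdl.txt', or None for anything else."""
    name = PurePosixPath(member_name).name
    if name.lower().endswith(BSDL_TXT_SUFFIX):
        return name[: -len(BSDL_TXT_SUFFIX)] + ".bsdl"
    return None


def extract_bsdl(archive):
    """Read every '*.bsdl.txt' member of a zip archive.

    'archive' may be a path or an open binary file object.
    Returns a dict mapping the new '.bsdl' file name to the raw bytes.
    """
    found = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            new_name = bsdl_name(info.filename)
            if new_name is not None and not info.is_dir():
                found[new_name] = zf.read(info)
    return found
```

Calling extract_bsdl() on each downloaded archive and writing the returned bytes under the returned names gives one directory of consistently named files.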

Infineon/Siemens

Home Page. 52 matches for BSDL. No apparent organization. Not even listed on product page.

Micronas/ITT

Lattice

Home Page. BSDL files by family. Login required.

LSI Logic

Home Page. Broken search engine. VC1053 is a 1 to 2.5Gbps SERDES with JTAG intended for Fibre Channel, iSCSI, InfiniBand, and gigabit ethernet. 316 pin BGA. No datasheet??? I wouldn't even mention it if multiprotocol SERDES chips weren't hard to find.

Luminary Micro

Home Page. BSDL files found on individual product pages.

Micrel

Home Page. No matches for BSDL.

Microchip

Home Page. No significant matches for BSDL. A couple PDFs mentioned BSDL files.

Micron

Home Page. Only one component (two voltages) has BSDL files.

Zarlink/Mitel

Home Page. 141 disorganized hits for BSDL. One page links to a number of BSDL files for the CESoP family. Some parts have BSDL files listed on the individual part page.

Mitsubishi

Home Page. No search results for BSDL.

Mosel Vitelic

Content free Home Page.

NEC

Home Page. 175 matches for BSDL, mostly links to individual files with the extension bsdl. They are all contained in the directory http://www.necel.com/memory/en/model/models/bsdl/ but you can't go directly to that page. The models page links to other pages that list BSDL files.

OKI

Main home page is defective (unusable) but the English Home Page works. No search results for BSDL.

ON Semiconductor

Home Page. No search results for BSDL.

OPTi

Useless Home Page.

PLX Technology

Home Page. 56 matches for BSDL. Looks like BSDL files are only available from the product page, and then only if you are logged in and, in some cases, have signed an NDA. You can't even read the datasheets without login.

PMC Sierra

BSDL files. QuadPHY 1.2-3.2Gbit/s SERDES with JTAG supporting 10Gb ethernet, Fibre Channel, PCI Express, InfiniBand, Serial RapidIO, OC-48, OBSAI RP3, etc. A few nonstandard JTAG TAP pins. Apparently doesn't support SATA or SAS. Product preview.

QuickLogic

Home Page. Search for BSDL only matched a couple family data sheets. FPGAs. Development tool runs under Linux but is not free and doesn't include VHDL or Verilog.

Rabbit Semiconductor

Home Page. No matches for BSDL.

Renesas

Home Page. 17 matches for BSDL. They link to download pages for particular chips (that don't even link to the chips themselves). No download without login.

Samsung

Home Page. No matches for BSDL.

SST

ST Microelectronics

Home Page. Only matches for BSDL were for their FlashLink Programming Cable. What claims to be a "source file" for that programmer is not; it is a windoze executable.

SiS

Home Page. No matches for BSDL or JTAG.

Sipex/Exar

Home Page. No matches for BSDL or JTAG.

TDK

Home Page. Passives only.

Toshiba

Home Page. No matches for BSDL. 9 matches for JTAG.

ZMD

Home Page. No matches for BSDL or JTAG.

Zoran

Home Page. No matches for BSDL, 6 for JTAG. Looks like application specific printer SoCs with ARM CPU.

Zetex

Home Page. No matches for BSDL or JTAG.

Yamaha

Home Page. No matches for BSDL, 1 for JTAG (an IEEE1394 chip).

Digikey Parts

With a few exceptions, this list does not include many micros, programmable logic, pods, or development boards.