JTAG is a serial protocol, similar to SPI in some respects, that is used for boundary scan testing, in circuit emulation, and flash programming. It is standardized in IEEE 1149.1-1990. In boundary scan mode, all the I/O pins on all JTAG devices on a board may be connected together in one giant shift register with one or more bits per pin. There are control registers and ways to bypass the shift register for individual devices (replacing them with a one bit register).
The picture above shows a slightly fictitious chip: a simple 8 bit D-flip flop register with clock and output enable controls, with a JTAG TAP (Test Access Port) controller and 18 boundary cells, one on each I/O pin, added. The boundary cells are connected together in a big daisy chain loop. I show only the data path in that loop; there are a number of control signals as well. Individual chips are then connected together in a longer loop. I say the chip is fictitious but there are similar parts: the TI SN74BCT8374A is almost identical. The amount of test logic on such a simple device far exceeds the amount of operational logic.
JTAG is not an open standard in the sense that internet RFCs and USB are, and the way all standards should be, where you can download the standard documents for free. Instead, the IEEE funds the standards process by selling copies of the standards documents. It would cost you around $400 to purchase a copy of the standards, plus possibly a couple hundred more for books to explain them. This is prohibitively expensive for hobbyists, open source developers, some small businesses, and lots of other people who can't justify forking over that kind of money for a standard when they won't know whether it will help them until they have read it. Add $100-$400 or so for a USB JTAG pod or cable. And fancy commercial software can cost close to $3000 a copy. There is some free software with limited functionality, no-frills parallel cables can be had for around $20, and there is almost enough information on the net to do some serious stuff with JTAG, though you can spend weeks finding it. This page will help you in your quest.
A long time ago, I worked for a company specializing in built-in test. I wrote the software that analyzed the reliability of synthetic aperture radar systems, software that determined in what order to perform tests, built in diagnostics software for sonar systems, and the hardware device that actually injected simulated faults into the system. (I did other things as well: I wrote the billing software, the timecard system, and the purchase order system, administered the network and the PBX, and reverse engineered the proprietary network stack (including the firmware on the NIC) so I could write network utilities.) We were among those who pushed for inclusion of boundary scan in ICs and the creation of the JTAG standard in the first place, and one of my coworkers was on the standards committee. Unfortunately, I didn't get to work with JTAG at the time because the standard didn't exist and there weren't actually any parts that implemented boundary scan. 25 years later, JTAG is available on far fewer chips than it should be. It is found on most FPGAs, CPLDs, better microcontrollers and processors, and some peripherals. It still isn't available on gates and most other discrete logic parts; there are a few overpriced registers and bus transceivers with JTAG, similar to the one pictured above. But it is several times cheaper to replace those parts with a CPLD, and instead of needing glue logic to drive the logic chip, you can reprogram the CPLD to be compatible with your signals and maybe suck in some extra glue logic as well. Now that some of the programmable logic vendors provide free (as in beer) software, small logic parts are mostly obsolete. To take best advantage of JTAG for testability, you will want to use programmable logic and microcontrollers instead of discrete logic. JTAG is also widely used, beyond its original intended purpose, for programming those microcontrollers and programmable logic devices.
Here are things JTAG can do or potentially do:
Attempts to use JTAG in the ways described above can be hampered by:
A supplement to 1149.1-1990 defined the boundary scan description language (BSDL). BSDL is a subset of VHDL. IEEE 1532 extended BSDL to include information on flash/FPGA/CPLD programming. Another important improvement in 1532 is a list that maps pin numbers to pin names.
Note that for many programmable logic devices, the BSDL files distributed by the vendor describe the device before configuration, but the configuration may change the circuitry between the boundary scan registers and the pins. Xilinx has a bsdlanno program to generate a BSDL file for a specific design that describes the device post-configuration.
Xilinx has no-cost software with source, called J Drive, that reads 1532 files and programs devices (registration required). Altera's equivalent is JAM.
The PIN_MAP_STRING tends to be rather haphazardly split into lines with the string concatenation operator "&".
BSDL was defined in an IEEE supplement and extended in IEEE STD 1532. Kenneth P. Parker and Stig Oresjo at HP (now Agilent) distributed the source to what was probably the original BSDL parser. It can also be found as a listing in an appendix of the Boundary Scan Handbook containing a copy of the paper A language for describing boundary scan devices (1991). The license is a little vague but appears permissive in intent. The language has changed since then. IEEE 1149.1-2001 lists some changes and IEEE 1532 lists some additional changes.
BSDL files include standard packages written in VHDL, depending on the version of BSDL used: Package STD_1149_1_2001, also found in an Actel BSDL format application note; Package STD_1149_1_1994, also available as a chapter from the Boundary Scan Handbook, 2nd ed: Analog and Digital for $25; and Package STD_1149_1_1990, also in appendix B of the first edition of The Boundary Scan Handbook. There is also Package STD_1532_2002. The license on those files is unclear. They seem to be used as if there were no restrictions on copying, and the standard pretty much says thou shalt use exactly this text, which could be taken as an implied license.
Mentor Graphics has published a paper on ABSDL, an analog extension to BSDL for mixed-signal test. The optional package declaration is "use STD_1149-4_version.all" where "version" refers to an updated version of the standard that hasn't been released yet.
HSDL is a BSDL derived syntax for defining board level details.
SVF (ASCII) and XSVF (binary) are files created by Xilinx tools that contain JTAG instructions for how to program a particular device. "Player" software source code is available (see links below). They are pretty limited: they let you do SHIFT-IR and SHIFT-DR, specify the data to be transferred and the response to expect, and specify fixed bit strings to send and match before and after, but they don't let you get creative with other TAP controller states, which you might need in a boundary scan application.
JAM evolved into STAPL, which is a JEDEC standard, available from JEDEC as JESD71 (direct linking not permitted). It is an acronym for Standard Test And Programming Language and is designed for programming and testing devices over JTAG. It also supports devices with human interfaces. It was designed in such a way, though, that it is not practical for devices with limited memory. There is also apparently a JEDEC standard transfer format that isn't very popular.
STV is an acronym for Sequential Test Vector and SRV is an acronym for Sequential Response Vector.
Links to the actual standard documents can be found in the books section, below.
There are extensions to JTAG that allow signals to be routed to and from two analog test pins. The National STA400EP is an analog mux with analog boundary scan built in but it appears it is being discontinued.
The two internal analog buses, AB1 and AB2, are connected by a 2x2 crosspoint switch to the two test pins, AT1 and AT2; switching pins is used for calibration? Switch impedance is not necessarily low. "4 wire" measurements of an external resistor are done by taking two measurements, one on each side. There is a limit of about 30 nodes per internal bus (a typical practical limit), which can be increased to 900 by a two level hierarchy. Parts with incompatible voltage ranges may need a separate bus or perhaps a voltage clamp.
Work is afoot to provide faster TAP access to internal logic by providing a higher speed TAP based on SERDES (such as used by gigabit ethernet, PCI express, SATA, etc.). This is oriented more towards factory test rather than board level test. IEEE P1687.
TAP Connector connects directly to the JTAG pins on a single chip. A lot of microcontrollers require this since they also use the TAP for a non-jtag compliant On-Chip debug mode.
Multiple Chips are connected with TDO of one chip driving TDI of the next chip. TAP TDI connects to first chip and TAP TDO connects to last chip. TMS and TCK are connected in parallel to each device.
Another configuration is to have a completely separate chain for a microcontroller. This is because the microcontroller often supports on-chip debug via the JTAG pins but it may use the port in non-standard ways or the software may be confused by other chips on the bus. The microcontroller has a pin that selects whether the TAP port is in JTAG mode or OCD mode.
TAP port has separate TDI and TDO pins for each chip but TMS and TCK are shared. Allows faster access to each chip.
Similar to above but more than one chip on each TDI/TDO pair.
There are devices that provide access to multiple scan chains. These typically provide two levels of multiplexing. First, a number of multidrop chips are wired in parallel on the backplane. Second, each device typically provides a number of TAP ports that can be switched into the chain. See National, Fairchild, and TI below. An article at embedded.com on scaling JTAG has more info on multi-drop multiplexors. TI uses a "linking shadow protocol" in which data on TDI in certain states other than Shift-IR/Shift-DR is used to control the mux. Some of the others basically appear in the JTAG chain, and control data is shifted into the instruction register. Until a device on the backplane bus has been selected, TDO is tristate. The linking shadow protocol is less consistent, perhaps, with JTAG philosophy, but it could be advantageous when using software that doesn't understand port multiplexors: a utility program can be used to configure the multiplexor(s) and then the presence of the multiplexor is invisible to the software (provided it does not wiggle TDI in states where it is inappropriate to do so). Some multiplexors also have the ability for one of the slave ports to take over as master. This allows an embedded CPU to perform built in self test or reconfiguration while still allowing external test. Many of the multiplexors also allow the JTAG slave ports to be tristated, allowing an external JTAG tester to be connected. This could be used, for example, to perform on chip debug on a microcontroller that normally appears in the JTAG chain.
In the descriptions, pins will be described as being connected to pins on a PC parallel port; this involves a buffer or at least a resistor, and often both. All connectors are 0.025" square pin headers with 0.1" spacing unless otherwise noted.
This pinout is used on the Spartan3, Spartan3E, and CPLD development kits sold by Xilinx (and designed by digilent). It is the pinout used by the Digilent (software compatible with Parallel III but different pinout) . Schematic for the JTAG3 can be found in figure A-9 (page 66) of Xilinx UG13- Spartan-3 Starter Kit Board User Guide (Now deleted). This cable claims to work from 1.5 to 5V but at lower voltages the driver running off the JTAG VDD may have trouble driving a logic 1 on the buffered TDO line to the PC and probably relies on a pullup resistor on the parallel port. Uses NL37WZ17 to drive signals (100 ohm series resistor) and NL17SZ126 to read back TDO.
TMS 1
TDI 2
TDO 3
TCK 4
GND 5
VDD 6
The schematic for the Parallel Cable III can be found here. It has another JTAG pinout:
JTAG Parallel port
---- ------------------------
VCC 1 15 (vcc sense)
GND 2
(key) 3
TCK 4 3 (tristated by pin 5 high)
(key) 5
TDO 6 pin 6 drives ???? (open collector), read back on pin 13
TDI 7 pin 2 drives (tristated by pin 5 high)
(key) 8
TMS 9 4 (tristated by pin 5 high)
Pins 8 (D6), 11 (BUSY), and 12 (PE) are jumpered together on the parallel port and cable detect probably comes from this and by wiggling pin 6 and seeing if pin 13 responds.
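The pin mapping in the table above can be turned into bit masks for a bit-banged driver. The following is a minimal sketch, not Xilinx's code: the `pc3_` names are made up for illustration, it relies on the usual PC parallel port layout (pins 2 to 9 are data bits D0 to D7, and pin 13 is bit 4 of the status register), and it assumes the cable's buffers do not invert the signals. Actually writing the byte to the port is left out since it is operating system specific.

```c
#include <stdint.h>

/* Data register bits, per the table above: parallel port pins 2..9 are
   data bits D0..D7, so pin 2 is bit 0, pin 3 is bit 1, and so on. */
#define PC3_TDI      0x01  /* pin 2 drives TDI */
#define PC3_TCK      0x02  /* pin 3 drives TCK */
#define PC3_TMS      0x04  /* pin 4 drives TMS */
#define PC3_TRISTATE 0x08  /* pin 5 high tristates the JTAG drivers */

/* Status register bit for pin 13 (SELECT), where TDO is read back. */
#define PC3_TDO_STATUS 0x10

/* Build the byte to write to the data register for the given signal levels.
   PC3_TRISTATE is left low so the drivers stay enabled. */
uint8_t pc3_data_byte(int tck, int tms, int tdi)
{
    uint8_t b = 0;
    if (tdi) b |= PC3_TDI;
    if (tck) b |= PC3_TCK;
    if (tms) b |= PC3_TMS;
    return b;
}

/* Extract TDO from a byte read from the status register
   (assumes no inversion between the target and the port). */
int pc3_tdo(uint8_t status)
{
    return (status & PC3_TDO_STATUS) != 0;
}
```

One TCK pulse would then be two writes, `pc3_data_byte(0, tms, tdi)` followed by `pc3_data_byte(1, tms, tdi)`, with a status read in between to pick up TDO.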
The Parallel Cable III also has another header used for non-jtag modes. Some of the features like driving TDO and tristate enables probably have more to do with the other connector.
VCC 1
GND 2
CCLK 3
(key) 4
(key) 5
D/P 6
DIN 7
PROG 8
(key) 9
Note that the two connectors are side by side, effectively making a 2x9 header and that numerous pins have been removed for keying purposes. The Parallel Cable III comes with flying leads http:
2mm connector
JTAG Slave Serial SPI
--- ------------ ----
GND 1 2 Vref Vref Vref
GND 3 4 TMS PROG SS
GND 5 6 TCK CCLK SCK
GND 7 8 TDO DONE MISO
GND 9 10 TDI DIN MOSI
GND 11 12 NC NC NC
GND 13 14 NC INIT NC
"Pin 1 is not a true digital ground. It must be connected to digital ground at the target system."
Note that this cable has logic on board with an internal 40Mhz clock and can send data at up to 5MHz, it is not a simple buffer. It can also emulate a Parallel Cable III.
(From Xilinx DS097)
Old text: The Spartan 3 board has another JTAG header for use with the Xilinx Parallel Cable IV and MultiPro Desktop Tool. This pinout for this 2mm connector is:
JTAG Serial
GND 1 2 VCCAUX Vref
GND 3 4 TMS PROG
GND 5 6 TCK CCLK
GND 7 8 TDO-A DONE
GND 9 10 TDI CCLK
GND 11 12 (NC) (NC)
GND 13 14 (NC) INIT
TCK 1 2 GND
TDO 3 4 ND
TMS 5 6 N/C
N/C 7 8 N/C
TDI 9 10 GND
Note that this pinout is a subset of the JTAG10PIN pinout used on Atmel AVR micros.
VTref 1 2 Vsupply
nTRST 3 4 GND
TDI 5 6 GND
TMS 7 8 GND
TCK 9 10 GND
RTCK 11 12 GND
TDO 13 14 GND
nSRST 15 16 GND
DBGRQ 17 18 GND
DBGACK 19 20 GND
All ground pins must be connected. DBGRQ and DBGACK are rarely used; they connect to lines on the ARM core that are rarely brought out to physical pins and are manipulated via other means (as bits in the jtag chain?). nSRST is the system reset (active low); it is optional but recommended, as Multi-ICE and RealView ICE use it for debug of XScale processors, reset on startup, reset from the debugger, and reset detection, and it is needed for Auto Configuration on some processors. nTRST is optional; it resets the JTAG TAP controller state machine and the embedded ICE logic. Resetting of the TAP controller can also be accomplished by 5 TCK cycles with TMS high. AT91SAM7 CPUs don't bring nTRST to a pin. OpenOCD can be configured to work with or without nTRST and nSRST. Both reset signals are open drain. RTCK is an optional readback on the TCK pin and is used for adaptive clocking. Vsupply is used to power the JTAG dongle and VTref is used to set the thresholds. ARM's official jtag interfaces put a 10K pulldown resistor on these lines. ARM recommends 10K pullups/pulldowns with pullups on TMS, TDI, TDO, nSRST, and nTRST and a pulldown (odd) on TCK. Atmel's AT91SAM7 eval board uses a pullup on TCK. Multi-ICE, RealView ICE, and OpenOCD support other devices on the JTAG chain; however, manual configuration may be required. Multi-ICE uses analog switches (between ground and VTref) to drive the JTAG signals with 100 ohm series resistors.
TRST 1 2 TDO
RTCK 3 4 TDI
GND 5 6 TCK
GND 7 8 TMS
+3V 9 10 RESET
Altera makes a cable called the USB Blaster. Someone made a USB JTAG interface with an EZ-USB FX2, partly emulating an FT245, that switched to using the USB Blaster protocol (or the Xilinx USB cable protocol?); it works with openwince and OpenOCD. It looks like this software actually runs on the Xilinx Platform Cable USB on the starter kit, so it would have info on Altera's protocol.
some protocol reverse engineering. IXO usb_jtag Byteblaster clone.
TMS 1 2 nTRST
TDI 3 4 GND
Vcc 5 6 N/C
TDO 7 8 Vcc
TCK 9 10 GND
TCK_ret 11 12 GND
EMU0 13 14 EMU1
TDO 1 2 VCC_IN
TDI 3 4 VCC_OUT
TMS 5 6 NC
TCK 7 8 TEST/VPP
GND 9 10 NC
RST/NMI 11 12 NC
NC 13 14 NC
Multi-ICE (Amontec Jtagkey):
VREF 1 2 VREF
TRST_N 3 4 GND
TDI 5 6 GND
TMS 7 8 GND
TCK 9 10 GND
n/c 11 12 GND
TDO 13 14 GND
SRST_N 15 16 GND
n/c 17 18 GND
n/c 19 20 GND
(0.1" pitch)
JTAGKey to FTDIchip pin map (channel A):
ADBUS0 TCK
ADBUS1 TDI
ADBUS2 TDO
ADBUS3 TMS
ADBUS4 JTAG output enable
ADBUS5 VREF sense, negated (vref>1.26V?)
ADBUS6 nSRST IN
ADBUS7
ACBUS0 nTRST
ACBUS1 nSRST
ACBUS2 nTRST driver enable
ACBUS3 nSRST driver enable
Amontec also makes a smaller, less expensive version, the JTAGKey-Tiny, that plugs right into ARM 20 pin headers but is more difficult if you need to connect
This pinout has been reported to be used on some other ARM systems:
(VTref) Vcc 1 2 GND
nTRST 3 4 GND
TDI 5 6 GND
TMS 7 8 GND
TCK 9 10 GND
TDO 11 12 N/C (nSRST)
Vcc 13 14 GND
There are apparently two different incompatible 14 pin pinouts, one for Embedded ICE and one for TI. Embedded ICE shown above, the TI below:
TMS 1 2 nTRST
TDI 3 4 GND
VCC 5 6 N/C
TDO 7 8 GND
RTCK 9 10 GND
TCK 11 12 GND
EMU0 13 14 EMU1
Many MIPS processor based systems use the EJTAG pinout.
nTRST 1 2 GND
TDI 3 4 GND
TDO 5 6 GND
TMS 7 8 GND
TCK 9 10 GND
nSRST 11 12 (key)
DINT 13 14 VCC
EJTAG refers to MIPS JTAG extensions for on chip debugging. "EJTAG provides run control, breakpoints on both data and instructions, real-time Program Counter trace". On some chips it provides direct memory access as well. EJTAG at hardwarehacking.com
Note that some appear to use the standard ARM 20 pin, others use a 12 or 14 pin connector (looks MIPS EJTAG style).
nTRST 1 2 GND
TDI 3 4 GND
TDO 5 6 GND
TMS 7 8 GND
TCK 9 10 GND
nSRST 11 12 GND
nTRST 1 2 GND
TDI 3 4 GND
TDO 5 6 GND
TMS 7 8 GND
TCK 9 10 GND
nSRST 11 12 n/a
n/a 13 14 Vcc
The 14 pin connector is based on the MIPS EJTAG standard.
Notice there is no VDD on the 12 pin so this must be picked up somewhere if you are using a buffered cable. Also, the Hairydairymaid/lightbulb linksys de-brick utility fails with a buffered cable (or even an unbuffered cable with all pins connected) as it fails to set the nTRST line high (fixed now?).
TCK 1 2 GND
TDO 3 4 VTref
TMS 5 6 nSRST
7 8 (nTRST)
TDI 9 10 GND
Some AVR controllers may have only debugwire (see below)
Elmicro AVR-JTAG-USB (EUR$45) also uses this pinout; FT2232 based?
Flying leads can be used to connect to different jtag connectors without having a bunch of adapters. Amp-modu or c-grid contacts can be used to mate with 0.100" contacts to add flying leads. They are a pain to connect and disconnect, though. A milmax SSQ style connector (long tail version) has female contacts and wirewrap (0.100") posts; flying leads can be connected to such a connector to ease connecting and disconnecting. The part number for a 2x10 would probably be SSQ-110-03-G-D (gold tail) or SSQ-110-03-S-D (tin); available at digikey. If anyone knows where to get packages of wires with bare amp-modu or c-grid contacts precrimped on both ends, let me know; they are very handy for making custom JTAG cables and many other custom cables using these ubiquitous connectors. With insulation (heat shrink) suitable for use as flying leads, they are available at schmartboard ($10/5 or $38/100); seetron.com, linkins4.com, superdroidrobots.com, and digikey (924964-ND) sell similar ones but are more expensive.
PCI/PCI express slots have some pins set aside so a board can be inserted and get access to the motherboard JTAG chain. Apparently some motherboards don't connect it. And most POST display diagnostic cards aren't smart enough to bring the JTAG out to a header; I have yet to find one that does, even from www.uxd.com, which makes a wide variety of POST cards, some with RS-232 serial output, one with I2C, and even cards which self boot. Your best bet might be to tack solder some wires on a PCI card. SMBus may be accessible as well?
A JTAG Test Access Port (TAP) consists of 4 signals, with an optional fifth signal:
Most JTAG interface connectors have a Vcc/Vdd/Vref pin that is used to adjust the level translators to the appropriate voltage level for the system. In addition, many JTAG interfaces have a system reset line; you may want to sense the state of that line to signal a debugger when a chip has been reset. You may also want to measure the presence of Vref. Since some systems use the JTAG pins for other purposes as well, you may want to tristate the outputs. The voltage level varies depending on the technology being used. Multiple chips (or even boards) with JTAG support can be chained together with the TDO of each chip connected to the TDI of the next. The remaining signals are connected in parallel. Timing is based on the TCK pin, so a PC parallel port (with appropriate voltage translation if needed) can be used as a controller. TDI/TMS data is latched on the rising edge of TCK. TDI/TDO/TMS typically change on the falling edge of TCK. You could probably do a fully synchronous implementation where the data changes on the rising edge, but you would have to be careful with cable skew and hold times.
Note that the idea of having a TRST line may be less so the pod can reset the TAP controller, which is unnecessary since 5 TCKs with TMS high will accomplish the same result, and more so the board design can pull TRST low through a low value pull-down resistor which the POD can override. Thus, it may be necessary to implement TRST even if you have no need to use it.
In addition, some microcontrollers have a trace port (called ETM on ARM7TDMI). Since the internal address and data buses are not usually brought out to pins, the trace port can be used to monitor the flow of program execution. On the ARM7TDMI, this is about 8 signals clocked out at the CPU clock rate. By decoding those signals, you can tell when the next instruction in sequence has been executed and when a branch has been taken. A debugger with access to a copy of the code loaded into the micro, and which thus knows the size of each instruction and the destination of each branch, can map the flow. Interrupts, however, could complicate this. On ARM7TDMI, a low on RTCK during reset may enable the ETM port. The ETM is configured through the JTAG interface. Use of ETM involves giving up some I/O lines.
Each JTAG enabled chip has a TAP controller with a state machine with 16 states. The state of the TMS pin at each TCK pulse dictates the transitions between states. These states help control whether you are shifting through the instruction register or one of the data registers, and control the capture of data into the shift registers and the latching of data from them.
Here is a diagram of the TAP controller state machine, produced using graphviz (requires browser/plugin SVG support):
Each ellipse denotes a state and each arrow denotes a transition from one state to another. There are two transitions from each state, one taken when TMS=0 and one taken when TMS=1. The shift IR and Shift DR states are emphasized because that is where you will spend the most time (other than when idle). These are the states where data is shifted in on TDI and out on TDO and the machine will stay in those states as long as TMS=0. The Exit* and Pause* states are not terribly interesting. For either the data or instruction registers, you usually start by passing through the Capture_* state which loads data from the chip into the register, proceed to the shift* state where the captured data is shifted out while the new data from the PC is shifted in, and then make your way through a couple minor states to the update state where the new data is latched into the chip.
Here is the TAP controller state machine as a table, which is easily converted into C, VHDL, Verilog, or other languages.
| State | TMS=0 | TMS=1 |
|---|---|---|
| Test-Logic-Reset | Run-Test/Idle | (no change) |
| Run-Test/Idle | (no change) | Select-DR-Scan |
| Select-DR-Scan | Capture-DR | Select-IR-Scan |
| Capture-DR | Shift-DR | Exit1-DR |
| Shift-DR | (no change) | Exit1-DR |
| Exit1-DR | Pause-DR | Update-DR |
| Pause-DR | (no change) | Exit2-DR |
| Exit2-DR | Shift-DR | Update-DR |
| Update-DR | Run-Test/Idle | Select-DR-Scan |
| Select-IR-Scan | Capture-IR | Test-Logic-Reset |
| Capture-IR | Shift-IR | Exit1-IR |
| Shift-IR | (no change) | Exit1-IR |
| Exit1-IR | Pause-IR | Update-IR |
| Pause-IR | (no change) | Exit2-IR |
| Exit2-IR | Shift-IR | Update-IR |
| Update-IR | Run-Test/Idle | Select-DR-Scan |
Note that aside from the first two states, the remaining states are divided into two sets of 7 states which are almost identical except that one set deals with the instruction register and one deals with the data register. These two sets of states deal with three operations: capture, shift, and update. Capture loads data from the chip into the corresponding register. Data is latched on the rising edge of TCK when exiting the capture state. Shift moves the data out through TDO while new data is shifted in on the TDI pin. The update-xx states cause the data from the shift register to be transferred to the selected register. The update occurs on the falling edge of TCK while still in the update state.
The state machine's main states are reset, run/idle, shift DR, and shift IR.
Data is shifted in on the rising edge of TCK and data out becomes valid on the falling edge of TCK (to be latched on the rising edge). That is, the rising edge is the active edge for both input and output, and the transitions occur at the falling edge. Hold TMS=0 to keep the controller in the shift-xx state until all the data is shifted out; set TMS=1 when the last bit is shifted.
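The rule about TMS during a shift can be sketched in C as follows. This is only an illustration under stated assumptions: `jtag_clock_fn` is a hypothetical hook that sets TMS and TDI, pulses TCK once, and returns the TDO value sampled for that clock; the TAP is assumed to already be in Shift-DR or Shift-IR; and `toy_reg` is a made-up stand-in for a target, so the function can be tried without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook: drive TMS and TDI, pulse TCK once, return sampled TDO. */
typedef int (*jtag_clock_fn)(void *ctx, int tms, int tdi);

/* Shift nbits (1..64) LSB first. TMS is 0 for every bit except the last,
   so the TAP leaves the shift state (to Exit1) as the last bit goes in. */
uint64_t jtag_shift(jtag_clock_fn clk, void *ctx, uint64_t out, int nbits)
{
    uint64_t in = 0;
    assert(nbits >= 1 && nbits <= 64);
    for (int i = 0; i < nbits; i++) {
        int last = (i == nbits - 1);
        int tdo = clk(ctx, last, (int)((out >> i) & 1));
        in |= (uint64_t)(tdo & 1) << i;
    }
    return in;
}

/* Toy target: an 8-bit shift register between TDI and TDO that also
   records how TMS was driven. */
struct toy_reg { uint8_t bits; int clocks; int tms_high_count; int last_tms; };

int toy_clock(void *ctx, int tms, int tdi)
{
    struct toy_reg *r = ctx;
    int tdo = r->bits & 1;                 /* LSB leaves first on TDO */
    r->bits = (uint8_t)((r->bits >> 1) | (tdi ? 0x80 : 0));
    r->clocks++;
    r->tms_high_count += (tms != 0);
    r->last_tms = tms;
    return tdo;
}
```

Shifting 8 bits through the toy register returns its old contents and leaves the new value in place, with TMS high on exactly one clock, the last.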
Five TCKs with TMS high will reset the state machine to the Test-Logic-Reset state from any other state.
#define TAP_TEST_LOGIC_RESET 0
#define TAP_RUN_TEST_IDLE 8
#define TAP_SELECT_DR_SCAN 1
#define TAP_CAPTURE_DR 2
#define TAP_SHIFT_DR 3
#define TAP_EXIT1_DR 4
#define TAP_PAUSE_DR 5
#define TAP_EXIT2_DR 6
#define TAP_UPDATE_DR 7
#define TAP_SELECT_IR_SCAN 9
#define TAP_CAPTURE_IR 10
#define TAP_SHIFT_IR 11
#define TAP_EXIT1_IR 12
#define TAP_PAUSE_IR 13
#define TAP_EXIT2_IR 14
#define TAP_UPDATE_IR 15

/* Advance the simulated TAP controller by one TCK rising edge, given the
   level of TMS at that edge, and return the new state. */
int jtag_tap_controller_state_machine(int tms)
{
    static int state=TAP_TEST_LOGIC_RESET;
    int nextstate=state; /* default: no change */
    switch(state) {
    case TAP_TEST_LOGIC_RESET: if(!tms) {nextstate=TAP_RUN_TEST_IDLE;} break;
    case TAP_RUN_TEST_IDLE: if(tms) {nextstate=TAP_SELECT_DR_SCAN;} break;
    case TAP_SELECT_DR_SCAN: if(!tms) {nextstate=TAP_CAPTURE_DR;} else {nextstate=TAP_SELECT_IR_SCAN;} break;
    case TAP_CAPTURE_DR: if(!tms) {nextstate=TAP_SHIFT_DR;} else {nextstate=TAP_EXIT1_DR;} break;
    case TAP_SHIFT_DR: if(tms) {nextstate=TAP_EXIT1_DR;} break;
    case TAP_EXIT1_DR: if(!tms) {nextstate=TAP_PAUSE_DR;} else {nextstate=TAP_UPDATE_DR;} break;
    case TAP_PAUSE_DR: if(tms) {nextstate=TAP_EXIT2_DR;} break;
    case TAP_EXIT2_DR: if(!tms) {nextstate=TAP_SHIFT_DR;} else {nextstate=TAP_UPDATE_DR;} break;
    case TAP_UPDATE_DR: if(!tms) {nextstate=TAP_RUN_TEST_IDLE;} else {nextstate=TAP_SELECT_DR_SCAN;} break;
    case TAP_SELECT_IR_SCAN: if(!tms) {nextstate=TAP_CAPTURE_IR;} else {nextstate=TAP_TEST_LOGIC_RESET;} break;
    case TAP_CAPTURE_IR: if(!tms) {nextstate=TAP_SHIFT_IR;} else {nextstate=TAP_EXIT1_IR;} break;
    case TAP_SHIFT_IR: if(tms) {nextstate=TAP_EXIT1_IR;} break;
    case TAP_EXIT1_IR: if(!tms) {nextstate=TAP_PAUSE_IR;} else {nextstate=TAP_UPDATE_IR;} break;
    case TAP_PAUSE_IR: if(tms) {nextstate=TAP_EXIT2_IR;} break;
    case TAP_EXIT2_IR: if(!tms) {nextstate=TAP_SHIFT_IR;} else {nextstate=TAP_UPDATE_IR;} break;
    case TAP_UPDATE_IR: if(!tms) {nextstate=TAP_RUN_TEST_IDLE;} else {nextstate=TAP_SELECT_DR_SCAN;} break;
    default: nextstate=TAP_TEST_LOGIC_RESET; break;
    }
    state=nextstate;
    return(state);
}
This code is an example, intended for illustration; it hasn't been tested. "enum" types may be used instead of "#define". This function should be called once on each rising edge of TCK. While you probably don't want to implement a TAP controller in software on an actual hardware device (it would be very slow), simulating one is useful for keeping track of the state of a controller you are talking to, debugging, educational purposes, etc. This example can also be pretty easily converted to VHDL or Verilog for synthesis or simulation. Note that the #defined values for the various states were chosen so that the similarly named states (which have similar state transitions) share 3 bits in common to facilitate logic minimization.
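As an alternative to a switch statement, the transition table can be stored directly as a lookup array. The sketch below uses an enum with the same numeric encoding as the #defines above; the `ST_` names, `tap_next`, `tap_step`, and `tap_five_ones_always_reset` are made up for this illustration. It also includes a small self check that five clocks with TMS high reach Test-Logic-Reset from every state.

```c
#include <stdint.h>

/* Same numeric encoding as the #defines above. */
enum tap_state {
    ST_RESET = 0, ST_SELECT_DR = 1, ST_CAPTURE_DR = 2, ST_SHIFT_DR = 3,
    ST_EXIT1_DR = 4, ST_PAUSE_DR = 5, ST_EXIT2_DR = 6, ST_UPDATE_DR = 7,
    ST_IDLE = 8, ST_SELECT_IR = 9, ST_CAPTURE_IR = 10, ST_SHIFT_IR = 11,
    ST_EXIT1_IR = 12, ST_PAUSE_IR = 13, ST_EXIT2_IR = 14, ST_UPDATE_IR = 15
};

/* tap_next[state][tms], transcribed from the transition table. */
static const uint8_t tap_next[16][2] = {
    [ST_RESET]      = { ST_IDLE,       ST_RESET },
    [ST_IDLE]       = { ST_IDLE,       ST_SELECT_DR },
    [ST_SELECT_DR]  = { ST_CAPTURE_DR, ST_SELECT_IR },
    [ST_CAPTURE_DR] = { ST_SHIFT_DR,   ST_EXIT1_DR },
    [ST_SHIFT_DR]   = { ST_SHIFT_DR,   ST_EXIT1_DR },
    [ST_EXIT1_DR]   = { ST_PAUSE_DR,   ST_UPDATE_DR },
    [ST_PAUSE_DR]   = { ST_PAUSE_DR,   ST_EXIT2_DR },
    [ST_EXIT2_DR]   = { ST_SHIFT_DR,   ST_UPDATE_DR },
    [ST_UPDATE_DR]  = { ST_IDLE,       ST_SELECT_DR },
    [ST_SELECT_IR]  = { ST_CAPTURE_IR, ST_RESET },
    [ST_CAPTURE_IR] = { ST_SHIFT_IR,   ST_EXIT1_IR },
    [ST_SHIFT_IR]   = { ST_SHIFT_IR,   ST_EXIT1_IR },
    [ST_EXIT1_IR]   = { ST_PAUSE_IR,   ST_UPDATE_IR },
    [ST_PAUSE_IR]   = { ST_PAUSE_IR,   ST_EXIT2_IR },
    [ST_EXIT2_IR]   = { ST_SHIFT_IR,   ST_UPDATE_IR },
    [ST_UPDATE_IR]  = { ST_IDLE,       ST_SELECT_DR },
};

/* Next state for one TCK rising edge with the given TMS level. */
int tap_step(int state, int tms)
{
    return tap_next[state & 15][tms ? 1 : 0];
}

/* Returns 1 if five TMS=1 clocks reach Test-Logic-Reset from every state. */
int tap_five_ones_always_reset(void)
{
    for (int s = 0; s < 16; s++) {
        int st = s;
        for (int i = 0; i < 5; i++)
            st = tap_step(st, 1);
        if (st != ST_RESET)
            return 0;
    }
    return 1;
}
```

Unlike the function above, `tap_step` keeps no static state, so one table can track several TAP controllers at once. From reset, the TMS sequence 0, 1, 1, 0, 0 leads through Run-Test/Idle, Select-DR-Scan, Select-IR-Scan, and Capture-IR to Shift-IR.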
IEEE 1149 has the combinatorial form of the state machine logic expressed as 4 sums of 17 products, each product having at most 4 inputs. Thus 21 four-input LUTs could be used to implement that logic (about double that when you include the sequential portion).
The size of the instruction register varies from chip to chip. Minimum is 2 bits. It contains a shift register, a storage register, and decode logic. The value 01 is loaded into the two least significant bits of the instruction shift register during the capture-ir state. The instruction register also functions as a status register. What is loaded into the other bits depends on the chip.
The bypass instruction is all 1s. The bypass register is loaded with a 0 when the bypass instruction is executed and the chip enters the Shift-DR state. The contents of the bypass register are ignored during an update-DR. With multiple devices in the chain, you can selectively put some devices into bypass mode by loading a bypass instruction into some instruction registers and another instruction, such as sample/preload, into the others.
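Selectively bypassing devices comes down to building one long instruction scan. The sketch below is only an illustration: `build_ir_scan` is a made-up helper, devices are numbered from 0 at the adapter's TDI end (an assumed convention), instruction registers are assumed to be at most 32 bits long, and the bits are stored one per byte for clarity rather than packed.

```c
#include <stddef.h>
#include <stdint.h>

/* ir_len[i] is the instruction register length of device i, with device 0
   nearest the adapter's TDI and device ndev-1 nearest its TDO. Writes one bit
   per byte into bits[], in the order the bits go out on TDI, loading opcode
   into the target device and BYPASS (all 1s) into the rest. Returns the
   number of bits, or 0 if max_bits is too small. */
size_t build_ir_scan(const int *ir_len, int ndev, int target,
                     uint32_t opcode, uint8_t *bits, size_t max_bits)
{
    size_t n = 0;
    /* Bits sent first travel furthest, so start with the device nearest TDO. */
    for (int d = ndev - 1; d >= 0; d--) {
        for (int b = 0; b < ir_len[d]; b++) {   /* LSB first within a device */
            if (n >= max_bits)
                return 0;
            bits[n++] = (d == target) ? (uint8_t)((opcode >> b) & 1) : 1;
        }
    }
    return n;
}
```

For a chain with instruction register lengths 4, 2, and 5, loading the 2 bit value 01 into the middle device produces 11 bits: five 1s, then 1 and 0, then four 1s. The value 01 is just a placeholder here, not a real instruction for any particular part.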
This instruction loads the state of each input pin into the boundary scan register and also the internal logic state of each output pin. This instruction does not interfere with normal functions. Sample and Preload may be separate or combined.
External test. This does interfere with normal function. It performs an external boundary scan test to test things like chip to chip connections. In EXTEST mode, each Update-DR state causes data in the boundary scan register to be driven to the output pins. The Capture-DR state causes data from the input pins and internal states for the output pins to be loaded into the boundary shift register (similar to sample/preload). Data is shifted on the rising edge of TCK.
This instruction allows the chip ID code to be read.
Additional part specific instructions can be defined. These can be used for things like built in self test, programming flash memory, and On Chip Debug.
Presumably used for things like factory test. If a chip has, for example, a CPU, RAM, FLASH, FPGA, IO, and bus interconnect blocks, there may be separate or combined scan chains around each block. With a block, there may be internal cells used to further partition the logic into testable domains.
A JTAG device contains multiple registers or chains accessible via the TAP port. If the TAP controller is in the Shift-IR state, the instruction register will be used. In the Shift-DR state, the register inserted into the chain will depend on the currently active instruction in the instruction register.
In addition to the instruction register, each device has more than one data register. At any given time, data is shifted through either the instruction register or one of the data registers, depending on the state of the TAP controller and the last instruction loaded. Registers are shifted LSB first.
The boundary-scan register typically contains one or more bits for each logic pin on the device. More than one bit is needed to deal with output enables and direction controls for some pins. All device inputs must be observable and all device outputs controllable through the boundary scan register. The register may provide access to internal states associated with each pin.
This is a single bit wide shift register connecting TDI and TDO. It is selected in bypass mode and is used to minimize the number of bits that need to be shifted when accessing other chips. Why a one bit delay? Without it, you would end up with an analog delay in TDI proportional to the number of chips and under some conditions, TCK would get there before TDI would.
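One consequence of the one bit delay is that, with every device in bypass, the delay through the chain equals the number of devices. The sketch below illustrates that idea and is not taken from any particular tool: `shift_bit_fn` is a hypothetical hook that performs one clock in Shift-DR and returns TDO, and `toy_chain` is a made-up simulation of 1 to 16 bypass registers in series.

```c
#include <stdint.h>

/* Hypothetical hook: one TCK in Shift-DR with the given TDI, returns TDO. */
typedef int (*shift_bit_fn)(void *ctx, int tdi);

/* Assumes every device is already in BYPASS and the TAPs are in Shift-DR.
   Returns the number of one bit registers in the chain, or -1 if a 1 does
   not come back within max_devices clocks. */
int count_bypassed(shift_bit_fn shift, void *ctx, int max_devices)
{
    for (int i = 0; i < max_devices; i++)
        shift(ctx, 0);                 /* flush the chain with zeros */
    for (int n = 0; n <= max_devices; n++)
        if (shift(ctx, 1))
            return n;                  /* n clocks of delay = n bypass bits */
    return -1;
}

/* Toy chain of ndev (1..16) bypass registers; reg[0] is nearest TDI. */
struct toy_chain { int ndev; uint8_t reg[16]; };

int toy_shift(void *ctx, int tdi)
{
    struct toy_chain *c = ctx;
    int tdo = c->reg[c->ndev - 1];     /* last register drives TDO */
    for (int i = c->ndev - 1; i > 0; i--)
        c->reg[i] = c->reg[i - 1];
    c->reg[0] = (uint8_t)(tdi & 1);
    return tdo;
}
```

With a three device toy chain, `count_bypassed(toy_shift, &chain, 8)` returns 3.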
This is a 32 bit register that contains information that can be used to identify the specific chips in the chain.
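As a rough sketch of how the 32 bit value can be picked apart, the function below splits it into the fields defined by 1149.1: a marker bit that is always 1 (bit 0), an 11 bit JEDEC manufacturer code (bits 11..1), a 16 bit part number (bits 27..12), and a 4 bit version (bits 31..28). The type and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of the 32 bit device identification register.
   The struct and function names are illustrative only. */
typedef struct {
    unsigned version;       /* bits 31..28 */
    unsigned part_number;   /* bits 27..12 */
    unsigned manufacturer;  /* bits 11..1, JEDEC manufacturer code */
    unsigned marker;        /* bit 0, always 1 in a valid ID code */
} jtag_idcode_t;

/* Split a raw ID code (as read LSB first from TDO) into its fields. */
jtag_idcode_t jtag_decode_idcode(uint32_t raw)
{
    jtag_idcode_t id;
    id.version      = (raw >> 28) & 0xFu;
    id.part_number  = (raw >> 12) & 0xFFFFu;
    id.manufacturer = (raw >> 1)  & 0x7FFu;
    id.marker       = raw & 1u;
    return id;
}
```

For example, a raw value of 0x01414093 splits into version 0, part number 0x1414, and manufacturer code 0x049. Because the bypass register captures a 0 and the ID register always has a 1 in its first bit, the marker bit can also be used to tell the two apart when scanning a chain.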
A basic boundary scan cell contains two flip flops and some multiplexors. Boundary scan cells are chained together to make a long shift and store type register with preload. The first mux and the two flip flops shown above make up a basic cell of your garden variety shift and store register. The first flip flop is part of the shift register and the second is part of the store register. The last mux selects between normal operation (PI -> PO) and the boundary scan system taking over control of the signal. The input signals to the cells may be internal or external signals depending on the cell function. 1 to 3 basic cells are tied together to make the cell used on each I/O pin. The SI and SO pins (shift in and shift out) of adjacent cells are tied together in a long daisy chain. The PI and PO pins contain "parallel" data to be captured in the Capture-DR state or output in the Update-DR state. Depending on the state of the Mode signal, PI may be passed directly through to PO (normal operating mode for the chip). The mux on the left selects between shift mode (Shift-DR state) and preload (Capture-DR state). The UpdateDR signal is clocked when the TAP controller reaches the Update-DR state, latching the data from the shift register into the store register. The first flip flop is sometimes called "CAP" and the second "UPD".
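The behaviour described above can be modelled in a few lines of C. This is only a software sketch of a chain of basic cells (the names bs_cell_t, bs_capture, bs_shift, bs_update, and bs_output are invented here), not a description of any real chip's logic. Cell 0 is assumed to be the one connected to SI.

```c
#include <stddef.h>

/* One basic cell: "cap" is the shift register flip flop (CAP),
   "upd" is the store register flip flop (UPD). */
typedef struct {
    int cap;
    int upd;
} bs_cell_t;

/* Capture-DR: every cell loads its parallel input PI into CAP. */
void bs_capture(bs_cell_t *cells, const int *pi, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cells[i].cap = pi[i] & 1;
}

/* One TCK in Shift-DR: cell 0 takes SI, every other cell takes the old CAP
   of its neighbour. Returns the bit that falls out of the last cell (SO).
   Requires n >= 1. */
int bs_shift(bs_cell_t *cells, size_t n, int si)
{
    int so = cells[n - 1].cap;
    for (size_t i = n - 1; i > 0; i--)
        cells[i].cap = cells[i - 1].cap;
    cells[0].cap = si & 1;
    return so;
}

/* Update-DR: latch the shift register into the store register. */
void bs_update(bs_cell_t *cells, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cells[i].upd = cells[i].cap;
}

/* Output mux: mode = 0 passes PI through to PO (normal operation),
   mode = 1 lets the store register drive PO. */
int bs_output(const bs_cell_t *cell, int pi, int mode)
{
    return mode ? cell->upd : (pi & 1);
}
```

A typical scan is one bs_capture, then n calls to bs_shift (collecting the returned bits while feeding in new ones), then one bs_update.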
"Cell" will be used somewhat interchangeably here to refer to either the simple cell or the cluster of 1, 2, or 3 simple cells associated with a pin. The standard neglects to give the two separate names.
Before you can understand the interaction between the boundary scan logic cells and the I/O buffers, you need to understand the nature of the I/O buffers. There are four basic types: output only, input only, tristate output, and bidirectional. We will focus on the bidirectional buffer; the other types are degenerate forms of it. Open drain and open source signals are also degenerate forms, with one of the driver FETs missing. The bidirectional pin is not simply a magic buffer on an internal tristate bus. It is very hard to make a bidirectional buffer work without a direction signal provided (see the section on level translation) and attempts to do so will offer poor performance in a variety of circumstances. Tristate buses are rarely used internal to the chip and in many cases they are prohibited. An internal tristate bus can have problems with capacitance and fanout, and the internal drivers, being much smaller than external I/O buffers, may be damaged by bus contention. So tristate buses are typically replaced with some sort of multiplexor arrangement. Typically, a bidirectional signal on an IC pin connects to the internal logic as three separate signals: input, output, and output enable.
Each cell has a general cell type (corresponding to the section headings below and denoting its function) and also references a CELL_INFO structure ("BC_0", "BC_1", "BC_2", etc.), which gives some more info on its structure, from the STD_1149_1_2001 header file or a user defined cell type. CELL_INFO tells you what signals the first mux connects to in various states; see section B.10.2.2 in 1149.1-2001. The comments preceding the cell definitions contain obscure references to figures in the standard, e.g. "f11-18" means "Figure 11-18".
Each pin may have more than one cell type associated with it. An input pin will require 1 cell, a simple output 1 or 2 cells, a tristate output 2 or 3 cells, and a bidirectional pin 2 or 3 cells.
The picture below gives a cluster of three cells for a bidirectional pin. Other forms of pins can be approximated by simply removing the unnecessary logic. Don't forget, though, that even if a pin is output only, you probably still want to be able to capture the state at the pin so when minimizing the logic for an output, you can eliminate the input connection to the internal logic but keep the cell (possibly discarding the second flip flop and mux). Cells may not necessarily be daisy chained in the order shown. The first mux may vary from the diagram. Possible input sources are PI (parallel in), PO (parallel out OR the pin itself), UPD (the output of the second flip flop), CAP (the output of the first flip flop), X (unknown state), ZERO, or ONE in addition to the SI signal and may vary depending on whether you are in EXTEST, SAMPLE, or INTEST.
The configuration shown in the diagram above is all you really need for both internal and external test if you are designing a chip or for basic conceptual understanding. However, other configurations are used and may reflect corner cutting or other arbitrary choices. General purpose boundary scan software will need to parse the BSDL file.
Allows test system to sample the state on that pin but not assert state to either the pin or external logic. It is like the INPUT cell in the diagram above except: PO goes nowhere and the second flip flop and mux can be eliminated.
Allows test system to sample the value being driven onto the pin by internal logic (used for internal tests) or drive a value on to the pin (external tests).
OUTPUT3: 3 state output (2 bit)
One bit is used for the signal and one for the enable line. Four states can be read (drive high, drive low, tristate high, tristate low) and "driven". Can sample the output and enable signals from internal logic or drive the output pin (with tristate).
Can sample the input pin or drive a value into the internal logic.
This is a method of cutting corners and reducing shift chain size. The input and output cells are replaced, with the aid of muxes, with a single cell. You can't read the actual logic state on the pin when it is being driven, so you can't test the driver itself, unless the driver output is coupled back to the input instead of the PO pin. This is allowed by newer versions of 1149.1, though the BSDL becomes ambiguous since the pins are still identified as reading back "PO" (normally the input to the driver), not the actual pin output.
Here I have sketched up a concept for an advanced cell configuration for improved testability. This cell cluster has programmable pull-up and pull-down and an input comparator with programmable threshold. There are some ways to minimize it. A couple of cells, one output only and one input only, can be combined. Also, instead of a separate pull up and pull down, it is possible to have two enables to allow the main driver to be strong or weak.
With a trivial amount of extra logic, a programmable logic vendor can provide a way to create internal scan chains. On Spartan 3 and 3E FPGAs, the BSCAN_SPARTAN3 block provides up to two internal scan chains using the USER1 and USER2 instructions. A better design would have simply provided one mux input and provided access to the instruction register bits so that a large number of scan chains, that look like any other scan chains, could be created, thus being able to fully simulate a JTAG enabled device except for the instructions having an offset to account for the built in chains. As it is, the user could use the USER1 instruction to select a chain and the USER2 instruction to access it. If the internal JTAG chains are not used, the programmable logic must select the bypass register. Virtex-II, Virtex-II Pro, and Virtex-4 families also have the USER1 and USER2 access, with the Virtex-4 providing up to four user chains.
A better way to implement this is to provide a single extra input on the TDO mux. If a fuse is not programmed, this input connects to the bypass register to conform to 1149.1. If, however, the fuse is programmed, it connects to user logic. All undecoded instructions select the extra fuse input on the mux. This way, all unused opcodes select the bypass register, as required by 1149.1. All bits of the instruction register are brought out to the user logic, as are the capture and update strobes. With an 8 bit instruction register size, there would be a large number of unused instructions. For example, all instructions with the MSB=1, except 11111111 (BYPASS, which 1149.1 requires to be all ones), could be used. User logic would provide a 1 bit alternative bypass register or provide a signal to the main mux that selects the bypass register for all undecoded instructions. User logic implements an expansion mux and strobe logic for as many registers as needed. Programmable logic must clearly identify instructions which access internal factory test logic (this is normally done in the BSDL file, anyway).
Another method is to have a configuration fuse control a mux on TDO such that, once the device is configured, the user logic is inserted in the chain after the programmable logic's on-board TAP controller. This provides 100% emulation of an ASIC design's TAP controller without a separate TAP port. However, it is likely to confuse the configuration software and it is a nuisance to recover from if your on-chip TAP is defective.
One use for such an internal chain is to provide access to an internal wishbone bus via the normal JTAG pins.
Some CPLDs allow the TAP pins to be used as USER I/O pins with a logic level on one pin selecting the mode. Putting a jumper on the selection pin (or asserting it with a spare signal on the pod) can allow access to an internal user TAP or SPI circuit.
Some newer PCs do not come with a parallel port at all and USB to printer port adapters are generally not suitable for bit banging. A long, highly capacitive cable is likely to be needed to reach the system under test, which slows down the rate at which you can send data. Both the parallel port interface itself and the JTAG device are too slow for a modern processor to issue output instructions at full speed so you need a delay for each bit clocked (actually two delays). Since this delay is too short to use the system timer, you will need to use delay loops which will waste CPU time. Your JTAG code will consume close to 100% of the CPU time until preempted, at which point there will be a pause in the JTAG bitstream until your process gains control again, so the flow of JTAG data will be sporadic. It is possible to use advanced modes (ECP) of the parallel port (which are often disabled in the BIOS) to send data more efficiently but this would necessitate the use of a CPLD in the parallel pod which is not normally the case (Xilinx Parallel Cable IV appears to be an exception). Suppose you set your delay to be 1us and your process gets 30% of the available timeslots. Then you will be sending 500kHz bursts (you need to write to the port twice for each TCK) 30% of the time, for an effective data rate of only 150kHz while wasting 30% of your CPU time. Even the Parallel Cable IV can only do a 2.5MHz TCK rate.
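The arithmetic in that example can be written down as a tiny helper, which makes it easy to try other delays and CPU shares. The function name and parameters are invented for this sketch, and it ignores everything except the delay loop and the scheduler share.

```c
/* Rough effective TCK rate for a bit-banged port.
   delay_s        - delay after each port write, in seconds
   writes_per_tck - port writes needed per TCK cycle (2 for a plain pod)
   cpu_share      - fraction of the time the process is actually running */
double effective_tck_hz(double delay_s, int writes_per_tck, double cpu_share)
{
    double burst_hz = 1.0 / (delay_s * writes_per_tck);  /* rate while running */
    return burst_hz * cpu_share;                         /* averaged over time */
}
```

With a 1us delay, two writes per TCK, and a 30% share, this gives the 500kHz burst rate and roughly 150kHz average quoted above.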
When a JTAG device is driving signals through series resistors and the device on the other side doesn't have JTAG, many tests will not detect shorts between adjacent resistor network pins. If you are actively driving your pins, the shorts will not overpower the driven outputs. Driving adjacent pins with test patterns while a pin is tristated, however, may show that it follows the values of adjacent pins.
One type of pattern is known as a Wagner pattern. This basically involves assigning consecutive binary numbers to each net as a sequential test vector (STV). One bit from each STV makes up a parallel test vector (PTV). This pattern does a good job of detecting shorted nets with far fewer vectors than a walking ones or zeros test but not as good a job of isolating faults. A walking bit test requires of order N^2 TCK cycles, where N is the number of nets, while the number of vectors in the Wagner pattern is logarithmic in N. The Wagner pattern is improved by sending out complements of the original vectors. The Parallel Response Vectors (PRVs) are transformed into Sequential Response Vectors (SRVs); pins with the same SRV are probably shorted. The Wagner pattern will detect some opens but additional opens testing is needed. In addition to the Wagner test, an all 1s and all 0s PTV will help distinguish between stuck at 1 and stuck at 0 faults vs shorted nets. Both Wagner patterns and walking bit patterns detect shorted nets only on the assumption that one driver wins.
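Here is a minimal sketch of generating such a counting pattern and looking for nets with identical responses. The function names are invented for illustration. One assumption of this sketch, not taken from the text above: the numbering starts at 1 rather than 0, so that no net is assigned the all zeros code.

```c
/* Number of parallel test vectors (PTVs) needed so that nets numbered
   1..n_nets all get distinct sequential test vectors (STVs). */
unsigned wagner_vector_count(unsigned n_nets)
{
    unsigned bits = 0;
    while (bits < 32 && (1u << bits) <= n_nets)
        bits++;
    return bits;
}

/* Bit to drive on net "net" (0 based) in parallel test vector "ptv".
   The STV of a net is simply its number plus one. */
int wagner_bit(unsigned net, unsigned ptv)
{
    unsigned stv = net + 1;
    return (int)((stv >> ptv) & 1u);
}

/* Scan the sequential response vectors for two nets that read back the
   same value. Returns 1 and stores the pair in *a and *b, or returns 0. */
int wagner_same_response(const unsigned *srv, unsigned n_nets,
                         unsigned *a, unsigned *b)
{
    for (unsigned i = 0; i < n_nets; i++) {
        for (unsigned j = i + 1; j < n_nets; j++) {
            if (srv[i] == srv[j]) {
                *a = i;
                *b = j;
                return 1;
            }
        }
    }
    return 0;
}
```

For 1000 nets this needs only 10 vectors (20 with complements), compared with 1000 for a walking bit test.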
"Algorithmic Extraction of BSDL from 1149.1-compliant Sample ICS", Ramond, et al., International Test Conference 1995, pp. 561-568, explains how to extract BSDL from a sample chip (doesn't identify all the internal instructions).
Various faults in the JTAG chain itself can cause problems and some of these can be diagnosed.
Short circuit testing is normally performed first so you can power down a device quickly to minimize the possibility of damage. The shorter the duration, the lower the probability of immediate or delayed failure. Either the driver transistors or the bond wires may be damaged. In practice, digital ICs often withstand more than you might expect. "Investigation of Device Damage Due to Electrical Testing" focuses on overvoltage spikes, which are likely to be more of an issue when you are testing interconnected boards.
Beware of test patterns that may enable the output of an IC (such as a RAM OE) onto nets, resulting in driver contention that may interfere with a test or cause damage.
Intermediate-state shorts, where the result of shorting two nets together leads to indeterminate results, are a problem for many logic families. Using a varying number of drivers on a net may tip the mid-state into a detectable region but may also increase the chances of driver damage (no worse than a short to GND/VCC). Tristate/0 and Tristate/1 combinations instead of 0/1 may help detection but may also produce a lot of false positives. onTAP claims to be good at detecting midstate faults. The harder such a fault is to detect, the less likely it is to produce immediate damage. If two 5V drivers shorted together produce a 2.5V level, that indicates that they are sharing the heating evenly; on the other hand, if a strong driver is competing with a weak driver, it is more likely to result in a determinate output. Shorts to power or ground, unless they are resistive, are likely to result in a determinate output.
Pretty much all the useful information is in strings assigned to constants or attributes. These strings can contain arrays of structures and other complicated forms that require parsing. These strings are split into pieces in potentially random ways and concatenated with "&". So, you can pretty much forget about a single stage parser. You have to parse the VHDLness to get the strings, and their constants, and then parse the contents of the strings.
The CELL_INFO array contains multiple records of type (a,b,c) where a is the context in which the cell is used (INPUT, OUTPUT2, OUTPUT3, INTERNAL, CONTROL, CONTROLR, CLOCK, BIDIR_IN, BIDIR_OUT, or OBSERVE_ONLY), b denotes the capture instruction (EXTEST, SAMPLE, or INTEST), and c denotes the data source captured (PI, PO, CAP, UPD, X, ZERO, or ONE (explained above)).
The BOUNDARY_REGISTER is a string (usually split over multiple lines and concatenated with "&"):
cell_number "(" cell_type, port_id, function, safe_bit, [ccell disval rslt] "),"
PIN_MAP contains a mapping from signal names to pin numbers, separated by commas. "FOO:9" means pin 9 is FOO. "Q:(1,2,3,4)" means that signal Q1 is on pin 1, Q2 is on pin 2, Q3 is on pin 3, and Q4 is on pin 4. Give or take a little. Q in this case is a bit_vector(1 to 4). If it was a bit_vector(0 to 3) this would probably be Q0 to Q3. The bit vector information comes from the "port" of the entity describing the part and this is one of the few places where you find information in actual VHDL.
cell number is usually sequential. function is one of INPUT, OUTPUT2, OUTPUT3, CONTROL, CONTROLR, INTERNAL, CLOCK, BIDIR, OBSERVE_ONLY. port_id is ???. safe_bit is one of 0,1 or X. The last three have to do with disabling a pin. They denote which cell number is disabled, what value (0 or 1) results in disabling, and what state results from disabling (Z, WEAK0, WEAK1, PULL0, PULL1, or KEEPER).
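Once the string pieces have been joined and split into individual entries, one entry of the form given above can be pulled apart with sscanf. This is only a sketch under that assumption: the struct, its field sizes, and the function name are invented here, and a real BSDL parser has to cope with more variation than this does.

```c
#include <stdio.h>
#include <string.h>

/* One BOUNDARY_REGISTER entry (field names follow the format line above). */
typedef struct {
    int  cell_number;
    char cell_type[16];   /* e.g. BC_1 */
    char port_id[32];
    char function[16];    /* e.g. input, output3, control */
    char safe_bit;        /* '0', '1' or 'X' */
    int  has_disable;     /* 1 if the optional last three fields are present */
    int  ccell;
    int  disval;
    char rslt[8];         /* e.g. Z, WEAK0, PULL1 */
} bsr_entry_t;

/* Parse one entry such as "0 (BC_1, Q(1), output3, X, 16, 1, Z)".
   Returns 1 on success, 0 if the text does not match the expected form. */
int bsr_parse_entry(const char *text, bsr_entry_t *e)
{
    int n;
    memset(e, 0, sizeof *e);
    n = sscanf(text,
               " %d ( %15[^, ] , %31[^, ] , %15[^, ] , %c , %d , %d , %7[^) ]",
               &e->cell_number, e->cell_type, e->port_id, e->function,
               &e->safe_bit, &e->ccell, &e->disval, e->rslt);
    if (n == 8) {          /* all fields, including ccell/disval/rslt */
        e->has_disable = 1;
        return 1;
    }
    return n == 5;         /* short form without the disable fields */
}
```

Passing "17 (BC_1, CLK, input, X)" fills in the first five fields and leaves has_disable at 0.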
You should be able to connect a JTAG cable to any board and have it automatically determine:
There are many reasons to probe an unknown chain. A generic program such as a debugger or serial flash loader may support specific chips and not care about other chips. It needs to know the configuration of the chain so it can put the other devices in bypass mode and add the appropriate number of dummy bits to the beginning and end of the data stream. The inability to reliably probe an unknown chain is probably one primary reason a lot of microcontrollers need to be on their own separate chain to support debugging. Another reason may be a nuisance patent. Probing an unknown chain can also be used in a production environment to automatically select the appropriate test or download routines when a device is connected.
Automatic identification of boards which have the same chips on them is possible in a number of ways. You can change the order of the devices on the scan chain. You can use a common set of otherwise unused pins on a CPLD or other JTAG device which are hardwired on the PCB to a unique ID code. You can program an analog voltage on a JTAG ADC voltage monitor chip with a pair of resistors. Also, many JTAG programmable logic devices have a user ID register that is programmed when the programmable logic is programmed; this is used to identify the unique firmware on an otherwise standard chip. Once a chip has been programmed once, using JTAG without auto ID or by programming the chip prior to part placement, the board can then be identified. These forms of auto ID can be used to automatically select the right firmware or guard against operator error during factory in circuit programming and test. They can also be used to identify boards installed into different slots in an embedded system.
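As a sketch of the dummy bit bookkeeping, the function below works out how many padding bits go before and after the target device's own bits once the chain layout is known. The names are invented for illustration, and device 0 is assumed to be the one nearest TDI. Bits shifted in first travel furthest, so the padding for devices nearer TDO is sent first.

```c
#include <stddef.h>

/* Padding needed to talk to one device while the others are in bypass. */
typedef struct {
    unsigned ir_first;  /* BYPASS (all ones) IR bits sent before the target's instruction */
    unsigned ir_last;   /* BYPASS IR bits sent after it */
    unsigned dr_first;  /* dummy DR bits sent before the target's data */
    unsigned dr_last;   /* dummy DR bits sent after it */
} chain_padding_t;

/* ir_len[i] is the instruction register length of device i, with
   device 0 nearest TDI and device n_devices-1 nearest TDO. */
chain_padding_t chain_padding(const unsigned *ir_len, size_t n_devices,
                              size_t target)
{
    chain_padding_t p = {0, 0, 0, 0};
    for (size_t i = 0; i < n_devices; i++) {
        if (i > target) {          /* between target and TDO: sent first */
            p.ir_first += ir_len[i];
            p.dr_first += 1;       /* bypass register is one bit */
        } else if (i < target) {   /* between TDI and target: sent last */
            p.ir_last += ir_len[i];
            p.dr_last += 1;
        }
    }
    return p;
}
```

The same counts apply when reading: the first dr_first bits that come out of TDO belong to the bypassed devices nearest TDO and should be discarded.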
Note that it would be relatively simple to convert the SDIO slot on PDAs to a JTAG test controller, provided you have documentation on how to program the SD card port on your device.
Taken from the TI Scan Educator program. This is an 8 bit tristate bus driver with two enable pins. It contains an 18 cell boundary scan register with a two cell TI SCOPE specific boundary control register, and a one cell bypass register.
Initial state:
TDI=0 TMS=1 TCK=0 (TDO=Z), test-logic-reset state
Now move from test-logic-reset to Shift-IR state with TMS={0,1,1,0,0}
TDI=0 TMS=0 TCK=0 (TDO=Z)
TDI=0 TMS=0 TCK=1 (TDO=Z)
(state now run-test/idle)
TDI=0 TMS=0 TCK=0 (TDO=Z)
TDI=0 TMS=1 TCK=0 (TDO=Z)
TDI=0 TMS=1 TCK=1 (TDO=Z)
(state now Select-DR-Scan)
TDI=0 TMS=1 TCK=0 (TDO=Z)
TDI=0 TMS=1 TCK=0 (TDO=Z)
TDI=0 TMS=1 TCK=1 (TDO=Z)
(state now Select-IR-Scan)
TDI=0 TMS=1 TCK=0 (TDO=Z)
TDI=0 TMS=0 TCK=0 (TDO=Z)
TDI=0 TMS=0 TCK=1 (TDO=Z)
(status value captured into instruction shift register)
(state now Capture-IR)
TDI=0 TMS=0 TCK=0 (TDO=Z)
TDI=0 TMS=0 TCK=0 (TDO=Z)
TDI=0 TMS=0 TCK=1 (TDO=Z)
(state now shift-IR)
TDI=0 TMS=0 TCK=0 (TDO=1)
SAMPLE/PRELOAD is 10000010 (shifted in LSB first)
TMS=0 TDI={0,1,0,0,0,0,0}
TMS=1 TDI={1} // state transition on last bit
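The walk above can be checked against a small software model of the TAP controller. The transition table below follows the standard 16 state diagram, where the next state depends only on the current state and the value of TMS at the rising edge of TCK. The enum and function names are made up for this sketch.

```c
/* The 16 TAP controller states. */
typedef enum {
    TAP_TLR, TAP_RTI,
    TAP_SELECT_DR, TAP_CAPTURE_DR, TAP_SHIFT_DR, TAP_EXIT1_DR,
    TAP_PAUSE_DR, TAP_EXIT2_DR, TAP_UPDATE_DR,
    TAP_SELECT_IR, TAP_CAPTURE_IR, TAP_SHIFT_IR, TAP_EXIT1_IR,
    TAP_PAUSE_IR, TAP_EXIT2_IR, TAP_UPDATE_IR
} tap_state_t;

/* Next state after one rising edge of TCK with the given TMS value. */
tap_state_t tap_next(tap_state_t s, int tms)
{
    static const tap_state_t next[16][2] = {
        /* state              TMS=0            TMS=1 */
        [TAP_TLR]        = { TAP_RTI,        TAP_TLR },
        [TAP_RTI]        = { TAP_RTI,        TAP_SELECT_DR },
        [TAP_SELECT_DR]  = { TAP_CAPTURE_DR, TAP_SELECT_IR },
        [TAP_CAPTURE_DR] = { TAP_SHIFT_DR,   TAP_EXIT1_DR },
        [TAP_SHIFT_DR]   = { TAP_SHIFT_DR,   TAP_EXIT1_DR },
        [TAP_EXIT1_DR]   = { TAP_PAUSE_DR,   TAP_UPDATE_DR },
        [TAP_PAUSE_DR]   = { TAP_PAUSE_DR,   TAP_EXIT2_DR },
        [TAP_EXIT2_DR]   = { TAP_SHIFT_DR,   TAP_UPDATE_DR },
        [TAP_UPDATE_DR]  = { TAP_RTI,        TAP_SELECT_DR },
        [TAP_SELECT_IR]  = { TAP_CAPTURE_IR, TAP_TLR },
        [TAP_CAPTURE_IR] = { TAP_SHIFT_IR,   TAP_EXIT1_IR },
        [TAP_SHIFT_IR]   = { TAP_SHIFT_IR,   TAP_EXIT1_IR },
        [TAP_EXIT1_IR]   = { TAP_PAUSE_IR,   TAP_UPDATE_IR },
        [TAP_PAUSE_IR]   = { TAP_PAUSE_IR,   TAP_EXIT2_IR },
        [TAP_EXIT2_IR]   = { TAP_SHIFT_IR,   TAP_UPDATE_IR },
        [TAP_UPDATE_IR]  = { TAP_RTI,        TAP_SELECT_DR },
    };
    return next[s][tms ? 1 : 0];
}

/* Apply a sequence of TMS values, one per TCK, and return the final state. */
tap_state_t tap_walk(tap_state_t s, const int *tms, int count)
{
    for (int i = 0; i < count; i++)
        s = tap_next(s, tms[i]);
    return s;
}
```

Starting from TAP_TLR, the sequence {0,1,1,0,0} ends in TAP_SHIFT_IR, matching the trace. Seven clocks with TMS=0 followed by one with TMS=1 then leave the controller in TAP_EXIT1_IR, which is why the last instruction bit is shifted with TMS high. From any state, five clocks with TMS=1 return to TAP_TLR.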
Note that this denotes the logical connections, not physical. Normally, there needs to be a buffer/level translator or at least a resistor. This table is concerned with software driver programming.
| Parallel Pin | Xilinx Parallel 3 | Digilent JTAG3 | Wiggler Clone |
|---|---|---|---|
| 1 - Strobe | | | |
| 2 - Data 0 | TDI drive (tristated by pin 5 high) | Drives TDI, inverted | Drives nSRST (inverted OC) |
| 3 - Data 1 | TCK drive (tristated by pin 5 high) | Drives TCK, inverted | Drive TMS |
| 4 - Data 2 | TMS (tristated by pin 5 high) | Drives TMS, inverted | Drives TCK |
| 5 - Data 3 | Tristate TCK, TDI, and TMS | | TDI drive |
| 6 - Data 4 | TDO drive (OC) | | |
| 7 - Data 5 | | | |
| 8 - Data 6 | 8,11,12 shorted | | |
| 9 - Data 7 | | 9,11,12 shorted | |
| 10 - Ack | | | |
| 11 - Busy | 8,11,12 shorted | 9,11,12 shorted | Read TDO |
| 12 - Paper Out | 8,11,12 shorted | 9,11,12 shorted | |
| 13 - Select | TDO/DONE readback | TDO readback, not inverted | |
| 14 - Auto Feed | | | |
| 15 - Error | VCC Sense | | |
| 16 - Init | | | |
| 17 - Select In | | | |
| 18-25 - Ground | Ground (20,25) | Ground (18) | |
JTAG spec allows up to 25 megabit per second transfers. According to hardwarehacking.org, "With a parallel port cable, however, you will be lucky to achieve more than about 400,000 bits-per-second.". The open source windrvr replacement notes it is limited to about 200kHz on the parallel cables. Note that there is some confusion in speeds, as they may sometimes refer to the port update rate and sometimes the TCK rate (half that).
An FT232RL chip would probably achieve about the same speed as a parallel cable. Speed could be doubled by running open loop. An FT2232C/D chip in MPSSE mode would probably get around 5.6 megabits per second.
Many poorly structured JTAG programs are written in such a way that they are very sensitive to latency and will slow down by many orders of magnitude when run over USB or IP. The debrick utility is an example of this. All I/O is done by calling clockin(), once for each bit. It reads the TDO line each time. USB 2.0 high speed is a high bandwidth link but it is also high latency. You get 1 I/O operation each millisecond (8 per millisecond in high speed). An efficient USB high speed pod could conceivably pump about 320 megabits per second of data (far more than most chips can handle) but writing two values to TCK and TDI and then reading TDO will take a millisecond or more, thus reducing the speed to under 1kHz (or 8kHz for high speed). The solution to this is to delay parsing data which is read back. One approach is to pass a pointer to where to write the results and not expect it to be updated immediately. Then write a bunch of data before reading the results. Even if you delay the read until you have written one word of flash, writing 3MB of flash will take 50 minutes. Higher performance over USB full speed could be achieved using templates stored on the microcontroller.
Use a sliding window approach to reading results. Thus you may send multiple vectors before you check the results of the first vector. Here is some oversimplified code:
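#define MAX_WINDOW 16
int i;
int num_vectors;   // total number of vectors to send (assumed name; set by the caller)
int response_count = 0;
bit_vector_t stimulus_vector;
bit_vector_t response_vector[MAX_WINDOW];
for(i=0; i<num_vectors; i++) {
   // if the window is nearly full, wait for the oldest outstanding response
   while((i - response_count) >= (MAX_WINDOW-2) && !vector_ready(response_count)) {
      // spin
   }
   response_vector[i%MAX_WINDOW] = allocate_response_vector();
   generate_stimulus_vector(&stimulus_vector);
   send_vector(i, stimulus_vector, response_vector[i%MAX_WINDOW]);
   // check the oldest outstanding response if it has arrived
   if(vector_ready(response_count)) {
      check_response_vector(response_vector[response_count%MAX_WINDOW]);
      free_response_vector(response_vector[response_count%MAX_WINDOW]);
      response_count++;
   }
}
// drain the responses that are still outstanding
while(response_count < num_vectors) {
   if(vector_ready(response_count)) {
      check_response_vector(response_vector[response_count%MAX_WINDOW]);
      free_response_vector(response_vector[response_count%MAX_WINDOW]);
      response_count++;
   }
}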
Also, you can use feed forward to reduce latency issues. In this case, you send the data to be transferred, the data you expect back, and a mask indicating which data to ignore. This is also good to use as a primary programming API as the library can automate checking and it has more options as to what commands to send over the wire. It also lends itself to recording data files that can be played back later.
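The check that the pod or library performs in a feed forward scheme can be as simple as the sketch below. The function name is invented, bits are assumed to be packed LSB first (bit i lives in byte i/8, bit position i%8), and a mask bit of 1 is assumed to mean "compare this bit" while 0 means "ignore it".

```c
#include <stddef.h>

/* Compare the captured TDO data against the expected data.
   Returns the index of the first mismatching bit that is not masked out,
   or -1 if every compared bit matches. */
long jtag_check_response(const unsigned char *actual,
                         const unsigned char *expected,
                         const unsigned char *mask,
                         size_t n_bits)
{
    for (size_t i = 0; i < n_bits; i++) {
        size_t byte = i / 8;
        unsigned bit = 1u << (i % 8);
        if ((mask[byte] & bit) && ((actual[byte] ^ expected[byte]) & bit))
            return (long)i;
    }
    return -1;
}
```

Since only a pass/fail result (or a bit index) has to come back over the wire, the host never needs to wait for the full response data.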
Compression could allow higher JTAG speeds without needing USB high speed. In flash programming, only 1 of the 4 address bytes changes 99% of the time. In typical sweeping 1 and 0 test vectors, only 1 or 2 bytes out of the entire vector need to change. In most cases, the compression algorithm will use the last vector as a start. The algorithm needs to be simple enough that it can be implemented on a typical microcontroller without slowing down to less than USB speeds. A very simple algorithm is to transmit 1 extra byte for each block of 8 bytes. The bits in that byte indicate which bytes change (and will be transmitted). Speed improvement up to 8x is possible, depending on the pattern. Another simple algorithm would be to send an offset as well as a length, allowing a contiguous subset of bytes to be transmitted. Run length encoding could be used but that will have more CPU overhead.
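A minimal sketch of the "one flag byte per block of 8 bytes" scheme follows. The function names and the exact wire layout (flag byte first, then only the changed bytes, lowest index first) are assumptions made for this example.

```c
#include <stddef.h>

/* Encode "cur" as a delta against "prev" (both len bytes long).
   For each block of 8 bytes, one flag byte is emitted, followed by only
   those bytes that differ. Returns the number of bytes written to "out",
   which must have room for len + (len + 7) / 8 bytes in the worst case. */
size_t delta_encode(const unsigned char *prev, const unsigned char *cur,
                    size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t base = 0; base < len; base += 8) {
        size_t flags_at = o++;          /* reserve room for the flag byte */
        unsigned char flags = 0;
        for (size_t j = 0; j < 8 && base + j < len; j++) {
            if (cur[base + j] != prev[base + j]) {
                flags |= (unsigned char)(1u << j);
                out[o++] = cur[base + j];
            }
        }
        out[flags_at] = flags;
    }
    return o;
}

/* Decode in place: "vec" holds the previous vector on entry and the
   current vector on return. Returns the number of input bytes consumed. */
size_t delta_decode(unsigned char *vec, size_t len, const unsigned char *in)
{
    size_t i = 0;
    for (size_t base = 0; base < len; base += 8) {
        unsigned char flags = in[i++];
        for (size_t j = 0; j < 8 && base + j < len; j++) {
            if (flags & (1u << j))
                vec[base + j] = in[i++];
        }
    }
    return i;
}
```

A 16 byte vector in which only two bytes changed is sent as 4 bytes. A vector that did not change at all costs one byte per 8, which is where the 8x upper bound comes from.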
Flash programming can involve a lot of JTAG traffic. In some cases, such as programming a non-jtag parallel flash chip by manipulating the pins on a CPU, many hundreds of bits may need to be sent over JTAG for every byte programmed so there can be a couple orders of magnitude of inefficiency. For JTAG flash devices, the overhead is probably less than an order of magnitude.
Programming these devices appears to be very efficient, as Xilinx appears to have set things up so that you just have to dump the entire bitstream into the TDI pin without interruption, with a little setup and teardown before and after. This means that the number of JTAG operations is just a little more than the number of bits to be programmed.
Some applications require opto-isolation. Getting fast enough isolators can be a problem. Magnetic isolators may be preferable. An NVE IL261 ($9 digikey) is a 110Mbps part with 4 channels in one direction and 1 in the opposite. An ADUM1401CRWZ is similar (3 in one direction, 1 in the other). Note: consider an isolator that has inputs which line up with outputs and comes in a package in which low value resistors are also available.
The state of level translation ICs leaves a lot to be desired. Relying on pullup resistors can slow things down. 10K ohm pulling up 20pf gives a time constant of 200ns.
If you have 1000 pins, with 3 bits per pin, walking a 1 through each pin and reading the results would be about 3 million operations. Repeat for 0.
Shift instructions should not be used to access bits. They are inefficient as they repeat the same calculation with the same results many times. For an 8 bit port:
unsigned char keepmask;      // port bits that must be preserved
unsigned char tdimask;
unsigned char tmsmask;
unsigned char tdomask;
unsigned char tckmask;
unsigned char writexormask;  // output bits that the pod inverts
unsigned char readxormask;   // input bits that the pod inverts
unsigned char value;
int tdo;
value = inb(port) & keepmask;
if(tdi) value |= tdimask;
if(tms) value |= tmsmask;
if(tck) value |= tckmask;
value ^= writexormask;
tdo = !! ((inb(port) ^ readxormask ) & tdomask);
// !! converts to 0 or 1 type boolean, as opposed to any bit set
// (optional, depending on how used).
// outb(port, value) argument order is assumed here; Linux <sys/io.h> uses outb(value, port).
outb(port, value);
// Optional, if we are doing a full tck cycle.
value ^= tckmask;
outb(port, value);
Instead of trying to burn a flash using inefficient JTAG sequences, download a tiny bootloader. The bootloader can load the file using USB, ethernet, or high speed TTL serial. On the ARM SAM7 CPUs, there is a bootloader built into the chip, so no JTAG is even needed, though JTAG could probably be used to trigger the bootloader without changing a jumper and cycling power. Some processors and FPGAs have an internally accessible register that is also accessible via JTAG instructions. This can be used to communicate with a boot loader, and may work similarly to (but faster than) having a serial port. A JTAG on chip debugger can be used to write to external flash memory locations much faster than boundary scan. And some chips have an on-chip DMA mode (MIPS EJTAG). An FPGA bootloader "program" can be used as well.
Sequences for multiple devices can be interleaved but it can get tricky as one device may need a TAP controller state transition that corrupts the state of another device. Merging can save protocol overhead and it can take advantage of delays (such as during flash programming). External boundary scan for multiple devices can easily be merged; you shift the data into all devices simultaneously instead of sequentially loading each one. They are really parts of a larger test vector, anyway. Programming multiple similar flash devices with the same or different data should be possible if they use the same programming sequence but it gets more complicated if they use different sequences. Merging internal test data could become difficult. Simple vector sequences might be ok, but loading a new instruction, for example, might adversely affect the state of other devices. ISC_ILLEGAL_EXIT in the BSDL file lists instructions that will disrupt operation.
Ethernet pods can be useful in many situations. Ethernet provides 1500V of electrical isolation. Ethernet does not require the computer and the development system to be in the same room (though that is handy for debugging). Ethernet lets you download the software from your office while you are walking to the lab. An ethernet pod with a minimal LCD interface lets people on the production line program parts without a computer; the pod can communicate with a server which will serve images and tests. The pod can be a server (with human interface) or a client. The daemon can interact with the user through the pod.
These are my notes. I am considering implementing what you see written here. They may be
There is a need for a new well designed universal cross platform jtag suite (library, drivers, and applications) with a permissive license (NOT GPL) that can be used by any software, commercial or open source, with any pod, on any target, whether it is connected via parallel port, USB, or ethernet. Much of the existing software is infested with the GPL license, even at the library level. An orwellian copyleft license is focused on denying "bad" behavior (without regard to how many good activities are also denied) rather than allowing "good" behavior.
Just a few GPL incompatibilities:
Universal Non-disclosure agreement: if you don't disclose the technical details of your product we agree not to buy it.
A JTAG library needs to have a license that allows the code to be used in either a commercial pod or a permissively licensed one (and I don't give a damn whether or not it allows use in a GPLed pod), or otherwise embedded in a target system. And a JTAG front end client (command line or gui) needs to allow proprietary extensions (either from a vendor or an end user) and needs to be compatible with permissively licensed software. Even LGPL would be of somewhat limited value to the JTAG community.
I despise the GPL. Back when I started giving away code for free, free code was really free. Then Stallman came along and screwed it up. GPL tries to prevent people from doing "bad" things and it doesn't care who it hurts in the process and how many "good" things are prevented. Permissive licenses allow people to do "good" things and don't care if other people do "bad" things. The latter philosophy is vastly superior. GPL has little effect on psychopathic corporations but it can have an enormous effect on altruistic small businesses. A commercial library may cost a developer $100. A GPL library can cost him his livelihood. And those of us who are trying to give away as much as we can are far more affected by this than the freeloaders. The GPL "commons" is itself a freeloader. It steals from the public domain and permissive license commons but gives nothing back. If you want to make a real contribution to humanity, don't use GPL, use a permissive license (such as three clause BSD).
We all take from the creative commons. We should all try to give something back.
Here are some thoughts on an API and protocol structure. The idea is to keep the pod code simple, though some pods may implement more functionality. Most vectors involve a lot of clocking TDI with TMS static with some TMS stuff at the beginning and end. The TMS stuff can be almost bitbanged, i.e. we send a byte down the wire for each bit clocked (as opposed to two bytes). So, a typical vector could consist of a few JTAGBITBANGs or individual state transitons for setup, a STREAMSERIAL or COMPRESSSERIAL for the vector, and at the API level, many of the functions will have a default implementation that calls other simpler functions if an unimplemented function is called. JTAGBITBANGs can optionally be used, with some loss of efficiency, to handle extra TDI bits and the beginning and end of a VECTOR (such as bypass bits) without realligning the data.
Lengths will usually be in bits; streams data will be padded to an even number of bytes; the extra bits will be ignored.
Note that this structure will work for dumb parallel port pods, serial pods, dumb USB pods, smart USB pods, SPI pods, ethernet pods, and other configurations. Code which calls this API can be linked with a stripped down version to embed in a target system for on the fly reprogramming (or field updates). A driver can be written to program a device through the on-chip-debug or serial debugger on any micro, though it will be very slow. The same function names and calling conventions can be used above and below the stream encoder/decoder, allowing various functions to be used at various levels (including embedded). This API will work with drivers which are linked as shared libraries/DLLs, kernel drivers, standalone programs (pipe stdin/stdout), or TCP/IP daemons: anywhere you have either a procedure call interface or a binary stream capability. As a demonstration of this, a parallel port driver can be built such that it can be called from a pipe or TCP/IP daemon. In this way, all the non-microcontroller specific code can be debugged without the overhead of running on a target system.
The protocol functions could be divided into three files:
For operations that function on I/O ports, there should be various word typedefs. High level software will always use a 32 bit word, even for 8 bit ports, so that the same code can talk to 8, 16, and 32 bit micros.
API should allow concurrent access to multiple JTAG streams so that multiple boards can be tested together.
Multilayered stack (simplified)
Also, utility functions such as a BSDL parser, SVF/XSVF parser, chip database, etc.
Some thoughts on a human readable syntax for jtag commands. What is shown is at a low level, without showing each manipulation of the clock line. This notation could be used for the ASCII level of the protocol above (or equivalent binary commands), for debugging output, and for manual input.
Setup pin assignments
SETUP PIN[0],OUTPUT,TMS
SETUP PIN[1],INPUT,TDO
SETUP PIN[2],OUTPUT,TDI
SETUP PIN[3],OUTPUT,TCK
These lines set a control line without wiggling clock
TDI=0
TDI=1
TMS=0
TMS=1
TCK=0
TCK=1
TRST=0
TRST=1
SRST=0
SRST=1
These next ones clock one or more bits of data on the TMS or TDI line, leaving the other line intact:
TMS{11111} # JTAG Reset
# equivalent to TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0 TMS=1 TCK=1 TCK=0
TDI{0101010101010101} # a typical vector for shift DR or shift IR modes
# equivalent to TDI=0 TCK=1 TCK=0 TDI=1 TCK=1 TCK=0 TDI=0 TCK=1 TCK=0 ...
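The "equivalent to" expansion is mechanical, so it can be sketched in a few lines. This is only an illustration of the notation; the function name `expand` and the decision to accept any upper-case signal name are assumptions.

```python
import re

# SIGNAL{bits}: clock each bit out on SIGNAL with one TCK pulse per bit.
_CLOCKED = re.compile(r"^([A-Z]+)\{([01]+)\}$")


def expand(command):
    """Expand e.g. 'TMS{11}' into ['TMS=1', 'TCK=1', 'TCK=0', 'TMS=1', ...]."""
    text = command.split("#", 1)[0].strip()   # drop trailing comment
    match = _CLOCKED.match(text)
    if not match:
        raise ValueError("not a clocked command: %r" % command)
    signal, bits = match.groups()
    steps = []
    for bit in bits:
        steps += ["%s=%s" % (signal, bit), "TCK=1", "TCK=0"]
    return steps


print(" ".join(expand("TMS{11111} # JTAG Reset")))
```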
Thus:
TCK=0 # set TCK to zero, ready for next low to high transition
TDI=0 # Set TDI to known state
TMS{11111} # Test-Logic-Reset
TMS{01100} # Test-Logic-Reset -> Run-Test/Idle -> Select-DR-Scan -> Select-IR-Scan -> Capture-IR -> Shift-IR
TMS=0 # stay in Shift-IR state (Redundant)
TDI{00001111} # shift in instruction 00001111
(response 00000001)
TMS{11100} # Shift-IR -> Exit1-IR -> Update-IR -> Select-DR-Scan -> Capture-DR -> Shift-DR
TDI{01010101010101010101010101010101} # shift in data
(response 010001000100101010101010101010001)
TMS{11} # Shift-DR -> Exit1-DR -> Update-DR
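State annotations on TMS sequences are easy to get wrong by hand. A small table-driven model of the standard 16-state TAP controller can check them. The transition table follows the IEEE 1149.1 state diagram; the names `TAP` and `walk` are made up for this sketch.

```python
# state -> (next state when TMS=0, next state when TMS=1), one step per TCK rising edge
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Return the states visited, one per clocked TMS bit."""
    path = []
    for bit in tms_bits:
        state = TAP[state][int(bit)]
        path.append(state)
    return path


for start, bits in [("Test-Logic-Reset", "01100"), ("Shift-IR", "11100"), ("Shift-DR", "11")]:
    print("TMS{%s} # %s" % (bits, " -> ".join([start] + walk(start, bits))))
```

The same table also shows why `TMS{11111}` is a safe reset: five ones lead to Test-Logic-Reset from every state.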
Compatibility with existing software can be accomplished by:
The protocol described here could be used to communicate with an ethernet enabled pod. It could also be used to communicate with a server application running on top of a parallel or USB pod. Thus, you could for example download code from your development station while you walk to the lab. This mode could also assist software developers, as you can test software using hardware located at a user's site. And since it is designed to be forgiving of latency, you can do this at a considerable distance. A USB over ethernet gateway could be used to debug some pod issues.
A JTAG pod needs the following level of technical documentation to be considered acceptable:
When you withhold important technical documentation, you are hurting your customers more than your competitors.
We need to start a public domain database with:
Those developing jtag pods may wish to support some of these.
This is an open collector one wire protocol used on HCS12, 9S12, and ColdFire CPUs. BDM is documented. Timing is problematic for bit banged software implementations. A brief description (from memory): the speed of the protocol depends on the clock rate (PLL not XTAL) the CPU is using. At 16MHz, the pod generates a 1MHz clock rate which is modulated to 25% or 75% duty cycle to send ones and zeros. When it is time for the processor to send a response, the pod generates pulses with 25% duty cycle and the processor selectively stretches them to 75%. There are timeouts that further complicate matters. The BDM pulse frequency is 1/16th of the CPU clock. When sending, you pull the line low for 3 or 4 cycles to send a 1, or 12 or 13 cycles to send a 0. The pin is sampled on the 10th cycle. You have up to 512 cycles after the start of the last pulse to send the next pulse before the CPU times out, so the minimum frequency is about 31.25kHz at 16MHz. The file S12BDMV4.pdf in the 9S12DP512 Zipped Datasheets documents one variant of this protocol. BDM normally uses a 2x3 pin (0.100") male header on the circuit board:
BKGD 1 2 GND
3 4 RESET
5 6 VCC
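The BDM timing figures quoted above all scale with the CPU clock, so they can be tabulated for any clock rate. This sketch just restates those numbers as arithmetic. It assumes the longer of each quoted low time (4 cycles for a 1, 13 cycles for a 0), and the function and key names are made up.

```python
def bdm_timing(cpu_hz):
    """Derive BDM bit timing from the CPU (PLL) clock, per the description above."""
    cycle = 1.0 / cpu_hz
    return {
        "bit_rate_hz": cpu_hz / 16,        # one bit per 16 CPU cycles
        "low_for_one_s": 4 * cycle,        # assumed: 4 cycles low sends a 1
        "low_for_zero_s": 13 * cycle,      # assumed: 13 cycles low sends a 0
        "sample_point_s": 10 * cycle,      # pin is sampled on the 10th cycle
        "timeout_s": 512 * cycle,          # next pulse must start within 512 cycles
        "min_pulse_rate_hz": cpu_hz / 512,
    }


t = bdm_timing(16_000_000)
print(t["bit_rate_hz"], t["min_pulse_rate_hz"])
```

At 16MHz this gives a 1MHz bit rate and a 31.25kHz minimum pulse rate, matching the figures in the text.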
Note that programming an FT232R in RS-232 serial mode might let you interact with BDM (with an external resistor) at some clock rates. Sending the character FF will send a 1 clock pulse, FE a 2 clock pulse, FC a 3 clock pulse, F8 a 4 clock pulse, etc. So setting the baud rate to 1/4 the CPU frequency and sending FF and FC characters (and receiving similar characters) might be a way to bit bang the data across.
debugWIRE is a one pin (RESET) debugging protocol used on some Atmel 8 bit AVR microcontrollers. It is disabled by default and must be enabled using ISP. Atmel does not publish the debugWIRE protocol (only the protocol used to talk to their debug pods).
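The UART trick works because a serial character starts with a low start bit and then sends its data bits least significant bit first, so the number of leading zero data bits sets how long the line stays low. A sketch of the mapping, with made-up function names, assuming the baud rate is 1/4 of the CPU clock so that one bit time is 4 CPU cycles:

```python
def uart_char_for_low_bits(n):
    """Character that holds the line low for n bit times (start bit + n-1 zero data bits)."""
    if not 1 <= n <= 9:
        raise ValueError("a single 8-bit character can hold the line low for 1..9 bit times")
    return (0xFF << (n - 1)) & 0xFF


def bdm_bits_to_uart(bits):
    """Encode BDM bits as UART characters at baud = cpu_clock / 4.

    A 1 is a short pulse (FF: 1 bit time = 4 cycles low);
    a 0 is a long pulse (FC: 3 bit times = 12 cycles low).
    """
    return bytes(0xFF if bit else 0xFC for bit in bits)


print([hex(uart_char_for_low_bits(n)) for n in (1, 2, 3, 4)])
print(bdm_bits_to_uart([1, 0, 1]).hex())
```

Each character occupies at least 10 bit times (start, 8 data, stop), i.e. 40 CPU cycles per BDM bit at this baud rate. That is slower than the 16 cycle bit time but well inside the 512 cycle timeout.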
(TDO) MISO 1 2 VCC (VTref)
(TCK) SCK 3 4 MOSI (TDI)
(nSRST)RESET 5 6 GND (GND)
Signal names in parentheses are the lines on the JTAG10PIN connector these signals are wired to when adapting a JTAGICE MKII to use ISP/debugWIRE on a 6 pin connector.
Many AVR micros can be reprogrammed using the SPI port while reset is held low. It appears that in ISP programming mode, the on chip SPI is in slave mode and thus the directions of the MOSI and MISO pins are reversed from application mode. Alternatively, RESET may be pulled up to 12V for high voltage programming (faster?). It is likely that these pins are used for other purposes on the board so you might have to fight with other drivers (hopefully with series resistors).
It appears that the ISP protocol is used for programming only and the debugWIRE protocol is used for debugging only.
Spider programmer page has schematics for AVR ISP.
This is apparently a nightmare as the specs are not released and they change drastically from chip to chip.
This protocol is not an ISP protocol. It is normally used to communicate with sensors and other one wire devices.
Some folks have made modified code to run on the version of this cable that is built into some starter kits. Nobody seems to buy the standalone cable, for obvious reasons. xup and the ixo usb_jtag projects are ones that have downloaded modified code. The code is pretty primitive and slow.
This section lists some chips which can be used to make USB JTAG pods.
Dumb USB-to-serial adapter (full speed) with bit bang mode. Very low total parts count, just a few external caps and the connectors. Internal crystal. Internal USB resistors. Internal configuration EEPROM. 1.5V-5V VCCIO. Lacks the multiprotocol serial engine of the FT2232, so you can't send 8 bits per byte and instead need to send two bytes per bit.
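To make the "two bytes per bit" cost concrete, here is a sketch of how a bit-bang byte stream might be built: every clocked bit needs one byte with TCK low and one with TCK high. The pin-to-bit assignments below are hypothetical (they depend on how the pod is wired), and no FTDI driver calls are shown; the output is just the raw bytes that would be written to the chip in bit bang mode.

```python
# Hypothetical wiring: which bit of the bit-bang port drives which JTAG line.
TCK = 0x01
TDI = 0x02
TMS = 0x08


def bitbang_bytes(tdi_bits, tms=0):
    """Build the byte stream that clocks tdi_bits out with TMS held static."""
    out = bytearray()
    for bit in tdi_bits:
        level = (TDI if bit else 0) | (TMS if tms else 0)
        out.append(level)          # TCK low: set up TDI/TMS
        out.append(level | TCK)    # TCK high: target samples TDI/TMS
    return bytes(out)


stream = bitbang_bytes([1, 0, 1, 1])
print(len(stream), stream.hex())
```

A 32 bit vector therefore costs 64 bytes on the wire, which is where most of the speed goes on this kind of pod.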
Dumb dual port USB-to-serial adapter (full speed) with bit bang mode. 16 times faster than the FT232RL. Needs external serial configuration eeprom, external crystal, and more caps than the FT232RL. 5V/3.3V IO so level converters are needed.
USB full speed, ethernet MAC (requires external PHY), 48MIPS? ARM7 32 bit core, SPI, JTAG TAP. You could probably achieve around 3.2MHz bidirectional or 6.4MHz single directional (ignoring TDO) without compression (USB bus limited) and maybe faster with compression, depending on the operation. Using ethernet, it should be able to achieve 25Mbps. By switching a jumper, you can activate an internal boot program that allows downloading by USB, among other sources. Requires around 32 external components including voltage regulator, crystal, decoupling caps, USB resistors, PLL filter, JTAG connector, JTAG pullup resistors, USB connector, and jumpers (TST, ERASE, JTAGSEL) to make a basic USB micro before you add the application specific stuff (level translation), assuming you want full download/debug capability via JTAG and USB. Additional components, including PHY and connector, required for ethernet.
32 bit CPU, USB on-the-go full speed, ethernet MAC, 512K flash, 80 Dhrystone MIPS, AVR32 core, SPI, peripherals similar to SAM7
USB full speed, 8051 core, lacks JTAG TAP, lacks on chip debug, 16K RAM, no flash (program has to be downloaded from PC every time unless you use external RAM), internal USB endpoint FIFOs, I/O FIFO with state machine
Port        GPIO  Master  Slave            Other
----        ----  ------  -----            -----
SCL                                        (I2C)
SDA                                        (I2C)
D+                                         (USB)
D-                                         (USB)
XTALIN                                     (oscillator)
XTALOUT                                    (oscillator)
RESET#
WAKEUP#
PE0/T0OUT
PE1/T1OUT
IFCLK                                      external 5-48MHz FIFO clock (or uses internal 30/48MHz clock)
CLKOUT                                     8051 clock (osc PLL divided down)
?                 RDY0    SLRD             read strobe for fifo in slave mode
?                 RDY1    SLWR             write strobe for fifo in slave mode
?                 CTL0    FLAGA
?                 CTL1    FLAGB
?                 CTL2    FLAGC
PA0         GPIO                           INT0# interrupt
PA1         GPIO                           INT1# interrupt
PA2         GPIO  GPIO    SLOE             fifo handshake? (page 113)
PA3         GPIO  GPIO    GPIO             WU2 Power management (page 89)
PA4         GPIO  GPIO    FIFOADR0         fifo handshake?
PA5         GPIO  GPIO    FIFOADR1         fifo handshake?
PA6         GPIO  GPIO    PKTEND           fifo handshake?
PA7         GPIO  GPIO    PA7/FLAGD/SLCS#  fifo handshake?
PB0         GPIO  FD[0]   FD0[0]           (fifo data)
PB1         GPIO  FD[1]   FD0[1]           (fifo data)
PB2         GPIO  FD[2]   FD0[2]           (fifo data)
PB3         GPIO  FD[3]   FD0[3]           (fifo data)
PB4         GPIO  FD[4]   FD0[4]           (fifo data)
PB5         GPIO  FD[5]   FD0[5]           (fifo data)
PB6         GPIO  FD[6]   FD0[6]           (fifo data)
PB7         GPIO  FD[7]   FD0[7]           (fifo data)
PD0         GPIO  FD[8]   FD0[8]           (fifo data)
PD1         GPIO  FD[9]   FD0[9]           (fifo data)
PD2         GPIO  FD[10]  FD0[10]          (fifo data)
PD3         GPIO  FD[11]  FD0[11]          (fifo data)
PD4         GPIO  FD[12]  FD0[12]          (fifo data)
PD5         GPIO  FD[13]  FD0[13]          (fifo data)
PD6         GPIO  FD[14]  FD0[14]          (fifo data)
PD7         GPIO  FD[15]  FD0[15]          (fifo data)
100 and 128 pin packages provide two additional ports
Use T0OUT to drive IFCLK? Auto reload division up to divide by 255, otherwise manual reload needed. Prescaled by 4 or 12, giving 12 or 4MHz, which is a bit slow for full speed JTAG. Maybe we can program the GPIF state machine to make a higher clock from the 48MHz clock? Chip lacks SPI which could be used for fast programmed I/O. Connecting SLWR to TCK may be helpful in using the pod to snoop a JTAG interface. May not have individual data direction control on the FIFO data lines, which is a real problem. Not sure we can use two FIFOs, one for input and one for output. Indeed the FIFO looks really limited for such applications. There is no SPI to allow data to be clocked out by hardware 8 bits at a time. You can see why people use a CPLD with this chip.
It looks like 25MHz would be possible with this chip with an external CPLD to drive two internal FIFOs in slave mode. Without the CPLD, you aren't going to get anywhere close. Despite the fancy programmable state machine, this chip is a dog. The fact that it uses an 8051 core was your first clue.
Same as the FX2 only limited to USB full speed.
(sampling) 1MB Flash, USB high speed, 64K RAM, ARM7TDMI 60MHz, JTAG OCD but no JTAG boundary scan?
USB high speed, 32Kx8 flash, 3.5Kx8 RAM +1.5K for USB FIFOs, 64-LQFP or 100-LQFP, about $9 digikey, 8bit HCS12 CPU, similar chip with ethernet but no USB available. No JTAG. BDM used for debugging. Controllers for SD, compact flash, IDE, etc. 3.3V/5V I/O. UART but no SPI, though the SD card port might support SPI. NOPE, SD controller does not support SPI though it may have the 4 bit equivalent. Operation up to 20MHz. Queue controller for transferring data to peripherals. Does not appear to have the ability to load firmware over USB. However, once the device has been programmed once with a boot loader, this could be done (I have written such a loader for non-USB versions of this chip).
ARM Cortex 32 bit CPU, 128K Flash, 20K RAM, USB full speed, two ADC, seven timers, 2xI2C, 3xUART, 2 SPI (18MHz), CAN, no ethernet, LQFP100, JTAG, serial wire debug (SWD), $9 digikey, smaller versions available. Does not support flash programming via USB unless you have previously installed a bootloader via JTAG. Five 16 bit GPIO ports with haphazard pinout.
ARM7TDMI microcontroller with 1MB flash, 64KB RAM, USB High Speed $12.64 digikey, no ethernet. Dual DAC, Dual ADC, No SPI? but has SD/MMC card interface. No PWM? Only available in BGA (3 rows of balls). Samples only. Webmaster needs to be taken out and shot: JAVA, FLASH, unauthorized opening of windows, content is a mess.
As time and resources allow, I am working on a ton of JTAG related projects. They are highly interdependent so they are being done in parallel.
I have designed a very simple compact FT232RL based JTAG adapter. 1.5? to 5V. No PCB yet. Will probably be around 400kbps. May be USB flash drive sized.
The pod logic can be programmed into an FPGA for very high performance.
As described here
One of the big issues in making a JTAG pod is level translation. The state of level translation ICs leaves a LOT to be desired. You would think that there would be a lot of chips that you could apply 1.2-5V power and logic on port A and 1.2-5V (or 0V) power and logic on port B. Well, that isn't the case. Many level translators assume that one power bus will always be higher than the other. You also need to deal with hot plugging where signal lines might get connected before Vref. And you can have any combination of the POD and the target being powered up or down. TI has a voltage translator selection application note that is informative; it would be more useful if there were actually good parts to choose from. Pullup resistors should not be used to pull the output of a driver higher than its supply voltage. Also, many level translators don't work at 5V. Many level translators have output enable or direction signals but don't be surprised if the input is connected to the wrong supply voltage. Many 5V devices have TTL not CMOS levels, which must be taken into account when doing voltage translation. Very few translators go from 1.2 to 5V on either port, let alone both ports with either VCC higher. And good luck finding a suitable part that also has a second source.
Maxim has level translation tips, though they are not adequate for a serious pod.
A typical JTAG pod might have 3.3V logic and need to interface to 1.2 to 5V logic. This means the other side of the translator could have higher or lower supply voltage. If the POD uses 5V logic, conversion may be simpler.
The state of single directional voltage translation leaves enough to be desired; when dealing with bidirectional signals it gets worse. Bit programmable bidirectional signals, such as would be found on a GPIO port or on a JTAG pod that allows flexible pin assignments, are particularly problematic for level translation. It is one thing to translate a bidirectional signal when you have a direction signal to work with and another when the translator has to guess. Suppose you have a high level on a port A input. Is it high because the micro drove it high, or is it high because the translator is driving it high because it thinks port B is being driven high? Voltage clamps with passive pullups don't have this problem but they have the usual issues with pullups.