Chapter B2. Theory of Operation

This chapter describes the CFT processor from a theoretical perspective.

For a hardware description of the units described here, and of minor units beyond the scope of a theoretical discussion, please refer to hardware-description. For a description of the processor's programming model, please refer to chapter-b1-programming-model. These two chapters and this one are necessarily interdependent.

The CFT processor was purposefully designed to be very simple. This allows it to be built with relatively simple manual techniques, but also makes it easier to debug, maintain and reason about.

To keep things simple and modern, the CFT is a Von Neumann architecture: it uses the same memory for both programs and data, allowing the two to be mixed. You're reading this on a computer with the same architecture.

Like many traditional computer designs, it is an accumulator-based architecture, also known as a one-operand machine: operations are built around a single main register (the Accumulator), which implicitly supplies one operand and usually receives the result.

Internally, the design (like most hobbyist designs out there) is built on microcode hosted on Flash EEPROMs, which allows many bugs in the processor to be solved by reprogramming the Flash memory rather than rewiring.

B2.1. Datapath

Processors are built out of simple, discrete units connected together by buses and control signals. The CFT is no different. The processor’s datapath, the way data flows from unit to unit, is shown in th-datapath.

The datapath is organised around a single internal bus (shown surrounding the datapath diagram), the IBUS. Modern processors use multiple internal buses and split their work into pipeline stages: this allows one processor unit to start work before the previous one has completed, increasing performance but also the complexity of the design. In contrast, the IBUS is not pipelined. At any given time, at most one unit is writing to it, and (in normal circumstances) at most one unit is reading from it.

This precludes pipelining techniques, but keeps the processor’s behaviour easy to visualise and implement. A number of units may either write to the IBUS, or read from it. The IBUS may also be connected to the external Data Bus (DBUS) via the Data Bus Transceiver, so that data can be exchanged between the processor and its peripherals.

The Control Unit (near the top right corner) is not directly attached to the IBUS, but controls the rest of the processor. To do so, it looks up the values of various registers and flags in a large microcode table and activates control signals as dictated by the table, driving the rest of the processor.

The Arithmetic and Logic Unit (ALU) is responsible for the arithmetic and logic operations performed by the processor. It can perform a number of binary and unary operations. All operations involve the Accumulator as one of the operands. Binary operations involve the current value of the Accumulator (AC) and the value of the IBUS. Unary operations involve only the AC. The ALU’s operations update some of the flags used by the Control Unit.

Ancillary to (and in fact part of) the ALU is the Constant Store, which can provide a small number of constant values to the IBUS. These are useful for a number of operations, such as clearing registers by assigning them the zero constant, or initialising values of registers when the machine is reset.

There are three major registers in the datapath: the AC, the DR, and the PC. Major registers are 16 bits wide. They may be read from or written to, incremented by one, or decremented by one.

Of these, the AC is used as a general-purpose register, permanently and directly supplying the left operand of the ALU, and generating flags used by the Control Unit in decision making.

The Program Counter (PC) is used to store the location in memory of the next instruction to be executed.

The Data Register (DR) is used to temporarily store intermediate addresses used for indirect memory accesses but may be generally used by the microcode as a scratch register.

A number of minor registers are also available. These are either narrower than 16 bits, or have various restrictions placed on them. Many are essentially the flag bits of more modern processors.

The Interrupt Flag (I) is a single-bit register that controls whether asynchronous interrupts (e.g. peripherals needing attention) may temporarily stop the processor.

The L (Link) register is a versatile single-bit register: it is used as a flag, a carry-out bit, a carry-in bit, a borrow bit, or a shift register, and is usually treated as a one-bit extension of the AC register.

The Overflow flag (V) is a single-bit flag set by the ALU if an addition has generated a result that does not fit in 16 bits as a signed value. This is used for signed, two’s complement arithmetic; the L register serves the same purpose for unsigned arithmetic.

The Zero Flag (Z) is set if and only if all the bits of the AC are zero. It is commonly used for decision making.

The Negative Flag (N) is set if and only if the most significant bit of the AC is set. In signed arithmetic, this signifies the value is negative, but it may be used to simply test that bit of the accumulator. It is commonly used in decision making.
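How the four flags relate to a 16-bit addition can be sketched as a small behavioural model. This is not the ALU's ROM contents: add16 is a hypothetical name, and the sketch assumes L simply receives the carry out of bit 15, whereas the real microcode may treat L differently (the PDP-8, for instance, complemented its link on carry).

```python
from __future__ import annotations

MASK16 = 0xFFFF


def add16(ac: int, b: int, carry_in: int = 0) -> tuple[int, dict[str, int]]:
    """Add two 16-bit values; return the 16-bit result and the L, V, Z, N flags.

    Assumption: L simply receives the carry out of bit 15.
    """
    ac &= MASK16
    b &= MASK16
    full = ac + b + carry_in
    result = full & MASK16
    flags = {
        # Unsigned overflow: the sum needed a 17th bit.
        "L": (full >> 16) & 1,
        # Signed overflow: both operands share a sign that the result lacks.
        "V": ((~(ac ^ b) & (ac ^ result)) >> 15) & 1,
        # Zero: every bit of the result is clear.
        "Z": int(result == 0),
        # Negative: the most significant bit of the result is set.
        "N": (result >> 15) & 1,
    }
    return result, flags


print(add16(0x7FFF, 0x0001))  # signed overflow (V = 1), no carry (L = 0)
print(add16(0xFFFF, 0x0001))  # carry out (L = 1), zero result (Z = 1)
```

The two examples show why both flags are needed: 7FFF + 0001 is fine as an unsigned sum but overflows as a signed one, while FFFF + 0001 is the opposite case.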

The Instruction Register (IR) is a 16-bit register containing the instruction being executed. From the point of view of the IBUS, it can only be written to, but its value directs the Control Unit’s behaviour directly.

The Address Register (AR) is a write-only 16-bit register that stores an address. When required, it writes this address onto the Address Bus to facilitate external read/write cycles. This address selects a memory location or device to access.

Memory addresses used for data are calculated by the Address Generation Logic (AGL), which is responsible for implementing addressing modes. The AGL can generate 16-bit addresses either close to the current instruction (by using the top bits of the PC), or near the beginning of memory (by zeroing the top bits).

The Reset unit is responsible for initialising the state of the processor on power on or cold reset. It establishes the initial value of the PC and other registers, and times reset signals sent to peripherals.

Finally, the MBU is a recent addition to the processor: it used to be an external peripheral but has become so useful it was absorbed by the processor proper. It extends the memory address space of the CFT from 16 bits to 21 bits by implementing a simple memory banking scheme.

B2.2. Processor States

The major states of the processor are very conventional, as are the transitions between them:

Reset. The initial state. The processor enters this state asynchronously when it is reset or when power is applied. It remains in this state for a set number of clock periods, according to the operation of the reset sequencer. During the reset state, slow units stabilise (especially useful just after power on, or after a brown out), and numerous registers in the computer are cleared to sane values.
Fetch. In this state, the processor performs a memory read to load the next instruction into the IR, which implicitly jumps to the appropriate microprogram and to the Execute state. The Fetch state is entered at the end of the Reset state; at the end of the Stop state once the computer is no longer halted; at the end of the Interrupt state once the interrupt microprogram has executed; and at the end of the Execute state, when the microprogram signals its end. The Fetch-Execute loop forms the implicit Run state. In fact, Fetch and Execute are not explicitly signalled: Fetch is simply the first memory read cycle of a microprogram, and Execute is the remainder. The distinction is only useful in theory, although the two states are also displayed on the front panel lights.
Execute. In this state, the instruction retrieved in the Fetch state is executed. This state is only entered at the end of the Fetch state and is where all the processing is carried out. At the end of the Execute state, the processor usually re-enters the Fetch state to retrieve the next instruction, but may also enter the Interrupt state.
Interrupt. The Interrupt state is entered at the end of the Execute state if an interrupt has previously been signalled and interrupts are unmasked. In this state, the processor saves certain registers and jumps to a hardwired location holding an interrupt service routine. The actual workings of interrupts are slightly more complicated than this, as outlined in sec-interrupts.
Stop. In this state, the processor’s microprogram counter is inhibited, freezing the processor. The clocks are still running, allowing peripherals that use them to operate. This state is entered while the computer is halted. The processor’s design is fully static, so it may stay in the Stop state indefinitely.

Figure 8. CFT Processor Major States: Reset (R), Fetch (F), Execute (E), Stop (S), and Interrupt (I).

The Wait State is a transient state. It may be entered at any time, though it has special meaning during memory or I/O cycles and is easier to generate during them. As long as the processor is in the wait state, the control unit extends its current operation. Wait states are meant to be used with devices too slow to handle the processor’s read or write cycles. They allow most of the processor to operate at its top speed, slowing down only when communicating with such devices.

The processor spends nearly its entire runtime flipping between the Fetch and Execute states. This is known as the Fetch-Execute Cycle and follows this algorithm:

1. IR ← mem[PC]: an instruction is read from the memory address contained in the PC.
2. PC ← PC + 1: the PC is incremented by one.
3. The instruction is decoded.
4. The instruction is executed.
5. Go to step 1.

These steps are implemented at the microcode level with some overlap. The first two steps take two clock ticks (known as clock cycles): memory accesses take two cycles, at the end of which the PC is incremented. Instruction decoding happens during this time as well. Instruction execution begins with the third cycle and can last several cycles.
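The loop and its two-cycle fetch can be illustrated with a behavioural model. This is a sketch under stated assumptions, not the microcode: run and the execute callback are hypothetical stand-ins for the per-instruction microprograms, memory is a plain dictionary of 16-bit words, and execution starts at the reset address FFF0.

```python
from __future__ import annotations

from typing import Callable

RESET_ADDRESS = 0xFFF0  # execution starts here after reset
FETCH_CYCLES = 2        # the instruction read takes two processor cycles


def run(
    memory: dict[int, int],
    execute: Callable[[int, int], tuple[int, int]],
    max_instructions: int,
) -> tuple[int, int, list[int]]:
    """Run the Fetch-Execute Cycle for a fixed number of instructions.

    execute(ir, pc) is a hypothetical stand-in for an instruction's
    microprogram: it returns the new PC and the cycles execution took.
    Returns the final PC, the total cycle count and the fetched IR values.
    """
    pc = RESET_ADDRESS
    cycles = 0
    fetched = []
    for _ in range(max_instructions):  # step 5: loop back to step 1
        ir = memory[pc]                # step 1: IR <- mem[PC]
        pc = (pc + 1) & 0xFFFF         # step 2: PC <- PC + 1, end of the read
        cycles += FETCH_CYCLES         # step 3: decoding overlaps these cycles
        fetched.append(ir)
        pc, spent = execute(ir, pc)    # step 4: execute from the third cycle
        cycles += spent
    return pc, cycles, fetched


memory = {0xFFF0: 0x1234, 0xFFF1: 0x5678}  # arbitrary placeholder words
print(run(memory, lambda ir, pc: (pc, 1), 2))  # final PC FFF2 after 6 cycles
```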

The fetch-execute cycle of the CFT is implemented in microcode. Almost every microprogram begins with two microcode instructions to fetch the next instruction from memory and increment the PC. The part of each microprogram up to and including the incrementing of the PC is conventionally called the fetch state, though the computer treats it like ordinary microcode processing. In fact, the only point where Fetch/Execute is displayed is on the front panel for the user’s convenience.

B2.3. Reset Circuitry

The reset circuitry generates appropriate reset signals as necessary, and prepares the processor for operation.

To do this, it uses a number of different sources for reset signals including the front panel and power supply’s health status (for brown-out detection). While a reset signal is held active, the processor is immediately halted and the reset sequence begins. During this sequence, the reset circuitry puts the value FFF0 on the IBUS, while the processor repeatedly writes this value to the PC and resets its other registers.

The repetition allows the clock generator and processor to stabilise during a cold start, but also generates a long reset pulse for any peripherals connected to the processor. Many peripherals need reset pulses many microseconds in length, and the reset sequence is calibrated to fit such peripherals.

At the end of the sequence, the reset signal becomes inactive, and the Control Unit enters the Fetch-Execute Cycle, starting execution at the reset address FFF0.

Figure 9. Schematic of the Reset circuitry.

B2.4. Clock Generator

The clock generator works using a 16.000 MHz crystal oscillator as a source. The clock's pulse train is fed into half of a 74HC253 multiplexer (IC1) that allows the clock to be stopped or stepped using external (front panel) active-high signals FPCLKEN and FPUSTEP according to the following truth table:

FPCLKEN	FPUSTEP	FASTCLK	Output	Notes
0	0	X	0	Output is FPUSTEP
0	1	X	1	Output is FPUSTEP
1	X	0	0	Output is FASTCLK
1	X	1	1	Output is FASTCLK

This is possibly the most idiotic way ever of implementing a 2:1 multiplexer. In early revisions of this board, the clock generator had three clock sources: a full-speed clock and two slow clocks generated by a 556 (a dual 555 timer IC). This has been moved to the front panel controller board, but the multiplexer somehow stayed behind even though it could have been replaced by something much simpler.

At any rate, when FPCLKEN is high, the processor clock is the 16 MHz clock from the crystal oscillator.

When FPCLKEN is low, the clock is driven by the front panel signal FPUSTEP. The front panel can strobe this at different rates to generate various slow clocks, or single microsteps, or (with the aid of an external state machine) run the clock to step instruction by instruction.

The selected clock (internal or external) clocks a 74HC193 counter (IC2) configured to count up. The counter's output is reset to zero on reset. On the rising edge of the clock, the output increments by one. The least significant two bits of this counter are in turn fed to both halves of a 74HC139 dual 2-to-4 line decoder/demultiplexer (IC3).

The two sets of decoder outputs form two different clock setups:

Four 75% duty cycle, 4 MHz clocks named CLK1, CLK2, CLK3 and CLK4, each with 90° phase difference.
Two 50% duty cycle, 4 MHz clocks named T12 and T34, with a 180° phase difference. Of these, only T34 is used in the processor.

Taken together, the four 75% duty cycle clocks provide a rising and a falling edge every 62.5 ns, one period of the 16 MHz crystal.
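The phase generation can be modelled in a few lines. This sketch assumes CLKn is the decoder output that goes low during phase n of the processor cycle (counter value n − 1); the text does not spell out that mapping, so treat it as an assumption. phase_clocks is a hypothetical name, and T12/T34 are not modelled.

```python
from __future__ import annotations

CRYSTAL_HZ = 16_000_000
PHASE_NS = 1e9 / CRYSTAL_HZ  # one crystal period: 62.5 ns


def phase_clocks(count: int) -> dict[str, int]:
    """Levels of CLK1-CLK4 for a given value of the 74HC193 counter.

    Only the two least significant bits matter. The 74HC139 outputs are
    active low, so exactly one clock is low and the other three are high.
    Assumption: CLKn is low when the counter's two LSBs equal n - 1.
    """
    phase = count & 0b11
    return {f"CLK{n}": 0 if phase == n - 1 else 1 for n in range(1, 5)}


for count in range(4):
    print(f"t = {count * PHASE_NS:5.1f} ns", phase_clocks(count))
```

Each clock is high for three of the four 62.5 ns phases (the 75% duty cycle) and repeats every 250 ns, which is where the 4 MHz figure comes from.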

Figure 10. The four phases of a processor cycle. Please note that the microcode fetch and decode stages happen asynchronously, since microcode ROM access times are higher than 25% of the clock period. The DBUS is never accessed during the first half of the processor cycle, which allows other devices (such as a VDU or DRAM refresh circuitry) to access the bus.

Figure 11. Schematic of the Clock Generator.

B2.5. The Microcode Sequencer

This group of units forms the core of the Control Unit, and in fact nearly all of it. It includes:

The Microprogram Counter (µPC)
The Microcode Store
The Read Unit decoder
The Write Unit decoder
The Skip and Branch logic
The Instruction Register (IR)

B2.5.1. The Microprogram Counter (µPC)

This is a four-bit counter that selects the microinstruction to execute. This limits each microprogram to 16 steps, which is enough for our purposes. Since the CFT doesn't have niceties like microprogram jumps, implementing a 4-bit counter is a simple job. Or is it? Well, it turns out there are some subtleties.

The counter is implemented around a 74HC161 presettable, synchronous 4-bit counter with twin enables and reset. And we use it to full advantage. Here's what it does:

While RSTHOLD is asserted, the counter resets to zero asynchronously. This is part of the processor's reset sequence, of course. We do this by connecting RSTHOLD to the counter's active-low CLR input.

Every operation other than resetting has to happen synchronously, at the end of the processor cycle. The IC needs a rising edge, so we use CLK4 as its clock.

At the end of an instruction, when END is asserted, the counter also resets to zero, but it has to do this synchronously. This is done by tying the counter's four preset inputs to ground, so the preset value is always 0000, and asserting its active-low LD input. I know I wrote ‘when END is asserted’, but I lied again. There are two ways to end an instruction:

the control unit can assert END, or
a processor extension plugged into the Expansion Bus can assert ENDEXT, which is an open drain signal.

We combine the two by pulling up ENDEXT (we're using CMOS, after all, and No Floating Inputs is our mantra) and using a negative logic OR gate (i.e. an AND gate).

Back to the counter's requirements.

During a Wait State, when WS is asserted, the counter has to stop dead in its tracks and not increment. We do this by connecting the WS signal to the counter's ENP (CEP in some datasheets) count enable active-high input. When WS is asserted it's low, and the counter's ENP input is disabled.

We also don't want the microcode store churning out instructions while the computer is halted, so the counter must be stopped in the Halt state. To do this, HALT is connected to the counter's ENT enable input (CET in some datasheets).

Note that the two enable inputs do subtly different things, but only if carry out is used, and it isn't here. In the end, the counter's behaviour looks like this:

RSTHOLD	CLK4	END	ENDEXT	WS	HALT	Behaviour
0	X	X	X	X	X	Clear to 0000.
1	↑	0	X	X	X	Preset to 0000.
1	↑	1	0	X	X	Preset to 0000.
1	↑	1	Z	0	X	Do nothing.
1	↑	1	Z	X	0	Do nothing.
1	↑	1	Z	1	1	Count up.
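The table above can be restated as a behavioural model. This is a sketch, not a netlist: next_upc is a hypothetical name, every control input is active low (0 means asserted), and the undriven, pulled-up state of the open-drain ENDEXT line (Z in the table) is represented by None.

```python
from __future__ import annotations


def next_upc(upc: int, rsthold: int, clk4_rising: bool, end: int,
             endext: int | None, ws: int, halt: int) -> int:
    """Return the µPC value after applying the inputs from the table above.

    All control inputs are active low (0 = asserted). endext=None models the
    undriven open-drain line, which the pull-up holds high (Z in the table).
    """
    if rsthold == 0:
        return 0                        # CLR: asynchronous clear to 0000
    if not clk4_rising:
        return upc                      # synchronous actions need CLK4's edge
    endext_level = 1 if endext is None else endext
    if (end & endext_level) == 0:       # negative-logic OR of END and ENDEXT
        return 0                        # LD asserted: preset inputs are 0000
    if ws == 0 or halt == 0:
        return upc                      # ENP or ENT low: hold
    return (upc + 1) & 0xF              # count up; 1111 wraps to 0000


upc = 0
for _ in range(3):
    upc = next_upc(upc, rsthold=1, clk4_rising=True, end=1, endext=None, ws=1, halt=1)
print(upc)  # 3
upc = next_upc(upc, rsthold=1, clk4_rising=True, end=0, endext=None, ws=1, halt=1)
print(upc)  # 0: END restarts the microprogram
```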

B2.5.2. The Microcode Store

CFT microcode is 24 bits wide and stored in three 512K×8 (512 KiB) Flash devices. They're not ROMs per se, so I can pull and reprogram them whenever I discover glaring errors in the microcode. Which I hope will be never, but I know better.

All three devices obviously receive the same inputs and output a 24-bit control vector. They are permanently selected by tying their CE inputs to ground, and wired to behave like ROMs by tying their WE inputs to Vcc. Their OE inputs are controlled by RESET and HALT through a NAND gate according to this truth table:

RESET	HALT	OE	Notes
1	1	0	Normal operation, ROMs enabled.
0	X	1	Reset asserted, tri-state microcode store.
X	0	1	Processor halted, tri-state microcode store.

The ROMs form a massive truth table with 15 inputs and 24 outputs. Here are the inputs:

Bits	Field
14	RSTHOLD
13	IRQS
12	FV
11	FL
10–6	IR11–15
5	SKIP
4	AINDEX
3–0	µPC

RSTHOLD:: active low. Indicates the processor is being reset.
IRQS:: active low. Indicates an interrupt has been seen and must be serviced.
FV:: the current value of the overflow flag.
FL:: the current value of the link flag.
IR11–15:: the five most significant bits of the IR, including the indirection field (least significant bit) and the op-code.
SKIP:: active low. Indicates a previously evaluated condition is true and a skip needs to be taken. (usually)
AINDEX:: active low. Indicates that the IR's operand field contains a value in the range 080–0FF.
Microprogram Counter (µPC):: all four bits of the microprogram counter.
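Packing these fields into the 15-bit microcode address can be sketched as follows. The field names and bit positions come from the input table above; the function name ucode_address and the keyword spellings (IR11_15 for IR11–15, UPC for µPC) are our own.

```python
# (field, least significant bit, width) for the 15-bit microcode address
ADDRESS_FIELDS = [
    ("UPC", 0, 4),
    ("AINDEX", 4, 1),
    ("SKIP", 5, 1),
    ("IR11_15", 6, 5),
    ("FL", 11, 1),
    ("FV", 12, 1),
    ("IRQS", 13, 1),
    ("RSTHOLD", 14, 1),
]


def ucode_address(**fields: int) -> int:
    """Combine the named input fields into a microcode ROM address."""
    address = 0
    for name, lsb, width in ADDRESS_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        address |= value << lsb
    return address


# Step 0 of a microprogram, with every active-low input de-asserted (1)
# and the flags and IR bits all zero.
idle = ucode_address(RSTHOLD=1, IRQS=1, FV=0, FL=0, IR11_15=0,
                     SKIP=1, AINDEX=1, UPC=0)
print(hex(idle))  # 0x6030
```

Fifteen inputs give 2^15 addresses, one sixteenth of the 2^19 words a 19-address-line part holds.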

The output of the microcode ROMs looks like this:

Bits	Field
23	END
22	WEN
21	R
20	IO
19	MEM
18	DEC
17	STPAC
16	STPDR
15	INCPC
14	CLI
13	STI
12	CLL
11	CPL
10–7	OPIF
6–4	WUNIT
3–0	RUNIT

RUNIT:: four bits. Read from unit. Selects a unit to read from. This enables the appropriate tri-state IBUS driver. The values of this field are discussed in The Read Unit Decoder.
WUNIT:: three bits. The specified unit reads from the IBUS. The values of this field are discussed in The Write Unit Decoder.
OPIF:: four bits. This field instructs the Skip and Branch Logic (SBL) to perform an operation, the result of which will be checked in the next processor cycle. The exact values of this field are discussed below.
CPL:: complements the L register when low.
CLL:: clears the L register when low.
STI:: sets I when low.
CLI:: clears I when low.
INCPC:: increments the PC when low.
STPDR:: steps the DR when low.
STPAC:: steps the AC when low.
DEC:: if low, STPDR and STPAC decrement the DR and AC respectively. Otherwise, they increment them.
MEM:: if low, request a memory cycle.
IO:: if low, request an I/O cycle.
R:: if low, request a read cycle from memory or I/O space.
WEN:: if low, request a write cycle to memory or I/O space.
END:: if low, ends execution of this microprogram.
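Going the other way, a 24-bit control vector can be split into its fields with shifts and masks. In this sketch decode_control and asserted are hypothetical helpers; the bit positions follow the output table above, and all single-bit signals are treated as active low, as their descriptions state. The microinstruction used in the example is made up for illustration.

```python
from __future__ import annotations

# (field, least significant bit, width) for the 24-bit control vector
CONTROL_FIELDS = [
    ("RUNIT", 0, 4), ("WUNIT", 4, 3), ("OPIF", 7, 4), ("CPL", 11, 1),
    ("CLL", 12, 1), ("STI", 13, 1), ("CLI", 14, 1), ("INCPC", 15, 1),
    ("STPDR", 16, 1), ("STPAC", 17, 1), ("DEC", 18, 1), ("MEM", 19, 1),
    ("IO", 20, 1), ("R", 21, 1), ("WEN", 22, 1), ("END", 23, 1),
]


def decode_control(word: int) -> dict[str, int]:
    """Extract every field of a control vector."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, lsb, width in CONTROL_FIELDS}


def asserted(word: int) -> list[str]:
    """Names of the single-bit (active-low) signals driven low in this word."""
    fields = decode_control(word)
    return [name for name, _, width in CONTROL_FIELDS
            if width == 1 and fields[name] == 0]


IDLE = 0xFFF800  # all single-bit signals high (inactive), vertical fields zero
# Hypothetical microinstruction: read the PC (RUNIT 0011) into the AR (WUNIT 010).
word = IDLE | 0b0011 | (0b010 << 4)
print(decode_control(word)["RUNIT"], decode_control(word)["WUNIT"], asserted(word))
```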

The micro-instruction format includes both horizontal and vertical fields. Fields RUNIT, WUNIT and OPIF are vertical, and need to be decoded to derive individual control signals. This keeps the micro-instruction width down, but it also provides a safety interlock, so no two units are driving the IBUS simultaneously.

512 KiB ROMs but only 15 inputs?

I had 4 Mbit (512 KiB) parts in stock. They're the same size as older PC ‘BIOS’ chips. They have 19 address lines, and the CFT's microcode only needs 15. That means I'm using one sixteenth of the space. Eventually, an idea hatched to use the remaining space to store different versions of microcode, and that culminated in the UCB. The UCB changes the Microcode Store a bit, and adds two new CFT instructions. It's ‘optional’, i.e. the one and only CFT in the world includes it but it can be disabled using solder bridges on PB0.

Figure 12. Schematic of the Micro-Program Counter (µPC, top left) and Microcode Store.

B2.5.3. The Read Unit Decoder

RUNIT	Signal	Meaning
0000		Nothing: the IBUS is tri-stated.
0001	R1	Reserved.
0010	RAGL	Read from the AGL.
0011	RPC	Read from the PC.
0100	RDR	Read from DR.
0101	RAC	Read from AC.
0110	R6	Reserved.
0111	R7	Reserved.
1000	N/A	ALU: read from the Adder sub-unit.
1001	N/A	ALU: read from the AND sub-unit.
1010	N/A	ALU: read from the OR sub-unit.
1011	N/A	ALU: read from the XOR sub-unit.
1100	N/A	ALU: read from the Roll sub-unit.
1101	N/A	ALU: read from the NOT sub-unit.
1110	N/A	ALU: read Constant 1.
1111	N/A	ALU: read Constant 2.

The read decoding happens in two parts using two 74HC138 3-to-8 decoders. The first one is enabled when RUNIT₃ is low; the other when RUNIT₃ is high. The high decoder lives near the ALU since it also implicitly decodes ALU operations. Both decoders are enabled only while RESET is de-asserted, and they're disabled while CLK1 is low, to allow the micro-instruction to be fetched. The fetch won't have finished when CLK1 goes high, as the ROMs need 70 ns, but any writes are delayed to allow the units to stabilise.

There's one further complication here: to allow the front panel to work, and also for future expansion and debugging, the control signals need to be tri-stated when HALT or RESET is asserted. The 74HC138 can't tri-state its outputs, so we connect it to a pair of 74HC541 tri-state buffers controlled by UCE. UCE is asserted when neither HALT nor RESET is asserted. To avoid floating inputs, the outputs are pulled high. To allow future expansion through an inevitable rat's nest of Kynar wire, even the unused signals are connected to the buffers and pulled high. Even better (and this might avoid the rat's nest), R1, R6, and R7 are routed to the Expansion Bus.

I lie. I lie so much!

Did I say two 74HC138 decoders decode RUNIT completely? I lied. The one responsible for ALU operations is a special case, since the ALU uses ROMs to do its magic. For ALU reads, RUNIT is never fully decoded: it's fed to the ALU ROMs, which select the operation implicitly. The 74HC138 decoder simply selects between the binary ALU operations (RUNIT values 1000 to 1011) and the unary ALU operations (RUNIT values 1100 to 1111), since each group resides on separate ROMs.
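The split decoding can be summarised in a small model. It is a sketch under assumptions: read_source is a hypothetical name, the two gating conditions (RESET de-asserted, CLK1 high) are passed in as booleans, and for ALU reads it assumes RUNIT₂ picks the binary or unary ROM group while the two remaining bits go to the ALU ROMs undecoded.

```python
from __future__ import annotations

# Outputs of the low 74HC138 (RUNIT 0000-0111); 0000 selects no unit.
LOW_DECODER = [None, "R1", "RAGL", "RPC", "RDR", "RAC", "R6", "R7"]


def read_source(runit: int, reset_deasserted: bool = True,
                clk1_high: bool = True) -> tuple | None:
    """Describe what drives the IBUS for a given RUNIT value.

    Returns None when nothing drives the bus, ("unit", signal) for the low
    decoder, or ("alu", group, op) for the ALU half.
    """
    if not (reset_deasserted and clk1_high):
        return None                      # both decoders disabled
    if (runit & 0b1000) == 0:            # RUNIT3 low: low decoder
        signal = LOW_DECODER[runit]
        return None if signal is None else ("unit", signal)
    # RUNIT3 high: only the ROM group is chosen; the ALU ROMs do the rest.
    group = "unary" if runit & 0b0100 else "binary"
    return ("alu", group, runit & 0b0011)


print(read_source(0b0011))  # ('unit', 'RPC')
print(read_source(0b1010))  # ('alu', 'binary', 2), the OR sub-unit in the table
```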

B2.5.4. The Write Unit Decoder

The write unit decoder takes the vertical microcode field WUNIT and decodes it into seven active-low signals (one of them, W1, reserved) controlling which unit, if any, reads from the IBUS. The decoding is as follows:

WUNIT	Signal	Meaning
000		Nothing.
001	W1	Reserved.
010	WAR	Write to the AR.
011	WPC	Write to the PC.
100	WIR	Write to the IR.
101	WDR	Write to the DR.
110	WAC	Write to the AC.
111	WALU	Write to the ALU's B Port.

The decoding is as simple as you'd expect, using a single 74HC138 3-to-8 decoder. The decoder is enabled when RESET is high (de-asserted) and when CLK3 is low, to allow the microinstruction to be fetched and stabilised.
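The write decoder can be modelled the same way. write_strobes is a hypothetical helper returning the level of each active-low strobe (1 = inactive); WUNIT 000 drives nothing, so it has no entry.

```python
from __future__ import annotations

# Index = WUNIT value; WUNIT 000 selects nothing.
WRITE_SIGNALS = [None, "W1", "WAR", "WPC", "WIR", "WDR", "WAC", "WALU"]


def write_strobes(wunit: int, reset: int, clk3: int) -> dict[str, int]:
    """Return the level of every write strobe.

    reset is the active-low RESET line (1 = de-asserted). The decoder is
    only enabled while RESET is high and CLK3 is low.
    """
    strobes = {name: 1 for name in WRITE_SIGNALS if name is not None}
    target = WRITE_SIGNALS[wunit & 0b111]
    if reset == 1 and clk3 == 0 and target is not None:
        strobes[target] = 0
    return strobes


low = [name for name, level in write_strobes(0b110, reset=1, clk3=0).items() if level == 0]
print(low)  # ['WAC']
```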

Figure 13. Schematic of the Read and Write Unit Decoders.

B2.5.5. Skip and Branch Logic

This unit allows microcode to evaluate various conditions and jump to a different microprogram address depending on the result. If the OPIF field in the micro-instruction is non-zero, the specified condition will be evaluated during this machine cycle. The result of the evaluation is registered and made available in the next cycle.

The result determines the value of the SKIP field in the microcode address. This seems limited, but actually allows very complex behaviour. Each microcode step can have a different condition checked and independently acted upon. We take full advantage of this for the OP1 and OP2 instructions, among others.

The same unit implements both microcode conditionals and machine code skips. To implement a skip instruction, the relevant flag is tested, and then, if the condition is true, the next microinstruction increments the PC.

Here is a table of all the things that can be tested. Some of these are obviously very tightly connected to the CFT instruction set, and there are two spare condition codes for future expansion:

OPIF	Meaning
0000	No operation. Always returns false.
0001	Test bit 3 of the IR.
0010	Test bit 4 of the IR.
0011	Test bit 5 of the IR.
0100	Test bit 6 of the IR.
0101	Test bit 7 of the IR.
0110	Test bit 8 of the IR.
0111	Test bit 9 of the IR.
1000	No operation. Reserved for expansion.
1001	No operation. Reserved for expansion.
1010	Test V.
1011	Test The L Register.
1100	Test Z.
1101	Test N.
1110	Check if the least significant 3 bits of IR are non-zero.
1111	Check if the least significant 4 bits of IR are non-zero.

The conditionals fall into a few groups, each implemented differently.

Figure 14. Schematic of the Skip and Branch Logic.

B2.5.5.1. Testing Bits of the IR

These are the easiest checks of all: the output of the SBL is the value of that particular bit of the IR.

Watch me mess up again!

The output of the SBL was named SKIP, and it's active low. And yet, the output of the bit testing section of the SBL is not reversed, and we want to act when a bit is 1. The semantics are reversed! After I nearly shot myself in the foot, I defined separate mcasm macros in the microcode for skipping on conditions and for acting on bits. Three sets of semantics, both active low and active high, and they're all multiplexed onto the same active low signal. It's nearly painful.

B2.5.5.2. Testing Flags

Just like testing bits of the IR, this is very simple: the (active low) SKIP signal takes the value of the active high flag. The value isn't inverted, so if the flag is high, SKIP will be high.

B2.5.5.3. Checking for Roll Instructions

A roll instruction is an OP2 instruction with at least one of the three least significant bits set. This is detected using 74HC32 two-input OR gates to compute the disjunction of IR0–2. The (active high, positive logic) result is fed to SKIP, so once again the semantics are reversed.

And Again!

Seriously, it's a train wreck at this point. This part of the design came about before I started using surface-mount single gates in tiny little packages, so it uses a big four-gate chip. And it only needs two of the four gates. Bummer. I could have used a single three-input OR gate and saved board real estate and some propagation delay! Oh well. In my defense, I thought I'd never be able to solder those tiny little packages by hand, but it turned out to be okay.

B2.5.5.4. The Instruction Set Skip Logic

This one is complicated. It evaluates a skip based on nine inputs using seven gates in four layers. Informally,

When IR₀ is set, FV is tested.
When IR₁ is set, FL is tested.
When IR₂ is set, FZERO is tested.
When IR₃ is set, FNEG is tested.
When IR₄ is set, the test result is negated.

At least one bit in IR0–4 must be set for any meaningful checks to take place.

In formal logic, the expression calculated is SKIP = (IR₀·FV + IR₁·FL + IR₂·FZERO + IR₃·FNEG) ⊕ IR₄, where · is AND, + is OR, and ⊕ is the Exclusive OR function.

With IR₄ clear, you'll notice we can select 16 combinations of flags to test. The test is true if any of the tested flags are set. This matches perfectly the behaviour of Group 1 of the skip minor operations in the OP2 instruction. The case where IR0–3 is 0000 is special: no flags are tested, and no skip is ever performed. This is the ‘skip never’ operation, part of the NOP instruction.

With IR₄ set, the test result is negated. We still select one of 16 combinations of flags to test, but this time we test for clear flags: by De Morgan's laws, the test succeeds only if all of the tested flags are clear. The case where IR0–3 is 0000 is again special: no flags are tested, the intermediate result is false, and the XOR gate inverts it to true. This becomes the SKIP instruction, ‘skip always’.

This is exactly the way the PDP-8 evaluated skips, except the PDP-8 had one fewer flag to check (there was no FV). It's as elegant as it is powerful: a single instruction can test whether any flag in a chosen subset is set, or whether all of them are clear.
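The skip expression translates directly into code. This sketch computes only the logic level produced by the gate tree, before it meets the multiplexers and the mixed active-high/active-low semantics described in this section; isa_skip is a hypothetical name and the flag arguments are plain 0/1 integers.

```python
def isa_skip(ir: int, fv: int, fl: int, fzero: int, fneg: int) -> int:
    """SKIP = (IR0·FV + IR1·FL + IR2·FZERO + IR3·FNEG) XOR IR4."""
    ir0, ir1, ir2, ir3, ir4 = ((ir >> n) & 1 for n in range(5))
    any_selected = (ir0 & fv) | (ir1 & fl) | (ir2 & fzero) | (ir3 & fneg)
    return any_selected ^ ir4


# IR0-4 = 00100: skip if Z is set.
print(isa_skip(0b00100, fv=0, fl=0, fzero=1, fneg=0))  # 1
# IR0-4 = 10101: skip only if both V and Z are clear.
print(isa_skip(0b10101, fv=0, fl=0, fzero=1, fneg=0))  # 0
```

With IR0–4 at 00000 the function returns 0 for every flag combination (‘skip never’), and with 10000 it returns 1 (‘skip always’).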

The gate tree has a combined worst-case propagation delay of 35 ns, according to the schematics.

B2.5.5.5. Putting It All Together

The value of SKIP is selected from these 16 sources using two 74HC251 1-of-8 tri-state multiplexers, each handling eight values. The low multiplexer's OE (output enable) pin is connected to OPIF₃, so it drives for the first eight values of OPIF. The high multiplexer's OE pin is connected to OPIF₃ inverted by a single NOT gate, so it drives for the last eight values of OPIF.

To avoid having both chips drive simultaneously (which can happen because of the propagation delay of the NOT gate), we pull up their non-inverting outputs and feed each one to its own input of a three-input AND gate. The third input is the SKIPEXT active-low signal from the Expansion Bus. This is open drain and also pulled up, so expansion cards can request a skip (this is used for the IOT instruction).

Exactly one of the two multiplexers is enabled at any time. When OPIF is 0000, the first input of the low multiplexer (tied to ground) drives the output, and SKIP is 0.

The output of this final AND gate is fed to a 74HC74 flip-flop, which stores it on the positive edge of CLK4. The flip-flop's PRE input is connected to RESET, which presets this bit to 1, leaving SKIP de-asserted. The non-inverting output of the flip-flop is the SKIP signal and forms part of the microcode address.

And the train wreck finally happens

Mixing the semantics was a bad idea, and now I have a bug on my hands: SKIPEXT will never be obeyed reliably, because no OPIF combination drives a guaranteed 1, so the output of the AND gate can't be guaranteed to equal SKIPEXT. One of the two spares will have to be tied high, and the microcode will have to be modified to use that value while checking for SKIPEXT. Seeing as no hardware takes advantage of SKIPEXT yet (or even the IOT instruction), this fix can wait.

B2.5.6. The Instruction Register

The Instruction Register is as simple as they come! It's just two 8-bit 74HC573 latches, with their outputs permanently enabled, and their Latch Enable (LE) pins controlled by WIR, inverted via a single NOT gate. The output is made available to the rest of the control unit. A pair of 74HC541 buffers, their outputs also permanently enabled, send the value of the IR to the front panel.

Figure 15. Schematic of the Instruction Register.

B2.6. The Address Generation Logic

For something that implements half the addressing modes on the CFT, the AGL is one of the simplest parts of the processor. It's responsible for Zero Page (or Register) addressing. It's a simple six-bit multiplexer that selects between PC10–15 and 000000 based on the value of IR₁₀, which is the R field in the IR:

When IR₁₀ is 0, the AGL outputs the value of PC10–15 at the start of this instruction's fetch cycle. (this is an important detail, keep reading).
When IR₁₀ is 1, the AGL outputs 000000.

The six-bit output of the AGL forms the six most-significant bits of the AR, which is used to address memory and I/O space. The least significant 10 bits of the AR always come from the instruction operand field, IR0–9.

When IR₁₀ is 0, page-relative addressing is selected. In this mode, the address is formed based on the current page being executed, and that is the most significant six bits of the PC. However, once an instruction starts to execute, the PC already points to the next instruction to be executed. So, we register these six bits on the rising edge of END.

The register used is a 74HC574 8-bit D flip-flop with output enable. Its active-low output enable pin (OE) is driven directly by IR₁₀. So, when IR₁₀ is low (page-relative addressing), the last latched value of PC10–15 is output. When it's high (Zero Page addressing), the output is tri-stated. We pull those lines low, and that implements the desired output.

The six-bit output from the 74HC574 is combined with IR0–9 to form a 16-bit value. This value is fed to two 74HC541 tri-state buffers. When RAGL is low, the buffers drive it onto the IBUS.
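The selection above can be summarised as a behavioural sketch in Python. It models only the resulting address, not the gates. The helper name `agl_address` is hypothetical, and passing in the PC value latched at END (rather than the live PC) is the assumption that mirrors the 74HC574 described above.

```python
def agl_address(ir: int, latched_pc: int) -> int:
    """Behavioural sketch of the Address Generation Logic (hypothetical helper).

    ir         -- the 16-bit instruction word held in the IR
    latched_pc -- the PC value captured on the rising edge of END, i.e. the
                  address of the current instruction rather than the next one
    """
    operand = ir & 0x03FF                     # IR0-9: the 10-bit operand field
    if (ir >> 10) & 1:                        # IR10, the R field, is set:
        page = 0                              # Zero Page; the '574 is tri-stated and pulled low
    else:
        page = (latched_pc >> 10) & 0x3F      # page-relative: latched PC10-15
    return (page << 10) | operand             # the 16-bit value RAGL places on the IBUS


# An instruction at address 1234 (page 4) with operand 005:
print(hex(agl_address(0x0005, 0x1234)))       # 0x1005, page-relative
print(hex(agl_address(0x0405, 0x1234)))       # 0x5, Zero Page (R set)
```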

Figure 16. Schematic of the Address Generation Logic.

B2.7. The Auto-Index Logic

The Auto-Index Logic asserts the signal AINDEX when:

RESET is high (deasserted).
The IR's operand field contains a value in the range 080–0FF.
The IR's R field is set (Zero Page addresses selected).
The IR's I field is set, selecting Indirect Addressing.

The truth-table makes it look complicated, but it's here for completeness.

IBUS₉–₇	IBUS₁₀ (R)	IBUS₁₁ (I)	WIR	RESET	AINDEX
XXX	X	X	X	0	1
XXX	0	X	X	1	1
XXX	X	0	X	1	1
001	1	1	↑	1	0

In short, the IBUS must match the binary pattern XXXX'1100'1XXX'XXXX for AINDEX to be asserted.

Why read from the IBUS and not the IR directly? Glad you didn't ask. It's an implementation limitation. The natural place for the Auto-Index Logic is right by the IR, since it's really a part of the Control Unit. But the thing just wouldn't fit there! I had to move it to the register board, which had the space for it.

The control bus doesn't carry the IR, but it does carry the IBUS and the WIR signal. The IR latches the value of the IBUS when WIR is low, and this is exactly what I do: I abuse a 74HC138 decoder to compare IBUS₁₁–₇ to 11001, i.e. I set, R set, and an operand in the range 080–0FF. When the condition is true, the decoder asserts an active-low signal which is fed to the D input of half of a 74HC74 D flip-flop. The flip-flop clocks on the positive edge of WIR, which is connected to its CLK input. The flip-flop's PRE input is connected to RESET. We use the non-inverting Q output to drive the AINDEX signal.
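For illustration, the auto-index detection can be sketched in Python as a mask-and-compare on the IBUS value at the moment WIR rises. The class and method names are hypothetical, and the model assumes the active-low AINDEX convention from the truth table (0 means auto-index). The mask and pattern are simply the XXXX'1100'1XXX'XXXX pattern above written in hexadecimal.

```python
class AutoIndexLogic:
    """Behavioural sketch of the Auto-Index Logic (hypothetical model).

    AINDEX is active low: 0 means 'auto-index this access'.
    """
    MASK = 0x0F80     # IBUS11-7: I, R and the top three operand bits
    PATTERN = 0x0C80  # I=1, R=1, operand bits 9-7 = 001 (operand 080-0FF)

    def __init__(self) -> None:
        self.aindex_n = 1  # RESET drives PRE, so Q (AINDEX) starts high

    def reset(self) -> None:
        self.aindex_n = 1

    def wir_rising_edge(self, ibus: int) -> None:
        # The '138 output goes low when the pattern matches; the '74 clocks it
        # in on the positive edge of WIR, the same moment the IR latches.
        self.aindex_n = 0 if (ibus & self.MASK) == self.PATTERN else 1


ail = AutoIndexLogic()
ail.wir_rising_edge(0x1C85)   # I=1, R=1, operand 085
print(ail.aindex_n)           # 0: auto-index asserted
ail.wir_rising_edge(0x1885)   # R=0: page-relative, no auto-index
print(ail.aindex_n)           # 1
```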

Unnecessarily Limited?

Since both I and R are involved in decoding auto-index mode, only the microprograms for instructions with both Indirect and Zero Page addressing selected may use auto-index registers. It's not up to the microcode to decide, and that's what leaves us with up to five addressing modes per instruction (rather than eight): the four combinations of I and R, plus auto-index mode when both I and R are set and the right addresses are used.

B2.8. Address Register and Address Bus Drive Logic

The AR is a simple 16-bit write-only register responsible for driving the Address Bus. Shocking, I know. The register receives its value from the IBUS, storing it in a pair of 74HC574 8-bit flip-flops. The flip-flops are clocked on the positive edge of WAR. The output is routed internally to other units, and also to a pair of 74HC541 8-bit drivers. These drivers are activated when either MEM or IO is asserted, and their output drives the Address Bus directly. Simple, really.

Figure 17. Schematic of the Address Register (top), Auto-index Logic (left, below the AR) and I/O Device Decoder (bottom right).

B2.9. The Major Registers

These three 16-bit registers do everything in the CFT. They're very similar, but each has some ancillary logic pertaining to its specific purpose that makes it unique.

All three major registers are implemented as banks of four 74HC193 counters. These work just like flip-flops, except they also have built-in fast increment and decrement functions with carry look-ahead to reduce the effect of cascading carry outs to carry ins. The benefits of built-in incrementation and decrementation are huge: we can step registers without involving the ALU, and while the ALU is doing something else. Call it a primitive sort of pipelining. Most processors have this for their special registers. We just extend it to all of them.

The outputs of the counters go to two sets of 74HC541 buffers to allow tri-stating (and for better drive characteristics). One set places the register on the IBUS, the other (permanently enabled) sends the register's value to the front panel.

All major registers are cleared when RESET is active. The counters use an active high reset signal (weird), so we invert RESET using a single NOT gate.

B2.9.1. The Program Counter (PC)

The PC is exactly as described above: it's implemented as a bank of four 74HC193 counters. The PC can only ever be incremented, so only the carry out chain is wired, and decrementing is permanently deasserted. The counters load their values when WPC is low. They count up when INCPC is low.

B2.9.2. The Data Register (DR)

The DR follows the major register pattern outlined above. It loads its value when WDR is low. It's wired for both incrementation and decrementation, with carry out and borrow chains. Half of a 74HC139 decoder (the other half of the one used on the AC: waste not, want not) decodes STPDR and DEC from the Control Unit as follows:

STPDR	DEC	DRUP	DRDOWN	What this does
1	X	1	1	Nothing.
0	1	0	1	Increment.
0	0	1	0	Decrement.
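The step decoding in the table above (and its twin for the AC below) can be modelled as follows. This is a sketch under simplifying assumptions: the names are hypothetical, all signals are treated as active low, and a step is applied as a single event rather than as the pulse the '193s actually count on.

```python
def step_decoder(stp_n: int, dec_n: int) -> tuple[int, int]:
    """Half a 74HC139 decoding STPDR (or STPAC) and DEC (hypothetical helper).

    All signals are active low. Returns (up_n, down_n) for the '193 count inputs.
    """
    if stp_n:                                  # enable deasserted: both outputs stay high
        return 1, 1
    return (1, 0) if dec_n == 0 else (0, 1)    # DEC low selects the down count


def step_register(value: int, stp_n: int, dec_n: int) -> int:
    """Apply one step to a 16-bit major register built from four cascaded '193s."""
    up_n, down_n = step_decoder(stp_n, dec_n)
    if up_n == 0:
        return (value + 1) & 0xFFFF            # each counter's carry out feeds the next one
    if down_n == 0:
        return (value - 1) & 0xFFFF            # each counter's borrow feeds the next one
    return value


print(hex(step_register(0x00FF, stp_n=0, dec_n=1)))   # 0x100
print(hex(step_register(0x0000, stp_n=0, dec_n=0)))   # 0xffff
```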

B2.9.3. The Accumulator (AC)

The AC looks like a DR on steroids. It loads its value when WAC is low. Like the DR, it's wired for both incrementation and decrementation, with carry out and borrow chains. Half of a 74HC139 decoder decodes STPAC and DEC from the Control Unit as follows:

STPAC	DEC	ACUP	ACDOWN	What this does
1	X	1	1	Nothing.
0	1	0	1	Increment.
0	0	1	0	Decrement.

The similarities end here. There are three extra parts to the AC:

The Negative Flag. This is permanently wired to AC₁₅ without any logic at all in between.
The Zero Flag. This is implemented using a pair of cascaded 74HC688 8-bit comparators. Each is wired to compare eight bits of the AC against 00000000. The most significant comparator cascades into the least significant one using the G cascade input. The active-low ‘equal’ output of the least significant comparator is inverted and made available as FZERO.
The active-low borrow and carry out signals from the most significant AC counter are ANDed together (OR in negative logic). The result is ACCPL, and it instructs the L Register to toggle itself when the AC wraps around.
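A rough Python model of the AC's extras may help. It assumes active-high flag outputs, as described, and treats the ACCPL wrap detection as a simple comparison rather than modelling the '193 carry and borrow pins. The function names are hypothetical.

```python
def ac_flags(ac: int) -> tuple[int, int]:
    """Return (negative, zero) for a 16-bit AC value, both active high."""
    negative = (ac >> 15) & 1          # wired straight from AC15
    zero = 1 if ac == 0 else 0         # the cascaded '688s, inverted into FZERO
    return negative, zero


def ac_step(ac: int, link: int, increment: bool) -> tuple[int, int]:
    """Step the AC and toggle L on wrap-around, as ACCPL does.

    The top counter's carry (FFFF -> 0000) or borrow (0000 -> FFFF) is what
    produces ACCPL; here the two wrap cases are detected directly.
    """
    if increment:
        wrapped = ac == 0xFFFF
        ac = (ac + 1) & 0xFFFF
    else:
        wrapped = ac == 0x0000
        ac = (ac - 1) & 0xFFFF
    if wrapped:
        link ^= 1                      # ACCPL: the L register toggles itself
    return ac, link


print(ac_step(0xFFFF, 0, increment=True))   # (0, 1)
print(ac_flags(0x8000))                     # (1, 0)
```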

B2.10. Data Bus Driver

The Data Bus Driver is an unexpectedly complex part of the processor because it has to handle:

Reading from the Data Bus.
Writing to the Data Bus.
Memory Space.
I/O Space.
Wait States.

At its core is a pair of 74HC245 bus transceivers acting as the gateway between the IBUS and DBUS. Their ‘A’ side is the IBUS. Their ‘B’ side is the DBUS. The drive direction DIR is set using R, since low means B to A, and high means A to B.

The chips' enable signal is the negative-logic disjunction of MEM, IO, and WAITING, implemented with a single three-input AND gate. This is how this relates to the processor's state:

MEM	IO	R (DIR)	WEN	WAITING	Bus Transaction	Transceiver Behaviour
1	1	1	1	1	None.	Buses isolated.
0	1	0	1	1	Memory read.	DBUS to IBUS.
0	1	1	0	1	Memory write.	IBUS to DBUS.
1	0	0	1	1	I/O read.	DBUS to IBUS.
1	0	1	0	1	I/O write.	IBUS to DBUS.
X	X	0	1	0	Read wait state.	DBUS to IBUS.
X	X	1	0	0	Write wait state.	IBUS to DBUS.

The microcode will drive the vector MEM, IO, R, WEN synchronously, and only the first five combinations listed above are possible: idle, memory read, memory write, I/O read, and I/O write. The remaining eleven combinations aren't technically impossible, but I go to great lengths to ensure that, e.g. R and WEN are never asserted simultaneously.

Also, involving the WAITING signal in the bus driver at this point is paranoia: when in the Wait State, the µPC stops counting, and the microinstruction (MEM, IO, R, and WEN) stays constant.
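The transceiver columns of the table above follow from just two gates, which a short sketch can make explicit. The function name and string labels are hypothetical; all inputs are active low, as in the table.

```python
def transceiver_state(mem_n: int, io_n: int, r_n: int, waiting_n: int) -> str:
    """Sketch of the 74HC245 pair's behaviour (hypothetical helper).

    All inputs are active low. WEN doesn't appear: it matters to the write
    strobe generator, not to the transceivers.
    """
    # A three-input AND acts as an OR in negative logic: the '245s are enabled
    # (active-low enable pulled low) when any of MEM, IO or WAITING is asserted.
    enable_n = mem_n & io_n & waiting_n
    if enable_n:
        return "isolated"
    # DIR is driven straight from R: low means B to A, i.e. DBUS to IBUS.
    return "DBUS->IBUS" if r_n == 0 else "IBUS->DBUS"


# Walk through the rows of the table above (WEN omitted, MEM and IO
# deasserted in the wait-state rows to show WAITING acting alone).
rows = [(1, 1, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 0, 0, 1),
        (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 0)]
for mem_n, io_n, r_n, waiting_n in rows:
    print(mem_n, io_n, r_n, waiting_n, transceiver_state(mem_n, io_n, r_n, waiting_n))
```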

B2.10.1. Wait States

The wait state is implemented using half of a 74HC74 dual flip-flop chip. The wait state resets to zero when RESET is asserted. It is asynchronously set whenever T34 and WS are both low. We get this using a single OR gate, which behaves like an AND gate in negative logic. So: when WS is asserted during the T34 phase of the processor cycle, a wait state is set. The WS signal may be asserted at any point during the first half of the processor cycle, but it should stay asserted at least until CLK2 goes high, otherwise the request will be ignored. Peripherals should deassert WS when they're done.

This makes sense: a peripheral that is potentially (but not necessarily) slow can assert a wait state the moment it's addressed. If whatever slow process it performs is done before T34 is entered, then there really is no wait state! The peripheral has performed its task within a single processor cycle and life goes on as usual.

However, if a wait state is set, it lasts at least until the next rising edge of CLK2, at which point the flip-flop clocks in a hardwired zero (its D input is tied to ground).

The '74 is wired in such a way that the clock is ignored while PRE is asserted. So the WS condition can be lifted at any point after the first T34, and the flip-flop will remember the value. Then, at the first rising edge of CLK2 after the WS request has been deasserted, the flip-flop clocks in a zero and the wait state is lifted.

The flip-flop provides WAITING in both polarities. The active-high output is used in generating appropriate write pulses. The active-low output is used to keep the DBUS transceivers active during the wait state.
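The wait-state flip-flop's behaviour can be approximated in software. This is a hypothetical, event-driven sketch: it does not model the clock itself, so the caller supplies the levels of T34 and WS at each rising edge of CLK2, and the asynchronous set is re-evaluated whenever either of them changes. Class and method names are illustrative only.

```python
class WaitStateFlipFlop:
    """Event-driven sketch of the wait-state half of the 74HC74 (hypothetical)."""

    def __init__(self) -> None:
        self.waiting = False                 # RESET clears the flip-flop

    @staticmethod
    def _preset(t34_n: int, ws_n: int) -> bool:
        # The OR gate acts as an AND in negative logic: PRE goes low only
        # while T34 and WS are both low (asserted).
        return (t34_n | ws_n) == 0

    def inputs_changed(self, t34_n: int, ws_n: int) -> None:
        """Asynchronous set, re-evaluated whenever T34 or WS changes."""
        if self._preset(t34_n, ws_n):
            self.waiting = True

    def clk2_rising_edge(self, t34_n: int, ws_n: int) -> None:
        """D is tied low, but the clock is ignored while PRE is asserted."""
        if not self._preset(t34_n, ws_n):
            self.waiting = False


ff = WaitStateFlipFlop()
ff.inputs_changed(t34_n=0, ws_n=0)      # slow device still busy during T34
ff.clk2_rising_edge(t34_n=0, ws_n=0)    # request still held: the edge is ignored
print(ff.waiting)                       # True
ff.inputs_changed(t34_n=0, ws_n=1)      # device done, WS released
ff.clk2_rising_edge(t34_n=0, ws_n=1)    # the next edge clocks in the zero
print(ff.waiting)                       # False
```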

B2.10.2. Write Strobe Generation

Brace yourself for some complexity. The processor's registers use latching, so they have valid data by the end of a read cycle, provided the device has had enough set-up time. (And if not, there are Wait States.)

Writing is a different matter. We need to assert the write signal W only after enough time has passed for the processor to set up the data and connect the IBUS to the DBUS. Then we need to deassert it before we stop driving the bus: devices use the positive edge of W to clock in data, and it's not nice to stop driving the bus just as we're telling them to read it. So the Data Bus driving logic contains a generator for a narrow write strobe.

RESET	WEN	WAITING	CLK4	W	Notes
0	X	X	X	1	Resetting. Inhibit write strobes.
1	1	X	X	-	No write transaction, no output change. (read on).
1	X	1	X	-	Wait state, no output change. (read on).
1	0	0	↓	0	Write cycle. Start generating write pulse.

This is accomplished using two 74HC74 flip-flops working in tandem, on the same inputs.

Their CLK input is driven by CLK4 NOR WEN NOR WAITING. The output of the NOR gate goes high when CLK4 goes low at the start of the T4 period, provided WEN is asserted and WAITING is deasserted. Talk about mixing things up with a single gate! When the clock goes high, the flip-flops' outputs go high (their D inputs are tied to Vcc).

The inverted output of the first flip-flop goes low at this point. It then runs through zero, one, two, or three pairs of 74AC04 NOT gates. Each of these pairs delays the signal by between 3 ns and 14 ns (typically 8 ns). The exact number of gates can be selected using four jumpers.

Once out of this delay line, this signal goes to both flip-flops' CLR inputs, resetting them to zero and ending the strobe. Resetting the first flip-flop also drives its inverted output high again, which in turn deasserts CLR. RESET also cancels ongoing strobes: this is done using a three-input AND gate acting as a negative-logic OR gate, which adds its own propagation delay to the proceedings!
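Since the strobe width is set by jumpers, a quick calculation shows what each setting buys. The sketch below assumes the per-pair figures quoted above and counts only the delay line itself; the helper name is hypothetical.

```python
# Per-pair delays of the 74AC04 delay line, as quoted above (nanoseconds).
PAIR_DELAY_NS = {"min": 3, "typ": 8, "max": 14}


def delay_line_ns(pairs: int) -> dict[str, int]:
    """Extra strobe width contributed by the jumpered inverter pairs (0 to 3).

    Only the delay line is counted: the flip-flops' clear-to-output time and
    the three-input AND gate add their own, unmodelled, delays on top.
    """
    if not 0 <= pairs <= 3:
        raise ValueError("the delay line has zero to three inverter pairs")
    return {corner: pairs * ns for corner, ns in PAIR_DELAY_NS.items()}


for pairs in range(4):
    print(pairs, delay_line_ns(pairs))
```

The wide min-to-max spread (9 ns to 42 ns with all three pairs fitted) is one reason the jumpers are there: the right setting has to be found on the bench.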

The second flip-flop's Q output is fed into a single tri-state buffer, which is controlled by HALT. Its output is in turn connected directly to W. The buffer ensures W is tri-stated when the computer is halted. The Microcode Store does this for all other bus transaction signals including WEN, and W needs to follow suit.

Why two flip-flops in tandem?

More paranoia, this time for the sake of propagation delay and signal quality. The first flip-flop is solely responsible for ending the write strobe. The second flip-flop outputs W.

Here Be Bugs—Maybe

Well, crappy signal quality, at least. The last time I tested this circuit, the quality of the generated pulses was pretty bad, despite bus terminators. Obviously lots of fine-tuning is necessary!

Figure 21. Schematic of the Wait State (top left), Data Bus Driver (middle) and Write Strobe Generator (right). Never has something so simple looked more complicated.

B2.11. The Interrupt State Machine

After the Data Bus discussion, I'm sure you're up for something lighter. Tough. Instead, here's another complicated bastard of a unit with a side of theory dry enough to have sietches and guys worshipping Shai-Hulud.

Interrupts allow the normal flow of computation to be temporarily stopped in order to process out-of-order events like user input, or slow devices completing previously initiated transactions. They are crucial to building a computer with practical uses, but can be tricky to implement due to their asynchronous nature. Race conditions and metastability can cause trouble.

The CFT interrupt subsystem is a five-state finite-state machine. On RESET, the processor starts with interrupts masked, and will ignore any interrupt requests. Interrupts may be enabled at any time using the STI instruction. Once enabled, on receipt of an interrupt, the processor completes the currently running instruction, then executes the interrupt microprogram. The microprogram disables further interrupts, saves the current value of the PC and jumps to FFF8, the location of the Interrupt Service Routine (ISR). The ISR handles the interrupt and usually re-enables interrupt requests just before returning to normal program execution.

Since the CFT lacks a hardware stack, nested interrupts are not an option. So, enabling interrupts and exiting the ISR must happen atomically. Otherwise, an incoming interrupt may jump back to the ISR immediately after the STI instruction and before the ISR returns. To avoid this, STI waits until the next PC-changing instruction (loading, not incrementing) before enabling interrupts.

The interrupt state machine has five states:

Figure 22. Interrupt state transition diagram.

Interrupts Disabled: This is the initial state on reset. In this state, interrupts are ignored. When the STI instruction runs, it strobes the STI signal. The rising edge of STI arms the interrupt enable and moves the state machine to the Interrupt Enable Armed state.
Interrupt Enable Armed: Interrupts are still masked in this state. When the PC is next written to by a JMP, JSR, RET, RTI or RTT instruction (or macros involving any of them), interrupts become fully enabled on the rising edge of the WPC strobe.
Interrupts Enabled: In this state, interrupts will be received by the processor. If a low level is seen on the IRQ line, the state machine moves to the Interrupt Armed state. If a CLI instruction is executed, a low level of the CLI signal will reset the state to Interrupts Disabled and subsequent interrupts will be ignored.
Interrupt Armed: The state machine remains in this state until the currently executing instruction is finished. This is signalled by the rising edge of END from the Control Unit, at which point IRQS goes low to acknowledge the interrupt, the Control Unit executes the interrupt microprogram, and the state machine moves to the next state.
Interrupt Microprogram: The state machine remains here while the interrupt microprogram is being sequenced. In the course of this sequence, interrupts will be disabled via a strobe of CLI. This resets the state machine to the Interrupts Disabled state, and the process cycles.
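The five states and their transitions can be captured as a small table-driven state machine. This is an abstract sketch of the description above, not of the flip-flop wiring: the event names stand for the strobes and levels (STI, WPC, IRQ, END, CLI, RESET) and are hypothetical labels, and the CLI transition out of Interrupt Armed is inferred from the clearing logic described in B2.11.3.

```python
from enum import Enum, auto


class IrqState(Enum):
    DISABLED = auto()       # initial state on reset
    ENABLE_ARMED = auto()   # STI seen, waiting for the PC to be loaded
    ENABLED = auto()        # FINT asserted, IRQ is listened to
    IRQ_ARMED = auto()      # IRQ seen, waiting for END
    MICROPROGRAM = auto()   # IRQS asserted, interrupt microprogram running


# (state, event) -> next state, following the state descriptions above.
TRANSITIONS = {
    (IrqState.DISABLED, "STI"): IrqState.ENABLE_ARMED,
    (IrqState.ENABLE_ARMED, "WPC"): IrqState.ENABLED,
    (IrqState.ENABLED, "IRQ"): IrqState.IRQ_ARMED,
    (IrqState.ENABLED, "CLI"): IrqState.DISABLED,
    (IrqState.IRQ_ARMED, "CLI"): IrqState.DISABLED,   # inferred: CLI clears this flip-flop too
    (IrqState.IRQ_ARMED, "END"): IrqState.MICROPROGRAM,
    (IrqState.MICROPROGRAM, "CLI"): IrqState.DISABLED,
}


def step(state: IrqState, event: str) -> IrqState:
    """Return the next state; events with no listed transition are ignored."""
    if event == "RESET":
        return IrqState.DISABLED
    return TRANSITIONS.get((state, event), state)


state = IrqState.DISABLED
for event in ["IRQ", "STI", "IRQ", "WPC", "IRQ", "END", "CLI"]:
    state = step(state, event)
    print(f"{event:>3} -> {state.name}")
```

Note how the IRQ arriving between STI and WPC is ignored: that is the atomic "enable interrupts and return" behaviour the ISR relies on.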

Figure 23. Interrupt waveforms. 1. The STI instruction strobes STI, preparing to enable interrupts. 2. The next time the PC is loaded with a value, interrupts are enabled. The rising edge of WPC strobes STI-DELAY, disarming the interrupt enable and asserting the FINT flag to allow interrupts. 3. With FINT low, an incoming IRQ signals an interrupt request. FIRQ goes high to arm the interrupt. 4. At the end of the current microprogram (rising edge of END), IRQS is asserted, both acknowledging the interrupt and making the Control Unit execute the Interrupt microprogram. 5. During the microprogram, CLI is asserted. Interrupts are disabled and cleared (FIRQ goes low). 6. At the end of the microprogram, the state machine cycles back to masked interrupts. Control is now in the ISR.

This lovely mess is implemented using five flip-flops. For some reason I can't remember, two of them are 74HC112. The rest are 74HC74.

Figure 24. Schematic of the Interrupt State Machine. More flip-flops than Summer 2016. (except, not really)

B2.11.1. Interrupts Disabled and STI Armed

The first two flip-flops store the Interrupts Disabled state (when clear) and the STI Armed state. They're connected as a chain to solve metastability issues: the first flip-flop is set asynchronously when STI is asserted. To keep it in the same clock domain as the second flip-flop, it's clocked on the positive edge of WPC, with its D line tied to ground, so it generates pulses that start with STI assertions and end at the end of WPC assertions.

The second flip-flop's D is fed from the non-inverted Q output of the first one, so it's set on the rising WPC edge after an assertion of STI. All this is done for paranoia's sake. In practice, the microcode is structured such that STI is asserted a few clock cycles before WPC.

The inverted output of this flip-flop is fed back to its CLR input to clear the signal, and fed forward to the next flip-flop.

B2.11.2. Interrupts Enabled

The previous flip-flop's inverted output asynchronously sets the Interrupts Enabled flip-flop. The flip-flop is only cleared by either RESET or CLI, using an AND gate acting as a negative-logic OR gate. The inverted output of this flip-flop drives the active-low FINT, and the non-inverted output drives an LED on the processor board so the state of the interrupt state machine can be monitored.

B2.11.3. Interrupt Request Armed

If IRQ is asserted when FINT is low (asserted), the fourth flip-flop in the chain is asynchronously set. Like the previous flip-flop, the only way to clear this one is through RESET or CLI. Its non-inverted Q output drives the next flip-flop.

B2.11.4. Interrupt Seen

The final flip-flop in the state machine uses the output of the previous one as its D input. This is clocked on the positive edge of END and reset on RESET. The positive output drives an LED on the processor board. The negative output drives the IRQS signal, which is an input to the Microcode Store and also output on the Expansion Bus to act as an ‘interrupt request acknowledged’ signal.

Metastability Alert!

This is a potential source of metastability. Fortunately, the IRQ controller board can mitigate this, and if not, there's a spare flip-flop on the processor board that I can use to reduce the chance of metastability.

B2.12. The Arithmetic and Logic Unit (ALU)

Now you're all warmed up and your eyes have glazed over, it's time to talk about the ALU. The CFT's ALU takes up around a quarter of the entire processor, and is still nearly useless. It can perform these operations:

Operations on two numbers (binary):
- Addition of two 16-bit numbers, with carry out using the Link Register (L).
- Bitwise AND of two 16-bit numbers.
- Bitwise OR of two 16-bit numbers.
- Bitwise XOR of two 16-bit numbers.
Operations on one number (unary):
- One bit left roll of the <L,AC> 17-bit vector.
- Four bit left roll of the <L,AC> 17-bit vector.
- One bit right roll of the <L,AC> 17-bit vector.
- Four bit right roll of the <L,AC> 17-bit vector.
- Bitwise NOT of a 16-bit number.
‘Operations’ on no number (emitting constants):
- The constant 0, which is also the address of RETV.
- The constant 1, also RTTV. Used mostly for the latter.
- The constant 2, also RTIV. Used mostly for the latter.
- The constant FFF8, the address of the Interrupt Service Routine.

There is no subtraction or multiplication. Don't even think about division. There are no arbitrary rolls and no shifts at all. Feel limited? Remember there are ARM architectures in current use in the 21st century that have no division. Famous 8-bit microprocessors of the late 1970s and 1980s, like the 6502 and the Z80, had no hardware multiply. And the unexpanded PDP-8 had even less than the CFT (addition, bitwise AND, and single and 2-bit rolls).

Even with so little, the CFT's ALU needs six 512K×8 ROMs plus assorted support ICs and occupies an entire processor board. And if I didn't use the ROMs, a more purist approach would use up six times the board real estate and run at a tenth of the speed.

Such as it is, the ALU consists of the following parts:

Two ports for its two operands. In standard ALU parlance, these are named ports A and B. They're both registered to avoid glitches.
Operation decoders, as not everything is done by the ROM tables.
Buffers to tri-state the ALU's output.
Binary operation tables on three ROMs.
Unary and constant tables on another three ROMs.
Logic to update the L and V flags.

B2.12.1. ALU Port A and B Registers

The ALU always performs operations between the AC and the current value on the IBUS. The ALU's result is obviously put on the IBUS, and from there goes to the AC. To protect against glitches and instability, we use two banks of two 74HC574 flip-flops to hold the values of the AC and IBUS while the ROMs do their magic.

Port A is connected to the AC. It's permanently enabled, and clocks in a value on the rising edge of CLK4, at the very beginning of a processor cycle. It outputs 16 lines called A that are local to the ALU.

Port B is connected to the IBUS. It's also permanently enabled, and reads a value on the rising edge of WALU from the Microcode Store. It outputs 16 lines called B that are also local to the ALU.

Both A and B are used as direct input to the ALU ROMs in a fairly complex way I'll fail to explain soon.

B2.12.2. Unary/Binary ROM select

Since the ALU has two parts, we need to somehow divine which part will be activated for a particular microinstruction. This is done by using a 74HC138 for its intended purpose (for once). The decoder is enabled when RESET is deasserted (high) and T34 is asserted (low), so that no ALU ROMs are selected during the first part of the processor cycle. This gives them 125 ns to provide an answer, and I've used 70 ns Flash devices, so this is feasible if I don't mess up the selection and decoding logic!

Recall from the discussion on RUNIT that the microcode instruction has a 4-bit vector to select what device to read from, and the upper eight options all deal with the ALU:

RUNIT	Signal	Meaning
0000		Nothing: the IBUS is tri-stated.
0001	R1	Reserved.
0010	RAGL	Read from the AGL.
0011	RPC	Read from the PC.
0100	RDR	Read from DR.
0101	RAC	Read from AC.
0110	R6	Reserved.
0111	R7	Reserved.
1000	N/A	ALU: read from the Adder sub-unit.
1001	N/A	ALU: read from the AND sub-unit.
1010	N/A	ALU: read from the OR sub-unit.
1011	N/A	ALU: read from the XOR sub-unit.
1100	N/A	ALU: read from the Roll sub-unit.
1101	N/A	ALU: read from the NOT sub-unit.
1110	N/A	ALU: read Constant 1.
1111	N/A	ALU: read Constant 2.

Observe that all values of RUNIT with the most significant bit (RUNIT₃) set address the ALU. Also, note that the binary operations all have RUNIT₂ low, while the unaries and constants have RUNIT₂ high. This forms the basis of our decoding:

RESET	RSTHOLD	T34	RUNIT₃	RUNIT₂	ROM Selected
0	X	X	X	X	None while the processor is resetting.
1	0	X	X	X	None while the processor is resetting.
1	1	1	X	X	None during first half of processor cycle.
1	1	0	0	X	None, the ALU isn't being addressed at all.
1	1	0	1	0	Binary ROM selected.
1	1	0	1	1	Unary/Constant ROM selected.

To get this result, RESET drives the decoder's active-high enable, and T34 drives one of its active-low enables. The other active-low enable is tied to ground, permanently asserting it. RESET and T34 go to the enables because these are the two signals we're in no hurry to react to (the '138 decodes faster than it enables or disables). Then, the vector <RSTHOLD,RUNIT₃,RUNIT₂> (with RSTHOLD as the most significant bit) is passed to the decoder's three-bit input. The seventh output (110) selects the binary part of the ALU, and the eighth (111) is UOE, which enables the unary ROMs' outputs.

The '138 is used because it's convenient, and because it guarantees at most one output will be asserted. This avoids glitches where both binary and unary parts of the ALU drive the IBUS simultaneously and release the magic smoke in the chips.
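A behavioural sketch of this '138 decoding shows how the six-row table collapses into one enable check and one three-bit lookup. The function name is hypothetical, and signals are passed as logic levels exactly as in the table (RESET and T34 active low, RSTHOLD high once the processor is out of reset).

```python
from typing import Optional


def alu_rom_select(reset_n: int, rsthold: int, t34_n: int, runit: int) -> Optional[str]:
    """Sketch of the 74HC138 that picks the binary or unary half of the ALU.

    Returns "binary", "unary", or None when neither half drives the IBUS.
    """
    # Enables: the active-high one from RESET, one active-low one from T34
    # (the other active-low enable is tied to ground).
    if not reset_n or t34_n:
        return None
    runit3 = (runit >> 3) & 1
    runit2 = (runit >> 2) & 1
    select = (rsthold << 2) | (runit3 << 1) | runit2   # RSTHOLD is the MSB
    # Only one '138 output can be low at a time, so both halves can never
    # drive the IBUS together.
    return {0b110: "binary", 0b111: "unary"}.get(select)


print(alu_rom_select(reset_n=1, rsthold=1, t34_n=0, runit=0b1000))   # binary (Adder)
print(alu_rom_select(reset_n=1, rsthold=1, t34_n=0, runit=0b1101))   # unary (NOT)
print(alu_rom_select(reset_n=1, rsthold=1, t34_n=1, runit=0b1101))   # None: first half of cycle
```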

B2.12.3. Unary Operation Decoder

Unary operations and constants are merged into three bits plus IRQS:

IRQS	UOP	Operation
X	000	RBL, roll one bit left.
X	001	RBR, roll one bit right.
X	010	RNL, roll four bits left.
X	011	RNR, roll four bits right.
X	10X	NOT, two copies due to decoding issues.
1	110	Constant 0000
1	111	Constant 0001
0	110	Constant 0002
0	111	Constant FFF8

There are two things to note here.

First and foremost, there are two identical bitwise NOT operations! This happens because of decoding issues in generating UOP out of the microinstruction and IR.

Second and hindmost, the constant store works differently when the microcode is interpreting the microprogram that jumps to the ISR: the ALU helps it by emitting constants useful in interrupt handling: the RTIV vector and the address of the ISR.

Now for the gory bits. (Not really: these are electronics. There's no gore.)

Somehow, we need to fit four binary operations, five unary operations, and four constants (that makes thirteen) into three bits (that makes eight). This can only happen with hackery and ad-hockery. Sorry! Not everything can be elegant.

The choice was made to shave bits off the unary part of the ALU. So, the unary operation decoder takes into account:

IR₀ and IR₂. Within OP1, IR₀ selects the roll direction and IR₂ the roll amount. Note that this means we can't select an arbitrary roll from microcode, since the ALU decoders are hardwired! All we do is let the ALU know we want it to calculate a roll, and the ALU knows what to do.
RUNIT₀ and RUNIT₁. These decode the lower two bits of a unary operation when RUNIT is 11XX.
IRQS. This selects the special constants for use when handling an interrupt request. Again, the ALU can only emit the constants 0000 and 0001 in normal operation, and only 0002 and FFF8 when the ‘jump to interrupt handler’ microprogram is running.

IR_0–2	RUNIT_0–1	Output UOP	Operation
0X0	00	000	RBL, roll one bit left.
0X1	00	001	RBR, roll one bit right.
1X0	00	010	RNL, roll four bits left.
1X1	00	011	RNR, roll four bits right.
XXX	01	101	NOT. Bitwise negation.
XXX	10	110	Constant 1.
XXX	11	111	Constant 2.

This is accomplished using a 74HC253 as a dual 1-of-2 multiplexer. When RUNIT₀ ∨ RUNIT₁ is zero, IR₀ is routed to UOP₀ and IR₂ is routed to UOP₁. This detects and selects roll operations.

When RUNIT₀ ∨ RUNIT₁ is non-zero, RUNIT_0–1 is routed to UOP_0–1. In either case, UOP₂ is RUNIT₀ ∨ RUNIT₁.

Note that this doesn't take into account whether the Microcode Store has requested access to the ALU! The ALU will keep decoding signals and preparing to emit its output, but the chips won't be selected and the output will be tri-stated. I do this because I want the decoding to happen in parallel, to speed up access times. The upper two bits of RUNIT are being decoded in parallel with the lower two, with no inter-dependencies.
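The multiplexer logic and the constant store can be cross-checked against the two tables above in a few lines. This is a sketch, assuming UOP₂ = RUNIT₀ ∨ RUNIT₁ and the IRQS-dependent constants from the UOP table; the function names are hypothetical.

```python
def uop_decode(ir: int, runit: int) -> int:
    """Sketch of the '253-based unary operation decoder (hypothetical helper).

    Only IR0, IR2 and RUNIT0-1 matter; whether the ALU is selected at all is
    decided in parallel by the binary/unary ROM select.
    """
    r0, r1 = runit & 1, (runit >> 1) & 1
    sel = r0 | r1                                 # RUNIT0 OR RUNIT1
    if sel == 0:                                  # a roll: the IR picks which one
        low = (((ir >> 2) & 1) << 1) | (ir & 1)   # UOP1 <- IR2, UOP0 <- IR0
    else:                                         # NOT or a constant
        low = (r1 << 1) | r0                      # UOP1-0 <- RUNIT1-0
    return (sel << 2) | low                       # UOP2 <- RUNIT0 OR RUNIT1


def constant(uop: int, irqs_n: int) -> int:
    """Constant store; IRQS (active low) swaps in the interrupt constants."""
    table = {
        (0b110, 1): 0x0000,   # RETV address in normal operation
        (0b111, 1): 0x0001,   # RTTV
        (0b110, 0): 0x0002,   # RTIV, while the interrupt microprogram runs
        (0b111, 0): 0xFFF8,   # address of the ISR
    }
    return table[(uop, irqs_n)]


print(format(uop_decode(ir=0b101, runit=0b1100), "03b"))   # 011: RNR
print(format(uop_decode(ir=0b000, runit=0b1101), "03b"))   # 101: NOT
print(hex(constant(uop_decode(0, 0b1111), irqs_n=0)))      # 0xfff8
```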

So much voodoo! Why so much voodoo?

Well, this was only one possible way of keeping the microcode instruction width down to 24 bits, while also making the ALU fit in its ROMs. There are many other ways, but this one worked. These design decisions make the CFT less of a generic microprogrammed machine, but it was never meant to be generic! You want generic? Go buy yourself a PERQ. And then tell me where you find it, I want one too.

Figure 25. Schematic of the ALU Port A and B registers (left), unary operator decoder (centre left), binary/unary ROM select (centre right), and Y (binary operator output) port buffers (right).

B2.12.4. Unary ROMs

To recap, the unary ROMs work as function tables for the following functions:

Operations on one number (unary):
- One bit left roll of the <L,AC> 17-bit vector.
- Four bit left roll of the <L,AC> 17-bit vector.
- One bit right roll of the <L,AC> 17-bit vector.
- Four bit right roll of the <L,AC> 17-bit vector.
- Bitwise NOT of a 16-bit number.
‘Operations’ on no number (emitting constants):
- The constant 0, which is also the address of RETV.
- The constant 1, also RTTV. Used mostly for the latter.
- The constant 2, also RTIV. Used mostly for the latter.
- The constant FFF8, the address of the Interrupt Service Routine.

The latter group is really simple. Constants are just that, so they don't depend on any register input.

Of the former group, NOT is also very easy to implement.

That leaves the rolls. Yes, they're going to give us trouble again. There are three issues with rolls:

Their domain is 17 bits wide, unlike other unary operations: their value depends on both L and AC.
Their range is 17 bits wide, just like addition: their output sets both L and IBUS.
There are relatively complex interrelationships between the bits.

For anyone new to using ROMs as huge truth tables, here's a primer. Each of the table's inputs gets its own address pin. Each bit of output gets its own data pin. The trouble here is that we have:

21 bits of input: 16 bits from the A port, one bit each from FL and IRQS, and three bits of UOP.
17 bits of output.

Here's what the ideal input looks like:

Bits	20–18	17	16	15–0
Input	UOP	IRQS	L	A

And here's the ideal output:

Bits	16	15–0
Output	L	Y
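To make the ideal table concrete, here is a hypothetical reference model of the unary half, similar in spirit to the Python generator described in the next paragraphs but not the author's actual script. It returns Y, ROLL16 (the new L value) and ISROLL (the extra control signal); returning 0 for ROLL16 outside rolls is an assumption, since it is a don't-care there.

```python
MASK17 = 0x1FFFF


def roll17(vector: int, amount: int, left: bool) -> int:
    """Rotate the 17-bit <L,AC> vector (bit 16 is L)."""
    if left:
        return ((vector << amount) | (vector >> (17 - amount))) & MASK17
    return ((vector >> amount) | (vector << (17 - amount))) & MASK17


def unary(a: int, fl: int, irqs_n: int, uop: int) -> tuple[int, int, int]:
    """Reference model of the unary/constant half: returns (Y, ROLL16, ISROLL)."""
    if uop < 0b100:                                   # the four rolls
        amount = 4 if uop & 0b010 else 1
        left = not (uop & 0b001)
        rolled = roll17((fl << 16) | a, amount, left)
        return rolled & 0xFFFF, rolled >> 16, 1
    if uop < 0b110:                                   # 100 and 101: both NOT
        return ~a & 0xFFFF, 0, 0
    consts = {(0b110, 1): 0x0000, (0b111, 1): 0x0001,
              (0b110, 0): 0x0002, (0b111, 0): 0xFFF8}
    return consts[(uop, irqs_n)], 0, 0


def ideal_rom_address(a: int, fl: int, irqs_n: int, uop: int) -> int:
    """Pack the 21 inputs as in the ideal layout: UOP 20-18, IRQS 17, L 16, A 15-0."""
    return (uop << 18) | (irqs_n << 17) | (fl << 16) | a


addr = ideal_rom_address(0x1234, 0, 1, 0b010)        # RNL on 1234 with L clear
y, roll16, isroll = unary(0x1234, 0, 1, 0b010)
print(hex(addr), hex(y), roll16, isroll)             # 0xa1234 0x2340 1 1
```

Iterating `ideal_rom_address` over all 2²¹ input combinations and storing `(roll16 << 16) | y` would give the 2M×17 image mentioned below.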

There's no such ROM available, of course, so the trick is to split the problem up into smaller tables that can fit in one ROM each, and provide the same results. Doing this is an art form, but I'm a high-functioning idiot who happens to have built an AI tool that solves just this problem: RSAR. If you give RSAR a huge truth table (or other dataset), it can tell you which inputs are necessary to get the outputs you want.

The methodology was simple: write a Python program to generate the large truth table. Then use ROMtools (yet another project available here) to build a huge 2M×17 ROM image. I then converted this image to a text dataset and fed it to RSAR. After some trial and error, I arrived at three groups of six outputs each, with an additional control signal for FL:

ROM	Outputs
0	Bits 0 to 5.
1	Bits 6 to 11.
2	Bits 12 to 15, L output (ROLL16), and control signal ISROLL.

Given these three sets of desired outputs, RSAR came up with three surprising partitionings of the original large table. This table shows all 21 inputs, and which address bit is assigned to each one per ROM. ROMs 0 and 1 use exactly 18 address bits each, and ROM 2 needs only 17, so on a 512K×8 (19-bit) part the first two ROMs even leave an address bit unused, and the last one leaves two. I shouldn't be surprised, of course: optimising such complex tables as a post-processing step is exactly what RSAR does. But it's always impressive to see.

Signal	ROM 0	ROM 1	ROM 2
A₀	0		0
A₁	1		1
A₂	2	0	2
A₃	3	1	3
A₄	4	2
A₅	5	3
A₆	6	4
A₇	7	5
A₈	8	6	4
A₉	9	7	5
A₁₀		8	6
A₁₁		9	7
A₁₂		10	8
A₁₃	10	11	9
A₁₄	11	12	10
A₁₅	12	13	11
FL	13		12
IRQS	14	14	13
UOP₀	15	15	14
UOP₁	16	16	15
UOP₂	17	17	16

I then went back to ROMtools and instructed it to slice the problem up into three ROMs using the bits above, and it created appropriate ROM images containing the functions.
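RSAR's partitioning can be sanity-checked without RSAR. The sketch below is a direct dependency analysis, not RSAR's method: for each ROM's output slice, it collects every bit of the 17-bit <L,AC> vector that a roll or NOT could move into that slice, then compares the result with the address-bit table above. Names are hypothetical.

```python
# Roll variants of the unary half: (amount, direction). Bit 16 of the
# 17-bit <L,AC> vector is L; bits 0-15 are A0-A15.
ROLLS = [(1, "left"), (1, "right"), (4, "left"), (4, "right")]


def needed_inputs(out_bits: range) -> set[int]:
    """Vector bits that a slice of the unary output can depend on."""
    needed = set()
    for i in out_bits:
        for amount, direction in ROLLS:
            # A left roll by k moves old bit (i - k) mod 17 into new bit i.
            shift = -amount if direction == "left" else amount
            needed.add((i + shift) % 17)
        if i < 16:
            needed.add(i)            # NOT: Y_i depends on A_i alone
    return needed


def label(bits: set[int]) -> str:
    return " ".join("FL" if b == 16 else f"A{b}" for b in sorted(bits))


# A-port and FL address bits per ROM, transcribed from the table above.
# UOP (three bits) and IRQS go to all three ROMs and are added separately.
TABLE = {
    0: set(range(0, 10)) | {13, 14, 15, 16},
    1: set(range(2, 16)),
    2: {0, 1, 2, 3} | set(range(8, 17)),
}
# Output bits per ROM; bit 16 is ROLL16, the new L value, produced by ROM 2.
SLICES = {0: range(0, 6), 1: range(6, 12), 2: range(12, 17)}

for rom, outs in SLICES.items():
    derived = needed_inputs(outs)
    address_bits = len(derived) + 1 + 3   # plus IRQS and the three UOP bits
    print(rom, derived == TABLE[rom], address_bits, label(derived))
```

All three slices match the table, and the address-bit counts come out as 18, 18 and 17.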

The unary ROMs are thus two 256K×8 (2 Mbit) units and one 128K×8 (1 Mbit) unit. Since I already had plenty of 512K×8 devices, I used those in the design instead. It gives me room to grow. Unused address inputs are tied to ground to select the lowest addresses. CE is also tied to ground to permanently select the chips. The output enable signal, OE, is driven by UOE from the binary/unary ROM decoder.

Since the A port of the ALU is registered, the unary ROM can output directly to the IBUS when enabled. And it does: the outputs of the ROMs are sent directly to the IBUS.

The last two outputs, ROLL16 and ISROLL, drive part of the L register logic. They are active high and pulled down, so they are deasserted when the ROMs' outputs are tri-stated.

Figure 26. Schematic of the ALU Unary operation and Constant Store ROMs.

B2.12.5. Binary Operation Decoder

There isn't any! The two least-significant bits of RUNIT, RUNIT_0–1, are routed directly to the binary ROMs to look up the result of the correct binary operation.

The binary ROMs are decoded in the simplest way possible because they're cascaded to allow carry bits to propagate for addition. We need them to be as fast as possible.

To keep things even faster, the binary ROMs are permanently selected and enabled by tying CE and OE to ground. To tri-state their outputs when they're not needed, we use an alternate mechanism.

B2.12.6. Binary ROMs

Making the binary ROM images was much simpler than the unary ones. What complicates the issue is that the binary operations truth table has 34 inputs:

16 bits for the A port.
16 bits for the B port.
2 bits for the operation code.

What makes it easy is that in three of the four operations, the bits are completely independent of each other: each result bit depends only on the operand bits in the same position. The fourth, addition, presents problems: bits are interdependent because of the carry. Normally, wide adders suffer from carry propagation delays. In our case, the tables do the dirty work within each ROM, so we only need to propagate the carry between ROMs. Since we use three ROMs, we have two carry signals to propagate.

Still, we can trivially split the problem into two 6-bit slices and one 4-bit slice, and this is exactly what's done.

Binary ROM	A port	B port	Carry in	Y output	Carry out	Others
0	A_0–5	B_0–5	FL	Y_0–5	CO1
1	A_6–11	B_6–11	CO1	Y_6–11	CO2
2	A_12–15	B_12–15	CO2	Y_12–15	ALUCPL	FLATCH, FVOUT
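A behavioural model makes the carry cascade concrete. This Python sketch covers addition only: the other three operations and the exact bit layout of the binary ROMs' addresses aren't given here, reading FVOUT as two's complement signed overflow is an assumption, and the function names are illustrative.

```python
# (lowest bit, width) of the slice handled by binary ROMs 0, 1 and 2.
SLICES = [(0, 6), (6, 6), (12, 4)]


def rom_add(a, b, carry_in, width):
    """What one binary ROM looks up for ADD: a width-bit sum and a carry out."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def sliced_add(a, b, carry_in=0):
    """16-bit ADD as three cascaded look-ups: returns (y, carry, overflow).

    carry_in stands in for FL on ROM 0, which ADD doesn't currently use.
    """
    y, carry = 0, carry_in
    for lsb, width in SLICES:
        mask = (1 << width) - 1
        part, carry = rom_add((a >> lsb) & mask, (b >> lsb) & mask, carry, width)
        y |= part << lsb
    # Assumed meaning of FVOUT: two's complement overflow, i.e. both
    # operands have the same sign and the result's sign differs.
    overflow = int(((a ^ y) & (b ^ y) & 0x8000) != 0)
    return y, carry, overflow


print(sliced_add(0xFFFF, 0x0001))  # (0, 1, 0)
print(sliced_add(0x7FFF, 0x0001))  # (32768, 0, 1)
```

Each loop iteration has to wait for the previous slice's carry, which is the three-ROM chain behind the worst-case delay discussed in the note on carry cascade delays.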

ROMs 0 and 1 require 15 bits of address (six bits per port, the carry in and the two-bit operation), so 32K×8 is enough. ROM 2 requires 11 bits, just 2K×8. Of course, I still used 512K×8 (4 Mbit) parts because I had them lying around.

Another thing to note: ADD doesn't have carry in, but FL is actually connected to ROM 0! It's not currently used, but I was probably thinking of future expansion, given all the free bits in the ROMs!

All three ROMs have their CE and OE tied to ground to permanently assert them. This is done for speed: it shaves quite some time off carry propagation. Instead, the Y outputs go to a pair of 74HC541 buffers. When the operation decoder asserts BOE, Y is output to the IBUS. This is much faster than enabling the ROMs.

Carry cascade delays

The worst-case look-up time for an addition result is three times the access time of a ROM: 3 × 70 ns = 210 ns. This is nearly an entire processor cycle. Since we don't know in advance whether carry out will be propagated, we just need to wait. This is fine, because an ADD instruction reads a value from memory and adds it to the current value of the accumulator. The accumulator hasn't changed since at least the previous instruction, so that's been stable for a while. Reading the value from memory is done in the first execution cycle of the instruction, and reading back the addition result happens at the end of the next one, 250 ns later.

ROM 2 outputs three special signals:

FLATCH. This signal goes high when the ALU wants to update the L and V flags. This happens for addition only.
ALUCPL. This is asserted (low) to flip the L register when an addition resulted in carry out. This may only happen if FLATCH is also asserted.
FVOUT. This is the new value of the V flag, to be used only if FLATCH is asserted.

These signals are passed on to an overly complex piece of logic which I'm completely failing to describe in the next section.

B2.12.7. Updating the V and L Flags

Some binary operations (ADD, I'm looking at you) set flags after they're done. Since the CFT is an accumulator machine, the ALU isn't responsible for many of those, but that still leaves us with two:

The L register, which is used as carry out. When there's a carry out after an addition, Binary ROM 2 outputs an asserted ALUCPL, and this is used to toggle L.
The Overflow flag (FV). This is output by Binary ROM 2 as FVOUT.

An additional signal FLATCH is output by Binary ROM 2 to indicate that the flags should be updated. This only happens for the ADD operation. When both BOE and FLATCH are asserted, two one-bit registers store the current values of ALUCPL and FVOUT. These registered values are then output as FLTADD (an input to the L register) and FV (the overflow flag) respectively. This is done on the positive edge of CLK3, so the flags are updated at the end of the processor cycle.

A 74HC253 is used as a generic two-input gate to check for BOE and FLATCH assertions. The chip gates CLK3 to the CLK input of a 74HC74 dual flip-flop, which stores the flag results.

Another minor bug!

The ALU is full of them, it seems. When either BOE or FLATCH isn't asserted, the gate's output is low. When they're both asserted and CLK3 isn't yet low, the output will go high, and the flip-flops (which clock on any positive edge) will register the flags early. This isn't a huge problem: the L register is clocked too and will ignore the spurious assertion. The gate's ‘false’ inputs should have been high, not low. This can be hacked on the current revision of the ALU board, but it would require desoldering the 74HC253 to access a via under it. (Yes, I know, I shouldn't be putting vias under ICs. This probably counts as another bug.)

Here's the truth table of the flag circuitry:

RESET	BOE	FLATCH	CLK3	ALUCPL	FVOUT	FLTADD	FV	Notes
0	X	X	X	X	X	1	0	Reset.
1	1	X	X	X	X	-	-	No binary operation. Keep last state.
1	0	0	X	X	X	-	-	Keep last state.
1	0	1	↑	0		0		ALU toggling FL. Assert FLTADD.
1	0	1	↑	1		1		ALU not toggling FL. Deassert FLTADD.
1	0	1	↑		0		0	FV cleared.
1	0	1	↑		1		1	FV set.

The two outputs are different in nature: FV is an actual flag, so it's sticky. It's reset when RESET is asserted, as you'd expect. FLTADD is an active-low strobe, so it's set (deasserted) whenever CLK3 is low.
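The truth table and the two kinds of output can be summarised as a small state model. This Python sketch captures the intended behaviour only: it ignores the early-clocking and CLK3 timing bugs noted in this section, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    fltadd: int = 1  # active-low strobe to the L register; 1 = idle
    fv: int = 0      # overflow flag; sticky


RESET_FLAGS = Flags(fltadd=1, fv=0)


def clk3_rising(flags, reset, boe, flatch, alucpl, fvout):
    """Flag registers after a positive edge of CLK3.

    RESET, BOE and ALUCPL are active low; FLATCH is active high.
    """
    if reset == 0:
        return RESET_FLAGS
    if boe == 0 and flatch == 1:   # an ADD result is being output
        return Flags(fltadd=alucpl, fv=fvout)
    return flags                   # no binary operation, or not ADD: keep state


def clk3_low(flags, reset):
    """While CLK3 is low the FLTADD strobe is released; FV keeps its value."""
    if reset == 0:
        return RESET_FLAGS
    return Flags(fltadd=1, fv=flags.fv)


f = clk3_rising(Flags(), reset=1, boe=0, flatch=1, alucpl=0, fvout=1)
print(f)                      # Flags(fltadd=0, fv=1)
print(clk3_low(f, reset=1))   # Flags(fltadd=1, fv=1)
```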

And a major bug!

The L flip-flop is asynchronously set when CLK3 is low (via the 74HC253), but it's also clocked on the rising edge of CLK3. The asynchronous signal is delayed around 30 ns as it passes through the 74HC253, which means the flip-flop will probably not have time to start clocking on CLK3. The Verilog verification model uses CLK4 for the 74HC253, and it passes tests (but the propagation delays in the simulated 74HC253 may be inaccurate).

Figure 27. Schematic of the ALU Binary ROMs. The schematic also shows the logic that updates ALU-related flags.

B2.13. The L Register

The L register is basically a complex flip-flop with all the features of a D and JK flip-flop:

Asynchronous clear.
Synchronous toggle.
Synchronous set or clear (data in).

It obeys the following truth table:

RESET	CLL	CPL	FLTADD	ACCPL	CLK1	ISROLL	ROLL16	Action
0	X	X	X	X	X	X	X	Asynchronous reset to 0.
1	0	X	X	X	X	X	X	Asynchronous reset to 0.
1	1	0	X	X	↓	X	X	Synchronous. CPL asserted. Toggle the L register.
1	1	1	0	X	↓	X	X	Synchronous. ADD carry out. Toggle the L register.
1	1	1	1	0	↓	X	X	Synchronous. ADD carry out. Toggle the L register.
1	1	1	1	1	↓	1	0	Synchronous. Roll operation. Set FL to 0.
1	1	1	1	1	↓	1	1	Synchronous. Roll operation. Set FL to 1.

It's implemented using a 74HC112 dual flip-flop, only half of which is used. The asynchronous reset is the easiest part: a single AND gate (OR in negative logic) asserts the flip-flop's CLR input when RESET or CLL is asserted.

That's the easy bit. JK flip-flops toggle when both J and K inputs are set, and we want to use that. But they have no D input, so we must decode the data input into J (set) and K (clear) signals.

Toggling happens when any of CPL, FLTADD and ACCPL are asserted. We use a single three-input NAND gate to produce an active-high signal when any of these signals are low.

Setting and clearing happen when ISROLL is asserted, and we decode that with a pair of AND gates: ISROLL AND ROLL16 asserts J; ISROLL AND NOT ROLL16 asserts K.

Each of the J and K inputs has two signals going to it, and we use two two-input OR gates to combine them. In principle, a toggle and a set or clear could be requested at the same time, but in practice they can't be: the toggles come from additions and the CPL instruction, while the sets and clears come from the roll instructions, and these never execute simultaneously.
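The decoding can be checked against the truth table with a small Python model. It sketches the logic as described in the text (not the gate mix-up flagged in the bug note that follows) and leaves out the asynchronous clear; the function names are illustrative.

```python
def jk_next(q, j, k):
    """Next state of a JK flip-flop on its active clock edge."""
    if j and k:
        return 1 - q  # toggle
    if j:
        return 1      # set
    if k:
        return 0      # clear
    return q          # hold


def l_next(q, cpl, fltadd, accpl, isroll, roll16):
    """L after a falling edge of CLK1, with RESET and CLL deasserted.

    CPL, FLTADD and ACCPL are active low; ISROLL and ROLL16 are active high.
    """
    toggle = not (cpl and fltadd and accpl)   # three-input NAND
    j = toggle or (isroll and roll16)         # OR gate feeding J (set)
    k = toggle or (isroll and not roll16)     # OR gate feeding K (clear)
    return jk_next(q, j, k)


print(l_next(0, cpl=0, fltadd=1, accpl=1, isroll=0, roll16=0))  # 1: CPL toggles L
print(l_next(1, cpl=1, fltadd=1, accpl=1, isroll=1, roll16=0))  # 0: roll clears L
```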

Bug!

Looks like the NAND gate that generates LJ (IC4) should be an AND gate.

B2.14. The I/O Device Decoder

The I/O Device Decoder provides control signals for the most significant 8 bits of an I/O address. At the cost of two chips, it saves two chips per expansion board, and helps me avoid partial decoding. This is by no means a necessary part of the processor. It's there for pure convenience. Yes, I understand the irony of going for convenience in a project so manifestly masochistic, but excessive repetition can be boring and I'd need to solder an extra 11 wires per expansion board. That's 190 connections where I can make mistakes or short something out. I may be a masochist, but I'm not stupid.

The I/O Device Decoder decodes I/O addresses like this:

IO	AR	SYSDEV	IODEV1XX	IODEV2XX	IODEV3XX
1	X	1	1	1	1
0	0000-00FF	0	1	1	1
0	0100-01FF	1	0	1	1
0	0200-02FF	1	1	0	1
0	0300-03FF	1	1	1	0
0	0400-FFFF	1	1	1	1

To implement this veritable wonder of combinatorial logic, I've used a 74HC688 8-bit comparator to compare AR11–15 to 00000, tying the unused bits to ground. The comparator's G enable signal is driven by IO. I use G because it's the fastest input on the chip. The AR value will have over 250 ns to settle, but the faster we respond to IO assertions the better for potentially slow expansion boards. So, when IO is asserted and the AR contains addresses in the range 0000-07FF, the comparator's P=Q output is asserted.

This is routed to one of the two active-low enables of a 74HC138 decoder, of course. The decoder's other two enables are permanently enabled. AR8–10 are decoded into eight active-low signals, each representing 256 I/O space addresses. Of those, the first four are routed to the Expansion Bus and can be used by any expansion board.
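A Python model of the decoder reproduces the table above, including the 0400–07FF range, which passes the comparator but only reaches the 74HC138 outputs beyond the first four. This is a behavioural sketch, and the function name is made up.

```python
def io_decode(io, ar):
    """(SYSDEV, IODEV1XX, IODEV2XX, IODEV3XX), all active low.

    io: the active-low IO signal; ar: 16-bit value of the AR.
    """
    outputs = [1, 1, 1, 1]
    # 74HC688: P=Q asserted when G (IO) is low and AR11-15 are all zero.
    if io == 0 and (ar & 0xF800) == 0:
        # 74HC138: AR8-10 select one of eight 256-address ranges...
        y = (ar >> 8) & 0x7
        if y < 4:        # ...but only the first four are used.
            outputs[y] = 0
    return tuple(outputs)


print(io_decode(0, 0x0180))  # (1, 0, 1, 1)
```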

In this way, an expansion board can:

Use a single 74HC138 decoder to decode up to five of the remaining eight address bits, cheaply decoding eight-address ranges. Most expansion boards I've designed so far do exactly this.
Include a jumper bank to select which of the four I/O ranges to use, allowing multiple cards of the same type to coexist, with base addresses 256 locations apart.

B2.15. Bus Terminators

The CFT buses can behave like transmission lines, so they need to be conditioned and terminated to deal with signal bouncing. This is done for the processor's IBUS as well as its expansion bus. The IBUS meanders around a lot, is connected via cheap ribbon cable and sometimes needs to carry high-speed strobes (by CFT standards). The expansion bus is long and carried on a backplane I've had for decades and use for historical reasons, so it can't really be trusted. None of the CPU boards were built with noise in mind, and my electronics skills fall well short of the voodoo needed to fix such problems.

To terminate the buses, I've used Texas Instruments' SN74ACT1073 (16-bit) and SN74ACT1071 (10-bit) bus terminators. They provide signal conditioning, bus termination and bus hold, and all they need is one pin per signal. They were love at first sight.

There's a Story About That

Before I constructed all the processor boards, I'd already built the memory board and the first version of the Debugging Board, which would allow me to test the bus and boards individually, using an external computer, and without the processor present. Something went really wrong and on one of the first tests, I fried an expensive I²C GPIO controller and two memory chips. The suspects were floating buses and really bad signal conditioning, as well as some very strange pull-ups, pull-downs or impedance matching resistor packs on the backplane (I forget which), but I know the backplane wasn't built for CMOS. I immediately researched voltage clamps, bus conditioning etc. and went through another iteration of the processor boards with the TI chips, which provide enough of those functions to make me less paranoid. It's worked fine ever since.

Figure 29. Schematic of some of the Front Panel buffers (others are strewn about the schematic sheets in this document), and the expansion bus terminators.

B2.16. Front Panel Buffers

Most of the control logic of the processor is shown on the front panel. The cables to the front panel are very long, and guaranteed to pick up all the noise in the world. In the original version of the front panel, the signals would have driven LEDs directly, and many CMOS devices can't drive the 5 mA needed.

To fix that, all signals leaving the processor boards for the front panel are buffered using 74HC541 buffers. The buffers:

Terminate the signals and isolate them from the front panel, reducing their length and thus transmission line effects.
Help avoid picking up interference from the long ribbon cables that take the signals to the front panel.
Do away with voltage drop and generate nice, rail-to-rail, square waveforms.
Offer modest protection from external shorts and other badness.

On the downside, they slew the signals (‘slew’, as in Buffy the Vampire Slewer). Nobody will notice: they're driving LEDs, after all.

B2.17. Built-In Processor Extensions

The CFT was designed for easy extension, and in such a way that the line between peripherals and processor instruction set extensions is blurry. No surprises there: this was the case with many computers in the past.

There are two extensions built as part of the CFT processor. Another, the interrupt controller board (comparable to a programmable interrupt controller chip), wouldn't fit so it was left as a separate board.

B2.17.1. The µCB Extension

The Microcode ROMs I'm using on the CFT are 512K×8 devices, with 19 address lines. Microcode addresses are 15 bits wide, which leaves four address bits spare. What to do with those? Well, obviously add more microcode and more instructions!

The UCB is basically the poster child of processor extensions. Yes, it ties into the Microcode Store, but it shows off what can be done with this mechanism.

B2.17.2. Programming Model Extension

The UCB introduces instruction prefixes to the CFT. 6502 programmers will scoff, but people with processors with an ‘8’ in them will be strangely at home here. We can ‘shift’ an instruction to a different meaning by prepending a particular op-code.

As on the Z80, the prefix is the same size as the machine's word: 8 bits on the Z80, 16 bits on the CFT. Unlike Z80 assemblers, the CFT assembler considers the prefix a separate instruction, so it goes on its own line:

        SBN ucb.BNK3    ; Next instruction shifts to instruction set bank 3
        STORE R 3       ; Who knows what this might do?
        LOAD R 6        ; This is back to the standard instruction set (bank 0)

The standard assembler library defines the new instruction macros SBN (change bank for next instruction) and SBP (change bank for all subsequent instructions). These are internally defined as OUT R &01x and OUT R &00x respectively. The values ucb.BNK0 to ucb.BNKF set the desired bank. In standard assembler fashion, they're ORed with the rest of the values on the line. The sixteen ucb.BNKx values will get better names (or aliases) when I come up with ideas on what to put in the other fifteen microcode banks.

Note that the SBP instruction changes the instruction set permanently, until the next SBP instruction. If an interrupt is triggered, the behaviour would be undefined. You also can't read the current microcode bank setting, so you can't put it on, say, a stack. So, this instruction is meant to set the instruction set at boot time, and it's probably never touched again. Other than that, it would only be useful in very tight blocks of code that run with interrupts disabled. The idea was to allow for testing various revisions of potentially buggy microcode without having to pull and reprogram the ROMs after every test. The idea arose when I was comparing two- and three-cycle memory reads and writes to see what would work. Pulling and reprogramming the ROMs for every test is a huge pain.

Who Knows Indeed!

As of the very, very end of 2018 (I'm writing this on New Year's Eve), there are no extensions to the microcode. All banks contain the exact same microcode, so the UCB extension does nothing—yet. There are some extensions planned, but I'll wait to see what needs arise as I write more of the ROM.

Caution! Increased Interrupt Latency

Using the SBN instruction ignores interrupts until the shifted instruction is executed. Depending on how long the shifted microprogram is, this can add between 0.5 µs and 8 µs to the interrupt processing latency. Whatever, really: the CFT's peripherals are all designed with high latency in mind, like proper mini-computer peripherals would be.

B2.17.3. Some Ideas For Banks

Here are some basic ideas for what can be put in banks:

Opposite World. All AC and DR increments decrement instead. Auto-index mode decrements too.
I/O space operations slowed by one or two cycles for really slow devices (better handled on the expansion board itself, though).
Longer memory access cycles.
Versions where ADD takes an extra processor cycle in case carry doesn't propagate.
Exotic addressing modes.
Any obvious, simple operation that is commonly implemented in more than two instructions.

B2.17.4. Extending the Extension

The ALU's Binary ROMs have plenty of free address bits. Route the microcode bank number to them too, and we can have different ALU operations per bank as well (within reason).

B2.17.5. Hardware Design

The UCB ignores the value of the AC (i.e. the value placed on the Data Bus). All the information it needs comes from the I/O space address being written to. The extension maps to I/O addresses 000–00F for permanent microcode switching, and 010–01F for temporary shifts. Shockingly, this is done with a 74HC138 which decodes the range perfectly, though we need a single OR gate to provide a negative logic AND of SYSDEV and W:

SYSDEV	W	AB_0–7	SETUPCBP	SETUPCBT	Notes
1	X	XXXXXXXX	1	1	Not an I/O space request, or AB ≥ 0100.
0	1	XXXXXXXX	1	1	Not writing.
0	0	0000XXXX	0	1	Change the permanent bank.
0	0	0001XXXX	1	0	Change the temporary bank.

The bank to change to is held in the least significant four bits of the Address Bus, rather than the Data Bus.

The SETUPCBP and SETUPCBT strobes each clock one of two 74HC175 quad D flip-flops, both of which take AB_0–3 as input. These act as our bank registers. Incidentally, they're reset to zero when RESET is asserted. No surprises there.

Their outputs are connected to a 74HC157 quad 2-to-1 multiplexer, which selects either the permanent or the temporary bank.

An additional 74HC74 flip-flop remembers which one to enable. It is asynchronously set high when SETUPCBT is asserted during the selection of a temporary bank. Its D input is tied to ground, and it is clocked on the positive edge of END. This generates a brief strobe between the negative edge of W and the positive edge of END. During the strobe, the multiplexer switches to the temporary bank. The multiplexer's output goes to a third 74HC175 quad D flip-flop, which is also clocked on the positive edge of END. This makes the selection take effect as soon as the current instruction has completed, but not during it. The output of this flip-flop goes back to the Microcode ROMs to drive their four most significant address bits and select a new microcode bank.

There's one further complication: a temporary shift instruction followed by the shifted instruction is an atomic entity. Not in the boom way, in the can't cut it up way. We can't allow an interrupt to break up the shifted instruction, because when the ISR completes and returns to the second part, it will no longer be shifted. This is accomplished back in the Microcode Store: we OR the active high output of our ‘temporary instruction set shift’ flip-flop with IRQS. While a temporary shift is in operation, interrupts are effectively disabled.
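Putting the pieces together, the UCB's externally visible behaviour can be sketched as a small Python class. It follows the description above at the level of whole instructions rather than clock edges; the class and method names are hypothetical, and clearing the output register on RESET is an assumption. The usage example plays out a temporary shift to bank 3 (a write to I/O address 013): the shifted instruction runs in bank 3, and the one after it returns to the permanent bank.

```python
class UCB:
    """Behavioural sketch of the µCB bank switching (not cycle-accurate)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # RESET clears the bank registers (and, assumed here, the output register).
        self.permanent = 0          # 74HC175 loaded by SETUPCBP
        self.temporary = 0          # 74HC175 loaded by SETUPCBT
        self.shift_pending = False  # 74HC74 'temporary shift' flip-flop
        self.active = 0             # output 74HC175 driving the ROMs' top four bits

    def io_write(self, address):
        """A write to I/O space. Only the address matters; the AC is ignored."""
        if 0x000 <= address <= 0x00F:      # SETUPCBP
            self.permanent = address & 0xF
        elif 0x010 <= address <= 0x01F:    # SETUPCBT
            self.temporary = address & 0xF
            self.shift_pending = True      # set asynchronously

    def end(self):
        """Positive edge of END: latch the multiplexer's choice, clear the flag."""
        self.active = self.temporary if self.shift_pending else self.permanent
        self.shift_pending = False

    def irq_masked(self):
        """The flag is ORed with IRQS, so no interrupt can be taken before
        the END that starts the shifted instruction."""
        return self.shift_pending

    def microcode_address(self, uaddr):
        """19-bit Microcode ROM address: bank number above the 15-bit address."""
        return (self.active << 15) | (uaddr & 0x7FFF)


ucb = UCB()
ucb.io_write(0x013)   # temporary shift to bank 3
ucb.end()             # end of the prefix: next instruction runs in bank 3
print(ucb.active, hex(ucb.microcode_address(0x0123)))  # 3 0x18123
ucb.end()             # end of the shifted instruction
print(ucb.active)     # 0
```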

B2.17.6. Making It Optional

This extension is an afterthought. A welcome afterthought, but an afterthought. It's simple enough and passes Verilog verification, but it might not work properly on real hardware. To avoid it compromising the entire Control Unit, there are solder jumpers on the Microcode Store board. Removing the IRQS masking gate and bridging the solder jumpers will tie the most significant four bits of the Microcode ROMs to ground and reconnect IRQS directly to the ROMs per the original design. This bypasses the UCB completely, and its instructions become 4-cycle NOP instructions (like all unmapped I/O space writes, of course).

B2.17.7. Microcode Design Considerations

There are some obvious little details about designing banked microcode, but they probably merit mentioning.

All banks must have an identical reset microprogram. The reset microprogram is repeated many times throughout the microcode, and every copy must be the same in every bank.
All banks must have an interrupt microprogram. It doesn't need to be identical, but it probably would be.
All banks must have some sort of OUT instruction. Otherwise, you can't switch out of the bank if it's set as a permanent one.

Figure 30. Schematic of the µCB Extension.

B2.18. The Memory Banking Unit

The MBU takes the three most significant bits of memory addresses and replaces them with eight bits: a net gain of five bits, for 21 bits in total, so 2 MW of memory can now be accessed. It does this by splitting the CFT's native 64 kW address space into eight 8 kW banks and using bank switching to map any 8 kW block of physical memory to each of these banks.

There are some positive side effects to this:

Lots and lots more memory.
Multi-user features! By allowing Page Zero to be mapped anywhere in memory, we can support multiple users with different contexts.
Lots and lots more memory. The constructed memory card offers 1 MW of RAM and 512 kW of ROM.

Terminology Warning!

Some discussions of the MBU erroneously use the term ‘page’ for the 8 kW blocks of memory. This is confusing, but may still be found in ROM comments etc. Exercise caution in parsing such descriptions!

B2.18.1. Programming Model Extension

The MBU appears as the eight I/O space addresses 020–027, each corresponding to one memory bank. To control the MBU, an OUT instruction is issued to the appropriate bank's address. The AC must conform to this format:

Bits	Contents
15–9	0
8	Disable
7–0	Memory Block

The ‘Disable’ bit must be 0 to enable the software mapping, or 1 to disable it and revert to the hardwired map. The ‘Disable’ bit is only available on locations 020–023 due to partial decoding. The negative semantics are a deliberate choice to aid the software: a single write can both configure a bank and enable the MBU.

For instance, to enable banking and map block &23 to bank 3:

        LI &023         ; Enable MBU and map block &23
        SMB 3           ; ...to bank 3.

To disable it again (note that this also maps block &23 to bank 3, and the MBU will remember this mapping):

        LI &123         ; Disable the MBU. (bit 8 set)
        SMB 3

The standard assembler file mbu.asm includes defines and macros to make this easier. For example, this is how the CFT ‘operating system’ initialises the MBU once detected, reproducing the power-up mapping of 32 kW of RAM blocks and 32 kW of ROM blocks:

        ;; Enable and configure the MBU.
        mbu.LSMB(7, &83)    ; ROM
        mbu.LSMB(6, &82)    ; ROM
        mbu.LSMB(5, &81)    ; ROM
        mbu.LSMB(4, &80)    ; ROM
        mbu.LSMB(3, &03)    ; RAM (implicitly enables the MBU)
        mbu.LSMB(2, &02)    ; RAM (implicitly enables the MBU)
        mbu.LSMB(1, &01)    ; RAM (implicitly enables the MBU)
        mbu.LSMB(0, &00)    ; RAM (implicitly enables the MBU)

Write Only!

The MBU is a write-only device: the current mapping can't be inspected. The OS keeps eight words on Zero Page that represent the last mapping it set, but of course there's nothing to stop me from just updating the mapping and not letting the OS know.

B2.18.2. Hardware Design

The design of the MBU is complicated by the following requirements:

It must be capable of disabling itself programmatically.
When disabled, a default memory mapping must be made available.
The front panel ‘RAM/ROM’ switch must be honoured.

The basic workings of the extension are like those of a standard peripheral, except that the MBU is the only peripheral allowed to drive the AEXT_0–7 signals on the Expansion Bus.

At the centre of the design are four 74HC670 register files. These are hard to find these days, but I've had a soft spot for them since I got my first Texas Instruments databook. I wouldn't dream of designing a processor without at least some of them in there. Each chip contains four 4-bit registers. Four data inputs, two write address lines and a write enable line are used to write to any 4-bit register. The reading circuitry is separate: four data outputs, two read address lines and a read enable line control what register (if any) is output.

The four 74HC670 chips are organised into a 2×2 array to form a register file of eight 8-bit registers. Each of the eight registers holds the block number for one memory bank.

B2.18.2.1. Writing to the Registers

Write decoding involves quite a number of bits, so we do it using a 74HC138, a single OR gate, and one half of a 74HC139 decoder.

The 74HC138 decodes SYSDEV and AB_3–7 to detect I/O space addresses 020–027.

The OR gate acts as a negative logic AND gate between the 74HC138 output and W to detect writing to I/O space addresses 020–027.

The 74HC139 decodes this into two signals:

BANKW0 indicates a write to I/O space addresses 020–023 and selects the first pair of 74HC670 ICs.
BANKW1 indicates a write to I/O space addresses 024–027 and selects the second pair of 74HC670 ICs.

When BANKW0 is asserted, the low-order register pair latches the 8-bit value in DB_0–7 into the register identified by AB_0–1. The bit in DB₈ is latched on the rising edge of BANKW0. This outputs a badly named, complementary pair of signals, both called BANKING: one active low, the other active high. The MBU is enabled when the active-low BANKING is low and the active-high BANKING is high.

When BANKW1 is asserted, the high-order register pair latches the 8-bit value in DB_0–7 into the register identified by AB_0–1.

Bug!

The schematic (and the actual board I've built) shows an AND gate, when it should be an OR gate. Whoops.

SYSDEV	W	AB_0–7	BANKW0	BANKW1	Notes
1	X	XXXXXXXX	1	1	Not an I/O space request, or AB ≥ 0100.
0	1	XXXXXXXX	1	1	Not writing.
0	0	00100000	0	1	`OUT R &20`: Write to low-order register 0
0	0	00100001	0	1	`OUT R &21`: Write to low-order register 1
0	0	00100010	0	1	`OUT R &22`: Write to low-order register 2
0	0	00100011	0	1	`OUT R &23`: Write to low-order register 3
0	0	00100100	1	0	`OUT R &24`: Write to high-order register 0
0	0	00100101	1	0	`OUT R &25`: Write to high-order register 1
0	0	00100110	1	0	`OUT R &26`: Write to high-order register 2
0	0	00100111	1	0	`OUT R &27`: Write to high-order register 3

B2.18.2.2. Reading from the Registers

The 74HC670 can read and write independently and simultaneously, but on the CFT reading and writing are guaranteed never to happen at the same time anyway. While writing to the registers happens in I/O space, reading from them happens in memory space.

The unit is selected for reading on any memory access whatsoever. This makes sense: we need to extend memory space addresses, after all. It also makes decoding very easy. All we need is the other half of the 74HC139 we already used. It selects the MBU when MEM is asserted. Bits AB_13–15 are used to select the memory bank; this is what breaks up the unexpanded memory space into eight 8 kW banks. However, since we have two register pairs, we need to generate BANKR0 and BANKR1 signals based on the value of AB₁₅. So, the decoding scheme is as follows:

MEM	AB	BANKR0	BANKR1	Notes
1	XXXXXXXXXXXXXXXX	1	1	Memory not being accessed.
0	000XXXXXXXXXXXXX	0	1	Address 0000–1FFF: read from low-order register 0.
0	001XXXXXXXXXXXXX	0	1	Address 2000–3FFF: read from low-order register 1.
0	010XXXXXXXXXXXXX	0	1	Address 4000–5FFF: read from low-order register 2.
0	011XXXXXXXXXXXXX	0	1	Address 6000–7FFF: read from low-order register 3.
0	100XXXXXXXXXXXXX	1	0	Address 8000–9FFF: read from high-order register 0.
0	101XXXXXXXXXXXXX	1	0	Address A000–BFFF: read from high-order register 1.
0	110XXXXXXXXXXXXX	1	0	Address C000–DFFF: read from high-order register 2.
0	111XXXXXXXXXXXXX	1	0	Address E000–FFFF: read from high-order register 3.

The outputs of the low-order and high-order registers are wired together; the 74HC139 decoder ensures at most one chip will be driving each line at any time. The combined outputs are connected to the inputs of a 74HC541. When the active-low BANKING is low (banking enabled), the buffer drives AEXT_0–7. When it is high, the buffer's outputs are tri-stated.
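Write and read decoding amount to indexing two four-entry register pairs. This Python sketch models them at the level of whole bus operations; the class and method names, and the assumption that the MBU powers up disabled, are illustrative rather than taken from the design. It also shows that the Disable bit lives only behind 020–023: writes to the high-order pair leave it alone.

```python
class MBU:
    """Behavioural sketch of the MBU's register file and Disable flip-flop."""

    def __init__(self):
        self.low = [0] * 4     # low-order pair: banks 0-3 (AB15 = 0)
        self.high = [0] * 4    # high-order pair: banks 4-7 (AB15 = 1)
        self.disabled = True   # assumed power-up state: hardwired map in use

    def io_write(self, address, value):
        """OUT to I/O space; only addresses 020-027 reach the MBU."""
        if not 0x020 <= address <= 0x027:
            return
        reg = address & 0x3                      # AB0-1 select the register
        if address & 0x4:                        # BANKW1: 024-027
            self.high[reg] = value & 0xFF
        else:                                    # BANKW0: 020-023
            self.low[reg] = value & 0xFF
            self.disabled = bool(value & 0x100)  # DB8 is latched here only

    def block_for(self, ab):
        """Block number the register file drives onto AEXT for address ab."""
        bank = (ab >> 13) & 0x7                       # AB13-15
        regs = self.high if bank & 0x4 else self.low  # BANKR1 / BANKR0
        return regs[bank & 0x3]


mbu = MBU()
mbu.io_write(0x020 + 3, 0x023)   # map block &23 to bank 3, MBU enabled
print(mbu.disabled, hex(mbu.block_for(0x6ABC)))  # False 0x23
```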

B2.18.2.3. Behaviour when Banking is Disabled

When the active-low BANKING is high (banking disabled), the bank register file is tri-stated.

Its complement, the active-high BANKING, will be low. This drives AEXT_0–7 using a hardwired mapping that depends on the setting of FPRAM (the RAM/ROM switch on the front panel) and AB_13–15.

When FPRAM is in the RAM position and the signal is low, all of the CFT's memory is mapped to RAM (and it starts up virginal and halted, like a PDP-8). In this case, we want this mapping:

Bank	CFT addresses	AEXT	Physical addresses
0	0000–1FFF	00	000000–001FFF
1	2000–3FFF	01	002000–003FFF
2	4000–5FFF	02	004000–005FFF
3	6000–7FFF	03	006000–007FFF
4	8000–9FFF	04	008000–009FFF
5	A000–BFFF	05	00A000–00BFFF
6	C000–DFFF	06	00C000–00DFFF
7	E000–FFFF	07	00E000–00FFFF

When FPRAM is in the RAM and ROM position, the signal will be high. In this case, we boot up with 32 kW of RAM and 32 kW of ROM. ROM is mapped from the first megaword onwards (0x100000), so the mapping becomes:

Bank	CFT addresses	AEXT	Physical addresses
0	0000–1FFF	00	000000–001FFF
1	2000–3FFF	01	002000–003FFF
2	4000–5FFF	02	004000–005FFF
3	6000–7FFF	03	006000–007FFF
4	8000–9FFF	80	100000–101FFF
5	A000–BFFF	81	102000–103FFF
6	C000–DFFF	82	104000–105FFF
7	E000–FFFF	83	106000–107FFF

Prepare for Untold Horrors. Or At Least Mild Nostalgic Discomfort.

Converting a physical memory block number (the value of AEXT) and a CFT address to a physical address is annoying. You have to calculate (aext << 13) | (addr & 0x1fff) in your head, and the shift doesn't fall on a convenient nybble boundary. It's actually a little more natural to show physical addresses in the universally despised 8086 segment:offset format. For example, physical address 101FF0 becomes 80:1FF0. It makes things clearer (at the cost of one extra character). And at least, unlike the corresponding Intel syntax, there's only one way to write a physical address on the CFT.
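Both notations are easy to produce mechanically. This Python sketch implements the formula above and the block:offset format; the function names are made up, and the example is an arbitrary CFT address in bank 4 under the RAM and ROM mapping.

```python
def physical(aext, addr):
    """21-bit physical address from a block number (AEXT) and a CFT address."""
    return (aext << 13) | (addr & 0x1FFF)


def block_offset(phys):
    """Format a physical address in block:offset notation."""
    return f"{phys >> 13:02X}:{phys & 0x1FFF:04X}"


p = physical(0x80, 0x9FF0)   # CFT address 9FF0 in bank 4, mapped to block 80
print(f"{p:06X}", block_offset(p))  # 101FF0 80:1FF0
```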

Now for the nitty-gritty of it. We need to derive logic to implement the mapping from FPRAM and AB to AEXT. If you study the tables above you can't help but notice:

AB_13–14 seem to map to AEXT_0–1 directly.
AB₁₅ maps to AEXT₂ in the RAM-only mapping.
AB₁₅ maps to AEXT₇ in the RAM and ROM mapping.

Sounds like a job for super-74HC253! (said no-one ever, anywhere in the multiverse)

So: we wire a 74HC253 as a two-input, two-output arbitrary function on AB₁₅ and FPRAM. It outputs two signals, HWA2 and HWA7 that will be the source for AEXT₂ and AEXT₇.

FPRAM	AB₁₅	HWA2	HWA7	Notes
0	0	0	0	RAM-only layout. Maps to RAM.
0	1	1	0	RAM-only layout. Also maps to RAM.
1	0	0	0	RAM and ROM. Maps to RAM.
1	1	0	1	RAM and ROM. Maps to ROM.

These signals, along with AB_13–14, are brought to a 74HC541 buffer. The buffer's output is enabled when the active-high BANKING is low (i.e. banking is disabled) and MEM is asserted (i.e. the Address Bus holds a valid memory space address). Its outputs are connected to AEXT as follows:

Input signal	Output
AB₁₃	AEXT₀
AB₁₄	AEXT₁
HWA2	AEXT₂
Ground	AEXT₃
Ground	AEXT₄
Ground	AEXT₅
Ground	AEXT₆
HWA7	AEXT₇
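The 74HC253 function and the buffer wiring reduce to a few bit operations. This Python sketch (function name hypothetical) regenerates the AEXT column of both hardwired mapping tables.

```python
def hardwired_aext(fpram, ab):
    """AEXT driven by the hardwired map while banking is disabled.

    fpram: 0 = RAM only, 1 = RAM and ROM; ab: 16-bit memory address.
    """
    ab13, ab14, ab15 = (ab >> 13) & 1, (ab >> 14) & 1, (ab >> 15) & 1
    hwa2 = ab15 & (1 - fpram)    # 74HC253 output feeding AEXT2
    hwa7 = ab15 & fpram          # 74HC253 output feeding AEXT7
    # AEXT3-6 are tied to ground.
    return (hwa7 << 7) | (hwa2 << 2) | (ab14 << 1) | ab13


for fpram in (0, 1):
    print([f"{hardwired_aext(fpram, bank << 13):02X}" for bank in range(8)])
# ['00', '01', '02', '03', '04', '05', '06', '07']
# ['00', '01', '02', '03', '80', '81', '82', '83']
```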

B2.18.2.4. Front Panel Connections

An additional 74HC541 buffer, permanently enabled, buffers AEXT and BANKING for display on the front panel. The BANKING indicator lights when banking is enabled and the soft mapping is in use.