What's This Then?
For those who just tuned in, this is about the CFT, my home-designed, home-built mini-computer made from scratch. I've designed and built the instruction set, data path, processor, computer, software stack, cross-assembler, emulator, software toolchain and even some of the tools needed to do this. You can call this a ‘fantasy mini’.

It all started with KiCad. I decided Eagle's board size limitations were becoming too stifling for me and I wasn't going to be duped into a subscription model when I don't use the software this regularly. KiCad 5 was a revelation. It's clunkier than Eagle in many ways, but much better in many others.

Having no limits on board size meant that a lot of the harder decisions I made in designing the CFT were suddenly lifted and I started questioning every single one of them. That led to a new backplane style, new, larger processor boards, a new front panel, and a new, less bizarre DFP. And suddenly, the realisation that even the microcode and architecture have constraints that I could lift.

This document is about all of those constraints and whether they should really be lifted to make the CFT more capable and easier to program, or just stay there and keep it clunky and old-timey. Both ideas appeal to me greatly, so expect dilemmas and crippling indecision.

Problems to Solve

The current incarnation of the CFT is ugly and nearly unmaintainable. The problems:

  • The backplane is not ideal for my needs. It requires side-buses for control signals that look ugly.
  • Patching mistakes means redoing entire large boards. I need a better backplane and smaller, simpler, cheaper to redo boards.
  • Too many connecting wires that can fail.
  • The Control Bus is a horror.
  • Both of these problems preclude me from building a card cage and enclosure: cards are different sizes, and the Control bus would make it difficult to pull the processor boards.

How to Solve These Problems

  • A new front panel. The current front panel requires five 50-way ribbon cables going to nine modules. The proposed new design reduces this to one 40-way ribbon cable and two modules, with the modules daisy-chained.
  • A new design for the DFP. The DFP has served well, but it's also a source of many cables. Five 50-way ribbon cables to the front panel, four 40-way cables to the processor boards, plus connections to the Control and Expansion Buses. That's a lot of wires. The proposed redesign will reduce this to one edge connector for the computer's buses and a 40-way cable to the front panel.
  • I will (reluctantly) abandon the glorious 19-slot DIN 41612 backplane in favour of a custom-made one that uses cheaper edge connectors. The backplane is modular and can be expanded in width. Each slot has around 120 bussed signals and 40 unbussed ones. These can be wired in custom ways or, in the case of peripherals, brought to the rear of the computer to connect to sockets.
  • Take advantage of the backplane design to split the processor boards into relatively generic, small boards. This makes it easy to have them fabricated (since most fab houses have a five board minimum order), easier to patch, cheaper to make, and makes the computer's innards look a little like the PDP-8's.

Front Panel

The current front panel is made out of 8 modules, each with 20 LEDs in five rows of four lights.

The updated front panel is to be made out of 4 identical modules, each with 40 LEDs in five rows of eight lights each. The modules are daisy-chained together using a very short 40-conductor cable. The first module connects to the DFP.

Data for lights is time-multiplexed 16 bits at a time at a high rate (ideally using a 16 MHz clock). Because of this high rate, the bus should include bus hold circuitry and/or be impedance matched.

With 160 lights on the front panel and 16 bits sent per tick, a full update takes ten clock ticks. Getting all the lights to the front panel for every CFT instruction at full speed would require a clock ten times that of the CPU's clock, and that's 160 MHz. Bad idea! Alternatively, use a 16 MHz clock for the lights (or whatever works reliably) and accept that at full speed the lights won't be 100% accurate. It'd be absolutely impossible for a human to tell.
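
The arithmetic behind those figures can be checked with a short sketch. It assumes the four-module, 160-light panel described above and treats 16 MHz as the rate the lights would have to keep up with; the function names are mine and not part of any CFT tooling.

```python
LIGHTS = 4 * 40          # four modules of 40 LEDs each
BITS_PER_TRANSFER = 16   # lights are multiplexed 16 bits at a time

# Transfers (clock ticks) needed to refresh every light once.
TICKS_PER_REFRESH = LIGHTS // BITS_PER_TRANSFER


def light_clock_for(update_rate_hz):
    """Light-bus clock needed to refresh the whole panel update_rate_hz times a second."""
    return update_rate_hz * TICKS_PER_REFRESH


def refresh_rate_at(light_clock_hz):
    """Full-panel refreshes per second for a given light-bus clock."""
    return light_clock_hz / TICKS_PER_REFRESH


if __name__ == "__main__":
    print(TICKS_PER_REFRESH)             # 10
    print(light_clock_for(16_000_000))   # 160000000
    print(refresh_rate_at(16_000_000))   # 1600000.0
```

Even at a 16 MHz light clock the whole panel is refreshed 1.6 million times a second, far faster than anyone can see.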

The front panel includes jumpers for controlling whether a group of eight lights is always on or controlled by the LTS switch.

The switches on the front panel are also sampled as a matrix, in groups. Interlocks and semantics for those are handled purely in software and up to 64 switch signals can be handled (eight rows of eight switches). This includes the panel lock switch. Both pins of the power switch are connected directly to the DFP since they directly engage the power supply.
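
A minimal sketch of what sampling the switches as a matrix could look like in the DFP firmware, written in Python for readability. The `read_row` callback is hypothetical: it stands in for whatever hardware access returns the eight switch states of one row.

```python
ROWS, COLS = 8, 8  # eight rows of eight switches, 64 signals in total


def scan_matrix(read_row):
    """Sample every row and return the set of closed switch numbers (0-63).

    read_row(row) is a hypothetical callback returning an 8-bit value
    whose bit c is 1 when the switch at (row, c) is closed.
    """
    closed = set()
    for row in range(ROWS):
        bits = read_row(row) & 0xFF
        for col in range(COLS):
            if bits & (1 << col):
                closed.add(row * COLS + col)
    return closed


if __name__ == "__main__":
    # Fake panel: row 0 has switches 0 and 3 closed, row 7 has switch 7 closed.
    fake = {0: 0b0000_1001, 7: 0b1000_0000}
    print(sorted(scan_matrix(lambda r: fake.get(r, 0))))  # [0, 3, 63]
```

Interlocks and switch semantics would then be applied in software to the returned set.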

Debugging Front Panel (DFP) Board

This will be a very wide board occupying the entire bottom of the card cage, and connected to all three edge connectors. It will include a much more powerful microcontroller with plenty of I/O pins (probably still an ATmega, though).

Safety interlocks using 74HC258 and 74HC138 chips will remain, but shift registers will go away in favour of sampling 8 bits at a time. There will also be a signal router combining those and sending them to the front panel.

The MCU should run a higher clock than the 7.3728 MHz, at least double that.

Backplane

As I write this, the backplane design is in a good state of completion. It comes in three distinct rows of eight slots each. The connectors are cheap PCI Express 16-lane connectors with 164 pins. 124 pins are bussed. 40 are local to each card and broken out to a 40-pin 2.54mm pitch header. Any one or two-row header can be installed there, including multiple shrouded headers. Ribbon cable connectors can bring the signals to ports on the rear panel.

If a header isn't installed, the 40 pins can be wired at will, and this is the way the processor control signals will be connected between cards.

The backplane includes bypass capacitors and eight pins of power to each slot. There may be more eventually.

The same module is to be used for the processor and peripherals. A double-high bridge card will connect the three bus columns, provide bus hold circuitry, and also drive the data, address etc lines for the expansion bus.

Slot pitch is 25mm, and with 1.6mm cards, a 22mm spacer is required to build such a card.

I envisage the new backplane as vertical, slots facing forward. Cards are installed horizontally, like the PDP-8. 40-pin slot-local connectors are installed in the rear, with cables running to the rear panel. The cable to the front panel is either installed in front (reversing pin numbers) or plugs in to the back and passes under the backplane and card cage to go to the front panel.

A render of the new backplane module. Multiple modules are used to build the processor and expansion backplanes.

Processor and expansion boards

All boards should be the same height (depth when installed on the backplane). The minimum width is around 100 mm. I can't decide between a cheap 80×100 mm and a more practical 160×100 mm. I think the latter will win out because it's more suitable for DIP packages.

Cards can be single, double, and triple width.

| Card Type | Width (mm) | Height (mm) |
|---|---|---|
| Single width | 100 + A | 160 |
| Double width | 2×(100 + A) + B | 160 |
| Triple width | 3×(100 + A) + 2×B | 160 |

Dimension A is a variable to be decided. Dimension B adjusts for the gap between slots, and is also to be decided.

To keep costs down and debugging cheaper, I'm trying to go for single width cards where possible, each handling 8 bits of data. That implies that e.g. registers would come in two cards each. I will attempt to make cards as generic as possible to allow reuse so, for instance, I can get ten register board prototypes made and use six of them for the major registers. Some cards are very simple and can be implemented on suitable, single-width prototype boards. These are the card types so far:

| Card type | Purpose | Notes |
|---|---|---|
| Single width | Generic prototyping | General purpose for DIP chips, e.g. for clock generator, reset, SBL, AIL. |
| Single width | ROM board | One or two PLCC flash chips plus room for decoder ICs for microcode, ALU, ROM. |
| Single width | Major register board | 8 bits of a major register with prototyping space. |
| Triple width | Bus driver | Bus hold circuitry. Bridges slot columns. Drives expansion bus. |
| Triple width | DFP 2 | Diagnostics, debugging, remote testing, front panel controller. |
| Triple width | Peripheral prototyping | For building custom peripherals. |
| Single width | CPLD board | For building the VDU board and possibly similar boards. |

Each card, including prototyping ones, should have one activity LED in the exact same spot. There will be space for between one and six monostable multivibrators to make short pulses visible. Each LED needs one sixth of a Schmitt trigger chip (or a single-gate one), one signal diode, one clamping diode, one capacitor and one resistor.

Each card will have a card holder. I have found cheap multi-coloured ones from Vero Technologies.

This design initially leaves space for just eight expansion boards, so the prioritised list is:

| Designation | Purpose | Notes |
|---|---|---|
| TTY | Quad serial card | The first peripheral board on the computer! |
| IRC | 8 Interrupt board | Move to the processor side or combine with TMR. |
| IDE | IDE Interface plus on-board CF card slot | |
| TMR | Real time clock, timers, NVRAM | Useful but not immediately necessary. |

Once these are done, the next stage includes adding the KBD, VDU and SND boards, which may actually be merged into one double or triple width board.

The full set of original boards is like this:

  • MEM: 1.5 MByte RAM/ROM board.
  • µCB: 4-bit microcode store extension.
  • IRC: 8 interrupt controller.
  • MBU: 5-bit address space extension.
  • TTY: quad serial card, 16550.
  • IDE: double IDE interface (four drives)
  • TMR: real time clock, programmable timers, NVRAM.
  • ETH: Ethernet controller using the WIZnet W5100.
  • KBD: PS/2 keyboard controller.
  • PSG: Two General Instrument AY-3-8910 audio chips.
  • SPJ: SpeakJet® speech synthesiser (located on PSG board)
  • PFP: Programmer's front panel controller. (superseded by DFP)
  • DEB: Debugging card. (superseded by DFP)
  • DFP: Debugging Front Panel card.
  • FDC: Floppy Drive controller card.
  • GIO: General purpose I/O: 32 inputs, 32 outputs. With wiring for a Centronics parallel port, joystick ports, and a cassette tape interface.
Question
Maybe all peripheral boards should be full width? Then multiple cards can be combined?

Architectural Changes

Desired changes:

  • Some instruction transitions must be atomic (no interrupts).
  • Improve interrupts.
  • Better saving and restoring of computer state (including flags).
  • Perhaps move some units to I/O space?
  • Rethink the MBU.
  • Redesign the AGL and AIL as a result of the redesigned MBU.
  • Allow some instructions to repeat?
  • We use Forth. How about a better look at stacks?

Move Units to I/O Space

Saving and loading state is easier if crucial CFT registers are easier to save. Also, some registers (e.g. the I register) could be moved directly to I/O space and thereby simplify the instruction set and microcode layout.

Things that should be accessible include:

  • Enabling or disabling interrupts using OUT.
  • Reading and writing the L register using IN and OUT.
  • Reading and writing the V register using IN and OUT.
  • Accessing the new memory scheme's 8 registers via IN and OUT.

Rethink the MBU

8 KW memory banking is a bit limiting and makes it necessary to reconfigure the banking all the time, which causes trouble with, e.g., OS services and Forth. How about using bank registers like the 65C816? Upside: instructions can access multiple 64 KW blocks easily. Downside: more state to save on interrupts.

Considering using eight bank registers, like on the MBU. The first three have special meanings:

| Register | Initial value | Used for |
|---|---|---|
| MB0 | 00 or FF | All instruction fetches and local mode addressing (I and R clear). Initialised to FF if ROM is installed. (how?) |
| MB1 | Unspecified | Page Zero bank. |
| MB2 | Unspecified | The location of the stack pointer. |

Then, the scheme where some addresses have special meaning is augmented to add index registers that index based on specific registers. These overlap the autoincrement registers and are split into 8 groups, each using one of the 8 banks for addressing.

This means that programs can now access directly up to 512 KW of memory without reconfiguring the banking scheme. With reconfiguration and 8-bit extension, up to 16 MW is available!
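
Those two figures follow directly from the register widths. A quick check, assuming 16-bit offsets within a bank, eight bank registers and 8-bit bank values as described above (the constant names are mine):

```python
WORDS_PER_BANK = 1 << 16   # 16-bit offset within a bank: 64 KW
BANK_REGISTERS = 8         # MB0-MB7
BANK_BITS = 8              # width of each bank register

KW = 1 << 10
MW = 1 << 20

direct_reach = BANK_REGISTERS * WORDS_PER_BANK   # without touching the bank registers
total_reach = (1 << BANK_BITS) * WORDS_PER_BANK  # every possible bank value

if __name__ == "__main__":
    print(direct_reach // KW, "KW")  # 512 KW
    print(total_reach // MW, "MW")   # 16 MW
```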

Suggested scheme:

  • &0000–&003F: page zero ‘registers’.
  • &0040–&007F: indexes. &0041 is the Call stack pointer.
  • &0080–&00BF: autoincrement.
  • &00C0–&00FF: autodecrement.

Note that bank registers do not increment or decrement when an incremented address over- or underflows. Address &12:FFFF + 1 is &12:0000.
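
The wrap-around rule can be written down as a small sketch. It models an address as a separate 8-bit bank and 16-bit offset; the helper names are illustrative only.

```python
def physical(bank, offset):
    """Combine an 8-bit bank and a 16-bit offset into a 24-bit address."""
    return ((bank & 0xFF) << 16) | (offset & 0xFFFF)


def step(bank, offset, delta):
    """Add delta to the offset only; the bank register is left untouched."""
    return bank, (offset + delta) & 0xFFFF


def fmt(bank, offset):
    """Render an address in the &BB:OOOO notation used in this document."""
    return f"&{bank:02X}:{offset:04X}"


if __name__ == "__main__":
    print(fmt(*step(0x12, 0xFFFF, +1)))  # &12:0000
    print(fmt(*step(0x12, 0x0000, -1)))  # &12:FFFF
```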

Question
This is a powerful scheme (with precedents in the 60s and 70s), but to make it work properly a long JMP instruction of some sort is needed. (I think it'll replace TRAP which isn't used yet)

Let's Have Re-entrancy Please

The PDP-8 didn't support recursive subroutine calls because it was never meant for languages that supported recursion or re-entrancy. When the PDP-8 jump to subroutine instruction (JMS) was executed, the PDP-8 would write the PC (return address) to the first word of the subroutine, then set the PC to the next address and execute from there. There was no hardware stack. There was also no ROM.
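
To see why this rules out recursion, here is a toy model of JMS in Python. It is a sketch, not an emulator: the memory size matches a basic PDP-8, but the addresses used in the example are made up.

```python
MEM = [0] * 4096  # PDP-8 style 4 K words of core


def jms(pc, target):
    """Model JMS: store the return address in the first word of the
    subroutine, then continue at the word after it. pc is the address of
    the JMS instruction itself. Returns the new PC."""
    MEM[target] = (pc + 1) & 0o7777
    return (target + 1) & 0o7777


def ret(target):
    """Model the usual return, an indirect jump through the first word."""
    return MEM[target]


if __name__ == "__main__":
    SUB = 0o200
    jms(0o100, SUB)          # outer call from address 0100
    jms(0o205, SUB)          # the routine calls itself from 0205
    print(oct(ret(SUB)))     # 0o206: the inner return address
    print(oct(ret(SUB)))     # 0o206 again: the outer 0o101 has been lost
```

The second call overwrites the only copy of the first return address, so the outer caller can never be reached again.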

The CFT has ROM so this scheme isn't possible. It also has a far stupider system where the return address is always placed at the same address when jumping to a subroutine. After the MBU redesign, we have the ability to access a gargantuan 64 KW stack, so let's use that.

Corollary
We may now need a fourth major register, the stack register. Otherwise stack operations take six clock cycles each: two to read the stack pointer, two to read or write data from the stack, and two to write the stack pointer back. This update thing is getting a little out of hand. In a nice way.

Rethink Page Zero Magic Addresses

In previous versions, Page Zero had 128 magic addresses that auto-increment when used in indirect mode. This makes them work like very handy index registers, and they implement loops really tightly. (The PDP-8 had just 8 such ‘auto-indexed’ registers.)

Using the new memory manager module, the scheme is extended to 256 registers in addresses 0100–01FF. These belong to four groups:

  • Normal registers.
  • Auto-increment index registers that are incremented after being used.
  • Auto-decrement index registers that are decremented after being used.
  • Stack pointer registers. These are incremented after being written to, and decremented before being read from.
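
The stack pointer behaviour amounts to push and pop. A minimal sketch, assuming that ‘written to’ and ‘read from’ refer to indirect accesses through the register; the class and its names are mine.

```python
class StackPointerRegister:
    """Sketch of one 'stack pointer' page zero register.

    memory is any mutable sequence standing in for the bank the
    register indexes; pointer is the register's current value.
    """

    def __init__(self, memory, pointer=0):
        self.memory = memory
        self.pointer = pointer

    def write(self, value):
        """Indirect write: store, then increment (push)."""
        self.memory[self.pointer] = value
        self.pointer = (self.pointer + 1) & 0xFFFF

    def read(self):
        """Indirect read: decrement, then load (pop)."""
        self.pointer = (self.pointer - 1) & 0xFFFF
        return self.memory[self.pointer]


if __name__ == "__main__":
    sp = StackPointerRegister([0] * 16, pointer=4)
    sp.write(0x1234)
    sp.write(0x5678)
    print(hex(sp.read()), hex(sp.read()), sp.pointer)  # 0x5678 0x1234 4
```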

Every eighth register location indexes 24-bit memory using a different MBn memory bank, so even though there are 64 of each type, each memory bank has eight that use it. The register group is decided by the page zero bits 6–7. The MBn register is selected using the three least significant bits.

The Page Zero Memory Map looks like this:

| Address | Tentative names | What it is |
|---|---|---|
| 0000–00FF | R00–RFF | Normal page zero registers. |
| 0100–013F | B0R0–B7R7 | Eight groups of registers referencing MB0–MB7. |
| 0140–017F | B0I0–B7I7 | Eight groups of auto-increment pointers referencing MB0–MB7. |
| 0180–01BF | B0D0–B7D7 | Eight groups of auto-decrement pointers referencing MB0–MB7. |
| 01C0–01FF | B0S0–B7S7 | Eight groups of stack pointers referencing MB0–MB7. |
| 0200–03FF | | Normal page zero registers. |

And the addressing scheme looks like this:

| Address Pattern | Addresses | Behaviour |
|---|---|---|
| 0000 0000 xxxx xxxx | 0000–00FF | Normal registers |
| 0000 0001 00xx x000 | 0100–013F | Index MB0 |
| 0000 0001 00xx x001 | 0100–013F | Index MB1 |
| 0000 0001 00xx x010 | 0100–013F | Index MB2 |
| 0000 0001 00xx x011 | 0100–013F | Index MB3 |
| 0000 0001 00xx x100 | 0100–013F | Index MB4 |
| 0000 0001 00xx x101 | 0100–013F | Index MB5 |
| 0000 0001 00xx x110 | 0100–013F | Index MB6 |
| 0000 0001 00xx x111 | 0100–013F | Index MB7 |
| 0000 0001 01xx x000 | 0140–017F | Index MB0, auto-increment |
| 0000 0001 01xx x001 | 0140–017F | Index MB1, auto-increment |
| 0000 0001 01xx x010 | 0140–017F | Index MB2, auto-increment |
| 0000 0001 01xx x011 | 0140–017F | Index MB3, auto-increment |
| 0000 0001 01xx x100 | 0140–017F | Index MB4, auto-increment |
| 0000 0001 01xx x101 | 0140–017F | Index MB5, auto-increment |
| 0000 0001 01xx x110 | 0140–017F | Index MB6, auto-increment |
| 0000 0001 01xx x111 | 0140–017F | Index MB7, auto-increment |
| 0000 0001 10xx x000 | 0180–01BF | Index MB0, auto-decrement |
| 0000 0001 10xx x001 | 0180–01BF | Index MB1, auto-decrement |
| 0000 0001 10xx x010 | 0180–01BF | Index MB2, auto-decrement |
| 0000 0001 10xx x011 | 0180–01BF | Index MB3, auto-decrement |
| 0000 0001 10xx x100 | 0180–01BF | Index MB4, auto-decrement |
| 0000 0001 10xx x101 | 0180–01BF | Index MB5, auto-decrement |
| 0000 0001 10xx x110 | 0180–01BF | Index MB6, auto-decrement |
| 0000 0001 10xx x111 | 0180–01BF | Index MB7, auto-decrement |
| 0000 0001 11xx x000 | 01C0–01FF | Index MB0, stack pointer |
| 0000 0001 11xx x001 | 01C0–01FF | Index MB1, stack pointer |
| 0000 0001 11xx x010 | 01C0–01FF | Index MB2, stack pointer |
| 0000 0001 11xx x011 | 01C0–01FF | Index MB3, stack pointer |
| 0000 0001 11xx x100 | 01C0–01FF | Index MB4, stack pointer |
| 0000 0001 11xx x101 | 01C0–01FF | Index MB5, stack pointer |
| 0000 0001 11xx x110 | 01C0–01FF | Index MB6, stack pointer |
| 0000 0001 11xx x111 | 01C0–01FF | Index MB7, stack pointer |
| 0000 001x xxxx xxxx | 0200–03FF | Normal registers |
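
The addressing scheme boils down to two bit fields. A sketch of the decode, following the patterns listed above; the function and the behaviour strings are illustrative names, not anything the hardware defines.

```python
GROUPS = ("index", "auto-increment", "auto-decrement", "stack pointer")


def classify(addr):
    """Return (behaviour, bank) for a page zero address.

    bank is the n of the MBn register used, or None for a normal register.
    """
    if 0x0100 <= addr <= 0x01FF:
        group = (addr >> 6) & 0b11   # bits 6-7 pick the register group
        bank = addr & 0b111          # three least significant bits pick MBn
        return GROUPS[group], bank
    return "normal", None


if __name__ == "__main__":
    print(classify(0x0141))  # ('auto-increment', 1)
    print(classify(0x01C7))  # ('stack pointer', 7)
    print(classify(0x0200))  # ('normal', None)
```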

Boot Vectors

Boot vector &FF:FFF8? Interrupt vector &FF:FFF0?

If &FF is used, ROM must be mapped downwards from bank &FF. But, really, who cares? We're only ever going to have one ROM bank.

Make OP1 and OP2 faster

When IFn is requested by microcode, bit n of the IR is tested. This action should also simultaneously check bits 0 to n-1 of the IR and allow microcode to end instruction execution. Otherwise, instructions execute for up to 10 cycles with no good reason to do so.

Hardware-wise, when a microcode instruction asserts IFn and END simultaneously, the microprogram should end if bits 0 to n of the IR are all clear. Since END is active low, we OR it with a value derived from masking the IR and ORing its bits together. Here's a truth table for just two bits of the IR:

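| END-IN | IF | Mask | IR | END-OUT | Notes |
|---|---|---|---|---|---|
| 1 | XX | ?? | XX | 1 | END not asserted. |
| 0 | 00 | ?? | XX | 0 | END asserted, IFn inactive. |
| 0 | 01 | 01 | X0 | 0 | Bit 0 is clear, end instruction. |
| 0 | 01 | 01 | X1 | 1 | Bit 0 is set, don't end. |
| 0 | 10 | 11 | 00 | 0 | Bits 0 and 1 are clear, end instruction. |
| 0 | 10 | 11 | X1 | 1 | Bit 0 is set, don't end. |
| 0 | 10 | 11 | 1X | 1 | Bit 1 is set, don't end. |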

The mask is generated from the IF signal. Each bit of the mask is ANDed with the corresponding bit in the IR. The results are all ORed together with the END signal, and that should be the extent of the circuitry needed.
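
A behavioural sketch of that gating, in Python. It assumes that the mask is all zeroes when no IFn is asserted (the truth table leaves it unspecified) and that IFn covers IR bits 0 to n; the function names are mine.

```python
def if_mask(n):
    """Mask covering IR bits 0..n for IFn; None means no IFn is asserted."""
    return 0 if n is None else (1 << (n + 1)) - 1


def end_out(end_in, n, ir):
    """Gate the active-low END signal.

    end_in: END as asserted by microcode (0 = asserted).
    n:      index of the IFn being tested, or None.
    ir:     instruction register contents.
    Returns 0 (end the instruction) or 1 (keep going).
    """
    any_set = 1 if (ir & if_mask(n)) else 0  # AND with the mask, then OR the bits
    return end_in | any_set


if __name__ == "__main__":
    print(end_out(0, 0, 0b10))  # 0: bit 0 is clear, end instruction
    print(end_out(0, 1, 0b10))  # 1: bit 1 is set, don't end
```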

The mask calculation is as follows:

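| IFn | A | Y | Y₉ | Y₈ | Y₇ | Y₆ | Y₅ | Y₄ | Y₃ | Y₂ | Y₁ | Y₀ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0000 | 0000000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IF0 | 0001 | 0000000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| IF1 | 0010 | 0000000011 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| IF2 | 0011 | 0000000111 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| IF3 | 0100 | 0000001111 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| IF4 | 0101 | 0000011111 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| IF5 | 0110 | 0000111111 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| IF6 | 0111 | 0001111111 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| IF7 | 1000 | 0011111111 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| IF8 | 1001 | 0111111111 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| IF9 | 1010 | 1111111111 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |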

The terms for each of these are somewhat complex.

Y₀ = A₀ + A₁ + A₂ + A₃ = A₀ + Y₁
Y₁ = A₁ + A₂ + A₃      = A₁ + Y₃
Y₂ = A₂ + A₃ + (A₁A₀)  = Y₃ + (A₁A₀)
Y₃ = A₂ + A₃
Y₄ = A₃ + A₂(A₀+A₁)
Y₅ = A₃ + A₂A₁
Y₆ = A₃ + A₂A₁A₀
Y₇ = A₃
Y₈ = A₃(A₁+A₀)
Y₉ = A₃A₁
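
These terms can be checked mechanically against the mask table. The sketch below evaluates them for every A from 0 (idle) to 10 (IF9) and compares the result with a mask of A low-order ones; values of A above 10 are treated as don't-cares.

```python
def mask_from_equations(a):
    """Evaluate Y9..Y0 from the four bits of A using the terms above."""
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    y = [0] * 10
    y[3] = a2 | a3
    y[1] = a1 | y[3]
    y[0] = a0 | y[1]
    y[2] = y[3] | (a1 & a0)
    y[4] = a3 | (a2 & (a0 | a1))
    y[5] = a3 | (a2 & a1)
    y[6] = a3 | (a2 & a1 & a0)
    y[7] = a3
    y[8] = a3 & (a1 | a0)
    y[9] = a3 & a1
    return sum(bit << i for i, bit in enumerate(y))


def check():
    """True if the equations match the table for A = 0 (idle) to 10 (IF9)."""
    return all(mask_from_equations(a) == (1 << a) - 1 for a in range(11))


if __name__ == "__main__":
    print(check())  # True
```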

The actual IFn values in the microcode are different, so this will have to be reworked. I'm thinking of using a diode matrix ROM to make it easy to reconfigure things.

Redesign the Address Generation Logic

The MBU redesign precipitates a need to redesign the AGL. It now needs to have multiple enable inputs, to instruct it to generate addresses for:

  1. Fetching instructions, relative to MB0.
  2. Accessing Page Zero, when R is set in an instruction (R=1). This is relative to MB1.
  3. Addressing the stack, relative to MB2.
  4. Fetching data in local mode (I=0, R=0), relative to MB0.
  5. Fetching data via indirection using one of the magic index registers, relative to whichever register is referenced in the instruction operand.

In theory, the unit would now have to have four decoded control inputs:

  1. Form MB0-relative address, used for fetching and page-local data access.
  2. Form MB1-relative address, used for page-zero (R) access.
  3. Form MB2-relative address, used for stack access.
  4. Select register based on 3 least significant bits of operand, used in Page Zero magic register addresses.

In all other cases, MB0 is selected. The truth table for this is as follows:

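| MB0_AGL | MB1_AGL | MB2_AGL | I | R | Addr | IR2–0 | Selected Register |
|---|---|---|---|---|---|---|---|
| 0 | X | X | X | X | XXX | XXX | MB0 for instruction fetch |
| 1 | 0 | X | X | X | XXX | XXX | MB1 page zero access |
| 1 | 1 | 0 | X | X | XXX | XXX | MB2, stack access |
| 1 | 1 | 1 | 0 | 0 | XXX | XXX | MB0 local page access |
| 1 | 1 | 1 | 0 | 1 | XXX | XXX | MB1 page zero access |
| 1 | 1 | 1 | 1 | 0 | XXX | XXX | MB0 indirect mode |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 000 | MB0 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 001 | MB1 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 010 | MB2 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 011 | MB3 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 100 | MB4 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 101 | MB5 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 110 | MB6 selected by IR2–0 |
| 1 | 1 | 1 | 1 | 1 | 0040–00FF | 111 | MB7 selected by IR2–0 |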

This truth table has an implicit 9 bits of input!

11111-1-- 100
11-11--1- 010
1-1-1---1 001
10------- 001
110------ 010
1-101---- 001
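
For reference, the register selection in the truth table can be written as a priority chain. This is a behavioural sketch only: it follows the truth table rather than the minimised terms, treats the three enable inputs as active low because that is how the table reads, and uses names of my own choosing.

```python
def select_bank(mb0_agl, mb1_agl, mb2_agl, i, r, addr, ir):
    """Return n for the MBn register the AGL should use.

    The three *_AGL inputs are treated as active low (0 = asserted),
    which is how the truth table reads. addr is the page zero address
    in the operand and ir is the instruction register.
    """
    if mb0_agl == 0:
        return 0                      # instruction fetch
    if mb1_agl == 0:
        return 1                      # page zero access
    if mb2_agl == 0:
        return 2                      # stack access
    if (i, r) == (0, 1):
        return 1                      # page zero access via R
    if (i, r) == (1, 1) and 0x0040 <= addr <= 0x00FF:
        return ir & 0b111             # magic register: bank from IR bits 2-0
    return 0                          # local page, indirect, everything else


if __name__ == "__main__":
    print(select_bank(1, 1, 1, 1, 1, 0x0045, 0x0045))  # 5
```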

Redesign the Auto-Index Logic

The AIL now needs to send two signals to the control unit. Where before we had the AINDEX input, we now have AINC and ADEC. All things considered, this is a minor change.

Allow units to mark an instruction ‘atomic’

Use an open drain input to set a flip-flop. The end of execution of the next instruction clears it. Pending interrupts are ignored while the flip-flop is set. This allows multi-word instructions to be implemented in a generic way. This mechanism exists for the µCB extension, so it's easy to generalise. The main purpose for doing this is to allow the implementation of a REP (Repeat) instruction that repeats the next instruction a set number of times.

Wishful Thinking: the REP instructions

Add an R register to the register file. Make it accessible using IN and OUT instructions so state can be saved and retrieved. REP repeats the following instruction, creating really tight (but simple) microcode loops.

To implement this, we need microcode jumps (which is another task altogether). The REP instruction should allow a few options:

  • Terminate when Z flag is set/clear: most useful for load instructions.
  • Terminate when N flag is set/clear.
  • Repeat a certain number of times.

Maybe add atomic versions of these? Maybe make all of them atomic? Atomic is definitely easier.

One Solution
Fetches always take two cycles. When REP is active and the condition hasn't been met, the END signal should reset the µPC to 2, rather than 0.
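
A rough cycle count shows what skipping the fetch buys. This is a back-of-the-envelope sketch: it ignores the cost of the REP instruction itself and any loop overhead in the non-REP case, and the function names are hypothetical.

```python
FETCH_CYCLES = 2  # fetches always take two cycles


def cycles_with_rep(execute_cycles, repeats):
    """One fetch, then the execute steps looped by resetting the µPC to 2."""
    return FETCH_CYCLES + execute_cycles * repeats


def cycles_without_rep(execute_cycles, repeats):
    """The same instruction fetched and executed afresh every time."""
    return (FETCH_CYCLES + execute_cycles) * repeats


if __name__ == "__main__":
    print(cycles_with_rep(3, 10))     # 32
    print(cycles_without_rep(3, 10))  # 50
```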

Verging on Crazy Talk Now: a multiply instruction

Can we use the REP instruction to implement multiplication? This can happen if shifts/rolls have an ‘accumulate’ bit that adds to the current value of the AC, or just using a magic register in I/O space and repeating the ADD instruction to accumulate.

Microcode Changes

There are a few fields in the Microcode Control Vector that can be made vertical, to make the µCV more compact. Suggest we combine into a single ‘action’ field:

  1. Memory read (MEMR)
  2. Memory write (MEMW)
  3. I/O in (IOR)
  4. I/O out (IOW)
  5. Decrement DR
  6. Decrement AC
  7. Clear Link (CLL)
  8. Complement Link (CPL)
  9. Set Interrupt Flag (STI)
  10. Clear Interrupt Flag (CLI)

Note: INCPC, CLI and END have to be independent.

This combines into a single 4-bit field a total of 10 bits, saving 6 bits. We can expand RUNIT to 5 bits and WUNIT to 4 bits and have 4 bits leftover.
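
The bit accounting can be double-checked in a few lines. The sketch assumes the ten signals are mutually exclusive within a microinstruction and that one code is reserved for ‘no action’; the helper name is mine.

```python
def encoded_width(actions, idle_code=True):
    """Bits needed for a vertical field that selects one of `actions`
    mutually exclusive signals, optionally reserving a code for 'none'."""
    codes = actions + (1 if idle_code else 0)
    return (codes - 1).bit_length()


ACTIONS = 10                              # one-bit signals folded into the field
op_bits = encoded_width(ACTIONS)          # 4
saved = ACTIONS - op_bits                 # 6
leftover = saved - (5 - 4) - (4 - 3)      # RUNIT 4->5 and WUNIT 3->4 leave 4

if __name__ == "__main__":
    print(op_bits, saved, leftover)  # 4 6 4
```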

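| Old Field | Bits | New field | New bits |
|---|---|---|---|
| RUNIT | 4 | RUNIT | 5 |
| WUNIT | 3 | WUNIT | 4 |
| OPIF | 4 | OPIF | 4 |
| CLL | 1 | Part of OP | 4 |
| CPL | 1 | Part of OP | |
| STI | 1 | Part of OP | |
| CLI | 1 | CLI | 1 |
| INCPC | 1 | INCPC | 1 |
| STPDR | 1 | Part of OP | |
| STPAC | 1 | Part of OP | |
| DEC | 1 | Part of OP | |
| MEM | 1 | Part of OP | |
| IO | 1 | Part of OP | |
| R | 1 | Part of OP | |
| WEN | 1 | Part of OP | |
| END | 1 | END | 1 |
| | | (unused) | 4 |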
Question
How can we best use these 4 bits?
Follow-up Question
What would the updated µCV lights on the front panel look like?

Suggested layout of extended RUNIT and WUNIT fields.

| Value | RUNIT | WUNIT |
|---|---|---|
| 00000 | Idle | Idle |
| 00001 | | Write to IR (moved) |
| 00010 | Read from AGL | Idle (was ‘Write to AR’) |
| 00011 | Read from PC | Write to PC |
| 00100 | Read from DR | Write to DR (moved) |
| 00101 | Read from AC | Write to AC (moved) |
| 00110 | Read from Data Bus | Write to Data Bus |
| 00111 | | Write to ALU Port B |
| 01000 | ALU Add | Write MB0:IBUS to AR. |
| 01001 | ALU AND | Write MB1:IBUS to AR. |
| 01010 | ALU OR | Write MB2:IBUS to AR. |
| 01011 | ALU XOR | Write MBn:IBUS to AR (n is from IR[0..2]) |
| 01100 | ALU Rolls | |
| 01101 | ALU NOT | |
| 01110 | Constant Store 1 | |
| 01111 | Constant Store 2 | |
| 10000–11111 | | |
Idea
Really old versions of the CFT microcode treated the data bus as another read/write unit. How about doing the same now? I could add two read sources (‘Read from Memory Space’, ‘Read from I/O Space’) and two corresponding write destinations (‘Write to Memory Space’ and ‘Write to I/O Space’). If this is done, these four operations can be removed from the OP field, which can now be three bits wide instead of four. We now have five unused bits and nearly nothing to do with them. Luxury!

Instruction Set Changes

| Instr | Opcd | I | R | Operand | Semantics |
|---|---|---|---|---|---|
| LJSR | 0000 | 0 | R | mmmmmmmmmm | Push PC, Push MB, MB0=[m], PC=[m+1] |
| TRAP | 0000 | 1 | 1 | mmmmmmmmmm | Push PC, Push MB, MB0=[m], PC=[m+1] |

| LJSR | 0000 | 00001Rmmmmmmmmmm | Push PC, Push MB0, MB0=[m++], PC=[m] [[42]++]=PC; [[42]++] [sp++] Push PC and MB0 to stack. Save MB0 and PC and jump to trap. [1]=PC; PC=a |
| TRAP | 0000 | 00000Rmmmmmmmmmm | Save PC and jump to trap. [1]=PC; PC=a |