What's This Then?
For those who just tuned in, this is about the CFT, my home-designed, home-built mini-computer made from scratch. I've designed and built the instruction set, data path, processor, computer, software stack, cross-assembler, emulator, software toolchain and even some of the tools needed to do this. You can call this a ‘fantasy mini’.
It all started with KiCad. I decided Eagle's board size limitations were becoming too stifling for me and I wasn't going to be duped into a subscription model when I don't use the software this regularly. KiCad 5 was a revelation. It's clunkier than Eagle in many ways, but much better in many others.
Having no limits on board size meant that many of the harder compromises I made in designing the CFT were suddenly unnecessary, and I started questioning every single one of them. That led to a new backplane style, new, larger processor boards, a new front panel and a new, less bizarre DFP. And, suddenly, the realisation that even the microcode and architecture have constraints that I could lift.
This document is about all of those constraints and whether they should really be lifted to make the CFT more capable and easier to program, or just stay there and keep it clunky and old-timey. Both ideas appeal to me greatly, so expect dilemmas and crippling indecision.
Problems to Solve
The current incarnation of the CFT is ugly and nearly unmaintainable. The problems:
- The backplane is not ideal for my needs. It requires side-buses for control signals that look ugly.
- Patching mistakes means redoing entire large boards. I need a better backplane and smaller, simpler, cheaper to redo boards.
- Too many connecting wires that can fail.
- The Control Bus is a horror.
- Together, these problems preclude me from building a card cage and enclosure: cards are different sizes, and the Control Bus would make it difficult to pull the processor boards.
How to Solve These Problems
- A new front panel. The current front panel requires five 50-way ribbon cables going to nine modules. The proposed new design reduces this to one 40-way ribbon cable and two modules, with the modules daisy-chained.
- A new design for the DFP. The DFP has served well, but it's also a source of many cables: five 50-way ribbon cables to the front panel, four 40-way cables to the processor boards, plus connections to the Control and Expansion Buses. That's a lot of wires. The proposed redesign will reduce this to one edge connector for the computer's buses and a 40-way cable to the front panel.
- I will (reluctantly) abandon the glorious 19-slot DIN 41612 backplane in favour of a custom-made one that uses cheaper edge connectors. The backplane is modular and can be expanded in width. Each slot has around 120 bussed signals and 40 unbussed ones. These can be wired in custom ways or, in the case of peripherals, brought to the rear of the computer to connect to sockets.
- Take advantage of the backplane design to split the processor boards into relatively generic, small boards. This makes it easy to have them fabricated (since most fab houses have a five-board minimum order), easier to patch, cheaper to make, and makes the computer's innards look a little like the PDP-8's.
The current front panel is made out of 8 modules, each with 20 LEDs in five rows of four lights.
The updated front panel is to be made out of 4 identical modules, each with 40 LEDs in five rows of eight lights each. The modules are daisy-chained together using a very short 40-conductor cable. The first module connects to the DFP.
Data for the lights is time-multiplexed 16 bits at a time at a high rate (ideally using a 16 MHz clock). Because of this high rate, the bus should include bus hold circuitry and/or be impedance matched.
Given there are 160 lights on the front panel and 16 are sent per clock tick, each light is updated every ten clock ticks. Getting all the lights to the front panel for every CFT instruction at full speed would need a clock ten times the CPU's clock, and that's 160 MHz. Bad idea! Alternatively, accept that at full speed the lights won't be 100% accurate (it'd be absolutely impossible for a human to tell) and use a 16 MHz clock for the lights, or whatever works reliably.
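The multiplexing arithmetic is easy to sanity-check. This is a minimal sketch, assuming the 160 light states are held as one integer and shipped 16 bits per light-clock tick, lowest word first; the word order and the helper names (`pack_frame`, `panel_refresh_hz`, `light_clock_needed`) are illustrative, not part of the CFT design. The 16 MHz plugged into `light_clock_needed` is the CPU clock implied by the 160 MHz figure above.

```python
LIGHTS = 160                              # total lights on the front panel
WORD_BITS = 16                            # lights sent per light-clock tick
WORDS_PER_FRAME = LIGHTS // WORD_BITS     # ticks needed to update every light once


def pack_frame(light_state: int) -> list[int]:
    """Split a 160-bit light state into ten 16-bit words, lowest word first."""
    mask = (1 << WORD_BITS) - 1
    return [(light_state >> (WORD_BITS * i)) & mask for i in range(WORDS_PER_FRAME)]


def panel_refresh_hz(light_clock_hz: float) -> float:
    """Complete 160-light frames per second delivered by a given light clock."""
    return light_clock_hz / WORDS_PER_FRAME


def light_clock_needed(cpu_clock_hz: float) -> float:
    """Light clock needed to send one full frame per CPU clock tick."""
    return cpu_clock_hz * WORDS_PER_FRAME


print(WORDS_PER_FRAME)               # 10
print(panel_refresh_hz(16e6))        # 1600000.0 full frames per second
print(light_clock_needed(16e6))      # 160000000.0, the 160 MHz case
```

Even with the "inaccurate" 16 MHz light clock, every light is refreshed 1.6 million times a second, far beyond anything the eye can follow.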
The front panel includes jumpers for controlling whether a group of eight lights is always on or controlled by the LTS switch.
The switches on the front panel are also sampled as a matrix, in groups. Interlocks and semantics for those are handled purely in software, and up to 64 switch signals can be handled (eight rows of eight switches). This includes the panel lock switch. Both pins of the power switch are connected directly to the DFP since they directly engage the power supply.
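Here's a sketch of how the DFP firmware might scan that 8×8 matrix into a single 64-bit snapshot and spot changes between scans. It assumes a hypothetical `read_row(row)` callback that selects one row and returns its eight column bits; the real controller is an AVR, so read this as the logic only, not the firmware. Debouncing is left out, and interlocks would be applied to the snapshot afterwards, in software as described above.

```python
from typing import Callable

ROWS = 8   # switch matrix rows, selected one at a time
COLS = 8   # switches read in parallel per row


def scan_switches(read_row: Callable[[int], int]) -> int:
    """Scan the matrix; bit (row * 8 + col) of the result is set if that switch is closed."""
    state = 0
    for row in range(ROWS):
        state |= (read_row(row) & 0xFF) << (row * COLS)
    return state


def switch_closed(state: int, row: int, col: int) -> bool:
    return bool((state >> (row * COLS + col)) & 1)


def edges(previous: int, current: int) -> tuple[int, int]:
    """Switches that were just closed and just opened since the previous scan."""
    return current & ~previous, previous & ~current


# Simulated panel: row 0 column 0 and row 7 column 7 are closed.
fake_panel = {0: 0b0000_0001, 7: 0b1000_0000}
state = scan_switches(lambda row: fake_panel.get(row, 0))
print(switch_closed(state, 7, 7), switch_closed(state, 3, 3))  # True False
```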
Debugging Front Panel (DFP) Board
This will be a very wide board occupying the entire bottom of the card cage and connected to all three edge connectors. It will include a much more powerful microcontroller with plenty of I/O pins (probably still an ATmega, though).
Safety interlocks using 74HC258 and 74HC138 chips will remain, but shift registers will go away in favour of sampling 8 bits at a time. There will also be a signal router combining those and sending them to the front panel.
The MCU should run at a higher clock than the current 7.3728 MHz, at least double that.
As I write this, the backplane design is nearly complete. It comes in three distinct rows of eight slots each. The connectors are cheap PCI Express 16-lane connectors with 164 pins: 124 pins are bussed, and 40 are local to each card and broken out to a 40-pin 2.54 mm pitch header. Any one- or two-row header can be installed there, including multiple shrouded headers. Ribbon cable connectors can bring the signals to ports on the rear panel.
If a header isn't installed, the 40 pins can be wired at will, and this is the way the processor control signals will be connected between cards.
The backplane includes bypass capacitors and eight pins of power to each slot. There may be more eventually.
The same module is to be used for the processor and peripherals. A double-high bridge card will connect the three bus columns, provide bus hold circuitry, and also drive the data, address, etc. lines for the expansion bus.
Slot pitch is 25 mm, and with 1.6 mm cards, a 22 mm spacer is required to build such a card.
I envisage the new backplane as vertical, slots facing forward. Cards are installed horizontally, like the PDP-8. 40-pin slot-local connectors are installed in the rear, with cables running to the rear panel. The cable to the front panel is either installed in front (reversing pin numbers) or plugs into the back and passes under the backplane and card cage to go to the front panel.
Processor and expansion boards
All boards should be the same height (depth when installed on the backplane). The minimum width is around 100 mm. I can't decide between a cheap 80×100 mm and a more practical 160×100 mm. I think the latter will win out because it's more suitable for DIP packages.
Cards can be single, double, and triple width.
|Card Type||Width (mm)||Height (mm)|
|Single width||100 + A||160|
|Double width||2×(100 + A) + B||160|
|Triple width||3×(100 + A) + 2×B||160|
Dimension A is a variable to be decided. Dimension B adjusts for the gap between slots, and is also to be decided.
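All three rows of the table follow one formula: an n-slot card spans n single-card widths plus n−1 inter-slot gaps, i.e. n×(100 + A) + (n−1)×B. A tiny helper makes it easy to try candidate values; the A = 5 mm and B = 3 mm below are placeholders, since neither has been decided.

```python
def card_width_mm(slots: int, a_mm: float, b_mm: float) -> float:
    """Card width from the table: slots * (100 + A) + (slots - 1) * B."""
    if slots not in (1, 2, 3):
        raise ValueError("cards are single, double or triple width")
    return slots * (100 + a_mm) + (slots - 1) * b_mm


# Placeholder values only: A and B have not been decided yet.
for slots in (1, 2, 3):
    print(slots, card_width_mm(slots, a_mm=5, b_mm=3))   # 105, 213, 321
```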
To keep costs down and debugging cheaper, I'm trying to go for single width cards where possible, each handling 8 bits of data. That implies that e.g. registers would come in two cards each. I will attempt to make cards as generic as possible to allow reuse, so that, for instance, I can get ten register board prototypes made and use six of them for the major registers. Some cards are very simple and can be implemented on suitable, single-width prototype boards. These are the card types so far:
|Single width||Generic prototyping||General purpose for DIP chips, e.g. for clock generator, reset, SBL, AIL.|
|Single width||ROM board||One or two PLCC flash chips plus room for decoder ICs for microcode, ALU, ROM.|
|Single width||Major register board||8 bits of a major register with prototyping space.|
|Triple width||Bus driver||Bus hold circuitry. Bridges slot columns. Drives expansion bus.|
|Triple width||DFP 2||Diagnostics, debugging, remote testing, front panel controller.|
|Triple width||Peripheral prototyping||For building custom peripherals.|
|Single width||CPLD board||For building the VDU board and possibly similar boards.|
Each card, including prototyping ones, should have one activity LED in the exact same spot. There will be space for between one and six monostable multivibrators to make short pulses visible. Each LED needs one sixth of a Schmitt trigger chip (or a single gate one), one signal diode, one clamping diode, one capacitor and one resistor.
Each card will have a card holder. I have found cheap multi-coloured ones from Vero Technologies.
This design initially leaves space for just eight expansion boards, so the prioritised list is:
|Quad serial card||The first peripheral board on the computer!|
|8 Interrupt board||Move to the processor side or combine with TMR.|
|IDE Interface plus on-board CF card slot|
|Real time clock, timers, NVRAM||Useful but not immediately necessary.|
Once these are done, the next stage includes adding the KBD, VDU and SND boards, which may actually be merged into one double or triple width board.
The full set of original boards is like this:
- MEM: 1.5 MByte RAM/ROM board.
- µCB: 4-bit microcode store extension.
- IRC: 8 interrupt controller.
- MBU: 5-bit address space extension.
- TTY: quad serial card, 16550.
- IDE: double IDE interface (four drives)
- TMR: real time clock, programmable timers, NVRAM.
- ETH: Ethernet controller using the WIZnet W5100.
- KBD: PS/2 keyboard controller.
- PSG: Two General Instruments AY-3-8910 audio chips.
- SPJ: SpeakJet® speech synthesiser (located on PSG board)
- PFP: Programmer's front panel controller. (superseded by DFP)
- DEB: Debugging card. (superseded by DFP)
- DFP: Debugging Front Panel card.
- FDC: Floppy Drive controller card.
- GIO: General purpose I/O: 32 inputs, 32 outputs. With wiring for a Centronics parallel port, joystick ports, and a cassette tape interface.
Question: Maybe all peripheral boards should be full width? Then multiple cards can be combined?
- Some instruction transitions must be atomic (no interrupts).
- Improve interrupts.
- Better saving and restoring of computer state (including flags).
- Perhaps move some units to I/O space?
- Rethink the MBU.
- Redesign the AGL and AIL as a result of the redesigned MBU.
- Allow some instructions to repeat?
- We use Forth. How about a better look at stacks?
Move Units to I/O Space
Saving and restoring state is easier if crucial CFT registers are easier to access. Also, some registers (e.g. the I register) could be moved directly to I/O space, thereby simplifying the instruction set and microcode layout.
Things that should be accessible include:
- Enabling or disabling interrupts using
- Reading and writing the L register using
- Reading and writing the V register using
- Accessing the new memory scheme's 8 registers via
Rethink the MBU
8 KW memory banking is a bit limiting and makes it necessary to reconfigure the banking all the time, which causes trouble with, e.g. OS services and Forth. How about using bank registers like the 65C816? Upside: instructions can access multiple 64 KW blocks easily. Downside: more state to save on interrupts.
I'm considering eight bank registers, like on the MBU. The first three have special meanings:
|Register||Initial value||Used for|
|00 or FF||All instruction fetches and local mode addressing (|
|Unspecified||Page Zero bank.|
|Unspecified||The location of the stack pointer.|
Then, the scheme where some addresses have special meaning is augmented with index registers that index relative to specific bank registers. These overlap the autoincrement registers and are split into 8 groups, each using one of the 8 banks for addressing.
This means that programs can now directly access up to 512 KW of memory without reconfiguring the banking scheme. With reconfiguration and 8-bit extension, up to 16 MW is available!
- &0000–&003F: page zero ‘registers’.
- &0040–&007F: indexes. &0041 is the Call stack pointer.
- &0080–&00BF: autoincrement.
- &00C0–&00FF: autodecrement.
Note that bank registers do not increment or decrement when an incremented or decremented address over- or underflows: the offset wraps within its bank, so &12:FFFF + 1 is &12:0000.
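A sketch of the banked address arithmetic, assuming a bank number is simply prepended to a 16-bit offset (the &BB:OOOO notation used here). It shows the wrap rule (the bank register never changes on an increment or decrement) and checks the 512 KW and 16 MW figures quoted above. The helper names `physical` and `step` are hypothetical.

```python
WORDS_PER_BANK = 1 << 16     # each bank is 64 KW
BANK_REGISTERS = 8           # MB0..MB7
BANK_BITS = 8                # width of a bank number


def physical(bank: int, offset: int) -> int:
    """Combine an 8-bit bank and a 16-bit offset into a 24-bit word address."""
    return ((bank & 0xFF) << 16) | (offset & 0xFFFF)


def step(bank: int, offset: int, delta: int) -> tuple[int, int]:
    """Auto-increment/decrement: the offset wraps, the bank register is untouched."""
    return bank, (offset + delta) & 0xFFFF


print(hex(physical(*step(0x12, 0xFFFF, +1))))               # 0x120000
print(BANK_REGISTERS * WORDS_PER_BANK // 1024)              # 512 (KW without rebanking)
print((1 << BANK_BITS) * WORDS_PER_BANK // (1024 * 1024))   # 16 (MW in total)
```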
Question: This is a powerful scheme (with precedents in the 1960s and 1970s), but to make it work properly a long JMP instruction of some sort is needed. (I think it'll replace TRAP, which isn't used yet.)
Let's Have Re-entrancy Please
The PDP-8 didn't support recursive subroutine calls because it was never meant for languages that supported recursion or re-entrancy. When the PDP-8 jump to subroutine instruction (JMS) was executed, the PDP-8 would write the PC (return address) to the first word of the subroutine, then set the PC to the next address and execute from there. There was no hardware stack. There was also no ROM.
The CFT has ROM, so this scheme isn't possible: a subroutine in ROM can't store its return address in its own first word. It also has a far stupider system, where the return address is always placed at the same address when jumping to a subroutine. After the MBU redesign, we have the ability to access a gargantuan 64 KW stack, so let's use that.
Corollary: We may now need a fourth major register, the stack register. Otherwise stack operations take six clock cycles each: two to read the stack pointer, two to read or write data from the stack, and two to write the stack pointer back. This update is getting a little out of hand. In a nice way.
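A toy model makes the recursion problem concrete. With the PDP-8 `JMS` convention the return address lives in the subroutine's first word, so a nested call to the same subroutine overwrites it; a stack keeps every pending return address. The addresses are arbitrary example values and the function names are mine.

```python
def jms(memory: dict[int, int], pc: int, target: int) -> int:
    """PDP-8 style JMS: store the return address at `target`, continue at target + 1."""
    memory[target] = pc + 1
    return target + 1


def jsr_stack(stack: list[int], pc: int, target: int) -> int:
    """Stack-based call: push the return address, continue at `target`."""
    stack.append(pc + 1)
    return target


# Outer call from 0o100, then the subroutine at 0o200 calls itself from 0o205.
memory: dict[int, int] = {}
jms(memory, 0o100, 0o200)
jms(memory, 0o205, 0o200)
print(oct(memory[0o200]))          # 0o206: the outer return address (0o101) is lost

stack: list[int] = []
jsr_stack(stack, 0o100, 0o200)
jsr_stack(stack, 0o205, 0o200)
print([oct(a) for a in stack])     # ['0o101', '0o206']: both returns survive
```

The CFT's fixed return-address slot has the same flaw as `jms` here, just with one slot shared by every subroutine instead of one per subroutine.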
Rethink Page Zero Magic Addresses
Page Zero has 128 magic addresses that auto-increment when used in indirect mode. This makes them work like very handy index registers, and they implement loops really tightly.
Using the new memory manager module, the scheme is extended to also select a memory bank based on the address in Memory Space of the index ‘register’.
Why not extend this? Page Zero on the CFT is huge and has a lot of slack space. Some ideas:
- Make half of the index registers decrement rather than increment.
- Make some of them ‘trap’ indexes. If a JMP involves one of them, disable interrupts before jumping.
Suggested Page Zero scheme:
|0000 0000 00xx xxxx||0000–003F||Normal registers|
|0000 0000 01xx x000||0040–007F||Index |
|0000 0000 01xx x001||0040–007F||Index |
|0000 0000 01xx x010||0040–007F||Index |
|0000 0000 01xx x011||0040–007F||Index |
|0000 0000 01xx x100||0040–007F||Index |
|0000 0000 01xx x101||0040–007F||Index |
|0000 0000 01xx x110||0040–007F||Index |
|0000 0000 01xx x111||0040–007F||Index |
|0000 0000 10xx x000||0080–00BF||Index |
|0000 0000 10xx x001||0080–00BF||Index |
|0000 0000 10xx x010||0080–00BF||Index |
|0000 0000 10xx x011||0080–00BF||Index |
|0000 0000 10xx x100||0080–00BF||Index |
|0000 0000 10xx x101||0080–00BF||Index |
|0000 0000 10xx x110||0080–00BF||Index |
|0000 0000 10xx x111||0080–00BF||Index |
|0000 0000 11xx x000||00C0–00FF||Index |
|0000 0000 11xx x001||00C0–00FF||Index |
|0000 0000 11xx x010||00C0–00FF||Index |
|0000 0000 11xx x011||00C0–00FF||Index |
|0000 0000 11xx x100||00C0–00FF||Index |
|0000 0000 11xx x101||00C0–00FF||Index |
|0000 0000 11xx x110||00C0–00FF||Index |
|0000 0000 11xx x111||00C0–00FF||Index |
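A decoder for these addresses, combining the three magic ranges (&0040–&007F plain index, &0080–&00BF auto-increment, &00C0–&00FF auto-decrement) with this table's rule that the low three address bits pick one of the eight bank registers. It's a sketch of the proposal, not of existing CFT hardware, and the mode names are mine.

```python
from typing import Optional, Tuple


def decode_page_zero(addr: int) -> Tuple[str, Optional[int]]:
    """Classify a page-zero address and, for magic index addresses, name its bank register."""
    if not 0x40 <= addr <= 0xFF:
        return "normal", None              # ordinary page-zero word (incl. the trap table)
    bank = addr & 0b111                    # low three bits choose MB0..MB7
    if addr <= 0x7F:
        return "index", bank
    if addr <= 0xBF:
        return "autoincrement", bank
    return "autodecrement", bank


print(decode_page_zero(0x0041))   # ('index', 1)
print(decode_page_zero(0x0087))   # ('autoincrement', 7)
print(decode_page_zero(0x00C2))   # ('autodecrement', 2)
```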
So, a full memory map of Page Zero would be:
|Address||Tentative name||What it is|
|0000||First P0 register.|
|0001||Second P0 register.|
|0100||OS Trap 0, high-order address bits. (bank)|
|0101||OS Trap 0, low-order address bits.|
|0102||OS Trap 1, high-order address bits. (bank)|
|0103||OS Trap 1, low-order address bits.|
|02FE||OS Trap 255, high-order address bits. (bank)|
|02FF||OS Trap 255, low-order address bits.|
The OS traps aren't set in stone; these 512 locations in Page Zero could be used for anything. They are very convenient for OS calls (traps), but we also store convenient constants etc. in Page Zero.
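Because each trap takes two consecutive words starting at &0100, finding trap n's vector is a one-line calculation. This sketch also assembles the 24-bit target from a vector, assuming the bank number sits in the low 8 bits of the high-order word (the table only says "high-order address bits (bank)", so that layout is an assumption). The function names are hypothetical.

```python
TRAP_BASE = 0x0100   # first word of the OS trap table in page zero


def trap_vector(n: int) -> tuple[int, int]:
    """Page-zero addresses of trap n's (bank word, low-order word)."""
    if not 0 <= n <= 255:
        raise ValueError("there are 256 traps, numbered 0-255")
    high = TRAP_BASE + 2 * n
    return high, high + 1


def trap_target(page_zero: dict[int, int], n: int) -> int:
    """24-bit target stored in trap n's vector, assuming an 8-bit bank in the high word."""
    high, low = trap_vector(n)
    return ((page_zero[high] & 0xFF) << 16) | (page_zero[low] & 0xFFFF)


print([hex(a) for a in trap_vector(255)])   # ['0x2fe', '0x2ff']
```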
Boot vector &FF:FFF8? Interrupt vector &FF:FFF0?
If &FF is used, ROM must be mapped downwards from bank &FF. But, really, who cares? We're only ever going to have one ROM bank.
When IFn is requested by microcode, bit n of the IR is tested. This action should also simultaneously check bits 0 to n-1 of the IR and allow microcode to end instruction execution. Otherwise, instructions execute for up to 10 cycles for no good reason.
Hardware-wise, when a microcode instruction asserts IFn and END simultaneously, the microprogram should end if the checked bits are clear. Since END is active low, we OR it with a value derived from masking the IR and ORing its bits together. Here's a truth table for just two bits of the IR:
|END not asserted.|
|END asserted, IFn inactive.|
|Bit 0 is clear, end instruction.|
|Bit 0 is set, don't end.|
|Bits 0 and 1 are clear, end instruction.|
|Bit 0 is set, don't end.|
|Bit 1 is set, don't end.|
The mask is generated from the IF signal. Each bit of the mask is ORed with the corresponding bit in the IR. All of them are ORed together with the END signal, and that should be the extent of the circuitry needed.
The mask calculation is as follows:
The terms for each of these are somewhat complex.
$$
\begin{aligned}
Y_0 &= A_0 + A_1 + A_2 + A_3 = A_0 + Y_1\\
Y_1 &= A_1 + A_2 + A_3 = A_1 + Y_3\\
Y_2 &= A_2 + A_3 + (A_1 A_0) = Y_3 + (A_1 A_0)\\
Y_3 &= A_2 + A_3\\
Y_4 &= A_3 + A_2(A_0 + A_1)\\
Y_5 &= A_3 + A_2 A_1\\
Y_6 &= A_3 + A_2 A_1 A_0\\
Y_7 &= A_3\\
Y_8 &= A_3(A_1 + A_0)\\
Y_9 &= A_3 A_1
\end{aligned}
$$
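The equations read as a comparator. Taking A₃–A₀ as the binary value of n in IFn (my reading, not stated explicitly above), Yₖ is 1 exactly when n > k, i.e. when IR bit k is one of the bits 0 to n-1 that must be checked. The sketch evaluates the ten terms exactly as written, confirms that reading for n = 0 to 10, and then models the early-END decision in positive logic (the real END is active low). Values of n from 12 upwards would break Y₈, but they never occur.

```python
def mask_terms(n: int) -> list[int]:
    """Evaluate Y0..Y9 exactly as written, with A3..A0 = the bits of n."""
    a0, a1, a2, a3 = n & 1, (n >> 1) & 1, (n >> 2) & 1, (n >> 3) & 1
    y3 = a2 | a3
    y1 = a1 | y3
    y0 = a0 | y1
    y2 = y3 | (a1 & a0)
    y4 = a3 | (a2 & (a0 | a1))
    y5 = a3 | (a2 & a1)
    y6 = a3 | (a2 & a1 & a0)
    y7 = a3
    y8 = a3 & (a1 | a0)
    y9 = a3 & a1
    return [y0, y1, y2, y3, y4, y5, y6, y7, y8, y9]


def should_end(ir: int, n: int) -> bool:
    """Positive-logic model: END with IFn ends the instruction if IR bits 0..n-1 are all clear."""
    mask = sum(bit << k for k, bit in enumerate(mask_terms(n)))
    return (ir & mask) == 0


# Y_k is 1 exactly when n > k, for every IFn value from 0 to 10.
for n in range(11):
    assert mask_terms(n) == [int(n > k) for k in range(10)], n

print(should_end(0b100, 2), should_end(0b010, 2))   # True False
```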
IFn values in the microcode are different, so this will have to be reworked. I'm thinking of using a diode matrix ROM to make it easy to reconfigure things.
Redesign the Address Generation Logic
The MBU redesign precipitates a need to redesign the AGL. It now needs to have multiple enable inputs, to instruct it to generate addresses for:
- Fetching instructions, relative to
- Accessing Page Zero, when R is set in an instruction. This is relative to
- Page Zero, when R=1. This is relative to
- Addressing the stack, relative to
- Fetching data in local mode (R=0), relative to
- Fetching data via indirection using one of the magic index registers, relative to whichever register is referenced in the instruction operand.
In theory, the unit would now have to have four decoded control inputs:
- Form MB0-relative address, used for fetching and page-local data access.
- Form MB1-relative address, used for page-zero (
- Form MB2-relative address, used for stack access.
- Select register based on 3 least significant bits of operand, used in Page Zero magic register addresses.
In all other cases, MB0 is selected. The truth table for this is as follows:
|0||X||X||X||X||XXX||XXX||MB0 for instruction fetch|
|1||0||X||X||X||XXX||XXX||MB1 page zero access|
|1||1||0||X||X||XXX||XXX||MB2, stack access|
|1||1||1||0||0||XXX||XXX||MB0 local page access|
|1||1||1||0||1||XXX||XXX||MB1 page zero access|
|1||1||1||1||0||XXX||XXX||MB0 indirect mode|
|1||1||1||1||1||0040–00FF||000||MB0 selected by IR2–0|
|1||1||1||1||1||0040–00FF||001||MB1 selected by IR2–0|
|1||1||1||1||1||0040–00FF||010||MB2 selected by IR2–0|
|1||1||1||1||1||0040–00FF||011||MB3 selected by IR2–0|
|1||1||1||1||1||0040–00FF||100||MB4 selected by IR2–0|
|1||1||1||1||1||0040–00FF||101||MB5 selected by IR2–0|
|1||1||1||1||1||0040–00FF||110||MB6 selected by IR2–0|
|1||1||1||1||1||0040–00FF||111||MB7 selected by IR2–0|
This truth table has an implicit 9 bits of input!
11111-1-- 100
11-11--1- 010
1-1-1---1 001
10------- 001
110------ 010
1-101---- 001
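The terms above look like a minimised PLA cover (in the style of Espresso output): nine input positions, then three output bits giving the bank number. One way to trust such a cover is to evaluate it against the truth table for every input. The table has no header, so the column meanings here are my guess from the row labels: positions 1–3 behave as active-low fetch, page-zero and stack requests, then an indirect flag, the R bit, an "address in &0040–&00FF" flag, and IR2–0. The table never lists the case where the first five inputs are all 1 but the address is outside &0040–&00FF; the cover treats it as a don't-care, so the check skips it.

```python
from typing import Optional

# The minimised cover as listed: 9 input positions, then the bank number (MSB first).
COVER = [
    ("11111-1--", "100"),
    ("11-11--1-", "010"),
    ("1-1-1---1", "001"),
    ("10-------", "001"),
    ("110------", "010"),
    ("1-101----", "001"),
]


def eval_cover(inputs: str) -> int:
    """OR together the outputs of every product term whose cube matches the 9 input bits."""
    bank = 0
    for cube, outputs in COVER:
        if all(c == "-" or c == b for c, b in zip(cube, inputs)):
            bank |= int(outputs, 2)
    return bank


def truth_table(inputs: str) -> Optional[int]:
    """The truth table above; None marks the one combination it leaves unspecified."""
    fetch_n, pz_n, stack_n, ind, r, in_range = (b == "1" for b in inputs[:6])
    ir_low = int(inputs[6:], 2)           # IR2-0
    if not fetch_n:
        return 0                          # MB0 for instruction fetch
    if not pz_n:
        return 1                          # MB1 page zero access
    if not stack_n:
        return 2                          # MB2 stack access
    if not ind:
        return 1 if r else 0              # MB1 page zero or MB0 local page
    if not r:
        return 0                          # MB0 indirect mode
    return ir_low if in_range else None   # magic index register: bank from IR2-0


# The cover agrees with the table on every specified input combination.
for i in range(512):
    bits = format(i, "09b")
    expected = truth_table(bits)
    if expected is not None:
        assert eval_cover(bits) == expected, bits
```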
Redesign the Auto-Index Logic
The AIL now needs to send two signals to the control unit. Where before we had the AINDEX input, we now have ADEC. All things considered, this is a minor change.
Allow units to mark an instruction ‘atomic’
Use an open-drain input to set a flip-flop. The end of execution of the next instruction clears it. If interrupts are enabled, they are ignored while the flip-flop is set. This allows multi-word instructions to be implemented in a generic way. This mechanism exists for the µCB extension, so it's easy to generalise. The main purpose for doing this is to allow the implementation of a REP (Repeat) instruction that repeats the next instruction a set number of times.
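A behavioural sketch of the atomic flip-flop, assuming interrupts are only sampled between instructions, after END. The program format (mnemonic plus "asserts the atomic line") and the function name are made up for illustration; the point is that a pending interrupt can't land between the instruction that asserts the line and the one after it.

```python
def run(program: list[tuple[str, bool]], irq_pending: bool = True) -> list[str]:
    """Trace execution; 'IRQ' marks where a pending interrupt would be taken.

    A unit asserting the open-drain atomic line sets the flip-flop. It survives
    the END of that instruction and is cleared by the END of the next one;
    interrupts are only taken while it is clear.
    """
    trace = []
    hold = 0                          # instruction ENDs left before the flip-flop clears
    for name, asserts_atomic in program:
        if asserts_atomic:
            hold = 2
        trace.append(name)
        if hold:
            hold -= 1                 # END of this instruction
        if irq_pending and hold == 0:
            trace.append("IRQ")
            irq_pending = False
    return trace


print(run([("REP", True), ("ADD", False), ("NOP", False)]))
# ['REP', 'ADD', 'IRQ', 'NOP']: no interrupt between REP and the instruction it repeats
```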
Wishful Thinking: the REP instructions
R register to the register file. Make it read-accessible using OUT instructions so state can be saved and retrieved. REP repeats the following instruction, creating really tight (but simple) microcode loops.
To implement this, we need microcode jumps (which is another task altogether). The REP instruction should allow a few options:
- Terminate when Z flag is set/clear: most useful for load instructions.
- Terminate when N flag is set/clear.
- Repeat a certain number of times.
Maybe add atomic versions of these? Maybe make all of them atomic? Atomic is definitely easier.
One solution: Fetches always take two cycles. When REP is active and the condition hasn't been met, the END signal should reset the µPC to 2, rather than 0.
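A sketch of that sequencing for the "repeat a certain number of times" option only, assuming µPC 0 and 1 are the two fetch cycles and the last execution step asserts END. The Z and N termination modes aren't modelled, and `micro_steps` is a hypothetical name.

```python
def micro_steps(exec_steps: int, repeat: int) -> list[int]:
    """µPC values visited while REP runs one instruction `repeat` times (count mode)."""
    if exec_steps < 1 or repeat < 1:
        raise ValueError("need at least one execution step and one repetition")
    visited = [0, 1]                  # the instruction is fetched only once
    last = 1 + exec_steps             # µPC of the step that asserts END
    remaining = repeat
    upc = 2                           # execution always starts at µPC 2
    while True:
        visited.append(upc)
        if upc < last:
            upc += 1
            continue
        remaining -= 1                # END asserted on this step
        if remaining == 0:
            return visited            # condition met: normal END, next fetch at µPC 0
        upc = 2                       # REP still active: reset to 2, not 0


print(micro_steps(2, 3))   # [0, 1, 2, 3, 2, 3, 2, 3]
```

Three repeats of a two-step instruction take 8 cycles this way, against 3 × 4 = 12 if each repetition were fetched again.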
Verging on Crazy Talk Now: a multiply instruction
Can we use the REP instruction to implement multiplication? This could work if shifts/rolls had an ‘accumulate’ bit that adds to the current value of the AC, or simply by using a magic register in I/O space and repeating the ADD instruction to accumulate.
There are a few fields in the Microcode Control Vector that can be made vertical to make the µCV more compact. I suggest combining them into a single ‘action’ field:
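Here's why the "accumulate" variant is attractive: sixteen repetitions of one shift-and-add step give a full 16×16-bit unsigned multiply, whereas repeating a plain ADD needs as many repetitions as the multiplier's value. The step below is the textbook right-shifting multiplier on a 32-bit product held in a hypothetical register pair (`acc`, `mq`); it illustrates the idea, not an actual CFT instruction definition.

```python
MASK16 = 0xFFFF


def mul_step(acc: int, mq: int, multiplicand: int) -> tuple[int, int]:
    """One shift-and-add step; mq's low bit decides whether to add before shifting."""
    if mq & 1:
        acc += multiplicand                      # may carry into bit 16
    combined = ((acc << 16) | mq) >> 1           # shift the 33-bit acc:mq pair right by one
    return combined >> 16, combined & MASK16


def multiply(a: int, b: int) -> int:
    """16x16 -> 32-bit unsigned multiply as 16 repeated steps (what REP would loop)."""
    acc, mq = 0, b & MASK16
    for _ in range(16):
        acc, mq = mul_step(acc, mq, a & MASK16)
    return (acc << 16) | mq


print(multiply(3, 5))               # 15
print(hex(multiply(0xFFFF, 0xFFFF)))  # 0xfffe0001
```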
- Memory read (MEMR)
- Memory write (MEMW)
- I/O in (IOR)
- I/O out (IOW)
- Decrement DR
- Decrement AC
- Clear Link (CLL)
- Complement Link (CPL)
- Set Interrupt Flag (STI)
- Clear Interrupt Flag (CLI)
Note: INCPC, CLI and END have to be independent.
This combines a total of 10 bits into a single 4-bit field, saving 6 bits. We can expand RUNIT to 5 bits and WUNIT to 4 bits and have 4 bits left over.
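The bit count is easy to check: a vertical field needs one code per action plus a "do nothing" code, assuming no two of these actions are ever needed in the same microinstruction (which is the premise of making them vertical). The action names are taken from the list; the helper is hypothetical.

```python
ACTIONS = ["MEMR", "MEMW", "IOR", "IOW", "DEC DR", "DEC AC", "CLL", "CPL", "STI", "CLI"]


def field_width(actions: list[str]) -> int:
    """Bits for a vertical field: codes 0..len(actions), where 0 means 'no action'."""
    return len(actions).bit_length()


old_bits = len(ACTIONS)                      # one horizontal bit per action
new_bits = field_width(ACTIONS)
print(new_bits, old_bits - new_bits)         # 4 6

# If memory and I/O accesses become read/write units instead (as in the Idea further down):
reduced = [a for a in ACTIONS if a not in ("MEMR", "MEMW", "IOR", "IOW")]
print(field_width(reduced))                  # 3
```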
|Old Field||Bits||New field||New bits|
|CLL||1||Part of OP||4|
|CPL||1||Part of OP|
|STI||1||Part of OP|
|STPDR||1||Part of OP|
|STPAC||1||Part of OP|
|DEC||1||Part of OP|
|MEM||1||Part of OP|
|IO||1||Part of OP|
|R||1||Part of OP|
|WEN||1||Part of OP|
Question: How can we best use these 4 bits?
Follow-up question: What would the updated µCV lights on the front panel look like?
Suggested layout of extended RUNIT and WUNIT fields.
|00001||Write to IR (moved)|
|00010||Read from AGL||Idle (was ‘Write to AR’)|
|00011||Read from PC||Write to PC|
|00100||Read from DR||Write to DR (moved)|
|00101||Read from AC||Write to AC (moved)|
|00110||Read from Data Bus||Write to Data Bus|
|00111||Write to ALU Port B|
|01000||ALU Add||Write MB0:IBUS to AR.|
|01001||ALU AND||Write MB1:IBUS to AR.|
|01010||ALU OR||Write MB2:IBUS to AR.|
|01011||ALU XOR||Write MBn:IBUS to AR (n is from IR[0..2])|
|01110||Constant Store 1|
|01111||Constant Store 2|
Idea: Really old versions of the CFT microcode treated the data bus as another read/write unit. How about doing the same now? I could add two read sources (‘Read from Memory Space’, ‘Read from I/O Space’) and two corresponding write destinations (‘Write to Memory Space’ and ‘Write to I/O Space’). If this is done, these four operations can be removed from the OP field, which can now be three bits wide instead of four. We now have five unused bits and nearly nothing to do with them. Luxury!
Instruction Set Changes
|LJSR||0000||0||R||mmmmmmmmmm||Push PC, Push MB, MB0=[m], PC=[m+1]|
|TRAP||0000||1||1||mmmmmmmmmm||Push PC, Push MB, MB0=[m], PC=[m+1]|
|LJSR||0000||00001Rmmmmmmmmmm||Push PC, Push MB0, MB0=[m++], PC=[m]. Push PC and MB0 to stack.|
|TRAP||0000||00000Rmmmmmmmmmm||Save PC and jump to trap.|