Chapter B1. Programming Model

The conversion from the LaTeX source of The CFT Book is very much a work in progress. Things are quite broken for the time being, but I expect them to improve soon, and by leaps and bounds. Please ignore any weird colours and typesetting!

This chapter discusses the architecture of the CFT processor from a programmer's perspective. The CFT is a solid-state, 16-bit architecture reminiscent, among others, of the DEC PDP-8 (the first computer to be famously described as a ‘mini’), with a goodly portion of the MOS 6502 thrown in.

B1.1. CFT—The Little Processor That Couldn't

Look, let's face it. You're here to be amazed. ‘Hey look, this person built his own computer! Cool!’ (said no-one ever). You're probably reading this on a computer with a multi-core CPU with billions of transistors, billions of bits of memory, running at billions of cycles a second. If a device can't push literally 15 billion bits of data to its screen every second, you threaten to murder whoever designed it. It's okay. We're all jaded. So, time to face facts. You're here to be amazed.

You won't. It's best you face the disappointment now, and then we can manage your expectations together. Sounds good? Good.

So. The CFT processor itself is a fairly unsurprising design on the whole. It is a stored-program computer, a Von Neumann architecture machine. Just like the device you're reading this on, it keeps its programs and data in the same type of memory. It just has much, much less memory.

All instructions are exactly 16 bits wide and consist of a single word each. Each instruction word includes a four-bit instruction code and two bits that select the addressing mode. The remaining ten bits are the instruction's operand.

Like the 6502, the PDP-8, and many of the oldest computers, and unlike the device you're reading this on, the CFT is an Accumulator Architecture: most instructions operate on a single, general purpose register known as the AC. Including the Accumulator, there are ten registers available: three 16-bit major registers, two 16-bit minor registers, and five single-bit flag registers that can be tested to make decisions. Of course, only the Accumulator is accessible directly.

In keeping with the mini theme, the processor is built around a 16-bit word length. It can only access memory in 16-bit quantities. Depending on how you want to see it, it either lacks the ability to process bytes, or its bytes can be anywhere from 5 to 16 bits wide. The PDP-8 was the same, except it could only access 12-bit values and stored 6-bit bytes in ‘SIXBIT’ encoding. In most of this book, I'll be using the terms Word and kW to denote a 16-bit word and a block of 1,024 16-bit words respectively. To make up for the lack of registers, I follow the PDP-8 and 6502 school of thought: addressing modes that reach the lowest 1,024 words of memory, so those words can be used as global variables or as extra registers. Unlike the 6502, but like the PDP-8, 128 of these locations can auto-increment after they're accessed, making them behave like crude index registers. This makes coding loops a lot tighter than you'd expect from a processor that's 90% limitations, 10% happy accidents.

Like many recent home-designed processors, the CFT is microcoded. Rather than relying on a hardwired control unit, the instruction set is built as a huge truth table using ROMs. This allows instructions to be debugged without rewiring the processor, and has been a great boon in developing new behaviours and new instructions whose need became apparent as the scope of the project ~~got more and more Byzantine~~ extended.

The instruction set itself started off being very simple and orthogonal, just like the PDP-8's. After several versions, this is no longer the case. There are some unusual non-orthogonal instructions.

As you'd expect from a tiny home-designed machine, there are many things missing from this design.

There are no privilege levels. Bye bye multi-user operating systems.
There are no processor exceptions. Bye bye memory management.
There are no memory managing features. So far so bad: the CFT will never run Unix. But wait, there's less.
There is no pipelining. Bye bye speed.
There is no speculative execution. Bye bye even more speed. At least we'll never be vulnerable to Spectre exploits either.
There is no hardware stack. Bye bye Forth. Oh, wait. We do have Forth because masochism.
There is no floating point arithmetic. So no, it won't run Crysis.
There is no integer division.
There is no integer multiplication.
Hell, there's no integer subtraction.
You can't shift or roll by an arbitrary number of bits.
Actually, you can't shift, full stop. But we fake that with trickery.

On the other hand, the PDP-8 had less than even this, and did pretty well for itself. The CFT has many instructions with cunning semantics that more or less alleviate some of the shortcomings. Sure, there's no stack. But like the PDP-8, there is support for hardware subroutines. The CFT also has a dedicated instruction for OS system calls. And lacks a dedicated instruction to return from either. A lot is achieved with very little using microcode tricks and simple semantics that turn many instructions into machine-language Swiss-army knives.

In the style of 1960s computers, the CFT's simplicity blurs the dividing line between processor and peripherals and allows for instruction set extensions to be provided by peripherals or co-processors.

Speaking of peripherals, the CFT is a pure 16-bit machine. It can address up to 64 kW of RAM. Remember, that's 65,536 words, not 65,000 words (or, you know, 50,000 words if your job is making hard drives). A processor extension we'll discuss later removes this limitation (while adding other, more ~~severe shortcomings~~ refreshingly challenging problems to solve).

So far the design has been very Accumulator-y, and what's more Accumulator-y than the 6502 (that isn't a Hollerith tabulating machine)? To attract the Z80 crowd too, the CFT can also address up to 65,536 words of I/O space, what Zilog and Intel termed ‘I/O ports’. This is separate from memory space and is accessed using different instructions. Limitations in the instruction format make accessing the top 64,512 I/O addresses challenging or slow, whichever has fewer letters. In other words, the lowest 1,024 I/O addresses are the easy, fast ones to reach. 1,024 I/O addresses should be enough for anyone.

The next sections will discuss this trainwreck of a design in (possibly literally) painful detail.

B1.2. Processor Speed and Synchronisation

The processor runs at 4 MHz, so each processor cycle lasts 250 ns. Each instruction takes between three and 11 clock cycles to execute, with one microprogram step per processor cycle. The shortest microprogram is currently three processor cycles (two steps, 750 ns). The longest possible microprogram is 16 cycles (4 µs), but processor extensions and I/O devices may cause the processor to wait indefinitely.
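To make the timing arithmetic concrete, here is a minimal Python sketch (Python is only an illustration vehicle; the names are hypothetical). It assumes the nominal 4 MHz clock and ignores any wait states caused by extensions or I/O devices.

```python
# Minimal sketch: convert CFT processor cycles to time, assuming the nominal
# 4 MHz clock and no wait states from processor extensions or I/O devices.
CLOCK_HZ = 4_000_000                        # nominal clock frequency
NS_PER_CYCLE = 1_000_000_000 // CLOCK_HZ    # 250 ns per processor cycle


def cycles_to_ns(cycles: int) -> int:
    """Return the duration of `cycles` processor cycles in nanoseconds."""
    return cycles * NS_PER_CYCLE


if __name__ == "__main__":
    print(cycles_to_ns(3))   # shortest current microprogram: 750
    print(cycles_to_ns(16))  # longest possible microprogram: 4000 (4 µs)
```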

The clock source may be overridden by an external source, or stopped altogether. Single-step operation is also available—it steps the processor one cycle at a time. This feature is used by the Debugging Front Panel (DFP) to slow the computer down for debugging, or to single step either by one microstep (processor cycle) or a whole instruction.

B1.3. Power On and Reset

The processor may be reset using the front panel, a push button on PB1, or the power supply. During initial power on, the power supply keeps the processor in the reset state. Once the power supply stabilises, the reset condition is deasserted.

The processor then goes into a Reset Hold condition and waits for a preset number of processor cycles for its clock generator and other units to stabilise. During this time, a number of registers are reset to their initial values. When the cycle count expires, the Reset Hold condition is also lifted, and execution begins at address FFF0—sorry Z80 people, the CFT boots just like the 6502.

B1.4. Word Size

The word size is 16 bits. There are no facilities for accessing quantities smaller than one word, and no single-instruction facilities for accessing quantities longer than one word.

In this book, one kilo-Word (kW) is 2¹⁰ = 1,024 Words.

B1.5. Data Types

The CFT processor's hardware is dimly aware of the following data types:

16-bit unsigned integers representing the range 0 to 65,535.
16-bit signed integers representing the range -32,768 to 32,767 in two's complement.
16-bit signed integers representing the range -32,767 to 32,767 in one's complement.

Since the only arithmetic operation supported is addition, the architecture is data type agnostic: the same circuitry can handle both signed and unsigned numbers. Facilities to detect numeric overflow exist for both unsigned and two's complement signed integers. One's complement support is pure happenstance and sketchy: the hardware doesn't consider -0 to be the same as +0. (yes, one's complement has two representations for zero)

All other support (and even signed integer support, really) is implemented in software.
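Since the hardware only ever sees 16 bits, the data type is purely a matter of interpretation. The following Python sketch (hypothetical helper names, for illustration only) reads the same word in the three ways listed above. Note how both 0000 and FFFF come out as zero under one's complement.

```python
# Sketch: three interpretations of the same 16-bit word.
MASK16 = 0xFFFF


def as_unsigned(word: int) -> int:
    """Unsigned interpretation: 0 to 65,535."""
    return word & MASK16


def as_twos_complement(word: int) -> int:
    """Two's complement interpretation: -32,768 to 32,767."""
    word &= MASK16
    return word - 0x10000 if word & 0x8000 else word


def as_ones_complement(word: int) -> int:
    """One's complement interpretation: -32,767 to 32,767.

    Both 0x0000 (+0) and 0xFFFF (-0) map to the Python integer 0,
    which is the two-zeroes quirk of one's complement.
    """
    word &= MASK16
    return -(~word & MASK16) if word & 0x8000 else word


if __name__ == "__main__":
    for w in (0x0001, 0x7FFF, 0x8000, 0xFFFF):
        print(f"{w:04X}", as_unsigned(w), as_twos_complement(w), as_ones_complement(w))
```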

B1.6. Addressing

The CFT architecture can address up to 64 kW of memory, plus up to 64 kW of I/O space, although the first 1,024 Words of I/O space are considerably easier to use.

B1.6.1. Memory Space

Main memory is split up into pages of 1,024 Words each. Instructions usually reference memory addresses relative to the page they are executing in. This is a little bit like the universally loathed segmented architecture of the Intel 8086, minus the popularity. Other than convenience and the fact that instruction operands are ten bits wide, nothing stops a program from accessing any memory page, and 63 of the 64 pages in the memory space have no special semantics.

The one exception is the first page, Zero Page, which is given special treatment by the instruction set. All instructions that access memory can also reference memory or I/O addresses in Zero Page, no matter what page they are executing in. As such, Zero Page is always used for system variables, constants, operating system vectors and other data that must be globally accessible.

The 128 words at Zero Page addresses 0080–00FF (inclusive) are so-called Autoindex Registers. When one of these addresses is referenced using the Autoindex addressing mode, the value stored there is automatically incremented by one. This allows loops to be coded very tightly, with index register increments happening at the microcode level. The PDP-8 implemented autoindex registers in the same way; it just had fewer of them.

Given this information, here is a memory map of the CFT processor.

Address	Contents
0000	RETV — return address for last JSR instruction.
0001	RTTV — return address for last TRAP instruction.
0002	RTIV — return address for last ISR.
0080	First autoindex register.
00FF	Last (128th) autoindex register.
⋮	⋮
03FF	Last word of Zero Page.
0400	First word of Page 1.
⋮	⋮
8000	First word of Page 32.
⋮	⋮
FC00	First word of Page 63.
FFF0	Boot/reset address.
FFF8	Interrupt service routine.
FFFF	Highest memory address.
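The memory map boils down to simple bit arithmetic: the page number is the top six bits of an address, and the offset within the page is the bottom ten. A small Python sketch of these helpers follows; the function names are hypothetical and not part of any CFT toolchain.

```python
# Sketch: page arithmetic behind the CFT memory map (hypothetical helper names).
PAGE_SIZE = 1024          # one page = 1 kW = 2**10 words
AUTOINDEX_FIRST = 0x0080  # first autoindex register
AUTOINDEX_LAST = 0x00FF   # last (128th) autoindex register


def page_of(addr: int) -> int:
    """Page number: the six most significant bits of a 16-bit address."""
    return (addr & 0xFFFF) // PAGE_SIZE


def page_origin(addr: int) -> int:
    """First address of the page containing `addr`."""
    return addr & 0xFC00


def offset_in_page(addr: int) -> int:
    """Offset within the page: the ten least significant bits."""
    return addr & 0x03FF


def is_autoindex(addr: int) -> bool:
    """True for the 128 Zero Page autoindex registers, 0080-00FF."""
    return AUTOINDEX_FIRST <= addr <= AUTOINDEX_LAST


if __name__ == "__main__":
    for a in (0x0000, 0x0080, 0x0400, 0x8000, 0xFFF0):
        print(f"{a:04X}: page {page_of(a)}, origin {page_origin(a):04X}, "
              f"offset {offset_in_page(a):03X}, autoindex={is_autoindex(a)}")
```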

B1.6.2. I/O Space

Although there are 64 kW of I/O space, practical restrictions limit how much of it is usable. I/O space accesses are subject to the same limitations as memory space: either the local page or Zero Page can be accessed. However, Page Relative mode makes no sense in I/O space (the device accessed would change depending on where in memory the accessor routine is executing), so it is never used. Instead, all I/O space transactions are done in ‘Zero Page’, effectively limiting I/O space to 1,024 addresses.

Accessing addresses in the range 0400-FFFF is still possible using indirect addressing, but slower than Zero Page accesses.

For speed and convenience, all I/O devices thus occupy the first 1,024 words of the I/O address space.

To make decoding addresses easier for peripherals, the processor decodes bits 8 to 15 of the I/O address and provides four signals like this:

Addresses	SYSDEV	IODEV1XX	IODEV2XX	IODEV3XX
0000-00FF	0	1	1	1
0100-01FF	1	0	1	1
0200-02FF	1	1	0	1
0300-03FF	1	1	1	0
0400-FFFF	1	1	1	1

Most peripheral devices can then get away with minimal address decoding.
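The decoding in the table amounts to comparing the high byte (bits 8 to 15) of the I/O address against 00, 01, 02 and 03. The Python sketch below models it, reading each 0 in the table as "this block is selected", i.e. treating the signals as active-low selects; the function name is hypothetical.

```python
# Sketch of the I/O block-select decoding shown in the table above.
# A level of 0 means the corresponding 256-address block is selected.
SELECT_SIGNALS = ("SYSDEV", "IODEV1XX", "IODEV2XX", "IODEV3XX")


def io_select_levels(io_addr: int) -> dict:
    """Return the level (0 or 1) of each select signal for a 16-bit I/O address."""
    high_byte = (io_addr & 0xFFFF) >> 8   # bits 8 to 15 of the I/O address
    return {name: 0 if high_byte == block else 1
            for block, name in enumerate(SELECT_SIGNALS)}


if __name__ == "__main__":
    for addr in (0x0010, 0x0123, 0x0280, 0x03FF, 0x0400):
        print(f"{addr:04X}", io_select_levels(addr))
```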

B1.7. Registers

There are ten registers in the CFT architecture. They are split into the major registers, minor registers, and flag registers. Major and minor registers are all 16 bits wide. Flag registers are one bit wide.

Major Registers (16 bits wide, bits 15–0)

Accumulator (AC)
Program Counter (PC)
Data Register (DR)

Minor Registers (16 bits wide, bits 15–0)

Instruction Register (IR)
Address Register (AR)

Flags (1 bit wide each)

Link (L)
Negative (N)
Zero (Z)
Overflow (V)
Interrupt (I)

B1.7.1. Major Registers

These registers are used directly by the programming model. These are all 16-bit registers. The hardware provides facilities for these registers to be read from, written to, and incremented.

B1.7.1.1. Accumulator (AC)

This is a 16-bit register, or perhaps the 16-bit register. It is the only register directly and fully accessible via the instruction set and the one almost all instructions operate on. The hardware allows this register to be read from, written to, incremented and decremented.

B1.7.1.2. Program Counter (PC)

A 16-bit register. It contains the address in memory of the next instruction to be executed. The hardware provides facilities to read from, write to, and increment this register. The instruction set lacks a direct means of reading from the register, although there are indirect ways of doing this. Instructions that skip, jump, call subroutines or traps, and interrupts modify this register.

B1.7.1.3. Data Register (DR)

This is a 16-bit register used internally by the CFT to buffer addresses during indirect addressing. The hardware can read from, write to, increment and decrement this register. The current instruction set can't access this register directly.

B1.7.2. Minor Registers

These registers are used internally by the processor, and are built with less functionality than major registers.

B1.7.2.1. Instruction Register (IR)

The Instruction Register (IR) holds the instruction currently being executed. It is 16 bits wide and there is no way to access it programmatically.

B1.7.2.2. Address Register (AR)

The Address Register (AR) is a 16-bit write-only register that drives the Address Bus. The AR is responsible for all memory and I/O addressing. It cannot be accessed by the user directly.

B1.7.3. Flag Registers

These are single-bit flags. They are used to sense or set the state of the system and form the basis of flow control.

B1.7.3.1. Link Register (L)

Used as a carry bit during arithmetic, effectively extending the AC register to 17 bits, and as the 17th bit during roll instructions. It may be used as a generic flag by user programs and may be tested, set or cleared by user programs.

B1.7.3.2. Negative Flag (N)

This flag always follows the value of the most significant bit of the AC, which can be used to denote negative numbers. If set, interpreting the AC as a signed number indicates a negative value. If treating the AC as an unsigned quantity, the N flag is a good means of testing the highest-order bit.

B1.7.3.3. Zero Flag (Z)

Cannot be controlled directly by the user. This flag register is set when AC is zero, i.e. all 16 bits are clear. This is among the fastest means of numerical comparison on the CFT architecture.

B1.7.3.4. Overflow Flag (V)

Cannot be controlled directly by the user. This flag is set when a two's complement signed addition yields a result that will not fit in 16 bits.
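The four arithmetic-related flags can be modelled together. The Python sketch below assumes the <L,AC> ← AC + mem[a] semantics described in the notation section (L receives the carry out of bit 15) and the usual two's complement overflow rule. It is an illustration of those rules, not a transcription of the ALU ROM tables, and the function name is hypothetical.

```python
# Sketch: flags after <L,AC> <- AC + operand, assuming L receives the carry
# out of bit 15 and V follows the usual two's complement overflow rule.
def add16(ac: int, operand: int):
    """Return (result, L, N, Z, V) for a 16-bit addition."""
    ac &= 0xFFFF
    operand &= 0xFFFF
    total = ac + operand                 # up to 17 bits: <L,AC>
    result = total & 0xFFFF
    link = total >> 16                   # L: carry out of bit 15
    negative = result >> 15              # N: follows bit 15 of the AC
    zero = int(result == 0)              # Z: all 16 bits clear
    # V: both inputs had the same sign and the result's sign differs from it.
    overflow = int(((ac ^ result) & (operand ^ result) & 0x8000) != 0)
    return result, link, negative, zero, overflow


if __name__ == "__main__":
    print(add16(0x7FFF, 0x0001))  # signed overflow: 32,767 + 1
    print(add16(0xFFFF, 0x0001))  # unsigned carry: -1 + 1 = 0 in two's complement
```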

The Z80 Did It Differently

After the Z80 performs a logic operation, its V flag is used to store the parity of the result. This makes it trivial to calculate parity. It looks like the CFT's ALU can handle this and all it would take is regenerating the ALU ROM tables and renaming a few signals in the schematics to account for the new semantics.

Whether there's a point to this exercise is another thing entirely: the CFT, like all minis, delegates such menial tasks to peripheral hardware. Serial cards check their own parity, disk interfaces have their own checks, and the CFT's RAM is parity-less because there's nothing as exhilarating as debugging memory corruption errors. (spot the exaggeration)

B1.7.3.5. Interrupt Register (I)

This single-bit register controls the computer’s behaviour on detecting an interrupt request. The register may be manipulated by the user to allow or mask interrupts. This flag is write-only.

B1.7.4. Zero Page Registers

In addition to the processor registers, the programming model treats memory addresses 0000-03FF as 1,024 Zero Page registers. These are often termed simply ‘registers’ in the context of CFT Assembly. This is identical to the way both the PDP-8 and 6502 treated their own equivalent pages. In practical use, a ‘register’ in this context is the equivalent of a global variable, and it is up to the operating system to decide how Zero Page is laid out and utilised.

There are three exceptions to this rule. These three Zero Page addresses are used by the processor itself:

Address 0000 stores the return address for the last jump to subroutine (JSR) instruction.
Address 0001 stores the return address for the last jump to trap (TRAP) instruction.
Address 0002 stores the return address for the last jump to the interrupt handler routine.

The PDP-8 Did It Differently

The PDP-8 stored the return address for a subroutine at the first address of the subroutine itself, then jumped to the next address. This is a fantastic idea, and it would fit the CFT like a glove, if it weren't for the fact that it was designed for a computer without ROM. You can't return from a call to a subroutine in ROM because you can't write the return address to it!

So the painful decision was made to store the return address to a single location in RAM, and let the subroutine decide what to do with it. Generally, CFT subroutine calling conventions involve pushing the return address onto a (software) stack immediately.
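To see why the return address has to be saved immediately, here is a toy Python model. It is hypothetical and not the CFT microcode: it assumes JSR writes the PC (already pointing past the JSR instruction) to address 0000, and it uses a Python list to stand in for a software stack.

```python
# Toy model (hypothetical) of why a subroutine must save RETV immediately:
# every JSR overwrites the single return slot at address 0000.
RETV = 0x0000  # Zero Page word where JSR leaves its return address


class ToyCFT:
    def __init__(self) -> None:
        self.mem = [0] * 0x10000   # 64 kW of memory
        self.pc = 0
        self.stack = []            # stand-in for a software stack in RAM

    def jsr(self, target: int) -> None:
        """Assumed semantics: mem[RETV] <- PC, PC <- target."""
        self.mem[RETV] = self.pc
        self.pc = target


def nested_call_demo():
    m = ToyCFT()
    m.pc = 0x2001                  # PC after fetching a JSR at 2000
    m.jsr(0x3000)                  # outer call
    m.stack.append(m.mem[RETV])    # outer subroutine's prologue: push RETV
    m.pc = 0x3006                  # PC after fetching a JSR at 3005
    m.jsr(0x4000)                  # inner call overwrites RETV
    inner_return = m.mem[RETV]
    outer_return = m.stack.pop()   # still intact thanks to the push
    return inner_return, outer_return


if __name__ == "__main__":
    print([f"{a:04X}" for a in nested_call_demo()])
```

Without the push in the outer subroutine, the inner JSR would leave no trace of the outer caller's return address.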

B1.8. Instruction Format

The CFT uses a single instruction format. All instructions are exactly 16 bits wide and made up of the same fields. Each instruction contains four fields, as follows:

Bits	15–12	11	10	9–0
Field	Opcode	I	R	Operand

From most to least significant, they are:

Instruction opcode

(most significant 4 bits). This field identifies the instruction to be performed.

Indirection Mode

(1 bit). Depending on the instruction, this bit selects between the literal and direct, or the direct and indirect addressing mode. Note that in some cases, entirely different instructions (for which indirection does not apply anyway) are selected when this bit is set.

Register Mode

(1 bit). This bit controls whether addresses and literals are relative to the current page, or relative to Zero Page (the register page).

Operand or address offset

(least significant 10 bits). This allows ten bits of literals or addresses to be specified in an instruction. The most significant six bits are filled in from one of two sources as follows:

If the Register Mode bit is set (1), the six most significant bits of the operand are zero. This is Register addressing, also known as Zero Page addressing, since Zero Page is used for system registers.
If the Register Mode bit is clear (0), the six most significant bits of the Program Counter (PC) are used. This is page-relative addressing.
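The field layout and operand expansion can be sketched in a few lines of Python. The sketch follows the addressing-mode table in B1.10 (R=0 is Page Relative, R=1 is Register/Zero Page). The opcode number used is an arbitrary placeholder rather than a real CFT opcode. The `pc` argument stands for whichever PC value the microcode uses; the instruction's own address and the incremented PC only differ in page when the instruction sits in the last word of a page.

```python
# Sketch of the instruction word layout and operand expansion.
# Opcode numbers used below are arbitrary placeholders, not real CFT opcodes.
from typing import NamedTuple


class Fields(NamedTuple):
    opcode: int   # bits 15-12
    i: int        # bit 11: Indirection Mode
    r: int        # bit 10: Register Mode
    operand: int  # bits 9-0


def decode(word: int) -> Fields:
    word &= 0xFFFF
    return Fields(word >> 12, (word >> 11) & 1, (word >> 10) & 1, word & 0x03FF)


def encode(opcode: int, i: int, r: int, operand: int) -> int:
    return ((opcode & 0xF) << 12) | ((i & 1) << 11) | ((r & 1) << 10) | (operand & 0x03FF)


def expand_operand(fields: Fields, pc: int) -> int:
    """16-bit value formed from the 10-bit operand, before any indirection."""
    if fields.r:                                # R=1: Register (Zero Page) mode
        return fields.operand
    return (pc & 0xFC00) | fields.operand       # R=0: Page Relative mode


if __name__ == "__main__":
    word = encode(0x5, 0, 0, 9)                 # placeholder opcode 5, operand 9
    print(f"{word:04X}", decode(word))
    # The same page-relative operand yields different addresses on different
    # pages, which is why page-relative I/O accesses are avoided.
    print(f"{expand_operand(decode(word), 0x0805):04X}")
    print(f"{expand_operand(decode(word), 0xFFF9):04X}")
```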

The instruction format imposes some limitations, but limitations are fun, aren't they?

An instruction may only access 2,048 locations of memory. If it sets the Register Mode (R) bit, only the 1,024 words in Zero Page may be accessed. If the bit is clear, only the 1,024-word page the instruction is executing in may be accessed. The PDP-8 was plagued by the same issue, but then again so was every single RISC microprocessor out there. The RISC solutions were considerably less masochistic than the PDP-8 and CFT ones. Here's what's done on the CFT to mitigate the problem somewhat.

RAM-based subroutines store temporary data on their own page, or use special ‘scratch’ registers in Zero Page. Both variants take the same time to execute.

In the case of literals, we either limit ourselves to constants in the range 000-3FF, store commonly-used large constants (such as -1 = FFFF and -2 = FFFE) in a Zero Page constant table, or use a combination of these techniques. Assemblers on the PDP-8 did their best to deal with this problem.

We must take special care when subprograms cross a page boundary. When that happens, code referencing any local data will instead refer to the same offset within the new page, and will be invalid. The CFT Assembler issues a cross-page warning when using symbolic names (labels) for such local data.

Jumps, subroutine calls and traps suffer from the same issue. Indirect Register Addressing is commonly used as a solution to this problem: the address (vector) of the subprogram in question is stored in a Zero Page location, and the jump is made using indirection. This has the added benefit of allowing the vectors to be changed so that system services may be overridden, but costs an extra memory access cycle.

The side effect of the 10-bit literals is particularly visible in I/O addressing—enough that we've covered this twice already. When performing page-relative I/O operations (IN, OUT and IOT), the most significant six bits of the I/O address will still be taken from the six most significant bits of the PC. Because of this, different I/O devices will be accessed depending on the location of the program in memory.

There are two ways to avoid this: the primary one is to only use 10 bits of I/O space. Either the R field in all I/O instructions must be set, or the hardware must decode only the 10 least significant address lines, or a combination of both. Alternatively, we can use indirect addressing at the cost of an extra memory access cycle. Understandably then, the plan is to fit all peripheral I/O locations in 10 bits. This isn't a huge problem. All current I/O devices are happy co-existing at addresses below 0400, and so the issue is effectively moot for now.

B1.9. A Note About Semantic Notation

Semantic notation shows you what an instruction does at a high level.

If you're reading this, you've probably read microprocessor handbooks and can grok semantic notation osmotically. If not, though, here are a few tips. Starting soon, I'll be talking about processor semantics, which is a way to say what the processor does at a high level while appearing very scientific (win-win).

Semantics are shown like this: PC ← mem[a]. This means that memory is read at address a, and the result is stored in the PC. The left hand side can be:

A register of some sort: PC ← mem[a].
The <L,AC> vector when both L and AC are modified at once as a 17-bit value: <L,AC> ← AC + mem[a].
A memory write cycle: mem[a] ← PC. (write the PC to address a)
An I/O space write cycle: io[a] ← AC. (write the AC to I/O address a)

The right hand side can be:

Another register: PC ← DR.
A memory or I/O read. These can be nested if indirection is used. mem[a] is the value at address a. mem[mem[a]] is the value in memory at the address pointed to by a.
A constant: PC ← 0. The CFT has a tiny store of constants which it can write to things.
A simple arithmetic expression: PC ← PC + 1.
A conditional involving a flag: L ⇒ PC ← PC + 1. ‘If L is set, increment the PC.’
A conditional involving a negated flag: ¬L ⇒ PC ← PC + 1. ‘If L is clear, increment the PC.’

Lower-case variables always denote the instruction operand. The name differs to hint at the addressing mode, but this isn't consistent.

Many instructions have complex semantics. These are shown as multiple comma-separated statements, executed in the order shown. We don't do parallelism. For example, the semantics of POP are mem[a] ← mem[a] - 1, AC ← mem[mem[a]].

Note: the retro reference card uses a much denser way to show the same semantics.
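The sequential reading of comma-separated semantics matters: swapping the two POP statements would read through the old pointer. The following Python sketch models the POP semantics exactly as written, statement by statement. The memory contents and the choice of 0081 as the pointer location are hypothetical.

```python
# Sketch of POP's sequential semantics: mem[a] <- mem[a] - 1, AC <- mem[mem[a]].
def pop(mem: list, a: int) -> int:
    """Return the new AC value; mem[a] is updated in place."""
    mem[a] = (mem[a] - 1) & 0xFFFF   # first statement: pre-decrement the pointer
    return mem[mem[a]]               # second statement: read through the new pointer


if __name__ == "__main__":
    mem = [0] * 0x10000
    mem[0x0081] = 0x0302             # hypothetical stack pointer, one past the top
    mem[0x0301] = 42                 # top of the (hypothetical) stack
    ac = pop(mem, 0x0081)
    print(ac, f"{mem[0x0081]:04X}")
```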

B1.10. Addressing Modes

The choice of addressing mode is dictated by the R and I fields in the instruction, as well as by the instruction itself. The R field was discussed above. The I bit generally (but not always; there's always an exception, right?) adds one level of indirection to an instruction.

Combined, these two bits allow up to four Addressing Modes per instruction. An additional Autoindex variant mode is available for most instructions. This is automatically enabled when both I and R fields are set, and addresses in the range 0080-00FF are used.

This lets each instruction support up to five addressing modes from the full range of eleven. No instruction supports all addressing modes, and some instructions perform tasks that do not conform to the canonical definition of ‘addressing mode’, so I gratuitously consider them addressing modes to bulk up the list. Aren't you impressed a processor built in someone's basement has eleven addressing modes? Such weird instructions use the operand field for purposes other than addressing or literals, and ignore either or both of the R and I fields.

To Complicate Matters Further…

The top five bits of each instruction select the microprogram to execute. So the use of the I bit to select Indirect mode is a matter of convention, and I lied: there are 32 entirely different instructions, not 16. In recent microcode versions, I've taken advantage of this to implement a couple of additional instructions to make Forth performance less disgraceful.

Despite this somewhat limited model, because of the microcoded nature of the CFT and the PDP-8 style instruction set, a surprising eleven addressing modes are available, some of which are quite complex. Brace yourselves for a table.

Addressing Mode	I	R	Extra Cycles	Used with
Page Relative	0	0		Most instructions
Register	0	1		Most instructions
Indirect	1	0	+2	Most instructions
Register Indirect	1	1	+2	Most instructions
Autoindex	1	1	+4	Most instructions with operand 0080–00FF
Conditional	0	X		OP1 and OP2 instructions
Double Indirect	1	0	+6	JMPII instruction
Register Double Indirect	1	1	+6	JMPII instruction
Autoindex Double Indirect	1	1	+6	JMPII instruction with operand 0080–00FF
Autoindex pre-decrement	1	0	+4	POP instruction
Register Autoindex pre-decrement	1	1	+4	POP instruction

The table shows the addressing modes, and what instructions each applies to. Also listed are the values of the I and R fields, and how many extra cycles are needed for the instruction to execute.

B1.10.1. Page Relative Mode

When using applicable instructions with R=0 and I=0, the value field in the instruction is used as the least significant ten bits of a value. The most significant six bits are taken from the PC to generate a 16-bit value relative to the current page’s origin. How the 16-bit value is used depends on the instruction, with most (but not all) instructions using it to address memory.

For example, executing the instruction JMP 9 at address FFF8 (page 3F, page origin FC00) would set the PC to FC09. Executing the instruction LOAD 9 at the same address would load the value at address FC09 into the AC.

Using this addressing mode doesn't require any additional processor cycles.

Figure 2. Operation of Page Relative Mode. The AC or PC (in the case of flow control instructions) is loaded with the bit pattern in the operand field of the IR. The six most significant bits are taken from the six most significant bits of the PC, forming the current page number.

B1.10.2. Register Mode

When using applicable instructions with R=1 and I=0, the value field in the instruction is used as-is. The 10-bit operand is expanded to 16 bits by padding it with zeroes, so operands in the range 000–3FF are possible. How the value is used depends on the instruction, with most (but not all) instructions using it to access Zero Page memory locations.

The naming follows the PDP-8's use of Zero Page addresses to store ‘registers’.

Like Page Relative Mode, this addressing mode does not involve any extra cycles aside from the instruction fetch.

Figure 3. Operation of Register Mode.

B1.10.3. Indirect Mode

Indirect Mode is selected when R=0 and I=1. The value field of applicable instructions specifies a page-relative memory address. The 10 least significant bits of the operand are expanded using the 6 most significant bits of the PC. The resultant 16-bit number is the address in memory of the value that will be used by the instruction. Like all indirect modes, this is used for pointers, vectors, and the like.

This addressing mode involves one additional memory read (two processor cycles) in addition to the one required to fetch the instruction.

B1.10.4. Register Indirect Mode

Register Indirect Mode is selected when R=1 and I=1. The value field of applicable instructions specifies a memory address in Zero Page, formed by padding the 10 least significant bits of the operand with zeroes to 16 bits. The word at that Zero Page address is read, and that value is used by the instruction. Like all indirect modes, this is used for pointers, vectors, and the like.

This addressing mode involves one additional memory read (two processor cycles) in addition to the one required to fetch the instruction.
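Both indirect modes add exactly one memory read on top of the usual operand expansion. The Python sketch below models them; the vector at 0010 and the pointer at 203F are hypothetical example data. A jump would use the fetched pointer directly as the new PC, while a load would read one more word through it.

```python
# Sketch of Indirect (R=0, I=1) and Register Indirect (R=1, I=1) modes.
def operand_address(operand: int, r: int, pc: int) -> int:
    """Expand a 10-bit operand: Zero Page if R=1, current page if R=0."""
    operand &= 0x03FF
    return operand if r else (pc & 0xFC00) | operand


def indirect_pointer(mem: list, operand: int, r: int, pc: int) -> int:
    """One extra memory read: the word at the expanded address is the pointer."""
    return mem[operand_address(operand, r, pc)]


if __name__ == "__main__":
    mem = [0] * 0x10000
    mem[0x0010] = 0x4321   # hypothetical OS vector in Zero Page
    mem[0x203F] = 0x1234   # hypothetical pointer stored on page 8
    mem[0x1234] = 99
    # A jump through the vector would set the PC to the pointer itself:
    print(f"{indirect_pointer(mem, 0x010, 1, 0x2005):04X}")
    # A load through the page-local pointer reads the word it points to:
    print(mem[indirect_pointer(mem, 0x03F, 0, 0x2005)])
```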

B1.10.5. Autoindex Mode

So far so boring. We have two bits to select the addressing mode, so we get four addressing modes per instruction. With a little bit of trickery shamelessly lifted from the PDP-8, a fifth addressing mode is possible. The Autoindex mode is selected when R=1, I=1, and the operand is in the range 080–0FF inclusive.

It works just like Register Indirect Mode, with a twist: the value at the specified address is incremented after being used as a pointer. This allows the CFT to manipulate contiguous blocks of data in a very compact manner, without having to increment a pointer explicitly. This saves three instructions per loop iteration.

The semantics of the LOAD instruction using Autoindex mode would be AC ← mem[mem[ri]], mem[ri] ← mem[ri] + 1, where ri is an operand in the range 0080-00FF. The value at the memory location can be seen as an index pointer, except it's implemented in RAM rather than inside the processor. No flags are modified by the increment itself.

Even though the complete instruction set has not yet been discussed, a short example can show how to set up a pointer to an array of words, and add the first eight of them using Autoindexing and an unrolled loop.

&2000:               ; Assemble code at address 8192 (decimal).
        LI &300      ; AC = 768 (decimal)
        STORE &80    ; Store 768 at mem[128].
        LOAD I &80   ; Iteration 0. AC = mem[mem[128]], i.e. AC = mem[768]
        ADD I &80    ; Iteration 1. AC = AC + mem[769]
        ADD I &80    ; Iteration 2. AC = AC + mem[770]
        ADD I &80    ; Iteration 3. AC = AC + mem[771]
        ADD I &80    ; Iteration 4. AC = AC + mem[772]
        ADD I &80    ; Iteration 5. AC = AC + mem[773]
        ADD I &80    ; Iteration 6. AC = AC + mem[774]
        ADD I &80    ; Iteration 7. AC = AC + mem[775]

In addition to the memory read needed to fetch the instruction, Autoindex mode requires one memory read to fetch the index pointer from memory, one or more memory or I/O cycles as dictated by the instruction, plus one memory write cycle to write back the incremented index pointer.
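The unrolled example above can be mirrored in Python to check what it computes. The sketch below models only the Autoindex semantics given earlier (read through the pointer, then write back the incremented pointer). It ignores flags and timing, and the sample data is hypothetical.

```python
# Sketch of Autoindex mode and a Python mirror of the unrolled CFT example above.
def autoindex_read(mem: list, ri: int) -> int:
    """Read mem[mem[ri]], then write back mem[ri] + 1 (flags are not modelled)."""
    if not 0x0080 <= ri <= 0x00FF:
        raise ValueError("autoindex registers live at 0080-00FF")
    value = mem[mem[ri]]               # read the pointer, then the data
    mem[ri] = (mem[ri] + 1) & 0xFFFF   # memory write: the incremented pointer
    return value


def sum_first_eight(mem: list) -> int:
    mem[0x80] = 0x300                            # LI &300 / STORE &80
    ac = autoindex_read(mem, 0x80)               # LOAD I &80
    for _ in range(7):                           # seven ADD I &80 instructions
        ac = (ac + autoindex_read(mem, 0x80)) & 0xFFFF
    return ac


if __name__ == "__main__":
    mem = [0] * 0x10000
    mem[0x300:0x308] = range(1, 9)               # sample data: 1, 2, ..., 8
    print(sum_first_eight(mem), f"{mem[0x80]:04X}")
```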

That Damned PDP-8 Again!

As if I don't mention this enough in this document, you have to remember the instruction set was heavily influenced by the PDP-8's. The PDP-8 used core memory, and the control unit and instruction set were built around that. Reading core memory destroys the stored value, so core memory controllers had to write the value back after every read: with core memory, reading is slower than writing! The PDP-8 avoided an expensive controller by building the instruction set around this. The control unit would write the pointer value back to Page Zero after using it anyway, so it only cost a little extra to increment it before doing so. Modern RAM isn't like this, and so the CFT's Autoindex Mode isn't quite as fast, but it's still faster than the alternative!

B1.10.6. Conditional Mode

This mode is not an actual addressing mode, in that it does not address memory or I/O space, and the operand isn't even interpreted as data. Only two instructions use it: OP1 and OP2. They treat the operand as a 10-bit bitfield. Each bit, or in some cases group of bits, may be set to conditionally perform a particular task, one task per processor cycle. Available tasks include clearing or complementing the AC, manipulating various flags, carrying out rolls and shifts, and conditional flow control.

If no bits are set, the instruction does nothing. If all bits are set, all of the instruction’s minor operations are performed, one at a time, in a predefined order.

The concept of these instructions comes straight from the PDP-8’s ‘microcoded’ instructions. The term doesn't denote microcode in the sense commonly understood these days, so to avoid confusion I've taken to calling them minor operations. This may be a misnomer as well, since they include conditional skips, which are essential for Turing completeness.

B1.10.7. Double Indirect Mode

This mode is only supported by the JMPII instruction, introduced in Microcode Version 6. JMPII always performs indirection, and double indirection at that. The semantics are PC ← mem[mem[mem[a]]], where a is a page-local operand.

Double Indirect Mode requires three memory reads: one to find the address of the pointer-to-the-pointer, one to find the address of the pointer, and one to load the final address.

B1.10.8. Register Double Indirect Mode

Again, this mode is only available with the JMPII instruction. It's the Register variation of double indirection, so its semantics are PC ← mem[mem[mem[r]]], where r is an operand in the range 0000-03FF.

Register Double Indirect Mode requires three memory reads (plus instruction fetch): one to find the address of the pointer-to-the-pointer, one to find the address of the pointer, and one to load the final address.

B1.10.9. Autoindex Double Indirect Mode

This is the expected Autoindex variant of Register Double Indirect Mode. When the operand ri is in the range 080-0FF, the JMPII semantics become PC ← mem[mem[mem[ri]]], mem[ri] ← mem[ri] + 1.

This mode is used in implementing the Forth virtual machine's program counter, since Forth programs are merely lists of addresses of machine code sub-programs to execute.

Autoindex Double Indirect Mode requires four memory accesses (plus instruction fetch), the most of any CFT addressing mode: one to find the address of the pointer-to-the-pointer, one to find the address of the pointer, one to load the final address, and one to write the incremented address back.
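To see how a single JMPII can act as a Forth inner interpreter, here is a minimal Python model of Autoindex Double Indirect Mode, following the PC ← mem[mem[mem[ri]]], mem[ri] ← mem[ri] + 1 semantics given above. The memory layout (a thread at &3000 whose entries point at cells holding machine-code addresses) and the function name are hypothetical, chosen only for illustration.

```python
MASK = 0xFFFF


def jmpii_autoindex(mem, ri):
    """Return the new PC for JMPII in Autoindex Double Indirect Mode.

    Three reads (the pointer, then two further dereferences) and one write-back.
    """
    pointer = mem[ri]               # read 1: the value held in the autoindex location
    target = mem[mem[pointer]]      # reads 2 and 3: dereference twice more
    mem[ri] = (pointer + 1) & MASK  # write: advance to the next thread entry
    return target


# Hypothetical layout: autoindex location &81 walks a two-entry thread at &3000.
mem = [0] * 0x10000
mem[0x81] = 0x3000
mem[0x3000:0x3002] = [0x4000, 0x4001]   # thread entries
mem[0x4000] = 0x5000                    # cells holding machine-code addresses
mem[0x4001] = 0x6000

print(hex(jmpii_autoindex(mem, 0x81)))  # 0x5000
print(hex(jmpii_autoindex(mem, 0x81)))  # 0x6000
```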

B1.10.10. Autoindex Decrement Mode

This mode is only available with the POP instruction. It decrements the value at the page-relative memory address specified in the instruction operand, stores the value back to memory, then uses the decremented value as the address from which to load the AC. The semantics for this are mem[i] ← mem[i] - 1, AC ← mem[mem[i]].

B1.10.11. Register Autoindex Decrement Mode

Again, this mode is only available with the POP instruction. It decrements the value at the Zero Page memory address specified in the instruction operand, stores the value back to memory, then uses the decremented value as the address from which to load the AC. The semantics for this are mem[r] ← mem[r] - 1, AC ← mem[mem[r]].

B1.11. Instruction Set Reference

The first version of the CFT microcode allowed for 15 different instructions, mostly orthogonal. Subsequent microcode updates added IOT, ISZ (which replaced an early instruction), POP and JMPII, which makes for 18 instructions in total:

TRAP: save the PC and jump to the specified location. Used for operating system services.
IOT: write the accumulator to a specified output device, then read a result back from it. Used to implement computer extensions via I/O-addressable extension units. As of version 6c of the Microcode, IOT skips the next instruction if the external device requests it. This allows processor extensions to be built incrementally.
LOAD: load accumulator from memory. Introduced in version 2 of the microcode. (No, I have no recollection of why version 1 didn't have a LOAD instruction!)
STORE: write value of AC to memory.
IN: read from an input device and write to AC.
OUT: write AC to an output device.
JMP: unconditional jump to the specified location.
JSR: save the PC and jump unconditionally to the specified location.
ADD: load from memory and add to the AC.
AND: load from memory and perform a bitwise AND with the value in the AC.
OR: load from memory and perform a bitwise OR with the value in the AC.
XOR: load from memory and perform a bitwise Exclusive OR with the value in the AC.
OP1: minor operations, group one. This includes unary bitwise operations and some conditionals.
OP2: minor operations, group two. This includes conditionals.
ISZ: new in version 4 of the microcode, replacing the previous INCM instruction. Loads a memory value into AC, increments it, and writes it back. The incremented value remains in AC. If AC is zero, the next instruction is skipped. This simplifies coding loops.
LIA: load accumulator with literal address or value.
JMPII: new in version 6 of the microcode: doubly indirect unconditional jump.
POP: new in version 6 of the microcode: decrement a pointer and load the AC from the decremented location.

Where's the rest?

Sure, this isn't a huge instruction set, but we do have POP and JSR. Where are their natural counterparts, PUSH and RET, or even Return from Interrupt (RTI)? It turns out these operations reduce to the instructions above, so the Assembler provides them and the microcode doesn't need to implement them directly. For instance, PUSH is a STORE using Autoindex Mode. These instructions are discussed under Standard Assembler Macros (B1.11.7).

B1.11.1. Some Brief Notes on Assembly Notation

To simplify understanding of the instruction set, opcode mnemonics are used rather than actual machine code. In many cases, the notation used is that of the Standard CFT Assembly language, which merits some description. This is not a full definition of CFT Assembly language, merely enough of it to facilitate discussing the instruction set in a more human-readable form.

CFT Assemblers parse space-separated lexical tokens denoting either symbols (such as instruction names) or numeric literals. Literals can be decimal (e.g. 15), hexadecimal (e.g. &e) or binary (e.g. #1110). For ease, and since the CFT uses a lot of bitfields, binary notation treats - as 0 and ignores ': so #--------'-11001-- is the same as #0000000001100100. Symbols are converted to numbers using symbol tables: the Assembler defines some, and the user can define others. The resulting numbers, which must fit in 16 bits, are ORred together to form an instruction.

This provides great simplicity and generality at the expense of making the code slightly less readable from a modern Assembly perspective (although PDP-8 Assembly programmers will feel right at home).

Anything after the first semicolon (;) or slash (/) on a line is considered a comment and ignored.

Like most modern Assemblers (and against PDP-8 assembly conventions), labels are denoted by a colon (:) suffix. Literals may be used as labels: they change the address of the next word to be assembled.

The Standard Assembler has a number of additional features, but we won't need those to discuss the instruction set.

Here's a brief example of CFT Assembly showing some basic syntax, and also how space-separated fields are ORred together to build 16-bit instructions. For example, the I mnemonic is simply defined as #0000100000000000 (&0800), which sets the I bit in an instruction. Combined with, e.g., the LOAD mnemonic, we can form LOAD I &342 or, if we're very perverse (we are), we could even write &342 I LOAD. Both assemble to the same single instruction, but the first form is a lot more conventional.

&fff0:                 ; Set the assembly address
        JMP I 1        ; Boot code: cross-page (long) jump
        .word start    ; The address of the 'start' label below

&1000:                 ; The rest of the program starts at address &1000
start:                 ; A label
        LOAD 0         ; Load direct (decimal operand)
        LOAD I 834     ; Load indirect (decimal operand)
        &342 I LOAD    ; Perversion! Valid, but against conventions.
        LOAD I R &007F ; Load Zero Page and indirect
        LOAD I R &0080 ; Load, autoindexing
        IN R PANEL 0   ; Read panel switches
        CLL RBL        ; Shift left one bit
        HALT           ; Halt the system (a macro)
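The token rules above are simple enough to sketch in a few lines of Python. This is not the Standard Assembler: it ignores labels, directives and macros, and handles a single instruction line. The symbol values are my own reading of the instruction formats and bitfield tables in this chapter (I as &0800, R as &0400, OP1 and OP2 as C000 and D000, and OP1 minor operations carrying the OP1 opcode, as CLA does), so treat them as assumptions.

```python
# Assumed symbol values, derived from the instruction formats in this chapter.
SYMBOLS = {
    "I": 0x0800, "R": 0x0400,
    "OP1": 0xC000, "OP2": 0xD000,
    "CLA": 0xC080, "NOT": 0xC020, "INC": 0xC010,
}


def parse_literal(token):
    """Parse &hex, #binary (where - means 0 and ' is ignored), or decimal."""
    if token.startswith("&"):
        return int(token[1:], 16)
    if token.startswith("#"):
        return int(token[1:].replace("'", "").replace("-", "0"), 2)
    return int(token, 10)


def assemble_line(line, symbols=SYMBOLS):
    """OR together the space-separated fields of a single instruction line."""
    code = line.split(";", 1)[0].split("/", 1)[0]   # drop ; or / comments
    word = 0
    for token in code.split():
        value = symbols[token] if token in symbols else parse_literal(token)
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{token!r} does not fit in 16 bits")
        word |= value
    return word


print(hex(assemble_line("OP1 NOT INC   ; two's complement")))  # 0xc030
print(hex(assemble_line("OP2 CLA")))                           # 0xd080
print(hex(parse_literal("#--------'-11001--")))                # 0x64
```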

B1.11.2. Memory

The following instructions operate on memory space. Most of them use Page Relative, Register, Indirect, Register Indirect, and Autoindex modes.

B1.11.2.1. LOAD — Load Accumulator

Load a word from memory into AC. Page Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        LOAD &21       ; Load Page-Relative
        LOAD I &21     ; Load Indirect
        LOAD R &21     ; Load Register
        LOAD I R &21   ; Load Register Indirect
        LOAD I R &80   ; Load Autoindex

B1.11.2.2. POP — Decrement and Load

Added as of Microcode version 6, this instruction is useful in implementing upward-growing stacks. The location specified is decremented, and its new value is used to address memory. The value at the addressed location is loaded into the Accumulator.

The canonical name of this instruction should have been DAL (Decrement And Load), but POP, originally an alias, made more sense and superseded it. The instruction uses the Autoindex Decrement modes: the page-relative form when R=0 and the Register form when R=1. The decrement happens regardless of the operand's value, which need not lie in the usual 080-0FF autoindex range. The I bit must always be 1 to use this instruction, but this is implicit in the machine code value of this instruction (D800).

The instruction is meant to be the opposite of STORE in Autoindex Mode. Where that instruction is often used to implement a single-instruction stack push, POP can be used to implement the corresponding pop.

This is the only instruction that supports the Autoindex Decrement and Register Autoindex Decrement (pre-decrement) addressing modes, and it supports no other addressing modes.

        POP &21        ; Pop (Autoindex Pre-Decrement)
        POP I &21      ; The same (POP implies I)
        POP R &21      ; Register Autoindex Pre-Decrement
        POP R &80      ; The same (POP always autoindexes)
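A minimal Python model shows how POP undoes a PUSH written as a STORE in Autoindex Mode. It assumes the stack pointer lives at the Zero Page location &90 and the stack starts at &7000; both addresses, and the function names, are hypothetical.

```python
MASK = 0xFFFF


def push(mem, sp, ac):
    """PUSH as STORE in Autoindex Mode: mem[mem[sp]] <- AC, then mem[sp] <- mem[sp] + 1."""
    mem[mem[sp]] = ac
    mem[sp] = (mem[sp] + 1) & MASK


def pop(mem, sp):
    """POP: mem[sp] <- mem[sp] - 1, then AC <- mem[mem[sp]]."""
    mem[sp] = (mem[sp] - 1) & MASK
    return mem[mem[sp]]


mem = [0] * 0x10000
mem[0x90] = 0x7000      # empty upward-growing stack
push(mem, 0x90, 0x1111)
push(mem, 0x90, 0x2222)
print(hex(pop(mem, 0x90)), hex(pop(mem, 0x90)))  # 0x2222 0x1111
print(hex(mem[0x90]))                            # 0x7000
```

With this pairing, the stack pointer always holds the address of the next free word: post-increment on the way in, pre-decrement on the way out.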

Pulverising User Expectations in Style

The microprogram for this instruction gleefully, willfully and otheradverbly ignores the address of the operand: any location in memory may be decremented. The instruction could be abused to perform other decrementation tasks.

B1.11.2.3. STORE — Store Accumulator

Write AC to the specified memory location. Page Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        STORE &21      ; Store Page-Relative
        STORE I &21    ; Store Indirect
        STORE R &21    ; Store Register
        STORE I R &21  ; Store Register Indirect
        STORE I R &80  ; Store Autoindex

B1.11.2.4. ISZ — Increment Memory and Skip if Zero

Load a word from memory into AC. Increment AC, and write its value back to the same memory location. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available. In the autoindex modes, autoincrement occurs only once — after the memory write. Don't worry if you can't see the point of the autoindexing modes in this instruction: neither can I.

After the memory write, if AC is zero, the next instruction is skipped. This allows loops to be coded very tightly, using two instructions to set up the loop, and two instructions to iterate:

start:  NEG          ; AC = -AC
        STORE R 10   ; Loop variable
loop:
        ...          ; Loop body
        ISZ R 10     ; ++loop
        JMP loop     ; Skipped if R 10 == 0.
        ...          ; Exit from loop
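The loop idiom above can be modelled in Python to check how many times the body runs. This sketch assumes the iteration count starts in AC (which NEG turns into its negative) and that Zero Page location 10 is otherwise unused; the helper names are hypothetical.

```python
MASK = 0xFFFF


def isz(mem, addr):
    """ISZ: increment mem[addr] through AC and report whether to skip."""
    ac = (mem[addr] + 1) & MASK
    mem[addr] = ac
    return ac, ac == 0


def body_runs(count):
    """Count loop-body executions for NEG / STORE R 10 / body / ISZ R 10 / JMP loop."""
    mem = [0] * 0x400               # Zero Page is enough for this model
    mem[10] = (-count) & MASK       # NEG, then STORE R 10
    runs = 0
    while True:
        runs += 1                   # loop body
        _, skip = isz(mem, 10)      # ISZ R 10
        if skip:                    # JMP loop skipped: fall out of the loop
            return runs


print(body_runs(5))   # 5
print(body_runs(0))   # 65536: a count of zero wraps all the way round
```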

B1.11.3. Device I/O

The following instructions operate on I/O space. Operands for these instructions address I/O space, not memory. Regardless, because of the orthogonality of the control unit, the standard addressing modes still apply. Page-Relative mode is nearly useless since it combines the PC (which points to memory) and the operand (which addresses I/O space). Register and Register Indirect modes are the most useful.

B1.11.3.1. IN — Read from I/O Device

Read a word from I/O space into AC. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available. Of these, Register mode provides the highest throughput and is the most useful. Page-Relative mode is the least useful but is provided as a side-effect of the orthogonality of the instruction set. IN instructions may have additional side-effects that depend on the peripheral being addressed.

The selected device may introduce wait states as required.

        IN I &21       ; Input Indirect
        IN R &21       ; Input Register
        IN I R &21     ; Input Register Indirect
        IN I R &80     ; Input Autoindex
        IN &21         ; Input Page-Relative (exotic)

B1.11.3.2. OUT — Write to I/O Device

Write the contents of AC to the specified I/O space address. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available. Of these, Register mode provides the highest throughput and is the most useful. Page-Relative mode is the least useful but is provided as a side-effect of the orthogonality of the instruction set. OUT instructions almost always have side-effects that depend on the peripheral being addressed.

The selected device may introduce wait states as required.

        OUT I &21      ; Output Indirect
        OUT R &21      ; Output Register
        OUT I R &21    ; Output Register Indirect
        OUT I R &80    ; Output Autoindex
        OUT &21        ; Output Page-Relative (exotic)

B1.11.3.3. IOT — I/O Transaction

Performs an I/O transaction or implements an instruction set extension. The IOT instruction writes AC to the specified address in I/O space, then reads AC back from the same address. The I/O-mapped processor extension can introduce wait states at the end of the I/O space read cycle, in order to get more time to perform its task. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available. Of these, Register mode provides the highest throughput and is the most useful. Page-Relative mode is the least useful but is provided as a side-effect of the orthogonality of the instruction set.

The selected device may assert the SKIPEXT signal during the execution of this instruction. If this happens, the instruction will also increment the PC, skipping over the next instruction. This allows external hardware to implement conditional skips.

        IOT I &21      ; Transact Indirect
        IOT R &21      ; Transact Register
        IOT I R &21    ; Transact Register Indirect
        IOT I R &80    ; Transact Autoindex
        IOT &21        ; Transact Page-Relative (exotic)

B1.11.4. Arithmetic and Logic

The following instructions perform common arithmetic and logic operations. Like the memory-oriented instructions, they support Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes.

B1.11.4.1. ADD — Add to Accumulator

Read a word from memory and add it to AC. If the addition results in a carry-out, L is toggled. In effect, this adds the operand to the 17-bit quantity (L,AC), so clearing L beforehand allows the carry-out to be detected. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        ADD &21        ; Add Page-Relative
        ADD R &21      ; Add Register
        ADD I &21      ; Add Indirect
        ADD I R &21    ; Add Register Indirect
        ADD I R &80    ; Add Autoindex

Wot, no SUB instruction?

Simplicity. With two's complement, the subtraction y ← a - b is equivalent to y ← a + (-b): negate b, then add a. This is how subtraction is performed on the CFT (and indeed the PDP-8). One way to do this is shown in the section on subtraction.
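Here's a minimal Python sketch of ADD's effect on L and of subtraction by negate-and-add. It assumes L is cleared before the subtraction, as in the hypothetical sequence LOAD b / OP1 CLL NOT INC / ADD a (operand addressing left out); the function names are my own.

```python
MASK = 0xFFFF


def add(ac, link, operand):
    """ADD: 16-bit sum into AC; a carry-out toggles L."""
    total = ac + operand
    if total > MASK:
        link ^= 1
    return total & MASK, link


def neg(ac, link):
    """OP1 NOT INC: invert AC, then increment (L,AC); L toggles only if AC wraps."""
    return add(ac ^ MASK, link, 1)


def subtract(a, b):
    """a - b computed as (-b) + a, starting with L clear. Returns (AC, L)."""
    ac, link = neg(b, 0)
    return add(ac, link, a)


print([hex(x) for x in subtract(5, 3)])  # ['0x2', '0x1']
print([hex(x) for x in subtract(3, 5)])  # ['0xfffe', '0x0']
```

In this model, L ends up set exactly when a ≥ b as unsigned numbers, so it doubles as a "no borrow" flag.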

B1.11.4.2. AND — Bitwise And with Accumulator

Read a word from memory and perform a bitwise AND operation with AC. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        AND &21        ; Bitwise AND Page-Relative
        AND I &21      ; Bitwise AND Indirect
        AND R &21      ; Bitwise AND Register
        AND I R &21    ; Bitwise AND Register Indirect
        AND I R &80    ; Bitwise AND Autoindex

B1.11.4.3. OR — Bitwise Or with Accumulator

Read a word from memory and perform a bitwise OR operation with AC. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        OR &21         ; Bitwise OR Page-Relative
        OR I &21       ; Bitwise OR Indirect
        OR R &21       ; Bitwise OR Register
        OR I R &21     ; Bitwise OR Register Indirect
        OR I R &80     ; Bitwise OR Autoindex

B1.11.4.4. XOR — Bitwise Exclusive Or with Accumulator

Read a word from memory and perform a bitwise exclusive OR operation with AC. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        XOR &21        ; Bitwise XOR Page-Relative
        XOR I &21      ; Bitwise XOR Indirect
        XOR R &21      ; Bitwise XOR Register
        XOR I R &21    ; Bitwise XOR Register Indirect
        XOR I R &80    ; Bitwise XOR Autoindex

B1.11.5. Flow Control

The following instructions implement unconditional flow control via modification of the PC register.

B1.11.5.1. TRAP — Jump to System Service

Writes the value of the PC to memory location 0001, then sets the PC to the address specified in the instruction. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        TRAP &21       ; TRAP Page-Relative
        TRAP I &21     ; TRAP Indirect
        TRAP R &21     ; TRAP Register
        TRAP I R &21   ; TRAP Register Indirect
        TRAP I R &80   ; TRAP Autoindex

B1.11.5.2. JMP — Jump to Address

Sets the PC to the address specified in the instruction's operand. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        JMP &21       ; Jump Page-Relative
        JMP I &21     ; Jump Indirect
        JMP R &21     ; Jump Register
        JMP I R &21   ; Jump Register Indirect
        JMP I R &80   ; Jump Autoindex

B1.11.5.3. JMPII — Jump to Address with Double Indirection

Sets the PC to the address specified at the location pointed to by the address specified in the operand. This instruction is useful in executing chains of subroutines in a table, such as compiled Forth words. A pointer to the next location to jump to is stored in an autoindex Zero Page location, and JMPII is used to dereference that address twice and jump to it.

Double Indirect, Register Double Indirect and Autoindex Double Indirect modes are available—and these addressing modes are exclusive to this instruction. In Autoindex Double Indirect mode, the pointer is incremented after setting the PC. This allows a Forth inner interpreter to be written in a single instruction and is similarly invaluable in executing other interpreted, tokenised languages such as BASIC.

Beware: this instruction is only available with I=1:

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0

1	1	1	1	1	R	Address

Some examples:

        JMPII &21       ; Jump Double Indirect
        JMPII I &21     ; same (I is implicit)
        JMPII R &21     ; Jump Register Double Indirect
        JMPII I R &21   ; same (I is still implicit)
        JMPII I R &80   ; Jump Autoindex Double Indirect

B1.11.5.4. JSR — Jump to Subroutine

Writes the value of the PC to memory location 0000, then sets the PC to the address specified in the instruction. Page-Relative, Register, Indirect, Register Indirect, and Autoindex modes are available.

        JSR &21       ; Jump to subroutine Page-Relative
        JSR I &21     ; Jump to subroutine Indirect
        JSR R &21     ; Jump to subroutine Register
        JSR I R &21   ; Jump to subroutine Register Indirect
        JSR I R &80   ; Jump to subroutine Autoindex

B1.11.6. Specials

These are operations that do not fit in the categories above.

B1.11.6.1. OP1 — Operations 1

The OP1 instruction provides a number of minor operations, any combination of which (including none) may be performed in a preset order. This is very similar to the PDP-8 ‘microcoded’ instructions. This instruction uses the Conditional ‘addressing’ mode.

The operand of this instruction is seen as a bitfield, where set bits trigger a particular minor operation in a specific order. The full instruction format for OP1 is as follows:

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0

1	1	0	0	0	0	IFL	IFV	CLA	CLL	NOT	INC	CPL	Rolls

The operations are performed in order, from the most significant bit to the least significant. In the table, - indicates a ‘don't-care’ bit.

Bitfield	OP1 sub-instruction	What it does
#1---------	IFL	Execute the rest only if L set
#-1--------	IFV	Execute the rest only if V set
#--1-------	CLA	Clear AC
#---1------	CLL	Clear L
#----1-----	NOT	Complement AC
#-----1----	INC	Increment (L,AC) by one
#------1---	CPL	Complement L
#-------010	RBL	Roll Bit Left (L,AC)
#-------001	RBR	Roll Bit Right (L,AC)
#-------110	RNL	Roll Nybble Left (L,AC)
#-------101	RNR	Roll Nybble Right (L,AC)

These operations are performed in the order in which they appear above, from top to bottom. If no operations are specified, i.e. the instruction C000, the instruction is effectively an eleven-cycle (two to fetch, nine to execute) no-operation instruction called NOP11. The four roll instructions are mutually exclusive. The notation (L,AC) indicates that L and AC are used together as a 17-bit quantity, with L as the most significant bit.

The first two minor operations (IFL and IFV) end execution of the rest of the instruction unless the L or Overflow (V) flag, respectively, is set. These operations simplify the rippling of carry, borrow or overflow effects.

To clear both AC and L the instruction would be OP1 CLA CLL.

If the INC instruction is issued when AC is FFFF, L is toggled and (as expected) AC wraps around to 0000.

To calculate the two’s complement of the AC, OP1 NOT INC (machine code C030) first inverts AC, then increments it by one. Note that OP1 INC NOT has exactly the same result, as the order in which minor operations are performed is fixed. The convenient macro NEG performs this commonly used operation more briefly.

A binary left shift by one bit can be performed using OP1 CLL RBL (C042): clearing L, then rolling left.

The standard Assembler defines many of these combinations of instructions as convenient macros. When using these macros, specifying OP1 is unnecessary and is conventionally left out.

The exact operation of the four roll instructions is illustrated below in a way that will both explain how they work and confuse you more.

Here are some examples of how versatile this instruction is:

        CLL           ; Clear L.
        IFL CLL       ; Clear L if necessary. (7 cycles faster if L is already clear!)
        CLL CLA       ; Clear L and A.
        CLA CLL       ; The same. (the order is always the same)
        CLA INC       ; Slow, PDP-8 way to set AC=1.
        NOT INC       ; Two's complement negation.
        NEG           ; The same. (assembler macro)
        IFV CLA       ; AC = 0 on overflow.
        RBL           ; Roll one bit left.
        CLL RBL       ; Bitwise shift one bit left.

The microprogram for OP1 is one of the most complex in the CFT, with numerous decisions and branches. The weirdness came about because of microcode limitations when adding the IFL and IFV minor operations.

1. Fetch, cycle one.
2. Fetch, cycle two.
3. Is IFL bit set?
4. If L=0, end here. Else, is IFV bit set?
5. If V=0, end here. Else, is CLA bit set?
6. If so, clear AC. Is CLL bit set?
7. If so, clear L. Is NOT bit set?
8. If so, invert AC. Is INC bit set?
9. If so, increment AC. Toggle L if it wraps around. Is CPL bit set?
10. If so, toggle L. Are any of the roll bits set?
11. If not, end here. Else, perform a roll.
12. End here.

Fix me, maybe?

Review the microcode: the 12th cycle, which merely ends the microprogram, may not actually be necessary. The microprogram could probably end on cycle 11.

Based on this, here are the execution times of OP1 in processor cycles:

Instruction combination	Cycles
IFL	4 if L clear, otherwise 11.
IFL IFV	4 if L clear, 5 if V clear, otherwise 11.
IFV	5 if V clear, otherwise 11.
Any other excluding rolls	11.
Any other including rolls	12.

So, any combination including IFL completes on the fourth cycle if L=0. Any combination including IFV completes on the fifth cycle if V=0. Any other case will complete in 11 cycles if rolls aren't required, or 12 if they are. Yes, that's a bit excessive, but it couldn't be helped.
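The minor operations, their fixed order and the cycle counts above fit in a small Python model. The bit values come from the OP1 bitfield table. One assumption deserves a loud flag: I've modelled the nybble rolls as rotating the whole 17-bit (L,AC) quantity by four places, by analogy with the single-bit rolls, so check this against the roll illustration before relying on it. All names are my own.

```python
MASK16 = 0xFFFF
MASK17 = 0x1FFFF

# OP1 operand bits (bits 9..0), from the bitfield table.
IFL, IFV, CLA, CLL, NOT, INC, CPL = 0x200, 0x100, 0x080, 0x040, 0x020, 0x010, 0x008
RBR, RBL, RNR, RNL = 0x001, 0x002, 0x005, 0x006
ROLL_PLACES = {RBL: 1, RBR: -1, RNL: 4, RNR: -4}   # nybble rolls: an assumption


def rotate17(value, places):
    """Rotate a 17-bit value left by `places` (negative values rotate right)."""
    places %= 17
    return ((value << places) | (value >> (17 - places))) & MASK17


def op1(operand, ac, link, v=0):
    """Apply OP1 minor operations in their fixed order. Returns (AC, L, cycles)."""
    if operand & IFL and not link:
        return ac, link, 4
    if operand & IFV and not v:
        return ac, link, 5
    if operand & CLA:
        ac = 0
    if operand & CLL:
        link = 0
    if operand & NOT:
        ac ^= MASK16
    if operand & INC:
        both = ((link << 16) | ac) + 1          # increment (L,AC) as one quantity
        link, ac = (both >> 16) & 1, both & MASK16
    if operand & CPL:
        link ^= 1
    roll = operand & 0x007
    if roll == 0:
        return ac, link, 11
    if roll not in ROLL_PLACES:
        raise ValueError("undefined roll field")
    both = rotate17((link << 16) | ac, ROLL_PLACES[roll])
    return both & MASK16, both >> 16, 12


print(op1(NOT | INC, 5, 0))       # (65531, 0, 11): NEG of 5
print(op1(CLL | RBL, 0x8001, 1))  # (2, 1, 12): the old MSB lands in L
print(op1(IFL | CLL, 0x1234, 0))  # (4660, 0, 4): L already clear, ends early
```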

B1.11.6.2. OP2 — Operations 2

This instruction is very similar to the OP1 instruction. The main feature of the OP2 instruction is conditional flow control: skipping over the following instruction if the corresponding condition is true. The following table lists the OP2 bitfield values that may be ORred together. Like the OP1 instruction, this instruction operates on the Conditional ‘addressing’ mode with the following instruction format:

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0

1	1	0	1	0	0	0	0	CLA	CLI	STI	Skips

Bitfield	OP2 Instruction
#-----01---	SNA — Skip if AC negative (G1)
#-----0-1--	SZA — Skip if AC zero (G1)
#-----0--1-	SSL — Skip if L set (G1)
#-----0---1	SSV — Skip if V set (G1)
#-----10000	SKIP — Always skip (G2)
#-----11---	SNN — Skip if AC non-negative (G2)
#-----1-1--	SNZ — Skip if AC non-zero (G2)
#-----1--1-	SCL — Skip if L clear (G2)
#-----1---1	SCV — Skip if V clear (G2)
#--1-------	CLA — Clear AC
#---1------	CLI — Clear I flag
#----1-----	STI — Set I flag

If none of the bits are set, this instruction becomes the CFT's NOP instruction, pausing execution for seven clock cycles.

There are two groups of branching instructions: G1 (bit 4 of the instruction operand is 0) and G2 (bit 4 of the instruction operand is 1). When G1 instructions SNA, SZA, SSL and SSV are specified together, the skip is performed when any of the specified conditions hold (logical disjunction, OR). For example, the instruction SZA SNA is ‘skip if AC is zero or negative’ (which is the standard CFT Assembly macro SNP).

When G2 instructions are specified together, the skip is performed when all of the specified conditions hold (logical conjunction, AND). For example, SNN SNZ is ‘skip if AC is non-zero and non-negative’, or ‘skip if AC is positive’ (the standard Assembler macro for this is SPA). The full set of combinations to check the value of AC is as follows:

Instruction	Macro	Alternate Name	Action
SNA		IFNN	Skip if A < 0
SNA SZA	SNP	IFPA	Skip if A ≤ 0
SZA		IFNZ	Skip if A = 0
SNZ		IFZA	Skip if A ≠ 0
SNN		IFNA	Skip if A ≥ 0
SNN SNZ	SPA	IFNP	Skip if A > 0
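The G1 (any condition) and G2 (all inverted conditions) rules can be captured in a short Python function. The bit values come from the OP2 bitfield table; the function name and the treatment of L and V as plain 0/1 arguments are my own choices.

```python
# OP2 skip field (bits 4..0), from the bitfield table.
G2 = 0x10
SNA, SZA, SSL, SSV = 0x08, 0x04, 0x02, 0x01      # G1 meanings when bit 4 is clear
SKIP = G2
SNN, SNZ, SCL, SCV = G2 | SNA, G2 | SZA, G2 | SSL, G2 | SSV
SPA, SNP = SNN | SNZ, SNA | SZA                  # standard macros


def op2_skips(operand, ac, link=0, v=0):
    """Return True if OP2 with this operand skips the next instruction."""
    tests = {SNA: (ac & 0x8000) != 0, SZA: ac == 0, SSL: link == 1, SSV: v == 1}
    chosen = [result for bit, result in tests.items() if operand & bit]
    if operand & G2:
        return all(not result for result in chosen)   # every inverted test must hold
    return any(chosen)                                 # any single test suffices


print(op2_skips(SPA, 5), op2_skips(SPA, 0), op2_skips(SPA, 0xFFFF))   # True False False
print(op2_skips(SNP, 0), op2_skips(SNP, 0xFFFF), op2_skips(SNP, 5))   # True True False
```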

Why Two CLA operations?

Superficially, having two CLA operations is unnecessary, and wastes space that could be used by another minor operation. However, clearing the AC is very useful and common, and having it in both OP1 and OP2 makes it available to more use cases.

Both CLA operations have the same bitfield value, 080. The OP1 CLA operation is C080, and the OP2 CLA operation is D080. To resolve the obvious ambiguity in the naming, standard CFT Assembly always defines CLA as C080. This works in all three combinations of CLA appearances in instructions:

CLA: the symbol table resolves this to C080, which is OP1 CLA.
OP1 CLA: since Assembly ORs all fields together, this results in C000 OR C080, which is again C080, or OP1 CLA.
OP2 CLA: again, ORring fields together yields D000 OR C080 which is D080, which is OP2 CLA.

The only minor drawback of this arrangement is that, should the programmer somehow require the CLA operation to be executed by the OP2 instruction explicitly, OP2 CLA must be specified in full, but this is necessary only if no other minor OP2 operations are used.

…And Why The Alternate Names?

Sometimes I get confused with the semantics of skip instructions. I guess the concept of an ‘if’ statement is too ingrained in me, and ‘skip if condition’ is the same as ‘execute if not condition’. So I keep those handy IF macros around for good measure.

The instruction always executes in exactly seven processor cycles, regardless of the combination of minor operations or whether or not the skip was taken.

Another Microcode Improvement

There are no documented cases in the ROM of combining skips with the CLA, CLI or STI minor operations. The microprogram could therefore terminate execution on cycle 4 if any skip operation was requested, which would save three cycles in those cases.

B1.11.6.3. LIA — Load Immediate Address

Loads the AC with the literal value specified in the instruction. The Page-Relative form of this instruction is used to load the AC with a page-relative address. LIA works in either Page-Relative or Register mode. Beware: there are no indirect modes available, and the I bit must always be clear (with I set, the same opcode is JMPII).

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0

1	1	1	1	0	R	Address (LIA)

When R=1, LIA sets the AC to a Page Zero address, but it may also be seen as a Load Immediate (LI) instruction, which can load a literal value in the range 000-3FF.

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0

1	1	1	1	0	1	Register (LIA) or Literal (LI)

Some examples:

&1000:
        LIA &21         ; Set Page-Relative Address (sets AC to &1021)
        LIA R &21       ; Set Register Address (sets AC to &21)
        LI &21          ; Same as above (note that R is implied in LI)
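The value LIA and LI leave in the AC can be sketched in Python as below. The sketch assumes pages are 1,024 words (matching the 10-bit operand) and that the page is taken from the instruction's own address; the function name is hypothetical.

```python
def lia(instruction_address, operand, r=False):
    """AC value after LIA (R=0), or LIA R / LI (R=1)."""
    if not 0 <= operand <= 0x3FF:
        raise ValueError("operand must fit in 10 bits")
    if r:
        return operand                                # top six bits are zero
    return (instruction_address & 0xFC00) | operand   # same 1,024-word page


print(hex(lia(0x1000, 0x21)))          # 0x1021
print(hex(lia(0x1000, 0x21, r=True)))  # 0x21
```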

B1.11.7. Standard Assembler Macros

Commonly used instructions benefit from convenient aliases. The single-instruction macros discussed here round off the instruction set and endow it with instructions the CFT otherwise (and counter-intuitively) lacks — for example, a Return From Subroutine instruction.

B1.11.7.1. RET — Return from Subroutine

This is defined as JMP I R 0. It jumps to the return address saved by the JSR instruction, which stores it at memory address 0000 (also known as RETV).
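Because JSR always uses the same return vector, a nested call overwrites it. This minimal Python sketch shows the effect; it assumes the saved PC already points to the instruction after the JSR, and the function names and addresses are hypothetical.

```python
RETV = 0x0000   # return vector written by JSR and read by RET (JMP I R 0)


def jsr(mem, return_address, target):
    """JSR: store the return address in RETV and return the new PC."""
    mem[RETV] = return_address
    return target


def ret(mem):
    """RET: jump to the address held in RETV."""
    return mem[RETV]


mem = [0] * 0x10000
jsr(mem, 0x1001, 0x2000)   # outer call from &1000
jsr(mem, 0x2006, 0x3000)   # nested call from &2005 overwrites RETV
print(hex(ret(mem)))       # 0x2006: the outer return address is gone
```

A subroutine that itself calls JSR therefore has to save RETV first (for instance, by pushing it onto a stack) and restore it before its own RET.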

B1.11.7.2. RTT — Return from Trap

This is defined as JMP I R 1. It jumps to the trap return address saved by the TRAP instruction, which stores it at memory address 0001 (also known as RTTV).

B1.11.7.3. RTI — Return from Interrupt

This is defined as JMP I R 2. It jumps to the interrupt return address saved by the interrupt microprogram before the ISR is called. The interrupt return vector (0002) is also known as RTIV.

B1.11.7.4. NEG — Negate Accumulator

Obtains the two’s complement of the AC by performing OP1 NOT INC.

B1.11.7.5. ING — Increment and Negate Accumulator

Increments the AC and then obtains its two’s complement, thereby calculating AC ← −(AC + 1). Because of the way two’s complement negation works, this is equivalent to a simple OP1 NOT. The ING instruction is useful in some types of loops in conjunction with ISZ, and is also the one's complement operation (though that is clearer written simply as NOT).
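The identity behind ING is easy to check exhaustively over all 16-bit values. This Python sketch treats ING simply as OP1 NOT, as described above; the function name is my own.

```python
MASK = 0xFFFF


def ing(ac):
    """ING: equivalent to OP1 NOT (one's complement of AC)."""
    return ac ^ MASK


# NOT x == -(x + 1) in 16-bit two's complement, for every possible AC value.
print(all(ing(x) == (-(x + 1)) & MASK for x in range(0x10000)))   # True
print(hex(ing(5)))   # 0xfffa, i.e. -6
```

Fed into the NEG / ISZ loop idiom in place of NEG, an initial count of n gives a counter of -(n + 1), so the loop body runs n + 1 times.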

B1.11.7.6. SEL — Set L

This macro sets L by executing CLL CPL: clearing, then complementing L.

B1.11.7.7. LI — Load Immediate

This is simply equivalent to LIA R. It loads AC with the 10-bit literal value specified as instruction operand. The most significant six bits will be zero. It may be used to cheaply set small literal constants.

B1.11.7.8. SPA — Skip if Positive

This is equivalent to OP2 SNN SNZ. It skips the next instruction if the AC is neither negative nor zero (thus positive).

B1.11.7.9. SNP — Skip if Non-Positive

This is equivalent to OP2 SNA SZA. It skips the next instruction if AC is less than or equal to zero (non-positive).

B1.11.7.10. IFNN — If Non-Negative

Defined as OP2 SNA and reversing the semantics: rather than ‘skip if negative’, it's ‘[don't skip] if non-negative’. This simplifies some code.

B1.11.7.11. IFPA — If Positive

Defined as OP2 SNA SZA and reversing the semantics: rather than ‘skip if non-positive’, it's ‘[don't skip] if positive’.

B1.11.7.12. IFNZ — If Non-Zero

Defined as OP2 SZA and reversing the semantics: rather than ‘skip if zero’, it's ‘[don't skip] if non-zero’.

B1.11.7.13. IFZA — If Zero

Defined as OP2 SNZ and reversing the semantics: rather than ‘skip if non-zero’, it's ‘[don't skip] if zero’.

B1.11.7.14. IFNA — If Negative

Defined as OP2 SNN and reversing the semantics: rather than ‘skip if non-negative’, it's ‘[don't skip] if negative’.

B1.11.7.15. IFNP — If Non-Positive

Defined as OP2 SNN SNZ and reversing the semantics: rather than ‘skip if positive’, it's ‘[don't skip] if non-positive’.
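Each IF macro is a skip with its sense turned around: the next instruction runs exactly when the skip does not happen. The Python sketch below models 16-bit words as integers. It checks every macro against the condition its name promises, for all 65,536 AC values. The two tables are illustrative and are built from the skip conditions quoted in the descriptions above.

```python
MASK = 0xFFFF

def negative(ac):
    """True if the sign bit (bit 15) is set."""
    return bool(ac & 0x8000)

def zero(ac):
    return (ac & MASK) == 0

# The skip each IF macro reuses, as quoted in its description.
SKIP_IF = {
    "IFNN": lambda ac: negative(ac),                       # skip if negative
    "IFPA": lambda ac: negative(ac) or zero(ac),           # skip if non-positive
    "IFNZ": lambda ac: zero(ac),                           # skip if zero
    "IFZA": lambda ac: not zero(ac),                       # skip if non-zero
    "IFNA": lambda ac: not negative(ac),                   # skip if non-negative
    "IFNP": lambda ac: not negative(ac) and not zero(ac),  # skip if positive
}

# The condition each macro's name promises for running the next instruction.
RUNS_IF = {
    "IFNN": lambda v: v >= 0,
    "IFPA": lambda v: v > 0,
    "IFNZ": lambda v: v != 0,
    "IFZA": lambda v: v == 0,
    "IFNA": lambda v: v < 0,
    "IFNP": lambda v: v <= 0,
}

def signed(ac):
    """Read a 16-bit word as a two's complement integer."""
    return ac - 0x10000 if ac & 0x8000 else ac

def next_runs(macro, ac):
    """True if the instruction following `macro` executes (no skip)."""
    return not SKIP_IF[macro](ac)

for x in range(1 << 16):
    for m in SKIP_IF:
        assert next_runs(m, x) == RUNS_IF[m](signed(x))
```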

B1.11.7.16. SBL — Shift One Bit Left

Implements a simple bitwise shift one bit to the left by executing OP1 CLL RBL. At the end of execution, L will contain the AC's most significant bit before the shift. The AC's least significant bit will be clear.

This is the only single-instruction way to produce a left shift on the CFT. There is no way to perform a nybble shift or a shift by an arbitrary number of bits, and there are no arithmetic (sign-preserving) shifts.
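A minimal Python sketch of SBL follows. It assumes, as the description above implies, that RBL rolls the 17-bit quantity formed by L and the AC. Under that assumption, clearing L first turns the roll into a logical shift. The function names are illustrative.

```python
MASK = 0xFFFF

def rbl(ac, link):
    """Roll (L, AC) one bit left: AC bit 15 goes to L, old L goes to AC bit 0."""
    new_link = (ac >> 15) & 1
    new_ac = ((ac << 1) | link) & MASK
    return new_ac, new_link

def sbl(ac):
    """SBL = OP1 CLL RBL: clear L, then roll, giving a one-bit logical shift."""
    return rbl(ac, 0)

print(sbl(0x8001))  # (2, 1): bit 15 ended up in L, bit 0 is now clear
```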

Why aren't there more complex shifts?

Two reasons: a barrel shift unit is complex and expensive, and its manual point-to-point wiring would have been too complex and bug-prone. A sign-extending (arithmetic) shift unit would be even more complex.

Even after the Arithmetic and Logic Unit (ALU) was designed using ROM tables (and thus wiring was a non-issue), rolls of 1 and 4 bits were judged more generally useful than shifts by arbitrary amounts. The PDP-8 is unusual in that it supports single- and two-bit rolls; the CFT implements single- and four-bit rolls. For comparison, both the MOS 6502 and Zilog Z-80 could perform single-bit rolls and shifts (both bitwise and arithmetic). The barrel shift unit on the ARM v1 processor was bigger than the rest of the processor's ALU!

Given the extensible nature of the CFT, it would be relatively easy to implement a table-based shift unit as an add-on card, but this is far from crucial.

B1.11.7.17. SBR — Shift One Bit Right

Implements a simple bitwise shift one bit to the right by executing OP1 CLL RBR. At the end of execution, L will contain the AC's least significant bit before the shift. The AC's most significant bit will be clear.

This is the only single-instruction way to produce a right shift on the CFT. There is no way to perform a nybble shift or a shift by an arbitrary number of bits, and there are no arithmetic (sign-preserving) shifts.
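Without a multi-bit shifter, a shift by n bits has to repeat SBR n times. The Python sketch below models that. It assumes, as for SBL, that RBR rolls the 17-bit quantity (L, AC). `shift_right` is a hypothetical helper standing in for a CFT loop.

```python
MASK = 0xFFFF

def rbr(ac, link):
    """Roll (L, AC) one bit right: AC bit 0 goes to L, old L goes to AC bit 15."""
    new_link = ac & 1
    new_ac = (ac >> 1) | (link << 15)
    return new_ac, new_link

def sbr(ac):
    """SBR = OP1 CLL RBR: clear L, then roll right (a logical shift)."""
    return rbr(ac, 0)

def shift_right(ac, n):
    """Logical right shift by n bits, built from n single-bit SBRs."""
    for _ in range(n):
        ac, _link = sbr(ac)
    return ac

print(hex(shift_right(0xF0F0, 4)))  # 0xf0f
```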

B1.11.7.18. PUSH — Push a value onto an upwards-growing stack.

Defined as simply STORE I, this must be used in Autoindexing mode. It treats the operand as a stack pointer: it stores the AC at the address held in that pointer, then increments the pointer. Since the CFT has no hardware stacks, nothing guards against a stack overflowing, and you're on your own.

Note that PUSH's counterpart, POP, is actually implemented in microcode and isn't a macro.
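The Python sketch below shows what PUSH does to memory. It assumes post-increment autoindexing, as described above. The autoindex location &80 and the stack base 4000 are hypothetical choices for illustration, and memory is modelled as a flat list covering a 16-bit address space.

```python
MASK = 0xFFFF

def push(mem, sp_loc, ac):
    """PUSH (STORE I through an autoindex location at sp_loc).

    Stores AC at the address held in mem[sp_loc], then increments that
    pointer. Like the real macro, it never checks for a full stack.
    """
    ptr = mem[sp_loc]
    mem[ptr] = ac & MASK
    mem[sp_loc] = (ptr + 1) & MASK

mem = [0] * 0x10000        # flat model of a 16-bit address space
mem[0x80] = 0x4000         # hypothetical: stack pointer at &80, stack at 4000
push(mem, 0x80, 0x1234)
push(mem, 0x80, 0xBEEF)
print(hex(mem[0x80]))      # 0x4002: the pointer now addresses the next free word
```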

B1.12. Some Common Examples

This section shows how some common, simple Assembly language tasks can be performed using the CFT instruction set.

B1.12.1. Addition With Carry

The L flag is used as carry in this short program:

adc:   IFL INC      ; Increment only if L=1 (carry in)
       ADD addr

The program increments the AC by one if L is set (which is used as carry in/out here), then performs addition as normal.
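When testing an implementation, it helps to have a reference for what adc is meant to compute. This Python sketch models only the intended arithmetic. It makes no claims about how INC and ADD update L on the hardware. That behaviour is exactly what decides whether the corner case AC = FFFF, L = 1 carries correctly, so that case is worth checking.

```python
MASK = 0xFFFF

def adc(ac, link, operand):
    """Intended result of adc: (AC + L + operand) mod 2**16 and the carry out."""
    total = (ac & MASK) + (link & 1) + (operand & MASK)
    return total & MASK, total >> 16

print(adc(0xFFFF, 1, 0x0000))  # (0, 1): the carry in alone overflows the AC
```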

B1.12.2. Subtraction

There is no explicit subtraction instruction, but the benefit of two’s complement is that one is unnecessary. Subtraction can be reduced to addition as follows:

sub:    NEG         ; OP1 NOT INC
        ADD addr

The program takes the one’s complement (bitwise inversion) of the accumulator, then increments it by one. Together these steps form the two’s complement (arithmetic negation), which is exactly what the NEG macro does. Adding the word at addr to the negated AC then yields that word minus the original AC.
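Modelling the fragment is a quick way to confirm the operand order. This Python sketch treats 16-bit words as Python integers, and the helper names are illustrative. It shows that NEG followed by ADD addr leaves (addr) − AC, not AC − (addr).

```python
MASK = 0xFFFF

def neg(ac):
    """NEG = OP1 NOT INC."""
    return (~ac + 1) & MASK

def sub_fragment(ac, word):
    """NEG, then ADD word: returns word - AC modulo 2**16."""
    return (neg(ac) + word) & MASK

def signed(w):
    """Read a 16-bit word as a two's complement integer."""
    return w - 0x10000 if w & 0x8000 else w

print(sub_fragment(3, 10), signed(sub_fragment(10, 3)))  # 7 -7
```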

B1.12.3. Negation of a 32-bit Quantity

The following code negates the two’s complement representation of a 32-bit signed integer stored in locations xh (high word) and xl (low word):

neg32:  SEL         ; Set carry
        LOAD xl     ; Negate low word
        NEG         ; Two's complement
        STORE xl

        LOAD xh     ; Negate high word
        IFL INC     ; Propagate carry
        NEG         ; Two's complement
        STORE xh
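The Python sketch below is a reference model for checking neg32. Negating a two-word value means inverting both words and adding one. That +1 only carries into the high word when the low word is zero, because NOT 0 + 1 overflows. The sketch computes this directly rather than modelling the L flag.

```python
MASK16 = 0xFFFF

def neg32_words(xh, xl):
    """Negate the 32-bit value (xh:xl), returning the new (xh, xl)."""
    lo = (~xl + 1) & MASK16
    carry = 1 if xl == 0 else 0   # NOT 0 + 1 overflows into the high word
    hi = (~xh + carry) & MASK16
    return hi, lo

# Compare against ordinary 32-bit negation for a few sample values.
for x in (0, 1, 0xFFFF, 0x10000, 0x12345678, 0x80000000, 0xFFFFFFFF):
    hi, lo = neg32_words(x >> 16, x & MASK16)
    assert ((hi << 16) | lo) == (-x) & 0xFFFFFFFF
```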

B1.12.4. Addition of two 32-bit Values

To add two 32-bit values stored in locations al, ah (A low and high words respectively) and bl, bh (B low and high words respectively), leaving the result in xl, xh, it is necessary to propagate the carry bit stored in L:

add32:  CLL         ; Clear carry
        LOAD al     ; Load low A
        ADD bl      ; Add to low B
        STORE xl    ; Store it

        LOAD ah     ; Load high A
        IFL INC     ; Propagate carry
        ADD bh      ; Add to high B
        STORE xh
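The Python sketch below is a reference model of add32, useful for generating expected results when testing the routine. The carry out of the low word is the value the L flag has to pass into the high-word addition.

```python
MASK16 = 0xFFFF

def add32_words(ah, al, bh, bl):
    """Add (ah:al) + (bh:bl) modulo 2**32, returning the result as (xh, xl)."""
    low = al + bl
    xl = low & MASK16
    carry = low >> 16            # what the L flag must carry into the high word
    xh = (ah + bh + carry) & MASK16
    return xh, xl

for a, b in ((0xFFFF, 1), (0x1234FFFF, 0x00010001), (0xFFFFFFFF, 1)):
    xh, xl = add32_words(a >> 16, a & MASK16, b >> 16, b & MASK16)
    assert ((xh << 16) | xl) == (a + b) & 0xFFFFFFFF
```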

B1.12.5. Subtraction of two 32-bit Values

This program subtracts two signed, two’s complement 32-bit values stored in locations al, ah (A low and high words respectively) and bl, bh (B low and high words respectively), computing A − B. It does so by negating B and storing it to the result location X (xl, xh), then adding A to that location while propagating carry.

sub32:  SEL         ; Set carry
        LOAD bl     ; Negate low word
        NEG         ; Two's complement
        STORE xl

        LOAD bh     ; Negate high word
        IFL INC     ; Propagate carry
        NEG         ; Two's complement
        STORE xh

        CLL         ; Clear carry
        LOAD al     ; Load low A
        ADD xl      ; Low word: A + (-B)
        STORE xl    ; Store it

        LOAD ah     ; Load high A
        IFL INC     ; Propagate carry
        ADD xh      ; Add high word of -B
        STORE xh    ; Store it.
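The Python sketch below follows sub32's strategy (negate B, then add A) as a reference model. It checks the result against ordinary signed subtraction. The helpers are illustrative and do not model the L flag.

```python
MASK16 = 0xFFFF

def neg32_words(h, l):
    """Two's complement negation of (h:l); carries into h only when l == 0."""
    return (~h + (l == 0)) & MASK16, (~l + 1) & MASK16

def add32_words(ah, al, bh, bl):
    """(ah:al) + (bh:bl) modulo 2**32, with the low-word carry propagated."""
    low = al + bl
    return (ah + bh + (low >> 16)) & MASK16, low & MASK16

def sub32_words(ah, al, bh, bl):
    """A - B computed as sub32 does: negate B, then add A."""
    nh, nl = neg32_words(bh, bl)
    return add32_words(ah, al, nh, nl)

def to_signed32(h, l):
    """Read (h:l) as a signed 32-bit integer."""
    v = (h << 16) | l
    return v - (1 << 32) if v & 0x80000000 else v

print(to_signed32(*sub32_words(0, 5, 0, 7)))  # -2
```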

B1.12.6. Arithmetic Shifts

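A sign-extending (arithmetic) right shift is only slightly more involved than a simple one. L is first loaded with a copy of the sign bit, so that rolling right feeds the sign back into the most significant bit.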

asr:    CLL       ; Clear L (L=0)
        SNN       ; Skip if AC >= 0
        CPL       ;   AC < 0: toggle L (L=1)
        RBR       ;   Roll 1 bit right.

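At the end of this short program, the most significant (i.e. sign) bit of AC will be the same as before, and L will hold the bit shifted out of the least significant position. Due to the use of two's complement, this is still a division by 2 specialised to signed numbers. However, it rounds towards negative infinity rather than towards zero: −1 (FFFF) shifts to −1, not 0, and −3 shifts to −2. An arithmetic left shift needs no special code, since it is identical to SBL.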
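The rounding behaviour is easy to check exhaustively. This Python sketch models asr step by step, assuming (as for SBR) that RBR rolls the 17-bit quantity (L, AC). It compares the result with Python's own `>>` on signed integers, which also rounds towards negative infinity.

```python
MASK = 0xFFFF

def signed(w):
    """Read a 16-bit word as a two's complement integer."""
    return w - 0x10000 if w & 0x8000 else w

def asr(ac):
    """CLL; SNN; CPL; RBR: returns (new AC, new L)."""
    link = 1 if ac & 0x8000 else 0       # L ends up as a copy of the sign bit
    new_link = ac & 1                    # RBR: bit 0 rolls out into L
    new_ac = (ac >> 1) | (link << 15)    # RBR: L rolls into bit 15
    return new_ac, new_link

for x in range(1 << 16):
    assert signed(asr(x)[0]) == signed(x) >> 1   # floor division by 2

print(signed(asr(0xFFFF)[0]), signed(asr(0xFFFD)[0]))  # -1 -2
```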

B1.12.7. Bitwise Or of a Small Array

The autoindex registers can be used to simplify short loops, or, as here, to unroll them entirely. On entry, the AC holds the address of the first word of the array. The first 5 elements are ORed together, and the result is left in AC.

or5:    STORE R &80  ; Autoindex register
        LOAD I R &80 ; Load 1st value
        OR I R &80   ; OR with 2nd value
        OR I R &80   ; OR with 3rd value
        OR I R &80   ; OR with 4th value
        OR I R &80   ; OR with 5th value
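The Python sketch below mirrors the example one instruction at a time. It assumes post-increment autoindexing through location &80: read the word the pointer addresses, then advance the pointer. This is what the instruction comments imply. The array address 0200 and the helper names are hypothetical.

```python
MASK = 0xFFFF

def autoindex_read(mem, loc):
    """Read through the pointer at mem[loc], then increment the pointer."""
    ptr = mem[loc]
    mem[loc] = (ptr + 1) & MASK
    return mem[ptr]

def or_first_five(mem, base, loc=0x80):
    mem[loc] = base                    # STORE R &80
    ac = autoindex_read(mem, loc)      # LOAD I R &80
    for _ in range(4):                 # OR I R &80, four times
        ac |= autoindex_read(mem, loc)
    return ac

mem = [0] * 0x10000
mem[0x200:0x206] = [0x01, 0x02, 0x04, 0x08, 0x10, 0x8000]  # 6th word is ignored
print(hex(or_first_five(mem, 0x200)))  # 0x1f
```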

B1.12.8. Simple Loops

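This is a simple loop. It iterates as many times as the value of AC on entry to the subroutine. In this example, the loop body sends the hexadecimal value 2A (ASCII ‘*’) to an I/O device designated TTY0 TX. The routine is expected to be called with JSR stars. Because ISZ counts the negated value up towards zero, an AC of zero on entry does not mean zero iterations: the counter wraps around and the loop runs 65,536 times.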

stars:  NEG          ; AC = -AC
        STORE R &10  ; Loop variable
loop:   LI &2A       ; AC = 002A (ASCII '*')
        OUT TTY0 TX  ; Send it out.
        ISZ R &10    ; Step the loop counter.
        JMP loop     ; Loop again.
        RET          ; Return if R &10 == 0.
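A quick Python model that counts iterations of the ISZ idiom confirms the loop's behaviour, including the zero case. The sketch assumes ISZ increments modulo 65,536 and skips when the result is zero, as its use here implies. `stars_iterations` is an illustrative name.

```python
MASK = 0xFFFF

def stars_iterations(ac):
    """Number of '*' characters the stars routine sends for a given entry AC."""
    counter = (-ac) & MASK               # NEG; STORE R &10
    sent = 0
    while True:
        sent += 1                        # load '*' and OUT TTY0 TX
        counter = (counter + 1) & MASK   # ISZ R &10 ...
        if counter == 0:                 # ... skips JMP loop on reaching zero
            return sent                  # RET

print(stars_iterations(3), stars_iterations(0))  # 3 65536
```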

B1.12.9. Sum of a Block of Words

A somewhat more complex example leverages the Autoindex feature to calculate the sum (modulo 65,536) of a block of words. A pointer to the block of words should be stored at location 0010, and the size (in words) should be in AC. On exit, memory address 0012 will contain the sum of the words. Location 0011 (the loop counter) and autoindex location 0080 are used as scratch space and are overwritten.

sum_n:  NEG
        STORE R &11   ; Loop variable
        LOAD R &10    ; Array base
        STORE R &80   ; Autoindex pointer = base
        LI 0          ; AC = 0
        STORE R &12   ; Sum = 0
loop:   LOAD R &12    ; Load running total
        ADD I R &80   ; Sum a word
        STORE R &12   ; Store it back
        ISZ R &11     ; Step loop variable
        JMP loop      ; Loop again
        RET           ; Done.
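The Python sketch below models the routine's intended memory effects and can serve as a reference when testing it. It uses the same locations as the routine: pointer at 0010, loop counter at 0011, sum at 0012, and autoindex pointer at 0080. The array contents and its address 0300 are made up for illustration.

```python
MASK = 0xFFFF

def sum_n(mem, n):
    """Sum n words starting at the address in mem[0x10]; result in mem[0x12]."""
    mem[0x11] = (-n) & MASK          # NEG; STORE R &11
    mem[0x80] = mem[0x10]            # autoindex pointer = array base
    mem[0x12] = 0                    # Sum = 0
    while True:
        ptr = mem[0x80]              # ADD I R &80 reads through the pointer...
        mem[0x80] = (ptr + 1) & MASK # ...and advances it
        mem[0x12] = (mem[0x12] + mem[ptr]) & MASK
        mem[0x11] = (mem[0x11] + 1) & MASK   # ISZ R &11
        if mem[0x11] == 0:
            return

mem = [0] * 0x10000
mem[0x10] = 0x300
mem[0x300:0x303] = [0xFFFF, 0x0002, 0x0003]
sum_n(mem, 3)
print(mem[0x12])  # 4, since the sum wraps modulo 65,536
```

As with the stars loop, calling it with a size of zero would process 65,536 words rather than none.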

B1.12.10. Printing Out a Packed String

The CFT conventionally stores strings in a number of formats. These include null-terminated or length-prefixed unpacked forms, where each character takes up 16 bits. They also include null-terminated packed forms, where two 8-bit characters (string terminator included) are packed into every 16-bit word, the first character occupying the least significant 8 bits. This short program prints out the packed form, provided the string base address is in AC when the subroutine is called. The hypothetical output device TTY0 TX is used for printing.

putsp:          STORE R &80     ; Base address into autoindex location

loop:           LOAD I R &80    ; Read a pair of characters
                SNZ             ; Done?
                RET             ; Yes
                STORE R &11     ; No.
                AND bytelo      ; Get the least significant 8 bits
                OUT TTY0 TX     ; Output the character

                LOAD R &11      ; Load the pair of characters again
                RNR             ; Roll 4 bits right...
                RNR             ; ...twice: high byte now in low 8 bits
                AND bytelo      ; Keep the least significant 8 bits
                SNZ             ; Are we done now?
                RET             ; Yes
                OUT TTY0 TX     ; Output the character

                JMP loop        ; Loop again

bytelo:         .word &00ff     ; A useful constant.
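To produce test data for putsp, a host-side helper can pack text in the same format. This Python sketch assumes 8-bit Latin-1 characters and pads the final word with a zero high byte. `pack` and `unpack` are illustrative names, and `unpack` stops under the same conditions as putsp: a zero word, or a zero high byte.

```python
def pack(text):
    """Pack 8-bit characters two per word, first in the low byte, NUL-terminated."""
    data = text.encode("latin-1") + b"\0"
    if len(data) % 2:
        data += b"\0"                      # pad the final high byte
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

def unpack(words):
    """Decode packed words, stopping where putsp would return."""
    out = []
    for w in words:
        if w == 0:                         # SNZ / RET on the whole word
            break
        out.append(chr(w & 0xFF))          # low byte first
        hi = (w >> 8) & 0xFF
        if hi == 0:                        # second SNZ / RET on the high byte
            break
        out.append(chr(hi))
    return "".join(out)

print([hex(w) for w in pack("Hey")])  # ['0x6548', '0x79']
```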

B1.13. Instruction Table

A retro-style printable table of all CFT instructions, standard macros, and minor operations is available as the CFT Reference Card. Due to its width, it cannot easily be shown here. The table shows mnemonics of instructions, hexadecimal values, the instruction bitfield format, number of clock cycles, processor flags modified, addressing modes used, instruction type, name and semantics.