I have always been a fervent believer of backtracking in technology to learn or experience something that may not be at all available today or even available at all in the future.
Recently I was thinking, is it possible. Can it be done ? HOW would it be done ? To build a fully-working and useful 4-bit assembler ?
That is, a fantasy console that uses only 16-commands and is a fully working and useful computer.
My idea was that it would have a 16x16 pixeled screen, B&W, 8-colored lights, a 3-digit LED - red on the left, green on the right. And can read and react to standard input, U D L R A B, either waiting for the key or strobing it.
Now I might be able to lay out memory. You have only 256-bytes to work in.
The top 32-bytes are reserved to display 8 4-character messages (A-Z, 0-9, etc) where the font is small: 3x5, which could further be displayed by one of the 16-commands available to the 4-bit processor.
You also have a PINK and GREEN pixel overlaying the 16-B&W. These are to be treated as individual sprites. So you cannot have more than one PINK or GREEN pixel at a time. You can hide one or the other, but only their LOCATION is recorded.
The 16x16 B&W screen is a true 32-bytes, so here is the memory layout so far:
$00-$7F ... actual machine-code space and storage, editor allows you to run or stop any position
$80-$8F ... system variables if needed, A, X, Y, etc.
$90-$9F ... pushed/pulled data storage and stack
$A0-$AF ... I'm a little fuzzy here, maybe this would be needed for math or variable use ?
$B0-$B3 ... coordinates of PINK and GREEN pixel
$B4-$B7 ... numeric contents of LED digits (Red on left, green on right, each 3-digit decimal)
$B8-$BF ... Lit status of 8-colored lights, can all be on or off or mixed
$C0-$DF ... Actual 16x16 B&W screen contents
$E0-$FF ... 8 four-character messages. They are each displayed for 1/2 second, called from a single machine-language command, then return back to the standard 16x16 screen with 2-colored sprites. I.E.: You could flip between "BASE" and "BALL" or "WIN" or "LOSE." They are used to relay simple messages only.
So, with that out of the way, HOW would it be possible to make a useful and working fantasy 4-bit assembler where the only machine instruction total you have is 16. Perhaps zero being RTS or BRK ?
How could this be done ? What possible 16-commands for a 4-bit assembler could be formed and still have a fully working computer capable of playing simple games and doing calculations ?
Here are some tests it should be able to do:
- Draw a square or rectangle.
- Bounce one of the sprites around on the screen in diagonal movements.
- Roll one or more 6-sided dice and return the results
- Have a paint program, the PINK pixel is the plotter, arrow keys (A) and (B) draw or erase
- Have a rudimentary RPG with weapons, armor, monsters, and dungeons to explore
- Draw a circle or ellipse
- Draw a trapezoid, triangle, or other multi-edged object
- Have the ability to make a 1- or 2-player SNAKE game
- Play 21 (Blackjack)
- Calculate and display division in decimal places and remainders
How would you do it ?
And more interestingly, could this be implemented and emulated from PICO ?
I'm a bit confused as to how somethings would work; in your memory map you only allocate 32 nibbles for the screen (I suspect you're using one bit per pixel), you're talking about both 8 colored-lights but you're also saying the screen is black and white (I'm assuming that the lights are separate from the screen) and I straight up don't understand the 3-digit LEDs. Please help.
Hi @stinkerbo6. Actually it's been done, and in even less of an instruction set. 3-bit. It has a terrible name but a powerful instruction set. Called, "BrainF---."
You can find more information on it here:
As for memory mapping in my monster, I want it be exactly 256-bytes for total access, graphics, sound, and RAM space of code. 256-bytes of storage is just such a magical thing.
At some point I may approach this again, I'm still working on a different project for now.
That's a clever way to do it, @Matcha155. It's not so much the accessing that I want to limit, but the memory size. To 256-bytes maximum which includes absolutely everything.
I fully believe it is possible to make an interesting and useful virtual 256-byte computer.
Oho ! He means the real thing. Yeah that's completely beyond my ability, @Matcha155. Interesting to look at though ... No, what I would write will be pretty close to the Atari 2600 BASIC.
While the display would obviously be better, the concept will be similar and I have already written some code for the interface a few years back.
A "4-bit" computer doesn't just refer to the size of the instruction set—and often doesn't refer to the size of the instruction set either. It refers to the word size, which implies the size of a value stored at an address and the number of addresses. You've chosen a 256-byte address space, but if you want an address to fit in 8 bits, then each address must refer to a byte-sized (8-bit) word. If each address refers to a 4-bit word, then you have 512 possible addresses and need 9 bits to represent an address, which is inconvenient. (Hence Matcha155's suggestion to kick it to 4K, which would make use of all 12 bits in a three-word address.) This also impacts how instructions are stored. If each address is byte-aligned, then each instruction is byte-aligned. A 4-bit instruction without an operand would have to waste four bits to align the next instruction to the byte boundary. Compare the Intel 4004, a 4-bit processor with 46 instructions ranging from 8 to 16 bits wide.
You've assigned memory addresses to "system variables," but given your examples of "A, X, Y, etc." it sounds like you're referring to registers. Registers are special memory locations built into a CPU with dedicated instructions to manipulate them, and do not have addresses. It sounds like you're expecting to have a status register (flags like zero and carry), a program counter (the address of the current instruction), and a stack pointer. You probably want an accumulator ("A"): without one, nearly every instruction would need one or two address arguments, making each such instruction very wide and eating up the limited code space.
You'll want clear, set, and branch instructions for each flag in the status register. Some CPUs use "relative" addressing for branch instructions, so their argument isn't a full address but an offset of limited range to save space. In this case, maybe a branch could be limited to three 4-bit words: one for the instruction and two for the offset. Using just one word for the offset is probably too little (-7/+8 words). Or maybe branch instructions have full addresses as arguments, so they're four words wide. You'd also want an unconditional jump instruction that takes a full address.
You'll want an instruction for loading the contents of a memory address into the accumulator, and one for storing the accumulator in a memory address. You'll also want another instruction for loading an "immediate" value (a value stored with the instruction) into the accumulator. Add and subtract instructions, with immediate (add/subtract this value to the accumulator) and absolute (add/subtract the contents of this memory address to the accumulator) addressing variants. Bitwise math operations. Stack operations: jump to subroutine, return from subroutine; push and pull the accumulator; push and pull the status register; copy stack pointer to accumulator, copy accumulator to stack pointer.
And so forth. I'm citing concepts from traditional vintage CPUs (like the 6502) because it sounds like that's what you're after. There's a lot to think about if you want to store instructions in addressable memory, even for a "fantasy" CPU/virtual machine and even if you don't want to emulate a real CPU. Of course, there are theoretical extremes in terms of instruction set size: see One-instruction set computers. Notice that smaller instruction sets tend to need more memory for code. Also check out the 4-bit processors in this Sharp Microcomputers Data Book, some of which were used in Nintento Game and Watch toys.
Yes, but @dw817 explicitly stated that the opcodes would be 4-bit.
What could be done is using the first bit to determine the data format, for example if the first bit is a 0 then the data format could be 0(format bit) 0000(opcode) 000+next byte(address) and if the first bit is a 1 then the data format coulde be 1(format bit) 0000000(opcode) next byte(address(or not)).
0000: nop(no operation) 0000 01: out(to out register) 0000 10: in(to in register)
0001: jmp(jump to address (only 15-bits)) 0001 1: jsr(jump to subroutine and store return value on stack)
0010: lda(load a)
0011: sta(store a)
0100: jzs(jump to subroutine if zero flag is set) 0100 01: jzc(jump to subroutine if zero flag clear) 0100 10: jcs(jump to subrotine if carry flag set) 0100 11: jcc(jump to subroutine if carry flag clear)
0101: rts(return from subroutine) 0101 1: rti(return from interupt)
0110: ldb(load b)
0111: stb(store b)
1000: log(logic) 1000 00: not(logical not to a) 1000 01: or(logical or with memory and a) 1000 10: and(logical and with memory and a) 1000 11: xor(logical xor with memory and a)
1001: stz(set zero flag) 1001 001: clz(clear zero flag) 1001 010: stc(set carry flag) 1001 011: clc(clear carry flag) 1001 100: sti(set interupt falg) 1001 101: cli(clear interupt flag)
1010: add(add memory to a)
1011: sub(subtract memory from a)
1100: pta(pop from stack and store in a) 1100 1: pas(push a to stack)
1101: bsl(bit shift left) 1101 1: bsr(bit shift right)
1110: adi(add immediate value to a(only 4-bits))
1111: sbi(subtract immediate value from a(only 4-bits))
This is my proposed instruction set, for the cases of multiple instructions on one line, the extra cases use some extra bits hence sacrificing some addressing(the jump instruction can only jump to the second half of memory).
I'm not objecting to a 4-bit instruction set, I'm just saying you have to think about the rest of the design along with it. For example, does your lda take an immediate value or an address? If you reserve a bit to make that distinction, then you've effectively doubled the size of the instruction set and it's not meaningfully a 4-bit instruction code.
Variable width instructions are totally a thing, that's fine. You can ignore the electronic implications for a fantasy CPU and just encode it however you like, but you still have to think about how it'll be used. For example, notice that your "1100" instruction is actually "11000" to distinguish it from "11001". Your machine must read both nibbles, and there's no sense in which the "1100" instruction is one nibble wide. In other words, "1100" is always a prefix for a double-wide instruction code. It's probably worth thinking carefully about which instructions will be most commonly used, and giving them one-nibble instruction codes that aren't prefixes for two-nibble codes.
Other decisions like nibble addressing vs. byte addressing are important for instruction encoding because they affect instructions that take arguments. Keep in mind that packing instructions into bit patterns (e.g. 5 bits for an instruction and 9 bits for an address across two bytes) makes self-modifying code difficult, a possibly important technique for a memory constrained machine.
Try it! It's not difficult to make a virtual machine for a tiny instruction set, even in PICO-8. Implement a memory visualizer and an interpreter for your encoding, and see where the pain points are in writing simple programs for it.
You'll have to spend time setting those bits before calling the instruction:
ldb #1 0110 0001 stb $0f 0111 0000 1111 log 1000
So "not a" is six nibbles. (I assumed nibble-sized registers and byte-sized addresses in this example.)
Slightly more realistic is to affect the behavior of an instruction with a CPU register instead of a memory-mapped register. There'd be a dedicated instruction (or several) for manipulating the CPU register that does the job of "ldb" and "stb" here, possibly saving a nibble or two. Imagine a nibble-sized "VAR" register and a "svar" instruction:
svar #1 ???? 0001 log 1000
Variable-length instruction encoding is probably more efficient overall, especially with good choices for which instructions are the longer ones.
This is my revised suggestion for the design. Please tell me if I efed something up.
BASIC ARCHITECTURE memory: 256 bytes of sram memory layout: 0x00-0x0f general purpose variable use. 0x10-0x1f stack. 0x20-0x3f screen data. 0x40 controller input. 0x7e reset vectore. 0x7f interrupt vector. 0x10-0x1f stack. 0x80-0xff program data. controller input: bit 0 is a d-pad left, bit 1 is d-pad right, bit 2 is d-pad up, bit 3 is d-pad down, bit 4 is d-pad a/x, bit 5 is d-pad b/o, bit 6 is pause All of the memory layout is optional and just recomended, except for the reset and interrupt vectors. registers: 8-bit program counter aka. pc. 8-bit var. 4-bit stack pointer aka. sp. 8-bit accumulator aka. a. 8-bit general use register b INSTRUCTIONS 0000: nop (no operation) 0001: jmp (jump to address) (saving the return address is determined by bit 5: 0=no uses b offset, 1=yes) 0010: lda (load a) (addressing mode determined by bit 2-3: 00=immediate, 01=with b offset, 10=indirect, 11=absolute) 0011: sta (store a) 0100: bcf (branch if flag) (condition is determined by bits 6-7: 00=zero set, 01=zero clear, 10=carry set, 11=carry clear) 0101: jpf (jump to subroutine if flag) (condition is determined by bits 6-7: 00=zero set, 01=zero clear, 10=carry set, 11=carry clear) 0110: ldb (load b) (addressing mode determined by bit 2-3: 00=immediate, 01=with b offset, 10=indirect, 11=absolute) 0111: swp (swap a and b) 1000: log (logic a and memory) (operation determined by var bits 0-1: 00=not, 01=or, 10=and, 11=xor) 1001: lgi (logic a and operand) (operation determined by var bits 0-1: 00=not, 01=or, 10=and, 11=xor) 1010: add (add a and memory) 1011: sub (subtract a and memory) 1100: ps (push or pop stack (uses a seperate stack pointer)) (operation determined by bit 4-5: 00=pop stack to var, 01=push a to stack, 10=pop stack to a, 11=pop) 1101: abs (arithmatic bit shift on a) (direction determined by var bit 4: 0=left, 1=right) 1110: adi (add b and operand) 1111: sbi (subtract b and operand) VAR LAYOUT 0 0 (logical operator) 0 0 (load instruction addressing modes) 0 (bit shift direction) 0 (save bool) 0 0 (condition flag)
Have you tried writing programs with it?
Your next step is to test your design. You're proposing that people write programs for a machine with 128 bytes of code, and you've chosen certain instructions to be the wide ones. Are those the right choices? Is your specification complete? Is it efficient? You won't know until you try using it.
Just do it on paper. Write a program that sets the accumulator. Write a program that sets memory address 0. Write a program that sets memory address 1 to the contents of memory address 0. Write a program that increments memory address 1 by 1 in a loop, from 0 to whatever is stored in address 0. Hand-assemble these programs and see how much space they take. Evaluate them by hand as if the computer will do it. Have you made all of the design decisions needed to write complete and accurate programs that do these things?
You've described stacks and interrupts, as well as memory-mapped I/O. How do these features work in your computer? At what point does memory-mapped I/O change when input changes? How would a program know that I/O has changed? Write a short program that uses these features, then try to evaluate them.
With I/O and a display, write a program that displays a dot of a particular color: if the value at address 0 is greater than the value at address 1, make the dot red; if the value at address 0 is less than the value at address 2, make it blue; otherwise make it white. D-pad up and down adjust the value at address 0. How long is this program? Does it fit into 128 bytes?
You've made some design decisions that diverge from historical designs for small processors. That's fine! I think you'll find that those historical designs made those decisions for good reasons. The best way to learn those reasons is to try your design and find out.
Take it a step at a time. Validate your instruction set and encoding with simple experiments before writing an interpreter. Write a simple interpreter without consideration for timing before making decisions about cycles. Get it updating memory locations before implementing peripherals (display, input). Hand-assemble before writing an assembler. You'll be updating your design decisions as you go, so you want to spend the least amount of effort getting to those decisions and updating your implementation.
You describe a "var" register but haven't specified a way to set it. That's the kind of thing you want to notice before putting too much effort into an implementation.
[Please log in to post a comment]