Log In  

I have always been a fervent believer of backtracking in technology to learn or experience something that may not be at all available today or even available at all in the future.

Recently I was thinking, is it possible. Can it be done ? HOW would it be done ? To build a fully-working and useful 4-bit assembler ?

That is, a fantasy console that uses only 16-commands and is a fully working and useful computer.

My idea was that it would have a 16x16 pixeled screen, B&W, 8-colored lights, a 3-digit LED - red on the left, green on the right. And can read and react to standard input, U D L R A B, either waiting for the key or strobing it.

Now I might be able to lay out memory. You have only 256-bytes to work in.

The top 32-bytes are reserved to display 8 4-character messages (A-Z, 0-9, etc) where the font is small: 3x5, which could further be displayed by one of the 16-commands available to the 4-bit processor.

You also have a PINK and GREEN pixel overlaying the 16-B&W. These are to be treated as individual sprites. So you cannot have more than one PINK or GREEN pixel at a time. You can hide one or the other, but only their LOCATION is recorded.

The 16x16 B&W screen is a true 32-bytes, so here is the memory layout so far:

[code]
$00-$7F ... actual machine-code space and storage, editor allows you to run or stop any position
$80-$8F ... system variables if needed, A, X, Y, etc.
$90-$9F ... pushed/pulled data storage and stack
$A0-$AF ... I'm a little fuzzy here, maybe this would be needed for math or variable use ?
$B0-$B3 ... coordinates of PINK and GREEN pixel
$B4-$B7 ... numeric contents of LED digits (Red on left, green on right, each 3-digit decimal)
$B8-$BF ... Lit status of 8-colored lights, can all be on or off or mixed
$C0-$DF ... Actual 16x16 B&W screen contents

$E0-$FF ... 8 four-character messages. They are each displayed for 1/2 second, called from a single machine-language command, then return back to the standard 16x16 screen with 2-colored sprites. I.E.: You could flip between "BASE" and "BALL" or "WIN" or "LOSE." They are used to relay simple messages only.

So, with that out of the way, HOW would it be possible to make a useful and working fantasy 4-bit assembler where the only machine instruction total you have is 16. Perhaps zero being RTS or BRK ?

How could this be done ? What possible 16-commands for a 4-bit assembler could be formed and still have a fully working computer capable of playing simple games and doing calculations ?

Here are some tests it should be able to do:

  • Draw a square or rectangle.
  • Bounce one of the sprites around on the screen in diagonal movements.
  • Roll one or more 6-sided dice and return the results
  • Have a paint program, the PINK pixel is the plotter, arrow keys (A) and (B) draw or erase
  • Have a rudimentary RPG with weapons, armor, monsters, and dungeons to explore

Advanced:

  • Draw a circle or ellipse
  • Draw a trapezoid, triangle, or other multi-edged object
  • Have the ability to make a 1- or 2-player SNAKE game
  • Play 21 (Blackjack)
  • Calculate and display division in decimal places and remainders

How would you do it ?

And more interestingly, could this be implemented and emulated from PICO ?

P#57402 2018-10-02 17:30 ( Edited 2018-10-02 21:39)

I'm a bit confused as to how somethings would work; in your memory map you only allocate 32 nibbles for the screen (I suspect you're using one bit per pixel), you're talking about both 8 colored-lights but you're also saying the screen is black and white (I'm assuming that the lights are separate from the screen) and I straight up don't understand the 3-digit LEDs. Please help.

P#100753 2021-11-24 13:57

And also wondering is every memory address a nibble or a byte.

P#100768 2021-11-24 20:38

@Matcha155 It assumes the memory layout is addressed in 8-bit bytes (instead of nibbles), despite this being called a "4-bit" assembler. Each byte would hold two 4-bit opcodes.

P#100769 2021-11-24 23:23
:: dw817
1

Hi @stinkerbo6. Actually it's been done, and in even less of an instruction set. 3-bit. It has a terrible name but a powerful instruction set. Called, "BrainF---."

You can find more information on it here:

As for memory mapping in my monster, I want it be exactly 256-bytes for total access, graphics, sound, and RAM space of code. 256-bytes of storage is just such a magical thing.

At some point I may approach this again, I'm still working on a different project for now.

P#100776 2021-11-25 00:34 ( Edited 2021-11-25 00:54)

I have thought about it and I think that memory could be boosted to 4KB by using the first four bits for the opcode and the next four bits plus the next byte for 4KB of memory, similar to some pretty old 4-bit computer.

P#101375 2021-12-03 20:44
:: dw817

That's a clever way to do it, @Matcha155. It's not so much the accessing that I want to limit, but the memory size. To 256-bytes maximum which includes absolutely everything.

I fully believe it is possible to make an interesting and useful virtual 256-byte computer.

P#101380 2021-12-03 23:00

True.

P#101381 2021-12-03 23:28

Instruction wise, they would have to be carefully chosen for maximum capability.
I was thinking something similar to Ben eater's 8-bit cpu.

P#101382 2021-12-03 23:29
:: dw817

Link ?

P#101383 2021-12-03 23:32
P#101384 2021-12-03 23:35
:: dw817

Oho ! He means the real thing. Yeah that's completely beyond my ability, @Matcha155. Interesting to look at though ... No, what I would write will be pretty close to the Atari 2600 BASIC.

While the display would obviously be better, the concept will be similar and I have already written some code for the interface a few years back.

https://youtu.be/aSvsjvake_0?t=499

P#101386 2021-12-03 23:46 ( Edited 2021-12-03 23:47)

But there would have to be some underlying hardware(probably digital) and an instruction set, basic on the other hand is quite high level.

P#101427 2021-12-04 08:39
3

A "4-bit" computer doesn't just refer to the size of the instruction set—and often doesn't refer to the size of the instruction set either. It refers to the word size, which implies the size of a value stored at an address and the number of addresses. You've chosen a 256-byte address space, but if you want an address to fit in 8 bits, then each address must refer to a byte-sized (8-bit) word. If each address refers to a 4-bit word, then you have 512 possible addresses and need 9 bits to represent an address, which is inconvenient. (Hence Matcha155's suggestion to kick it to 4K, which would make use of all 12 bits in a three-word address.) This also impacts how instructions are stored. If each address is byte-aligned, then each instruction is byte-aligned. A 4-bit instruction without an operand would have to waste four bits to align the next instruction to the byte boundary. Compare the Intel 4004, a 4-bit processor with 46 instructions ranging from 8 to 16 bits wide.

You've assigned memory addresses to "system variables," but given your examples of "A, X, Y, etc." it sounds like you're referring to registers. Registers are special memory locations built into a CPU with dedicated instructions to manipulate them, and do not have addresses. It sounds like you're expecting to have a status register (flags like zero and carry), a program counter (the address of the current instruction), and a stack pointer. You probably want an accumulator ("A"): without one, nearly every instruction would need one or two address arguments, making each such instruction very wide and eating up the limited code space.

You'll want clear, set, and branch instructions for each flag in the status register. Some CPUs use "relative" addressing for branch instructions, so their argument isn't a full address but an offset of limited range to save space. In this case, maybe a branch could be limited to three 4-bit words: one for the instruction and two for the offset. Using just one word for the offset is probably too little (-7/+8 words). Or maybe branch instructions have full addresses as arguments, so they're four words wide. You'd also want an unconditional jump instruction that takes a full address.

You'll want an instruction for loading the contents of a memory address into the accumulator, and one for storing the accumulator in a memory address. You'll also want another instruction for loading an "immediate" value (a value stored with the instruction) into the accumulator. Add and subtract instructions, with immediate (add/subtract this value to the accumulator) and absolute (add/subtract the contents of this memory address to the accumulator) addressing variants. Bitwise math operations. Stack operations: jump to subroutine, return from subroutine; push and pull the accumulator; push and pull the status register; copy stack pointer to accumulator, copy accumulator to stack pointer.

And so forth. I'm citing concepts from traditional vintage CPUs (like the 6502) because it sounds like that's what you're after. There's a lot to think about if you want to store instructions in addressable memory, even for a "fantasy" CPU/virtual machine and even if you don't want to emulate a real CPU. Of course, there are theoretical extremes in terms of instruction set size: see One-instruction set computers. Notice that smaller instruction sets tend to need more memory for code. Also check out the 4-bit processors in this Sharp Microcomputers Data Book, some of which were used in Nintento Game and Watch toys.

P#101429 2021-12-04 09:37

Yes, but @dw817 explicitly stated that the opcodes would be 4-bit.
What could be done is using the first bit to determine the data format, for example if the first bit is a 0 then the data format could be 0(format bit) 0000(opcode) 000+next byte(address) and if the first bit is a 1 then the data format coulde be 1(format bit) 0000000(opcode) next byte(address(or not)).

P#101432 2021-12-04 11:07

0000: nop(no operation) 0000 01: out(to out register) 0000 10: in(to in register)
0001: jmp(jump to address (only 15-bits)) 0001 1: jsr(jump to subroutine and store return value on stack)
0010: lda(load a)
0011: sta(store a)
0100: jzs(jump to subroutine if zero flag is set) 0100 01: jzc(jump to subroutine if zero flag clear) 0100 10: jcs(jump to subrotine if carry flag set) 0100 11: jcc(jump to subroutine if carry flag clear)
0101: rts(return from subroutine) 0101 1: rti(return from interupt)
0110: ldb(load b)
0111: stb(store b)
1000: log(logic) 1000 00: not(logical not to a) 1000 01: or(logical or with memory and a) 1000 10: and(logical and with memory and a) 1000 11: xor(logical xor with memory and a)
1001: stz(set zero flag) 1001 001: clz(clear zero flag) 1001 010: stc(set carry flag) 1001 011: clc(clear carry flag) 1001 100: sti(set interupt falg) 1001 101: cli(clear interupt flag)
1010: add(add memory to a)
1011: sub(subtract memory from a)
1100: pta(pop from stack and store in a) 1100 1: pas(push a to stack)
1101: bsl(bit shift left) 1101 1: bsr(bit shift right)
1110: adi(add immediate value to a(only 4-bits))
1111: sbi(subtract immediate value from a(only 4-bits))

This is my proposed instruction set, for the cases of multiple instructions on one line, the extra cases use some extra bits hence sacrificing some addressing(the jump instruction can only jump to the second half of memory).

P#101461 2021-12-04 20:23

I'm not objecting to a 4-bit instruction set, I'm just saying you have to think about the rest of the design along with it. For example, does your lda take an immediate value or an address? If you reserve a bit to make that distinction, then you've effectively doubled the size of the instruction set and it's not meaningfully a 4-bit instruction code.

Variable width instructions are totally a thing, that's fine. You can ignore the electronic implications for a fantasy CPU and just encode it however you like, but you still have to think about how it'll be used. For example, notice that your "1100" instruction is actually "11000" to distinguish it from "11001". Your machine must read both nibbles, and there's no sense in which the "1100" instruction is one nibble wide. In other words, "1100" is always a prefix for a double-wide instruction code. It's probably worth thinking carefully about which instructions will be most commonly used, and giving them one-nibble instruction codes that aren't prefixes for two-nibble codes.

Other decisions like nibble addressing vs. byte addressing are important for instruction encoding because they affect instructions that take arguments. Keep in mind that packing instructions into bit patterns (e.g. 5 bits for an instruction and 9 bits for an address across two bytes) makes self-modifying code difficult, a possibly important technique for a memory constrained machine.

Try it! It's not difficult to make a virtual machine for a tiny instruction set, even in PICO-8. Implement a memory visualizer and an interpreter for your encoding, and see where the pain points are in writing simple programs for it.

P#101476 2021-12-04 22:26

True, kinda a brain fart on my part.
Well, what about using a dedicated memory address for these extra bits, for example: log(logic) could use the first two bits of some reserved address to determine what logical operation to perform.

P#101477 2021-12-04 22:37

You'll have to spend time setting those bits before calling the instruction:

ldb #1    0110 0001
stb $0f   0111 0000 1111
log       1000

So "not a" is six nibbles. (I assumed nibble-sized registers and byte-sized addresses in this example.)

Slightly more realistic is to affect the behavior of an instruction with a CPU register instead of a memory-mapped register. There'd be a dedicated instruction (or several) for manipulating the CPU register that does the job of "ldb" and "stb" here, possibly saving a nibble or two. Imagine a nibble-sized "VAR" register and a "svar" instruction:

svar #1   ???? 0001
log       1000

Variable-length instruction encoding is probably more efficient overall, especially with good choices for which instructions are the longer ones.

P#101480 2021-12-04 22:54

Ok.

I'll refine the instructions.

P#101487 2021-12-04 23:20

0000: nop
0001: jmp
0010: lda
0011: sta
0100: jsr
0101: rtj
0110: ldb
0111: swp
1000: var
1001: log
1010: add
1011: sub
1100: ldi
1101: lds
1110: adi
1111: sbi

Here it is.

P#101490 2021-12-04 23:32

The special ones are: rtj which by default is the same as rts but if say bit 7 of var is a one is the same as rti, log obviously the first two bits and possible all of the load and store instruction for immediate modes.

P#101491 2021-12-04 23:35

Ok, now try writing programs with it. For a machine this small you can probably do it on paper without building an interpreter.

P#101498 2021-12-05 00:19

Wait are we talking about 256 bytes or 4-kiloBytes?

P#101530 2021-12-05 09:21

I noticed a major flaws with the instructions, such as the fact that it is not turing complete because I forgot to add a jpf(jump if flag) instruction.

P#101535 2021-12-05 12:58

This is my revised suggestion for the design. Please tell me if I efed something up.

BASIC ARCHITECTURE

memory: 256 bytes of sram

memory layout: 0x00-0x0f general purpose variable use. 0x10-0x1f stack. 0x20-0x3f screen data. 0x40 controller input. 0x7e reset vectore. 0x7f interrupt vector. 0x10-0x1f stack. 0x80-0xff program data.

controller input: bit 0 is a d-pad left, bit 1 is d-pad right, bit 2 is d-pad up, bit 3 is d-pad down, bit 4 is d-pad a/x, bit 5 is d-pad b/o, bit 6 is pause

All of the memory layout is optional and just recomended, except for the reset and interrupt vectors.

registers: 8-bit program counter aka. pc. 8-bit var. 4-bit stack pointer aka. sp. 8-bit accumulator aka. a. 8-bit general use register b

INSTRUCTIONS

0000: nop (no operation)
0001: jmp (jump to address) (saving the return address is determined by bit 5: 0=no uses b offset, 1=yes)
0010: lda (load a) (addressing mode determined by bit 2-3: 00=immediate, 01=with b offset, 10=indirect, 11=absolute)
0011: sta (store a)
0100: bcf (branch if flag) (condition is determined by bits 6-7: 00=zero set, 01=zero clear, 10=carry set, 11=carry clear)
0101: jpf (jump to subroutine if flag) (condition is determined by bits 6-7: 00=zero set, 01=zero clear, 10=carry set, 11=carry clear)
0110: ldb (load b) (addressing mode determined by bit 2-3: 00=immediate, 01=with b offset, 10=indirect, 11=absolute)
0111: swp (swap a and b)
1000: log (logic a and memory) (operation determined by var bits 0-1: 00=not, 01=or, 10=and, 11=xor)
1001: lgi (logic a and operand) (operation determined by var bits 0-1: 00=not, 01=or, 10=and, 11=xor)
1010: add (add a and memory)
1011: sub (subtract a and memory)
1100: ps (push or pop stack (uses a seperate stack pointer)) (operation determined by bit 4-5: 00=pop stack to var, 01=push a to stack, 10=pop stack to a, 11=pop)
1101: abs (arithmatic bit shift on a) (direction determined by var bit 4: 0=left, 1=right)
1110: adi (add b and operand)
1111: sbi (subtract b and operand)

VAR LAYOUT

0 0 (logical operator) 0 0 (load instruction addressing modes) 0 (bit shift direction) 0 (save bool) 0 0 (condition flag)
P#101560 2021-12-05 18:53 ( Edited 2021-12-05 23:36)

Have you tried writing programs with it?

Your next step is to test your design. You're proposing that people write programs for a machine with 128 bytes of code, and you've chosen certain instructions to be the wide ones. Are those the right choices? Is your specification complete? Is it efficient? You won't know until you try using it.

Just do it on paper. Write a program that sets the accumulator. Write a program that sets memory address 0. Write a program that sets memory address 1 to the contents of memory address 0. Write a program that increments memory address 1 by 1 in a loop, from 0 to whatever is stored in address 0. Hand-assemble these programs and see how much space they take. Evaluate them by hand as if the computer will do it. Have you made all of the design decisions needed to write complete and accurate programs that do these things?

You've described stacks and interrupts, as well as memory-mapped I/O. How do these features work in your computer? At what point does memory-mapped I/O change when input changes? How would a program know that I/O has changed? Write a short program that uses these features, then try to evaluate them.

With I/O and a display, write a program that displays a dot of a particular color: if the value at address 0 is greater than the value at address 1, make the dot red; if the value at address 0 is less than the value at address 2, make it blue; otherwise make it white. D-pad up and down adjust the value at address 0. How long is this program? Does it fit into 128 bytes?

You've made some design decisions that diverge from historical designs for small processors. That's fine! I think you'll find that those historical designs made those decisions for good reasons. The best way to learn those reasons is to try your design and find out.

P#101591 2021-12-06 00:38

Ok, just one question, do you think that the emulator needs to be cycle exact or just a simple one?

P#101611 2021-12-06 09:15

I'm making an emulator in c now.

P#101630 2021-12-06 17:05
:: merwok

You should write a virtual machine (a program that implements your instruction set), and it should follow its specs (that’s the purpose of it!), or lead you to change the specs when impractical.

P#101639 2021-12-06 18:19

Well that's what I'm doing.

P#101645 2021-12-06 18:51

Take it a step at a time. Validate your instruction set and encoding with simple experiments before writing an interpreter. Write a simple interpreter without consideration for timing before making decisions about cycles. Get it updating memory locations before implementing peripherals (display, input). Hand-assemble before writing an assembler. You'll be updating your design decisions as you go, so you want to spend the least amount of effort getting to those decisions and updating your implementation.

You describe a "var" register but haven't specified a way to set it. That's the kind of thing you want to notice before putting too much effort into an implementation.

P#101650 2021-12-06 20:26

Yes.

P#101657 2021-12-06 20:49

[Please log in to post a comment]

Follow Lexaloffle:        
Generated 2021-12-07 21:30:53 | 0.094s | Q:45