

I've been working on implementing callable words in #uxntal.

Instead of the typical PUSH->POP->JSR sequence each time you want to run a routine, these tokens don't push anything to the stack; they run immediately.

They're impractical for pointer arithmetic, but they save 1 byte per subroutine call and will save millions of cycles in a project the size of a text editor or drawing program.

They don't break compatibility with old roms, and make source files a bit more readable.

Devine Lu Linvega
@tty

a0 12 34 = #1234, for reference
20 xx xx = !square, think JMP2
40 xx xx = ?square, think JCN2
60 xx xx = square, think JSR2
in reply to Devine Lu Linvega

@tty This looks awesome for JMP2 and JSR2!

I'm wondering a little about JCN2… we have a lot more JCNs in the project/ dir's files (18× as many as JCN2), so while opcode 40 does save cycles, it takes up the same amount of space as a relative JCN.

One idea that keeps coming back to me is shoe-horning in an opcode for a conditional JSR2 instead of a conditional JMP2… I struggle with what the opcode's name could be, but if we use it as “?square” in Uxntal then that might not be so bad!

Devine Lu Linvega
@tty yep, they fill the unused opcodes and follow the same format as the LIT opcodes. I'll roll out this feature slowly as I experiment with it. I won't migrate apps to the new opcodes yet, to give time to test this further, but I'll add it to uxnasm and drifblim soon so anyone can start noodling with it.
in reply to Andy Alderwick

@alderwick @tty it's true that that one won't save space, but it's still a crucial one in terms of cycles, because JCN is BY FAR the most used opcode.
in reply to Devine Lu Linvega

@alderwick @tty I've done a test implementation of Left where each jump to a static address uses one of these 3 opcodes, and it saves 470 bytes 👀
in reply to Devine Lu Linvega

@tty Oh! You can't argue with that pie chart 😁

It's tricky to know how much having a conditional JSR would save us, because the workarounds to not having it are so varied. But I shouldn't have gone for JCN as the opcode to eliminate to make room for it! 😅

Devine Lu Linvega
@alderwick @tty I didn't think you could do that with switch cases! Won't it be slower tho? I really struggled with where to put this stuff. I chucked it into 0x00, but I can't be sure if it wouldn't be faster above the mode checks. 🤔
in reply to Andy Alderwick

@tty Oh, POPk, POP2k, POPkr and POP2kr… didn't see you there 🧐

If we break out the handling of ((opcode & 0x1f) == 0x00), we could do that again for ((opcode & 0x1f) == 0x02) :flan_cleaver:
in reply to Devine Lu Linvega

@tty I can't say which approach in the C code would be the fastest, thanks to compiler magic and so on, but I wouldn't say the situation is dire. My first optimisation would be to replace the if-else with another switch: http://okturing.com/src/14768/body (untested)

POP is pretty rare in your pie chart too, so even if POPs were made slower to implement the extra features, that isn't that bad either.
in reply to Devine Lu Linvega

@tty It's super late for me, but my point is we can make it fast one way or another! It might need more rejigging anyway to get the best performance, so my link is just for illustration. Let's focus on the features we want first, then we can optimise it later.
in reply to Andy Alderwick

@alderwick @tty good point 😀 Right now I'm just testing things. I'm not sure if it'd interest you, but I've been reading the book Stack Computers: The New Wave, and there's a TON of useful insights in there; maybe you'll get a kick out of it too.

https://users.ece.cmu.edu/%7Ekoopman/stack_computers/sec4_4.html
in reply to Devine Lu Linvega

@tty Oh please. Stack machines behave like particles, not waves! Please?

🥺 🙏

Please don't be both… not again…

Devine Lu Linvega
@makeworld A macro is a string replacement: it puts copies of the same sequence of operations into the code. This is more like a subroutine.

I noticed that the entry for JSI on https://wiki.xxiivv.com/site/uxntal_reference.html says "Pushes the program counter to the return-stack" first, but wouldn't it be pushing PC+2 onto the stack, to skip the address to jump to?

@makeworld
instead of

;function JSR2

which assembles to "LIT2 $ADDR JSR2", you can write

function

which assembles to "JSI $ADDR".

The instruction looks ahead of itself and jumps to that address while saving the proper return address on the r-stack.
in reply to Kira, feral fox 🦊 🏳️‍⚧️

@tty technically yes; I wasn't sure how technical I wanted to get with this. Literals do the same sort of thing, moving the PC over the length of the literal. I thought it might be best to keep it light? What do you prefer?

Devine Lu Linvega
@tty okay then, I'll add the missing details, maybe I'll make a little div with a different background, to mean like, more technical details.
in reply to Devine Lu Linvega

fascinating. wonder if there are other obvious patterns in LIT usage. still feels like an enormous chunk of cycles spent on literal constants.

Devine Lu Linvega
@parnikkapore @ddlyh @tty @makeworld I'll keep both. Having gone over a couple of apps now to see how they convert, there are many situations where I can't just use one of the new words. Anything even a tiny bit fancy, like UI components, you won't be able to handle with the new opcodes alone.

~Parnikkapore
@ddlyh @tty @makeworld JSR2 would be useful if you want to do some pointer arithmetic (;JumpTableRoot ADD2 JSR2); if it were removed, you would have to STA2 into the address bytes to do this

But yeah, I think it's a better idea to keep only one. Maybe having both would be more "pona", but idk

this feels a lot like the "guix environment" vs "guix shell" situation
