Hello :uxn: folks!

Just updated uxngba with a much more performant ARM-UXN core. In addition, the latest PPU changes have been implemented, and there is a new system for loading ROMs that enables bank switching, meaning you can load bigger ROMs like Oquonie on it!

The new core makes it almost playable and I believe it could be locked at 60Hz if the graphics updated only every other frame (needs to happen on the TAL side)

https://git.sr.ht/~rabbits/uxn-gba
https://git.badd10de.dev/uxngba/

#uxn #gbadev

in reply to Bad Diode

The new assembly core should be compatible with other ARM systems with a few minor modifications (for example, using the right .section directive).

https://git.badd10de.dev/uxngba/tree/src/uxn-core.s

Eager to see if it would improve the performance on the Playdate and raspberry pi ports 😀

There may still be room for optimization, but I'm quite happy with it for the most part. Let me know if you find errors or believe you could improve it in any way!

in reply to Devine Lu Linvega

sure I can take a look at it, been a while since then, where can I download the emu?
in reply to Bad Diode

added to repo, screen device is missing sprite flipping bit
in reply to Devine Lu Linvega

good, I downloaded the latest sdk and pulled the latest changes from uxn-playdate, seems to be working. I'll take a bigger look tomorrow 😀
in reply to Bad Diode

keep me posted, looking forward to testing the acceleration of the new ARM core on hardware! ❤
in reply to Devine Lu Linvega

will do! still, the core is just part of it, but from a quick glance I see a lot of room for improvement on the drawing routines as well. For sure we will get oquonie running at 60fps here without an issue
in reply to Devine Lu Linvega

sooo how deep do you want me to go with this? I can set up the same build system I have for the gba so that we can include binary files directly (I think we are currently including the rom as a boot.c file, right?).

I can go nuts and just check out everything since there is no audio (yet), so only the main functions, build system and ppu to look at.

Do you want me to work on a separate branch or should I wing it on main?

in reply to Bad Diode

I want you to go full on 😀 work in main. I'm planning to go full time on oquonie playdate when we return from Strange Loop (October-ish). In the meantime, if you wanna try and see how we can make it fast, it'd be awesome.

If you make big changes that could also improve sdl/x11 implementation, let me know. I try to keep all the different device implementations in sync if I can.

in reply to Devine Lu Linvega

not planning super big changes to the devices but I think for more limited devices it may be needed to do a few things to squeeze some perf. Still, the X11 and SDL implementations are fine imo, easier to read for a reference implementation 😀
in reply to Bad Diode

Ran into a bit of an issue. I can compile ARMASM and link it into the elf file, so technically it would run on hardware. HOWEVER, the simulator works by compiling the C code into a native .so file for the host architecture (in my case x86_64) using gcc, so my ARM assembly can't compile for the simulator.

I guess that's why it's called a simulator instead of an emulator.

For this reason I can't really test the assembly implementation 🙁

Any ideas?

#playdate

CC: @neauoire

in reply to Bad Diode

ah well it really doesn't matter: it looks like the Cortex-M3 is Thumb-only, and I wrote the core in ARM mode... I guess the only alternative would be to create the equivalent uxn-fast implementation in C based on my ARM concept lol :ouroboros:
in reply to Bad Diode

haha daaamn 🙁 did you generate your asm file using c macros?
in reply to Devine Lu Linvega

nope I wrote the entire thing by hand, though I did use macros on the assembly file. Still given my previous point (simulator not being able to load arm assembly) I think it’s probably better if I sink unholy amounts of time in making a fast C core lol
in reply to Devine Lu Linvega

I may just give it a try. I would like to use the same interface I'm using on the ASM core, that way I can really see the effect of the handwritten assembly core vs the C one (given they both use the same decoding algorithm). I may write an email to the mailing list with some of the changes I made for the ASM implementation
in reply to Bad Diode

go for it 😀 i'd love to know which implementation strategy you've used
in reply to Devine Lu Linvega

it’s mostly what you mention, with 256 individual functions, but also keeping the DIO/DEO functions as another virtual table. The trick is that I kept the stacks and pc addresses on individual registers to avoid many extra memory loads. Only DIO/DEO requires restoring registers, since it has to go back to C land for those
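
A rough C sketch of the table-dispatch idea described above, for readers who want something concrete. This is not the actual uxn-core.s or uxngba code: the `Vm` struct, the `op_*` handlers and `vm_eval` are hypothetical names, and only four opcodes are filled in. The assembly core keeps the stack and pc in registers; portable C cannot enforce that, so here they live in a struct.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Hypothetical interpreter state (the ASM core pins these to registers). */
typedef struct {
    u8 ram[0x10000];
    u8 wst[256];   /* working stack */
    u8 wst_ptr;    /* index of the next free slot */
    u16 pc;
    int running;
} Vm;

typedef void (*OpFunc)(Vm *vm);

/* One small function per opcode. */
static void op_brk(Vm *vm) { vm->running = 0; }
static void op_inc(Vm *vm) { vm->wst[(u8)(vm->wst_ptr - 1)]++; }
static void op_add(Vm *vm) {
    u8 b = vm->wst[--vm->wst_ptr];
    u8 a = vm->wst[--vm->wst_ptr];
    vm->wst[vm->wst_ptr++] = (u8)(a + b);
}
static void op_lit(Vm *vm) { vm->wst[vm->wst_ptr++] = vm->ram[vm->pc++]; }

/* 256-entry table indexed by the opcode byte; unset entries stay NULL. */
static const OpFunc op_table[256] = {
    [0x00] = op_brk,
    [0x01] = op_inc,
    [0x18] = op_add,
    [0x80] = op_lit,
};

/* Fetch a byte, look up its handler, call it; stop on BRK or an unset entry. */
void vm_eval(Vm *vm, u16 addr) {
    vm->pc = addr;
    vm->running = 1;
    while (vm->running) {
        OpFunc op = op_table[vm->ram[vm->pc++]];
        if (op == NULL) break;
        op(vm);
    }
}
```

A second table of the same shape, indexed by device number, would cover the DEI/DEO handlers mentioned in the post.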
in reply to Bad Diode

aah! that's a good trick. I'm not sure how this could be enforced in C space tho?
in reply to Sigrid Solveig Haflínudóttir

guess you could also try a local register, but that's another hassle.
0/10 do not recommend.
in reply to Sigrid Solveig Haflínudóttir

Nah at that point you are just asking for trouble, best to let the compiler do what it can to optimize it and move on. I already did an ASM implementation either way, and who knows maybe the compiler is actually faster than my assembly with a couple of tweaks 😛
in reply to Bad Diode

yes. I don't think going full-on fomg-optimize via registers in C is worth the time. Manual asm might make sense for some time-consuming tasks, like converting pixels to the native screen format using simd (see aarch64 branch), and even then you might want to use builtins, which are subjectively easier.
in reply to Bad Diode

Got an initial fast (?) uxn-core in C for the playdate.

Need to do some testing still, but so far the testing rom, fibonacci and fizzbuzz seem to work. I probably should clean this up but there is some charm to this nightmarish code lol.

The eval function is just 267 LOC long!

[Attached image: the eval function]

in reply to Bad Diode

So, this is a fine approach, but if you want to take this even further, combine the checks for under/overflow into one instead of doing them on every PUSH/POP. You already know how far you have to move the stack pointer, so check the bounds right at the start.
https://git.sr.ht/~rabbits/uxn/tree/main/item/src/uxn.c#L36
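
A minimal sketch of what a single up-front check could look like, assuming each opcode knows how many bytes it pops and pushes. The `Stack` layout, `STACK_SIZE`, `stack_check` and the error codes are made up for illustration and are not taken from uxn.c.

```c
#include <stdint.h>

typedef uint8_t u8;

#define STACK_SIZE 255 /* hypothetical capacity */

typedef struct {
    u8 dat[STACK_SIZE];
    u8 ptr; /* number of bytes currently on the stack */
} Stack;

/* One check per opcode: returns 0 if ok, 1 on underflow, 2 on overflow. */
int stack_check(const Stack *s, int pops, int pushes) {
    if (s->ptr < pops) return 1;
    if (s->ptr - pops + pushes > STACK_SIZE) return 2;
    return 0;
}

/* ADD pops two bytes and pushes one; the body runs unchecked after the guard. */
int op_add(Stack *s) {
    int err = stack_check(s, 2, 1);
    if (err) return err;
    u8 b = s->dat[s->ptr - 1];
    u8 a = s->dat[s->ptr - 2];
    s->ptr -= 2;
    s->dat[s->ptr++] = (u8)(a + b);
    return 0;
}
```

The guard costs two comparisons per instruction regardless of how many individual pushes and pops the opcode performs.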
in reply to Bad Diode

soooo.. what happens if you pop an empty stack? Do you get a stack of -1 length?
in reply to Devine Lu Linvega

you will probably crash and seg-fault... if you want speed, you remove these bounds checks for release. You can switch the macros for debugging so that they include bounds checks with some IFDEF statements.
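
One way the macro switch could look, as a sketch only. `UXN_DEBUG`, `STACK_GUARD`, `push8` and `pop8` are hypothetical names, not the ones used in the playdate core. Note that with a 256-byte array and a u8 pointer this particular layout wraps instead of seg-faulting when the checks are compiled out.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint8_t u8;

#define STACK_CAP 255 /* hypothetical limit enforced in debug builds */

typedef struct {
    u8 dat[256];
    u8 ptr; /* index of the next free slot */
} Stack;

#ifdef UXN_DEBUG /* hypothetical flag: build with -DUXN_DEBUG to enable checks */
#define STACK_GUARD(cond, msg) \
    do { if (cond) { fprintf(stderr, "error: %s\n", msg); abort(); } } while (0)
#else
#define STACK_GUARD(cond, msg) ((void)0) /* release: compiles to nothing */
#endif

static inline void push8(Stack *s, u8 v) {
    STACK_GUARD(s->ptr >= STACK_CAP, "stack overflow");
    s->dat[s->ptr++] = v;
}

static inline u8 pop8(Stack *s) {
    STACK_GUARD(s->ptr == 0, "stack underflow");
    return s->dat[--s->ptr];
}
```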
in reply to Bad Diode

hahaha, that has some strong bathtub catapult energy.

"Like, if you build a catapult strong enough that it can hurl a bathtub with someone crouching inside it from London to New York, it will feel very fast both on take-off and landing, and probably during the ride, too, while a comfortable seat in business class on a transatlantic airliner would probably take less time but you would not feel the speed nearly as much."
- Erik Naggum

in reply to Devine Lu Linvega

I mean... if you are releasing a program, what difference does it make if it tells you "error: i crashed because over/underflow" or it just crashes? The result is the same in both cases.

Furthermore, we just need to make programs that don't crash at all :3

in reply to Bad Diode

I like it, I just need to get into the habit of validating more routines, I'm kind of lazy on that front. I tend to check the util functions and don't bother with the more complicated ones.
in reply to Devine Lu Linvega

a good thing about this platform is that if we have parity in the core implementations, you can develop with error checking and all the debugging thingies on your pc, then the rom can just be hurled into a faster interpreter once everything is bug free ^^
in reply to Bad Diode

I never considered this but maybe you're onto something... stacks could loop around (it's a bit nicer than a hard crash), and we could push validation to the assembler so you always have predictable arity, making bounds checking obsolete?

uxn agda core when

in reply to Devine Lu Linvega

but a stack looping around is still invalid, how is that better than just crashing? I like crashing! It lets me know something is wrong haha
in reply to Bad Diode

in any case I do like the idea of more static validation on compilation/assembly, I saw that you already did some light type checking which is a good idea!
in reply to Bad Diode

arity checking for most routine types is easy. I'll need help from someone who's more familiar with this because I'm not sure I have the right approach when dealing with pointer arithmetic, like pulling a routine pointer out of an array, in which case I lose the arity of the routine.
in reply to Devine Lu Linvega

i don't remember exactly where i saw it, maybe a talk about eForth, where the stack was "circular", so when you pop an empty stack, the stack pointer becomes 255
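
A tiny sketch of the circular stack described above, assuming a 256-byte stack indexed by an 8-bit pointer (the `Stack`, `push8` and `pop8` names are hypothetical). The wraparound falls out of unsigned 8-bit arithmetic, so no bounds check is needed to keep the index in range.

```c
#include <stdint.h>

typedef uint8_t u8;

typedef struct {
    u8 dat[256];
    u8 ptr; /* index of the next free slot; wraps modulo 256 */
} Stack;

/* Pushing onto a full stack wraps ptr from 255 back to 0. */
void push8(Stack *s, u8 v) { s->dat[s->ptr++] = v; }

/* Popping an empty stack wraps ptr from 0 to 255 and reads dat[255]. */
u8 pop8(Stack *s) { return s->dat[--s->ptr]; }
```

The index can never leave the array, so a buggy program reads stale data rather than touching memory outside the stack.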
in reply to Devine Lu Linvega

https://groups.google.com/g/comp.lang.forth/c/qEndNz42bw0

"After twenty years of looking for a solution he said he
was very pleased to find one. He decided to make the
stacks circular. Stacks cannot overflow or underflow,
they wrap. He said, 'Problem solved once and for all.'"

(he = Chuck Moore)

that's just what @bd said ^^


in reply to max22-

I really really like this actually.. The more I think about this, the more I like it.

The system device has system/wst DEI that one can use to check the stack state when needed, so people can write their own error handling and checking if, when, and where they need it. It might not need to be done at every step after all..

in reply to Devine Lu Linvega

I like it! I still think error handling is best done during development either way. Oh, one change I've been making to my dei functions is to have them return u16 instead of u8. From the assembly perspective these are equivalent, and it means you don't have to call the dei functions twice to get 16-bit values:

typedef u16 (*DeiFunc)(u8 *dev, u8 port);
typedef void (*DeoFunc)(u8 *dev, u8 port);
