Finally fairly happy with VarvaraPSP's performance. Still a little sluggish in Oquonie, but playable if you're patient. A big thanks to @asie for PSP optimization ideas/support, and @bd for uxngba's screen_blit lookup table implementation! #uxn https://codeberg.org/tbsp/VarvaraPSP
tbsp
in reply to Devine Lu Linvega • • •Using my very rough "ticks per screen vector eval" measure, the old core uses 28% more ticks during the 100r animation on Oquonie startup, and 6% more ticks during major gameplay updates. Considering that compiler flags just dropped those numbers by close to 2/3, the old core might be worth it for the clarity.
Using the simple screen_blit instead of uxngba's increases ticks by 40-60%, though, which is quite a bit, even if it makes the code far simpler.
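For readers unfamiliar with the trade-off, here is a minimal C sketch of the general idea behind a lookup-table blit, next to a simple per-pixel version. It is not uxngba's or VarvaraPSP's actual code: the names `spread_lut`, `init_spread_lut`, `decode_row_simple`, and `decode_row_lut` are hypothetical, and the packed 2-bits-per-pixel row layout is an assumption chosen for illustration. It only decodes one 8-pixel row of a 2bpp sprite from its two bit planes.

```c
#include <stdint.h>

/* Hypothetical table: spread_lut[b] moves bit i of b to bit 2*i,
 * leaving the odd bits clear. Built once at startup. */
static uint16_t spread_lut[256];

static void init_spread_lut(void)
{
    for (int b = 0; b < 256; b++) {
        uint16_t w = 0;
        for (int i = 0; i < 8; i++)
            if (b & (1 << i))
                w |= (uint16_t)(1u << (2 * i));
        spread_lut[b] = w;
    }
}

/* Simple version: build each 2-bit color index one pixel at a time.
 * lo and hi are the same row taken from the two bit planes. */
static uint16_t decode_row_simple(uint8_t lo, uint8_t hi)
{
    uint16_t row = 0;
    for (int i = 0; i < 8; i++) {
        unsigned color = ((lo >> i) & 1u) | (((hi >> i) & 1u) << 1);
        row |= (uint16_t)(color << (2 * i));
    }
    return row;
}

/* Lookup version: two table reads, one shift and one OR give the
 * same packed row with no per-pixel loop. */
static uint16_t decode_row_lut(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(spread_lut[lo] | (spread_lut[hi] << 1));
}
```

Call `init_spread_lut()` once before using `decode_row_lut()`. The table version trades 512 bytes of memory and some readability for fewer operations per row, which is roughly the kind of trade-off being weighed here.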
Devine Lu Linvega
in reply to tbsp • • •So this core (https://git.sr.ht/~rabbits/uxn/tree/main/item/src/uxn.c) is 28% more demanding.
And this (https://git.sr.ht/~rabbits/uxn/tree/main/item/src/devices/screen.c#L50) is 40-60% more demanding?
kelp
in reply to Devine Lu Linvega • • •@neauoire
not bd, but was able to confirm edge smearing while testing varvarapsp and oquonie
tbsp
in reply to Devine Lu Linvega • • •6-28% for the CPU core, depending on the code being run, but yes. Those measurements are very rough though, and I wouldn't base too much on them. I've updated VarvaraPSP to allow using either core/blit based on flags for now.
Edit: To clarify, the simple screen_blit is 40-60% more demanding than the uxngba lookup/mask-based approach (when run on PSP), not when compared to your recent inlined screen_blit.
manifoldslug
in reply to Devine Lu Linvega • • •@neauoire sorry for reviving this thread, but in my defense: I come bringing measurements
the on_screen event handler of my spaceship thingy over ~2M instructions (w/ chibicc): 42.8 -> 39.2 MIPS
10M instructions of generating and popping fibonacci: 97.1 -> 63.2 MIPS
it seems that in my use case the logic was dense with emulator-specific dei/deo handling
Devine Lu Linvega
in reply to manifoldslug • • •@manifoldslug so from now on, let's call the classic core abc-style, and the reg-style core tnl-style, so we know what we're talking about haha.
https://git.sr.ht/~rabbits/uxn/tree/main/item/etc/cores
These two cores have the exact same APIs now so they're interchangeable 🙌
Devine Lu Linvega
in reply to rhizomatic arcade • • •@arcade https://merveilles.town/@bd/110854709921561932
Bad Diode
2023-08-08 15:30:35
Devine Lu Linvega
in reply to tbsp • • •I just realized something pretty cool this afternoon.
All the POP() POP(), or PUSH() PUSH(), sequences of macros in the abc core can be merged (a bit like the tnl core), so the short-mode flag is checked only once, and the overflow/underflow is checked once too.
It makes the binary size for uxn quite a lot smaller too? It's kinda neat, check it out 😀
https://git.sr.ht/~rabbits/uxn/tree/main/item/src/uxn.c#L85
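As a rough illustration of the merging idea, here is a small C sketch. It is not the code in uxn.c: the `Stack` struct and the functions `pop1`, `pop2_separate`, and `pop2_merged` are hypothetical stand-ins for the macros, written as functions for readability. It assumes shorts sit on the stack high byte first, with the low byte on top.

```c
#include <stdint.h>

/* Hypothetical stack: ptr is the number of bytes currently held. */
typedef struct {
    uint8_t dat[256];
    unsigned ptr;
} Stack;

/* One pop: tests the short-mode flag and underflow every time it runs.
 * Returns 0 on underflow, 1 on success. */
static int pop1(Stack *s, int short_mode, uint16_t *out)
{
    unsigned n = short_mode ? 2u : 1u;
    if (s->ptr < n)
        return 0;
    if (short_mode)
        *out = (uint16_t)((s->dat[s->ptr - 2] << 8) | s->dat[s->ptr - 1]);
    else
        *out = s->dat[s->ptr - 1];
    s->ptr -= n;
    return 1;
}

/* Unmerged: two pops, so two flag tests and two underflow tests. */
static int pop2_separate(Stack *s, int short_mode, uint16_t *a, uint16_t *b)
{
    return pop1(s, short_mode, a) && pop1(s, short_mode, b);
}

/* Merged: one flag test and one underflow test cover both values,
 * and the stack pointer is adjusted once. */
static int pop2_merged(Stack *s, int short_mode, uint16_t *a, uint16_t *b)
{
    if (short_mode) {
        if (s->ptr < 4)
            return 0;
        *a = (uint16_t)((s->dat[s->ptr - 2] << 8) | s->dat[s->ptr - 1]);
        *b = (uint16_t)((s->dat[s->ptr - 4] << 8) | s->dat[s->ptr - 3]);
        s->ptr -= 4;
    } else {
        if (s->ptr < 2)
            return 0;
        *a = s->dat[s->ptr - 1];
        *b = s->dat[s->ptr - 2];
        s->ptr -= 2;
    }
    return 1;
}
```

In this sketch the merged version also leaves the stack untouched when there are too few bytes for both values, whereas the separate version may already have removed the first one before the second pop fails.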