Finally fairly happy with VarvaraPSP's performance. Still a little sluggish in Oquonie, but playable if you're patient. A big thanks to @asie for PSP optimization ideas/support, and @bd for uxngba's screen_blit lookup table implementation! #uxn https://codeberg.org/tbsp/VarvaraPSP

Devine Lu Linvega
I'm surprised that it's slow on the PSP.. I'll go over oquonie this fall and make a bunch of optimizations.

Devine Lu Linvega
Could you try something for me? I've replaced the core with the old one, it's a bit slower but it's nicer. Could you pull it and try the PSP with it, and tell me if it's slower? If you have the time.
in reply to Devine Lu Linvega

Using my very rough "ticks per screen vector eval" metric, the old core uses 28% more ticks during the 100r animation on Oquonie startup, and 6% more ticks during major gameplay updates. Considering that compiler flags just cut those numbers by close to 2/3, it might be worth it for the clarity.

Using the simple screen_blit instead of uxngba's increases ticks by 40-60%, which is quite a bit, even if it makes the code far simpler.

Devine Lu Linvega
do you also see edge smearing with uxnemu?
in reply to tbsp

@neauoire As an aside, when using the newest Oquonie I'm still seeing edge smearing when using the updated color 0 behaviour. Are there outstanding updates there, or have I managed to produce a wonky binary?
in reply to Devine Lu Linvega

@neauoire

not bd, but was able to confirm edge smearing while testing varvarapsp and oquonie

in reply to kelp

damn, why am I not seeing edge smearing.. I'll look into it. thanks 😀 Do you have the latest commits of oquonie and uxn? Could I see a screenshot?
in reply to Devine Lu Linvega

@neauoire I see it in uxnemu with the Oquonie ROM I'm using on PSP, but I just tried to rebuild Oquonie from source and am getting strange results (messed-up palettes). So I'm probably doing something wrong.
in reply to tbsp

@neauoire I deleted my old save file and that seemed to fix it! So, everything looks fine, no smearing or anything unexpected. Sorry for confusion!
in reply to Devine Lu Linvega

6-28% for the CPU core, depending on the code being run, but yes. Those measurements are very rough though, and I wouldn't base too much on them. I've updated VarvaraPSP to allow using either core/blit based on flags for now.

Edit: to clarify, the simple screen_blit is 40-60% more demanding than the uxngba lookup/mask-based approach (when run on PSP), not compared to your recent inlined screen_blit.
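For readers unfamiliar with the "lookup/mask based approach", here is a rough sketch of the general technique. It is not uxngba's actual code: it assumes a layer buffer with one byte per pixel, handles a single 8-pixel row of a 1bpp sprite (leftmost pixel in the most significant bit), draws set bits in `color` and leaves clear bits transparent. All names (`blit_row_naive`, `blit_row_lut`, `row_mask`, `check_blit_rows`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Naive version: test every bit, one branch per pixel. */
static void blit_row_naive(uint8_t *dst, uint8_t row, uint8_t color)
{
    for (int i = 0; i < 8; i++)
        if ((row >> (7 - i)) & 1)
            dst[i] = color;
}

/* Lookup table: for every possible row byte, an 8-byte mask holding
   0xFF wherever the corresponding pixel is set. */
static uint64_t row_mask[256];

static void build_row_masks(void)
{
    for (int b = 0; b < 256; b++) {
        uint8_t bytes[8];
        for (int i = 0; i < 8; i++)
            bytes[i] = ((b >> (7 - i)) & 1) ? 0xFF : 0x00;
        /* memcpy keeps the same byte order as dst, whatever the endianness */
        memcpy(&row_mask[b], bytes, sizeof bytes);
    }
}

/* Table-driven version: one table load, one masked merge, no per-pixel branches. */
static void blit_row_lut(uint8_t *dst, uint8_t row, uint8_t color)
{
    uint64_t px;
    uint64_t mask = row_mask[row];
    uint64_t fill = 0x0101010101010101ULL * color; /* color replicated into every byte */
    memcpy(&px, dst, sizeof px); /* memcpy avoids unaligned 64-bit access */
    px = (px & ~mask) | (fill & mask);
    memcpy(dst, &px, sizeof px);
}

/* Both versions must produce identical pixels for every row byte and color. */
static void check_blit_rows(void)
{
    build_row_masks();
    for (int color = 0; color < 4; color++) {
        for (int b = 0; b < 256; b++) {
            uint8_t a[8] = {3, 2, 1, 0, 3, 2, 1, 0};
            uint8_t c[8] = {3, 2, 1, 0, 3, 2, 1, 0};
            blit_row_naive(a, (uint8_t)b, (uint8_t)color);
            blit_row_lut(c, (uint8_t)b, (uint8_t)color);
            assert(memcmp(a, c, sizeof a) == 0);
        }
    }
}
```

The table costs 2 KB (256 entries × 8 bytes) and trades eight data-dependent branches for a load and a few bitwise operations. On a 32-bit CPU the 64-bit merge is split into two 32-bit halves, so whether it wins on a given target is something to measure rather than assume.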

in reply to Devine Lu Linvega

@neauoire sorry for reviving this thread, but in my defense: I come bringing measurements

the on_screen event handler of my spaceship thingy over ~2M instructions (w/ chibicc): 42.8 -> 39.2 MIPS

10M instructions of generating and popping fibonacci: 97.1 -> 63.2 MIPS

it seems that in my use case the logic was dense with emulator-specific dei/deo handling

in reply to tbsp

ah! okay, well that's reassuring. I think I'll keep the abc-style core for now. it's a lot more readable for a reference implementation and the speed loss isn't too bad.
in reply to manifoldslug

@manifoldslug so from now on, let's call the classic core abc-style, and the register-style core tnl-style, so we know what we're talking about haha.
https://git.sr.ht/~rabbits/uxn/tree/main/item/etc/cores

These two cores have the exact same APIs now so they're interchangeable 🙌

in reply to Devine Lu Linvega

@neauoire agreed, actually I haven't looked at the playdate code in a while, but maybe we could add some of the uxngba optimizations to get oquonie running full speed
in reply to Bad Diode

that would be amazing! When we're back in Victoria this fall, releasing oquonie on the playdate will be our focus. I'd love to have your help on this
in reply to Devine Lu Linvega

@neauoire sure! I don't have a playdate to test, but if I recall the emulator worked decently well 😀
in reply to Bad Diode

yeah, the emulator is surprisingly solid, the devtools are great.

rhizomatic arcade
what about -O3 (I do not know how optimization flags work I just know that O3 exists and it's larger than 1 or 2) @tbsp @neauoire

in reply to rhizomatic arcade

@arcade https://merveilles.town/@bd/110854709921561932

Bad Diode
@neauoire O3 is generally not worth it: sometimes things break, and the assembly is generally larger. O2 is pretty much the standard and doesn't tend to cause issues, so I'm not sure why you wouldn't want the best codegen 😛 Sometimes you can choose to optimize for size, but in this case I don't think it's necessary.
in reply to Bad Diode

@neauoire I tried everything up to O3, but O2/O3 didn't seem to yield any notable gains. The way I'm profiling isn't great though, so it's very possible O2 is worth using, as it doesn't seem to break anything. Things run so much better now that I'll have to try lowering the CPU frequency to see how low I can go before dropping frames.
in reply to tbsp

I just realized something pretty cool this afternoon.

All the POP() POP() or PUSH() PUSH() sequences of macros in the abc core can be merged (a bit like in the tnl core), so the short-mode flag is checked only once, and overflow/underflow is checked only once too.

It makes the uxn binary quite a lot smaller too. It's kinda neat, check it out 😀

https://git.sr.ht/~rabbits/uxn/tree/main/item/src/uxn.c#L85
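To illustrate the merging idea, here is a small sketch on a simplified stack. The `Stack` type and the `pop_one`/`pop_two` functions are hypothetical and are not the actual macros in uxn.c; the sketch only assumes what the post describes, plus uxn's big-endian shorts. Popping two operands one at a time repeats the short-mode test and the underflow test, while the merged pop does each test once and computes both values from a single pointer adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stack: 8-bit cells, big-endian shorts, ptr = bytes in use. */
typedef struct {
    uint8_t dat[256];
    int ptr;
} Stack;

/* One value per call: every call re-tests short mode and underflow. */
static int pop_one(Stack *s, int short_mode, uint16_t *out)
{
    if (short_mode) {
        if (s->ptr < 2) return 0; /* underflow */
        s->ptr -= 2;
        *out = (uint16_t)(s->dat[s->ptr] << 8 | s->dat[s->ptr + 1]);
    } else {
        if (s->ptr < 1) return 0; /* underflow */
        *out = s->dat[--s->ptr];
    }
    return 1;
}

/* Merged: one mode test, one underflow test, one pointer update.
   b is the top of the stack, a is the value beneath it. */
static int pop_two(Stack *s, int short_mode, uint16_t *a, uint16_t *b)
{
    if (short_mode) {
        if (s->ptr < 4) return 0; /* underflow: stack left untouched */
        s->ptr -= 4;
        *a = (uint16_t)(s->dat[s->ptr] << 8 | s->dat[s->ptr + 1]);
        *b = (uint16_t)(s->dat[s->ptr + 2] << 8 | s->dat[s->ptr + 3]);
    } else {
        if (s->ptr < 2) return 0; /* underflow: stack left untouched */
        s->ptr -= 2;
        *a = s->dat[s->ptr];
        *b = s->dat[s->ptr + 1];
    }
    return 1;
}

/* Both paths must yield the same operands and final pointer (byte mode). */
static void check_pops(void)
{
    Stack s1 = {{0x12, 0x34, 0x56, 0x78}, 4};
    Stack s2 = s1;
    uint16_t a1 = 0, b1 = 0, a2 = 0, b2 = 0;
    int ok1 = pop_one(&s1, 0, &b1) && pop_one(&s1, 0, &a1); /* top first */
    int ok2 = pop_two(&s2, 0, &a2, &b2);
    assert(ok1 && ok2);
    assert(a1 == a2 && b1 == b2 && s1.ptr == s2.ptr);
}
```

A side effect of merging is that an underflow is detected before anything is consumed, instead of after the first of two pops has already moved the pointer.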
