SDL hardware acceleration

I have been researching a little bit about SDL hardware and software surface performance, to find a way to improve rendering in non-OpenGL contexts. Right now Player uses 20~30% CPU just to draw graphics (windowed mode, 640x480, 32 bpp, Core 2 Duo @ 1.9 GHz), which isn’t too bad. However, on older CPUs, Player won’t even reach 60 fps with the rendering alone (with my CPU limited to 800 MHz it maxes out at 40 fps). Right now (SVN rev. 754), almost 90% of CPU consumption goes to drawing. With OpenGL support, this is obviously not a problem.

Player now works like this (for those of you who don’t know, or aren’t sure): we load all graphics (AKA sprites) into system memory, do all the calculations, blit everything to a secondary 320x240 screen and zoom that screen up 2x directly onto another 640x480 surface, which is the one shown (flipped). To manipulate hardware surfaces efficiently, no direct pixel access can be done in real time. This means that we can no longer zoom “on the fly”, because we would end up with poor performance. Instead, all surfaces would be loaded into system memory, zoomed 2x and then stored in video memory. That will of course increase memory usage, but it would be hard to fill even a 32 MB graphics card with simple 2D. Large graphics that don’t need to be used lots of times can be stored in system memory and blitted to video memory with acceleration (I’ll explain this better later).
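The 2x zoom itself is just nearest-neighbor pixel doubling, done once at load time instead of every frame. A minimal sketch on raw 32-bpp buffers (`zoom2x` is a hypothetical helper for illustration, not Player’s actual code):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nearest-neighbor 2x zoom of a 32-bpp image.
   src holds w*h pixels; dst must have room for (2*w)*(2*h) pixels. */
static void zoom2x(const uint32_t *src, uint32_t *dst, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        uint32_t *row = dst + (2 * y) * (2 * w);
        for (int x = 0; x < w; ++x) {
            /* write each source pixel twice horizontally */
            row[2 * x]     = src[y * w + x];
            row[2 * x + 1] = src[y * w + x];
        }
        /* duplicate the doubled row to double vertically */
        memcpy(row + 2 * w, row, (size_t)(2 * w) * sizeof(uint32_t));
    }
}
```

In the scheme above, the zoomed result would then be converted/uploaded to a video-memory surface once, so per-frame work is only accelerated blits.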

There are still a few unanswered questions. The first is how to handle complex animations that require direct pixel access, like rotation, waving or changing hue. These certainly aren’t hardware accelerated, so the CPU will have to do the job. Is this going to be slow as hell, then? No! It just won’t be “as fast” as it would be in video memory. I have made a really quick application that uses SDL and the directx driver on Windows and shows which operations are hardware accelerated and which are not (Click here to see its code). Here’s the output:

[quote]Is it possible to create hardware surfaces? yes

Is there a window manager available? yes

Are hardware to hardware blits accelerated? yes

Are hardware to hardware colorkey blits accelerated? yes

Are hardware to hardware alpha blits accelerated? no

Are software to hardware blits accelerated? yes

Are software to hardware colorkey blits accelerated? yes

Are software to hardware alpha blits accelerated? no

Are color fills accelerated? yes

Total amount of video memory in Kilobytes: 850652

Video driver: directx[/quote]
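For reference, a probe like the one that printed the output above can be written with the stock SDL 1.2 API (`SDL_GetVideoInfo` exposes all of these flags). This is a sketch under that assumption, not necessarily the exact code behind the link; it needs a working video driver to run:

```c
#include <stdio.h>
#include <SDL/SDL.h>

int main(int argc, char *argv[])
{
    char driver[64];

    if (SDL_Init(SDL_INIT_VIDEO) < 0) {
        fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
        return 1;
    }

    /* Query the capabilities of the current video driver. */
    const SDL_VideoInfo *info = SDL_GetVideoInfo();

    printf("Is it possible to create hardware surfaces? %s\n",
           info->hw_available ? "yes" : "no");
    printf("Is there a window manager available? %s\n",
           info->wm_available ? "yes" : "no");
    printf("Are hardware to hardware blits accelerated? %s\n",
           info->blit_hw ? "yes" : "no");
    printf("Are hardware to hardware colorkey blits accelerated? %s\n",
           info->blit_hw_CC ? "yes" : "no");
    printf("Are hardware to hardware alpha blits accelerated? %s\n",
           info->blit_hw_A ? "yes" : "no");
    printf("Are software to hardware blits accelerated? %s\n",
           info->blit_sw ? "yes" : "no");
    printf("Are software to hardware colorkey blits accelerated? %s\n",
           info->blit_sw_CC ? "yes" : "no");
    printf("Are software to hardware alpha blits accelerated? %s\n",
           info->blit_sw_A ? "yes" : "no");
    printf("Are color fills accelerated? %s\n",
           info->blit_fill ? "yes" : "no");
    printf("Total amount of video memory in Kilobytes: %u\n",
           info->video_mem);
    printf("Video driver: %s\n",
           SDL_VideoDriverName(driver, sizeof(driver)));

    SDL_Quit();
    return 0;
}
```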

As we can see, software to hardware blits are accelerated (that is, a surface stored in system memory will be blitted to video memory very fast). What causes the really bad performance, then? Software to hardware goes fast because the CPU can write to video memory very quickly, but not the other way around: the CPU cannot read quickly from video memory, and that is the reason for many bottlenecks. This is because of how AGP buses work*. So, to summarize, large images/pictures should be stored and manipulated in system memory and then blitted to video memory.

The second question is: how do we handle transitions effectively? Transitions operate on the whole screen, which is in video memory. Since we cannot take the screen from video memory, modify it and blit it back, the solution is to freeze all drawing to prepare for the transition (so no more blits take place while we modify the screen), copy the screen surface to system memory, modify it there and then send every frame to the video screen. Transitions of this kind will make heavy use of the CPU, which is expected (no OpenGL support is being used here, remember) and may lower the fps on old computers. Fade-in (not fade-out) won’t be accelerated yet, either. That will come with SDL 1.3. :wink:
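One frame of such a software transition can be sketched on a raw 32-bpp buffer; `apply_fade` and the frozen-copy scheme are my own illustration of the idea, not Player’s actual code. Each frame, the frozen screen copy in system memory is scaled by the current fade level into a scratch buffer, which then gets blitted to video memory:

```c
#include <stdint.h>

/* Hypothetical helper: one frame of a software fade-to-black.
   `frozen` is the untouched screen copy kept in system memory;
   `dst` is the scratch buffer that gets blitted to video memory.
   `level` runs from 255 (fully visible) down to 0 (black).
   Pixels are 32 bpp, laid out as 0xAARRGGBB. */
static void apply_fade(uint32_t *dst, const uint32_t *frozen,
                       int count, uint8_t level)
{
    for (int i = 0; i < count; ++i) {
        uint32_t p = frozen[i];
        uint32_t r = ((p >> 16) & 0xFFu) * level / 255u;
        uint32_t g = ((p >>  8) & 0xFFu) * level / 255u;
        uint32_t b = ( p        & 0xFFu) * level / 255u;
        dst[i] = (p & 0xFF000000u) | (r << 16) | (g << 8) | b;
    }
}
```

Scaling from the untouched frozen copy every frame (rather than darkening the same buffer repeatedly) avoids compounding rounding errors over the transition.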

Finally, I have to say that if we are unable to get hardware surfaces (which happens whenever we don’t run in fullscreen; some video drivers won’t give hardware surfaces at all), this method will still work well, with the increased CPU usage that comes from losing hardware acceleration.

So tell me your opinion about this, and ask if you have any questions! I will be happy to answer.

*Source: linuxdevcenter.com/pub/a/linux/2 … _anim.html

Great post, Zhek. The wide variety of video drivers shows that we need some testing. Maybe a micro-benchmark on the first run could decide which rendering option to use by default if none is selected.

Now, the test. The shitty NVIDIA Unix driver sponsors this log:

Thank you, NVIDIA driver.

My CPU is a Centrino (Pentium M), only 1 core at 1.73 GHz. CPU usage in windowed mode with Compiz enabled: about 50%. At 320x240 windowed, CPU usage is about 25%.

Fortunately, OpenGL is available and is a serious rendering option to keep in mind. By the way, SDL 1.3 will support the XRender extension, which is great for 2D compositing, including alpha blits.

P.S.: Other SDL_VIDEODRIVER options for SDL: sdl.beuc.net/sdl.wiki/SDL_envvars

Thanks for your testing. Have you tried DGA? Because AFAICT, x11 doesn’t support hardware surfaces at all.
Try this and test again:

#ifdef __linux__
    putenv("SDL_VIDEODRIVER=dga"); /* must run before SDL_Init() */
#endif

Some interesting info from the x.org webpage:

[quote]The XFree86-DGA extension is an X server extension for allowing client programs direct access to the video frame buffer. This is a brief description of the programming interface for version 2.0 of the XFree86-DGA extension.

XFree86-DGA is not intended as a direct rendering API, but rather, as a mechanism to “get the X Server out of the way” so that some other direct rendering API can have full access to the hardware. With this in mind, DGA does provide clients some direct access to the hardware without requiring a separate rendering API, but this access is limited to direct linear framebuffer access. [/quote]
Source: x.org/archive/X11R6.8.2/doc/ … html#sect2

The DGA extension has been gone from the NVIDIA driver since 2006:

osdir.com/ml/linux.suse.multimed … 00076.html