view README.NanoX @ 1542:a8bf1aa21020

Fixed bug #15 SDL_blit_A.mmx-speed.patch.txt -- Speed improvements and a bugfix for the current GCC inline mmx asm code: - Changed some ops and removed some resulting useless ones. - Added some instruction parallelism (some gain) The resulting speed on my Xeon improved upto 35% depending on the function (measured in fps). - Fixed a bug where BlitRGBtoRGBSurfaceAlphaMMX() was setting the alpha component on the destination surfaces (to opaque-alpha) even when the surface had none. SDL_blit_A.mmx-msvc.patch.txt -- MSVC mmx intrinsics version of the same GCC asm code. MSVC compiler tries to parallelize the code and to avoid register stalls, but does not always do a very good job. Per-surface blending MSVC functions run quite a bit faster than their pure-asm counterparts (upto 55% faster for 16bit ones), but the per-pixel blending runs somewhat slower than asm. - BlitRGBtoRGBSurfaceAlphaMMX and BlitRGBtoRGBPixelAlphaMMX (and all variants) can now also handle formats other than (A)RGB8888. Formats like RGBA8888 and some quite exotic ones are allowed -- like RAGB8888, or actually anything having channels aligned on 8bit boundary and full 8bit alpha (for per-pixel alpha blending). The performance cost of this change is virtually 0 for per-surface alpha blending (no extra ops inside the loop) and a single non-MMX op inside the loop for per-pixel blending. In testing, the per-pixel alpha blending takes a ~2% performance hit, but it still runs much faster than the current code in CVS. If necessary, a separate function with this functionality can be made. This code requires Processor Pack for VC6.
author Sam Lantinga <slouken@libsdl.org>
date Wed, 15 Mar 2006 15:39:29 +0000
parents 26dafefeebb2
children
line wrap: on
line source

  =================================================================
  Patch version 0.9 of SDL(Simple DirectMedia Layer) for Nano-X API
  =================================================================
  
  Authors: Hsieh-Fu Tsai, clare@setabox.com
           Greg Haerr, greg@censoft.com

  This patch is against SDL version 1.2.4.
  It enhances previous patch 0.8 by providing direct framebuffer
  access as well as dynamic hardware pixel type support, not
  requiring a compile-time option setting for different framebuffer
  modes.
  Tested against Microwindows version 0.89pre9.

  Older Microwindows versions
  ===========================
  If running on a version older than Microwindows 0.89pre9,
  the following items might need to be patched in Microwindows.

  1. Patch src/nanox/client.c::GrClose()
  It fixes the client side GrClose(). In the original version, 
  GrOpen() can only be called once. When the GrOpen() is called at 
  the second time, the program will terminate. In order to prevent
  this situation, we need to insert "nxSocket = -1" after 
  "close(nxSocket)" in GrClose(). If you do not have this problem,
  you may skip this step. 

  2. Patch src/nanox/clientfb.c to return absolute x,y coordinates
  when using GrGetWindowFBInfo().  Copy the version 0.89pre9
  of src/nanox/clientfb.c to your system, or configure
  using --disable-nanox-direct-fb.

  =============
  Quick Install 
  =============

  1. ./configure --disable-video-x11 --disable-video-fbcon \ 
       --enable-video-nanox \ 
       --with-nanox-pixel-type=[rgb/0888/888/565/555/332/pal] 
  2. make clean 
  3. make 
  4. make install (as root) 

  ============
  Nitty-gritty 
  ============

  --enable-nanox-direct-fb       Use direct framebuffer access
  --enable-nanox-debug           Show debug messages 
  --enable-nanox-share-memory    Use shared-memory to speed up 

  When running multi-threaded applications using SDL, such
  as SMPEG, set THREADSAFE=Y in Microwindows' config file,
  to enable GrXXX() system call critical section support.

  =============================================
  Some programs can be used to test this patch. 
  =============================================

  1. http://www.cs.berkeley.edu/~weimer/atris (a tetris-like game) 
  2. http://www.libsdl.org/projects/newvox/
  3. http://www.libsdl.org/projects/xflame/
  4. http://www.libsdl.org/projects/optimum/ 
  5. http://www.gnugeneration.com/software/loop/ 
  6: http://www.lokigames.com/development/smpeg.php3 (SMPEG version 0.4.4)

  =========
  Todo List 
  =========

  1. Create hardware surface
  2. Create YUVOverlay on hardware
  3. Use OpenGL
  4. Gamma correction
  5. Hide/Change mouse pointer
  6. Better window movement control with direct fb access
  7. Palette handling in 8bpp could be improved

  =====================
  Supporting Institutes
  =====================
  
  Many thanks to go to Setabox Co., Ltd. and CML (Communication and
  Multimedia Laboratory, http://www.cmlab.csie.ntu.edu.tw/) in the 
  Department of Computer Science and Information Engineering of 
  National Taiwan University for supporting this porting project.
  
  Century Embedded Technologies (http://embedded.censoft.com)
  for this patch.

  ===================
  Contact Information
  ===================

  Welcome to give me any suggestion and to report bugs.
  My e-mail address : clare@setabox.com or niky@cmlab.csie.ntu.edu.tw
                      or greg@censoft.com