Memory usage

This page provides details on memory usage inside REDsdk. We'll review the different parameters that are important to be aware of when using REDsdk. Because REDsdk is a hybrid engine, capable of CPU and / or GPU rendering, memory management needs to be detailed to get a global understanding of the engine's behavior.

Running REDsdk hardware, hybrid or software

The RED::OPTIONS_RAY_ENABLE_SOFT_TRACER option value rules the overall behavior of REDsdk. This value is set on startup, as detailed here: Hardware, hybrid or software start. It also freezes the REDsdk memory management strategy.

Hardware image size limits

Keep this in mind: some hardware systems have limited image dimensions. For instance, it may not be possible to create a 32,768 x 16,384 pixel image on a given GPU, simply because such large dimensions consume more memory than the GPU can handle. In that case, REDsdk will silently reduce the size of the image before uploading it to the GPU, so that it has a chance to fit in video memory.

Recent GPUs rarely suffer from this problem, as the maximal image size they can handle is generally 16,384 pixels or above.

Exceeding the available video memory

Hardware or hybrid applications may, for one reason or another, exceed the amount of available video memory. First, on some systems there's no real video memory: it is shared with the main CPU memory, so no amount of memory is dedicated exclusively to graphics. Then, exceeding the amount of video memory will cause some slowdowns, but some applications can bear this, as the graphics driver does a good paging job here: it regularly uploads data from system memory to video memory.

REDsdk offers you several tricks to reduce the memory consumption. We review them below:

Handling the RED_DRV_ALLOC_FAILURE return code

A call to RED::IWindow::FrameDrawing may return RED_DRV_ALLOC_FAILURE. If this arises, the first thing to be aware of is that it's not fatal to the application. The application is still running normally, but it could not complete the rendering of the frame due to insufficient resources available on the GPU. So we need to reduce the memory usage on the GPU and start rendering again.

To do this we can change several parameters:

Turning on immediate mode geometry rendering

The RED::OPTIONS_IMMEDIATE_MODE option can be set to force REDsdk to render geometries using old-style OpenGL graphics through immediate mode calls. Every geometry will be processed from the CPU using glBegin(), glVertex(...), glEnd() sequences. This slows down the rendering a lot, but no video memory gets involved in the rendering of the geometries.

REDsdk can decide on a per-material basis whether to use immediate mode rendering or not. This is ruled by both the option and the RED::IMaterial::SetImmediateMode call.

Flushing geometries out of the GPU

This mechanism can be used to reduce the amount of video memory used by geometries on the GPU. Sometimes it may be useful for an application to keep geometries outside of any scene graph: think of geometries that need to be rendered on a certain event, or geometries that have been preloaded but not yet displayed. The default behavior of REDsdk is to load all these geometries on the GPU so that they are ready to draw once linked to a camera through a scene graph. So loaded geometries consume video memory, even if not rendered yet.

It's possible to remove geometries from the GPU by calling RED::IShape::RemoveFromGPU. Removed geometries won't consume any video memory until they get rendered again, after having been linked again to a scene graph being displayed.

Memory analysis tools

The RED::MemoryAllocator and RED::MemoryLeakTracker classes can be used to hunt unwanted memory allocations, or to get global statistics on the memory being used by REDsdk at some point during the life of the application.

Hardware driver overhead

The OpenGL driver provided by hardware vendors uses some memory too. This can be a concern for hardware-based or hybrid REDsdk applications. The OpenGL specification forces the driver to keep a copy of the data it manages, so that it can page data to the GPU whenever needed, or answer queries issued back by the calling application. This memory usage should be kept in mind when establishing the global memory footprint of an application: while not part of the application itself, it does have an effect.

Reducing data memory on the GPU

The amount of data loaded in the engine may become critical for large datasets that involve millions of triangles in the rendering of a single frame. If we consider a simple dataset with vertices, normals and a set of UV coordinates, its basic memory footprint on the GPU will follow - more or less - this equation:

Attribute   Byte size
Vertex      12 bytes (vertex coordinates) +
            12 bytes (normal coordinates) +
            8 bytes (UV values)
Triangle    12 bytes (indices for P0, P1 and P2)
Sum         44 bytes

So, if we assume that our data structure has roughly 1 vertex for 1 triangle (models with solids such as CAD models will use fewer vertices, models with many surfaces will use more, but on average we believe this ratio is a good starting point), we can store the following datasets in the given amounts of video memory:

Video memory   Number of triangles
128 MB         2.9 million
256 MB         5.8 million
512 MB         11.6 million
1 GB           23 million

Consequently, this may not be enough for many applications, or the video memory requirements may get too high and force the use of expensive GPUs.

Note that we can exceed the total video memory and still maintain correct performance, but the more we exceed the available video memory, the more the frame rate drops.

REDsdk can be fine-tuned to reduce the memory footprint of the loaded data in two ways:

Reducing geometry channels accuracy

Many applications do not need channels with full accuracy, as they visualize large datasets in real-time and do not use high quality shaders. Therefore, we can save a lot of memory by reducing the accuracy of our input channels. Let's consider the normals example. Normals are usually loaded using 3 floats. We can load them using unsigned bytes instead (see the hardware support list for channel formats below), after having remapped their values:

// Assuming our initial normal array is 'fnor' with 3 floats per vertex, and 'nb_vertices' vertices:
RC_TEST( imesh->SetArray( RED::MCL_NORMAL, NULL, nb_vertices, 3, RED::MFT_UBYTE, iresmgr->GetState() ) );

// Accessing our new normal array, re-encoding it:
unsigned char* unor;
RC_TEST( imesh->GetArray( (void*&)unor, RED::MCL_NORMAL, iresmgr->GetState() ) );

for( int i = 0; i < nb_vertices; i++ )
{
  unor[ 3 * i ]     = (unsigned char)( 255.0f * ( fnor[ 3 * i ] + 1.0f ) / 2.0f );
  unor[ 3 * i + 1 ] = (unsigned char)( 255.0f * ( fnor[ 3 * i + 1 ] + 1.0f ) / 2.0f );
  unor[ 3 * i + 2 ] = (unsigned char)( 255.0f * ( fnor[ 3 * i + 2 ] + 1.0f ) / 2.0f );
}

Then, a pair of vertex and pixel programs can be used to decode these parameters:

// Transmit input normals (RED_VSH_NORMAL = 2) to the pixel shader stage:
vsh.Add( "MOV result.texcoord[0], vertex.attrib[2];\n" );

// Decode and renormalize normals for a pixel shader usage:
psh.Temp( "normal" );
psh.Add( "ADD normal, fragment.texcoord[0], { -127.5 }.x;\n" );
psh.Normalize( "normal", "normal" );

The quality loss resulting from this compression is hardly visible for most models. This reduces our vertex cost from 44 bytes to 36 bytes, saving 8 bytes per vertex (we need to keep 4 unsigned bytes per vertex for memory alignment, otherwise performance drops!).

A side effect of this optimization is that, as we're using less memory on the GPU, the frame rate may slightly improve: the GPU has to move less memory to render a frame.

Most of the time, the same kind of technique can be applied to UVs. If UVs are bounded - which is very often the case - then we can consider using short values or unsigned bytes again to reduce the memory footprint of a single vertex.

Reducing triangle index space

REDsdk uses an implicit internal optimization for all meshes that have fewer than 65536 vertices: index arrays are loaded on the GPU using unsigned short values rather than int values. This halves the amount of memory used by indices.

Summary after reduction

If we apply all these optimizations, then our numbers become:

Attribute   Byte size
Vertex      12 bytes (float vertex coordinates) +
            4 bytes (unsigned byte normal coordinates) +
            4 bytes (unsigned short UV values)
Triangle    6 bytes (unsigned short indices for P0, P1 and P2)
Sum         26 bytes

And our average capacity is raised to:

Video memory   Number of triangles
128 MB         5.05 million
256 MB         10.1 million
512 MB         20.2 million
1 GB           40.4 million