The memory-analyzing tools tell you how much total memory a process is using,
the sizes of its memory segments, and the history and breakdown of its heap usage. This knowledge helps you determine
what programming steps are needed to reduce an application's memory footprint, which can greatly improve performance.
Memory efficiency is often critical in embedded systems, where memory is limited (especially with the absence of swapping)
and many processes need to run continuously. The optimization steps you'll want to take depend on what the analysis results
reveal about memory type distribution. For example, you can spend considerable time optimizing the heap, but if your
program uses more static memory than it should, that problem must be addressed as well.
Memory distribution of processes
Virtual memory occupied by a process is separated into these categories:
- Code — Executable code (instructions) belonging to the application or static libraries.
- Shared Code — Executable code from shared libraries. If many processes use the same library, their virtual
segments containing its code are mapped to the same physical segment.
- Data — A data segment for the application and data segments for the shared libraries.
This memory type is usually referred to as static memory.
- Stack — Memory required for function stacks (there's one stack per thread).
- Heap — All memory dynamically allocated by the process.
- Shared Heap — Other memory allocated by different means, including shared and mapped memory.
The IDE has several tools for viewing process memory distribution. In the System Information perspective,
the Memory Information view shows the memory breakdown
by type and provides details about individual segments. Note that type is different from virtual memory category;
the correspondence is given in How memory types relate to virtual memory categories.
You can view the heap distribution through the
Malloc Information view, which displays the used, overhead, and free heap memory sizes.
The Memory Analysis tool graphs this same information as well as all heap allocations and deallocations, in an
interactive editor window.
Through the Valgrind UI controls, you can run Massif to collect heap snapshots,
then analyze the heap breakdown measured at the detailed snapshots.
After examining the memory distribution data with these tools, you should focus on the areas of high consumption for
nonshared memory. Note that nonshared memory can include stack and heap memory used by shared libraries.
This term covers anything not created as a shared memory object, a concept explained in the
Shared memory entry of the System Architecture guide. Optimizing shared
memory is unlikely to notably reduce the overall memory consumption on the target machine.
The techniques for improving memory efficiency vary greatly across memory types.
We outline some of these techniques below.
Heap optimizations
You can use the following techniques to optimize the heap:
- Eliminate explicit memory leaks
- The easiest way to begin optimizing the heap is to eliminate explicit memory leaks, which occur when allocated
blocks become inaccessible because the program fails to keep valid pointers to them.
Memory Analysis lets you check for leaks at fixed intervals; it outputs a list of memory errors
and tags any leaks with a keyword.
Valgrind Memcheck can check for specific leak types,
to identify leaks resulting from incorrect pointer values or broken pointer chains.
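As a minimal sketch of this kind of leak (the function names are hypothetical), the only
pointer to a block is overwritten before the block is freed, so the block can never be reclaimed:

    #include <stdlib.h>

    static char *buffer;                /* the only reference to the allocation */

    static void setup(void)
    {
        buffer = malloc(256);           /* first allocation */
    }

    static void reconfigure(void)
    {
        /* BUG: the old block's only pointer is overwritten here, so the
           256-byte block becomes unreachable -- an explicit leak that a
           checker such as Memcheck reports as "definitely lost". */
        buffer = malloc(512);
        /* FIX: call free(buffer) before reassigning it. */
    }

    int main(void)
    {
        setup();
        reconfigure();
        free(buffer);                   /* frees only the second block */
        return 0;
    }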
- Eliminate implicit memory leaks
- After fixing the explicit leaks, you should fix the implicit leaks.
These are leaks caused by heap objects that keep growing in size but remain accessible through pointers.
To find such cases, Memory Analysis lets you
filter the results to see only events
for unmatched allocations or deallocations or for blocks that remain in memory for the program's duration.
Viewing these events lets you find places where the program is steadily accumulating memory.
- Valgrind Massif gathers heap data that reveal the change in heap breakdown over time, which helps you spot increasing memory usage at precise
locations. Note that the Valgrind User Manual refers to these situations as space leaks.
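A minimal sketch of an implicit leak (the cache here is hypothetical): the list stays
reachable through a global pointer, but it only ever grows:

    #include <stdlib.h>

    struct entry {
        struct entry *next;
        char payload[64];
    };

    static struct entry *cache;         /* always reachable, so never "lost" */

    static void record_event(void)
    {
        /* Every event prepends a node, but nothing is ever evicted, so the
           heap grows for the program's entire run -- what the Valgrind
           User Manual calls a space leak. */
        struct entry *e = malloc(sizeof *e);
        if (e == NULL)
            return;
        e->next = cache;
        cache = e;
    }

    int main(void)
    {
        for (int i = 0; i < 100000; i++)
            record_event();
        /* FIX: bound the cache, or free entries once they're no longer needed. */
        return 0;
    }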
- Reduce heap fragmentation
- Heap fragmentation occurs when a process accumulates many free blocks of varying size in noncontiguous addresses.
In this case, the process will often allocate another physical page even if it seems to have enough free memory.
- The QNX Neutrino memory allocator already solves most of this problem by preallocating many
small, fixed-size blocks known as bands. Using bands lets the allocator quickly find a free block
that fits the request size well, thereby minimizing fragmentation.
- In the Memory Analysis editor,
you can inspect the heap fragmentation by reviewing the Bins or Bands graphs.
An indication of serious fragmentation is if the number of free blocks of smaller sizes grows over time.
To deal with this, you can reorder heap allocations in your program. By allocating the largest blocks first,
you'll reduce how often the allocator must divide large blocks into smaller ones. Once a large block is split,
the resulting small blocks can't later satisfy requests for big blocks, because their addresses aren't contiguous.
- If your program logic allows for it, you can store data in multiple smaller structures that each fit within
the largest preallocated band size (typically, 128 bytes). Whenever a request exceeds this size, the block is
allocated in the general heap list, which means a slower allocation and more fragmentation.
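A sketch of both ideas, with illustrative sizes only:

    #include <stdlib.h>

    int main(void)
    {
        /* Allocate the large, long-lived buffers first, so the allocator
           doesn't have to split large free blocks later on. */
        char *frame   = malloc(64 * 1024);
        char *scratch = malloc(16 * 1024);
        if (frame == NULL || scratch == NULL)
            return EXIT_FAILURE;

        /* Keep frequent, short-lived requests at or below the largest band
           size (typically 128 bytes) so they're served from the bands
           instead of the general heap list. */
        for (int i = 0; i < 1000; i++) {
            char *msg = malloc(128);
            /* ... use msg ... */
            free(msg);
        }

        free(scratch);
        free(frame);
        return EXIT_SUCCESS;
    }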
- Reduce the overhead of allocated objects
- There are several sources of overhead for heap-allocated objects:
- User overhead — The application might request more heap memory than it really needs.
This often results from predictive algorithms, such as those used by realloc().
You can reduce this overhead by better estimating the average data size. To do this for a particular
call chain, examine the related allocation backtraces in the Memory Backtrace view.
Or, if your data model allows it, shrink the block to the object's actual size
once the data stops growing.
- Padding overhead — In programs that run on processors with alignment restrictions,
the fields in a struct type can get arranged in a way that makes the overall size of
the structure larger than the sum of the sizes of its individual fields.
You can save some space by rearranging the fields; usually, it's better to put fields of the same type
together. You can measure the result by writing a sizeof test (a sketch follows this list). Typically, this task
is valuable when the resulting overall size matches a preallocated band size (see below).
- Block overhead — Sometimes there's extra space in heap blocks because the memory allocated is more
than what's requested. In the Memory Analysis results, the Memory Events view shows
the requested versus actual allocation sizes and the
Usage tab shows
what percentage of the heap is overhead (extra space). Whenever possible,
choose an allocation size that matches a size for preallocated bands (you can see their sizes in
the Bands tab),
especially for realloc() calls. Also, if you can, try to align data structures with
these band sizes.
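For the padding overhead described above, a sizeof test like the following shows the saving from
grouping fields of the same type (the sizes in the comments assume 4-byte int alignment):

    #include <stdio.h>

    /* Fields in a careless order: each char is followed by padding so the
       next int lands on a 4-byte boundary. */
    struct padded {
        char a;          /* 1 byte + 3 bytes padding */
        int  x;
        char b;          /* 1 byte + 3 bytes padding */
        int  y;
    };                   /* typically 16 bytes */

    /* The same fields grouped by type: only 2 bytes of tail padding. */
    struct reordered {
        int  x;
        int  y;
        char a;
        char b;          /* + 2 bytes tail padding */
    };                   /* typically 12 bytes */

    int main(void)
    {
        printf("padded:    %zu bytes\n", sizeof(struct padded));
        printf("reordered: %zu bytes\n", sizeof(struct reordered));
        return 0;
    }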
- Tune the allocator
- Occasionally, application-driven data structures have fixed sizes and you can improve memory efficiency by
customizing the allocated block sizes. Or, your application may experience free-block overhead,
where the code has freed a lot of memory but the process has returned few pages to the system.
This happens when heap usage never drops below the low watermark that triggers the process to
return pages. In these two cases, you must either write your own allocator (a minimal sketch
follows this entry) or contact QNX Software Systems to obtain a customizable allocator.
- To estimate the benefits of custom block sizes, configure Memory Analysis to report the allocation counts
for the appropriate size ranges, by setting the Bins counters field in the
Memory Snapshots
controls. Then, examine the Bins tab in the analysis results to see the distribution of heap objects
within the bins (size ranges) that you specified.
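As a rough illustration of the fixed-size case, here's a minimal free-list pool allocator.
This is only a sketch (no thread safety or pool growth), not the customizable allocator mentioned above:

    #include <stdlib.h>

    /* Carve one slab into equal slots and chain the free ones together.
       SLOT_SIZE would match the application's fixed object size. */
    #define SLOT_SIZE  48
    #define SLOT_COUNT 1024

    union slot {
        union slot *next;               /* valid while the slot is free */
        char        payload[SLOT_SIZE];
    };

    static union slot *free_list;

    int pool_init(void)
    {
        union slot *slab = malloc(SLOT_COUNT * sizeof *slab);
        if (slab == NULL)
            return -1;
        for (int i = 0; i < SLOT_COUNT - 1; i++)
            slab[i].next = &slab[i + 1];
        slab[SLOT_COUNT - 1].next = NULL;
        free_list = slab;
        return 0;
    }

    void *pool_alloc(void)
    {
        union slot *s = free_list;
        if (s != NULL)
            free_list = s->next;
        return s;                       /* NULL when the pool is exhausted */
    }

    void pool_free(void *p)
    {
        union slot *s = p;
        s->next = free_list;
        free_list = s;
    }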
Code optimizations
In embedded systems, it's very important to optimize the size of an executable or library binary
because the binary occupies not only RAM but also expensive flash memory. You can use the following techniques:
- Ensure that the binary file is compiled without debug information when you measure it.
Debug information is the largest contributor to file size.
- Strip the binary to remove any remaining symbol information.
- Remove any unused functions.
- Find and eliminate code clones.
- Try setting compiler optimization flags (e.g., -O, -O2).
Note that there is no guarantee that the code will be smaller; it can actually be larger in some cases.
- Don't use the char type to perform int arithmetic, particularly for local variables (see the sketch after this list).
Converting between these types requires the compiler to insert extra code, which affects performance and code size,
especially on ARM processors.
- Bit fields are also expensive in arithmetic on all platforms; it's better to use explicit bitwise operations
to avoid the hidden costs of conversions.
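To illustrate the char-arithmetic point above, compare these two loops; the char counter forces
the compiler to truncate the value back to 8 bits after each increment:

    /* Avoid: a char loop counter must be re-truncated to 8 bits after
       each increment, which costs extra instructions on ARM. */
    int sum_char(const int *v)
    {
        int total = 0;
        for (unsigned char i = 0; i < 100; i++)
            total += v[i];
        return total;
    }

    /* Prefer: an int counter matches the native register width. */
    int sum_int(const int *v)
    {
        int total = 0;
        for (int i = 0; i < 100; i++)
            total += v[i];
        return total;
    }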
Data optimizations
Static memory can produce significant overhead, similar to heap or stack memory.
You can take some steps to reduce the size of an application's data segments:
- Inspect global arrays that consume a lot of static memory. It may be better to use the heap, particularly for
objects that aren't needed for the program's entire lifetime (see the sketch after this list).
- Find and remove unused global variables.
- Determine if any structures have padding overhead.
If so, consider rearranging their fields to achieve a smaller overall size.
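A sketch of the first point (the names are hypothetical): a large global table is moved from the
data segment to the heap and held only during the phase that needs it:

    #include <stdlib.h>

    /* Before: 256 KB of static memory held for the program's lifetime.
       static int lookup_table[64 * 1024];                              */

    static int *lookup_table;           /* after: heap, allocated on demand */

    int import_begin(void)
    {
        lookup_table = malloc(64 * 1024 * sizeof *lookup_table);
        return lookup_table != NULL ? 0 : -1;
    }

    void import_end(void)
    {
        free(lookup_table);             /* the 256 KB is released again */
        lookup_table = NULL;
    }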
Stack optimizations
Sometimes, it's worth the effort to optimize the stack. For example, your application may have frequent high peaks
in stack activity, meaning that large stack segments constantly get mapped to physical memory.
These situations can be hard to detect through conventional testing. Although the program might run properly during
testing, the system could fail in the field, likely when it's busiest and needed the most.
You can watch the Memory Information view for stack allocation statistics and then locate and fix code that
uses the stack heavily. Typically, heavy stack usage occurs in two situations: recursive calls, which should be avoided in
embedded systems, and usage of many large local variables, such as arrays kept on the stack.
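A sketch of both patterns (the names are hypothetical): moving a large local array onto the heap,
and replacing recursion with iteration:

    #include <stdio.h>
    #include <stdlib.h>

    /* Avoid: 32 KB of stack consumed per call. */
    void process_stack_heavy(FILE *f)
    {
        char buf[32 * 1024];
        fread(buf, 1, sizeof buf, f);
        /* ... */
    }

    /* Prefer: the large buffer lives on the heap instead. */
    void process_stack_light(FILE *f)
    {
        char *buf = malloc(32 * 1024);
        if (buf == NULL)
            return;
        fread(buf, 1, 32 * 1024, f);
        /* ... */
        free(buf);
    }

    /* Avoid: one stack frame per level of recursion. */
    unsigned long fact_recursive(unsigned n)
    {
        return n <= 1 ? 1 : n * fact_recursive(n - 1);
    }

    /* Prefer: the iterative version uses constant stack space. */
    unsigned long fact_iterative(unsigned n)
    {
        unsigned long r = 1;
        while (n > 1)
            r *= n--;
        return r;
    }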