
The Myth of Smart Pointers

[Image: Yad, or Torah pointer]

On page 206 of Beginning C++17 (ISBN 9781484233665, published in 2018) you will find the following advice about smart pointers:

Always use either the std::vector<> container (to replace dynamic arrays) or a smart pointer (to dynamically allocate objects and manage their lifetimes). These high-level alternatives are much, much safer than the low-level memory management primitives and will help you tremendously . . .

Quote from the book

While using std::vector<> is usually good, the opinion expressed on smart pointers is just that, an opinion, and it is baseless. Repeating it is damning generations of C++ programmers to tracking down random crashes.

QML Programs Have Been Crashing From This for Years

You really should read this post before continuing. In fact I’m going to include the drunk driving across all lanes image again here.

[Image: drunk driving across all 3 lanes]

You create an object with unique_ptr<T> in C++ and need to use it in some QML screen, say for a flickable list. QML, being one of the most incapable languages ever created, can't deal with it, so it has to hand it off to JavaScript. It doesn't copy the object; it copies the address.

The C++ lane has no idea the other lanes are still using the thing. When the unique_ptr<> that owns it is destroyed or reset, C++ decides it no longer has any need for the object and nukes it. That moment is deterministic on the C++ side, but nobody tells QML or JavaScript, so from their side there is no good way of predicting when it will happen. Later on QML decides it needs to refresh the display of that flickable list because some pixel might have changed. It asks JavaScript for the data to be re-rendered. Oops!

Depending on your OS and run-time, that memory might still be owned by your process, so you get garbage (or silently corrupt something else) instead of a crash. If the run-time has handed those pages back to the OS, you get an access violation.

Oh, I Just Won’t Use QML

While that is admirable, it doesn’t solve your problem.

Do you know the programming language used to write every library you use in your program? Was it C? C doesn't have unique_ptr<>; that's a C++ thing. The Linux kernel and many of the libraries you link with are written in C.

Most languages supported by one of the GNU compilers can generate libraries with functions using the C calling convention. On OpenVMS we had the DEC calling standard and all languages used it. We routinely had COBOL, BASIC, FORTRAN, and C modules working together in the same executable. It does not matter if you call it an object in C++; to most other languages you are simply passing the address of a buffer.

But You Can’t Copy a unique_ptr<>

That’s a steaming pile of excrement.

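The usual claim is that you cannot make two unique_ptr<> objects with the same pointer value within the same compilation unit. This also isn't true. The copy constructor is deleted, so auto b = a; won't compile, but nothing stops you from pulling the raw pointer out with get() and wrapping it in a second unique_ptr<>. The result is a double delete, which is undefined behavior, and it is not something the compiler checks. I'm on the latest version of Linux Mint with all updates applied.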

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cstring>
#include <memory>

#include "config.h"

//*********
//	Yes you can "copy" a unique_ptr (by re-wrapping its raw pointer)
//*********

// make this global to simulate data held by something else
char *BUFFER_PTR {};

void fun1( char *txt);
void fun2();

int main(int argc, char **argv) {
	std::cout << "UniquePtr1" << std::endl;
	std::cout << "Version " << UniquePtr1_VERSION_MAJOR << "." << UniquePtr1_VERSION_MINOR << std::endl;

	{
		// establish a buffer of nulls
		std::unique_ptr<char[]> uptr {new char [2048] {}};
		fun1( uptr.get());
		std::cout << uptr.get() << std::endl;
	}
	std::cout << "out of scope now\n";

	fun2();
	std::cout << BUFFER_PTR << std::endl;

	return (EXIT_SUCCESS);
}

// this simulates something a thread might do with a pointer.
//
void fun1( char *txt)
{
	BUFFER_PTR = txt;
	strcpy( BUFFER_PTR, "Mary had a little lamb");
#if 0
	std::unique_ptr<char[]> ghost(txt);
	strcpy( ghost.get(), "just some text");
#endif
}

// uptr freed this buffer when it went out of scope in main(), so this
// write is a use-after-free: undefined behavior that may or may not crash
//
void fun2()
{
	strcpy( BUFFER_PTR, "Rock-em Sock-em Robots");
}

I built this in Eclipse as one of its CMake projects, which is why you see config.h and the initial std::cout lines. Run as-is, it “works.”

UniquePtr1
Version 0.1
Mary had a little lamb
out of scope now
Rock-em Sock-em Robots

If going out of scope handed the memory straight back to the operating system, this would have crashed. The buffer BUFFER_PTR points to was freed when uptr went out of scope, so fun2() is copying into memory we no longer own, yet there is no access violation. Now, let's change #if 0 to #if 1.

I kind of “not like” how Eclipse puts the exit/error at the top of the output window.

What Happened?

This is a case of freeing memory not meaning what you think. This is why every college worth even a fraction of its tuition makes you take an Assembly language class. They also make you take an operating systems class so you can understand the different kinds of Heap.

A cold allocation of RAM from the Far Heap, or Free Store, or System RAM, depending on what your OS calls it, is expensive. There are process level limits that have to be checked and other system level decisions before that chunk of RAM is assigned to your process. Every modern OS has a minimum allocation size, usually a memory Page.

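Once allocated, that unit of RAM generally remains with your process unless you make an OS-specific call to force de-allocation. (Allocators do return some memory on their own; glibc, for example, unmaps large blocks it obtained with mmap() when they are freed. Some Operating Systems will also request available RAM back from running processes when they get desperate, but let's not have that discussion now.)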

Near Heap

Long ago, during the horrible days of Segment::Offset addressing, near and far heap meant something different. It had to do with the Segment portion of the address.

In the modern world this is the RAM already allocated to your process but unused. Some stack-based programming languages put the stack out here so it can “grow.” Every compiler for every language I’ve ever worked with during modern times utilizes this and tends to refer to it as Near Heap. Far Heap (in today’s world) is memory you have to request from the operating system. That’s expensive per above, and it will add a minimum allocation unit of RAM to your process. The standard C/C++ run-time basically does this for you behind the scenes.

When you “free” RAM, whether through a smart pointer's destructor, delete, or free(), it gets added to your Near Heap. Your process still owns it. The next allocation may re-purpose it.

The point is, the Operating System gives your process another allocation unit of RAM and the C/C++ run-time can merrily bust it up any way it wants. As long as we stay inside that page (or whatever the allocation unit was), we don’t get access violations, even when the bytes we touch have already been freed.

Double Free

We got this error because the “compiler will only allow one unique_ptr per address” mantra is mostly kaka. Deletion happens as soon as a unique_ptr goes out of scope, but our RAM was not returned to the Operating System. It was returned to the Near Heap: memory our process has but isn’t using.

When line 43 is changed to #if 1, the ghost unique_ptr in fun1() deletes the buffer the first unique_ptr allocated as soon as fun1() returns. Back in main() we can still read that memory without an access violation because it hasn’t left our process. (Note: single-core processors running simple DOS-like operating systems will get different results. Bare metal can’t use this code because we are doing screen I/O.)

The closing brace of the inner block in main() (line 28) is where the original unique_ptr tries to give up the same RAM a second time, and the run-time realizes the RAM has already been freed.

Is This Contrived?

No! I run into this exact problem in people’s code for medical devices. Someone thinks it is a good idea to use Smart Pointers then passes the value to library functions written in C and other languages. Usually these things are in their own threads. It can be days between the delete and the crash. What “causes” the crash per observational debugging won’t seem to have any obvious pattern.

When trying to track down a ghost-like crash, nuke all of the Smart Pointers in the code. Make them raw pointers and delete them when you know it is safe. Before each manual delete, fill the memory with a value unique to that pointer, and be certain to null out the pointer you just deleted. The unique values make it a lot easier to track down which pointer was actually the culprit.

Roland Hughes started his IT career in the early 1980s. He quickly became a consultant and president of Logikal Solutions, a software consulting firm specializing in OpenVMS application and C++/Qt touchscreen/embedded Linux development. Early in his career he became involved in what is now called cross platform development. Given the dearth of useful books on the subject, he became a professional author in 1995, writing the first of the "Zinc It!" book series for John Gordon Burke Publisher, Inc.

A decade later he released a massive (nearly 800 pages) tome "The Minimum You Need to Know to Be an OpenVMS Application Developer" which tried to encapsulate the essential skills gained over what was a nearly 20-year career at that point. From there "The Minimum You Need to Know" book series was born.

Three years later he wrote his first novel, "Infinite Exposure," which got much notice from people involved in the banking and financial security worlds. Some of the attacks predicted in that book have since come to pass. While it was not originally intended to be a trilogy, it became the first book of "The Earth That Was" trilogy:
Infinite Exposure
Lesedi - The Greatest Lie Ever Told
John Smith - Last Known Survivor of the Microsoft Wars

When he is not consulting, Roland Hughes posts about technology and sometimes politics on his blog. He also has regularly scheduled Sunday posts appearing on the Interesting Authors blog.

2 thoughts on “The Myth of Smart Pointers”

  1. I never use such “smart pointers”. Instead, I use “polymorphic objects”, which implement the “envelope/letter” idiom originally described by James Coplien. This idiom employs an “envelope” class that creates a “letter” class object having the same interface and forwards all operations to that letter object. Unfortunately, due to the lack of language support in C++, this requires a fair amount of boilerplate code to implement the forwarding, but it provides the inestimable advantage that you don’t need to deal with pointers anywhere in the application code. All destruction of letter objects is of course handled in the destructor of the envelope object so there is no problem with dangling pointers. Since there is a user count in the letter object that is managed by the envelope object, copying the letter object is fast and non-hazardous.
    I think if we ever get “.” overloaded, this will be a lot easier to implement but the increase in reliability is worth it even with the extra effort.

    1. I’m not a fan of the envelope/letter paradigm. I’m a much bigger fan of global data in a singleton class. If, as you say, “copying the letter object is fast and non-hazardous,” something seems horribly rotten with the implementation. If you have 5+ threads and 3 want to copy while 2 want to destroy all at the exact same time, you’ve got some nasty mutex (or other locking) to deal with. A global singleton can single-thread access no matter how many threads tell it to do something.

      Sadly, most of this in memory stuff exists because too few students and academics learn how to properly use a relational database.
