
C++ – Defaults and Deletes – Pt. 2

Before we go much further with this discussion on Defaults and Deletes, I need you to read the section titled Near Heap in this blog post. It’s a short read with pictures. Then you need to go thank the fools . . . err . . . “learned academics”, cough cough, who re-purposed the word heap to be a binary tree data structure.

To understand why things are the way they are, I need you to understand some things you would have learned had you actually taken an Assembly Language class. I also need you to, at least temporarily, ignore what the . . . “learned academics” used the name heap for. Adding insult to injury, you will find many on-line experts and official C++ documentation sites using the word stack for near heap and heap for far heap.

PSECT and DSECT

On real computers with real operating systems, when you LINK objects into an executable you can add a /MAP (syntax varies) qualifier to the LINK command. Besides producing the executable binary, the linker also outputs a .MAP file.

Link map of simple COBOL program

Let me show you one for a BASIC language program as well.

Link map of simple BASIC program

I’m not going to drag you through the attribute details. Honestly I don’t remember the difference between NOPIC and PIC off the top of my head. Each PSECT has a name. Some platforms physically have DSECT terminology and others identify them as a PSECT with WRT, NOEXE attributes. (WRT = Write).

C program link map

This is one of the main distinctions between a real operating system and a home hobby platform. When you are running a program on a real operating system as a mere mortal user and your un-initialized pointer tries to write a block of text into a NWRT PSECT owned by your process, the OS kills your process without a second thought. On most x86 platforms that doesn’t happen.

Stack

In case it isn’t clear, PSECT is short for Program Section and DSECT is short for Data Section. In the DOS and GUI DOS (Windows) worlds where you set stack size at either compile or link time, you create a DSECT of that size which gets bound into your executable. During the early years that wasn’t just a loader value but a DSECT physically linked into your executable. If you needed 30K of stack, your executable was going to be well over 30K in size. I do not know if Microsoft has since moved this to be a loader function. I also don’t know if Windows has managed to set user process level memory limits, kind of doubt it though.

Linux doesn’t create a stack DSECT in the executable, at least not like the above. Instead it has ulimit. There tends to also be a limits.conf file which has the defaults for boot.
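If you want to see what your own process was given, here is a minimal sketch, assuming a POSIX system (Linux included). It asks the OS for the same soft limit that ulimit -s reports, except in bytes instead of kilobytes. The function names soft_stack_limit_bytes and print_stack_limit are mine, made up for illustration; getrlimit and RLIMIT_STACK are the real POSIX pieces.

```cpp
#include <sys/resource.h>   // getrlimit, RLIMIT_STACK (POSIX only)
#include <cstdio>

// Returns the soft stack limit for this process in bytes,
// or 0 if the query fails. An "unlimited" stack comes back
// as RLIM_INFINITY, which is simply a very large value.
unsigned long long soft_stack_limit_bytes()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return 0;   // query failed
    }
    return static_cast<unsigned long long>(lim.rlim_cur);
}

// Convenience wrapper that prints the value.
void print_stack_limit()
{
    std::printf("soft stack limit: %llu bytes\n", soft_stack_limit_bytes());
}
```

Call print_stack_limit() from main, then change the limit with ulimit -s in the same shell and run again. The number follows the shell setting, not anything bound into the executable.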

Stack Based Machine vs. Stack Based Language

Here is a great book on stack and other computers. I want to quote one short paragraph from it.

Three different approaches to computer design are used as reference points for this chapter. The first reference point is that of the Complex Instruction Set Computer (CISC), which is typified by Digital Equipment Corporation’s VAX series and any of the microprocessors used in personal computers (e.g. 680×0, 80×86). The second reference point is the Reduced Instruction Set Computer (RISC) (Patterson 1985) as typified by the Berkeley RISC project (Sequin & Patterson 1982) and the Stanford MIPS project (Hennesy 1984). The third reference point is that of stack machines as described in the preceding chapters.

Philip Koopman, 1989

Despite all of the claims of RISC computers from back in the day, pretty much every computer you work on today is CISC, whatever its claims of being RISC. They will have both registers and a program execution stack but they are not considered stack based machines by purists. I don’t wish to go down that rabbit hole. You really should give a quick read of my post titled The Closed Question now though.

Yes, the hardware and the operating system will mandate some use of the stack, but languages define how they use the stack. Pascal uses it differently than C, and COBOL uses it differently than both of them. In the COBOL map above you see multiple DSECTs (NOSHR, NOEXE, WRT) declared. For BASIC and C you don’t find them because I didn’t declare any global or static data.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>

int main( int argc, char *argv[])
{

    time_t  the_time;
    unsigned long   u_l_x;

    long        l_x, l_time;

    the_time = ULONG_MAX;
    u_l_x = the_time / 60u; // how many minutes
    u_l_x /= 60u;       // how many hours
    u_l_x /= 24u;       // how many days
    u_l_x /= 365u;      // how many years
    u_l_x += 1970u;     // add to base year

    printf( "time_t will run out of precision ");
    printf( "around the year %lu\n", u_l_x);


    //
    //  Some platforms had it as a long not unsigned long initially
    //
    l_time = LONG_MAX;
    l_x    = l_time / 60;
    l_x    /= 60;
    l_x    /= 24;
    l_x    /= 365;
    l_x    += 1970;

    printf( "the old signed long version will run out of ");
    printf( "precision around the year %ld\n", l_x);

    return 1;

}
    

Stack Overflow

You all think StackOverflow is just a Web site. It was, and still is, a constant problem on Microsoft operating systems. Under DOS, where we had Compact, Small, Medium, and Large memory models limiting the amount of stack we could have, we did constant battle with it.

The stack is the near heap. When you have something like the following code snippet

void c_zill_browse_sub( int *fms_status, int *rms_status, 
                        int *tca_array, int *workspace_array)
{
    int                     l_x, l_load_direction;
    int                     l_action, l_y;
    char                    line_in[255], command_str[255];
    FILE                    *in_file;

    $DESCRIPTOR( command_str_desc, command_str);

    struct  drawing_record  m_z;
    struct  FAB             mega_fab;
    struct  RAB             mega_rab;
    struct  XABKEY          mega_xab;

    struct  browse_screen_struct    screen_rec[L_BROWSE_SCREEN_COUNT];

    $DESCRIPTOR( form_name_desc, "ZILL_BROWSE");


    //
    //  attach to the mega file
    //
    l_x = open_mega_idx( &m_z, &mega_fab, 
                         &mega_rab, &mega_xab, FAB$M_GET);

every one of those “local” variables is allocated on the stack (near heap). On Linux systems the stack limit is generally a pretty big value. On GUI DOS systems it is still bound into the executable. You can’t quickly change a system setting that will let a program dying from Stack Overflow actually succeed.

I don’t really like this image from Wikipedia but it is a start. On a real computer with a real operating system the stack gets placed at the highest memory address your user process is allowed to utilize. Windows doesn’t have good user/process management so it has you create a stack size in the executable and it parks that somewhere in memory where writing just a tiny bit past it will cause a General Protection Fault.

Don’t get hung up on the lower 3 names. They are DSECTs. Conceptually all of the constants are loaded first. (They called that “text” but it is any constant.) Next are the DSECTs for writable global/static data compiled into the binary. Some will have values, some won’t. Your program’s executable binary tends to be loaded before the constant DSECTs. PSECTs before DSECTs. Doesn’t matter if your platform makes that true or not, it is how you need to remember it.
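You can get a rough feel for this on your own machine. The following is a sketch only: it collects the address of a constant, an initialized global, an uninitialized global, a local, and a dynamic allocation so you can print them and compare. Every name in it (k_banner, g_counter, g_scratch, AddressReport, where_things_live) is made up for illustration, and the actual ordering of the addresses depends on your platform, loader, and any address randomization.

```cpp
#include <cstdint>
#include <cstdlib>

const char k_banner[] = "constant data";  // read-only section on most platforms
int        g_counter  = 42;               // writable global with a value
int        g_scratch[64];                 // writable global without a value

// Plain record of where each kind of storage ended up.
struct AddressReport {
    std::uintptr_t constant_addr;
    std::uintptr_t initialized_addr;
    std::uintptr_t uninitialized_addr;
    std::uintptr_t local_addr;
    std::uintptr_t dynamic_addr;
};

AddressReport where_things_live()
{
    int   local_value = 0;                 // stack (near heap)
    void *dynamic     = std::malloc(16);   // far heap

    AddressReport r;
    r.constant_addr      = reinterpret_cast<std::uintptr_t>(&k_banner[0]);
    r.initialized_addr   = reinterpret_cast<std::uintptr_t>(&g_counter);
    r.uninitialized_addr = reinterpret_cast<std::uintptr_t>(&g_scratch[0]);
    r.local_addr         = reinterpret_cast<std::uintptr_t>(&local_value);
    r.dynamic_addr       = reinterpret_cast<std::uintptr_t>(dynamic);

    std::free(dynamic);   // we only wanted the address, not the memory
    return r;
}
```

Print the five fields in hex from main. On many systems the three globals cluster together and the local sits far away from them, but treat that as an observation about your platform, not a rule.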

Recursion Really Burns You

Let me steal a code sample from w3schools.

int sum(int k) {
  if (k > 0) {
    return k + sum(k - 1);
  } else {
    return 0;
  }
}

int main() {
  int result = sum(10);
  cout << result;
  return 0;
}

This is the recursion example found at the link. Now imagine instead of being a tiny little function consuming only a few integers worth of storage on the stack that you had to allocate a whole bunch of data like the earlier code example. Easy to see how you could pop past the end of the stack. Those of you on a Windows machine with severely limited memory should try compiling that example, changing 10 to be a very large integer or shrinking the stack size really low.
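To put a number on it, here is a rough sketch that measures how far the stack moves as the recursion gets deeper. It is my own illustration, not part of the w3schools sample. The names deepest_frame_address and stack_bytes_used are hypothetical, the 256 byte local is a stand-in for "a whole bunch of data", and comparing addresses of locals in different frames is implementation-defined, so read the result as an estimate.

```cpp
#include <cstddef>
#include <cstdint>

// Recurse `depth` levels. Every level carries a 256-byte local.
// Returns the address of that local in the deepest frame reached.
std::uintptr_t deepest_frame_address(int depth)
{
    volatile char padding[256];              // volatile so it is not optimized away
    padding[0] = static_cast<char>(depth);

    if (depth <= 0) {
        return reinterpret_cast<std::uintptr_t>(&padding[0]);
    }

    std::uintptr_t below = deepest_frame_address(depth - 1);
    padding[1] = padding[0];                 // keeps the local alive across the call
    return below;
}

// Approximate number of stack bytes consumed by `depth` nested calls.
std::size_t stack_bytes_used(int depth)
{
    std::uintptr_t first  = deepest_frame_address(0);
    std::uintptr_t last   = deepest_frame_address(depth);
    return static_cast<std::size_t>(first > last ? first - last : last - first);
}
```

Try stack_bytes_used(10) and stack_bytes_used(1000) and compare. Swap the 256 for the size of the locals in the earlier browse routine and you can estimate how few levels it takes to run past a small stack.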

Special DSECTs and RTOS

You can have a special DSECT in your program on any OS today.

DSECT declaration in MACRO-32

That’s the beginning of one being declared in MACRO-32. With C/C++ they will generally be created whenever you have global or static data. There are other ways to force their creation but we don’t need to go into platform specific things. If you are programming on an RTOS (Real Time Operating System) where there are no dynamic memory allocations, you have to allocate a stack/DSECT at load time. You then have to parcel it out a memory block at a time.
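Parceling it out can be as simple as the sketch below. This is a minimal fixed-block pool, assuming 8 blocks of 64 bytes; those sizes and the names BlockPool, acquire, release, and g_pool are hypothetical and exist only for illustration. Real RTOS pools add locking and far more validation. The point is that the storage is static data, so it lands in a writable DSECT at link/load time and nothing is requested from the OS while running.

```cpp
#include <cstddef>

// Hypothetical sizes chosen for illustration only.
constexpr std::size_t kBlockSize  = 64;
constexpr std::size_t kBlockCount = 8;

class BlockPool
{
public:
    // Hand out one free block, or nullptr when the pool is exhausted.
    void *acquire()
    {
        for (std::size_t i = 0; i < kBlockCount; ++i) {
            if (!m_in_use[i]) {
                m_in_use[i] = true;
                return &m_storage[i * kBlockSize];
            }
        }
        return nullptr;
    }

    // Give back a block previously handed out by acquire().
    // Only a simple range check is done; anything else is ignored.
    void release(void *block)
    {
        if (block == nullptr) return;
        unsigned char *p = static_cast<unsigned char *>(block);
        if (p < m_storage || p >= m_storage + sizeof(m_storage)) return;
        std::size_t offset = static_cast<std::size_t>(p - m_storage);
        m_in_use[offset / kBlockSize] = false;
    }

private:
    alignas(std::max_align_t) unsigned char m_storage[kBlockSize * kBlockCount] = {};
    bool m_in_use[kBlockCount] = {};
};

// Static storage duration: reserved when the program is loaded.
BlockPool g_pool;
```

Callers must check for nullptr. When the pool is empty there is nowhere else to go, which is exactly the discipline an RTOS forces on you.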

What About that Stack Portion?

I like this image from Citizendium.

Every function/method call (including main) has at least a saved FP (Frame Pointer). It may also have some parameters. After saving the Frame Pointer the local variables are created at the “top” of the stack. I’m not going to take you deep here. Traditionally Register Zero (R0) is used for any integer return values; other return values, like a std::string, have to be returned via other means.

How Does This Relate to Defaults and Deletes?

I’m glad you asked.

That Frame Pointer is reset to the saved FP value and the Stack Pointer (SP) is changed to be the old FP value, conceptually. Compiler developers and hardware geeks are rounding up torches and pitchforks for my telling you that, but as a developer you need to understand it this way.

The Default Destructor doesn’t do anything.

Returning from the function changes the SP and FP but the chunk of memory that held parameters, return address, and locals is still there and still has its values.

std::string *obscureBug( some parameters )
{
    std::string str {"My fellow Americans"};

    // some more code
    
    return (&str);
}

I love these bugs and it doesn’t matter if the above code compiles. This is the kind of bug Newbs introduce all the time. You return the address of a local variable. Depending on your compiler, this bug could exist for decades. Won’t be found until someone inserts a method/function call between the return from this function and use of its result. When you look at the stack image above, you begin to understand why.
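For contrast, here is one way to avoid the trap, sketched under the assumption that the caller only needs the text and not a pointer to it. The function name noObscureBug is mine. Returning the std::string by value hands the caller its own object, so nothing is left pointing into a frame that no longer exists.

```cpp
#include <string>

// Returns the string by value. The caller's copy is either moved
// or constructed in place, so it outlives this function's frame.
std::string noObscureBug()
{
    std::string str {"My fellow Americans"};

    // some more code

    return str;
}
```

The caller writes std::string speech = noObscureBug(); and can insert as many function calls as it likes before using speech.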

The data was just left lying around.

The destructor for std::string isn’t required to nuke everything when it goes out of scope. Exactly when the destructor gets called is rather fuzzy too. Let’s look at the official language.

You will notice there are no hard/fast rules stating “immediately, halt all other execution until complete.” A destructor’s only obligation is to free allocated resources. There is no scorched Earth requirement.

Happens in Plain Old C as Well

Fine, let’s take classes out of the example.

char *obscureBugToo( some parameters)
{
    char buffer[2048];

    // some code

    return (buffer);
}

Just about every Newb developer writing their first Serial Port or other random streaming type input function makes exactly this mistake. They return a pointer to a locally allocated buffer. “Works” for years until someone inserts another function call between getting the pointer return value and its first use. That function has parameters and local variables that now overwrite that unused portion of stack. Everybody points a finger at the dude who inserted the function call because “This has worked for years!”
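One common way out, sketched here with made-up names, is to make the caller own the buffer and pass it in along with its size. readIntoCallerBuffer and the "pretend serial data" text are hypothetical stand-ins for a real port read; the shape of the interface is the point, not the data source.

```cpp
#include <cstddef>
#include <cstring>

// Copies a pretend incoming message into the caller's buffer,
// truncating to fit and always null-terminating.
// Returns the number of characters stored, not counting the terminator.
std::size_t readIntoCallerBuffer(char *buffer, std::size_t capacity)
{
    static const char incoming[] = "pretend serial data";  // stand-in for real input

    if (buffer == nullptr || capacity == 0) {
        return 0;
    }

    std::size_t n = sizeof(incoming) - 1;   // length without the terminator
    if (n > capacity - 1) {
        n = capacity - 1;                   // leave room for the terminator
    }
    std::memcpy(buffer, incoming, n);
    buffer[n] = '\0';
    return n;
}
```

Now the 2048 byte array lives in the caller's frame, which is still alive when the data gets used, and the function never returns an address into its own stack.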

Where This Really Matters

I don’t care if this code compiles or not. Yes, I hate it when people put code in a header file as well.

class WhizzyPuffle
{
public:
    WhizzyPuffle( int a, double b, char *buffer) :
        m_a( a),
        m_b( b),
        m_buffer( buffer) {}
    
    WhizzyPuffle() { m_buffer = new char[2048];}

private:
    int m_a {23};
    double m_b {14.56};
    char *m_buffer {};
};

If your compiler creates a “default destructor” it won’t free that buffer because the default destructor does nothing with a raw pointer. When we allocated a new character buffer it was in the far heap. The local object containing the pointer to the allocation is on the near heap which kids today are calling the stack.
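Here is a minimal sketch of what taking responsibility for that buffer might look like. It is a separate, hypothetical class (OwningPuffle), not the finished code for this series, and it assumes the object always allocates its own buffer. The live counter exists only so you can watch the allocation come and go.

```cpp
#include <cstddef>

class OwningPuffle
{
public:
    OwningPuffle() : m_buffer(new char[kSize]()) { ++s_live; }

    // We allocated the buffer, so we free it.
    ~OwningPuffle() { delete[] m_buffer; --s_live; }

    // A compiler-generated copy would duplicate the raw pointer and
    // two destructors would delete[] the same buffer, so forbid copies.
    OwningPuffle(const OwningPuffle &) = delete;
    OwningPuffle &operator=(const OwningPuffle &) = delete;

    char *buffer() { return m_buffer; }

    // Illustration only: number of buffers not yet freed.
    static int liveCount() { return s_live; }

private:
    static constexpr std::size_t kSize = 2048;
    static int s_live;
    char *m_buffer;
};

int OwningPuffle::s_live = 0;
```

Declare one inside a pair of braces and check liveCount() before, inside, and after. The object itself sits on the near heap; the destructor is the only thing that ever gives the far heap allocation back.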

Please scroll back to the image from Citizendium. Notice the small print to the far right of parameters and locals. That (FP + K) and (FP – K) stuff. The near heap is accessed via the Frame Pointer. You cannot negative reference the Stack Pointer but you can the Frame Pointer.

Our next installment will more fully flesh out the code for the example application. You need to understand the near and far heap to be an effective C/C++ programmer. What does and doesn’t happen with default constructors and destructors should make more sense to you now.

Roland Hughes started his IT career in the early 1980s. He quickly became a consultant and president of Logikal Solutions, a software consulting firm specializing in OpenVMS application and C++/Qt touchscreen/embedded Linux development. Early in his career he became involved in what is now called cross platform development. Given the dearth of useful books on the subject he ventured into the world of professional author in 1995 writing the first of the "Zinc It!" book series for John Gordon Burke Publisher, Inc.

A decade later he released a massive (nearly 800 pages) tome "The Minimum You Need to Know to Be an OpenVMS Application Developer" which tried to encapsulate the essential skills gained over what was nearly a 20 year career at that point. From there "The Minimum You Need to Know" book series was born.

Three years later he wrote his first novel "Infinite Exposure" which got much notice from people involved in the banking and financial security worlds. Some of the attacks predicted in that book have since come to pass. While it was not originally intended to be a trilogy, it became the first book of "The Earth That Was" trilogy:
Infinite Exposure
Lesedi - The Greatest Lie Ever Told
John Smith - Last Known Survivor of the Microsoft Wars

When he is not consulting Roland Hughes posts about technology and sometimes politics on his blog. He also has regularly scheduled Sunday posts appearing on the Interesting Authors blog.