Before we go much further with this discussion on Defaults and Deletes, I need you to read the section titled Near Heap in this blog post. It’s a short read with pictures. Then you need to go thank the fools . . . err . . . “learned academics”, cough cough, who re-purposed the word heap to be a binary tree data structure.
To understand why things are the way they are, I need you to understand some things you would have learned had you actually taken an Assembly Language class. I also need you to, at least temporarily, ignore what the . . . “learned academics” used the name heap for. Adding insult to injury, you will find many online experts and official C++ documentation sites using the word stack for near heap and the word heap for far heap.
PSECT and DSECT
On real computers with real operating systems, when you LINK objects into an executable you can add a /MAP (syntax varies) qualifier to the LINK command. Besides producing the executable binary, the linker then also outputs a .MAP file.
Let me show you one for a BASIC language program as well.
I’m not going to drag you through the attribute details. Honestly I don’t remember the difference between NOPIC and PIC off the top of my head. Each PSECT has a name. Some platforms physically have DSECT terminology and others identify them as a PSECT with WRT, NOEXE attributes. (WRT = Write).
This is one of the main distinctions between a real operating system and a home hobby platform. When you are running a program on a real operating system as a mere mortal user and your un-initialized pointer tries to write a block of text into a NWRT PSECT owned by your process, the OS kills your process without a second thought. On most x86 platforms that doesn’t happen.
Stack
In case it isn’t clear, PSECT is short for Program Section and DSECT is short for Data Section. In the DOS and GUI DOS (Windows) worlds where you set stack size at either compile or link time, you create a DSECT of that size which gets bound into your executable. During the early years that wasn’t just a loader value but a DSECT physically linked into your executable. If you needed 30K of stack, your executable was going to be well over 30K in size. I do not know if Microsoft has since moved this to be a loader function. I also don’t know if Windows has managed to set user process level memory limits, kind of doubt it though.
Linux doesn’t create a stack DSECT in the executable, at least not like the above. Instead it has ulimit. There tends to also be a limits.conf file which has the defaults for boot.
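If you want to see what your process was actually handed, here is a minimal sketch, assuming a POSIX system (it will not build on Windows). getrlimit() and RLIMIT_STACK are the real POSIX calls; the helper names stackLimits and printStackLimits are made up for this post. The soft limit it reports is the same number ulimit -s shows you, except in bytes instead of 1024-byte units.

```cpp
#include <sys/resource.h>   // getrlimit, RLIMIT_STACK (POSIX, not Windows)
#include <cassert>
#include <cstdio>

// Hypothetical helper: fetch the soft and hard stack limits for this process.
bool stackLimits(rlim_t &soft, rlim_t &hard)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return false;                       // the query itself failed
    assert(lim.rlim_cur <= lim.rlim_max);   // soft limit never exceeds hard
    soft = lim.rlim_cur;
    hard = lim.rlim_max;
    return true;
}

// Hypothetical helper: print the soft limit the way a human wants to read it.
void printStackLimits()
{
    rlim_t soft = 0, hard = 0;
    if (!stackLimits(soft, hard)) {
        std::puts("could not read the stack limit");
        return;
    }
    if (soft == RLIM_INFINITY)
        std::puts("soft stack limit: unlimited");
    else
        std::printf("soft stack limit: %llu bytes\n",
                    (unsigned long long) soft);
}
```

Call printStackLimits() from main, then change the limit with ulimit -s in the same shell and run it again. Nothing in the executable changed, only the value the OS hands the process.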
Stack Based Machine vs. Stack Based Language
Here is a great book on stack and other computers. I want to quote one short paragraph from it.
Three different approaches to computer design are used as reference points for this chapter. The first reference point is that of the Complex Instruction Set Computer (CISC), which is typified by Digital Equipment Corporation’s VAX series and any of the microprocessors used in personal computers (e.g. 680×0, 80×86). The second reference point is the Reduced Instruction Set Computer (RISC) (Patterson 1985) as typified by the Berkeley RISC project (Sequin & Patterson 1982) and the Stanford MIPS project (Hennesy 1984). The third reference point is that of stack machines as described in the preceding chapters.
Philip Koopman, 1989
Despite all of the claims of RISC computers from back in the day, pretty much every computer you work on today is CISC despite its claims of being RISC. They will have both registers and a program execution stack but they are not considered stack based machines by purists. I don’t wish to go down that rabbit hole. You really should give a quick read of my post titled The Closed Question now though.
Yes, the hardware and the operating system will mandate some use of the stack, but languages define how they use the stack. Pascal uses it differently than C, and COBOL uses it differently than both of them. In the COBOL map above you see multiple DSECTs (NOSHR, NOEXE, WRT) declared. For BASIC and C you don’t find them because I didn’t declare any global or static data.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>

int main( int argc, char *argv[])
{
    time_t the_time;
    unsigned long u_l_x;
    long l_x, l_time;

    //
    // Casts keep the math unsigned even where time_t is a signed type
    //
    the_time = (time_t) ULONG_MAX;
    u_l_x = (unsigned long) the_time / 60u; // how many minutes
    u_l_x /= 60u;   // how many hours
    u_l_x /= 24u;   // how many days
    u_l_x /= 365u;  // how many years
    u_l_x += 1970u; // add to base year
    printf( "time_t will run out of precision ");
    printf( "around the year %lu\n", u_l_x);

    //
    // Some platforms had it as a long not unsigned long initially
    //
    l_time = LONG_MAX;
    l_x = l_time / 60;
    l_x /= 60;
    l_x /= 24;
    l_x /= 365;
    l_x += 1970;
    printf( "the old signed long version will run out of ");
    printf( "precision around the year %ld\n", l_x);

    return 0;
}
Stack Overflow
You all think StackOverflow is just a Web site. It was, and still is, a constant problem on Microsoft operating systems. Under DOS, where we had Compact, Small, Medium, and Large memory models limiting the amount of stack we could have, we did constant battle with it.
The stack is the near heap. When you have something like the following code snippet
void c_zill_browse_sub( int *fms_status, int *rms_status,
int *tca_array, int *workspace_array)
{
int l_x, l_load_direction;
int l_action, l_y;
char line_in[255], command_str[255];
FILE *in_file;
$DESCRIPTOR( command_str_desc, command_str);
struct drawing_record m_z;
struct FAB mega_fab;
struct RAB mega_rab;
struct XABKEY mega_xab;
struct browse_screen_struct screen_rec[L_BROWSE_SCREEN_COUNT];
$DESCRIPTOR( form_name_desc, "ZILL_BROWSE");
//
// attach to the mega file
//
l_x = open_mega_idx( &m_z, &mega_fab,
&mega_rab, &mega_xab, FAB$M_GET);
every one of those “local” variables is allocated on the stack (near heap). On Linux systems the stack limit is generally a pretty big value. On GUI DOS systems it is still bound into the executable. You can’t quickly change a system setting that will let a program dying from Stack Overflow actually succeed.
I don’t really like this image from Wikipedia but it is a start. On a real computer with a real operating system the stack gets placed at the highest memory address your user process is allowed to utilize. Windows doesn’t have good user/process management so it has you create a stack size in the executable and it parks that somewhere in memory where writing just a tiny bit past it will cause a General Protection Fault.
Don’t get hung up on the lower 3 names. They are DSECTs. Conceptually all of the constants are loaded first. (They called that “text” but it is any constant.) Next are the DSECTS for writable global/static data compiled into the binary. Some will have values, some won’t. Your program’s executable binary tends to be loaded before the constant DSECTS. PSECTS before DSECTS. Doesn’t matter if your platform makes that true or not, it is how you need to remember it.
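You can ask your own platform where it parked each kind of data. What follows is a minimal sketch, not a statement about any particular loader: every name in it (k_banner, g_initialized, g_uninitialized, Layout, sampleLayout, printLayout) is made up for illustration, and the actual numbers and their ordering will differ by OS, compiler, and even from run to run.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

const char k_banner[] = "constant";  // read-only constant data
int g_initialized = 42;              // writable, value stored in the binary
int g_uninitialized;                 // writable, zero-filled at load time

struct Layout {
    std::uintptr_t constant, initialized, uninitialized, local, heap;
};

// Turn an address into an integer so it can be printed and compared.
static std::uintptr_t addr(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

Layout sampleLayout()
{
    int local = 0;                    // lives in this call's stack frame
    void *block = std::malloc(16);    // lives in the far heap
    assert(block != nullptr);         // sketch: treat failure as fatal
    Layout where { addr(k_banner), addr(&g_initialized),
                   addr(&g_uninitialized), addr(&local), addr(block) };
    std::free(block);
    return where;
}

void printLayout(const Layout &w)
{
    std::printf("constant       %#llx\n", (unsigned long long) w.constant);
    std::printf("initialized    %#llx\n", (unsigned long long) w.initialized);
    std::printf("uninitialized  %#llx\n", (unsigned long long) w.uninitialized);
    std::printf("local (stack)  %#llx\n", (unsigned long long) w.local);
    std::printf("malloc (heap)  %#llx\n", (unsigned long long) w.heap);
}
```

Call printLayout(sampleLayout()) from main. On most desktop platforms the local lands a long way from everything else, which is the picture you need to carry around in your head.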
Recursion Really Burns You
Let me steal a code sample from w3schools.
#include <iostream>
using namespace std;

int sum(int k) {
    if (k > 0) {
        return k + sum(k - 1);
    } else {
        return 0;
    }
}

int main() {
    int result = sum(10);
    cout << result;
    return 0;
}
This is the recursion example found at the link. Now imagine instead of being a tiny little function consuming only a few integers worth of storage on the stack that you had to allocate a whole bunch of data like the earlier code example. Easy to see how you could pop past the end of the stack. Those of you on a Windows machine with severely limited memory should try compiling that example, changing 10 to be a very large integer or shrinking the stack size really low.
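To put a number on it, here is a minimal sketch of the same recursion carrying some baggage. The 1024-byte pad is an assumed stand-in for a pile of local structures, and StackSpan, k_pad, and sumWithLocals are names invented for this post. It records the lowest and highest address the pad occupied across all of the calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Tracks the lowest and highest pad address seen across all live calls.
struct StackSpan {
    std::uintptr_t lowest = UINTPTR_MAX;
    std::uintptr_t highest = 0;
    std::size_t bytes() const { return highest - lowest; }
};

constexpr std::size_t k_pad = 1024;   // assumed per-call local storage

int sumWithLocals(int k, StackSpan &span)
{
    assert(k >= 0);
    volatile char pad[k_pad];         // volatile so it cannot be optimized out
    pad[0] = 0;

    auto here = reinterpret_cast<std::uintptr_t>(&pad[0]);
    if (here < span.lowest)  span.lowest = here;
    if (here > span.highest) span.highest = here;

    int rest = (k > 0) ? sumWithLocals(k - 1, span) : 0;

    // Reading pad after the call keeps every call's copy alive at once.
    return k + rest + pad[0];
}
```

Calling sumWithLocals(10, span) makes eleven calls, so span.bytes() comes back as at least 10 * 1024. Scale either number up and you can work out roughly how deep you can go before running off the end of whatever stack you were given.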
Special DSECTs and RTOS
You can have a special DSECT in your program on any OS today.
That’s the beginning of one being declared in MACRO-32. With C/C++ they will generally be created whenever you have global or static data. There are other ways to force their creation but we don’t need to go into platform specific things. If you are programming on an RTOS (Real Time Operating System) where there are no dynamic memory allocations, you have to allocate a stack/DSECT at load time. You then have to parcel it out a memory block at a time.
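Parceling out a load-time block can be sketched in a few lines. This is a minimal illustration, not how any particular RTOS does it: BlockPool, g_pool, take, give, and the block sizes are all assumptions made up for this post. The storage is a plain static array, so it exists the moment the program is loaded and nothing ever calls new or malloc.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t k_block_size  = 64;   // assumed block size in bytes
constexpr std::size_t k_block_count = 4;    // assumed number of blocks

class BlockPool
{
public:
    // Hand out the first free block, or nullptr when the pool is exhausted.
    void *take()
    {
        for (std::size_t i = 0; i < k_block_count; ++i) {
            if (!m_used[i]) {
                m_used[i] = true;
                return m_storage + i * k_block_size;
            }
        }
        return nullptr;
    }

    // Return a block to the pool. False means it was not one of ours
    // or was not currently handed out.
    bool give(void *p)
    {
        assert(p != nullptr);
        auto *bytes = static_cast<unsigned char *>(p);
        for (std::size_t i = 0; i < k_block_count; ++i) {
            if (bytes == m_storage + i * k_block_size && m_used[i]) {
                m_used[i] = false;
                return true;
            }
        }
        return false;
    }

private:
    alignas(std::max_align_t)
    unsigned char m_storage[k_block_size * k_block_count];
    bool m_used[k_block_count] = {};
};

BlockPool g_pool;   // static storage: reserved at load time, not at run time
```

The linear search is fine for four blocks. A real pool would probably keep a free list, but the point here is only that every byte was accounted for before main ever ran.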
What About that Stack Portion?
I like this image from Citizendium.
Every function/method call (including main) has at least a saved FP (Frame Pointer). It may also have some parameters. After saving the Frame Pointer the local variables are created at the “top” of the stack. I’m not going to take you deep here. Traditionally Register Zero (R0) is used for any integer return value. Other return values, like a std::string, have to be returned via other means.
How Does This Relate to Defaults and Deletes?
I’m glad you asked.
That Frame Pointer is reset to the saved FP value and the Stack Pointer (SP) is changed to be the old FP value, conceptually. Compiler developers and hardware geeks are rounding up torches and pitchforks for my telling you that, but as a developer you need to understand it this way.
The Default Destructor doesn’t do anything.
Returning from the function changes the SP and FP but the chunk of memory that held parameters, return address, and locals is still there and still has its values.
std::string *obscureBug( some parameters )
{
std::string str {"My fellow Americans"};
// some more code
return (&str);
}
I love these bugs and it doesn’t matter if the above code compiles. This is the kind of bug Newbs introduce all the time. You return the address of a local variable. Depending on your compiler, this bug could exist for decades. It won’t be found until someone inserts a method/function call between the return from this function and the use of its result. When you look at the stack image above, you begin to understand why.
The data was just left lying around.
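For contrast, here is a minimal sketch of the version that doesn’t depend on leftovers. The names safeGreeting and useGreeting are made up for this post. Returning the std::string by value gives the caller its own object, so nothing points back into a frame that has already been given up.

```cpp
#include <cassert>
#include <string>

// Return by value: the caller receives its own std::string, not an
// address inside this function's soon-to-be-abandoned stack frame.
std::string safeGreeting()
{
    std::string str {"My fellow Americans"};
    // some more code
    return str;
}

// Hypothetical caller: any number of other calls can happen between
// the return and the use without disturbing the result.
void useGreeting()
{
    std::string copy = safeGreeting();
    std::string other = safeGreeting();   // reuses the same stack area
    assert(copy == "My fellow Americans");
    assert(other == copy);
}
```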
The destructor for std::string isn’t required to nuke everything when it goes out of scope. Exactly when the destructor gets called is rather fuzzy too. Let’s look at the official language.
You will notice there are no hard/fast rules stating “immediately, halt all other execution until complete.” A destructor’s only obligation is to free allocated resources. There is no scorched Earth requirement.
Happens in Plain Old C as Well
Fine, let’s take classes out of the example.
char *obscureBugToo( some parameters)
{
char buffer[2048];
// some code
return (buffer);
}
Just about every Newb developer writing their first Serial Port or other random streaming type input function makes exactly this mistake. They return a pointer to a locally allocated buffer. “Works” for years until someone inserts another function call between getting the pointer return value and its first use. That function has parameters and local variables that now overwrite that unused portion of stack. Everybody points a finger at the dude who inserted the function call because “This has worked for years!”
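The usual way out is to make the caller own the buffer. A minimal sketch follows, assuming the input arrives as a C string, which is only a stand-in for whatever your serial port or stream actually hands you. The name readInto and its parameters are invented for this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a stream read. Copies at most capacity - 1
// bytes of source into the caller's buffer, always NUL terminates,
// and returns the number of bytes copied.
std::size_t readInto(char *buffer, std::size_t capacity, const char *source)
{
    assert(source != nullptr);
    if (buffer == nullptr || capacity == 0)
        return 0;                         // nowhere to put anything

    std::size_t n = std::strlen(source);
    if (n > capacity - 1)
        n = capacity - 1;                 // leave room for the terminator

    std::memcpy(buffer, source, n);
    buffer[n] = '\0';
    return n;
}
```

The buffer now lives in the caller’s frame (or wherever the caller chose to put it), so it stays valid for as long as the caller needs it, no matter how many function calls get inserted later.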
Where This Really Matters
I don’t care if this code compiles or not. Yes, I hate it when people put code in a header file as well.
class WhizzyPuffle
{
public:
    WhizzyPuffle( int a, double b, char *buffer) :
        m_a( a),
        m_b( b),
        m_buffer( buffer) {}

    // allocates from the far heap; nothing in this class ever frees it
    WhizzyPuffle() { m_buffer = new char[2048]; }

private:
    int m_a {23};
    double m_b {14.56};
    char *m_buffer {};
};
If your compiler creates a “default destructor” it won’t free that buffer, because the default destructor does nothing with a raw pointer member. It never calls delete[]. When we allocated a new character buffer it was in the far heap. The local object containing the pointer to the allocation is on the near heap, which kids today are calling the stack.
Please scroll back to the image from Citizendium. Notice the small print to the far right of parameters and locals. That (FP + K) and (FP – K) stuff. The near heap is accessed via the Frame Pointer. You cannot use a negative offset from the Stack Pointer, but you can from the Frame Pointer.
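Here is a minimal sketch of what cleaning up after yourself looks like. It is not the code for the example application. OwnedBuffer and its live counter are invented for this post, and the counter exists only so you can watch the allocation come and go.

```cpp
#include <cassert>

class OwnedBuffer
{
public:
    OwnedBuffer() : m_buffer( new char[2048]) { ++s_live; }

    // The part a default destructor will never write for you.
    ~OwnedBuffer()
    {
        assert(s_live > 0);
        delete[] m_buffer;
        --s_live;
    }

    // Copying would leave two objects freeing the same far heap block,
    // so this sketch simply forbids it.
    OwnedBuffer( const OwnedBuffer &) = delete;
    OwnedBuffer &operator=( const OwnedBuffer &) = delete;

    // Number of buffers currently allocated and not yet freed.
    static int live() { return s_live; }

private:
    char *m_buffer;
    static int s_live;
};

int OwnedBuffer::s_live = 0;
```

Declare one inside a pair of braces and check OwnedBuffer::live() before, inside, and after. It goes 0, 1, 0. Take the destructor out and it stays at 1, which is the leak described above.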
Our next installment will more fully flesh out the code for the example application. You need to understand the near and far heap to be an effective C/C++ programmer. What does and doesn’t happen with default constructors and destructors should make more sense to you now.