C++ references, continued

So I got some feedback about my last C++ post. The comment argues that references are not pointers; they are just names for another object:

Sorry for reopening a topic after nearly 6 months. But I cannot stay silent.
I think you got it wrong. Completely.
Although a reference might behave like "some sort" of a pointer, it is *not* a pointer. Your statement: "A reference is effectively a pointer, but this is hidden by the language." is completely wrong.
To quote the C++ standard: "A reference is an alternative name for an object." It is just a new name for something that you’ve defined elsewhere. That’s the very reason why it cannot be null: you cannot have an alternative name for an object that you do not have yet.

--Willi Burkhardt

Great, in theory. Unfortunately, none of the compilers I have used treat references as anything other than pointers. A reference is supposed to be non-null and to refer to a valid object, but no compiler I have ever used enforces either guarantee.

Take this example:

#include <iostream>

static int const a_const = 5;

int const& A() {
    return a_const;
}

static int const* b_ptr = 0;

int const& B() {
    return *b_ptr;
}

int main() {
    int const& a_ref = A();

    std::cout << "Called A()" << std::endl;
    std::cout << "a_ref: " << a_ref << std::endl;

    int const& b_ref = B();

    std::cout << "Called B()" << std::endl;
    std::cout << "b_ref: " << b_ref << std::endl;

    return 0;
}

If we are to believe that references are simply another name for an object, then binding a reference to *b_ptr should have caused a runtime error. After all, we dereferenced a null pointer, right? The compiler should emit code to prevent this, right?

In an ideal world, this would cause an error -- but it does not. (Strictly speaking, the standard calls dereferencing a null pointer undefined behavior, so nothing is guaranteed; what follows is what I actually observe.) The segmentation fault does not come until b_ref is used; indeed, we see "Called B()" in the program output, indicating that B() successfully returned a reference, which was stored in b_ref. Obviously, at runtime there was a null pointer dereference. "But we didn't use a pointer," I hear you saying. "We used a reference!"

Then please explain this behavior to me. On a language level, sure, references are "names for objects." But this does not change the fact that the implementation is done using memory addresses -- which is fundamentally the same thing pointers do. This helps to explain why we see the behavior of this sample. As I mentioned in my last post, when you convert an expression to a reference type, it's treated exactly as though you had converted it to a pointer type, with an implicit address-of operator (&). So we can rewrite this function:

int const& B() {
    return *b_ptr;
}

Like this:

int const* B() {
    return &*b_ptr;
}

And it becomes immediately clear why the segmentation fault did not occur here -- taking the address of a dereference expression just yields the original pointer. The & and * cancel out during compilation, and we simply return the pointer. Take a look at this example, which is identical to the one above, except that A() is gone and B() now returns a pointer, with dereferences added in the appropriate places:

#include <iostream>

static int const* b_ptr = 0;

int const* B() {
    return &*b_ptr;
}

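int main() {
    // Named b_result so it does not shadow the file-level b_ptr.
    int const* b_result = B();

    std::cout << "Called B()" << std::endl;
    // The null dereference happens here, on use, just as with the reference.
    std::cout << "b_ref: " << *b_result << std::endl;

    return 0;
}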

Identical behavior.

So you can throw the spec at me all you want, but every implementation I've tried uses pointer-with-automatic-dereference semantics -- if you convert every reference to a pointer, add an address-of operator to every initialization of a reference, and add a dereference operator to every use of a reference, you will see identical behavior.
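
To make that rewrite concrete, here is a minimal sketch of one small function written both ways. The names add_one_ref, add_one_ptr, and run_both are made up for illustration and are not from the earlier post; the only point is that the pointer version is the reference version with the & and * spelled out.

```cpp
#include <cassert>

// Reference version: the parameter is used as if it were the object itself.
void add_one_ref(int& value) {
    value += 1;
}

// Pointer version, produced by the mechanical rewrite:
// int& becomes int*, and every use of the parameter gains a '*'.
void add_one_ptr(int* value) {
    *value += 1;
}

// Hypothetical helper: runs both versions from the same starting value,
// checks that they agree, and returns the shared result.
int run_both(int start) {
    int via_ref = start;
    int via_ptr = start;
    add_one_ref(via_ref);   // binding the reference parameter
    add_one_ptr(&via_ptr);  // the same call with an explicit address-of
    assert(via_ref == via_ptr);
    return via_ref;
}
```

Calling run_both(5) from main() should give the same incremented value through either path, which is all the rewrite claims.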

To preempt the "but the compiler can optimize local references" argument: the compiler can do exactly the same with pointers.

// With a reference
int A() {
    int a = 5;
    int& b = a;
    return b;
}

// With a pointer
int B() {
    int a = 5;
    int* b = &a;
    return *b;
}

I've heard the argument that the compiler can eliminate the reference in A(). Well, it can also eliminate the pointer in B(). If a local pointer is set to point at another local and the compiler can prove that it will never change, it can optimize it away just as easily as it can optimize away a local reference to a local.

So, this supports my original argument that references store memory addresses in the same way that pointers do, only with automatic dereferencing. They are effectively nothing more than syntactic sugar, allowing you to forget that you're operating on an object somewhere else in memory.
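
One rough way to check the storage claim is to look at what a reference costs when it has to live inside a struct. This is only a sketch under an assumption: the standard does not say whether a reference needs storage at all, so the size comparison describes typical implementations, not a guarantee. RefHolder, PtrHolder, holders_same_size, and same_target are hypothetical names used only here.

```cpp
#include <cassert>

// A struct whose only member is a reference. sizeof on the reference itself
// would report the size of int, so the struct wrapper is needed.
struct RefHolder {
    int& ref;
};

// The pointer equivalent.
struct PtrHolder {
    int* ptr;
};

// Compares the storage needed for each holder. Implementation-defined;
// assumed (not guaranteed) to be equal on mainstream compilers.
bool holders_same_size() {
    return sizeof(RefHolder) == sizeof(PtrHolder);
}

// Binds a reference and a pointer to the same object and checks that
// taking the address of the reference yields the stored pointer value.
bool same_target(int& object) {
    RefHolder by_ref{object};
    PtrHolder by_ptr{&object};
    assert(by_ptr.ptr != nullptr);
    return &by_ref.ref == by_ptr.ptr;
}
```

On typical GCC, Clang, and MSVC builds I would expect holders_same_size() to return true, though nothing in the standard requires it. same_target() should return true everywhere, since the address of a reference is by definition the address of the object it names.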
