C++ Lookup Mysteries

Sven Rosvall


One day, my friend Tommy asked me why his C++ code failed. He wanted to print out a number of objects (of his own class) to a stream. It worked well with a plain for-loop and an output operator (<<), so he knew that his output operator for the class worked as intended. But when he used std::copy() and std::ostream_iterator it failed. He wanted to “go STL” because everyone, myself included, was telling him how great the STL is.

It took us a while to figure out what was wrong, and the search led us into the darker corners of the inner workings of C++. It was an interesting experience and one that I would like to share.

This article investigates function lookup in C++ and also contains a suggestion of what to do when you want to use several different output formats and still use output operators and STL.

The Code

Tommy used a class developed for a toolbox. This toolbox was declared inside a namespace, following project guidelines to avoid name collisions. Namespaces were considered good and were used a lot throughout the project.

Tommy followed the guidelines and put the toolbox client code in a different namespace. There he needed to print out objects of this new class stored in a container. For this purpose he wrote an output operator and a function that iterates over the container. His code was something like this:

namespace Client {
  std::ostream & operator<<(std::ostream &os, 
                   Tools::Spanner const & s) {
    os << "Spanner{ID=" << s.getID() 
       << ", gapSize=" << s.getGapSize() 
       << "}";
    return os;
  }

  void printSpanners(std::ostream & os, 
                  Tools::Toolbox const & tb) {
    for(Tools::SpannerCollection
             ::const_iterator sit
                   = tb.getSpanners().begin();
        sit != tb.getSpanners().end();
        ++sit) {
      os << *sit << "\n";
    }
  }
}

This code worked nicely. He then introduced some STL-isms and rewrote the printing function to use std::copy() and std::ostream_iterator. These two are often shown together in C++ books to demonstrate the power and flexibility of the STL.

A std::ostream_iterator is an output iterator and is used with algorithms in the same way as any other output iterator. It is instantiated with the type of the objects it shall print. When an object is assigned to a dereferenced std::ostream_iterator, the object is written to the output stream that the std::ostream_iterator was constructed with, using the output operator defined for that type. The constructor can also take an optional second parameter, a delimiter string that is written after each printed object.
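As a minimal sketch of this mechanism with a built-in type, the following function copies a vector of int to a string stream through a std::ostream_iterator. The function name joinLines() is made up for this illustration and is not part of Tommy's code.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes every element of v to a string,
// each followed by a newline.
std::string joinLines(std::vector<int> const & v) {
  std::ostringstream os;
  // Each assignment through the iterator writes the value
  // followed by the delimiter "\n" to os.
  std::copy(v.begin(), v.end(),
            std::ostream_iterator<int>(os, "\n"));
  return os.str();
}
```

Note that the delimiter is also written after the last element.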

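The rewritten printing function replaced the for loop with a single call to std::copy(), copying from tb.getSpanners().begin() to tb.getSpanners().end() into a std::ostream_iterator<Tools::Spanner> constructed with os and "\n". This version failed to compile: the compiler could not find a usable output operator for Tools::Spanner.

To understand why, we have to look at how C++ looks up function names. For an unqualified function call, the compiler first searches the current scope and then each enclosing namespace in turn, and it stops at the first scope that declares any function with that name. A declaration in an inner namespace therefore hides all functions with the same name in the outer namespaces. Consider: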

namespace A {
  void f(int);
  void g(int);
  
  namespace B {
    void f(double);      // hides A::f(int)
    void g(const char*); // hides A::g(int)
    void caller() {
      f(1); // calls A::B::f(double)
      g(1); // error: cannot convert '1' 
            // to a 'const char*'
    }
  }
}

In this example we see that A::B::f(double) hides A::f(int) and is thus the only function considered in the first call. The int argument can be converted to double so this call is legal.

In the same way, A::B::g(const char*) hides A::g(int). But the int argument in the second call cannot be converted to a pointer and the call is illegal.

Note that A::g(int) is not considered at all, even though A::B::g(const char*) cannot be used in the call.

After the current and enclosing namespaces have been searched, the compiler also looks for functions with the same name in the namespaces associated with the types of the function's arguments. This second part is called argument-dependent-lookup (a.k.a. ADL or Koenig-lookup). Consider:

class X {};
void f(const X &);

namespace A {
  class Y : public X {};
  void f(const Y &);
}

void caller() {
  A::Y y;
  f(y);  // calling A::f(const Y &);
}

Both functions f(const X &) and A::f(const Y &) are found by the lookup rules and considered for overload resolution. f(const X &) is found by looking at the nearest namespace and A::f(const Y &) is found using argument-dependent-lookup.

The argument y has a type defined in namespace A where the function A::f(const Y &) is found. The overload resolution rule looks at both functions and chooses A::f(const Y &) as a better match.

So, in the function printSpanners(), using the for loop, we find the output operator in the same namespace (Client). If the output operator was declared in the global namespace instead, we would find it there, unless there were other output operators in the namespace Client. The namespace Tools would also be looked at as the argument type Spanner is declared there, but there are no output operators there.

The problem for Tommy is that when std::copy() is used, the first stage of the search starts in namespace std, and not in namespace Client. This is because the output operator is called from inside the standard library: std::copy() assigns each element through the std::ostream_iterator, and it is the iterator's assignment operator, a member of a class in namespace std, that applies the output operator. Namespace std has a number of output operators, defined in the C++ standard to facilitate formatted output of any built-in type and of some types defined in the C++ library. It doesn’t matter that none of these overloaded output operators can be used with Spanner. The lookup rule says that the search stops at the nearest enclosing namespace that declares the function name. The output operator defined in namespace Client is not considered at all, as Client is not an enclosing namespace of std. The compiler would not find the output operator even if it were defined in the global namespace, because it has already found some output operators in namespace std, its nearest namespace.

Had Tommy declared the output operator in the same namespace as the class (namespace Tools), he would have avoided this problem, as the second rule (ADL) would have found it. An output operator can be seen as part of the interface of the class and should be declared close to the class itself, preferably in the same header file. This is fine if you have control over the header file, but not if the header is part of a third-party library. As a workaround, it is possible to put the declaration in any header file by re-opening the namespace like this:

namespace Tools {
  std::ostream & operator<<(std::ostream & os, 
                          Spanner const & s) {
    ...
  }
}
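To see the ADL rule at work, here is a self-contained sketch in which the output operator lives in namespace Tools while the call to std::copy() is made from namespace Client. The Spanner class below is only a stand-in with assumed members (an int ID and an int gap size); the real toolbox class is not shown in this article, and a std::vector is assumed in place of Tools::SpannerCollection.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools {
  // Hypothetical stand-in for the toolbox class.
  class Spanner {
  public:
    Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
    int getID() const { return m_id; }
    int getGapSize() const { return m_gapSize; }
  private:
    int m_id;
    int m_gapSize;
  };

  // Declared in the same namespace as Spanner, so ADL finds it
  // even when the call comes from inside namespace std.
  std::ostream & operator<<(std::ostream & os, Spanner const & s) {
    os << "Spanner{ID=" << s.getID()
       << ", gapSize=" << s.getGapSize() << "}";
    return os;
  }
}

namespace Client {
  // Returns the formatted spanners as a string, one per line.
  std::string printSpanners(std::vector<Tools::Spanner> const & spanners) {
    std::ostringstream os;
    std::copy(spanners.begin(), spanners.end(),
              std::ostream_iterator<Tools::Spanner>(os, "\n"));
    return os.str();
  }
}
```

Moving the operator from namespace Tools to namespace Client in this sketch should bring back the compilation error described above.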

The Real Problem

So what was Tommy trying to do? Why was the output operator declared in the Client namespace and not in the Tools namespace where it belongs? Tommy said that he could have added the output operator in the Tools namespace, but he wanted different output formats for different client applications. He couldn’t place the output operators beside the Spanner class definition, as there can be only one output operator for Spanner in any one namespace. An output operator takes exactly two parameters, the stream and the object, so there is no further parameter on which to overload a second format. For his project it was easy to use namespaces to separate the output operators, as no client in the same namespace would use more than one format.

A Solution to the Real Problem

So how can we make a design where we can have different output formats? How can we use these formats using output operators? And how can we make a design that will work when we use std::copy() and std::ostream_iterator?

To start this off, we want some way to select different formats when a Spanner object is printed to a stream. One possibility is to derive a class from Spanner for each format and overload the output operator on these derived classes. This is not a very nice design, and it does not work anyway: an object that was created as a plain Spanner cannot be downcast to a derived class.

A simpler approach is to use different named functions that do the formatting. We want a simple syntax such as:

std::cout << spanner.printNameAndGap()
          << std::endl;

This can be implemented by letting the member function printNameAndGap() return a string in the format we want. Nice and simple. Except that it is not always possible to add member functions to the class we are using, for example when it comes from a third-party library. Besides, the formatting belongs to the user, not to the class itself: the class designer cannot know every format that clients may want to use. This approach is also inefficient, as a temporary string has to be created.

Instead we want to use a non-member function and we want writes made directly to the output stream. This function can return an object of a class that can be used with an overloaded output operator. To make it easy, we use the constructor of this formatting class instead of a separate function.

class PrintSpannerNameAndGap {
public:
  PrintSpannerNameAndGap(Spanner const & s) 
      : m_s(s) {}
  void print(std::ostream & os) const {
    os << "Spanner{ID=" << m_s.getID() 
       << ", gapSize="
       << m_s.getGapSize() 
       << "}";
  }
private:
  Spanner const & m_s;
};

std::ostream & operator<<(std::ostream & os,
                 PrintSpannerNameAndGap
                           const & spanner) {
  spanner.print(os);
  return os;
}

We can now use this class like this:

std::cout << PrintSpannerNameAndGap(spanner)
          << std::endl;

This does not look too bad. Just watch out for that member reference to the original object. The PrintSpannerNameAndGap object must not exist longer than the referenced Spanner object. This is not a problem when it is used as shown above as it only exists as a temporary object and disappears at the end of the statement.

Using std::copy() and std::ostream_iterator

std::copy() is nice but it is not possible to insert a formatting object in the way shown above. We have to look at other ways to indicate that we want different output.

If we look at the line using std::copy() and std::ostream_iterator there aren’t many opportunities for modification. We could adapt the source iterators (the begin/end pair) to return a different object when dereferenced, and define an output operator for each different object type. The mechanism for choosing the correct overloaded output operator would be similar to the approach above.

But there is no need to create the iterator adaptor. We only have to specify to the std::ostream_iterator that it shall work with PrintSpannerNameAndGap objects. This makes the code much simpler:

std::copy(tb.getSpanners().begin(), 
          tb.getSpanners().end(), 
          std::ostream_iterator<
                 PrintSpannerNameAndGap>(
                                os, "\n"));

PrintSpannerNameAndGap is the same class as above. As the std::ostream_iterator expects a PrintSpannerNameAndGap, the Spanner objects returned by the source iterators are implicitly converted to PrintSpannerNameAndGap objects. This works because we did not make the constructor explicit. Each PrintSpannerNameAndGap object is a temporary and is destroyed as soon as the assignment statement inside std::copy() has completed. It holds a reference to the Spanner object coming from the iterators to avoid unnecessary copying. This reference is safe, as the PrintSpannerNameAndGap object has a shorter lifetime than the Spanner object.
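Putting the pieces together, a complete version might look like the sketch below. It rests on a few assumptions that are not in the original code: Spanner is a stand-in class with int members, the container is a std::vector, and the formatting class and its output operator are placed together in namespace Client so that ADL can find the operator from inside namespace std.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools {
  // Hypothetical stand-in for the toolbox class.
  class Spanner {
  public:
    Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
    int getID() const { return m_id; }
    int getGapSize() const { return m_gapSize; }
  private:
    int m_id;
    int m_gapSize;
  };
}

namespace Client {
  class PrintSpannerNameAndGap {
  public:
    // Deliberately not explicit: std::copy() relies on the
    // implicit conversion from Spanner.
    PrintSpannerNameAndGap(Tools::Spanner const & s) : m_s(s) {}
    void print(std::ostream & os) const {
      os << "Spanner{ID=" << m_s.getID()
         << ", gapSize=" << m_s.getGapSize() << "}";
    }
  private:
    Tools::Spanner const & m_s;  // must not outlive the Spanner
  };

  // Same namespace as the formatting class, so ADL finds it.
  std::ostream & operator<<(std::ostream & os,
                            PrintSpannerNameAndGap const & p) {
    p.print(os);
    return os;
  }

  // Returns the formatted spanners as a string, one per line.
  std::string printSpanners(std::vector<Tools::Spanner> const & spanners) {
    std::ostringstream os;
    std::copy(spanners.begin(), spanners.end(),
              std::ostream_iterator<PrintSpannerNameAndGap>(os, "\n"));
    return os.str();
  }
}
```

A second format only needs another small class with its own output operator, and the client selects it through the template argument of std::ostream_iterator.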

Possible Improvements

We could use templates to reduce the amount of boilerplate code. But introducing templates does not remove enough code to justify the extra complexity.

Another approach would be to let the formatting class PrintSpannerNameAndGap inherit from a base class that is used by all classes supporting different output formats. This base class would keep the reference to Spanner and declare the function print() pure virtual. A single output operator definition for this base class replaces all specific output operators. This only pays off when there are many different output formats for the same object type.
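The base class idea could be sketched roughly as follows. The names SpannerFormat, PrintSpannerIdOnly and formatAll() are invented for this illustration, and Spanner is again a stand-in class with assumed int members.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools {
  // Hypothetical stand-in for the toolbox class.
  class Spanner {
  public:
    Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
    int getID() const { return m_id; }
    int getGapSize() const { return m_gapSize; }
  private:
    int m_id;
    int m_gapSize;
  };
}

namespace Client {
  // Base class: keeps the reference and declares print().
  class SpannerFormat {
  public:
    virtual ~SpannerFormat() {}
    virtual void print(std::ostream & os) const = 0;
  protected:
    explicit SpannerFormat(Tools::Spanner const & s) : m_s(s) {}
    Tools::Spanner const & spanner() const { return m_s; }
  private:
    Tools::Spanner const & m_s;
  };

  // The only output operator needed for all formats.
  std::ostream & operator<<(std::ostream & os, SpannerFormat const & f) {
    f.print(os);
    return os;
  }

  class PrintSpannerNameAndGap : public SpannerFormat {
  public:
    PrintSpannerNameAndGap(Tools::Spanner const & s) : SpannerFormat(s) {}
    virtual void print(std::ostream & os) const {
      os << "Spanner{ID=" << spanner().getID()
         << ", gapSize=" << spanner().getGapSize() << "}";
    }
  };

  class PrintSpannerIdOnly : public SpannerFormat {
  public:
    PrintSpannerIdOnly(Tools::Spanner const & s) : SpannerFormat(s) {}
    virtual void print(std::ostream & os) const {
      os << spanner().getID();
    }
  };

  // Formats all spanners with the chosen format class, one per line.
  template <typename Format>
  std::string formatAll(std::vector<Tools::Spanner> const & spanners) {
    std::ostringstream os;
    std::copy(spanners.begin(), spanners.end(),
              std::ostream_iterator<Format>(os, "\n"));
    return os.str();
  }
}
```

Each new format now costs one small derived class. The price is a virtual call per printed object, which is unlikely to matter next to the cost of the stream output itself.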

A specific functor object can be used with the C++ library algorithm std::for_each() to print out each element. Initialise the functor with the output stream and define an operator()(Spanner const &) that prints each object to the output stream in the required format.
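A functor along these lines might look like the following sketch. The class name SpannerPrinter is invented here, Spanner is a stand-in class with assumed int members, and a std::vector is assumed as the container.

```cpp
#include <algorithm>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace Tools {
  // Hypothetical stand-in for the toolbox class.
  class Spanner {
  public:
    Spanner(int id, int gapSize) : m_id(id), m_gapSize(gapSize) {}
    int getID() const { return m_id; }
    int getGapSize() const { return m_gapSize; }
  private:
    int m_id;
    int m_gapSize;
  };
}

namespace Client {
  // Functor that prints one Spanner per call in a fixed format.
  class SpannerPrinter {
  public:
    explicit SpannerPrinter(std::ostream & os) : m_os(os) {}
    void operator()(Tools::Spanner const & s) const {
      m_os << "Spanner{ID=" << s.getID()
           << ", gapSize=" << s.getGapSize() << "}\n";
    }
  private:
    std::ostream & m_os;  // the stream must outlive the functor
  };

  // Returns the formatted spanners as a string, one per line.
  std::string printSpanners(std::vector<Tools::Spanner> const & spanners) {
    std::ostringstream os;
    std::for_each(spanners.begin(), spanners.end(), SpannerPrinter(os));
    return os.str();
  }
}
```

No output operator for Spanner is involved here, so the lookup problem does not arise. The only output operators used are those for strings and int, which the standard library provides.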

Conclusion

It is not always easy to understand what happens under the hood in C++. But there are solutions to every problem even if good understanding of C++ may be required. Don’t be afraid of asking friends or other ACCU members for advice.

Acknowledgements

Thanks to Tommy Persson who had the problem originally and spent time describing the problem to me, to Richard Corden for clarifying the C++ standard and to Thaddaeus Frogley for reviewing.

[The name lookup rules in C++ are not only confusing for the non-expert, they present serious problems to the expert. One such expert (Dave Abrahams) has presented these problems, and proposed a solution, for consideration by the standards committee: http://www.boost-consulting.com/writing/n1691.html – Alan (ed)]