Sunday, May 28, 2023

Reversing C++ String And QString

After the rust string overview of its internal substructures, let's see if c++ QString storage is more light, but first we'r going to take a look to the c++ standard string object:



At first sight we can see the allocation and deallocation created by the clang++ compiler, and the DAT_00400d34 is the string.

If we use same algorithm than the rust code but in c++:



We have a different decompilation layout. Note that the Ghidra scans very fast the c++ binaries, and with rust binaries gets crazy for a while.
Locating main is also very simple in a c++ compiled binary, indeed is more  low-level than rust.


The byte array is initialized with a simply move instruction:
        00400c4b 48 b8 68        MOV        RAX,0x6f77206f6c6c6568

And basic_string generates the string, in the case of rust this was carazy endless set of calls, detected by ghidra as a runtime, but nevertheless the basic_string is an external imported function not included on the binary.

(gdb) x/x 0x7fffffffe1d0
0x7fffffffe1d0: 0xffffe1e0            low str ptr
0x7fffffffe1d4: 0x00007fff           hight str ptr
0x7fffffffe1d8: 0x0000000b        sz
0x7fffffffe1dc: 0x00000000
0x7fffffffe1e0: 0x6c6c6568         "hello world"
0x7fffffffe1e4: 0x6f77206f
0x7fffffffe1e8: 0x00646c72
0x7fffffffe1ec: 0x00000000        null terminated
(gdb) x/s 0x7fffffffe1e0
0x7fffffffe1e0: "hello world"

The string is on the stack, and it's very curious to see what happens if there are two followed strings like these:

  auto s = string(cstr);
  string s2 = "test";

Clang puts toguether both stack strings:
[ptr1][sz1][string1][null][string2][null][ptr2][sz2]

C++ QString datatype

Let's see the great and featured QString object defined on qstring.cpp and qstring.h

Some QString methods use the QCharRef class whose definition is below:

class Q_EXPORT QCharRef {     friend class QString;     QString& s;     uint p;
 
 Searching for the properties on the QString class I've realized that one improvement that  rust and golang does is the separation from properties and methods, so in the large QString class the methods are  hidden among the hundreds of methods, but basically the storage is a QStringData *;

After removing the methods of QStringData class definition we have this:

struct Q_EXPORT QStringData : public QShared {
    QChar *unicode;
    char *ascii;
#ifdef Q_OS_MAC9
    uint len;
#else
    uint len : 30;