This is the next installment of a series of deep-dives into the structure and implementation of variables in Visual Basic for Applications. For the previous posts, see the following:
In this post, I will cover the details of string variables and pointers. See Scalar Variables and Pointers in Depth for additional background and for the code for the utility functions HexPtr and Mem_ReadHex.
Pointers and memory for string variables
Even though string variables are treated semantically as value types, they are implemented as reference types. The content of a string variable is actually a pointer to another memory location where the string's characters are stored. In VBA we can either get the address of the variable itself using VarPtr, or go straight to the start of the character buffer using StrPtr. For a variable declared as a String, then, directly reading the memory at the address returned by VarPtr should give you the same pointer value as calling StrPtr.
Strings are BSTR structures
As noted in VBA Internals: What’s in a variable, strings in VBA are implemented using the COM BSTR structure. The BSTR structure starts with an unsigned 32-bit integer that gives the length of the character buffer. Note that this length is in bytes, not characters, and it does not include the two bytes of the terminating null character. However, the BSTR specification requires that implementers pass around the pointer to the start of the character buffer itself (rather than the preceding length field), so that a BSTR can be passed directly to functions expecting pointers to C-style null-terminated wide-character strings. To read the length field directly, then, we need to take the pointer returned by StrPtr and back up 4 bytes.
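As a minimal sketch of that step alone, assuming VBA7 (Office 2010 or later) on Windows, the length prefix can be read with a single 4-byte copy. The CopyMemory declaration (an alias for the Windows RtlMoveMemory API) and the helper name BstrByteLength are illustrative choices made here, not part of the utility functions from the earlier post.

```vba
Option Explicit

' Assumption: VBA7 on Windows. CopyMemory is a local alias for RtlMoveMemory.
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
    ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

' Hypothetical helper: returns the byte length stored in the 4 bytes
' immediately before the character buffer of a String.
Function BstrByteLength(ByRef s As String) As Long
    Dim pChars As LongPtr, n As Long
    pChars = StrPtr(s)
    ' An unassigned String has no BSTR allocated, so StrPtr returns 0.
    If pChars <> 0 Then
        CopyMemory n, ByVal pChars - 4, 4
    End If
    BstrByteLength = n
End Function

Sub BstrByteLengthDemo()
    Dim s As String
    s = "Hello"
    ' The prefix and LenB should agree: 10 bytes for 5 two-byte characters.
    Debug.Print BstrByteLength(s); LenB(s)
End Sub
```

Reading the prefix into a signed Long is a simplification: the field is unsigned, but the two interpretations agree for any string shorter than 2 GB.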
In the example below, I show the full BSTR structure: I get the length in bytes of the string buffer using LenB, back up 4 bytes to include the length field, and read a total of 6 extra bytes to cover both the length field at the start and the null character at the end.
Code
Sub StringPointerExample()
    Dim strVar As String, ptrVar As LongPtr, ptrBSTR As LongPtr
    strVar = "Hello"
    ' Address of the variable itself, which holds the pointer to the BSTR
    ptrVar = VarPtr(strVar)
    ' Read the pointer stored in the variable
    Mem_Copy ptrBSTR, ByVal ptrVar, PTR_LENGTH
    Debug.Print "ptrVar : 0x"; HexPtr(ptrVar); _
                " : 0x"; Mem_ReadHex(ptrVar, PTR_LENGTH)
    Debug.Print "ptrBSTR : 0x"; HexPtr(ptrBSTR)
    Debug.Print "StrPtr(): 0x"; HexPtr(StrPtr(strVar))
    ' Back up 4 bytes for the length prefix; 6 extra bytes = prefix (4) + null terminator (2)
    Debug.Print "Memory : 0x"; Mem_ReadHex(ptrBSTR - 4, LenB(strVar) + 6)
End Sub
Output
ptrVar : 0x0039F4F0 : 0xE43A3508
ptrBSTR : 0x08353AE4
StrPtr(): 0x08353AE4
Memory : 0x0A000000480065006C006C006F000000
Explanation
The variable table in this case is pretty simple:
The functions used and the memory layout revealed take a little more explaining. First, when we directly read the memory at the address returned by VarPtr, we get the bytes of the pointer to the character buffer. Since my machine is little-endian, the raw bytes appear backwards: E4 3A 35 08 reads as 0x08353AE4. The printout shows that calling StrPtr returns the exact same pointer value as in ptrBSTR.
Finally, we actually display the bytes of the BSTR. It starts with the 4-byte length field. Again, this is little-endian, so we have to reverse the bytes to interpret it correctly. When we do, we indeed see a value of 10, for the 10 bytes of the 5-character Unicode string “Hello”. Next is the character buffer. The characters are in the order expected, but the two bytes within each 16-bit code unit are once again little-endian. Finally, there is a two-byte null character at the end.
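Memory at 0x08353AE0 through 0x08353AEF. Both *VarPtr (the pointer stored in the variable) and StrPtr point to 0x08353AE4, the start of the character buffer.

| Address | Bytes | Field | Value |
| --- | --- | --- | --- |
| 0x08353AE0 | 0A 00 00 00 | Length prefix | 0x0000000A = 10 (decimal) |
| 0x08353AE4 | 48 00 | Chars | 0x0048 = H |
| 0x08353AE6 | 65 00 | | 0x0065 = e |
| 0x08353AE8 | 6C 00 | | 0x006C = l |
| 0x08353AEA | 6C 00 | | 0x006C = l |
| 0x08353AEC | 6F 00 | | 0x006F = o |
| 0x08353AEE | 00 00 | Null term | 0x0000 |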
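The byte reversal described above does not have to be done by hand. As a rough sketch, again assuming VBA7 on Windows and using an illustrative CopyMemory declaration (an alias for RtlMoveMemory) rather than the utilities from the earlier post, copying two bytes at a time into an Integer lets the processor apply its own little-endian interpretation, so each code unit comes out already in reading order. The procedure name DumpCodeUnits is hypothetical.

```vba
Option Explicit

' Assumption: VBA7 on Windows. CopyMemory is a local alias for RtlMoveMemory.
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
    ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

' Hypothetical demo: prints every 16-bit code unit in the character buffer,
' followed by the two-byte null terminator.
Sub DumpCodeUnits()
    Dim s As String, i As Long, unit As Integer
    s = "Hello"
    For i = 0 To Len(s)   ' the final iteration reads the null terminator
        ' Copy 2 bytes starting at the i-th code unit into a 16-bit Integer
        CopyMemory unit, ByVal StrPtr(s) + i * 2, 2
        Debug.Print "0x" & Right$("0000" & Hex$(unit), 4)
    Next i
End Sub
```

For “Hello” this should print 0x0048, 0x0065, 0x006C, 0x006C, 0x006F and 0x0000 on successive lines, matching the layout shown in the table.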