Performance issue with Steinberg::MemoryStream

I have been running into a serious performance issue with the Steinberg::MemoryStream class on Windows 10 when writing a large number of elements to it.

I wrote the following small test program:

void testFastWriteMemoryStream(int iNumElements)
{
  LOG_SCOPE_FUNCTION(INFO);

  FastWriteMemoryStream vs{};
  IBStreamer streamer(&vs);
  for(int i = 0; i < iNumElements; i++)
  {
    streamer.writeFloat(static_cast<float>(i));
  }
}

void testMemoryStream(int iNumElements)
{
  LOG_SCOPE_FUNCTION(INFO);

  Steinberg::MemoryStream vs{};
  IBStreamer streamer(&vs);
  for(int i = 0; i < iNumElements; i++)
  {
    streamer.writeFloat(static_cast<float>(i));
  }
}

// TestMemoryStream - test_perf
TEST(TestMemoryStream, test_perf)
{
  LOG_SCOPE_FUNCTION(INFO);
//  for(int i = 0; i < 10; i++)
//    testFastWriteMemoryStream(10000000);
  for(int i = 0; i < 10; i++)
    testMemoryStream(10000000);
}

First the results, then I will explain what the problem is and what FastMemoryStream is (it is named FastWriteMemoryStream in the test code above).

On macOS (run on a MacBook Pro, 2.8 GHz Intel Core i7 / 16 GB) the results are the following:
For MemoryStream: avg of 1.933s
For FastMemoryStream: avg of 1.878s

On Windows 10 (Intel Core i7 26700K, 3.4 GHz / 24 GB) the results are the following:
For MemoryStream: avg of 90.093s!!!
For FastMemoryStream: avg of 10.207s

The issue is not the gap between the Mac and the PC for FastMemoryStream (roughly a factor of 5), although my PC is faster on paper than the Mac and still runs the test much more slowly. The real issue is the huge penalty for using MemoryStream on Windows 10: it is about 9 times slower than FastMemoryStream there, while on macOS the two are within about 3% of each other.

I strongly believe that the issue is this line:

TSize newMemorySize = (((Max (memorySize, s) - 1) / kMemGrowAmount) + 1) * kMemGrowAmount;

This is incredibly inefficient because the buffer only ever grows by 4K at a time (the value of kMemGrowAmount), and for some reason on the PC the call to realloc does not seem to extend the block in place but copies it to a new location each time.
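One way to check the realloc hypothesis on a given machine is to grow a block in 4K steps and count how often the returned pointer changes. The following is only a minimal sketch: countReallocMoves and ReallocMoveStats are hypothetical names, not part of the SDK, and a changed address is taken as a sign that the block was moved (and therefore copied).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper type: how many realloc calls were made and how many
// of them returned a different address than the block had before.
struct ReallocMoveStats
{
  std::size_t calls = 0;
  std::size_t moves = 0;
};

// Grows one block from `step` bytes up to `finalSize` bytes, `step` bytes at
// a time, the way a fixed-increment buffer would.
ReallocMoveStats countReallocMoves(std::size_t finalSize, std::size_t step)
{
  assert(step > 0);
  ReallocMoveStats stats;
  void *block = nullptr;
  for(std::size_t capacity = step; capacity <= finalSize; capacity += step)
  {
    // Save the address as an integer before realloc invalidates the pointer.
    const auto before = reinterpret_cast<std::uintptr_t>(block);
    void *grown = std::realloc(block, capacity);
    if(grown == nullptr)
      break; // out of memory: the old block is still valid and freed below
    if(before != 0 && reinterpret_cast<std::uintptr_t>(grown) != before)
      stats.moves++; // address changed, so the contents were copied
    block = grown;
    stats.calls++;
  }
  std::free(block);
  return stats;
}
```

Calling countReallocMoves(40000000, 4096) on both machines and comparing moves to calls would show whether the Windows allocator relocates the block far more often than the macOS one. The numbers depend entirely on the allocator, so none are given here.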

So what is FastMemoryStream?
FastMemoryStream is the same class as MemoryStream, except that newMemorySize is computed this way (geometric growth, which is the same idea std::vector implementations use to compute the new size, so I didn’t reinvent the wheel…):

  auto newMemorySize = std::max(memorySize, static_cast<TSize>(256));
  newMemorySize = std::max(newMemorySize * 2, s);

This generates far fewer reallocs, and as a result it is a lot faster.
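To put rough numbers on the difference, the sketch below replays the float writes from the test against both size computations and counts the reallocs, plus the bytes that would be copied if every realloc moved the block (the worst case). It does not touch the real classes: GrowthStats, growLinear, growGeometric and simulateGrowth are hypothetical names, TSize is assumed to be a 64-bit integer, and kMemGrowAmount is assumed to be 4096 as stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using TSize = std::int64_t;            // assumption: 64-bit size type
constexpr TSize kMemGrowAmount = 4096; // the 4K increment mentioned above

struct GrowthStats
{
  long reallocs = 0;
  TSize bytesCopiedWorstCase = 0; // if every realloc had to move the block
  TSize capacity = 0;
};

// MemoryStream policy: round the requested size up to the next 4K multiple.
TSize growLinear(TSize memorySize, TSize s)
{
  return (((std::max(memorySize, s) - 1) / kMemGrowAmount) + 1) * kMemGrowAmount;
}

// FastMemoryStream policy: at least double the current capacity.
TSize growGeometric(TSize memorySize, TSize s)
{
  auto newMemorySize = std::max(memorySize, static_cast<TSize>(256));
  return std::max(newMemorySize * 2, s);
}

// Replays iNumElements float writes without allocating any memory.
GrowthStats simulateGrowth(int iNumElements, TSize (*grow)(TSize, TSize))
{
  assert(iNumElements >= 0);
  GrowthStats stats;
  TSize size = 0; // bytes written so far
  for(int i = 0; i < iNumElements; i++)
  {
    const TSize needed = size + static_cast<TSize>(sizeof(float));
    if(needed > stats.capacity)
    {
      stats.bytesCopiedWorstCase += size; // a moving realloc copies live bytes
      stats.capacity = grow(stats.capacity, needed);
      stats.reallocs++;
    }
    size = needed;
  }
  return stats;
}
```

With 4-byte floats, simulateGrowth(10000000, growLinear) reports 9766 reallocs and about 195 GB copied in the worst case, while simulateGrowth(10000000, growGeometric) reports 18 reallocs and about 67 MB. That is consistent with the 4K policy being cheap only when realloc can extend the block in place.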

I just wanted to pass this along, as any code that uses MemoryStream (like the vst2wrapper getChunk method) is seriously impacted by it.