Denormal Numbers? Still a problem!

You can set the corresponding MXCSR flags (flush-to-zero, FTZ, and denormals-are-zero, DAZ) as described here, page 240.
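
For what it's worth, here is a minimal sketch of how that could look in C, assuming the flags meant are FTZ (bit 15) and DAZ (bit 6) of MXCSR and that you are building for x86 with SSE. `_mm_getcsr` and `_mm_setcsr` are the standard intrinsics from `xmmintrin.h`; the helper names `enable_ftz_daz` and `restore_mxcsr` are made up for illustration.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (SSE intrinsics) */

/* MXCSR control bits (positions as documented for the MXCSR register). */
#define MXCSR_DAZ 0x0040u  /* bit 6:  treat denormal inputs as zero   */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush denormal results to zero  */

/* Hypothetical helper: turn on FTZ and DAZ for the calling thread and
 * return the previous MXCSR value so it can be restored later.
 * Assumption: the CPU supports DAZ (very early SSE parts do not, and
 * setting the bit there faults). */
static unsigned int enable_ftz_daz(void)
{
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | MXCSR_FTZ | MXCSR_DAZ);
    return old;
}

/* Hypothetical helper: put back a previously saved MXCSR value. */
static void restore_mxcsr(unsigned int saved)
{
    _mm_setcsr(saved);
}
```

MXCSR is per thread, so you would call `enable_ftz_daz()` at the start of the thread that runs your processing loop (and `restore_mxcsr()` when it is done, if other code on that thread relies on IEEE denormal behaviour). It only affects SSE/AVX arithmetic, not x87 code.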

But AFAIK denormals are not supposed to incur such a performance penalty on recent CPUs.
Perhaps you have some kind of branching in your loop that makes the CPU load change depending on the input level?