Foxtrot - Windows Source Leak

Started by iago, March 01, 2004, 08:37 AM

Kp

Quote from: Adron on March 04, 2004, 04:59 PM
I'm not convinced that that is enough to help much. With stdcall, the stack reset is done by the return statement. With cdecl, the compiler has to emit code to do that. Yes, you may avoid doing it more than once per subroutine, but still...

I believe in many cases the code to set up a stack frame is smaller if you don't sub esp by some amount first. Any arguments you want to pass from registers are stored with a single-byte push instruction, while storing a register to an offset from esp requires a multi-byte instruction.

Code that stores into a fixed stack frame will be bigger, and then most likely slower, or more likely to fill the processor's cache.
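
To put rough numbers on the size argument, here is a minimal sketch in C that tallies encoded bytes per call site on 32-bit x86. It assumes every argument is already in a register, that all stack offsets fit in an 8-bit displacement, and that the one-time "sub esp, X" of the fixed-frame approach is paid once per function and so left out of the per-call count. The function names are hypothetical; the instruction lengths are the standard encodings noted in the comments.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded sizes, in bytes, of the 32-bit x86 instructions under discussion. */
enum {
    PUSH_REG     = 1, /* push eax             -> 50            */
    MOV_ESP0_REG = 3, /* mov [esp], eax       -> 89 04 24      */
    MOV_ESP8_REG = 4, /* mov [esp+disp8], eax -> 89 44 24 xx   */
    CALL_REL32   = 5, /* call rel32           -> E8 xx xx xx xx */
    ADD_ESP_IMM8 = 3, /* add esp, imm8        -> 83 C4 xx      */
    RET_PLAIN    = 1, /* ret                  -> C3            */
    RET_IMM16    = 3  /* ret imm16            -> C2 xx xx      */
};

/* Call site that pushes each argument. With cdecl the caller also
 * resets the stack (caller_cleans != 0); with stdcall it does not. */
size_t push_call_site_bytes(size_t nargs, int caller_cleans)
{
    assert(nargs <= 31); /* keeps 4*nargs within an imm8/disp8 */
    size_t bytes = nargs * PUSH_REG + CALL_REL32;
    if (caller_cleans && nargs > 0)
        bytes += ADD_ESP_IMM8;
    return bytes;
}

/* Call site that stores each argument into a pre-allocated frame.
 * The first slot is [esp] (3 bytes); the rest are [esp+disp8] (4 bytes). */
size_t frame_call_site_bytes(size_t nargs)
{
    assert(nargs <= 31);
    size_t bytes = CALL_REL32;
    if (nargs > 0)
        bytes += MOV_ESP0_REG + (nargs - 1) * MOV_ESP8_REG;
    return bytes;
}

/* Return instruction in the callee: stdcall pops its own arguments. */
size_t callee_ret_bytes(int is_stdcall, size_t nargs)
{
    return (is_stdcall && nargs > 0) ? RET_IMM16 : RET_PLAIN;
}
```

For a three-argument call this gives 8 bytes with pushes under stdcall, 11 with pushes under cdecl when the caller cleans up at that call site, and 16 with stores into a fixed frame. It only counts bytes; it says nothing about which version runs faster.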

I don't contest the size by any means. The optimization level I was thinking of (-O3) prefers speed over all else; you can get smaller code just by backing off to -O2 (or, if you want minimal code space usage, -Os). I hadn't considered the cache issue, but I expect that the cache is big enough to tolerate a few dozen parameters. It's my understanding from the gcc documentation that the intended performance gain comes from removing the sequence dependency of the parameters, in the hope that the compiler can generate better code if it can prepare the parameters in whatever order it pleases.
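
The sequence dependency mentioned above can be illustrated with a small simulation in C. This is only a sketch with a toy stack model (the Stack type and all function names are made up for the example, and slots are counted in 32-bit words rather than bytes): pushes have to be issued last argument first, while stores into space reserved up front can be issued in any order and still produce the same layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16

/* Toy model of a downward-growing stack of 32-bit slots. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp; /* index of the current top-of-stack slot */
} Stack;

void stack_init(Stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_WORDS; /* empty stack */
}

/* Like the x86 push: decrement esp, then store. */
void push(Stack *s, uint32_t v)
{
    assert(s->esp > 0);
    s->mem[--s->esp] = v;
}

/* Push-based setup: the order is forced. The last argument goes
 * first so that args[0] ends up at [esp]. */
void setup_with_pushes(Stack *s, const uint32_t *args, int n)
{
    for (int i = n - 1; i >= 0; --i)
        push(s, args[i]);
}

/* Frame-based setup: reserve n slots once (the "sub esp" step), then
 * store the arguments in whatever order they become available. */
void setup_with_stores(Stack *s, const uint32_t *args,
                       const int *order, int n)
{
    assert(s->esp >= n);
    s->esp -= n;
    for (int k = 0; k < n; ++k) {
        int i = order[k];
        s->mem[s->esp + i] = args[i]; /* mov [esp + 4*i], value */
    }
}

/* Nonzero when both stacks have the same top and the same contents. */
int same_layout(const Stack *a, const Stack *b)
{
    return a->esp == b->esp &&
           memcmp(a->mem, b->mem, sizeof a->mem) == 0;
}
```

Calling setup_with_pushes on one stack and setup_with_stores with an arbitrary permutation on another leaves both with identical contents, which is the freedom the compiler is meant to exploit. Whether that freedom pays off in real code is exactly the point in dispute.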

As a closing point, I'd like to say _stdcall is also bad because it allows VB code to work when calling WinAPI functions. ;)
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

Adron

I suppose lifting the sequence dependency might improve performance in some cases, but I don't think it will do that in most, or even in a large number of cases.

Also, keeping most of the calls as small as possible using only simple pushes, and adding a "sub esp, X" for the cases where setting up a stack frame in a random order would be advantageous might still result in faster code.

MSVC++ seems to use pushes to store data on the stack sometimes, and subtractions from esp sometimes. Most of the time it uses pushes.

I believe the cache issue is an important one. Not for applications that run alone in the system and have the cpu to themselves, but for most real-world applications minimizing size is important for performance. Either you're small, or you're pushed out of cache on a task-switch, or even swapped out.

Have you seen the result of post-processing tools that move seldom-used code to completely different memory pages? That doesn't seem to save a large percentage, but it's still done - I doubt it would be done if it improved nothing, since it's an inconvenience and makes debugging ewwy.

Anyway, I suppose some benchmarking of large applications compiled with either setting might be the best test :)

iago

This'll make an interesting test for broken AV:
Quote
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*