• Welcome to Valhalla Legends Archive.
 

C/C++ & Asm - Prototypeless Function Pointers

Started by tA-Kane, January 25, 2008, 01:16 PM

Previous topic - Next topic

tA-Kane

I'm working on a project which needs to schedule functions to be executed in a specific thread(s) (possibly in another process as well). The specific reasons why are various.

Assume -

  • The thread in which the functions are to be executed knows about the functions and is expecting them
  • The functions are to be executed in FIFO order - that is, first come/first serve
  • The functions do not have the same types of parameters, nay even the same number of parameters
  • Obviously due to the function being called in a different thread, the parameters are not currently stored on the stack. As such, the called function assumes ownership of the parameters and will deallocate them as necessary.
  • Return values may be ignored.
Requirements -

  • Memory use is of little concern (don't waste it obviously)
  • CPU use needs to be tight

At first, I was needing to schedule less than a dozen or so functions. There was simply a container with an integer-type function identifier and a method of storing the function parameters (byte array, linked list... it didn't matter at first). We used a switch statement to identify which function to execute, and each case also assumes the number of parameters have been stored.

As the project progressed, the number of functions to be scheduled has grown by several dozen, to the current count of about 110. While that in and of itself wasn't a problem, keeping the enumeration for which function belonged to which case did become a problem to some of our newer guys. Additionally, as the project progressed, it became a concern that the code size for something so simple was increasing a great deal as the number of functions and parameters was increasing.

Unfortunately, none of the other programmers on the project could come up with a better solution.


After some thought, I a prototypeless function pointer occurred to me:

typedef void* (__cdecl FUNCPTR)(...);


This does solve the issue of the enumeration growing, as you'd simply call the passed function pointer rather than doing a switch. Ironically, this should also save a little bit of CPU, although it's probably negligible.

It does, however, present a unique situation. The different functions to be pointed to have different numbers of parameters which cannot be identified using a prototypeless function pointer like this one. As far as I am aware, while C/C++ does provide a method of manually pushing any number of parameters onto this function, it does not provide the means to programmatically determine the number of parameters a function has or their boundaries (you can't use a loop to push the parameters). Due to this, it would re-introduce the switch statement to select the number of parameters and call the function as appropriate with that number of parameters.

As such, an Asm hack seemed required. Portability issues are a non-concern, so I proceeded.

Here's what I've come up with. The other programmers on the project have no experience with assembly (and very little experience with functions prototyped with (...)), which makes them unable to check my work. As such, what do you all think of this code? Are there any issues you can see which I may have not foreseen? Anything I may have missed?

I have a static byte array in carrier to hold the parameters simply because I don't want to make two allocations for the test -- one for the carrier structure and another for the parameters. This can obviously result in a buffer overflow of which I am aware. I am also aware it can result in large amounts of memory waste when pointing to functions with few or no parameters when there's dozens of entries in the schedule queue. Assuming no drastic problems with this code, both issues will be reviewed.

Things I will need to do is move carrier to a struct (I don't even remember why I made it a union) which includes the paramsize rather than passing paramsize alongside carrier. Additionally, carrier may eventually be passed to callit() as a pointer rather than inline parameter.

Sample test code below --

typedef void* (__cdecl FUNCPTR)(...);

union carrier {
FUNCPTR *funcptr;
char va_test[512];
};
void __cdecl callit(int size, carrier omg);

int __cdecl something(char *buf, int a, int b, int c, int d, int e, int f) {
sprintf(buf, "a = %.x\nb = %.x\nc = %.x\nd = %.x\ne = %.x\nf = %.x\nwtf=%.x\n", a, b, c, d, e, f, a ^ b ^ c ^ d ^ e ^ f);
printf("%s", buf);
return a ^ b ^ c ^ d ^ e ^ f;
}

__declspec(naked) void __cdecl callit(int paramsize, carrier omg) {
__asm {
push ebp;
mov ebp, esp;
push esi;

mov ecx, paramsize;
lea esi, omg;
add esi, SIZE omg.funcptr; // skip the function ptr
shr ecx, 2; // modulo 4 ... this assumes int is (and thus params are in a multiple of) 32-bit
jz _asmparamspushed;

// TODO: look into using repeat opcodes
_asmpushparams:
push [esi];
add esi, 4;
dec ecx;
jnz _asmpushparams;

_asmparamspushed:
call omg.funcptr;

lea esp, [ebp-4];
pop esi;
pop ebp;
retn;
}
}

int _tmain(int argc, _TCHAR* argv[])
{
char buf[256];
carrier omg;
int a, b, c, d, e, f, r;
a = 0x12345678;
b = 0x87654321;
c = 0x12873465;
d = 0x13572468;
e = 0x24687531;
f = 0x18273645;
r = 0;

omg.funcptr = (FUNCPTR *)something;
va_list list, list2;

va_start(list2, omg.funcptr);
va_start(list, omg.funcptr);
va_arg(list, int) = f;
va_arg(list, int) = d;
va_arg(list, int) = e;
va_arg(list, int) = c;
va_arg(list, int) = b;
va_arg(list, int) = a;
va_arg(list, char *) = buf;
callit((int)(list - list2), omg);
printf("--- double check ---\n");
printf("%s", buf);

// let the test display to the console before exiting
gets_s(buf, sizeof(buf));
return 0;
}


[Kp edit: deleted long comment that broke table layout.]
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

brew

#1
Isn't this overcomplicating things? I personally liked your first idea better. And it doesn't need to be cpu consuming at all...
You have a numeric ordinal function identifier.
You have an array of function pointers.
Calling it is easy:


;your first parameter, the function identifier
mov eax, dword [ebp - 8]

;your second parameter, an address to your function specific parameter struct.
;or it could be your first generic parameter for the function you're calling.
mov ecx, dword [ebp - 12]

lea eax, [basearrayaddress + eax * 4]
call dword [eax]


Come to think of it, this is probably the way Windows handles window "messages".

It's memory efficient, and it takes maybe say, a total of 18-24 cycles? Is that cpu efficient enough?
<3 Zorm
Quote[01:08:05 AM] <@Zorm> haha, me get pussy? don't kid yourself quik
Scio te esse, sed quid sumne? :P

tA-Kane

Quote from: brew on January 25, 2008, 02:31 PM
Isn't this overcomplicating things?
Probably. It's one of the reason I'm asking; I'm looking to see if I like any solutions anyone else might be able to come up with, in addition to asking for comments on issues you might see with what I've already presented.

Quote from: brew on January 25, 2008, 02:31 PMI personally liked your first idea better.
Yes, it's quite simple. Unfortunately, as the number of functions grows, so does the function which calls them. It's quite large now for an idea so very simple. More code means more compile time and a larger executable (neither is a concern, just a sidenote). I'm trying to kill multiple birds with one stone.

Quote from: brew on January 25, 2008, 02:31 PMAnd it doesn't need to be cpu consuming at all...
You have a numeric ordinal function identifier.
You have an array of function pointers.
Calling it is easy:


;your first parameter, the function identifier
mov eax, dword [ebp - 8]

;your second parameter, an address to your function specific parameter struct.
;or it could be your first generic parameter for the function you're calling.
mov ecx, dword [ebp - 12]

lea eax, [basearrayaddress + eax * 4]
call dword [eax]


Come to think of it, this is probably the way Windows handles window "messages".

It's memory efficient, and it takes maybe say, a total of 18-24 cycles? Is that cpu efficient enough?
CPU & memory efficient "enough", sure. But I don't think that's the solution I'm after if what I'm already doing is not only possible, but working on the presented test. My solution already uses less memory (only a single function pointer instead of an array of pointers and an index, excluding the fact that the parameters are passed via a static byte array rather than a dynamically-allocated array of the needed size). And correct me if I'm wrong, but isn't getting a pointer with no math faster than getting a pointer using an indexed array?


Any other thoughts or suggestions?
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

brew

Quote from: tA-Kane on January 25, 2008, 03:30 PM
Yes, it's quite simple. Unfortunately, as the number of functions grows, so does the function which calls them. It's quite large now for an idea so very simple. More code means more compile time and a larger executable (neither is a concern, just a sidenote). I'm trying to kill multiple birds with one stone.
The size of the function shouldn't grow at all with the addition of new functions, and should stay just as fast [an O(1) lookup]since it's all "math" based. The only thing that should grow is the array itself. The "lookup" itself is just one instruction. So your function that is called can be as small as 12 bytes, that is, if you specify your own calling convention and pass the parameters through registers.

Quote from: tA-Kane on January 25, 2008, 03:30 PM
And correct me if I'm wrong, but isn't getting a pointer with no math faster than getting a pointer using an indexed array?

You don't want to use math to get it? How would you get it, then? the instruction lea was made just for these kinds of situations.

Just curious, what is this? Perhaps if it's a type of api system you should include checks as to what function to call, etc, and patch up all the obvious possible security holes. If it's internal only, don't bother though. It wouldn't be worth the cpu cycles. My method would be the least cpu consuming, due to the lack of raw instructions. Each parameter wouldn't require a va_arg call as well, since you'd be passing the raw param or parameter struct. I wouldn't even bother with the stack here if it's as cpu critical as you say. Nor C.
<3 Zorm
Quote[01:08:05 AM] <@Zorm> haha, me get pussy? don't kid yourself quik
Scio te esse, sed quid sumne? :P

tA-Kane

Quote from: brew on January 25, 2008, 07:43 PM
Quote from: tA-Kane on January 25, 2008, 03:30 PM
Yes, it's quite simple. Unfortunately, as the number of functions grows, so does the function which calls them. It's quite large now for an idea so very simple. More code means more compile time and a larger executable (neither is a concern, just a sidenote). I'm trying to kill multiple birds with one stone.
The size of the function shouldn't grow at all with the addition of new functions, and should stay just as fast [an O(1) lookup]since it's all "math" based.
You mentioned you liked my first idea, and I was reflecting on it.

Quote from: brew on January 25, 2008, 07:43 PM
Quote from: tA-Kane on January 25, 2008, 03:30 PM
And correct me if I'm wrong, but isn't getting a pointer with no math faster than getting a pointer using an indexed array?

You don't want to use math to get it? How would you get it, then? the instruction lea was made just for these kinds of situations.
Compare:

lea eax, [basearrayaddress + eax * 4]
vs
lea esp, [ebp-4];

Essentially no math in the latter one.

In either case, I don't suppose it really matters at all.

Quote from: brew on January 25, 2008, 07:43 PMJust curious, what is this?
Internal storage and processing software designed to work with legacy (<= 266MHz) systems and still work with new systems added late next year (roughly a 10 year span between upgrades).

Quote from: brew on January 25, 2008, 07:43 PMPerhaps if it's a type of api system you should include checks as to what function to call, etc, and patch up all the obvious possible security holes.
I'm no security expert, so please explain.

Quote from: brew on January 25, 2008, 07:43 PMMy method would be the least cpu consuming, due to the lack of raw instructions. Each parameter wouldn't require a va_arg call as well, since you'd be passing the raw param or parameter struct. I wouldn't even bother with the stack here if it's as cpu critical as you say. Nor C.
Your method wouldn't require a va_arg(), sure. It doesn't change the way the data gets sent to callit(). Pushing the parameters into a structure with defined variables and then callit() sends that structure to the function (which, I mentioned, not all functions can be made to support) versus pushing them into a structure with a typeless byte array which is then pushed onto the stack by callit(). Hmm, I see little difference in how the parameters for the function arrives at callit() and significant advantages of having callit() push the parameters straight to the stack: don't have to create parameter structures for each function and rework each function to work with that parameter structure, and don't have to have a middleman function for the legacy imports which just calls the legacy function while pulling the parameters from the structure (and ends up pushing them to the stack anyways).


What I don't think you're understanding is that I'm not using this asm hack specifically for speed. If I were to use a function pointer array and a parameter structure, then why use an asm hack at all? That can be done without assembly. That alone has merits, don't get me wrong. But this group isn't going to switch to a parameter structure.

Without a parameter structure being passed to the queued functions, there's a need to pass an indeterminate-at-compile-time number of parameters to the called function. This means one of two things - use a switch to call the function with the required number of parameters, or use an asm hack to force that number of parameters onto the stack and then call the function.

And I do see difference between using an indexed function pointer array vs using a single function pointer. They both do the same thing in roughly the same amount of time. They both require a single parameter passed to callit() to identify the function to be called. But one has an advantage - a single function pointer uses less memory than an array of such. Since an array is not specifically required by anything, why use it?
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

brew

Quote from: tA-Kane on January 26, 2008, 02:33 AM
You mentioned you liked my first idea, and I was reflecting on it.
Sorry, misunderstood.

Quote from: tA-Kane on January 25, 2008, 03:30 PM
lea eax, [basearrayaddress + eax * 4]
vs
lea esp, [ebp-4];
It would have little to no impact on performance-- don't forget how the cpu is going to carry out that multiplication (SHL).
I assumed that since you intend to call these functions from different processes (as you have stated in your first sentence), and it'd be hard to find the address of your function you wish to call. Since getting the address doesn't seem to be a problem, I guess that your solution would be optimal there.

Quote from: tA-Kane on January 25, 2008, 03:30 PM
Quote from: brew on January 25, 2008, 07:43 PMPerhaps if it's a type of api system you should include checks as to what function to call, etc, and patch up all the obvious possible security holes.
I'm no security expert, so please explain.
Nevermind. There should be no security problems if it's done right, since it's all internal (i thought it wasn't at first).


Quote from: tA-Kane on January 25, 2008, 03:30 PM
Your method wouldn't require a va_arg(), sure. It doesn't change the way the data gets sent to callit(). Pushing the parameters into a structure with defined variables and then callit() sends that structure to the function (which, I mentioned, not all functions can be made to support) versus pushing them into a structure with a typeless byte array which is then pushed onto the stack by callit(). Hmm, I see little difference in how the parameters for the function arrives at callit() and significant advantages of having callit() push the parameters straight to the stack: don't have to create parameter structures for each function and rework each function to work with that parameter structure, and don't have to have a middleman function for the legacy imports which just calls the legacy function while pulling the parameters from the structure (and ends up pushing them to the stack anyways).
I was refering to my version of the callit();. I don't see why you're taking the parameters, repushing them, etc etc. If you'd like, you can just use that solid char array as your parameter block. I don't have that good of a sense of types. It's all just memory to me. You would have to read from the memory any way, so why not just cast the block of 512 bytes to a parameter structure pointer, or even better yet-- if they don't want to deal with structures at all, they CAN do *(unsigned long *)(asdf + offset) = valuehere;. But look at how silly that looks :P
What you *should* do is just pass your param block's address instead of pushing them all onto the stack. You're making an unnecessary copy of that memory. You'd end up reading it like mov eax, dword [ebp - 4] anyways. The pushes are what's killing you. Don't forget you're modifying esp, and copying memory with every push you make.
Just curious, how are you "passing" the parameters? Shared process memory?
<3 Zorm
Quote[01:08:05 AM] <@Zorm> haha, me get pussy? don't kid yourself quik
Scio te esse, sed quid sumne? :P

tA-Kane

Quote from: brew on January 26, 2008, 11:24 AM
Quote from: tA-Kane on January 25, 2008, 03:30 PM
Your method wouldn't require a va_arg(), sure. It doesn't change the way the data gets sent to callit(). Pushing the parameters into a structure with defined variables and then callit() sends that structure to the function (which, I mentioned, not all functions can be made to support) versus pushing them into a structure with a typeless byte array which is then pushed onto the stack by callit(). Hmm, I see little difference in how the parameters for the function arrives at callit() and significant advantages of having callit() push the parameters straight to the stack: don't have to create parameter structures for each function and rework each function to work with that parameter structure, and don't have to have a middleman function for the legacy imports which just calls the legacy function while pulling the parameters from the structure (and ends up pushing them to the stack anyways).
I was refering to my version of the callit();. I don't see why you're taking the parameters, repushing them, etc etc. If you'd like, you can just use that solid char array as your parameter block. I don't have that good of a sense of types. It's all just memory to me. You would have to read from the memory any way, so why not just cast the block of 512 bytes to a parameter structure pointer, or even better yet-- if they don't want to deal with structures at all, they CAN do *(unsigned long *)(asdf + offset) = valuehere;. But look at how silly that looks :P
What you *should* do is just pass your param block's address instead of pushing them all onto the stack. You're making an unnecessary copy of that memory. You'd end up reading it like mov eax, dword [ebp - 4] anyways. The pushes are what's killing you. Don't forget you're modifying esp, and copying memory with every push you make.
Just curious, how are you "passing" the parameters? Shared process memory?
The method of communication is not up for discussion (NDA, unfortunately) but suffice to say that different processes having different addresses for the parameters is a nonissue due to both the communication process and the method of determining what the values of the parameters are.

I am taking the parameters and pushing them into a structure. The structure is "safe" to be moved from thread to thread, or to even be communicated to another process that is expecting the structure. I'm then sending a pointer to that structure, to Callit(). Since the structure is safe to be communicated to other threads, it is not stored on the stack. Because it's not stored on the stack, the functions' parameters have to be re-pushed to the stack.
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

D.hycle

even i tried with this ..

                                                                 
    COMPUTE NEW-NO =FUNCTION INTEGER-OF-DATEWS-CURRENT-           
               DATE    NUMBER-OF-DAYS 
     COMPUTE NEW-DATE = FUNCTION DATE-OF-INTEGER NEW-NO         

still i m getting the same errors..

126  IGYPS2121-S   "FUNCTION" WAS NOT DEFINED AS A DATA-NAME.  THE STATEMENT
                                                                             
                   SAME MESSAGE ON LINE:    128                             
                                                                             
126  IGYPS2072-S   "INTEGER-OF-DATE" WAS INVALID.  SKIPPED TO THE NEXT VERB,
                                                                             
128  IGYPS2072-S   "DATE-OF-INTEGER" WAS INVALID.  SKIPPED TO THE NEXT VERB, -S   "DATE-OF-INTEGER" WAS INVALID.  SKIPPED TO THE NEXT VERB,

Regards,

tA-Kane

Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

MyndFyre

QuoteEvery generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.