
[C]SMC

Started by Hdx, November 21, 2008, 06:22 PM


Hdx

So, self-modifying code. That, or inline compiling.
To get to the point, I'm doing CheckRevision and trying to find an efficient way of doing it. Right now I'm doing what everyone else does: strip out the values for A, B, and C and the operations {^ - + / *}, and then have a big switch statement in the main loop. That's eww.
So I was thinking, I *could* modify the code at runtime by writing over it in memory. But then there's a problem. What exactly should I write?
Aren't the math operations different depending on what platform you're compiled on?
Pseudocode:
/* A, B, C hold the checksum state. */
void doMath(uint32_t S){
  A += S;
  B += C;
  C += A;
  A += B;
}

/* The +1 and +5 byte offsets are placeholders; the real ones depend on what the compiler emits. */
switch(operator1){
  case '-': WriteMemory((uint8_t*)&doMath + 1, THE_SUB_ASM_BYTE, 1); break;
  case '+': WriteMemory((uint8_t*)&doMath + 1, THE_ADD_ASM_BYTE, 1); break;
  case '^': WriteMemory((uint8_t*)&doMath + 1, THE_XOR_ASM_BYTE, 1); break;
}
switch(operator2){
  case '-': WriteMemory((uint8_t*)&doMath + 5, THE_SUB_ASM_BYTE, 1); break;
  case '+': WriteMemory((uint8_t*)&doMath + 5, THE_ADD_ASM_BYTE, 1); break;
  case '^': WriteMemory((uint8_t*)&doMath + 5, THE_XOR_ASM_BYTE, 1); break;
}

for(x = 0; x < data.length; x += 4){
  doMath(*(uint32_t*)&data[x]);  /* pass the next 32-bit word, not its address */
}

The other idea would be runtime compiling.
void doMath(uint32_t S){
  /* like 10 operations worth of NOPs */
}

sIn = "A=A^S B=B-C C=C+A A=A+B";
code = Compile(sIn);  /* hypothetical: turns the formula into machine code */
WriteMemory(&doMath, code, len(code));
for(x = 0; x < data.length; x += 4){
  doMath(*(uint32_t*)&data[x]);
}


I'm just kind of ranting here, but if you have suggestions feel free to post.
Also, if anyone has a high-resolution timer in C I can snag, that'd be great {I wanna time some functions}.
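
On the timer request: a minimal sketch, assuming a POSIX system where clock_gettime with CLOCK_MONOTONIC is available (on Windows, QueryPerformanceCounter plays the same role). The helper names now_ns and ns_to_ms are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* ask for clock_gettime on strict compilers */
#endif
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (assumes POSIX clock_gettime). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Convert a nanosecond interval to milliseconds for printing. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1e6;
}
```

Take now_ns() before and after the function under test and subtract. Averaging over many iterations helps smooth out scheduler noise.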

Proud host of the JBLS server www.JBLS.org.
JBLS.org Status:
JBLS/BNLS Server Status

brew

First off, you'd have to write the rest of the routine in assembly, unless you have some way to get your compiler to use one specific register for your operations.
Second of all, this is just retarded. So you're saying you'd like to replace a switch statement of operations with a switch statement to write operations ?... There's a better way to do what you want to do: Stick with what you have right now.
And yes, it is very architecture dependent...
<3 Zorm
Quote[01:08:05 AM] <@Zorm> haha, me get pussy? don't kid yourself quik
Scio te esse, sed quid sumne? :P

Hdx

Moving the switch statement outside the main loop gives a ~35% speed increase.
I just think I could get even MORE of an increase if I was able to get rid of the jmp/rets.


brew

Quote from: Hdx on November 21, 2008, 09:12 PM
Moving the switch statement outside the main loop gives a ~35% speed increase.
I just think I could get even MORE of an increase if I was able to get rid of the jmp/rets.
...
How did you get 35%? Did you pull that number out of your ass? Tell me how you were able to move the switch statement out? Wasn't that what you were asking in your first post?

Barabajagal

It's sort of easy... you use a switch statement before the loop and pointers to functions. Increases the speed by about 35% from the tests he and I have been doing most of today.

If you can get it any more efficient than this, I'd love to know (PB does allow inline assembly).
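
The "switch before the loop plus function pointers" approach described in this post could look roughly like the following in C. This is only a sketch: the names binop_fn, pick_op and checksum_fnptr are hypothetical, and the fixed formula shape (A=A?S B=B?C C=C?A A=A?B) is an assumption taken from the example string in the first post.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*binop_fn)(uint32_t, uint32_t);

static uint32_t op_add(uint32_t x, uint32_t y) { return x + y; }
static uint32_t op_sub(uint32_t x, uint32_t y) { return x - y; }
static uint32_t op_xor(uint32_t x, uint32_t y) { return x ^ y; }

/* Resolve an operator character once; NULL means unsupported. */
static binop_fn pick_op(char op)
{
    switch (op) {
    case '+': return op_add;
    case '-': return op_sub;
    case '^': return op_xor;
    default:  return NULL;
    }
}

/* abc[0..2] are A, B, C. Returns 0 on success, -1 on an unknown operator. */
static int checksum_fnptr(uint32_t abc[3], const char ops[4],
                          const uint32_t *words, size_t n)
{
    binop_fn f[4];
    for (int i = 0; i < 4; i++) {       /* the switch runs here, outside the hot loop */
        f[i] = pick_op(ops[i]);
        if (f[i] == NULL)
            return -1;
    }
    for (size_t i = 0; i < n; i++) {    /* hot loop: four indirect calls per word */
        abc[0] = f[0](abc[0], words[i]);
        abc[1] = f[1](abc[1], abc[2]);
        abc[2] = f[2](abc[2], abc[0]);
        abc[0] = f[3](abc[0], abc[1]);
    }
    return 0;
}
```

The trade-off is that each step becomes an indirect call the compiler generally cannot inline, so whether this beats a switch depends on the compiler and language runtime.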

brew

Quote from: Andy on November 21, 2008, 10:18 PM
If you can get it any more efficient than this, I'd love to know (PB does allow inline assembly).

Write it in assembly. Don't bother with the stack for the operation functions.

Hdx

Odd, I finally got home where I can test it in my C implementation, and having the switches in the loop is 2x faster than having them outside.
This is odd, because as Andy said, we got a ~35% decrease in time in his PB implementation.
Maybe PB just sucks? Without getting the EXE info, my implementation takes 179ms for WC3. I'll write the PE loader and then work on making it faster later.

I am still curious about SMC in C. It would be awesome if I could learn to do it.


brew

Yeah, the idea is valid to a point but flawed. If he's having to call the operations, and you're worried about the call and ret, then why call? I guess I like your idea about precalculating the calls, but why can't they be jumps?
That'd make for some serious performance gains. The processor wouldn't have to bother pushing/popping eip, or jumping back to the next instruction after the call. Also, I recommend using Intel's C compiler if you're going for the fastest code possible with C. And yes, PB just does suck.

By the way, I think you're stressing about the cost of a jump way too much. Perhaps PB wasn't smart enough to make a jump table, and the switch statement was just compiled into a bunch of sequential jumps. Or more likely, it's the file operations. In that main loop you're opening each file, reading, etc etc. A lot of performance nasty stuff. I'd try optimizing that.

Barabajagal

The file reading is done all at once, and it takes almost no time. I thought it was the file operations at first, too... but it's not.

Yegg

Out of curiosity, will this method be even the least bit noticeable with a language like C? Are you just doing this for the sake of it being more efficient? Is this to make slower languages process the information faster?

Barabajagal

I don't think he cares about slower languages... He's just trying to find the best and fastest way to do it.

Hdx

Quote from: Yegg on November 22, 2008, 01:51 PM
Out of curiosity, will this method be even the least bit noticeable with a language like C? Are you just doing this for the sake of it being more efficient? Is this to make slower languages process the information faster?
In higher languages, yes, it will be noticeable. It'll be noticeably slower. With the switch inside the main loop I average 180ms. With it outside I average 370ms. With a little bit of looking at it, it was obvious that switches inside would be faster than push/call/ret/pop.
But it is valid for other languages {PB, for example, where I got my ~35% from}.

I'm working on loading PE files now. Mapping out the sections is annoying me -.-
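
For comparison, the "switch inside the main loop" variant measured in this post might be sketched like this in C. The names apply_op, state_t and checksum_switch are hypothetical, and the fixed formula shape (A=A?S B=B?C C=C?A A=A?B) is an assumption based on the example string earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t a, b, c; } state_t;

/* Apply one operator character; unknown operators leave lhs unchanged. */
static uint32_t apply_op(char op, uint32_t lhs, uint32_t rhs)
{
    switch (op) {
    case '+': return lhs + rhs;
    case '-': return lhs - rhs;
    case '^': return lhs ^ rhs;
    default:  return lhs;
    }
}

/* The switch is evaluated four times per 32-bit word of input. */
static void checksum_switch(state_t *st, const char ops[4],
                            const uint32_t *words, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        st->a = apply_op(ops[0], st->a, words[i]);
        st->b = apply_op(ops[1], st->b, st->c);
        st->c = apply_op(ops[2], st->c, st->a);
        st->a = apply_op(ops[3], st->a, st->b);
    }
}
```

A C compiler is free to inline apply_op into the loop, which is one plausible reason this form can beat a version built on calls through function pointers.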


Kp

Ignoring the overhead costs of preparing each method, you will get better performance by a large margin if you generate the code at runtime, for the simple reason that it avoids both the jumps/calls inherent in the switch and the repeated loads of the operator to identify what statement should be next.  For a large enough data set, the preparation overhead will be lost in the time spent executing the computation itself.  Depending on the data set, it's possible that fetching the data to checksum will evict the control string, making repeated switches even more expensive.  By generating the checksum function, you can keep the generated code in the icache, so it is protected from getting evicted by data loads.

If not for the sheer number of possible control strings, you could unroll the switch and get performance equal to, if not greater than (due to reduced overhead), the performance of doing runtime generation.

Although it might be possible, it's very likely not worth the trouble to come up with a platform independent way to do runtime generation.  One way to minimize your platform dependence would be to create a set of platform specific helper functions that know the opcodes for each operation.  Then the main, independent loop can do:

switch(op1) { case '+': add_op_plus(...); break; case '-': add_op_minus(...); break; ... }

Porting to a new platform then becomes a matter of updating the add_op_* functions, and writing something to generate new prolog/epilog.
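
As a rough illustration of the add_op_* helper idea for one platform: on x86, the register-to-register forms of ADD, SUB and XOR use the opcode bytes 0x01, 0x29 and 0x31, each followed by a ModRM byte. The sketch below only appends those bytes to a buffer. The codebuf_t type, the parameter lists and the add_op dispatcher are assumptions made for the example (the post leaves the arguments as "..."), and the prolog/epilog and the copy into executable memory are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer; a real generator would target executable memory. */
typedef struct {
    uint8_t bytes[64];
    size_t  len;
} codebuf_t;

/* x86 register numbers as used in the ModRM byte. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Emit "<op> dst, src" on 32-bit registers: opcode, then ModRM with mod=11. */
static int emit_rr(codebuf_t *cb, uint8_t opcode, int dst, int src)
{
    if (cb->len + 2 > sizeof cb->bytes)
        return -1;
    cb->bytes[cb->len++] = opcode;
    cb->bytes[cb->len++] = (uint8_t)(0xC0 | (src << 3) | dst);
    return 0;
}

/* Platform-specific helpers: only these know the x86 opcodes. */
static int add_op_plus(codebuf_t *cb, int dst, int src)  { return emit_rr(cb, 0x01, dst, src); }
static int add_op_minus(codebuf_t *cb, int dst, int src) { return emit_rr(cb, 0x29, dst, src); }
static int add_op_xor(codebuf_t *cb, int dst, int src)   { return emit_rr(cb, 0x31, dst, src); }

/* Platform-independent dispatch on the operator character. */
static int add_op(codebuf_t *cb, char op, int dst, int src)
{
    switch (op) {
    case '+': return add_op_plus(cb, dst, src);
    case '-': return add_op_minus(cb, dst, src);
    case '^': return add_op_xor(cb, dst, src);
    default:  return -1;
    }
}
```

For instance, add_op(&cb, '+', REG_EAX, REG_ECX) appends the two bytes 01 C8, the encoding of "add eax, ecx". Porting would mean swapping out emit_rr and the three add_op_* helpers while add_op stays as it is.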
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!