Where [esi+3C] = 12345678 AND [esi+3C] is constantly updating with a new DWORD value...
My goal is to throw 12345678 into my own static pointer, but in reverse order 87654321.
This is what I've been bouncing around in my head...
push eax
mov al, byte ptr [esi+3F]
mov byte ptr [Pointer part 1], al
mov al, byte ptr [esi+3E]
mov byte ptr [Pointer part 2], al
mov al, byte ptr [esi+3D]
mov byte ptr [Pointer part 3], al
mov al, byte ptr [esi+3C]
mov byte ptr [Pointer part 4], al
pop eax
ret
Where the result would = DWORD of 87654321 in [Pointer], plus account for the auto-updating of the DWORD value in [esi+3C]. So if 12345678 changes to 23456789 it will become 98765432 my new static pointer.
If I'm rambling I apologize...
Anyway, my question is: Is there a better way to do this?
This is how I picture it:
mov ecx, 8
xor edx, edx
mov eax, [esi+3ch]
shifting:
shl edx, 4
mov ebx, eax
shr eax, 4
and ebx, 15
or edx, ebx
loop shifting
mov pointer, edx
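For anyone following along in C, the same nibble-by-nibble loop can be sketched roughly like this. The function name reverse_nibbles_loop is made up for illustration and is not from the original post; it assumes a 32-bit value and fixed-width types from <stdint.h>.

```c
#include <stdint.h>

/* Reverse the order of the eight 4-bit nibbles in a 32-bit value,
   e.g. 0x12345678 -> 0x87654321. Mirrors the shl/and/or/shr loop above.
   (Hypothetical helper name, for illustration only.) */
uint32_t reverse_nibbles_loop(uint32_t value)
{
    uint32_t result = 0;
    for (int i = 0; i < 8; i++) {
        result = (result << 4) | (value & 0xFu); /* append the lowest nibble */
        value >>= 4;                             /* expose the next nibble */
    }
    return result;
}
```

Each pass peels the lowest nibble off the input and pushes it onto the low end of the result, so the first nibble taken ends up highest.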
Why are you seeking to reverse the nibbles as well? Usually people just want to reverse the ordering at the byte level. :)
Quote from: Paul on November 16, 2003, 02:03 AM
Where [esi+3C] = 12345678 AND [esi+3C] is constantly updating with a new DWORD value...
My goal is to throw 12345678 into my own static pointer, but in reverse order 87654321.
This is what I've been bouncing around in my head...
push eax
mov al, byte ptr [esi+3F]
mov byte ptr [Pointer part 1], al
mov al, byte ptr [esi+3E]
mov byte ptr [Pointer part 2], al
mov al, byte ptr [esi+3D]
mov byte ptr [Pointer part 3], al
mov al, byte ptr [esi+3C]
mov byte ptr [Pointer part 4], al
pop eax
ret
Where the result would = DWORD of 87654321 in [Pointer], plus account for the auto-updating of the DWORD value in [esi+3C]. So if 12345678 changes to 23456789 it will become 98765432 my new static pointer.
If I'm rambling I apologize...
Anyway, my question is: Is there a better way to do this?
Use the bswap instruction - available on i486 and higher.
Quote from: Skywing on November 16, 2003, 10:58 AM
Use the bswap instruction - available on i486 and higher.
I started to post the same thing, then noticed that he wants swapping on a per-nibble basis, not per-byte like bswap does. Hence my query to him. Of course, the code he provided us as an example is wrong if he really does want a per-nibble swap. :)
Well, I was assuming that his explanation was correct and the code possibly wrong since he wanted help with the code...
I was looking for the bswap, but it wasn't listed in 386intel.txt.....
edit:
mov eax, [esi+3ch]
bswap eax
mov ebx, eax
and eax, 0f0f0f0fh
and ebx, 0f0f0f0f0h
shl eax, 4
shr ebx, 4
or eax, ebx
mov pointer, eax
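The same two-step idea (swap the bytes, then swap the two nibbles inside each byte) can be written in C as a rough sketch. The name reverse_nibbles_masked is hypothetical, and the byte swap is spelled out with shifts rather than relying on any compiler intrinsic.

```c
#include <stdint.h>

/* Reverse all eight nibbles of a 32-bit value in two steps.
   (Hypothetical helper name, for illustration only.) */
uint32_t reverse_nibbles_masked(uint32_t v)
{
    /* Step 1: reverse the byte order (what bswap does). */
    v = (v >> 24)
      | ((v >> 8) & 0x0000FF00u)
      | ((v << 8) & 0x00FF0000u)
      | (v << 24);

    /* Step 2: swap the high and low nibble within every byte. */
    return ((v & 0x0F0F0F0Fu) << 4) | ((v & 0xF0F0F0F0u) >> 4);
}
```

With 0x12345678 as input, step 1 gives 0x78563412 and step 2 gives 0x87654321.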
Quote from: Adron on November 16, 2003, 11:04 AM
Well, I was assuming that his explanation was correct and the code possibly wrong since he wanted help with the code...
Now he knows how to accomplish either of his listed goals. So, we should have everything covered? :p
Excellent, when I get off work I'll be able to test/compile this code into my project. Thanks much!
Quote from: Paul on November 16, 2003, 07:09 PM
Excellent, when I get off work I'll be able to test/compile this code into my project. Thanks much!
So, which did you want to do, anyway?
Byte swapping!
Along similar lines, I was talking to Sky this morning about byte swapping for other-endian protocols... So he gave me the function:
unsigned short bswap(unsigned short u) { return ((u & 0xff) << 8) | (u >> 8); }
This is all well and good, but I decided to write it in ASM for the hell of it and got:
WORD ByteSwapWORD( WORD x )
{
__asm
{
mov ax, x
and ax, 0xff
shl ax, 8
shr x, 8
or x, ax
}
return x;
}
Which is all well and good until you get to DWORDs. Damned if I could figure out how to do the swapping in C, so I went with ASM again, this time coming up with:
DWORD ByteSwapDWORD( DWORD x )
{
__asm
{
push edx
push ebx
mov edx, x // 01 02 03 04
mov ax, dx // dx = 03 04
and ax, 0xff // ax = 03 04 -> 0000 0011 0000 0100 -> 0000 0000 0000 0100
shl ax, 8 // ax = 0000 0100 0000 0000
shr dx, 8 // dx = 0000 0000 0000 0011
or dx, ax // dx = 0000 0100 0000 0011 -> 04 03
shl edx, 16 // edx = 0000 0100 0000 0011 0000 0000 0000 0000 -> 04 03 00 00
xor ebx, ebx
mov eax, x // eax = 01 02 03 04
shr eax, 16 // eax = 00 00 01 02 -> 0000 0000 0000 0000 0000 0001 0000 0010
mov bx, ax
and ax, 0xff // ax = 0000 0000 0000 0010
shl ax, 8 // ax = 0000 0010 0000 0000
shr bx, 8 // bx = 0000 0000 0000 0001
or bx, ax // bx = 0000 0010 0000 0001 -> 02 01
or edx, ebx // edx = 0000 0100 0000 0011 0000 0010 0000 0001 -> 04 03 02 01
mov x, edx
pop ebx
pop edx
}
return x;
}
As you can see from the comments, I was having lots of fun working out the binary for the instructions and stuff. Anyway, after finishing this behemoth of a function, I seemed to remember something like this on the forums and whipped out my handy (and free) IA-32 Architecture Software Developer's Manual Volume 2: Instruction Set Reference. I looked up bswap and to my amazement, it did the whole DWORD thing in just one instruction. Well, damn, that was a lot of wasted effort. Then it said see xchg for 16-bit numbers and there was a single instruction that did it for words. *sigh* Final code looks like:
WORD ByteSwapWORD( WORD x )
{
__asm
{
mov ax, x
xchg ah, al
mov x, ax
}
return x;
}
DWORD ByteSwapDWORD( DWORD x )
{
__asm
{
mov eax, x
bswap eax
mov x, eax
}
return x;
}
Anyway, just thought I'd vent some frustration. :P
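For what it's worth, the DWORD swap can also be done in plain C with shifts and masks, the same way as the 16-bit version. A minimal sketch, assuming fixed-width types from <stdint.h>; the names byte_swap16 and byte_swap32 are made up here and are not from the posts above.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value: 01 02 -> 02 01. */
uint16_t byte_swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value: 01 02 03 04 -> 04 03 02 01. */
uint32_t byte_swap32(uint32_t x)
{
    return (x >> 24)                  /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000FF00u)   /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00FF0000u)   /* byte 1 -> byte 2 */
         | (x << 24);                 /* byte 0 -> byte 3 */
}
```

Many compilers will recognize this pattern and emit a single bswap anyway, but that depends on the compiler and optimization settings, so check the generated code if it matters.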
You could improve those by making them naked and fastcall.
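__declspec(naked) unsigned short __fastcall ByteSwapWORD(unsigned short)
{
__asm {
xchg cl, ch
mov ax, cx
ret // naked functions get no epilog, so return explicitly
}
}
__declspec(naked) unsigned long __fastcall ByteSwapDWORD(unsigned long)
{
__asm {
bswap ecx
mov eax, ecx
ret // naked functions get no epilog, so return explicitly
}
}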
Quote from: Skywing on November 26, 2003, 11:34 AM
You could improve those by making them naked and fastcall.
Even better, make them __attribute__((regparm(1))), in which case the argument will be in eax/ax when the function starts, saving you from even having to move it from ecx. ;) Also, it might be worth doing some testing on whether it's faster to xchg or do the exchange manually. Same with bswap -- just because it's one instruction, it might not be fast. Finally, I'm certain CupHead's dword swapper is bloated. I've inlined something that has that effect several times and it's never been that long. :)
I'm sure it's bloated too, probably because I used an instruction for each step (and you can see the progression). Obviously there would be faster ways like just swapping the inner bytes and then the outer bytes, but what you see is what I've got.
Quote from: Kp on November 26, 2003, 04:23 PM
Even better, make them __attribute__((regparm(1))), in which case the argument will be in eax/ax when the function starts, saving you from even having to move it from ecx. ;) Also, it might be worth doing some testing on whether it's faster to xchg or do the exchange manually. Same with bswap -- just because it's one instruction, it might not be fast. Finally, I'm certain CupHead's dword swapper is bloated. I've inlined something that has that effect several times and it's never been that long. :)
How would you do that in msvc++ ? I can't find regparm or __attribute__ on msdn.
__attribute__((regparm(1)))
... ?
Quote from: Etheran on November 26, 2003, 06:24 PM
How would you do that in msvc++ ? I can't find regparm or __attribute__ on msdn.
__attribute__((regparm(1)))
... ?
Those are GCC extensions and are incompatible with VC.
that's what I thought, but is there any way to do this in vc? I'm thinking no and the only way to do it would be to put the value in eax before you make the function call.
int __declspec(naked) myFunction(void);
int result; // not named "ret", which clashes with the ret instruction inside __asm blocks
__asm { mov eax, theVal }
result = myFunction();
or perhaps this generates a compiler error..
myFunction();
__asm { mov result, eax }
Quote from: Etheran on November 26, 2003, 06:48 PM
that's what I thought, but is there any way to do this in vc?
No.
Quote from: Skywing on November 26, 2003, 06:52 PM
Quote from: Etheran on November 26, 2003, 06:48 PM
that's what I thought, but is there any way to do this in vc?
No.
Which is truly unfortunate, because there's really no reason that I can see why you shouldn't use all three call-clobbered registers for parameter passing, if you're going to pass values in registers at all. (There are some circumstances, typically when the parameters are ignored for a while, where it's better not to pass them in registers.)
As an interesting quirk, GCC supports MSVC's _fastcall correctly by creating a two-register pass using ecx,edx; too bad VC can't do the reverse and support GCC's ability to do three-register using eax,edx,ecx. :)