• Welcome to Valhalla Legends Archive.
 

Booleans 2 bytes - Why?

Started by FrOzeN, September 13, 2006, 06:08 AM

Previous topic - Next topic

FrOzeN

Booleans are literally just True/False, yet they take up 2 bytes whereas you could fit 16 True/False combinations into that.

I understand it has something to do with the fact that they were created back when we used 16-bit systems and that it was more efficient process data in 16-bit chunks?

Anyone have any links about this, and why they're 16-bits?

Also, would it be a good idea to use an integer with bitwise comparisons throughout your code, so you could cut down every 16 booleans into 1?

Thanks. :)
~ FrOzeN

Banana fanna fo fanna

What language are you talking about? I was under the impression they are usually one byte...

MyndFyre

I can't confirm this, but my first thought about the language you're referring to is VB.  If that's the case, it's most likely because the native integer size of VB (6 and older) is 16 bits, so it just translates over.

In .NET, each Boolean value is 4 bytes.  The C/C++ macros TRUE and FALSE are typically defined as 0x00000001 and 0x00000000 respectively - also 4 bytes.

I imagine that for C/C++ and .NET, this has to do with memory alignment.  If you're using a 1-byte value, all of a sudden your data cache is going to be thrown off, and it's more efficient to go to memory when the data is aligned on a 4-byte boundary anyway.

The reason that we don't use each bit of a memory location for Booleans - it's too easy to corrupt memory and bring about the doom of multiple variables if multiple booleans were stored in the same byte.  That, and it'd be incredibly hard for the compiler to determine which Boolean belonged to which bit.  Plus, this code:

if (global_boolean & 0x00010000) ...

is less efficient than

if (local_boolean)

because the first instruction will get compiled into something like:

mov eax, [eds + 1344h]
and eax, 00010000h
cmp eax, eax
jz   value_is_false

whereas the second instruction would be able to forego the mov and and instructions.

:)
QuoteEvery generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.

Skywing

The C++ bool type is typically one byte, and the BOOL typedef typically resolves to int, which is commonly four bytes.  In some situations, bool values may be promoted to the native word size for argument passing or structure alignment.

tA-Kane

#4
Quote from: MyndFyre[vL] on September 13, 2006, 01:48 PM
I can't confirm this, but my first thought about the language you're referring to is VB.  If that's the case, it's most likely because the native integer size of VB (6 and older) is 16 bits, so it just translates over.

In .NET, each Boolean value is 4 bytes.  The C/C++ macros TRUE and FALSE are typically defined as 0x00000001 and 0x00000000 respectively - also 4 bytes.

I imagine that for C/C++ and .NET, this has to do with memory alignment.  If you're using a 1-byte value, all of a sudden your data cache is going to be thrown off, and it's more efficient to go to memory when the data is aligned on a 4-byte boundary anyway.

The reason that we don't use each bit of a memory location for Booleans - it's too easy to corrupt memory and bring about the doom of multiple variables if multiple booleans were stored in the same byte.  That, and it'd be incredibly hard for the compiler to determine which Boolean belonged to which bit.  Plus, this code:

if (global_boolean & 0x00010000) ...

is less efficient than

if (local_boolean)

because the first instruction will get compiled into something like:

mov eax, [eds + 1344h]
and eax, 00010000h
cmp eax, eax
jz   value_is_false

whereas the second instruction would be able to forego the mov and and instructions.

:)

You can also forego the cmp instruction in that sample.
mov eax, dword ptr ds:[eax+0x12345678]
and eax, 0x00001000
jz bool_is_false

It's only 50% larger code. Sure, it's not worth it in time-consuming tasks, such as fast-paced games and large-scale servers. But for most tasks, I think that the saved memory is worth it. I don't know about you, but a lot of my projects use a lot of booleans.

...
But really, you've forgotten the bt instruction!
bt dword ptr ds:[eax+0x12345678], bit_number
jnb bool_is_false

The biggest downside is that it's got some issues with using I/O memory as well as trouble with 64-bit portability. In particular, it's not able to access a bit higher than the 31st bit.

Edit:
Actually, the bt instruction needs to use jb/jnb, not jz/jnz. Fixed that.

And, you could also use the test instruction. However, test is larger than bt, so if codesize is important then that throws that out. The upside is that it doesn't have the problems with I/O memory and 64-bit portability. And with test, you can even test multiple bits at once, whereas you can't with bt.
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

Skywing

Optimizing for modern processors is typically more complex than just instruction length.  Many of the handy instructions that appear to let you do neat tricks with bitfields or the like in just one instruction may actually be implemented in slow microcode instead of hardware on modern processors.

tA-Kane

That's an odd way to implement an instruction. Is there somewhere that lists such instructions per processor?
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com

MyndFyre

Quote from: tA-Kane on September 18, 2006, 09:07 AM
That's an odd way to implement an instruction. Is there somewhere that lists such instructions per processor?

I'm not 100% sure, but I believe the IA-32 reference manual tells you.
QuoteEvery generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.

tA-Kane

I did a brief search through IA-32 Intel Architecture Software Developer's Manual Volumes 2A and 2B (instruction set references), and found no mention of microcode except for CPUID, which doesn't mention that it actually uses microcode, but rather mentions that one of the values returned specifies whether the processor supports the microcode update facility.
Macintosh programmer and enthusiast.
Battle.net Bot Programming: http://www.bash.org/?240059
I can write programs. Can you right them?

http://www.clan-mac.com
http://www.eve-online.com