 

Internal variable system.

Started by EvilCheese, June 26, 2003, 09:22 AM


herzog_zwei

That really depends, but it'll improve in more situations than that.  If you were talking about a perfect hash then, yes, it will definitely be slower.  However, a perfect hash generally wastes lots of memory and isn't what you'd want to use if you do a lot of dynamic adds/deletes of keys.  In real-life scenarios where you have lots of keys and want it to be dynamic, a hierarchical hash map can generally be faster than a regular hash map.  Regular hash maps have static settings that work well up to a certain point (and aren't set to hog up as much storage space as perfect hashes).  Once you exceed that point (or you have a bad hash function), you will generate more collisions.  The hierarchy reduces the number of potential collisions you will get.  Other benefits of a hierarchy include being able to cache the hash map for a node (so lookups don't need to start from the top level) and being able to iterate through all members of a classification quickly, rather than having to iterate through all members of all classes.
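Something like this, roughly (a sketch with made-up names; std::unordered_map stands in for "a regular hash map" at each level, and ownership/cleanup are omitted for brevity):

#include <string>
#include <unordered_map>

// Sketch of a hierarchical hash map: each node owns its own small
// hash map of children, so collisions stay local to one level and a
// caller can hold onto a node to skip the upper lookups.
struct Node {
    std::unordered_map<std::string, Node*> children;

    // Return the direct child with this name, creating it on first use.
    Node& Child(const std::string& name) {
        Node*& c = children[name];
        if (!c) c = new Node;   // leaks on purpose; a sketch only
        return *c;
    }
};

// Usage: cache the "units" node once, then every later lookup skips
// the levels above it, and iteration covers one classification only.
//   Node root;
//   Node& units = root.Child("game").Child("units");
//   units.Child("marine");
//   for (auto& kv : units.children) { /* members of "units" only */ }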

EvilCheese

In a situation using some exhaustive lookup methodology, a hierarchical system will obviously convey a lot of benefits, merely by reducing the number of lookups you have to do to reach a given node in your tree.

However, I would tend to agree with Adron that if you are using a hashtable for lookup, a hierarchical arrangement would convey little in the way of obvious benefits, and would likely be slower in many situations, especially for trees with many levels, assuming you use a true hierarchy.

Arta

I've recently implemented something like this for TestBNCS. It works by hashing the variable (Key) name and saving that in a hashtable. Data is all stored as VOID* within nodes in the hashtable; the GetKey() function just returns a VOID* that can be cast to whatever - the caller is expected to know the type of the variable they're grabbing. Addition of keys to the system is accomplished with a set of overloaded functions that all call an AddKey(Key, (LPCVOID)&Value, ValueLength) function.
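Roughly, the shape is something like this (a simplified sketch, not the actual TestBNCS code; the typedefs are spelled out so it stands alone):

#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

typedef unsigned long DWORD;   // assuming the usual Win32 typedefs
typedef const void*   LPCVOID;
typedef void          VOID;

// Sketch: values are stored as raw bytes, every typed Set overload
// funnels into one AddKey(Key, LPCVOID, ValueLength), and GetKey
// hands back a VOID* that the caller casts to the right type.
static std::unordered_map<std::string, std::vector<char> > g_keys;

void AddKey(const char* Key, LPCVOID Value, DWORD ValueLength)
{
    const char* p = (const char*)Value;
    g_keys[Key].assign(p, p + ValueLength);   // copy ValueLength bytes
}

VOID* GetKey(const char* Key)
{
    return g_keys[Key].data();   // caller must know the stored type
}

// The overload set: each forwards to AddKey with the right length.
void SetKey(const char* Key, DWORD Value) { AddKey(Key, (LPCVOID)&Value, sizeof(Value)); }
void SetKey(const char* Key, bool Value)  { AddKey(Key, (LPCVOID)&Value, sizeof(Value)); }
void SetKey(const char* Key, const char* Value)
{
    AddKey(Key, (LPCVOID)Value, (DWORD)std::strlen(Value) + 1);
}

// Usage:
//   SetKey("ping", (DWORD)42);
//   DWORD ping = *(DWORD*)GetKey("ping");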

I like this system so far; it's fast and efficient. Currently, I'm using a large hashtable and am not allowing collisions - for performance - but in the future I might make the hashtable's entries into linked lists, depending on the performance hit. I don't think there will be many collisions anyway, since there are several different sets of data - not all data is lumped into the same hashtable - so it should work out well.

We shall see :)

herzog_zwei

#18
Quote from: Arta[vL] on July 03, 2003, 03:09 PM
I like this system so far, it's fast and efficient. Currently, I'm using a large hashtable and am not allowing collisions - for performance

Yeah, hash maps are one of the most flexible ways of storing dynamic data, and they're very efficient if you use good hash functions.  Is the hash table static, did you implement a perfect hash, or are you just disallowing insertions of keys that would cause collisions?

Quote
- but in the future I might make the hashtables entries into linked lists, depending on the performance hit.

You might want to consider using an expandable/collapsible array instead of linked lists, since it'll give better performance overall.  Pick a reasonable growth size for your dynamic array based upon the average number of collisions you expect your hash function to produce.  Expand/collapse the array only when needed (based upon how many entries are in use and what your growth size is) and your insertions/deletions won't suffer much of a hit.  You'll definitely fragment the heap less, and you gain the speed/ease of using an array.
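Something like this, roughly (a sketch with made-up names; the growth size of 4 is an arbitrary assumption):

#include <stdlib.h>

// Sketch of an expandable/collapsible array bucket for collision
// chains, instead of a linked list. GROW is the assumed growth size
// (pick it near the average collisions per slot); the array only
// reallocates when the used count crosses a GROW boundary, so most
// insertions/deletions touch no heap at all.
#define GROW 4

struct Bucket {
    void** Entries;   // contiguous array of entries in this slot
    int    Used;      // entries currently occupied
    int    Capacity;  // always a multiple of GROW
};

void BucketInsert(struct Bucket* b, void* entry)
{
    if (b->Used == b->Capacity) {   // expand only when actually full
        b->Capacity += GROW;
        b->Entries = (void**)realloc(b->Entries, b->Capacity * sizeof(void*));
    }
    b->Entries[b->Used++] = entry;
}

void BucketRemove(struct Bucket* b, int i)
{
    b->Entries[i] = b->Entries[--b->Used];    // swap last entry into the hole
    if (b->Capacity - b->Used >= 2 * GROW) {  // collapse once well under capacity
        b->Capacity -= GROW;
        b->Entries = (void**)realloc(b->Entries, b->Capacity * sizeof(void*));
    }
}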

If possible, you might also want to consider using enumerations or pointers (to static strings) as the key instead of full-blown dynamic strings.
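For example (just an illustration), an enumeration removes the hashing and string comparison entirely:

// With an enum as the key there is nothing to hash at runtime, the
// enum value indexes the table directly, and comparing keys is an
// integer compare instead of a string compare.
enum Key { KEY_USERNAME, KEY_PING, KEY_LOGON_TIME, KEY_COUNT };

void* g_values[KEY_COUNT];   // one slot per key; collisions impossible

void* Get(Key k) { return g_values[k]; }
void  Set(Key k, void* v) { g_values[k] = v; }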

Once you have all that going, you might want to have it use real variants instead of VOID*, just so you can call it complete (and possibly to deal with automatic allocation/deallocation of memory for certain types, something that a simple VOID* system isn't good at).  You don't suffer much of a hit from it, but it does make life easier.  I'd avoid having a single overloaded function to Set/Get all types of data (other than a variant type), so that you can do something like SetFloat(key, 0) and have it stored as a float, instead of using Set(key, 0), where the 0 could be interpreted as any of a string/number/pointer/etc. (not to mention C++ doesn't allow overloading on return type alone, so a single Get would pose a problem).  Your Get functions can then be made to automatically coerce data when possible.  Of course, automatic coercion is a source of bugs in most people's hands, so you'd have to weigh the risks/benefits and consider how it'd be used by others.
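Something along these lines (a rough sketch with invented names, showing the tag, the per-type Set functions, coercion in Get, and the automatic cleanup):

#include <stdlib.h>
#include <string.h>

// Sketch of a tagged variant replacing a bare VOID*. The tag is what
// makes typed Gets, coercion, and automatic memory handling possible.
struct Variant {
    enum Type { T_NONE, T_FLOAT, T_DWORD, T_STRING } Tag;
    union {
        float         f;
        unsigned long dw;
        char*         s;   // owned by the variant
    };
};

void VariantClear(Variant* v)
{
    if (v->Tag == Variant::T_STRING) free(v->s);   // automatic deallocation
    v->Tag = Variant::T_NONE;
}

// One Set per type, so SetFloat(v, 0) really stores a float; the Gets
// need distinct names anyway, since C++ can't overload on return type.
void SetFloat(Variant* v, float f) { VariantClear(v); v->Tag = Variant::T_FLOAT; v->f = f; }
void SetDword(Variant* v, unsigned long dw) { VariantClear(v); v->Tag = Variant::T_DWORD; v->dw = dw; }
void SetString(Variant* v, const char* s)
{
    VariantClear(v);
    v->Tag = Variant::T_STRING;
    v->s = (char*)malloc(strlen(s) + 1);   // automatic allocation
    strcpy(v->s, s);
}

float GetFloat(const Variant* v)
{
    switch (v->Tag) {
        case Variant::T_FLOAT: return v->f;
        case Variant::T_DWORD: return (float)v->dw;   // automatic coercion
        default:               return 0.0f;           // or flag an error
    }
}

// Usage: Variant v = {}; SetFloat(&v, 3.14f); float x = GetFloat(&v);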

Arta

#19
Collisions are a concern. I'm not very pleased with my current workaround - which is to create a hash array with vastly more elements than I'll ever need - so I will very likely address this problem. The hashing function I'm using is reputedly quite good; testing it is another thing on my list. I'm currently disallowing keys that would cause collisions and quitting with a fatal error. This is acceptable only because this is a development version that would obviously never be released or used until that's been changed.

However, most of this is moot. There are currently no dynamically named keys, they are all hard-coded, and this is unlikely to change. The only reason I'm using this system instead of a struct (or similar) is that not all values will be needed or used. Thus, this seems like a more efficient use of memory than to create a struct containing all possible members, most of which will be unused at any one time.

I am already automatically allocating and freeing memory. Since all of the overloaded functions call AddKey(Key, (LPCVOID)&Value, ValueLength), memory is simply allocated according to the value of ValueLength and is freed when the key is removed. The problem you pose with regard to floats and other ambiguous data is easily solved by casting the value to be added so that the appropriate overloaded function is called:


SetFloat(key, (float)0);


This is also unlikely to be a problem, though. The only types commonly used are bool, DWORD, and char*, with an occasional FILETIME. In the unlikely event that anything else is needed, it can just be added by calling AddKey explicitly.

Adron

Quote from: Arta[vL] on July 07, 2003, 12:43 PM
Collisions are a concern. I'm not very pleased with my current workaround - which is to create a hash array with vastly more elements than I'll ever need - so I will very likely address this problem.

To address that problem... The simplest method is to store the colliding elements in sequence in the hash table. When you look up a key, you first look in the hash slot for the key, then in the following slots until you find an empty slot. This eventually causes clustering.

Once method #1 isn't efficient enough, you can have a hash function with two inputs to produce the slot for colliding inputs, or use some other method to store them nonsequentially.
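Roughly, method #1 looks like this (a sketch with made-up names; the toy hash is just for illustration), with the double-hash variant noted in a comment:

#include <string.h>

#define TABLE_SIZE 1024   // assumed table size

struct Slot { const char* Name; void* Data; };
static struct Slot g_table[TABLE_SIZE];   // empty slots have Name == NULL

unsigned Hash(const char* s)   // toy string hash, for illustration only
{
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

// Method #1: start at the hashed slot, then walk the following slots
// until the key or an empty slot is found. Occupied runs grow into
// clusters over time, which is the drawback mentioned above.
void* Lookup(const char* Name)
{
    unsigned i = Hash(Name) % TABLE_SIZE;
    while (g_table[i].Name != NULL) {
        if (strcmp(g_table[i].Name, Name) == 0)
            return g_table[i].Data;
        i = (i + 1) % TABLE_SIZE;   // next sequential slot
        // Double-hashing variant: step by a second hash of the key,
        // i = (i + Step(Name)) % TABLE_SIZE, so colliding keys land
        // nonsequentially and the clustering is avoided.
    }
    return NULL;   // not found; the table must never be completely full
}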

Arta

#21
Quote from: Adron on July 07, 2003, 03:47 PM
To address that problem... The simplest method is to store the colliding elements in sequence in the hash table. When you look up a key, you first look in the hash slot for the key, then in the following slots until you find an empty slot. This eventually causes clustering.

Oh, I'm already doing that. When I say I'm not allowing collisions, I mean collisions of actual key-name hashes, not of table indexes. Two keys that produce the same hashtable index will both be stored; two keys that produce the same hash will be rejected. The reason is that when retrieving keys, it's much faster to compare the hashes than it is to compare the names.

Sorry, didn't explain that very well before.
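Roughly, the distinction looks like this (a simplified sketch, not the actual code): the full hash is stored in the slot, probing handles index collisions, and a duplicate full hash is what gets rejected.

#define TABLE_SIZE 1024   // assumed table size

struct Slot { unsigned Hash; void* Data; int Used; };
static struct Slot g_table[TABLE_SIZE];

// Two names landing on the same index coexist via probing; two names
// producing the same full hash are rejected at insert time, so every
// lookup compares hashes only and never touches the names.
int AddKey(unsigned hash, void* data)   // returns 0 on hash collision
{
    unsigned i = hash % TABLE_SIZE;
    while (g_table[i].Used) {
        if (g_table[i].Hash == hash) return 0;   // same full hash: fatal error
        i = (i + 1) % TABLE_SIZE;                // same index only: probe on
    }
    g_table[i].Hash = hash; g_table[i].Data = data; g_table[i].Used = 1;
    return 1;
}

void* GetKey(unsigned hash)
{
    unsigned i = hash % TABLE_SIZE;
    while (g_table[i].Used) {
        if (g_table[i].Hash == hash) return g_table[i].Data;   // hash compare only
        i = (i + 1) % TABLE_SIZE;
    }
    return 0;   // not present (table must never be completely full)
}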