CRC checking executable's running code

tA-Kane · July 24, 2005, 11:29 PM

Suppose I write a program which needs to be able to check itself for any unauthorized modifications made to it (whether in the executable's file or added after it's been launched). With the obvious problems aside (self-modifying code, storing data within executable space, etc), I'll need to be able to get the program to find the boundaries of its own executable memory space and CRC check it. Correct me if I'm wrong, but any modifications to the executable file would *most likely* also show up in the program's executable code (and if not, then there are other safeguards against data section tampering), would they not?

So with that in mind, how might I go about getting (eg, what API calls) the program's executable code memory boundaries? Are there any things to consider when accessing such memory without actually executing it? This is, of course, assuming that the CRC code will be within those boundaries and will of course include itself in the checksumming process.

Another thing to think about is when using this in connection with a server verification scheme ... it is possible for a modified executable to always send the correct checksum, either by modifying the code that sends the checksum to the server or by modifying the checksum algorithm to simply return the correct value immediately, etc. What would be a feasible method of adding some randomness to the code, so that the checksum is almost never a static value? Perhaps have the server send a random value, or maybe some code to inject at various places within the CRC algorithm, which would alter the result without making the algorithm unstable?
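One way to read the "server sends a random value" idea is as a challenge-response: the server picks a fresh nonce for each check, and the client checksums the nonce followed by its code bytes, so the expected answer is different every time. The following is only a sketch under that assumption. The names `crc32_update`, `crc32` and `challenge_checksum` are hypothetical, and the CRC-32 is the ordinary bitwise variant, not anything taken from the code in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), continuing from a running state.
inline std::uint32_t crc32_update(std::uint32_t state, const unsigned char *data, std::size_t len)
{
	for (std::size_t i = 0; i < len; i++) {
		state ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
	}
	return state;
}

// Plain CRC-32 of a buffer.
inline std::uint32_t crc32(const unsigned char *data, std::size_t len)
{
	return crc32_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}

// Hypothetical challenge-response: CRC-32 over (4 nonce bytes || code bytes).
// The server would run the same computation over its reference copy of the code.
inline std::uint32_t challenge_checksum(std::uint32_t nonce, const unsigned char *code, std::size_t len)
{
	const unsigned char prefix[4] = {
		static_cast<unsigned char>(nonce & 0xFF),
		static_cast<unsigned char>((nonce >> 8) & 0xFF),
		static_cast<unsigned char>((nonce >> 16) & 0xFF),
		static_cast<unsigned char>((nonce >> 24) & 0xFF)
	};
	std::uint32_t state = crc32_update(0xFFFFFFFFu, prefix, sizeof(prefix));
	state = crc32_update(state, code, len);
	return state ^ 0xFFFFFFFFu;
}
```

A patched client that keeps an untouched copy of the original bytes can still compute the right answer, so this raises the effort required rather than closing the hole.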

tA-Kane · July 27, 2005, 02:27 PM

For anyone interested, I seem to have put together a working function to do what I need:

Code:

void CheckExecutables(unsigned char Sum[16])
{
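	SIZE_T						Length;
	char						*Current;
	MEMORY_BASIC_INFORMATION	Info;
	MD5_CTX						MD5;
	HANDLE						Instance;
	SYSTEM_INFO					sInfo;
	Instance = GetCurrentProcess();	// must be a process handle ... the HINSTANCE passed to WinMain() is a module handle and won't work here
	
	if (!Sum)
		return;

	GetSystemInfo(&sInfo);	// fills in lpMinimumApplicationAddress, lpMaximumApplicationAddress and dwPageSize
	MD5Init(&MD5);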
	for (Current = (char *)sInfo.lpMinimumApplicationAddress; Current < sInfo.lpMaximumApplicationAddress; )
	{
		Length = VirtualQueryEx(Instance, Current, &Info, sizeof(Info));
		if (Length)
		{
			// wasn't a kernel-mode memory address
			if ((Info.State & MEM_COMMIT) && !(Info.State & MEM_RESERVE))
			{
				// is an accessible allocated region
				if (Info.Protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY))
				{
					// is an executable region
					// let's MD5 it! (BaseAddress is the start of this region; AllocationBase is the start of the whole allocation)
					MD5Update(&MD5, (unsigned char *)Info.BaseAddress, (unsigned int)Info.RegionSize);
				}
			}
			Current += Info.RegionSize;
		}
		else
		{
			Current += sInfo.dwPageSize;
		}
	}
	MD5Final(Sum, &MD5);
}

Would anyone care to critique?

Adron · July 27, 2005, 05:45 PM

You'll be checksumming all the loaded dlls, as well as possibly some data, depending on the architecture. Your checksum will give different results from time to time.

tA-Kane · July 28, 2005, 02:14 AM

Yes, I've noticed. I've found *this* code to be more reliable:

Code:

void CheckExecutables(unsigned char Sum[16])
{
	SIZE_T						Length;
	char 						*Current;
	MEMORY_BASIC_INFORMATION	Info;
	MD5_CTX						MD5;
	HANDLE						Instance;
	SYSTEM_INFO					sInfo;
	Instance = GetCurrentProcess();	// must use GetCurrentProcess() ... the one from WinMain() doesn't have sufficient access privs
	
	GetSystemInfo(&sInfo);
	if (!Sum)
		return;
	
	MD5Init(&MD5);
	for (Current = (char *)sInfo.lpMinimumApplicationAddress; Current < sInfo.lpMaximumApplicationAddress && Current < (char *)0x40000000; )
	{
		Length = VirtualQueryEx(Instance, Current, &Info, sizeof(Info));
		if (Length)
		{
			// wasn't a kernel-mode memory address
			if ((Info.State & MEM_COMMIT) && !(Info.State & MEM_RESERVE))
			{
				// is an accessible allocated region
				if (Info.Protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY) && (Info.Type & MEM_IMAGE))
				{
					// is an executable region
					// let's MD5 it! (BaseAddress is the start of this region; AllocationBase is the start of the whole allocation)
					MD5Update(&MD5, (unsigned char *)Info.BaseAddress, (unsigned int)Info.RegionSize);
				}
			}
			Current += Info.RegionSize;
		}
		else
		{
			Current += sInfo.dwPageSize;
		}
	}
	MD5Final(Sum, &MD5);
}

Note that the two differences are the check to make sure the memory is below 0x40000000 (the 2GB limit, where application memory and system DLL memory is differentiated, if I'm not mistaken), and the check to make sure that the region is of type MEM_IMAGE. While I doubt that only checking MEM_IMAGE-type regions will result in checking all code regions, it does seem to reduce how often the sum changes during normal operation. Loading more DLLs seems to change the sum, which is partly what I want. I still need to verify that the sum changes if the "user" alters runtime code, but I'm confident that it will, since the code should reside in a MEM_IMAGE-type region, should it not? At least, it has in the sample project I've whipped up as well as the two other projects I temporarily added this to.

Of course, the checksum would change if the user loads a different (perhaps older or newer) version of user DLLs (eg, ones loaded via LoadLibrary() and such), correct?

Edit:
I do have one other odd question though; do you know if the MEMORY_BASIC_INFORMATION.Type property's possible values (MEM_IMAGE, MEM_MAPPED, MEM_PRIVATE) are mutually exclusive? How about the .State property (same question)?
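As far as the documentation for MEMORY_BASIC_INFORMATION describes them, State and Type each hold exactly one of their listed values, while Protect is a base protection value that can be combined with modifier flags such as PAGE_GUARD. Under that reading, the filter used above can be written with equality tests for State and Type and a mask test for Protect. The sketch below repeats the SDK constant values locally so it builds anywhere; `RegionInfo` and `is_hashable_code_region` are hypothetical names, and skipping guard pages is an extra assumption that the posted code does not make.

```cpp
#include <cstdint>

// Constant values as defined in the Windows SDK headers, repeated here so the
// sketch compiles on any platform.
const std::uint32_t kPageExecute          = 0x10;
const std::uint32_t kPageExecuteRead      = 0x20;
const std::uint32_t kPageExecuteReadWrite = 0x40;
const std::uint32_t kPageExecuteWriteCopy = 0x80;
const std::uint32_t kPageGuard            = 0x100;
const std::uint32_t kMemCommit            = 0x1000;
const std::uint32_t kMemImage             = 0x1000000;

// Stand-in for the three MEMORY_BASIC_INFORMATION fields the filter looks at.
struct RegionInfo {
	std::uint32_t State;
	std::uint32_t Protect;
	std::uint32_t Type;
};

// True for committed, image-backed, executable regions that are not guard pages.
inline bool is_hashable_code_region(const RegionInfo &r)
{
	const std::uint32_t exec = kPageExecute | kPageExecuteRead |
	                           kPageExecuteReadWrite | kPageExecuteWriteCopy;
	return r.State == kMemCommit
	    && r.Type == kMemImage
	    && (r.Protect & exec) != 0
	    && (r.Protect & kPageGuard) == 0;
}
```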

tA-Kane · July 28, 2005, 07:08 AM

I have been working with another user to create an algorithm which is less prone to checksumming data sections. I've been able to come up with this code, with his assistance:

Code:

bool CheckExecutables2(unsigned char Sum[16])
{
	// need to enumerate the loaded libraries
	// then, will need to sort the libraries' names alphabetically to ensure that they are always added to the checksum in the same order
	// note that you cannot sort libraries by full path, because path could change if the dll was loaded from one place instead of another place
	// checksum the modules, in sorted order... since the executable is returned within the modules, it will also be checksummed  :)
	HANDLE					Process;
	BOOL					Success;
	DWORD					Length, ModuleCount, Result, i, LastLoc, SectionCount;
	HMODULE					*UnsortedModules = NULL;
	HMODULE					*Modules = NULL;			// modules, sorted
	PIMAGE_DOS_HEADER		pDosHeader;
	PIMAGE_NT_HEADERS		pNTHeader;
	PIMAGE_SECTION_HEADER	pSectionHeader;
	char					**ModuleFilenames = NULL;	// pointers to module filenames, unsorted
	char					Current[MAX_PATH], Last[MAX_PATH];			// pointers to filenames (eg, within the pathnames) for sorting
	bool					rVal;
	MD5_CTX					MD5;
	
	Process = GetCurrentProcess();
	
	Success = EnumProcessModules(Process, NULL, 0, &Length);
	if (!Success)
		return false;	// unable to get module count... ewwwwwwwwww!!!
enummodules:
	ModuleCount = Length / sizeof(HMODULE);
	try {
		UnsortedModules = new HMODULE[ModuleCount];
	} catch (std::bad_alloc) {
		UnsortedModules = NULL;
	} if (!UnsortedModules) {
		return false;	// unable to allocate HMODULE array
	}
	
	Success = EnumProcessModules(Process, UnsortedModules, ModuleCount*sizeof(HMODULE), &Length);
	if (!Success)
	{
		// unable to enumerate modules! eww!!
		rVal = false;
		goto cleanup;
	}
	if (Length != (ModuleCount * sizeof(HMODULE)))
	{
		// oh VERY funny... loaded or unloaded a module after getting module count... BAH!
		delete[] UnsortedModules;
		goto enummodules;	// try again
	}
	try {
		ModuleFilenames = new char*[ModuleCount];
	} catch (std::bad_alloc) {
		ModuleFilenames = NULL;
	} if (!ModuleFilenames) {
		rVal = false;
		goto cleanup;
	}
	memset(ModuleFilenames, 0, ModuleCount*sizeof(char*));	// each element is a char*, not a char**
	Result = FillModuleFilenames(Process, ModuleFilenames, UnsortedModules, ModuleCount);
	if (Result == (DWORD)-1) {
		// bad param... eww
		rVal = false;
		goto cleanup;
	}
	
	try {
		Modules = new HMODULE[ModuleCount];
	} catch (std::bad_alloc) {
		Modules = NULL;
	} if (!Modules) {
		// ugh...
		rVal = false;
		goto cleanup;
	}
	// module handles are allocated, module paths are allocated and retrieved, now need to sort plzkthx
	for (Result = 0; Result < ModuleCount; Result++) {
		for (i = Last[0] = 0, LastLoc = (DWORD)-1; i < ModuleCount; i++) {
			if (!ModuleFilenames[i])
				continue;
			strcpy(Current, ModuleFilenames[i]);
			PathStripPath(Current);
			if (strcasecmp(Current, Last) > 0) {
				strcpy(Last, Current);
				LastLoc = i;
			}
		}
		if (LastLoc != (DWORD)-1) {
			Modules[Result] = UnsortedModules[LastLoc];
			delete[] ModuleFilenames[LastLoc];	// need to delete and set to NULL to make sure that we don't check it again
			ModuleFilenames[LastLoc] = NULL;
		} else {
			Modules[Result] = NULL;		// set to invalid
		}
	}
	// now need to checksum the modules' code sections
	MD5Init(&MD5);
	for (i = 0; i < ModuleCount; i++) {
		if (!Modules[i])	// no filename? bleh... gonna have to skip it... should possibly return error status
			continue;
		pDosHeader = (PIMAGE_DOS_HEADER)Modules[i];
		pNTHeader = (PIMAGE_NT_HEADERS)(pDosHeader->e_lfanew + (char *)pDosHeader);
		SectionCount = pNTHeader->FileHeader.NumberOfSections;
		pSectionHeader = IMAGE_FIRST_SECTION(pNTHeader);
		for (Length = 0; Length < SectionCount; Length++, pSectionHeader++) {
			if (pSectionHeader->Characteristics & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE))
				MD5Update(&MD5, (unsigned char *)Modules[i] + pSectionHeader->VirtualAddress, pSectionHeader->Misc.VirtualSize);	// pointer arithmetic instead of a DWORD cast, which would truncate 64-bit addresses
		}
	}
	MD5Final(Sum, &MD5);
	
	rVal = true;
cleanup:
	if (UnsortedModules)
		delete[] UnsortedModules;
	if (ModuleFilenames) {
		for (Length = 0; Length < ModuleCount; Length++)
			if (ModuleFilenames[Length])
				delete[] ModuleFilenames[Length];
		delete[] ModuleFilenames;
	}
	if (Modules)
		delete[] Modules;
	return rVal;
}

DWORD FillModuleFilenames(HANDLE Process, char **ModuleFilenames, HMODULE *Modules, DWORD ModuleCount)
{
	if (!ModuleFilenames || !Modules)
		return (DWORD)-1;
	
	DWORD	i, bad, Result;
	for (i = bad = 0; i < ModuleCount; i++) {
		if (ModuleFilenames[i])
			delete[] ModuleFilenames[i];
		try {
			ModuleFilenames[i] = new char[MAX_PATH];
		} catch (std::bad_alloc) {
			ModuleFilenames[i] = NULL;
		} if (!ModuleFilenames[i]) {
			// eww, unable to allocate!
			bad++;
			continue;
		}
		Result = GetModuleFileNameEx(Process, Modules[i], ModuleFilenames[i], MAX_PATH);
		if (!Result) {
			delete[] ModuleFilenames[i];
			ModuleFilenames[i] = NULL;
			bad++;
		}
	}
	// i counted every module and bad counted the failures; bad <= i, so this cannot underflow
	return i - bad;
}

Quite a fair bit longer in both source code and execution time. Definitely needs to be cleaned up, especially in the area of sorting the module names. It does produce a different sum than the previous algorithm, but that's to be expected, since the principles are slightly different: the previous algorithm would checksum *all* executable memory address ranges, while this one should only checksum the memory address ranges loaded from the file (eg, if the program allocates another memory range and copies executable code to it, then that range will not be checksummed). With this in mind, however, the previous checksumming function could (and would) mistake some purely data sections for executable sections, would it not?
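For the sorting step, one possible cleanup is to sort an index array by lower-cased base name with the standard library instead of the hand-written selection loop. This is a sketch only: it takes the paths as `std::string`s rather than the `char **` array used above, sorts in ascending order (the posted loop picks the largest name first), and the names `base_name_lower` and `sorted_module_order` are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Strip the directory part and lower-case what is left, so that the same DLL
// loaded from a different folder still sorts into the same position.
inline std::string base_name_lower(const std::string &path)
{
	std::size_t slash = path.find_last_of("\\/");
	std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
	for (std::size_t i = 0; i < name.size(); i++)
		name[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(name[i])));
	return name;
}

// Returns indices into 'paths', ordered by base name. The same indices can be
// used to walk the matching module handle array in a stable order.
inline std::vector<std::size_t> sorted_module_order(const std::vector<std::string> &paths)
{
	std::vector<std::size_t> order(paths.size());
	for (std::size_t i = 0; i < order.size(); i++)
		order[i] = i;
	std::stable_sort(order.begin(), order.end(),
		[&paths](std::size_t a, std::size_t b) {
			return base_name_lower(paths[a]) < base_name_lower(paths[b]);
		});
	return order;
}
```

Keeping the handles in place and sorting indices avoids deleting filename entries just to mark them as already used.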

Edit:
While the checksum appears to stay the same for the duration of a program's single run, it does not appear to be the same between runs, so either there's a bug in here somewhere or this function is *not* what I need.

Edit2:
I seem to have fixed the problem I was having. It seems that I was adding the incorrect base address for the start of each memory region to be checksummed, as well as only checksumming the first section, whether or not it was executable, instead of checksumming all executable sections. Now the checksum is the same between instances of the program.
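The section walk described in this edit can be illustrated without the Windows headers by reading the PE fields straight out of a byte buffer. The offsets below follow the published PE/COFF layout; the buffer-based interface and the names `CodeRange`, `rd16`, `rd32` and `executable_sections` are assumptions made for the sketch, not part of the posted function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct CodeRange {
	std::uint32_t rva;	// section start, relative to the module base
	std::uint32_t size;	// VirtualSize of the section
};

// Little-endian field readers.
inline std::uint16_t rd16(const std::vector<unsigned char> &b, std::size_t off)
{
	return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

inline std::uint32_t rd32(const std::vector<unsigned char> &b, std::size_t off)
{
	return static_cast<std::uint32_t>(b[off])
	     | (static_cast<std::uint32_t>(b[off + 1]) << 8)
	     | (static_cast<std::uint32_t>(b[off + 2]) << 16)
	     | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Lists every section flagged as code or executable in a mapped PE image.
inline std::vector<CodeRange> executable_sections(const std::vector<unsigned char> &image)
{
	std::vector<CodeRange> out;
	if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
		return out;
	std::size_t nt = rd32(image, 0x3C);			// e_lfanew
	if (nt > image.size() || image.size() - nt < 24)
		return out;
	if (rd32(image, nt) != 0x00004550u)			// "PE\0\0"
		return out;
	std::uint16_t count = rd16(image, nt + 6);		// NumberOfSections
	std::uint16_t optSize = rd16(image, nt + 20);	// SizeOfOptionalHeader
	std::size_t sec = nt + 24 + optSize;			// first section header
	for (std::uint16_t i = 0; i < count; i++, sec += 40) {
		if (sec + 40 > image.size())
			break;
		std::uint32_t flags = rd32(image, sec + 36);	// Characteristics
		if (flags & (0x00000020u | 0x20000000u)) {	// CNT_CODE | MEM_EXECUTE
			CodeRange r;
			r.rva = rd32(image, sec + 12);		// VirtualAddress
			r.size = rd32(image, sec + 8);		// VirtualSize
			out.push_back(r);
		}
	}
	return out;
}
```

Each returned range would then be hashed starting at the module's base address plus `rva`, for every matching section rather than only the first.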

I think this will do nicely.

Adron · July 28, 2005, 10:31 AM

What exactly are you trying to accomplish?

This might work sometimes to verify that it's the same user running the exe, and that no dlls have been changed. It will require updates often.

The sum will change every month on the 15th, when Windows Update posts new dlls. It will change every time the user installs an application with global hooks, say "Comet Cursor" (which loads a dll into the address space of every gui application). And on systems with multiple dlls that share the same base address, it will give a different checksum every time the program is run.

tA-Kane · July 28, 2005, 02:12 PM

The algorithm must checksum all code related to the application, eg, code derived from the application's source code, as well as the code of any specified DLLs (which the second algorithm I provided should be able to do with some more tweaks). And of course, to help prevent tampering with the list of checksum-enabled (or -disabled) DLLs, it should checksum that list as well.

The checksum must be the same in all cases (eg, from day one to year million, theoretically), assuming that the code has not been tampered with nor upgraded to a newer version.

It'd be helpful if the algorithm checksummed the static data as well (strings, etc).

Would there be any way to prevent "Comet Cursor" from being loaded into the application? If not, then how could you expect the algorithm to know the difference between user-friendly "Comet Cursor" and a similar-by-design hacker-friendly program/utility? In either case, it's probably best to not include such things in the checksum, to keep the overall feel of the application the same as the overall feel of the system.

Edit:
oops, was still writing this and accidentally pushed post... oh well

Adron · July 28, 2005, 04:59 PM

Well, you are expecting to have one checksum for each user of your application, not the same checksum for everyone?

Particularly, as soon as you include Windows' dlls in your checksum, you have to expect the checksum to change monthly.

And no, there's no easy way to know the difference between a hacker's dll and some other dll installed by a mouse or joystick or toolbar or similar.

And no, checking each dll in your address space won't find all code that has been injected.

MyndFyre · July 28, 2005, 05:10 PM

Quote from: tA-Kane on July 28, 2005, 02:14 AM
Note that the two differences are the check to make sure the memory is below 0x40000000 (the 2GB limit, where application memory and system DLL memory is differentiated, if I'm not mistaken),

I'm not really qualified to respond to the rest of your discussion, but I wanted to point out that 2gb is located from 0x00000000 to 0x7fffffff. Memory beyond 2gb is 0x80000000 to 0xffffffff.
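The arithmetic behind this correction, assuming the default 32-bit Windows layout in which user space is the lower 2GB (`kOneGiB` and `below_default_user_limit` are hypothetical names used only for this illustration):

```cpp
#include <cstdint>

// 0x40000000 is 2^30, i.e. 1GB; the 2GB boundary is 2^31 = 0x80000000.
const std::uint64_t kOneGiB = 1ull << 30;
const std::uint64_t kTwoGiB = 2 * kOneGiB;

// True when an address lies in the lower 2GB (0x00000000 to 0x7FFFFFFF).
inline bool below_default_user_limit(std::uint64_t address)
{
	return address < kTwoGiB;
}
```

So a loop bounded by 0x40000000 stops after the first 1GB of address space, not at the 2GB line.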

tA-Kane · July 29, 2005, 06:17 AM

The checksum needs to be the same for everyone. And like I said, it wouldn't be hard to add a check in the second algorithm to only checksum my own DLLs, instead of system DLLs.

You are right, MyndFyre, now that I think about it. But the thing is, whenever I look at the memory map of a program in a debugger, the system DLLs are located from 0x40000000 to 0x7FFFFFFF. I just saw 0x4xxxxxxx and figured 2-4GB, without doing calculations.

In any case, I think the second algorithm will work better because it can better identify the module name associated with a given memory region, and thus, better filter out the system DLLs.

Adron · July 29, 2005, 01:02 PM

If the checksum needs to be the same for everyone, then yes, you can only checksum your own dlls. You should also be careful only to checksum your actual code. You wouldn't for example want to checksum the import addresses.
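The import addresses are filled in by the loader and differ from system to system, so a checksum that has to match everywhere needs to step over them. A minimal sketch of hashing a range while skipping one sub-range is shown below. FNV-1a stands in for MD5 purely to keep the example short, the names `fnv1a_update` and `hash_excluding` are hypothetical, and locating the import table within a real module is left out.

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a over a byte range, continuing from a running state.
inline std::uint64_t fnv1a_update(std::uint64_t state, const unsigned char *data, std::size_t len)
{
	for (std::size_t i = 0; i < len; i++) {
		state ^= data[i];
		state *= 1099511628211ull;	// 64-bit FNV prime
	}
	return state;
}

// Hashes [0, len) but leaves out [skipStart, skipStart + skipLen), for example
// the bytes occupied by loader-written import addresses.
inline std::uint64_t hash_excluding(const unsigned char *data, std::size_t len,
                                    std::size_t skipStart, std::size_t skipLen)
{
	std::uint64_t state = 14695981039346656037ull;	// FNV offset basis
	if (skipStart > len)
		skipStart = len;				// clamp the hole to the buffer
	if (skipLen > len - skipStart)
		skipLen = len - skipStart;
	state = fnv1a_update(state, data, skipStart);
	return fnv1a_update(state, data + skipStart + skipLen, len - skipStart - skipLen);
}
```

Bytes inside the skipped range can change freely without affecting the result, which is the desired behaviour for loader-written data and also exactly what an attacker would look for, so the skipped range should be as small as possible.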

And now, given that you don't checksum the system dlls, it would be even easier to load evil_hacker.dll into your process.

tA-Kane · July 29, 2005, 01:22 PM

Of course. But without guaranteeing that everyone patches *all* system DLLs the moment they're available (or, by the time that the next update is available), then it'd be an endless cycle with nearly endless possible "valid" checksums.

Adron · July 29, 2005, 01:25 PM

Yes, you have virtually endless possible checksums. Fun, eh?

TheMinistered · September 16, 2005, 02:15 PM

I guess I'll try to actually be helpful to you. A CRC check can be used to detect modifications to the executables, but it is rather weak because it's easily defeated by patching your cmp. I suggest looking into more complex protections such as symmetric code encryption/decryption. This scheme can further be used to provide "leak protection", as the key can include specific computer specs.

tA-Kane · September 16, 2005, 11:25 PM

A CRC check is also quite a bit simpler than inline function encryption/decryption, in my opinion. If you know of a rather simple (and free) method of doing this with varying start and end points for the encrypted regions and varying levels of encryption (eg, different keys, different key lengths, or even different algorithms for different regions) that still maintains a decent runtime speed for the end-user on very old machines, then please be my guest and point me in the right direction. Searching Google for this kind of information would take a lot of time and effort in the best case, especially since I doubt what I specifically want already exists and is free.

On a side note... if I were ever to get my own domain, these kinds of things would be an interesting addition to an advanced-level programming section I could create. I should get off my ass and make one.