• Welcome to Valhalla Legends Archive.
 

C++ code optimization

Started by Gnathonic, February 28, 2005, 03:16 PM

Previous topic - Next topic

Gnathonic

I'm using Microsoft VC++ 6.0, and I currently have about 6 hours experiance with C++ total so my code needs a bit of help, but what I have works it just isn't very fast. How would one optimize this code for speed, and reduced arrayiable size, and whatever else can be helped?


#include "filter.h"


Pixel32 Frame[2][1600][1200];
This is the current and past frame storage array.

int     TFrame[2][1600][1200][3];
This is a temp array, used so that seperation of colors is done only once.

int runProc(const FilterActivation *fa, const FilterFunctions *ff);

int tutorialRunProc(const FilterActivation *fa, const FilterFunctions *ff) {
    PixDim w, h;
counter array: w for frame width, h for frame hight.

    Pixel32 *src, *dst;

I don't fully understand this one but
the *src provides the source frame pixel by pixel, and the *dst
provides a way to output the frame, pixel by pixel.

LONGLONG DifAA, DifAB, DifBA;
field combination diference colection arrays.

int i;
a counter array.

DifAA = 0;
DifAB = 0;
DifBA = 0;
resets arrays value to 0. I think?

    src = (Pixel32 *)fa->src.data;
points the src array to the pixels in the source frame
   

    h = fa->src.h;
sets counter array to the frame height

    do {
        w = fa->src.w;
sets counter array to the frame width

        do {
            Frame[0][w][h] = *src++;
stores frame for easy referance, and to referance back to.

TFrame[0][w][h][0] = (Frame[0][w][h] & 0xFF0000) / 0x010000;
TFrame[0][w][h][1] = (Frame[0][w][h] & 0x00FF00) / 0x000100;
TFrame[0][w][h][2] = (Frame[0][w][h] & 0x0000FF);
seperates the color into its components (Red, Green, Blue) and reduces value of red, and green
so that calculations are more accurate and not weighted to some color.

        } while(--w);

        src = (Pixel32 *)((char *)src + fa->src.modulo);
moves the src array to the next row in the frame. I think?

    } while(--h);

dst = (Pixel32 *)fa->dst.data;
set the dst array to output the processed frame, pixel by pixel.

h = fa->src.h;
sets counter array

    do {
        w = fa->src.w;
sets counter array

        do {
for(i=0; i<3; i++) {
DifAA += abs(TFrame[0][w][h][i] - TFrame[0][w][h+1][i]);
DifAB += abs(TFrame[0][w][h][i] - TFrame[1][w][h+1][i]);
DifBA += abs(TFrame[1][w][h][i] - TFrame[0][w][h+1][i]);
compairs the fields from each frame against each other to find the one with the least difference
In order:
current frame against its own fields.
bottom field from the current frame and the top field from the past frame.
top field from the current frame and the bottom field from the past frame.
The "for" statement and counter rotate through the RGB colors that were seperated earlier in the code.

}
} while(--w);
--h;
} while(--h);
from here to

if((DifAA<=DifAB) && (DifAA<=DifBA))
{
h = fa->src.h;
do {
w = fa->src.w;

        do {
        *dst++ = Frame[0][w][h];
} while(--w);

        dst = (Pixel32 *)((char *)dst + fa->dst.modulo);
} while(--h);
}

else if((DifAB<DifAA) && (DifAB<DifBA))
{
h = fa->src.h;
do {
w = fa->src.w;

        do {
        *dst++ = Frame[0][w][h];
} while(--w);

dst = (Pixel32 *)((char *)dst + fa->dst.modulo);
w = fa->src.w;
--h;

do {
        *dst++ = Frame[1][w][h];
} while(--w);

        dst = (Pixel32 *)((char *)dst + fa->dst.modulo);


} while(--h);
}

else
{
h = fa->src.h;
do {
w = fa->src.w;

        do {
        *dst++ = Frame[1][w][h];
} while(--w);

dst = (Pixel32 *)((char *)dst + fa->dst.modulo);
w = fa->src.w;
--h;

do {
        *dst++ = Frame[0][w][h];
} while(--w);

        dst = (Pixel32 *)((char *)dst + fa->dst.modulo);


} while(--h);
}
here is the code for outputing the field combination that can best reduce interlacing artifacts (according to the calcuations used prior).

h = fa->src.h;
    do {
        w = fa->src.w;

        do {
Frame[1][w][h] = Frame[0][w][h];
for(i=0; i<3; i++) {
TFrame[1][w][h][i] = TFrame[0][w][h][i];
}
} while(--w);
} while(--h);

this whole thing moves the values from our current frame to our past frame section of the array.

    return 0;
}

EpicOfTimeWasted

First and foremost I'd edit your post and put [ code ] [ /code ] tags around the code in your post, so it's more readable.

Gnathonic

#2
Sorry about that. I can see what you mean. Is this better?

*edits first post.*

Oh, one request if you post any optimizations could you explain it, I mean at least the code so that when I write more code I can include that kind of optimizations in it?

Arta

You are far better off letting the compiler optimise your code. I don't know where the settings are in VC++ 6, but you can tell the compiler to optimise your code for speed or size, as well as change various other settings. The first thing I would suggest is changing to a release build form a debug build, if you haven't already.

PS: Are those Frame/TFrame arrays always that size? Could they be allocated using new/malloc to reduce your memory consumption?

Kp

IIRC, VC++ 6 "Standard" edition cannot optimize.  Given that he's relatively new to C++, I doubt he's bought a higher version than that.  If so, the options are either to get a higher grade VC++ or switch to MinGW (warning: MinGW is command-line driven, not for newbies or WIMP addicts).
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

po0f

An int array of 2x1600x1200x3?  IIRC ints are 4 bytes;  4*2*1600*1200*3 = 46080000 bytes / 1024 = 45000MB!!  That might be your speed issue right there.

Adron

45 MB, not GB, but yes, it's not going to be lightning fast whatever he does.

Looking at your code, do you want your output to be an array of integers, not an array of RGB values? That intermediate integer array is going to slow down the code by a huge amount because of memory access latency. It may be very much faster to just do the splitting for each pixel, as you will save on memory accesses and the CPU speed typically is many times higher than the memory speed. Same goes for if you don't *have* to divide each pixel into 3 4-byte values, the less memory you use, the faster it will be.


Gnathonic

Quote from: Arta[vL] on February 28, 2005, 08:57 PM
You are far better off letting the compiler optimise your code. I don't know where the settings are in VC++ 6, but you can tell the compiler to optimise your code for speed or size, as well as change various other settings. The first thing I would suggest is changing to a release build form a debug build, if you haven't already.
I did set the compiler to optimize the code for speed, but it did very little. I also set it to the release build setting, and that didn't impove the speed, at least not to a noticeable extent. It did reduce the program size from 224K to 20K.

Quote from: Arta[vL] on February 28, 2005, 08:57 PM
PS: Are those Frame/TFrame arrays always that size? Could they be allocated using new/malloc to reduce your memory consumption?
Yes I didn't know how to resize them, like I said I'm very new at this language. How does one use the new/malloc comands?

Quote from: Kp on February 28, 2005, 11:25 PM
IIRC, VC++ 6 "Standard" edition cannot optimize.  Given that he's relatively new to C++, I doubt he's bought a higher version than that.  If so, the options are either to get a higher grade VC++ or switch to MinGW (warning: MinGW is command-line driven, not for newbies or WIMP addicts).
I'm using the Enterprize edition. It's inbetween the standard and pro editions of the program.

Quote from: Adron on March 01, 2005, 04:07 AM
Looking at your code, do you want your output to be an array of integers, not an array of RGB values? That intermediate integer array is going to slow down the code by a huge amount because of memory access latency. It may be very much faster to just do the splitting for each pixel, as you will save on memory accesses and the CPU speed typically is many times higher than the memory speed. Same goes for if you don't *have* to divide each pixel into 3 4-byte values, the less memory you use, the faster it will be.
I just want something that is fast I don't really care what arrays it uses just as long as it works, fast. When I was making this program the guide I was using said it was a good idea to minimize the division used, so I made the temp array so that would only be done once, but I will try that when I get back to my PC, it makes sense to me that that might work.

Thanks for you help thus far.

In the mean time is there a way to optimize the code in the difference calc loop as it was segested, so that it runs faster?
EX.

h = fa->src.h;
   do {
      w = fa->src.w;
      do {
         DifAA +=
abs((Frame[0][w][h] & 0xFF0000) - (Frame[0][w][h+1] & 0xFF0000)) / 0x010000 +
abs((Frame[0][w][h] & 0x00FF00) - (Frame[0][w][h+1] & 0x00FF00)) / 0x000100 +
abs((Frame[0][w][h] & 0x0000FF) - (Frame[0][w][h+1] & 0x0000FF));
         DifAB +=
abs((Frame[0][w][h] & 0xFF0000) - (Frame[1][w][h+1] & 0xFF0000)) / 0x010000 +
abs((Frame[0][w][h] & 0x00FF00) - (Frame[1][w][h+1] & 0x00FF00)) / 0x000100 +
abs((Frame[0][w][h] & 0x0000FF) - (Frame[1][w][h+1] & 0x0000FF));
         DifBA +=
abs((Frame[1][w][h] & 0xFF0000) - (Frame[0][w][h+1] & 0xFF0000)) / 0x010000 +
abs((Frame[1][w][h] & 0x00FF00) - (Frame[0][w][h+1] & 0x00FF00)) / 0x000100 +
abs((Frame[1][w][h] & 0x0000FF) - (Frame[0][w][h+1] & 0x0000FF));
   } while(--w);
} while(--h);

Kp

Don't know if your compiler is smart enough to be optimizing that, but yes, I can see a few speedups to the supplied loop.  Replace the statements of the inner do { ... } as follows:

int x = Frame[0][w][h], y = Frame[0][w][h+1];
/* Now refer to x where you would've used Frame[0][w][h], etc.  This hints to the compiler that it's OK to cache these in registers.  You can reassign x and y after your modifications to DifAA are done.  Do the same caching for the other equations.
* Also, use >> 16 instead of / 0x10000, and >>8 instead of /0x100.  The compiler ought to be doing that for you, but no guarantees without looking at the object code.
*/
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

Gnathonic

#9
I did all the the optimizations, except the last, and the new/malloc which I plan to add soon (15 - 20 minutes from now?), and the program doubled it's speed *yea!*...

But sadly when I looked up the new/malloc commands I became extremely confused, and trying to read about it here didn't help much. Could one of you just give me that code. I think I might be able to get it when I have an example I'm familar with, but then maybe not in which case I'm going to have to deal with a headache while I try to get what they are trying to teach me at those other places.



array name:
Frame

Things with desired array demensions:
fa->src.h
fa->src.w

Kp

int *i = (int*)malloc(sizeof(int) * number_of_ints_needed);
if (!i) die ("Out of memory");
/* You now have an array of integers. */


Alternately, int *i = new int[number_of_ints_needed]; /* If out of memory, new already crashed your program with an exception */
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

Mephisto

#11
Unless of course you properly handle the allocation exception which is always good to do.

try {
    int *i = new int[number_of_ints_needed];
} catch(bad_alloc &exception) {
    printf("Exception occured when calling operator new: %s", exception.what());
}

Gnathonic

Quote from: Kp on March 02, 2005, 08:20 PM
int *i = (int*)malloc(sizeof(int) * number_of_ints_needed);
if (!i) die ("Out of memory");
/* You now have an array of integers. */


Alternately, int *i = new int[number_of_ints_needed]; /* If out of memory, new already crashed your program with an exception */
I'm looking at this and getting progressively more confused. Although new seams to make more sence than the malloc one. I still can't tell what goes where!!!
When I learned C++ I learned it from looking at some source code and a small, may I enphasize that again, small guide, because of this seeing the above provided code doesn't give me a single clue as to what to do. Would any one who understands that code, and the function of these arrays/calls/whatever:
Frame
fa->src.h
fa->src.w
Pixel32
tell me where they would go?

Kp

Quote from: SoR-Mephisto on March 02, 2005, 09:29 PMUnless of course you properly handle the allocation exception which is always good to do. try {
    int *i = new int[number_of_ints_needed];
} catch(bad_alloc &exception) {
    printf("Exception occured when calling operator new: %s", exception.what());
}

I suppose, although exception handling is generally so slow that I just turn off exception support entirely and use the variant that returns NULL instead of throwing a bad_alloc exception.  Which, btw, you forgot to mention is in namespace std. ;)  Unless he puts in a using std::bad_alloc;, your code won't work due to bad_alloc being unrecognized in the primary namespace.

Gnathonic: although you're considerably better spoken than most newbies we get here, the forum has a long tradition against just giving code because invariably it ends up causing more questions than it answered.  From what you've kept reiterating, it seems to me that the number of ints needed would be the product of fa->src.{h,w}, no?  Frame would be of type Pixel32 and would store the returned results of new.
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

Gnathonic

O.K. for the sake of working within you peoples "tradition" (yes I had noticed a lack of code responses), which just so happens to work against the way I learned VB, and C++ thus far, but here goes...


Is the first int the int command? array name? (doubt this one) or int array type?
I don't even have a guess what this is *i
the new statement seams simple enough
Is this second int... just see the first list.
The last one makes sence.

Like I said that whole thing was making me confused.