• Welcome to Valhalla Legends Archive.
 

Refining code to be packaged with mathematical programming language. Criticize!!

Started by Rule, May 15, 2006, 11:20 PM

Previous topic - Next topic

Rule

Your criticism wanted! 

I just started teaching myself C a few days ago, as I am helping with some of the more mathematical aspects of modifying a programming language (written to express time dependent systems of partial differential equations).  However, I am not very familiar with non object-oriented programming, so I am sure there is a lot to improve in my code. 

Although the C code I'm pasting below *works*, I would appreciate any stylistic advice, any comments on whether it would be better to use a switch statement somewhere, typedefinitions, more pointers, malloc or calloc, to consolidate or separate code segments, etc,  are encouraged!  I'll even listen to comments about spacing, but don't get ridiculous :).  While this tool is trivial, it will be packaged with the programming language, so I want the code to be reasonably professional.  Considering how inexperienced I am, I imagine there is still a lot of work to be done to get this code to that point.

I have questions scattered in comments throughout the code.  Its purpose is to generate a bunch of these "id" parameter files used in convergence testing, and to allow the user to specify various components of the files.  Any help welcome!  For example, is it standard for me to return 1 when there's an error in some cases?  I assumed this because main returns 0 as a convention.

Compiler warnings:

gcc -c -ansi -Wall -pedantic convergence.c
convergence.c: In function `main':
convergence.c:40: warning: int format, long int arg (arg 3)
convergence.c: In function `createId':
convergence.c:147: warning: int format, long int arg (arg 5)
convergence.c:147: warning: int format, long int arg (arg 6)
convergence.c: In function `isInteger':
convergence.c:222: warning: implicit declaration of function `isdigit'


To link:

gcc -lm -o convergence.o conv


Code:  (Also available using warz' pastebin script, at http://www.rafm.org/paste/results.php?id=57)


#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

/* In C, when would not using the "unsigned" prefix lead to trouble? */

int inputcheck(int argc, char *argv[], int MAND);
void createId(int nId, long start, int factor, int argc, char *argv[]);
void usage();
void searchOptionals(int argc, char *argv[]);
int isInteger(char *check);

/*Flags values go here.  If flags aren't found in argv, these are set to the id0 default as declared in main*/
long lambda;
long sigma; /*Intentionally unused later in code*/

int main(int argc, char *argv[])
{
int numId, factor;
long startId;

extern long lambda;
extern long sigma;

static int MAND = 3; /* Number of mandatory arguments*/

/*id0 defaults for these flag variables*/
lambda = 0;
sigma = 0;


if (inputcheck(argc, argv, MAND) == 1) return 1;    /* return 1 if errors are found */


numId = atoi(argv[1]);   
startId = atol(argv[2]);
factor = atoi(argv[3]); 

printf("%d %d %d\n", numId, startId, factor);

createId(numId, startId, factor, argc, argv);

return 0;

}

/* returns 1 if an error is found, zero otherwise */
int inputcheck(int argc, char *argv[], int MAND)
{
/* register int i;  */
int j;
int nId;   
int strLength;

char argString[6]="\0";

if (argc < 4)
{
fprintf(stderr, "\n\t***All mandatory arguments weren't used***\n");
usage();
return 1;
}

for (j=1; j< argc; j++)
{

strLength = strlen(argv[j]);
if (strLength > 5)
{
fprintf(stderr, "\n\t***Arguments must be less than 6 digits***\n");
return 1;
}

sprintf(argString, argv[j]);   /*Every time this is run, argString clears back to null, which is what I want; *why* does it do this though?*/

if (j<=MAND)
{

if (isInteger(argString) != 1)
{
fprintf(stderr, "\n\t***Mandatory arguments must be integers***\n");
usage();

return 1;
}

}
/* for (i=0; i< strLength; i++)
{
if (isdigit(argString[i]) == 0)
{
fprintf(stderr, "\n\t***Mandatory arguments should be integers***\n");
usage();

return 1;
}

}
*/


}

nId=atoi(argv[1]);

if (nId > 100)
{
fprintf(stderr, "You specified more than 100 id files.\n");
return 1; 
}

return 0;
}


void createId(int nId, long start, int factor, int argc, char *argv[])
{
register int i;
FILE *file_ptr;


char fName[5]="\0";
register long powiter = start;  /*iteration at given id level*/  /*I'm warned if I don't initialize this before the loop, but it seems like a better practice not to!*/
/*I want to use a long, but trying gives me problems*/

if ( (pow( (double) start, (double) factor )) > 99999999 )  fprintf(stderr, "\nYour iteration values eventually exceed the storage capacity of the long variable.  Iter values which are 9 digits or greater should not be trusted.\n");

        system("rm id*");
fprintf(stdout, "All id files in this directory have been erased.\n\n");


for (i=0; i<nId; i++)
{
sprintf(fName, "id%d", i);
file_ptr = fopen(fName, "w+");


if (file_ptr != NULL)
{
fprintf(stdout, "%s created.\n", fName);


/*Checks for flags*/
searchOptionals(argc, argv);

fprintf(file_ptr, "# parameters for wave\n\nlevel := 0\nin_file := \"in%d.sdf\"\nout_file := \"out%d.sdf\"\n\ntag := ""\nlambda := %d\nNx0 := 32\nxmin := 0\nxmax := 10\nser := 0\nfout := 1\niter := %d\noutput := *-*\nepsiter := 1.0e-5\nepsiterid := 1.0\namp := 1.0\ncent := 5.0\nsigma := 0.8\n", i, i, lambda, powiter);


powiter *= factor;

fclose(file_ptr);


}

else { fprintf(stderr, "Unable to create %s\n", fName); exit(1); } /* Proper use of exit here? What does this do, precisely? */

}

fprintf(stdout, "%d id files generated.\n", i);

return;
}

void usage()
{
fprintf(stderr, "Overview: \n");
fprintf(stderr, "Used to generate and modify id files for convergence testing using RNPL.  Erases all id files currently in this directory.\n\n");
fprintf(stderr, "Usage:\n");
fprintf(stderr, "conv [number of id files to generate] [iterations in id0] [factor] (optional args here)\n");
fprintf(stderr, "\nfactor: linearly scales the iterations in each successive id file by this amount\n");
fprintf(stderr, "\nOptional arguments:\n");
fprintf(stderr, "-l [lambda]\n");
fprintf(stderr, "(Code has been written so that it shouldn't be difficult for the user to add more options)\n");
}

void searchOptionals(int argc, char *argv[])
{
register int i;
extern long lambda;
char option[5]="\0";


for (i=4; i<argc; i++)
{
sprintf(option, argv[i]);

if (strstr(option, "-l") != NULL)
{

sprintf(option, argv[i+1]);

if (isInteger(option) != 1)
{
fprintf(stderr,"\n\tOptional arguments should be integers!\n");
}

/*otherwise set lambda flag value*/
lambda = atol(argv[i+1]);

}

/* for example:
* ...   
* if (strstr(option, "-a")) { ... }
*/

}
}



/* returns 1 if it is an integer, 0 otherwise */
int isInteger(char *check)
{
register int i;
int theLength = strlen(check);

for (i=0; i<theLength; i++)
{
if (isdigit(check[i]) == 0) return 0;
}

return 1;
}

Joe[x86]

Quote from: Rule on May 15, 2006, 11:20 PM
Your criticism wanted! Pretend you're a bitter 14 year old on IRC who wants a sense of empowerment :D.

omfg ur stupid

Other than that, I've got nothin'.
Quote from: brew on April 25, 2007, 07:33 PM
that made me feel like a total idiot. this entire thing was useless.

K

These definition inside of main and elsewhere:

     extern long lambda;
     extern long sigma;

isn't necessary.  They're already declared above main(). extern is for functions or variables that will not be resolved until link time, i.e, located in a different object module.


There's no reason to define MAND as static inside of main:

static int MAND = 3;    /* Number of mandatory arguments*/

Perhaps you want MAND to be const or a #define.  static simply makes MAND accessable and 'static' through different calls of main(), and I would guess that you're probably only calling main() once in this program ;).

That's all I have time to look at now -- I'm heading to work.

Rule

Quote from: J on May 16, 2006, 07:34 AM
Quote from: Rule on May 15, 2006, 11:20 PM
Your criticism wanted! Pretend you're a bitter 14 year old on IRC who wants a sense of empowerment :D.

Other than that, I've got nothin'.

Please don't soil my topic then.

Rule

Thanks for looking K :).

Quote from: K on May 16, 2006, 09:53 AM
These definition inside of main and elsewhere:

     extern long lambda;
     extern long sigma;

isn't necessary.  They're already declared above main(). extern is for functions or variables that will not be resolved until link time, i.e, located in a different object module.

ok, I believe you, although to quote the book I learned C from:

Quote from: Mike McGrath
Variables declared externally outside of a function are known as "global" variables.  Each declaration should begin with the extern keyword which denotes that it is for a global variable rather than for a regular local variable.

Also, I'm wondering if it's really appropriate to use field (global) variables here at all?  I'm not sure exactly how they work in C (memory wise, etc).

Would it be much better just to create two pointers for lambda and sigma (in main), and pass them by reference to searchOptionals?


Kp

Include <ctype.h> to get rid of the warning about isdigit.  The format string warnings indicate that you're passing a value declared as long to a format string which wants a plain int.  Use %ld instead of %d to fix this.  (or %lu instead of %u or %li instead of %i, depending on what you're trying to print).

Using system(3) to execute such a simple command is overkill.  Consider doing a fork(2)+execve(2) to run the rm, or just iterate over the files and unlink(2) them yourself (see opendir(3), readdir(3), closedir(3) for how to find what's in the directory).

Use of sprintf is dangerous.  I think you're OK here, but it's better to use snprintf whenever possible.  Note that the Microsoft implementation of snprintf is misnamed (_snprintf) and buggy (it does not nul-terminate the result if it runs out of room).

When printing multiline strings, I prefer to print them in a single call, but take advantage of the compiler's string pasting to make it easier to read.  For example:


fprintf(stderr, "An error has occured.\n"
"The following may help you\n"
"(but it probably will not, since only the author knows what it means)\n"
"error %i\n", errNum);


Using /* ... */ to disable a block of code works, but can behave strangely if you disable documented code.  /* */ does not nest.  Use "#if 0 / #endif" to disable a block which may have comments in it.  Preprocessor directives do nest. :)

You receive a warning about powiter because it's never initialized in the loop.

There's no hard convention for return values, but there are a few categories:
0 on success, nonzero on error (and the good programmers return different values for different errors; bad programmers just always return 1)
negative on error, nonnegative on success.
0 on failure, nonzero on success

Pick a standard and stick with it as best you can.  The third one is annoying to deal with, but both the others work fine as long as you document it.  For more complex programs, you may wish to use an enum {} of return codes, so that there's a central listing of them and a way to easily add new ones without stepping on existing ones.

That's all for now.  I might add more later if I spot more subtle items. :)
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

K

Quote from: Mike McGrath
Variables declared externally outside of a function are known as "global" variables.  Each declaration should begin with the extern keyword which denotes that it is for a global variable rather than for a regular local variable.
I guess this is correct, but in practice it's hardly ever used.  If you declare your variables at the top of the file, which is good practice for global variables, you can use them inside of following functions without the "extern int..." statements.  Generally, you'll only use "extern int ..." to pull in variables declared in other files, but you could use it like so:


int main(int argc, char* argv[])
{
   extern int foo;
   foo = 3;
}
// foo is declared after main, so main must "pull it in" by declaring an extern variable.
int foo;


but it's much more common to just write

int foo;

int main(int argc, char* argv[])
{
   // foo is defined.
   foo = 3;
}


Rule


Kp

I just noticed one other thing.  You're using sprintf to move strings from argv to temporaries.  This is extremely dangerous, since sprintf interprets the second parameter as a format string.  If the user typed %s in one of the argv elements, sprintf would attempt to read the third argument (which isn't present) as a string pointer.  A crash is very likely in such a case.  You probably want strcpy(3) or strncpy(3) for moving strings.  strcpy(3) doesn't check that there is sufficient room for the destination, so it should be used only when you've already handled that.  For instance,
unsigned j = strlen(foo);
char bar[128];
/*
* This is an idiom for checking the length of the buffer.  It's overkill in
* this case since char has size 1 on every platform I've ever seen, but it's
* good to be cautious.  The right hand side evaluates to (number of bytes
* consumed by bar) / (number of bytes consumed by an element of bar), which
* is equivalent to (number of elements of bar).
*/
if (j >= sizeof(bar) / sizeof(bar[0]))
{
/* Arbitrary function to complain to the user that bad things happened */
panic("String too long.");
return;
}
strcpy(bar, foo);

This is OK, since the strcpy(3) is avoided if the string in foo is too long.  If the if(...) were removed, a buffer overflow vulnerability would be present.  strncpy(3) allows you to specify how long the output buffer is, but it's behavior is a bit odd.  Be sure to understand how it handles boundary conditions before using it.  Where possible, prefer strlcpy over strncpy(3).  The behavior of strlcpy is more useful than strncpy(3), but strlcpy is not part of the standard C library.

You're using strstr to check for options passed to the program.  Although this works, it looks a bit odd.  Based on the code structure, it looks like you wanted the program to be invoked as "./prog -l 1 -l 2[...]".  If so, you can use !strcmp(argv[j], "-l") instead of strstr(option, "-l") != NULL.

Using the register modifier on a variable is allowed, but usually is not needed.  Modern compilers are quite good at identifying the best choices for variables to place in a register.  Some of them will ignore you when you specify 'register' choices which they disagree with; others will honor your settings and generate sub-optimal code.

exit(3) is OK to use where you have it, but most large programs have strict rules about where calls to exit(3) are permitted.  People don't like it when a library routine just quits the program out from under them due to some minor error (like GTK+ and GMP are prone to doing with memory allocation...).  For a program of your size and complexity, it's fine.  When you start doing large projects, it's considered polite to return an error code and let the caller decide whether to quit or continue.

Calling exit(3) allows the C library to perform a graceful shutdown.  Buffered data is written to the OS and any user-registered shutdown code is called.  File descriptors are closed by the OS if the program does not.  Similarly, the OS frees all memory for the process when it terminates.

Stating return; at the bottom of a void function is legal, but unnecessary.  Falling off the end of the function is equivalent to hitting return;

I didn't mention it in my prior post, but there's a purpose behind the numbers I put next to some terms.  Man pages are divided into different sections by purpose.  I often follow man page conventions in my posts, and put the manpage section in parentheses after the function name.
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

MyndFyre

Quote from: Kp on May 17, 2006, 06:47 PM
exit(3) is OK to use where you have it, but most large programs have strict rules about where calls to exit(3) are permitted.  People don't like it when a library routine just quits the program out from under them due to some minor error (like GTK+ and GMP are prone to doing with memory allocation...).  For a program of your size and complexity, it's fine.  When you start doing large projects, it's considered polite to return an error code and let the caller decide whether to quit or continue.
To follow Kp's suggestion, as I'm not sure how much C++-based programming you've done, I thought I'd remind you of the by-reference parameter option.  For example, if you had the routine:

HENGINE getEngineHandle(int engineType)
{
  if (engine_list == NULL)
    exit(-1);

  return engine_list[engineType];
}

I realize that's a really simplified example, but you can return an error in an efficient way -- as Kp said -- through a return value.  Note that I call the error return type HRESULT, which is what I'm most familiar with (it's common in Microsoft products, but you'll see it pop up elsewhere too). 

HRESULT getEngineHandle(int engineType, HENGINE* lpEngineHandle)
{
  if (engine_list == NULL)
    return E_FAIL;

  *lpEngineHandle = engine_list[engineType];
  return E_OK;
}


This has several benefits:
1.) You can #define any number of error terms in your headers.
2.) Error values are returned for easy checking in if() conditionals, such as if (getEngineHandle(ENGINE_4CYL, lpHEngine) == E_OK).
3.) This is already a common design pattern.

There are several other things to note.  You can use this with double indirection using char arrays:

HRESULT getName(int maxLen, char** lppTextOut);

Usage of this prototype would be where getName() allocates its own buffer and returns it to the callee.

You can also do this with references instead of pointers, and then you don't need to do any dereferencing:

HRESULT getEngineHandle(int engineType, HENGINE& engineHandle)
{
  if (engine_list == NULL)
    return E_FAIL;

  lpEngineHandle = engine_list[engineType]; // note there is no dereference operator
  return E_OK;
}


It is also valid to mix and match pointers and references.  Just make sure you're doing it in the right order.
HRESULT GetName(int maxLen, char* &textOut);
This is my preferred treatment of pointer types.  However, you need to remember the right order.  You definitely don't want char&* textOut -- in fact, I'm pretty sure that's not valid syntax.  The easiest way to remember it is that a pointer is a data type, and a reference is not.  A reference is simply an alias of another variable.  So, the pointer modifier belongs with the data type -- in this case char -- because the variable is actually a char*.  Adding the reference operator simply means that the identifier is an alias for another identifier somewhere else.

Hope that all made sense!
QuoteEvery generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.

Rule

Quote from: Kp on May 17, 2006, 06:47 PM
You're using strstr to check for options passed to the program.  Although this works, it looks a bit odd.  Based on the code structure, it looks like you wanted the program to be invoked as "./prog -l 1 -l 2[...]".  If so, you can use !strcmp(argv[j], "-l") instead of strstr(option, "-l") != NULL.

I wrote the program so that the user would have to enter three mandatory arguments in a particular order, to emphasize that the "id parameter" files are used for convergence testing.  Aside from that, I was planning on allowing other valid arguments to be entered in any order desired, which is why I used strstr. !strcmp with a loop would work as well, but may be equal or less than equal in efficiency to strstr if there are a large number of flag variables used in a random order? At least, strstr seemed to be a more obvious choice in that scenario.

Thanks for the extra comments, Kp and Myndfyre.  They have certainly accelerated my progress.  In response to Myndfyre, it can safely be assumed that my knowledge in either C or C++ is not extensive in any particular direction.

Joe[x86]

Quote from: Kp on May 17, 2006, 06:47 PM
It's overkill in this case since char has size 1 on every platform I've ever seen, but it's good to be cautious.

I believe char is two bytes in Java, in order to hold unicode characters. byte (which you may have been referring to) is still eight bits.

Quote from: brew on April 25, 2007, 07:33 PM
that made me feel like a total idiot. this entire thing was useless.

MyndFyre

Quote from: J on May 19, 2006, 07:24 AM
Quote from: Kp on May 17, 2006, 06:47 PM
It's overkill in this case since char has size 1 on every platform I've ever seen, but it's good to be cautious.

I believe char is two bytes in Java, in order to hold unicode characters. byte (which you may have been referring to) is still eight bits.
I'm fairly sure he didn't mean other languages, but actual architectures that C/C++ runs on.  char is always (more or less) 8 bits because C/C++ doesn't have any other data types that small.
QuoteEvery generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.

Kp

Quote from: Rule on May 19, 2006, 02:41 AM
I wrote the program so that the user would have to enter three mandatory arguments in a particular order, to emphasize that the "id parameter" files are used for convergence testing.  Aside from that, I was planning on allowing other valid arguments to be entered in any order desired, which is why I used strstr. !strcmp with a loop would work as well, but may be equal or less than equal in efficiency to strstr if there are a large number of flag variables used in a random order? At least, strstr seemed to be a more obvious choice in that scenario.

I don't think that strstr helps you any here.  Users can still put the arguments in any order they want.  Using strcmp just means that you must write "-a -b -c" instead of "-a-b-c".  I've never seen a program which required dashes, but did not require spaces.  It is, however, common to see programs which would allow you to pass "-abc" or even just "abc" if you wanted to turn on the effects of flags a, b, and c.  tar(1) and jar(Sun) are examples of programs which allow you to abbreviate in the latter way.  It's equally legal to do tar -c -z -v -f - `ls -1` as it is to do tar czvf - `ls -1`.  The latter does not play nicely with shell completions, though. :)

Quote from: MyndFyre[vL] on May 19, 2006, 11:21 AM
Quote from: J on May 19, 2006, 07:24 AMI believe char is two bytes in Java, in order to hold unicode characters. byte (which you may have been referring to) is still eight bits.
I'm fairly sure he didn't mean other languages, but actual architectures that C/C++ runs on.  char is always (more or less) 8 bits because C/C++ doesn't have any other data types that small.

Yes, that's exactly what I meant. :)
[19:20:23] (BotNet) <[vL]Kp> Any idiot can make a bot with CSB, and many do!

Rule

Quote from: Kp on May 17, 2006, 06:47 PM
I just noticed one other thing.  You're using sprintf to move strings from argv to temporaries.  This is extremely dangerous, since sprintf interprets the second parameter as a format string.  If the user typed %s in one of the argv elements, sprintf would attempt to read the third argument (which isn't present) as a string pointer.  A crash is very likely in such a case.  You probably want strcpy(3) or strncpy(3) for moving strings.  strcpy(3) doesn't check that there is sufficient room for the destination, so it should be used only when you've already handled that. 

This seems like a nice fix: sprintf(movehere, "%s", argv[desired element]);

It seems to me like sprintf and the strcpy functions are somewhat redundant.  Are there obvious advantages to using one over the other in particular situations?