• Welcome to Valhalla Legends Archive.
 

splitting a txt file and setting it to an array

Started by Tontow, April 08, 2004, 12:04 AM


Tontow

I know how to open it, but I'm having difficulty when I try to set the .txt file equal to an array.

(I want to open a text file, split it at every vbCrLf (line break), and have it returned as an array:
text file = array, with every line being a different entry in the array.)

Newby

If I understand you correctly:

Input a line of the text file, and add it to the array

Loop until you reach the end of the file.

Perhaps show some code?
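A minimal sketch of the loop described above, written in C (the thread's other language). It assumes lines shorter than a hypothetical MAX_LINE of 1024 bytes, and read_lines_one_by_one / free_lines are made-up names. Like `ReDim Preserve` by one slot, it grows the array once per line, so realloc may copy the whole pointer array on every line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024 /* hypothetical limit: fgets splits longer lines in two */

/* Hypothetical helper: reads every line of fp into an array of separately
   allocated strings. The array grows by one slot per line, like
   ReDim Preserve myArray(UBound(myArray) + 1). On allocation failure it
   stops early and returns the lines read so far. */
char **read_lines_one_by_one(FILE *fp, size_t *count)
{
    char buf[MAX_LINE];
    char **lines = NULL;
    size_t n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        /* cut at the first '\r' or '\n', dropping a vbCrLf or "\n" ending */
        buf[strcspn(buf, "\r\n")] = '\0';

        char **grown = realloc(lines, (n + 1) * sizeof *lines);
        if (grown == NULL)
            break;
        lines = grown;

        lines[n] = malloc(strlen(buf) + 1);
        if (lines[n] == NULL)
            break;
        strcpy(lines[n], buf);
        n++;
    }
    *count = n;
    return lines;
}

/* Hypothetical helper: frees the strings and the array returned above. */
void free_lines(char **lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}
```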

o.OV

Or you can load the whole file into a temporary string and then use Split.
I don't know which would be best, and I'm not aware of any direct method.
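The load-everything-then-split idea, sketched in C. read_whole_file and split_in_place are hypothetical names; the stream should be opened in binary mode ("rb", the counterpart of Binary Access Read) so that ftell reports a byte count, and the text is assumed to contain no NUL bytes. Unlike VB's Split, which creates a separate string per element, this sketch leaves every piece inside the one buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the whole file into one NUL-terminated buffer,
   like buffer = Space$(LOF(1)): Get #1, , buffer. Returns NULL on failure. */
char *read_whole_file(FILE *fp, size_t *len)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;
    *len = fread(buf, 1, (size_t)size, fp);
    buf[*len] = '\0';
    return buf;
}

/* Hypothetical helper: splits text at every occurrence of sep (non-empty),
   like Split(buffer, vbCrLf). The first byte of each separator becomes '\0'
   and the returned pointers point into text, so only text and the pointer
   array need freeing. A trailing separator yields a final empty string. */
char **split_in_place(char *text, const char *sep, size_t *count)
{
    size_t seplen = strlen(sep), n = 1;
    for (char *p = strstr(text, sep); p != NULL; p = strstr(p + seplen, sep))
        n++;

    char **parts = malloc(n * sizeof *parts);
    if (parts == NULL)
        return NULL;

    char *start = text;
    for (size_t i = 0; i < n; i++) {
        parts[i] = start;
        if (i + 1 < n) {
            char *end = strstr(start, sep);   /* exists: it was counted above */
            *end = '\0';
            start = end + seplen;
        }
    }
    *count = n;
    return parts;
}
```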

Eli_1

Quote from: Tontow on April 08, 2004, 12:04 AM
I know how to open it, but I'm having difficulty when I try to set the .txt file equal to an array.

(I want to open a text file, split it at every vbCrLf (line break), and have it returned as an array:
text file = array, with every line being a different entry in the array.)

There are 2 ways (that I would use) to do this:

1.) Input the file line by line into an array

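Dim myArray() As String
Dim f As Integer
f = FreeFile
ReDim myArray(0)
Open App.Path & "\file.bla" For Input As #f
Do Until EOF(f)
   ' Line Input reads a whole line; Input # would split it at commas
   Line Input #f, myArray(UBound(myArray))
   ReDim Preserve myArray(UBound(myArray) + 1)
Loop
Close #f

' Drop the empty slot added after the last line (an empty file keeps one empty element)
If UBound(myArray) > 0 Then ReDim Preserve myArray(UBound(myArray) - 1)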


Or
2.) Use binary access read to input the whole file, then parse it at the vbCrLf's with Split

Dim myArray() As String, buffer As String
Dim f As Integer
f = FreeFile
Open App.Path & "\file.bla" For Binary Access Read As #f
buffer = Space$(LOF(f))
Get #f, , buffer
Close #f
' If the file ends with vbCrLf, the last element of the array is empty
myArray = Split(buffer, vbCrLf)


Both are untested, so you may have to tweak them a bit to get them to work. Hope it helps.


Grok

Maybe TheMinistered or Adron knows how a VB array is constructed in memory, and whether it is possible to get a little trickier: perhaps load the whole file into a string, then alter the string to be an array, without having to ReDim. I think that ReDim Preserve is going to cause at least one copy operation per line.
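A rough count, assuming every ReDim Preserve allocates a new block and copies all existing elements (the runtime may sometimes be able to grow in place): in the line-by-line method above, the array starts with one slot and the k-th ReDim Preserve copies k elements, so reading n lines costs n reallocations and up to

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$$

element copies, plus possibly one more copy for the final shrink. For a 2589-line file that is 2589 × 2590 / 2 = 3,352,755 element copies, which grows quadratically with the line count.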

Adron

Quote from: Grok on April 08, 2004, 07:02 AM
Maybe TheMinistered or Adron knows how a VB array is constructed in memory, and whether it is possible to get a little trickier: perhaps load the whole file into a string, then alter the string to be an array, without having to ReDim. I think that ReDim Preserve is going to cause at least one copy operation per line.

A VB array consists of a number of same-size objects laid out sequentially in memory, just like a C array. An array of String is a bit like a C array of "char*". The pointers will be stored at consecutive locations, but the actual text data may be stored anywhere in memory. This means that you can't turn a long string into an array of strings.

In C, you could do something like:


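#include <string.h>
void split_example(void)
{
    char buffer[] = "String1\nString2\nString3";
    char *strings[3];
    /* strtok overwrites each '\n' with '\0', so all three pointers point into buffer */
    strings[0] = strtok(buffer, "\n");
    strings[1] = strtok(NULL, "\n");
    strings[2] = strtok(NULL, "\n");
}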


which would give you 3 strings using the same big buffer. You yourself handle the allocation of memory for the strings, and you know that they all share the same buffer. In VB, the compiler handles allocation of memory for strings, and you can't tell it what memory to use.

If you did some magic to make VB use the same memory buffer for all strings, you'd get errors later when VB tried to free the memory used by each string separately.
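For contrast, a C sketch of the ownership model described for VB: every element gets its own allocation, so each can be freed separately, at the cost of copying every line out of the buffer. copy_lines and free_copied_lines are hypothetical names, and this only illustrates the ownership difference; it is not how the VB runtime is implemented.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies each '\n'-separated line of text into its own
   malloc'd string, so every element can be freed on its own. Pointers into
   one shared buffer, as strtok returns, must instead be released with a
   single free() of that buffer. */
char **copy_lines(const char *text, size_t *count)
{
    size_t n = 1;
    for (const char *p = text; *p != '\0'; p++)
        if (*p == '\n')
            n++;

    char **lines = malloc(n * sizeof *lines);
    if (lines == NULL)
        return NULL;

    const char *start = text;
    for (size_t i = 0; i < n; i++) {
        size_t len = strcspn(start, "\n");   /* length up to '\n' or end of text */
        lines[i] = malloc(len + 1);
        if (lines[i] == NULL) {              /* undo partial work on failure */
            while (i > 0)
                free(lines[--i]);
            free(lines);
            return NULL;
        }
        memcpy(lines[i], start, len);
        lines[i][len] = '\0';
        start += len + 1;                    /* skip past the '\n' */
    }
    *count = n;
    return lines;
}

/* Hypothetical helper: each element has its own allocation, so each is freed. */
void free_copied_lines(char **lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}
```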

If VB isn't stupid, it won't reallocate the memory for each string when you redim the array of strings. It will just move the pointers, which will be a rather fast operation. It should be nearly equivalent in speed to the solution in C above, because there too you need to "redim" the array of string pointers if you don't know the number of lines beforehand.

In C, you could also turn it into an actual array of strings without doing any more assignments at all, but only if the strings are fixed length. That would look something like this:


char buffer[] = "String1\0String2\0String3";
char (*strings)[8];
strings = (char (*)[8])buffer;   /* strings[1] is now "String2" */


Here you are telling the compiler that "buffer" is actually an N by 8 (N = 3 in this case) matrix of characters. Each line in the matrix is one string. When you're reading the data from the file you have to replace the '\n' at the end of each line by the string-terminator '\0'.
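A C sketch of that reading step, assuming '\n' line endings (a vbCrLf file would need 9-byte records) and every line exactly seven characters long, as in the example, including a '\n' after the last line. RECLEN, record and read_fixed_records are hypothetical names; the stream should be opened in binary mode so ftell reports a byte count.

```c
#include <stdio.h>
#include <stdlib.h>

#define RECLEN 8                  /* hypothetical: 7 characters + '\n' per line */

typedef char record[RECLEN];      /* one row of the N x RECLEN matrix */

/* Hypothetical helper: reads fp as RECLEN-byte lines, replaces each line's
   final '\n' with '\0', and returns the buffer cast to an array of records,
   the same cast as (char (*)[8])buffer. Returns NULL if the size is not a
   multiple of RECLEN or a record does not end in '\n'. The caller frees the
   result with a single free(). */
record *read_fixed_records(FILE *fp, size_t *nrec)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size <= 0 || size % RECLEN != 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    char *buf = malloc((size_t)size);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        return NULL;
    }

    size_t n = (size_t)size / RECLEN;
    for (size_t i = 0; i < n; i++) {
        char *end = &buf[i * RECLEN + RECLEN - 1];
        if (*end != '\n') {       /* a line of the wrong length */
            free(buf);
            return NULL;
        }
        *end = '\0';              /* the line end becomes the terminator */
    }
    *nrec = n;
    return (record *)buf;
}
```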


iago

Perhaps it would be faster to scan the file once, count the line endings, and then read it in? I don't know how the second file read will compare to the ReDims, but I DO know that the second time you read the file it'll be faster due to caching.
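The two-pass idea as a C sketch: the first pass counts lines so the pointer array is allocated exactly once, the second pass fills it. MAX_LINE, count_lines and read_lines_two_pass are hypothetical; lines longer than MAX_LINE would be split by fgets, and the `i < n` guard keeps that from overflowing the array.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024   /* hypothetical fgets limit */

/* Hypothetical helper, pass 1: counts lines, including a last line
   that has no trailing '\n'. */
static size_t count_lines(FILE *fp)
{
    size_t n = 0;
    int c, last = '\n';
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            n++;
        last = c;
    }
    if (last != '\n')
        n++;
    return n;
}

/* Hypothetical helper, pass 2: allocates the pointer array once, at its
   exact size, then fills it. The second read may come from the OS cache. */
char **read_lines_two_pass(FILE *fp, size_t *count)
{
    rewind(fp);
    size_t n = count_lines(fp);
    rewind(fp);

    char **lines = malloc((n > 0 ? n : 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    char buf[MAX_LINE];
    size_t i = 0;
    while (i < n && fgets(buf, sizeof buf, fp) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* drop the line ending */
        lines[i] = malloc(strlen(buf) + 1);
        if (lines[i] == NULL)
            break;
        strcpy(lines[i], buf);
        i++;
    }
    *count = i;   /* lines actually stored */
    return lines;
}
```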


Adron

Another possibility would be to have a collection of arrays and add one array at a time, each array larger than the last, then only reallocate it once at the end.
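A C sketch of the collection-of-arrays idea, under assumptions the post doesn't state: each new chunk is twice the size of the previous one, and the chunks hold pointers to strings read elsewhere. FIRST_CHUNK, MAX_CHUNKS, chunk_list, chunk_push and chunk_flatten are hypothetical names. Filled chunks never move, so no pointer is copied until the single flatten at the end.

```c
#include <stdlib.h>
#include <string.h>

#define FIRST_CHUNK 16   /* hypothetical size of the first chunk */
#define MAX_CHUNKS  24   /* 24 doubling chunks hold about 268 million items */

/* Hypothetical "collection of arrays": chunk k holds FIRST_CHUNK << k
   pointers. Filled chunks are never reallocated or moved. */
typedef struct {
    char **chunk[MAX_CHUNKS];
    size_t nchunks;   /* chunks allocated so far */
    size_t used;      /* items in the newest chunk */
    size_t total;     /* items across all chunks */
} chunk_list;

static size_t chunk_cap(size_t k)
{
    return (size_t)FIRST_CHUNK << k;
}

/* Hypothetical helper: appends item, starting a new, twice-as-large chunk
   when the newest one is full. Returns 0 if memory or chunks run out. */
int chunk_push(chunk_list *cl, char *item)
{
    if (cl->nchunks == 0 || cl->used == chunk_cap(cl->nchunks - 1)) {
        if (cl->nchunks == MAX_CHUNKS)
            return 0;
        char **c = malloc(chunk_cap(cl->nchunks) * sizeof *c);
        if (c == NULL)
            return 0;
        cl->chunk[cl->nchunks++] = c;
        cl->used = 0;
    }
    cl->chunk[cl->nchunks - 1][cl->used++] = item;
    cl->total++;
    return 1;
}

/* Hypothetical helper, the one reallocation at the end: copies every
   pointer into an array of exactly cl->total elements, frees the chunks,
   and empties the list. */
char **chunk_flatten(chunk_list *cl)
{
    char **out = malloc((cl->total > 0 ? cl->total : 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    size_t pos = 0;
    for (size_t k = 0; k < cl->nchunks; k++) {
        size_t n = (k + 1 < cl->nchunks) ? chunk_cap(k) : cl->used;
        memcpy(out + pos, cl->chunk[k], n * sizeof *out);
        pos += n;
        free(cl->chunk[k]);
    }
    cl->nchunks = cl->used = cl->total = 0;
    return out;
}
```

Doubling keeps the number of chunks logarithmic in the line count, so the bookkeeping array stays tiny.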

Another possibility would be to check the filesize, guess using some reasonable statistic how many lines there will be, and allocate enough room + some margin for that right away. Then if you hit the limit, you do a new estimate based on the data you've read so far. And at the end, you redim it *down* which should hopefully not involve any copying of data.
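A C sketch of the estimate-then-correct strategy. The first guess uses a hypothetical average line length of 32 bytes and a 25% margin (both assumptions, not from the post); when the array fills up, the estimate is redone from the average length of the lines read so far, and a final realloc trims the array, the C counterpart of redimming it down. MAX_LINE, estimate_lines and read_lines_estimated are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024        /* hypothetical fgets limit */
#define GUESS_LINE_LEN 32.0  /* hypothetical average line length for the first guess */

/* Hypothetical helper: lines expected in bytes_left bytes at avg_len bytes
   per line, plus a 25% margin and one spare slot. */
static size_t estimate_lines(long bytes_left, double avg_len)
{
    if (bytes_left < 0)
        bytes_left = 0;
    if (avg_len < 1.0)
        avg_len = 1.0;
    return (size_t)(bytes_left / avg_len * 1.25) + 1;
}

/* Hypothetical helper: file_size comes from the caller (e.g. fseek/ftell
   on a binary stream). */
char **read_lines_estimated(FILE *fp, long file_size, size_t *count)
{
    size_t cap = estimate_lines(file_size, GUESS_LINE_LEN), n = 0;
    long bytes_read = 0;
    char **lines = malloc(cap * sizeof *lines);
    if (lines == NULL)
        return NULL;

    char buf[MAX_LINE];
    while (fgets(buf, sizeof buf, fp) != NULL) {
        bytes_read += (long)strlen(buf);          /* includes the line ending */
        if (n == cap) {
            /* hit the limit: re-estimate from the average line length so far */
            size_t more = estimate_lines(file_size - bytes_read, (double)bytes_read / n);
            char **grown = realloc(lines, (cap + more) * sizeof *lines);
            if (grown == NULL)
                break;
            lines = grown;
            cap += more;
        }
        buf[strcspn(buf, "\r\n")] = '\0';
        lines[n] = malloc(strlen(buf) + 1);
        if (lines[n] == NULL)
            break;
        strcpy(lines[n++], buf);
    }

    /* shrink to the exact size; many allocators do this in place, but C does not promise it */
    char **exact = realloc(lines, (n > 0 ? n : 1) * sizeof *lines);
    if (exact != NULL)
        lines = exact;
    *count = n;
    return lines;
}
```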

Eli_1

Quote from: Eli_1 on April 08, 2004, 12:34 AM
1.) Input the file line by line into an array
<codeblock>
Or
2.) Use binary access read to input the whole file, then parse it at the vbCrLf's with Split
<codeblock>
I was bored, so I ran those two methods *on my crappy computer* against several different files to see which one was faster. Here are my results (in ms).

Three runs per file:

   File (lines)                     ReDim method (ms)   Split method (ms)
   readme.txt (56)                  16, 17, 21          11, 11, 11
   win.ini (550)                    29, 35, 33          20, 10, 24
   list from BrooDat.mpq (2589)     117, 157, 172       159, 181, 174

So the second (Split) method seems clearly faster than the first until the file gets pretty big; on the 2589-line file, the ReDim method won all three runs. That makes the second method the faster choice for the average config/shitlist/whatever (on my computer).

Adron

It's more important to get good timings for a large list, though. No one cares about 20 or 50 ms, but when it's 5000 or 10000 ms, people will start caring...

Eli_1

Then in that case the first method would be a better choice.  :-\