• Welcome to Valhalla Legends Archive.
 

instr for byte array

Started by o.OV, February 12, 2004, 01:17 AM

Previous topic - Next topic

o.OV



Option Explicit

Private Declare Sub MemCopy Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal numbytes As Long)

Private Sub Form_Load()

   Unload Me
   
   Dim fakeArr(1 To 6) As Byte, testArr(1 To 1200) As Byte
   Dim fakeStr As String, testStr As String
   Dim fakeLen As Long, testLen As Long
   Dim Position As Long
   
   'populate the strings
   
   fakeStr = "<TEST>"
   testStr = Space$(1200)
   For Position = 0 To 198
       Mid$(testStr, Position * 6 + 1, 6) = fakeStr
   Next Position
   fakeStr = "<FAKE>"
   Mid$(testStr, Position * 6 + 1, 6) = fakeStr
   
   'copy each string into its own byte array
   
   testLen = Len(testStr)
   fakeLen = Len(fakeStr)
   MemCopy testArr(1), ByVal testStr, testLen
   MemCopy fakeArr(1), ByVal fakeStr, fakeLen
   
   'here is the main code _
       the objective is to get the Position of fakeArr() in testArr()
   
   MemCopy ByVal testStr, testArr(1), testLen
   MemCopy ByVal fakeStr, fakeArr(1), fakeLen
   Position = InStr(testStr, fakeStr)
   MsgBox "Position = " & Position
   
   'this is already fast .. but is there a faster way? _
       I hate the idea of converting it _
       to a vbString just to get a Position ..
   
End Sub



I ran a few tests and found that when working with one hundred bytes or less, a for next loop would perform better then Instr.

But I'm still looking for alternatives for cases that have more then one hundred bytes.
If the facts don't fit the theory, change the facts. - Albert Einstein

Stealth

#1
Here is the code, along with a few odd subs I built to test its implementation. Hope that helps.


Option Explicit

Private Sub Form_Load()
   Dim ary() As Byte
   Dim test() As Byte
   
   Call BuildByteArray(test(), "!")
   Call BuildByteArray(ary(), "This is most definitely a test, like never before!")
   
   Debug.Print "ary():  " & MergeArray(ary())
   Debug.Print "test(): " & MergeArray(test())
   Debug.Print "return: " & InStrByteArray(ary(), test())
End Sub

'// Returns the start index of an array B within a source array A.
'//  - If the search array is not present, the function will return -1.
'// Has not been tested on byte arrays <= 1 member in length
Function InStrByteArray(ByRef bytAry() As Byte, ByRef toFind() As Byte) As Long
   
   On Error GoTo InstrByteArrayError

   Dim i       As Long
   Dim Ret     As Long
   Dim FlagA   As Long '// positional flag in array
   Dim FlagF   As Long '// positional flag in toFind
   Dim Start   As Long '// start value
   Dim Current As Long
   
   Ret = -1 '// catch nonexistent strings
   Start = toFind(LBound(toFind))
   
   For i = LBound(bytAry) To UBound(bytAry)
       If bytAry(i) = Start Then
           '// found possible find-array start point
           
           If UBound(toFind) >= LBound(toFind) + 1 Then    '// catching 1-length B strings
               FlagF = LBound(toFind) + 1 '// skip the first char
               Current = toFind(FlagF)
           Else
               Ret = i
               GoTo InstrByteArrayExit
           End If
           
           FlagA = i + 1
           
           Do While ((bytAry(FlagA) = Current) And (FlagF < UBound(toFind)))
               If FlagA <= UBound(bytAry) Then
                   FlagA = FlagA + 1
                   FlagF = FlagF + 1
                   Current = toFind(FlagF)
               Else
                   'end of the byte array
                   GoTo InstrByteArrayExit
               End If
           Loop
           
           If (FlagF = UBound(toFind)) Then '// we found the whole array!
               Ret = i
               GoTo InstrByteArrayExit
           End If
       End If
   Next i
   
InstrByteArrayExit:
   InStrByteArray = Ret
   Exit Function
   
InstrByteArrayError:
   Debug.Print Err.Description
   GoTo InstrByteArrayExit
End Function

Sub BuildByteArray(ByRef arr() As Byte, ByVal inpt As String)
   Dim i As Integer
   
   ReDim arr(Len(inpt) - 1)
   
   For i = 1 To Len(inpt)
       arr(i - 1) = Asc(Mid$(inpt, i, 1))
   Next i
End Sub

Function MergeArray(ByRef arr() As Byte) As String

   Dim buf As String
   Dim i As Long
   
   buf = String(UBound(arr) + 1, vbNullChar)
   
   For i = 0 To UBound(arr)
       Mid$(buf, i + 1, 1) = Chr(arr(i))
   Next i
   
   MergeArray = Trim(buf)
   
End Function
- Stealth
Author of StealthBot

o.OV

#2
Wow your code is um.. Long

Thanks for trying anyways  :-\

The code below is my version that I used to test the 100 byte theory.. in the example its more bytes but the results still shows.

I'm gonna go examine your coding now and see if I can learn anything.



Option Explicit

Private Declare Sub MemCopy Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal numbytes As Long)
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long
Dim byteArr(1 To 1200) As Byte, tempStr As String, fakeStr As String, T As Long, X As Long, XX As Long, XXX As Long, Length As Long, TempLen As Long
Dim fakeArr(1 To 6) As Byte, XXXX As Long, Position  As Long, tLen As Long
Private Sub Form_Load()
   
   Unload Me
       
   fakeStr = "<TEST>"
   tempStr = Space$(1200)
   
   For X = 0 To 198
       Mid$(tempStr, X * 6 + 1, 6) = fakeStr
   Next X
   fakeStr = "<FAKE>"
   Length = Len(tempStr)
   TempLen = Len(fakeStr)
   
   'toggle these two numbers for testing purposes
       'X = 20             '20 * 6 = 120 bytes
       X = 60             '60 * 6 = 360 bytes
   
   Mid$(tempStr, X * TempLen + 1, TempLen) = fakeStr
   
   'copy the each string into its own byte array
   
   MemCopy byteArr(1), ByVal tempStr, Length
   MemCopy fakeArr(1), ByVal fakeStr, TempLen
   
   'the objective is to get the position of fakeArr() in byteArr()
   T = GetTickCount
   For X = 1 To 1000
       MemCopy ByVal tempStr, byteArr(1), Length
       MemCopy ByVal fakeStr, fakeArr(1), TempLen
       Position = InStr(tempStr, fakeStr)
   Next X
   T = GetTickCount - T
   MsgBox "instr " & T
   Debug.Print "instr"; T; Position
   
   'is there a faster way? _
       I hate the idea of converting it _
       to a vbString just to get a Position
       
   T = GetTickCount
   For X = 1 To 1000
       For XX = 0 To Length / TempLen - 1
           Position = XX * TempLen + 1
           If byteArr(Position) = fakeArr(1) And byteArr(XX * TempLen + TempLen) = fakeArr(TempLen) Then
               XXXX = 1
               For XXX = Position + 1 To Position - 2 + TempLen
                   XXXX = XXXX + 1
                   If byteArr(XXX) <> fakeArr(XXXX) Then GoTo sKip
               Next XXX
               If XXXX = TempLen - 1 Then Exit For
           End If
sKip:
       Next XX
   Next X
   T = GetTickCount - T
   MsgBox "byte " & T
   Debug.Print "byte"; T; Position
   
End Sub



edit: oops dimmed fakeArr twice and removed dumDum() lol..

another edit: changed an "exit for" to "goto skip" to skip a check and changed numerics to variables to make it seem like a more realistic scenario and removed one of two "Position = XX * TempLen + 1" since the second wasn't needed. (*sigh*)
If the facts don't fit the theory, change the facts. - Albert Einstein

Adron

Try a loop calling memchr and memcmp or equivalent functions? I'm sure they must be exported by some DLL...

o.OV

#4
Quote from: Adron on February 13, 2004, 09:50 PM
Try a loop calling memchr and memcmp or equivalent functions? I'm sure they must be exported by some DLL...

Yeah.. I tried looking for it.
I use Api-Guide as reference.
Closest thing I could find was um.. lStrCmp
Perhaps I missed something?
If the facts don't fit the theory, change the facts. - Albert Einstein

Stealth

Quote from: o.OV on February 13, 2004, 10:04 PM
Quote from: Adron on February 13, 2004, 09:50 PM
Try a loop calling memchr and memcmp or equivalent functions? I'm sure they must be exported by some DLL...

Yeah.. I tried looking for it.
I use Api-Guide as reference.
Closest thing I could find was um.. lStrCmp
Perhaps I missed something?

I would think a Do loop as I used above would be far easier in this situation -- VB is most likely fairly fast at numeric comparisons, since that's all it's doing. Unless you manage to farm the job of actually finding the InStr position off to a DLL somehow, VB's byte comparisons should be fast enough.
- Stealth
Author of StealthBot

o.OV

#6
Quote from: Stealth on February 13, 2004, 10:28 PM
Quote from: o.OV on February 13, 2004, 10:04 PM
Quote from: Adron on February 13, 2004, 09:50 PM
Try a loop calling memchr and memcmp or equivalent functions? I'm sure they must be exported by some DLL...

Yeah.. I tried looking for it.
I use Api-Guide as reference.
Closest thing I could find was um.. lStrCmp
Perhaps I missed something?

I would think a Do loop as I used above would be far easier in this situation -- VB is most likely fairly fast at numeric comparisons, since that's all it's doing. Unless you manage to farm the job of actually finding the InStr position off to a DLL somehow, VB's byte comparisons should be fast enough.

A Do Loop is faster then a For Next loop in the event that there is no counter involved and the conditions for Exiting the Do Loop is internal.

In the case above. A counter is already required. Incrementing made by For Next is faster then incrementing done from inside a Loop of any type.
Also, placing the conditions check for Do Loop at the beginning is slower then placing it at the end. You should place the first conditions in an If Then statement before entering the Do Loop and place the conditions  check for the Do Loop at end of Do Loop (Loop while/until).
If the facts don't fit the theory, change the facts. - Albert Einstein

Adron

Quote from: Stealth on February 13, 2004, 10:28 PM
I would think a Do loop as I used above would be far easier in this situation -- VB is most likely fairly fast at numeric comparisons, since that's all it's doing. Unless you manage to farm the job of actually finding the InStr position off to a DLL somehow, VB's byte comparisons should be fast enough.

Yes, memchr is InStr for byte arrays. So that would be used for quickly finding the InStr position without having to turn the data into a string.

Adron

Quote from: o.OV on February 13, 2004, 10:04 PM

Yeah.. I tried looking for it.
I use Api-Guide as reference.
Closest thing I could find was um.. lStrCmp
Perhaps I missed something?

They can be found in msvcrt:

C:\WINNT\system32>dumpbin  /exports msvcrt.dll |find "mem"
       396  18B 0002696F _memccpy
       397  18C 000269C2 _memicmp
       683  2AA 000272D5 memchr
       684  2AB 00027376 memcmp
       685  2AC 00010980 memcpy
       686  2AD 0000FFB4 memmove
       687  2AE 00001A1D memset



o.OV

Quote from: Adron on February 14, 2004, 07:11 AM
Quote from: o.OV on February 13, 2004, 10:04 PM

Yeah.. I tried looking for it.
I use Api-Guide as reference.
Closest thing I could find was um.. lStrCmp
Perhaps I missed something?

They can be found in msvcrt:

C:\WINNT\system32>dumpbin  /exports msvcrt.dll |find "mem"
       396  18B 0002696F _memccpy
       397  18C 000269C2 _memicmp
       683  2AA 000272D5 memchr
       684  2AB 00027376 memcmp
       685  2AC 00010980 memcpy
       686  2AD 0000FFB4 memmove
       687  2AE 00001A1D memset


Thanks Adron I'll go google it and see how its declared,
+1 Adron. Thanks for trying to help me Stealth, +1.
If the facts don't fit the theory, change the facts. - Albert Einstein

Adron

Wrote this code to test it too:

 Dim a(1000000) As Byte
 Dim b(5) As Byte
 Dim i As Long, base As Long, ptr As Long
 For i = 0 To 1000000
   a(i) = Fix(Rnd * 256)
 Next i
 a(654321) = 12: b(0) = 12
 a(654322) = 23: b(1) = 23
 a(654323) = 34: b(2) = 34
 a(654324) = 45: b(3) = 45
 a(654325) = 56: b(4) = 56
 a(654326) = 67: b(5) = 67
 i = 0
 base = memchr(a(0), a(0), 1)
 Open "c:\tmp.txt" For Output As #1
 Do
   ptr = memchr(a(i), b(0), UBound(a) + 1 - i)
   If ptr = 0 Then Print #1, "No more matches": Exit Do
   i = ptr - base + 1
   If memcmp(a(i), b(1), UBound(b)) = 0 Then Print #1, "Found at " & i - 1
 Loop
 Close #1


and that generated this output:

Quote
Found at 654321
No more matches

o.OV

#11
I googled it and it is not what I hoped for.

I can't find any example declarations for memchr.

I did find a declaration for memcpy from msvcrt an equivalent of memcopy called from kernel32.

Making a DLL in C wouldn't be an effective solution would it .. ?

I'm just gonna poke around and crash myself .. maybe I might get lucky and guess the declaration.

EDIT:
WHOA. You got it? How was it declared?

And that is one major loop .. I hope it is faster then the one I posted.
If the facts don't fit the theory, change the facts. - Albert Einstein

Adron

I had some trouble with the declaration myself. Haven't yet found a perfect one. It seems that VB6 can understand a declaration telling it to use the __cdecl calling convention, but when compiled it gets hardcoded into raising an error (STUPID!!)


Private Declare Function memchr CDecl Lib "msvcrt" (buffer As Byte, ByVal data As Byte, length As Long) As Long
Private Declare Function memcmp CDecl Lib "msvcrt" (buffer1 As Byte, buffer2 As Byte, length As Long) As Long


So, that approach didn't work. What did work, running as a compiled app, was to write the declaration into a type library and use that. However, for some reason, VB crashes when trying to run when the type library I made is loaded. The declarations look like this:


[
  uuid(C7A86789-6ADB-45b1-BC6F-E6E8E72ECA1D),
  helpstring("Type library for calling VC runtime functions"),
  version(1.0)
]
library VCRUNTIME
{
   [
      helpstring("MSVCRT.DLL"),
      dllname("MSVCRT.DLL")
   ]
   module MSVCRT {
      [
         helpstring("Search for a byte in a byte array"),
         entry("memchr")
      ]
      int _cdecl memchr([in] unsigned char *buffer, [in] unsigned char data, [in] long length);
      [
         helpstring("Compare byte arrays"),
         entry("memcmp")
      ]
      int _cdecl memcmp([in] unsigned char *buffer1, [in] unsigned char *buffer2, [in] long length);
   }
}


You can download the type library here.

o.OV

I went to Project>References and added the tlb to the current project.



Option Explicit

'declare api

Private Declare Function memchr CDecl Lib "msvcrt" (buffer As Byte, ByVal data As Byte, length As Long) As Long
Private Declare Function memcmp CDecl Lib "msvcrt" (buffer1 As Byte, buffer2 As Byte, length As Long) As Long

Private Declare Sub MemCopy Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal numbytes As Long)
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long

Private Sub Form_Load()
   
   'we don't need a form
   
   Unload Me
   
   AdronInByte
   
   Exit Sub
   
   'dim variables
   
   Dim testArr() As Byte
   Dim fakeArr() As Byte
   Dim testStr As String
   Dim fakeStr As String
   Dim testLen As Long
   Dim fakeLen As Long
   Dim T As Long
   Dim X As Long
   Dim Position As Long
   Dim testLBound As Long
   Dim fakeLBound As Long
   Dim Difference As Long
       
   'populate strings
   
   fakeStr = "<TEST>"
   testStr = Space$(1200)
   For X = 0 To 198
       Mid$(testStr, X * 6 + 1, 6) = fakeStr
   Next X
   fakeStr = "<FAKE>"
   
   'get string lengths
   
   testLen = Len(testStr)
   fakeLen = Len(fakeStr)
   
   'set position for fakeStr _
       toggle these 4 numbers for testing purposes
           X = 199            '199 * 6 = 1194
           'X = 20             '20 * 6 = 120 bytes
           'X = 60             '60 * 6 = 360 bytes
           'X = 1              '1 * 6 = 6 bytes
           'X = 0              '0 * 6 = 0 bytes
   Mid$(testStr, X * fakeLen + 1, fakeLen) = fakeStr
   
   'resize byte arrays
   
   Difference = -5
   
   ReDim testArr(Difference To testLen - 1 + Difference)
   ReDim fakeArr(Difference To fakeLen - 1 + Difference)
   
   testLBound = LBound(testArr)
   fakeLBound = LBound(fakeArr)
   
   'copy each string into its own byte array
   
   MemCopy testArr(Difference), ByVal testStr, testLen
   MemCopy fakeArr(Difference), ByVal fakeStr, fakeLen
   
   'the objective is to get the Position of fakeArr() in byteArr()
       
       'InStr
   
       T = GetTickCount
       For X = 1 To 1000
           MemCopy ByVal testStr, testArr(Difference), testLen
           MemCopy ByVal fakeStr, fakeArr(Difference), fakeLen
           Position = InStr(testStr, fakeStr) + Difference - 1
       Next X
       T = GetTickCount - T
       MsgBox "InStr " & T & " " & Position
       Debug.Print "InStr"; T; Position
   
       'is there a faster way? _
           I hate the idea of converting it _
           to a vbString just to get a Position
       
       'InByte
       
       T = GetTickCount
       For X = 1 To 1000
           Position = InByte(testArr, testLBound, UBound(testArr) - testLBound + 1, fakeArr, fakeLBound, UBound(fakeArr) - fakeLBound + 1)
       Next X
       T = GetTickCount - T
       MsgBox "InByte " & T & " " & Position
       Debug.Print "InByte"; T; Position
   
End Sub

Public Function InByte(MainArr() As Byte, mStart As Long, mLen As Long, FindArr() As Byte, fStart As Long, fLen As Long) As Long
   
   Dim X As Long
   Dim XX As Long
   Dim XXX As Long
   Dim bytePosition As Long
   Dim someThing As Long
           
   someThing = mLen / fLen - 1
           
   For X = 0 To mLen / fLen - 1
       bytePosition = X * fLen + mStart
       If MainArr(bytePosition) = FindArr(fStart) And MainArr(bytePosition + fLen - 1) = FindArr(fStart + fLen - 1) Then
           XXX = fStart
           For XX = bytePosition + 1 To bytePosition - 2 + fLen
               XXX = XXX + 1
               If MainArr(XX) <> FindArr(XXX) Then GoTo InByte_sKip
           Next XX
           If XXX = mStart + fLen - 2 Then InByte = bytePosition: Exit Function
       End If
InByte_sKip:
   Next X

End Function

Sub AdronInByte()

Dim a(1000000) As Byte
 Dim b(5) As Byte
 Dim i As Long, base As Long, ptr As Long
 For i = 0 To 1000000
   a(i) = Fix(Rnd * 256)
 Next i
 a(654321) = 12: b(0) = 12
 a(654322) = 23: b(1) = 23
 a(654323) = 34: b(2) = 34
 a(654324) = 45: b(3) = 45
 a(654325) = 56: b(4) = 56
 a(654326) = 67: b(5) = 67
 i = 0
 base = memchr(a(0), a(0), 1)
 Open "c:\tmp.txt" For Output As #1
 Do
   ptr = memchr(a(i), b(0), UBound(a) + 1 - i)
   If ptr = 0 Then Print #1, "No more matches": Exit Do
   i = ptr - base + 1
   If memcmp(a(i), b(1), UBound(b)) = 0 Then Print #1, "Found at " & i - 1
 Loop
 Close #1

End Sub



Run-time error '49':
Bad DLL calling convention

mmm..
If the facts don't fit the theory, change the facts. - Albert Einstein

Adron

#14
Quote from: Adron on February 14, 2004, 11:06 AM
I had some trouble with the declaration myself. Haven't yet found a perfect one. It seems that VB6 can understand a declaration telling it to use the __cdecl calling convention, but when compiled it gets hardcoded into raising an error (STUPID!!)


Private Declare Function memchr CDecl Lib "msvcrt" (buffer As Byte, ByVal data As Byte, length As Long) As Long
Private Declare Function memcmp CDecl Lib "msvcrt" (buffer1 As Byte, buffer2 As Byte, length As Long) As Long


Just to demonstrate, this code:


 b(5) = 67
 i = 0
 base = memchr(a(0), a(0), 1)


generates this disassembly:


_text:00401FB8                 mov     cx, 67          
_text:00401FBC                 call    ___vbaUI1I2
_text:00401FC1                 mov     ecx, [ebp+b]
_text:00401FC4                 add     ecx, [ebp+tmp_index] ; (tmp_index=5)
_text:00401FCA                 mov     [ecx], al       ; b(5) = 67
_text:00401FCC                 and     [ebp+i], 0      ; i = 0
_text:00401FD0                 push    49
_text:00401FD2                 call    @__vbaError     ; Err.Raise 49
                                      (Bad DLL calling convention)
_text:00401FD7                 movsx   eax, ax
_text:00401FDA                 mov     [ebp+base], eax ; base = Err.Raise(49)