Valhalla Legends Archive

Programming => General Programming => .NET Platform => Topic started by: TehUser on June 18, 2005, 05:56 PM

Title: Warning: ASCII.GetBytes()
Post by: TehUser on June 18, 2005, 05:56 PM
Something that will probably come up if you write network code in .NET is converting a string to bytes.  I've mainly seen this done with the following code:

byte[] byteBuffer = Encoding.ASCII.GetBytes(stringBuffer);

This, as it turns out, can be a bad thing.  For those of us not intimately familiar with the ASCII standard, ASCII characters only use 7 bits.  What does this mean for your conversion?  It means that any character whose value does not fit in 7 bits (0xFF, for example) will be converted into 0x3F, the question mark.  This can be particularly confusing when you're trying to discover why, when you send a packet with 0xFF at the front, you end up disconnected all of the time.  So, the simplest solution that I have found is to write your own function to convert strings to byte arrays.
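A minimal C# sketch of the behavior described above. The class and method names are made up for illustration, and the ISO-8859-1 variant is only one possible alternative to a hand-written conversion function, not something proposed in the original post. It assumes the string holds only characters in the range U+0000 to U+00FF.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encoding.ASCII is 7-bit: characters above U+007F come out as 0x3F ('?').
    public static byte[] ViaAscii(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }

    // ISO-8859-1 (Latin-1) maps U+0000..U+00FF onto the same byte values,
    // so a "byte stuffed into a char" survives the conversion.
    public static byte[] ViaLatin1(string s)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(s);
    }

    public static void Main()
    {
        string packet = "\u00FFPING"; // hypothetical packet text with 0xFF at the front
        Console.WriteLine(BitConverter.ToString(ViaAscii(packet)));  // 3F-50-49-4E-47
        Console.WriteLine(BitConverter.ToString(ViaLatin1(packet))); // FF-50-49-4E-47
    }
}
```

Characters above U+00FF would still be replaced by '?' in the Latin-1 variant, so this only helps when the string is really carrying byte values.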
Title: Re: Warning: ASCII.GetBytes()
Post by: K on June 18, 2005, 06:14 PM
I had this problem myself.  I worked around it by using Encoding.Default which, at least on my system, appears to be ANSI encoding, although it probably varies by system.
Title: Re: Warning: ASCII.GetBytes()
Post by: dxoigmn on June 19, 2005, 12:53 PM
Probably best to use Encoding.UTF8, at least for Battle.net. Why would you be inserting high-ASCII characters into a string anyway?
Title: Re: Warning: ASCII.GetBytes()
Post by: TehUser on June 19, 2005, 09:29 PM
Because it's not for Battle.net.
Title: Re: Warning: ASCII.GetBytes()
Post by: dxoigmn on June 20, 2005, 12:00 AM
Quote from: TehUser on June 19, 2005, 09:29 PM
Because it's not for Battle.net.

The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net though). I'm just wondering what requires you to put high-ASCII characters into a string. Seems like a byte array would be more useful considering that is what you're doing in the end anyway. The question was generated because you said something about sending 0xFF and the high bit being lost. I guess what I was trying to get at is that using a String as a buffer is a bad idea and it's probably best to use an array of bytes.
Title: Re: Warning: ASCII.GetBytes()
Post by: Joe[x86] on June 23, 2005, 07:47 PM
0xFF is part of the Battle.net protocol header.
Title: Re: Warning: ASCII.GetBytes()
Post by: MyndFyre on June 23, 2005, 09:07 PM
Quote from: 0x4A6F655B7838365D on June 23, 2005, 07:47 PM
0xFF is part of the Battle.net protocol header.
Yes, but that's irrelevant because using a string as a buffer is unintelligent.
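A rough C# sketch of the byte-array approach suggested in the posts above. The packet layout (one marker byte followed by UTF-8 text) is purely hypothetical and is not the Battle.net format; the names PacketSketch and Build are invented for the example. The idea is that binary fields go into the buffer as bytes and only real text passes through an Encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class PacketSketch
{
    // Hypothetical layout: one marker byte, then the UTF-8 bytes of the text.
    public static byte[] Build(byte marker, string text)
    {
        byte[] payload = Encoding.UTF8.GetBytes(text);
        using (MemoryStream ms = new MemoryStream())
        {
            ms.WriteByte(marker);                 // binary field: written as-is, no encoding involved
            ms.Write(payload, 0, payload.Length); // text field: encoded explicitly
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Build(0xFF, "hi"))); // FF-68-69
    }
}
```

Because the 0xFF never lives in a string, no encoding gets a chance to turn it into 0x3F.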
Title: Re: Warning: ASCII.GetBytes()
Post by: l)ragon on July 10, 2005, 05:49 PM
    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = CByte(Asc(Mid(inBuf, intI + 1)))
        Next
        Return bytOut
    End Function

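    Private Function EncodeByteArrayToString(ByVal inBuf As Byte(), ByVal numByts As Integer) As String
        ' Start from an empty string so the first concatenation is well defined.
        Dim strOut As String = String.Empty
        Dim intI As Integer

        For intI = 0 To (numByts - 1)
            ' Chr accepts the byte value directly; no need to go through Val.
            strOut += Chr(inBuf(intI))
        Next
        Return strOut
    End Function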

You could consider doing something like this instead of using that junk MS class; that class will only give you bytes for character values between 0 and 127.
Title: Re: Warning: ASCII.GetBytes()
Post by: MyndFyre on July 10, 2005, 06:58 PM
No, that *method* of that *instance* of that class will only give you values under 128.  Using a proper instance, you could do it the right way.

Quote from: dxoigmn on June 20, 2005, 12:00 AM
Quote from: TehUser on June 19, 2005, 09:29 PM
Because it's not for Battle.net.

The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net though). I'm just wondering what requires you to put high-ASCII characters into a string. Seems like a byte array would be more useful considering that is what you're doing in the end anyway. The question was generated because you said something about sending 0xFF and the high bit being lost. I guess what I was trying to get at is that using a String as a buffer is a bad idea and it's probably best to use an array of bytes.
I ignored this question before because I wasn't sure what you meant.  However, if you're using Unicode characters, you can have characters above 127 that would be lost using Encoding.ASCII.  That's why you should use Encoding.Unicode, or Encoding.UTF8, or the proper encoding based on the character set in use.
Title: Re: Warning: ASCII.GetBytes()
Post by: dxoigmn on July 10, 2005, 08:52 PM
Quote from: MyndFyre on July 10, 2005, 06:58 PM
No, that *method* of that *instance* of that class will only give you values under 128.  Using a proper instance, you could do it the right way.

Another thing to add: letting the framework do the conversion is smarter, since there are so many things you can mess up with all the different types of encodings out there. Kind of off-topic, but the same can be said (http://blogs.msdn.com/larryosterman/archive/2005/07/05/435734.aspx) for crypto.
Title: Re: Warning: ASCII.GetBytes()
Post by: l)ragon on July 14, 2005, 04:07 PM
Imports System

    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0)
        Next
        Return bytOut
    End Function

Another solution, which would probably be better to use than the one I posted above.
There was no need to edit the post above since it works too, but it's not thinking in the .NET way (I guess that's a better way of saying it, heh).
Title: Re: Warning: ASCII.GetBytes()
Post by: MyndFyre on July 14, 2005, 05:37 PM
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255.  Unicode characters over 255 will be lost.

3.) Why would you do this at all?  System.Text.Encoding (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemtextencodingmemberstopic.asp) provides built-in managed support for all text encodings natively supported by Windows.  Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want.

Stop trying to fight it!
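Point 2 above can be checked in a few lines of C#. This is only an illustrative sketch: LowByteDemo and FirstByte are invented names, and the printed values assume a little-endian machine, where BitConverter.GetBytes(char) puts the low byte at index 0.

```csharp
using System;

public static class LowByteDemo
{
    // Mirrors the approach under discussion: keep only element 0
    // of the two-byte array that BitConverter.GetBytes(char) returns.
    public static byte FirstByte(char c)
    {
        return BitConverter.GetBytes(c)[0];
    }

    public static void Main()
    {
        char ko = '\u3053'; // HIRAGANA LETTER KO
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(ko))); // 53-30 on little-endian
        Console.WriteLine(FirstByte(ko).ToString("X2"));  // 53
        Console.WriteLine(FirstByte('S').ToString("X2")); // 53, same byte as above
    }
}
```

Characters up to U+00FF pass through unchanged, which is why a test over 0x00 through 0xFF succeeds, but U+3053 and 'S' collapse to the same byte, so the original text cannot be recovered.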
Title: Re: Warning: ASCII.GetBytes()
Post by: l)ragon on July 14, 2005, 06:53 PM
Quote from: MyndFyre on July 14, 2005, 05:37 PM
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255.  Unicode characters over 255 will be lost.

3.) Why would you do this at all?  System.Text.Encoding (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemtextencodingmemberstopic.asp) provides built-in managed support for all text encodings natively supported by Windows.  Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want.

Stop trying to fight it!
Enlighten me on how what I did is "wrong" again, since I have passed every byte through this as a string, from char 0x00 through char 0xFF, all successfully; same with the other function above the one I just posted.
Title: Re: Warning: ASCII.GetBytes()
Post by: dxoigmn on July 14, 2005, 07:54 PM
Quote from: MyndFyre on July 14, 2005, 05:37 PM
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

The .Length - 1 is due to the inane fact (and one of the few annoyances I have with VB) that when you define an array you tell it what the upper index is, not the number of elements in the array. Arrays in VB are zero-based though!
Title: Re: Warning: ASCII.GetBytes()
Post by: MyndFyre on July 14, 2005, 09:32 PM
Ahh.  K, I was wrong about the length thing.

I)ragon, I whipped up this program to demonstrate the differences in output when you have an extended character set in use:

Imports System.Text

Module Module1
    Const JPText As String = "これはひもである高価値非英国の特性を使用する。"

    Sub Main()
        Console.WriteLine("Output string: {0}", JPText)
        Console.WriteLine("String length: {0}", JPText.Length)

        Dim dragonBytes() As Byte
        dragonBytes = EncodeStringToByteArray(JPText)
        Console.WriteLine("Dragon's method byte array length: {0}", dragonBytes.Length)
        Console.WriteLine("|)ragon's method byte array output:")
        WriteByteArray(dragonBytes)
        Console.WriteLine("I)ragon's method returning to a string: {0}", _
            Encoding.Default.GetString(dragonBytes))
        ' I used Encoding.Default because it supports character values
        ' up to 255.

        Console.WriteLine()

        Dim myndBytes() As Byte
        myndBytes = CorrectEncoding(JPText)
        Console.WriteLine("MyndFyre's method (Unicode) byte array length: {0}", myndBytes.Length)
        Console.WriteLine("MyndFyre's method (Unicode) array output:")
        WriteByteArray(myndBytes)
        Console.WriteLine("MyndFyre's method (Unicode) returning to a string: {0}", _
            Encoding.Unicode.GetString(myndBytes))

        Console.WriteLine()

        myndBytes = CorrectEncoding(JPText, Encoding.UTF8)
        Console.WriteLine("MyndFyre's method (UTF-8) byte array length: {0}", myndBytes.Length)
        Console.WriteLine("MyndFyre's method (UTF-8) array output:")
        WriteByteArray(myndBytes)
        Console.WriteLine("MyndFyre's method (UTF-8) returning to a string: {0}", _
            Encoding.UTF8.GetString(myndBytes))

        Console.Read()
    End Sub

    'I)ragon's function
    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0)
        Next
        Return bytOut
    End Function

    'MyndFyre's functions - note increased flexibility
    Private Function CorrectEncoding(ByVal inBuf As String, ByVal encStyle As Encoding) As Byte()
        Return encStyle.GetBytes(inBuf)
    End Function
    Private Function CorrectEncoding(ByVal inBuf As String) As Byte()
        Return CorrectEncoding(inBuf, Encoding.Unicode)
    End Function

    Private Sub WriteByteArray(ByVal buffer() As Byte)
        Dim go As Boolean = True
        Dim start As Integer = 0
        Do Until go = False
            go = WriteByteLine(buffer, start)
            start = start + 16
        Loop
    End Sub

    Private Function WriteByteLine(ByVal buffer() As Byte, _
        ByVal index As Integer) As Boolean

        Dim i As Integer
        Dim res As Boolean = True

        For i = index To index + 15
            If i < buffer.Length Then
                Console.Write("{0:x2} ", buffer(i))
            Else
                Console.Write("   ")
                res = False
            End If

            If i = index + 7 Then
                Console.Write(" ")
            End If
        Next

        Console.Write("  ")

        For i = index To index + 15
            Dim b As Byte
            If i < buffer.Length Then
                b = buffer(i)
            Else
                b = &H20 'space
            End If

            Dim c As Char
            c = ChrW(b)

            If Char.IsLetterOrDigit(c) Or Char.IsPunctuation(c) Or Char.IsSymbol(c) Or c = " " Then
                Console.Write(c.ToString())
            Else
                Console.Write(".")
            End If

            If i = index + 7 Then
                Console.Write(" ")
            End If
        Next
        Console.WriteLine()
        Return res
    End Function
End Module

Note that to save this in VB.NET, you have to go to Save As... and click the arrow on the "Save" button in the dialog, and select "Save with Encoding."  For the purposes of this project, I chose Unicode (UTF-8 with Signature).

This is what is output (note that Console programs do not support the extended character set and hence display ??????????? when the Japanese text is displayed):

Output string: ???????????????????????
String length: 23
Dragon's method byte array length: 23
|)ragon's method byte array output:
53 8c 6f 72 82 67 42 8b  d8 a1 24 5e f1 fd 6e 79   S.or.gB. O¡$^ñyny
27 92 7f 28 59 8b 02                               '..(Y..
I)ragon's method returning to a string: SOor,gB<O¡$^ñyny''⌂(Y<☻

MyndFyre's method (Unicode) byte array length: 46
MyndFyre's method (Unicode) array output:
53 30 8c 30 6f 30 72 30  82 30 67 30 42 30 8b 30   S0.0o0r0 .0g0B0.0
d8 9a a1 4f 24 50 5e 97  f1 82 fd 56 6e 30 79 72   O.¡O$P^. ñ.yVn0yr
27 60 92 30 7f 4f 28 75  59 30 8b 30 02 30         '`.0.O(u Y0.0.0
MyndFyre's method (Unicode) returning to a string: ???????????????????????

MyndFyre's method (UTF-8) byte array length: 69
MyndFyre's method (UTF-8) array output:
e3 81 93 e3 82 8c e3 81  af e3 81 b2 e3 82 82 e3   a..a..a. _a..a..a
81 a7 e3 81 82 e3 82 8b  e9 ab 98 e4 be a1 e5 80   .§a..a.. é«.ä.¡å.
a4 e9 9d 9e e8 8b b1 e5  9b bd e3 81 ae e7 89 b9   ☼é..è.±å ..a.rç..
e6 80 a7 e3 82 92 e4 bd  bf e7 94 a8 e3 81 99 e3   æ.§a..ä. ¿ç."a..a
82 8b e3 80 82                                     ..a..
MyndFyre's method (UTF-8) returning to a string: ???????????????????????


As you can see, you lost your data when you encoded this string with your method (by the way, this is what happened when I decoded your byte array with ASCII, Unicode, UTF-7, and UTF-8, respectively):

I)ragon's method returning to a string: S♀or☻gB♂X!$^q}ny'↕⌂(Y♂☻
I)ragon's method returning to a string: ???????????
I)ragon's method returning to a string: S?or?gB?O¡$^ñyny'?⌂(Y?☻
I)ragon's method returning to a string: SorgB?$^ny'⌂(Y☻


I hope you see why your method would cause problems for internationalization, and agree that it would just be better if we let the professionals who have already done the work for us manage our strings.

(By the way, that Japanese text came from Google where I translated it from: "This is a string that uses higher-value non-English characters.")
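Since the console could not display the Japanese text, a round trip is easier to verify by comparing strings than by printing them. The following C# sketch does that with the same sample text; RoundTripDemo and RoundTrips are names invented for this example, and the source file is assumed to be saved in a Unicode encoding, as noted above for the VB.NET project.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    public const string JPText = "これはひもである高価値非英国の特性を使用する。";

    // True when decoding the encoded bytes gives back exactly the original string.
    public static bool RoundTrips(string s, Encoding enc)
    {
        return enc.GetString(enc.GetBytes(s)) == s;
    }

    public static void Main()
    {
        Console.WriteLine(JPText.Length);                            // 23
        Console.WriteLine(Encoding.Unicode.GetBytes(JPText).Length); // 46
        Console.WriteLine(Encoding.UTF8.GetBytes(JPText).Length);    // 69
        Console.WriteLine(RoundTrips(JPText, Encoding.Unicode));     // True
        Console.WriteLine(RoundTrips(JPText, Encoding.UTF8));        // True
        Console.WriteLine(RoundTrips(JPText, Encoding.ASCII));       // False
    }
}
```

The byte counts match the lengths reported in the output above, and only the encodings that can represent the characters get the original string back.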
Title: Re: Warning: ASCII.GetBytes()
Post by: l)ragon on July 15, 2005, 12:26 PM
Well, I was under the impression this was for a server, so let's look at it this way then.

When you're viewing the data received via the socket, what are you going to see?
Take one of your outputs here:
53 30 8c 30 6f 30 72 30  82 30 67 30 42 30 8b 30   S0.0o0r0 .0g0B0.0
d8 9a a1 4f 24 50 5e 97  f1 82 fd 56 6e 30 79 72   O.¡O$P^. ñ.yVn0yr
27 60 92 30 7f 4f 28 75  59 30 8b 30 02 30         '`.0.O(u Y0.0.0

True or false: you're going to have data received in this format.