• Welcome to Valhalla Legends Archive.
 

Warning: ASCII.GetBytes()

Started by TehUser, June 18, 2005, 05:56 PM


TehUser

Something that will probably come up if you write network code in .NET is converting a string to bytes.  I've mainly seen this done with the following code:

byte[] byteBuffer = Encoding.ASCII.GetBytes(stringBuffer);

This, as it turns out, can be a bad thing.  For those of us not intimately familiar with the ASCII standard, ASCII characters only use 7 bits.  What does this mean for your conversion?  It means that any character whose value does not fit in 7 bits (anything above 0x7F; 0xFF, for example) will be converted into 0x3F, the question mark.  This can be particularly confusing when you're trying to discover why, when you send a packet with 0xFF at the front, you end up disconnected all of the time.  So, the simplest solution that I have found is to write your own function to convert strings to byte arrays.
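
To make the substitution concrete, here is a minimal sketch of the failure described above. The module and function names (AsciiLossDemo, BuildPacketString) are made up for illustration; only Encoding.ASCII.GetBytes comes from the post.

```vbnet
Imports System
Imports System.Text

Module AsciiLossDemo
    ' Hypothetical helper: a two-character string whose first character
    ' has the value 0xFF, the way a header byte might be stuffed into a String.
    Function BuildPacketString() As String
        Return ChrW(&HFF) & "A"
    End Function

    Sub Main()
        Dim packet As String = BuildPacketString()
        Dim bytes() As Byte = Encoding.ASCII.GetBytes(packet)
        ' ASCII only covers 0x00-0x7F, so the 0xFF character comes out as "?" (0x3F).
        Console.WriteLine(BitConverter.ToString(bytes))   ' 3F-41
    End Sub
End Module
```

The 0xFF never reaches the wire; the receiver sees 0x3F instead.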

K

I had this problem myself.  I worked around it by using Encoding.Default which, at least on my system, appears to be ANSI encoding, although it probably varies by system.
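
Since Encoding.Default depends on the system, one alternative (not what was used above, just a sketch) is to name a fixed single-byte encoding explicitly. ISO-8859-1, code page 28591, maps characters 0x00 through 0xFF to the byte of the same value. The names Latin1Demo and ToLatin1Bytes are made up for this example.

```vbnet
Imports System
Imports System.Text

Module Latin1Demo
    ' ISO-8859-1 (code page 28591) maps U+0000-U+00FF to the byte with the
    ' same value, regardless of the machine's default code page.
    Function ToLatin1Bytes(ByVal source As String) As Byte()
        Return Encoding.GetEncoding(28591).GetBytes(source)
    End Function

    Sub Main()
        Dim bytes() As Byte = ToLatin1Bytes(ChrW(&HFF) & "A")
        Console.WriteLine(BitConverter.ToString(bytes))   ' FF-41
    End Sub
End Module
```

Characters above 0xFF still cannot be represented this way, so this only helps when the string really holds byte-sized values.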

dxoigmn

Probably best to use Encoding.UTF8, at least for Battle.net. Why would you be inserting high-ASCII characters into a string anyway?

TehUser

Because it's not for Battle.net.

dxoigmn

Quote from: TehUser on June 19, 2005, 09:29 PM
Because it's not for Battle.net.

The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net though). I'm just wondering what requires you to put high-ASCII characters into a string. Seems like a byte array would be more useful, considering that is what you end up with anyway. The question came up because you said something about sending 0xFF and the high bit being lost. I guess what I was trying to get at is that using a String as a buffer is a bad idea, and it's probably best to use an array of bytes.
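
A rough sketch of what a byte buffer could look like, so that raw bytes never pass through a String. The packet layout here (one marker byte followed by the text) is invented purely for illustration and is not Battle.net's or any other real protocol's format; BuildPacket and ByteBufferDemo are likewise made-up names.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ByteBufferDemo
    ' Hypothetical layout for illustration only: one marker byte followed
    ' by the message text. Not a real protocol's packet format.
    Function BuildPacket(ByVal marker As Byte, ByVal message As String) As Byte()
        Using stream As New MemoryStream()
            ' The marker is written as a raw byte, so no encoding can alter it.
            stream.WriteByte(marker)
            ' Only the actual text goes through an Encoding.
            Dim textBytes() As Byte = Encoding.UTF8.GetBytes(message)
            stream.Write(textBytes, 0, textBytes.Length)
            Return stream.ToArray()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(BitConverter.ToString(BuildPacket(&HFF, "hi")))   ' FF-68-69
    End Sub
End Module
```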

Joe[x86]

0xFF is part of the Battle.net protocol header.
Quote from: brew on April 25, 2007, 07:33 PM
that made me feel like a total idiot. this entire thing was useless.

MyndFyre

Quote from: 0x4A6F655B7838365D on June 23, 2005, 07:47 PM
0xFF is part of the Battle.net protocol header.
Yes, but that's irrelevant because using a string as a buffer is unintelligent.
Quote: Every generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.

l)ragon

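    ' Converts each character to one byte using its character code from Asc.
    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        ' VB array declarations take the upper index, so this holds inBuf.Length bytes.
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            ' Mid is 1-based; take exactly one character at a time.
            bytOut(intI) = CByte(Asc(Mid(inBuf, intI + 1, 1)))
        Next
        Return bytOut
    End Function

    ' Converts the first numByts bytes back to characters using Chr.
    Private Function EncodeByteArrayToString(ByVal inBuf As Byte(), ByVal numByts As Integer) As String
        Dim bytOut As String = ""
        Dim intI As Integer

        For intI = 0 To (numByts - 1)
            bytOut &= Chr(inBuf(intI))
        Next
        Return bytOut
    End Function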

You could consider doing something like this instead of using that junk MS class; that class will only give you bytes for character values in the range 0 to 127.
*^~·.,¸¸,.·´¯`·.,¸¸,.-·~^*ˆ¨¯¯¨ˆ*^~·.,l)ragon,.-·~^*ˆ¨¯¯¨ˆ*^~·.,¸¸,.·´¯`·.,¸¸,.-·~^*

MyndFyre

No, that *method* of that *instance* of that class will only give you values under 128.  Using a proper instance, you could do it the right way.

Quote from: dxoigmn on June 20, 2005, 12:00 AM
Quote from: TehUser on June 19, 2005, 09:29 PM
Because it's not for Battle.net.

The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net though). I'm just wondering what requires you to put high-ASCII characters into a string. Seems like a byte array would be more useful, considering that is what you end up with anyway. The question came up because you said something about sending 0xFF and the high bit being lost. I guess what I was trying to get at is that using a String as a buffer is a bad idea, and it's probably best to use an array of bytes.
I ignored this question before because I wasn't sure what you meant.  However, if you're using Unicode characters, you can have characters over 127 that would be lost using Encoding.ASCII.  That's why you should be using Encoding.Unicode, or Encoding.UTF8, or the proper encoding based on the character set in use.
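
As a small illustration of the difference between instances, the sketch below encodes the single character U+00E9 with three of the built-in encodings. EncodingChoiceDemo and EncodeToHex are made-up names for this example.

```vbnet
Imports System
Imports System.Text

Module EncodingChoiceDemo
    ' Encodes a string with the given Encoding instance and returns the bytes as hex.
    Function EncodeToHex(ByVal enc As Encoding, ByVal source As String) As String
        Return BitConverter.ToString(enc.GetBytes(source))
    End Function

    Sub Main()
        Dim eAcute As String = ChrW(&HE9)   ' U+00E9, outside the 7-bit ASCII range

        Console.WriteLine("ASCII:   " & EncodeToHex(Encoding.ASCII, eAcute))     ' 3F (lost)
        Console.WriteLine("UTF-8:   " & EncodeToHex(Encoding.UTF8, eAcute))      ' C3-A9
        Console.WriteLine("Unicode: " & EncodeToHex(Encoding.Unicode, eAcute))   ' E9-00
    End Sub
End Module
```

Only the ASCII instance drops the character; the other two keep it, just with different byte layouts, which is why both ends of a connection need to agree on the encoding.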

dxoigmn

Quote from: MyndFyre on July 10, 2005, 06:58 PM
No, that *method* of that *instance* of that class will only give you values under 128.  Using a proper instance, you could do it the right way.

Another thing to add: letting the framework do the conversion is smarter, since there are so many things you can mess up with all the different encodings out there. Kind of off-topic, but the same can be said for crypto.

l)ragon

Imports System

    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0)
        Next
        Return bytOut
    End Function

Another solution, which would probably be better to use than the one I posted above.
There was no need to edit the post above since it works too, but it's not thinking in the .NET way (I guess that's a better way of saying it, heh).

MyndFyre

No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255.  Unicode characters over 255 will be lost.

3.) Why would you do this at all?  System.Text.Encoding provides built-in managed support for all text encodings natively supported by Windows.  Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want.

Stop trying to fight it!

l)ragon

Quote from: MyndFyre on July 14, 2005, 05:37 PM
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255.  Unicode characters over 255 will be lost.

3.) Why would you do this at all?  System.Text.Encoding provides built-in managed support for all text encodings natively supported by Windows.  Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want.

Stop trying to fight it!
Enlighten me on how what I did is "wrong" again, since I have passed every byte through this as a string, starting from char 0x00 and ending at char 0xFF, and all of them were successful. Same with the other function above the one I just posted.

dxoigmn

Quote from: MyndFyre on July 14, 2005, 05:37 PM
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

The .Length - 1 is due to the inane fact (and one of a few gripes I have with VB) that when you declare an array you tell it what the upper index is, not the number of elements in the array. Arrays in VB are zero-based, though!
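
A quick sketch of that declaration rule, using made-up names (ArrayBoundsDemo, MakeBuffer):

```vbnet
Imports System

Module ArrayBoundsDemo
    ' In VB the number in an array declaration is the upper index, not the count,
    ' so (Length - 1) yields exactly Length elements.
    Function MakeBuffer(ByVal source As String) As Byte()
        Dim result(source.Length - 1) As Byte   ' indices 0 .. Length - 1
        Return result
    End Function

    Sub Main()
        Dim buf() As Byte = MakeBuffer("abc")
        Console.WriteLine(buf.Length)             ' 3
        Console.WriteLine(buf.GetUpperBound(0))   ' 2
    End Sub
End Module
```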

MyndFyre

Ahh.  K, I was wrong about the length thing.

I)ragon, I whipped up this program to demonstrate the differences in output when you have an extended character set in use:

Imports System.Text

Module Module1
    Const JPText As String = "これはひもである高価値非英国の特性を使用する。"

    Sub Main()
        Console.WriteLine("Output string: {0}", JPText)
        Console.WriteLine("String length: {0}", JPText.Length)

        Dim dragonBytes() As Byte
        dragonBytes = EncodeStringToByteArray(JPText)
        Console.WriteLine("Dragon's method byte array length: {0}", dragonBytes.Length)
        Console.WriteLine("|)ragon's method byte array output:")
        WriteByteArray(dragonBytes)
        Console.WriteLine("I)ragon's method returning to a string: {0}", _
            Encoding.Default.GetString(dragonBytes))
        ' I used Encoding.Default because it supports character values
        ' up to 255.

        Console.WriteLine()

        Dim myndBytes() As Byte
        myndBytes = CorrectEncoding(JPText)
        Console.WriteLine("MyndFyre's method (Unicode) byte array length: {0}", myndBytes.Length)
        Console.WriteLine("MyndFyre's method (Unicode) array output:")
        WriteByteArray(myndBytes)
        Console.WriteLine("MyndFyre's method (Unicode) returning to a string: {0}", _
            Encoding.Unicode.GetString(myndBytes))

        Console.WriteLine()

        myndBytes = CorrectEncoding(JPText, Encoding.UTF8)
        Console.WriteLine("MyndFyre's method (UTF-8) byte array length: {0}", myndBytes.Length)
        Console.WriteLine("MyndFyre's method (UTF-8) array output:")
        WriteByteArray(myndBytes)
        Console.WriteLine("MyndFyre's method (UTF-8) returning to a string: {0}", _
            Encoding.UTF8.GetString(myndBytes))

        Console.Read()
    End Sub

    'I)ragon's function
    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0)
        Next
        Return bytOut
    End Function

    'MyndFyre's functions - note increased flexibility
    Private Function CorrectEncoding(ByVal inBuf As String, ByVal encStyle As Encoding) As Byte()
        Return encStyle.GetBytes(inBuf)
    End Function
    Private Function CorrectEncoding(ByVal inBuf As String) As Byte()
        Return CorrectEncoding(inBuf, Encoding.Unicode)
    End Function

    Private Sub WriteByteArray(ByVal buffer() As Byte)
        Dim go As Boolean = True
        Dim start As Integer = 0
        Do Until go = False
            go = WriteByteLine(buffer, start)
            start = start + 16
        Loop
    End Sub

    Private Function WriteByteLine(ByVal buffer() As Byte, _
        ByVal index As Integer) As Boolean

        Dim i As Integer
        Dim res As Boolean = True

        For i = index To index + 15
            If i < buffer.Length Then
                Console.Write("{0:x2} ", buffer(i))
            Else
                Console.Write("   ")
                res = False
            End If

            If i = index + 7 Then
                Console.Write(" ")
            End If
        Next

        Console.Write("  ")

        For i = index To index + 15
            Dim b As Byte
            If i < buffer.Length Then
                b = buffer(i)
            Else
                b = &H20 'space
            End If

            Dim c As Char
            c = ChrW(b)

            If Char.IsLetterOrDigit(c) Or Char.IsPunctuation(c) Or Char.IsSymbol(c) Or c = " " Then
                Console.Write(c.ToString())
            Else
                Console.Write(".")
            End If

            If i = index + 7 Then
                Console.Write(" ")
            End If
        Next
        Console.WriteLine()
        Return res
    End Function
End Module

Note that to save this in VB.NET, you have to go to Save As... and click the arrow on the "Save" button in the dialog, and select "Save with Encoding."  For the purposes of this project, I chose Unicode (UTF-8 with Signature).

This is what is output (note that Console programs do not support the extended character set and hence display ??????????? when the Japanese text is displayed):

Output string: ???????????????????????
String length: 23
Dragon's method byte array length: 23
|)ragon's method byte array output:
53 8c 6f 72 82 67 42 8b  d8 a1 24 5e f1 fd 6e 79   S.or.gB. O¡$^ñyny
27 92 7f 28 59 8b 02                               '..(Y..
I)ragon's method returning to a string: SOor,gB<O¡$^ñyny''⌂(Y<☻

MyndFyre's method (Unicode) byte array length: 46
MyndFyre's method (Unicode) array output:
53 30 8c 30 6f 30 72 30  82 30 67 30 42 30 8b 30   S0.0o0r0 .0g0B0.0
d8 9a a1 4f 24 50 5e 97  f1 82 fd 56 6e 30 79 72   O.¡O$P^. ñ.yVn0yr
27 60 92 30 7f 4f 28 75  59 30 8b 30 02 30         '`.0.O(u Y0.0.0
MyndFyre's method (Unicode) returning to a string: ???????????????????????

MyndFyre's method (UTF-8) byte array length: 69
MyndFyre's method (UTF-8) array output:
e3 81 93 e3 82 8c e3 81  af e3 81 b2 e3 82 82 e3   a..a..a. _a..a..a
81 a7 e3 81 82 e3 82 8b  e9 ab 98 e4 be a1 e5 80   .§a..a.. é«.ä.¡å.
a4 e9 9d 9e e8 8b b1 e5  9b bd e3 81 ae e7 89 b9   ☼é..è.±å ..a.rç..
e6 80 a7 e3 82 92 e4 bd  bf e7 94 a8 e3 81 99 e3   æ.§a..ä. ¿ç."a..a
82 8b e3 80 82                                     ..a..
MyndFyre's method (UTF-8) returning to a string: ???????????????????????


As you can see, you lost your data when you encoded this string with your method (by the way, this is what happened when I decoded your byte array with ASCII, Unicode, UTF-7, and UTF-8, respectively):

I)ragon's method returning to a string: S♀or☻gB♂X!$^q}ny'↕⌂(Y♂☻
I)ragon's method returning to a string: ???????????
I)ragon's method returning to a string: S?or?gB?O¡$^ñyny'?⌂(Y?☻
I)ragon's method returning to a string: SorgB?$^ny'⌂(Y☻


I hope you see why your method would cause problems for internationalization, and agree that it would just be better if we let the professionals who have already done the work for us manage our strings.

(By the way, that Japanese text came from Google where I translated it from: "This is a string that uses higher-value non-English characters.")