• Welcome to Valhalla Legends Archive.
 

Regex for Capturing user details

Started by MyndFyre, July 19, 2009, 04:12 PM

MyndFyre

This is written for .NET Regex syntax; I'm not sure what incompatibilities exist between it and other languages, so please bear with me.  It seems to work fine based on my testing with RegexBuddy, but I welcome other test cases.

\A(?<charName>[^*@\s]*?)?\*?(?<accountName>[^*@\s]+)@?(?<gateway>\w+)?\z

The idea:
A Diablo II character name can't include *, @, or whitespace.  Take as little of that as possible before a star (when I connect with D2, a star is always prefixed to the name).  The account name follows the same rules (no *, @, or whitespace) and must be at least one character long.  Take that up to the optional @ and optional gateway, which must be at least one word character long, through to the end of the string.
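For anyone who wants to try the idea outside .NET, here is a minimal sketch that ports the pattern to Python's `re` module. The port is an assumption, not the original code: Python spells named groups `(?P<name>...)` and uses `\Z` where .NET uses `\z`. The `parse_user` helper and the sample names (including the character name `Sorc`) are hypothetical.

```python
import re

# Python port of the first pattern (an assumed translation, not the .NET original):
# named groups become (?P<name>...), and \Z stands in for .NET's \z.
USER_RE = re.compile(
    r"\A(?P<charName>[^*@\s]*?)?\*?"
    r"(?P<accountName>[^*@\s]+)"
    r"@?(?P<gateway>\w+)?\Z"
)

def parse_user(text):
    """Return the named groups as a dict, or None if the text does not match."""
    m = USER_RE.match(text)
    return m.groupdict() if m else None

# Hypothetical sample inputs.
for sample in ("MyndFyre", "Sorc*MyndFyre", "Sorc*MyndFyre@USEast", "bad name"):
    print(sample, "->", parse_user(sample))
```

With this port, `Sorc*MyndFyre@USEast` splits into character `Sorc`, account `MyndFyre`, and gateway `USEast`, while `bad name` is rejected because of the space.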

It seems to work fine.  I could see it breaking if * or @ was allowed in usernames, but it doesn't seem to be.

Any thoughts?

[Edit]I added the \A anchor to require the beginning of the string as well.
Quote: Every generation of humans believed it had all the answers it needed, except for a few mysteries they assumed would be solved at any moment. And they all believed their ancestors were simplistic and deluded. What are the odds that you are the first generation of humans who will understand reality?

After 3 years, it's on the horizon.  The new JinxBot, and BN#, the managed Battle.net Client library.

Quote from: chyea on January 16, 2009, 05:05 PM
You've just located global warming.

MyndFyre

I've modified this slightly so that it wouldn't match the @ without a gateway being captured:

\A(?<charName>[^*@\s]*?)?\*?(?<accountName>[^*@\s]+)(?:@(?<gateway>\w+))?\z

(Previously, "MyndFyre@" would match, with MyndFyre going into the <accountName> group; it no longer matches.)
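The difference between the two versions can be checked with a short sketch. It again uses an assumed Python translation of both patterns (`(?P<name>...)` groups and `\Z` instead of `\z`); the names `OLD_RE` and `NEW_RE` are hypothetical.

```python
import re

# Assumed Python ports of the original and the modified pattern.
OLD_RE = re.compile(
    r"\A(?P<charName>[^*@\s]*?)?\*?(?P<accountName>[^*@\s]+)@?(?P<gateway>\w+)?\Z"
)
NEW_RE = re.compile(
    r"\A(?P<charName>[^*@\s]*?)?\*?(?P<accountName>[^*@\s]+)(?:@(?P<gateway>\w+))?\Z"
)

# A trailing @ with no gateway, and a normal account@gateway name.
for sample in ("MyndFyre@", "MyndFyre@USEast"):
    old = OLD_RE.match(sample)
    new = NEW_RE.match(sample)
    print(sample, "| old matches:", bool(old), "| new matches:", bool(new))
```

Because the @ now sits inside the same optional group as the gateway, it can only be consumed when at least one gateway character follows it.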

Camel

There are some illys out there that have @ in the account name. Do you handle that case? Or, at least, not crash?

[edit] BNetUserTest.java

MyndFyre

Quote from: Camel on July 22, 2009, 02:02 PM
There are some illys out there that have @ in the account name. Do you handle that case? Or, at least, not crash?

[edit] BNetUserTest.java

Thank you!  I was not aware that @ could be in the account name. 

That could make for some very difficult identification.

Sixen

Yeah, I was going to say exactly that... Great example, W@R@USEast.
Blizzard Tech Support/Op W@R - FallenArms
The Chat Gem Lives!
http://www.diablofans.com
http://www.sixen.org

xpeh

Quote from: MyndFyre on July 19, 2009, 04:12 PM
This is written for .NET Regex syntax;
.NET has its own regex??

FFFFFFFFFFFFFFFFFFFFFFFFUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

MyndFyre

Quote from: xpeh on July 23, 2009, 03:10 PM
Quote from: MyndFyre on July 19, 2009, 04:12 PM
This is written for .NET Regex syntax;
.NET has its own regex??

FFFFFFFFFFFFFFFFFFFFFFFFUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
It turns out that there are a lot of variations among a number of different regex flavors and I was simply saying that I don't know if there are any distinctions that might need to be made between the one I posted and something else that isn't, say, as feature-rich.  For instance, according to that chart, Java, Python, and Ruby don't support (?n) for explicit capture, and .NET doesn't support possessive quantifiers.

So, time to get off your high horse.

MyndFyre

OK, thanks to Camel's comments, I think this will assist in accuracy:


\A(?:(?<charName>[^*#@\s]+)\*)?(?<accountName>[^#*\s]+?)(?:#(?<instance>\d{1,9}))?(?:@(?<gateway>\w+))?(?:#(?<instance>\d{1,9}))?\z


Alternatively, I can use:
\A(?:(?<charName>[^*#@\s]+)\*)?(?<accountName>[^#*\s]+?)(?:#(?<instance>\d{1,9}))?(?:@(?<gateway>USEast|USWest|Asia|Europe|Azeroth|Lordaeron|Kalimdor|Northrend))?(?:#(?<instance>\d{1,9}))?\z

I've refactored it to use non-capturing groups so that the character name match always consumes the *, the gateway match always consumes the last @, and the instance match always consumes the hash.  The only part that I think may be non-standard is the instance, which this pattern accepts either before or after the namespace (testuser@Azeroth#2 would indicate that the user's account name is "testuser@Azeroth", but I doubt that such a user exists).

The second one correctly matches "W@R" as accountName, but the first mistakenly thinks that R is the namespace.  Both correctly match "($@$@$@)". 

I think the best solution would be to use the second for real servers where the namespaces are well-defined and to drop namespace support altogether for non-legit servers.
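The behaviour described above can be reproduced with the following sketch. It is an assumed Python translation, with one deliberate change: Python's `re` rejects two groups with the same name, so the two `instance` groups are renamed to the hypothetical `instanceBefore` and `instanceAfter`. The `build` helper is also hypothetical.

```python
import re

GATEWAYS = "USEast|USWest|Asia|Europe|Azeroth|Lordaeron|Kalimdor|Northrend"

def build(gateway_pattern):
    """Compile the pattern with the given sub-pattern for the gateway group."""
    return re.compile(
        r"\A(?:(?P<charName>[^*#@\s]+)\*)?"
        r"(?P<accountName>[^#*\s]+?)"
        r"(?:#(?P<instanceBefore>\d{1,9}))?"   # renamed: Python forbids duplicate names
        r"(?:@(?P<gateway>" + gateway_pattern + r"))?"
        r"(?:#(?P<instanceAfter>\d{1,9}))?\Z"
    )

ANY_GATEWAY = build(r"\w+")       # first variant: any word characters
KNOWN_GATEWAY = build(GATEWAYS)   # second variant: fixed gateway list

for sample in ("W@R", "($@$@$@)"):
    for label, regex in (("any", ANY_GATEWAY), ("known", KNOWN_GATEWAY)):
        m = regex.match(sample)
        print(sample, label, m.group("accountName"), m.group("gateway"))
```

The lazy `accountName` group stops at the first @ that is followed by something the gateway group accepts, which is why the `\w+` variant splits "W@R" and the fixed list does not.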

Sixen

Quote from: MyndFyre on July 29, 2009, 10:25 AM
The only possible non-standard support shown here is the instance, which can appear before or after the namespace, which I think is non-standard (testuser@Azeroth#2 would indicate that the user's account name is "testuser@Azeroth", but I doubt that such a user exists).

Maybe I'm reading this incorrectly, but it's possible to see that user. If testuser@Azeroth and testuser@USEast are in the same channel, they will see each other's namespaces. Therefore, if testuser@Azeroth#2 enters, the SC user will see "testuser@Azeroth#2", just as, if testuser@USEast#2 enters, the War3 user will see "testuser@USEast#2".

MyndFyre

@Sixen: The question to which you're responding might be different than the one I'm trying to address.

Suppose we have the account named "testuser" that exists on both USEast and Azeroth in those namespaces.  In that instance, "testuser" would appear as "testuser@USEast" to the person using Warcraft III, and as "testuser" to himself.  Conversely, the Warcraft user would appear as "testuser@Azeroth" to the person using SC/D2, but "testuser" to himself.

HOWEVER, if a user created an illy named "testuser@Azeroth" using Starcraft, then that user would appear as "testuser@Azeroth@USEast" to someone using Warcraft 3.  In order to see the hash after the @, though, that account must be logged on multiple times so that the hash can be generated by the server ("testuser@Azeroth#2" or "testuser@Azeroth#2@USEast").

The reason that the second regex above is more accurate is that it's VERY unlikely that anyone has an illy with @Azeroth, @USEast, etc. in circulation (I would wager that, if such an account did exist, Blizzard would have killed it by now).  But why would anyone have guessed to create such an account?

brew

Wouldn't it be a lot more professional, cleaner, and efficient to write a function for this instead of using a regex?
<3 Zorm
Quote[01:08:05 AM] <@Zorm> haha, me get pussy? don't kid yourself quik
Scio te esse, sed quid sumne? :P

MyndFyre

Quote from: brew on July 29, 2009, 06:52 PM
Wouldn't it be a lot more professional, cleaner, and efficient to write a function for this instead of using a regex?
I don't know about "more professional" (which seems fairly subjective), or about cleaner or more efficient, and I'm not saying that this is going to work right for me.  It's a text parsing problem.  Regex is a text parsing solution.  *shrug*
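For comparison, a hand-written function along the lines brew suggests might look like the sketch below. It is only a rough approximation of the fixed-gateway pattern, not an exact equivalent: it accepts a single instance suffix, and it treats a bare leading * as "no character name". All function and variable names are hypothetical.

```python
KNOWN_GATEWAYS = {"USEast", "USWest", "Asia", "Europe",
                  "Azeroth", "Lordaeron", "Kalimdor", "Northrend"}

def split_instance(text):
    """Split a trailing '#<1-9 ASCII digits>' suffix off text, if present."""
    head, sep, tail = text.rpartition("#")
    if sep and tail.isascii() and tail.isdigit() and len(tail) <= 9:
        return head, tail
    return text, None

def parse_user(text):
    """Split 'char*account#n@gateway#n' by hand; return a dict or None."""
    char_name = None
    if "*" in text:
        char_name, text = text.split("*", 1)
        if any(c in "#@" or c.isspace() for c in char_name):
            return None
        char_name = char_name or None  # a bare leading * means no character name
    text, instance = split_instance(text)      # instance after the gateway
    gateway = None
    head, sep, tail = text.rpartition("@")
    if sep and tail in KNOWN_GATEWAYS:
        text, gateway = head, tail
        if instance is None:
            text, instance = split_instance(text)  # instance before the gateway
    if not text or any(c in "#*" or c.isspace() for c in text):
        return None
    return {"charName": char_name, "accountName": text,
            "gateway": gateway, "instance": instance}

print(parse_user("Sorc*MyndFyre@USEast#2"))
print(parse_user("W@R"))
```

Whether this is cleaner than the regex is a matter of taste; it is longer, but each rule sits on its own line and can be changed in isolation.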

Sixen

Quote from: MyndFyre on July 29, 2009, 02:24 PM
@Sixen: The question to which you're responding might be different than the one I'm trying to address.

Ooooh, I understand. Misunderstanding, we were talking about two different things.

xpeh

brew, you mean to implement your own regex?

It's easier only for noobs who can't handle regex. Regexes are worth it, in spite of being write-only code. The main reason not to use one is speed: if you have a regex in your main loop, replacing it can give a real big boost, depending on the regex.

Camel

I don't really have time to read the thread in detail right now, but it looks like you're missing at least one gateway: @Blizzard (check out #Blizzard Tech Support on USWest), and are assuming that 'instance' can only be one digit, which is definitely false. A good test case might be W@R@Blizzard#101, which should break on both fronts with your second regex.
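The suggested test case can be run against an assumed Python translation of the fixed-gateway pattern from earlier in the thread. As before, the duplicate `instance` groups are renamed to the hypothetical `instanceBefore` and `instanceAfter` because Python's `re` does not allow repeated group names, and `build` is a hypothetical helper. Note that `\d{1,9}` already accepts a multi-digit instance, so in this sketch it is the missing gateway that changes the result.

```python
import re

def build(gateways):
    """Compile the fixed-gateway pattern for the given list of gateway names."""
    alternation = "|".join(re.escape(g) for g in gateways)
    return re.compile(
        r"\A(?:(?P<charName>[^*#@\s]+)\*)?"
        r"(?P<accountName>[^#*\s]+?)"
        r"(?:#(?P<instanceBefore>\d{1,9}))?"
        r"(?:@(?P<gateway>" + alternation + r"))?"
        r"(?:#(?P<instanceAfter>\d{1,9}))?\Z"
    )

ORIGINAL = ["USEast", "USWest", "Asia", "Europe",
            "Azeroth", "Lordaeron", "Kalimdor", "Northrend"]

without_blizzard = build(ORIGINAL)
with_blizzard = build(ORIGINAL + ["Blizzard"])

name = "W@R@Blizzard#101"
print(without_blizzard.match(name).groupdict())
print(with_blizzard.match(name).groupdict())
```

Without `Blizzard` in the list the name still matches, but the whole of "W@R@Blizzard" lands in `accountName`; with it added, the account is "W@R", the gateway is "Blizzard", and the instance is "101".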

Not sure if this helps, but in my bot I force the user to pick one of the *.battle.net named servers, and then use that information to infer the logged in user's gateway, and subsequently validate other users' names.