I'm interested in implementing some kind of automated pattern recognition to defeat certain types of spam attacks.  For instance, it would be advantageous if clients could recognize that a large number of users with randomized alphanumeric names joining the channel within a short period of time is suspicious and most likely a spam attack.

Any suggestions on how to detect commonality or patterns between names in these kinds of situations?


Start off by looking for numbers within the name.
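Here is a minimal sketch of that first check in Python. Python is my choice here, since the thread has no code of its own. The sketch measures what fraction of a name is digits. The 0.5 cut-off and the function names are hypothetical, and the cut-off would need tuning per channel.

```python
def digit_ratio(name: str) -> float:
    """Fraction of characters in the name that are ASCII digits (0.0 for an empty name)."""
    if not name:
        return 0.0
    digits = sum(1 for ch in name if ch in "0123456789")
    return digits / len(name)


def looks_numeric(name: str, threshold: float = 0.5) -> bool:
    """Flag names where at least `threshold` of the characters are digits.

    The 0.5 default is an arbitrary, hypothetical cut-off.
    """
    return digit_ratio(name) >= threshold
```

For example, `digit_ratio("abc123")` is 0.5, so that name would be flagged at the default cut-off.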


I remember coming across some interesting C code a few weeks back. It was a password integrity routine that would *almost always* return false for pure English-ish words (for instance, Eibro is not a real word, but it probably wouldn't pass the test).
I'll be damned if I can find it now, though I remember it speaking of some sort of 'triple rule' for all words in the English language. I googled for about 5 minutes, but I can't seem to come up with it again.

This could be used as-is or modified to suit your purposes: if the names of users joining/leaving within a short period of time pass this test, they're most likely randomized. I'm pretty sure the function returned a degree of integrity rather than a plain true/false, which would be much more useful here.
Eibro of Yeti Lovers.
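The routine described above is not reproduced here, and I don't know its 'triple rule'. As a stand-in, here is a rough heuristic sketch that also returns a degree rather than true/false. It rewards a vowel ratio near an assumed English-like 40% and penalizes runs of four or more consonants. Both numbers are assumptions, and so is treating 'y' as a vowel.

```python
VOWELS = set("aeiouy")  # treating 'y' as a vowel is an assumption


def longest_consonant_run(name: str) -> int:
    """Length of the longest run of consecutive non-vowel letters."""
    run = best = 0
    for ch in name.lower():
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0  # vowels, digits and symbols all break a run
    return best


def wordlike_score(name: str) -> float:
    """Rough 0..1 'English-ish' score; low values suggest a randomized name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    # Assumed target of ~40% vowels; this part falls linearly to 0 at 0% or 80%.
    vowel_part = max(0.0, 1.0 - abs(vowel_ratio - 0.4) / 0.4)
    # Assumed cut-off: four or more consonants in a row is rarely English.
    run_part = 1.0 if longest_consonant_run(name) < 4 else 0.0
    return (vowel_part + run_part) / 2
```

Under these rules, a randomized name such as `xkqzvbt` scores 0.0.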


A basic neural net would work well for this, although it may be a little complex for this situation.  It would have to be "trained" to recognize name entropy, but with a sufficient number of examples, that wouldn't be a problem.  Anyway, it's probably too complex a solution for what you need, but just a suggestion.

Banana fanna fo fanna

CBR (case-based reasoning) would be a great way to attack this I think.

There's a great article in JavaPro about it (search www.javapro.com for cbr). Essentially, it involves choosing X criteria for each result/search and placing them on an X-dimensional coordinate system. Plot a few reference results on the graph (examples: mail from a friend, sex spam, viagra spam, Nigerian spam, security announcement, etc.). Then plot each incoming mail on the graph and see which result it is closest to. From that, you can decide whether or not it is spam.
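Applied to channel joins rather than mail, the idea reduces to a nearest-neighbour lookup. In this minimal sketch, each stored case is a point in a small criteria space, and a new joiner gets the label of the closest case. The three criteria and the case coordinates are made up for illustration. They are not taken from the JavaPro article.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stored cases: (label, point in criteria space).
# Assumed criteria: [digit ratio of name, name length / 15, recent joins / 10]
CASES: List[Tuple[str, List[float]]] = [
    ("regular user", [0.0, 0.5, 0.1]),
    ("numeric floodbot", [1.0, 1.0, 1.0]),
    ("random alphanumeric floodbot", [0.4, 1.0, 0.9]),
]


def classify(point: Sequence[float]) -> str:
    """Return the label of the stored case closest to `point` (Euclidean distance)."""
    label, _ = min(CASES, key=lambda case: math.dist(case[1], point))
    return label
```

Teaching the bot a new kind of attack is then just a matter of appending another case.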


After today's example of a mass-loading bot, I think a good start to automatically banning people based on patterns in their names would be some sort of learning mode that learns each time the channel is massed. For example, right now all the bots in Op [vL] have random numeric names 15 characters in length. Have the bot save those parameters, and from then on ban anyone with a username that matches them. The bot could be programmed to recognize new parameters during these attacks and ban accordingly.
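Here is a minimal sketch of such a learning mode. It assumes the only parameters worth learning are name length and character class (digits only, alphanumeric, or other). The class and function names are hypothetical, and a real bot would likely learn more than this.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NamePattern:
    """Parameters learned from one mass join (hypothetical, deliberately minimal)."""
    length: int
    charset: str  # "digits", "alnum" or "other"


def char_class(name: str) -> str:
    """Classify a name as digits only, ASCII alphanumeric, or anything else."""
    if name.isascii() and name.isdigit():
        return "digits"
    if name.isascii() and name.isalnum():
        return "alnum"
    return "other"


def learn_pattern(attack_names: Iterable[str]) -> Optional[NamePattern]:
    """Return the pattern every name in the attack shares, or None if they differ."""
    patterns = {NamePattern(len(n), char_class(n)) for n in attack_names}
    return patterns.pop() if len(patterns) == 1 else None


def matches(name: str, pattern: NamePattern) -> bool:
    """True if `name` has the learned length and character class."""
    return NamePattern(len(name), char_class(name)) == pattern
```

For the Op [vL] case, learning from the 15-digit names yields `NamePattern(length=15, charset='digits')`, and any later 15-digit name matches it.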

That's definitely a neural net.


Keep in mind that the solution should not be so processor-intensive that a floodbot attack effectively DoSes the client out of usability.
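One cheap way to meet that constraint is a sliding-window counter. Each join does constant amortized work, no matter how heavy the flood. This is only a sketch: `max_joins` and `window` are hypothetical defaults, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque


class JoinRateMonitor:
    """Flags a likely flood when more than `max_joins` joins land within `window` seconds."""

    def __init__(self, max_joins: int = 5, window: float = 3.0) -> None:
        self.max_joins = max_joins  # hypothetical default
        self.window = window        # seconds; hypothetical default
        self.times: Deque[float] = deque()

    def on_join(self, timestamp: float) -> bool:
        """Record one join; return True if the recent join rate exceeds the limit."""
        self.times.append(timestamp)
        # Drop joins that have left the window; each timestamp is popped at most once.
        while self.times and timestamp - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.max_joins
```

A bot could call `on_join` with `time.monotonic()` from its join handler. The more expensive name checks would then only need to run while it returns True.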


Yes and no, I am thinking we could make it a little simpler.
With the neural net, you just have to train it to an adequate point; after that, any code for adjusting weights can go.  You only need the algorithm and the post-training weights, stored in a file, to do the actual check.
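Here is a minimal sketch of that split, assuming the simplest possible "net": one sigmoid neuron whose weights were trained elsewhere and saved as JSON. The file format, features, and function names are all hypothetical. A real net would have hidden layers, but the principle is the same: the runtime check is only a forward pass.

```python
import json
import math
from typing import List, Sequence, Tuple


def parse_weights(text: str) -> Tuple[List[float], float]:
    """Parse post-training weights from JSON shaped like {"weights": [...], "bias": 0.0}."""
    data = json.loads(text)
    return [float(w) for w in data["weights"]], float(data["bias"])


def load_weights(path: str) -> Tuple[List[float], float]:
    """Read the weights file produced (elsewhere) by the training step."""
    with open(path, encoding="utf-8") as fp:
        return parse_weights(fp.read())


def spam_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Forward pass of a single sigmoid neuron; no weight-adjusting code is needed."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    z = bias + sum(f * w for f, w in zip(features, weights))
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```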


When a flood/spam attack happens, recognize the pattern used by the individual; it will more than likely match a pattern used by that particular flood bot. Then create a profile, and whenever suspicious activity fits a profile, take whatever actions your channel needs. If a flood/spam attack doesn't fit any existing profile, you could have your bot in a learning mode trying to figure it out, without letting itself crash or flood. You could also consolidate profiles between the members of your botnet to add to the learning process. While this may not prevent every occurrence, it will stop a large percentage of people who use other people's tools, and keep those who persist in flooding battle.net channels on their toes. Profiles could contain all the characteristics of the bots used in an attack, from their usernames, to their client type, to what they spam.

This is my idea for a solution to this problem, so feel free to build upon it or share your own. I think this is a problem many people could learn from.
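Here is a rough sketch of such a profile. It assumes three characteristics: a username pattern, a client identifier, and substrings of what the bot spams. `FloodProfile`, `find_profile`, and the client strings are hypothetical. Sharing profiles across a botnet would mean serializing these objects, which is not shown.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class FloodProfile:
    """Characteristics of one flood bot; unset fields are simply not checked."""
    label: str
    username_regex: Optional[re.Pattern] = None  # pattern the bot's names follow
    client: Optional[str] = None                 # client identifier the bot reports
    spam_snippets: Tuple[str, ...] = ()          # substrings of what the bot spams

    def matches(self, username: str, client: str, message: str = "") -> bool:
        """True only if every characteristic this profile defines is present."""
        if self.username_regex is None and self.client is None and not self.spam_snippets:
            return False  # an empty profile must never match everyone
        if self.username_regex is not None and not self.username_regex.fullmatch(username):
            return False
        if self.client is not None and client != self.client:
            return False
        if self.spam_snippets and not any(s in message for s in self.spam_snippets):
            return False
        return True


def find_profile(profiles: Iterable[FloodProfile], username: str,
                 client: str, message: str = "") -> Optional[FloodProfile]:
    """Return the first profile that matches, or None."""
    return next((p for p in profiles if p.matches(username, client, message)), None)
```

For instance, `FloodProfile("numeric15", username_regex=re.compile(r"\d{15}"), client="CLIENT_A")` would describe bots like the ones in Op [vL], assuming they all report the same (hypothetical) client.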
   To determine a viable means for determining commonality among strings.

   Trim all numbers from a name. Then generate a duality table: a running sequence of overlapping byte pairs from the name. For a 15-byte name this gives 14 WORDs, one per running 2-byte pair. The two bytes in each pair are summed. Each sum is then compared against the entry at the same position in another name's table, and the absolute difference is stored. The total of those differences is the commonality seed: the higher the number, the greater the uniqueness. A threshold can be set, and names whose seed falls below it are treated as suspiciously similar. You can opt to only do this check against unknown/unflagged users. This should eliminate nearly every bot attack. Trimming the numbers keeps bots from inflating the commonality seed through differences in the numbers in their names. To get a better result, you could also lowercase or uppercase the string prior to this check.
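Here is a minimal Python sketch of the procedure as described: trim digits, lowercase, pad to 15 bytes, sum the running pairs, and total the absolute differences. Three details are assumptions: lowercasing by default, the ASCII-only handling, and the threshold of 200 in the hypothetical `is_suspicious` helper. The post leaves the threshold value open.

```python
from typing import Iterable, List


def normalize(name: str, width: int = 15) -> bytes:
    """Trim digits, lowercase, and zero-pad (or truncate) to `width` bytes."""
    cleaned = "".join(ch for ch in name if not ch.isdigit()).lower()
    raw = cleaned.encode("ascii", errors="ignore")[:width]  # non-ASCII dropped (assumption)
    return raw.ljust(width, b"\x00")


def duality_table(name: str, width: int = 15) -> List[int]:
    """Sum each running two-byte pair: `width` bytes give `width - 1` WORD-sized entries."""
    buf = normalize(name, width)
    return [buf[i] + buf[i + 1] for i in range(width - 1)]


def commonality_seed(a: str, b: str, width: int = 15) -> int:
    """Total absolute difference between same-position entries; higher = more unique."""
    return sum(abs(x - y) for x, y in zip(duality_table(a, width), duality_table(b, width)))


def is_suspicious(name: str, recent_unknown: Iterable[str], threshold: int = 200) -> bool:
    """Flag `name` if its seed against any recent unknown joiner is below `threshold`."""
    return any(commonality_seed(name, other) < threshold for other in recent_unknown)
```

`commonality_seed("indulgence", "ecnegludni")` returns 90 (0x5A), which matches the second comparison table.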

Here are some usernames, their 15-byte hex equivalents (zero-padded), and the resulting duality tables:

Name:   indulgence
HexStr:   69 6E 64 75 6C 67 65 6E 63 65 00 00 00 00 00

Duality Table   results:
69 6E         -> 00D7
6E 64         -> 00D2
64 75         -> 00D9
75 6C         -> 00E1
6C 67         -> 00D3
67 65         -> 00CC
65 6E         -> 00D3
6E 63         -> 00D1
63 65         -> 00C8
65 00         -> 0065
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000

Name:   ecnegludni
HexStr:   65 63 6E 65 67 6C 75 64 6E 69 00 00 00 00 00

Duality Table   results:
65 63         -> 00C8
63 6E         -> 00D1
6E 65         -> 00D3
65 67         -> 00CC
67 6C         -> 00D3
6C 75         -> 00E1
75 64         -> 00D9
64 6E         -> 00D2
6E 69         -> 00D7
69 00         -> 0069
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000

Name:   thuscelackpiss
HexStr:   74 68 75 73 63 65 6C 61 63 6B 70 69 73 73 00

Duality Table   results:
74 68         -> 00DC
68 75         -> 00DD
75 73         -> 00E8
73 63         -> 00D6
63 65         -> 00C8
65 6C         -> 00D1
6C 61         -> 00CD
61 63         -> 00C4
63 6B         -> 00CE
6B 70         -> 00DB
70 69         -> 00D9
69 73         -> 00DC
73 73         -> 00E6
73 00         -> 0073

Name:   hismajesty.
HexStr:   6D 61 6A 65 73 74 79 2E 00 00 00 00 00 00 00

Duality Table   results:
6D 61         -> 00CE
61 6A         -> 00CB
6A 65         -> 00CF
65 73         -> 00D8
73 74         -> 00E7
74 79         -> 00ED
79 2E         -> 00A7
2E 00         -> 002E
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000

Results of comparison for: hismajesty.  (vs)  thuscelackpiss

thuscelackpiss  hismajesty.      Results
--------------  -----------      -------
00DC            00CE         ->  000E
00DD            00CB         ->  0012
00E8            00CF         ->  0019
00D6            00D8         ->  0002
00C8            00E7         ->  001F
00D1            00ED         ->  001C
00CD            00A7         ->  0026
00C4            002E         ->  0096
00CE            0000         ->  00CE
00DB            0000         ->  00DB
00D9            0000         ->  00D9
00DC            0000         ->  00DC
00E6            0000         ->  00E6
0073            0000         ->  0073
          Commonality Seed   ->  05E9

Results of comparison for: indulgence  (vs)  ecnegludni

indulgence      ecnegludni       Results
----------      ----------       -------
00D7            00C8         ->  000F
00D2            00D1         ->  0001
00D9            00D3         ->  0006
00E1            00CC         ->  0015
00D3            00D3         ->  0000
00CC            00E1         ->  0015
00D3            00D9         ->  0006
00D1            00D2         ->  0001
00C8            00D7         ->  000F
0065            0069         ->  0004
0000            0000         ->  0000
0000            0000         ->  0000
0000            0000         ->  0000
0000            0000         ->  0000
          Commonality Seed   ->  005A




Also, you could use a letter-commonality table, but that would take more time.
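The post doesn't spell out what a letter-commonality table would contain. One plausible reading, sketched here as an assumption, is to count each letter a-z per name and total the count differences. Unlike the duality table, this ignores letter order, so a name and its reversal score 0.

```python
from collections import Counter


def letter_table(name: str) -> Counter:
    """Count each letter a-z in the name, ignoring case, digits and symbols."""
    return Counter(ch for ch in name.lower() if "a" <= ch <= "z")


def letter_difference(a: str, b: str) -> int:
    """Sum over all letters of |count in a - count in b|; 0 means the same letters."""
    ta, tb = letter_table(a), letter_table(b)
    return sum(abs(ta[ch] - tb[ch]) for ch in set(ta) | set(tb))
```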


Very, very nice =) indulgence, very nice.