• Welcome to Valhalla Legends Archive.
 

problem with removing html

Started by Tazo, December 18, 2004, 07:27 PM


Tazo

I downloaded a file from PSC (Planet Source Code) that removes HTML from text, and I'm using it to strip the HTML from my AIM messages. [ http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=40698&lngWId=1 ] I made 2 textboxes, one called txt7 and one called txt8. I then did

'sets the received message into the textbox
Form1.txt7.Text = "" & CStr(A2) & " - " & CStr(A6)
'removes the html from the textbox
Form1.txt8.Text = RemoveHTML(Form1.label7.Text)

Only problem is that the outcome is not right. For instance, if it puts in like
Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>

for whatever reason the outcome is just Joe -
any help?
When I try the program the way it's supposed to be used, by opening the text file and then converting the HTML to text, it works fine. But when I just copy the code into my project, it messes up.
btw vb6

MyndFyre

So-- to be clear, you got code somewhere else and you want us to support it?

It seems like PSC lets you post comments, support requests, and the like on their page.  Why don't you try there -- or even to e-mail the author?

Warrior

I still dont see why you cant just use Replace

Banana fanna fo fanna

Quote from: Warrior on December 19, 2004, 09:05 AM
I still dont see why you cant just use Replace

Because that obviously wouldn't accomplish what he's trying to do.

s/<.*?>/ /

Mr. Neo

Quote from: Banana fanna fo fanna on December 19, 2004, 10:43 AM
s/<.*?>/ /

This will not work, as regular expressions are greedy and will replace everything from <HTML> to </HTML>.  This expression was the first one I thought of using, and it turned out to work just a tad too well.  So I went and wrote a little RemoveHTML function.


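Function RemoveHTML(source As String) As String
  Dim a As RegEx
  Dim d As Integer
 
  a = New RegEx
  a.Options.DotMatchAll = True   ' let "." match line breaks as well
  a.ReplacementPattern = ""      ' replace each match with nothing
  a.Options.Greedy = False       ' stop at the first ">" rather than the last
 
  ' Keep stripping one tag at a time while a "<" is still present
  d = 1
  While d <> 0
    d = InStr(0, source, "<")
   
    If d <> 0 Then
      a.SearchPattern = "<.+>"
      source = a.Replace(source,0)
    End If
  Wend

  Return source
End Function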


Please note, this was made in REALbasic, so you will have to change some things around to the VB equivalents.  The code is tested and working on ordinary tagged messages.  I have not tested it with stray <'s scattered throughout the message, and I doubt it would work in that case.  You could improve upon that if you wish.

Edit:  It will also hang if there are not the same number of > as <.
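
For comparison, here is a minimal Python sketch of a single-pass strip that has no loop to hang when the < and > counts differ. It is not the REALbasic function above and not the PSC code; the name strip_tags and the decision to leave a stray < in place are assumptions made for illustration.

```python
import re

# "<", then any run of characters that are neither "<" nor ">", then ">".
# Because the body cannot contain "<", a stray "<" never starts a match
# that swallows a later tag.
TAG = re.compile(r"<[^<>]*>")

def strip_tags(text: str) -> str:
    """Remove every <...> span in one pass; unmatched brackets are left alone."""
    return TAG.sub("", text)

message = 'Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>'
print(strip_tags(message))                  # Joe - hello
print(strip_tags("1 < 2 and <b>bold</b>"))  # 1 < 2 and bold
```

A pattern like this will still misfire on a tag whose attribute value contains a literal ">", so it is only suitable for simple markup such as the AIM message in the first post.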

Banana fanna fo fanna

Well, in Python doing .*? will not be greedy due to the question mark. I don't know if it's the same syntax in other implementations, though.
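
The difference can be checked with a short Python sketch using the standard re module and the sample message from the first post. The variable names are illustrative only. The greedy result happens to look like the "Joe -" symptom reported at the top of the thread, though that alone does not show what the PSC code is doing.

```python
import re

message = 'Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>'

# Greedy: ".*" runs to the LAST ">" in the string, so everything from
# <HTML> through </HTML> is consumed as a single match.
greedy = re.sub(r"<.*>", "", message)

# Lazy: ".*?" stops at the FIRST ">" after each "<", so every tag is
# matched and removed on its own.
lazy = re.sub(r"<.*?>", "", message)

print(repr(greedy))  # 'Joe - '
print(repr(lazy))    # 'Joe - hello'
```

Note that "." does not match a newline in Python unless re.DOTALL is passed, so a tag split across lines would need that flag.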