Valhalla Legends Archive

Programming => General Programming => Visual Basic Programming => Topic started by: Tazo on December 18, 2004, 07:27 PM

Title: problem with removing html
Post by: Tazo on December 18, 2004, 07:27 PM
I downloaded a file from PSCode that removes HTML from arbitrary text, and I used it to strip the HTML from my AIM messages. [ http://www.Planet-Source-Code.com/vb/scripts/ShowCode.asp?txtCodeId=40698&lngWId=1 ] I made two textboxes, one called txt7 and one called txt8. I then did:

'sets the received message into the textbox
Form1.txt7.Text = "" & CStr(A2) & " - " & CStr(A6)
'removes the html from the textbox
Form1.txt8.Text = RemoveHTML(Form1.label7.Text)

The only problem is that the output is wrong. For instance, if the input is something like
Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>

for whatever reason the output is just "Joe -".
Any help?
When I use the program the way it's meant to be used, by opening a text file and converting its HTML to text, it works fine. But when I copy the code into my own project, it breaks.
BTW, this is VB6.
Title: Re: problem with removing html
Post by: MyndFyre on December 19, 2004, 03:19 AM
So, to be clear: you got code from somewhere else, and you want us to support it?

PSC lets you post comments, support requests, and the like on each submission's page. Why don't you try there, or e-mail the author?
Title: Re: problem with removing html
Post by: Warrior on December 19, 2004, 09:05 AM
I still don't see why you can't just use Replace.
Title: Re: problem with removing html
Post by: Banana fanna fo fanna on December 19, 2004, 10:43 AM
Quote from: Warrior on December 19, 2004, 09:05 AM
I still don't see why you can't just use Replace.

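Because that obviously wouldn't accomplish what he's trying to do: Replace only swaps out a fixed string, and the tags (and their attributes) change from message to message.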

s/<.*?>/ /
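For readers not using Perl, here is a minimal Python sketch of the same non-greedy substitution. Two assumptions: the replacement is an empty string rather than the single space in the Perl version, and the helper name `strip_tags` is illustrative only. Note that Perl's `s///` without `/g` replaces only the first match, whereas Python's `re.sub` replaces every match by default.

```python
import re

def strip_tags(text: str) -> str:
    # Non-greedy ".*?": each match ends at the first ">" after its "<".
    # re.sub replaces every match by default (like Perl's /g modifier);
    # re.DOTALL lets a tag span line breaks (like Perl's /s modifier).
    # Assumption: tags are replaced with "" rather than " ".
    return re.sub(r"<.*?>", "", text, flags=re.DOTALL)

print(strip_tags('<FONT\n LANG="0">hi</FONT>'))
```

Like any regex approach, this treats every `<...>` run as a tag, so a literal `<` in a chat message followed later by `>` would also be removed.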
Title: Re: problem with removing html
Post by: Mr. Neo on December 19, 2004, 11:19 AM
Quote from: Banana fanna fo fanna on December 19, 2004, 10:43 AM
s/<.*?>/ /

This will not work, as regular expressions are greedy and will replace everything from <HTML> through </HTML>. This expression was the first one I thought of using, and it turned out to work just a tad too well. So I went and wrote a little RemoveHTML function.


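Function RemoveHTML(source As String) As String
  Dim a As RegEx
  Dim d As Integer

  a = New RegEx
  a.Options.DotMatchAll = True  ' let . match line breaks inside multi-line tags
  a.Options.Greedy = False      ' make .+ match as little as possible, ending at the first >
  a.SearchPattern = "<.+>"      ' one tag: a <, then the shortest run up to a >
  a.ReplacementPattern = ""     ' delete each matched tag

  d = 1
  While d <> 0
    ' InStr positions are 1-based, so start searching at character 1
    d = InStr(1, source, "<")

    If d <> 0 Then
      ' Replace removes one match per call, so keep looping while a < remains
      source = a.Replace(source, 0)
    End If
  Wend

  Return source
End Function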


Please note that this was written in REALbasic, so you will have to change some things to their VB equivalents. This code is tested and working. I have not tested it on messages with stray <'s scattered through the text, and I doubt it would work there. You could improve on that if you wish.

Edit: It will also hang if a < has no matching > after it, since Replace then finds nothing to remove and the loop never exits.
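The hang comes from the loop condition in RemoveHTML: it keeps going while any `<` remains, even after the pattern can no longer match. A minimal Python sketch of the same one-tag-per-pass idea avoids this by stopping as soon as a pass removes nothing. The names `remove_html` and `TAG` are hypothetical, and this is a translation of the approach, not the REALbasic code itself.

```python
import re

# One tag: a "<", then the shortest non-empty run of characters up to the next ">".
TAG = re.compile(r"<.+?>", re.DOTALL)

def remove_html(source: str) -> str:
    """Strip tags one at a time, stopping when no tag matches
    (instead of when no "<" remains), so a stray "<" cannot cause a hang."""
    while True:
        # subn returns the new string and how many replacements were made.
        source, removed = TAG.subn("", source, count=1)
        if removed == 0:  # nothing matched: done, even if a lone "<" is left
            return source

print(remove_html("a < b"))
```

Passing `count=0` (the default) would remove every tag in a single call; the `count=1` loop is kept only to mirror the structure of the original function.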
Title: Re: problem with removing html
Post by: Banana fanna fo fanna on December 19, 2004, 11:26 AM
Well, in Python, .*? is not greedy, because of the question mark. I don't know if other implementations use the same syntax, though.
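To make the greedy/non-greedy difference concrete, here is a small Python sketch run on the sample message from the first post. The greedy result happens to be the same "Joe -" output the original poster reported. That is suggestive, but it has not been confirmed that the PSC code fails for this reason.

```python
import re

sample = 'Joe - <HTML><BODY BGCOLOR="#ffffff"><FONT LANG="0">hello</FONT></BODY></HTML>'

# Greedy ".*" runs to the LAST ">" in the string, swallowing the text between tags.
greedy = re.sub(r"<.*>", "", sample)

# Non-greedy ".*?" stops at the FIRST ">" after each "<", so only the tags go.
lazy = re.sub(r"<.*?>", "", sample)

print(repr(greedy))  # 'Joe - '
print(repr(lazy))    # 'Joe - hello'
```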