

remove specific string


2 replies to this topic

#1 derek.sullivan

    Jedi In Training

  • Members
  • PipPip
  • 341 posts
  • Gender:Male
  • Location:Georgia
  • Interests:preaching, programming, music, friends, outdoors, moves, books

Posted 14 October 2009 - 02:32 PM

I use this code to fetch the page I want to view and convert it to UTF-8 text:

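function strip_html_tags($string) {

    // Drop the whole <head> element and its contents before stripping tags.
    // One pattern, one replacement: the arrays are kept the same length.
    $string = preg_replace(
        array(
            '@<head[^>]*?>.*?</head>@siu',
        ),
        array(
            '',
        ),
        $string);

    // Remove the remaining markup, leaving only the text content.
    return strip_tags($string);

}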

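$url = "http://thechristianchat.com/echat45/public/rmessages.html";
$raw_file = file_get_contents($url);
// Read the charset declared in the page's Content-Type meta tag.
preg_match( '@<meta\s+http-equiv="Content-Type"\s+content="([\w/]+)(;\s+charset=([^\s"]+))?@i',
	$raw_file, $matches );
// Fall back to UTF-8 when the page declares no charset (an assumed default).
$encoding = isset($matches[3]) ? $matches[3] : "utf-8";
$utf8_text = iconv( $encoding, "utf-8", $raw_file );
$utf8_text = strip_html_tags( $utf8_text );
// Encode and decode with the same flags and charset so the round trip is lossless.
$utf8_text = htmlentities( $utf8_text, ENT_QUOTES, "UTF-8" );
$utf8_text = html_entity_decode( $utf8_text, ENT_QUOTES, "UTF-8" );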

And here is the output:

Occupants: [Reverse Message Order]
if (parent.frames[2].ignore.indexOf("|derek|") == -1) {document.write('derek - logged off - using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 on 10/14 at 2:43pm CST)'); }
if (parent.frames[2].ignore.indexOf("|derek|") == -1) {document.write('derek - logged on - using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 on 10/14 at 2:19pm CST)'); } gator has timed out.
if (parent.frames[2].ignore.indexOf("|gator|") == -1) {document.write('gator - blessings to everyone'); }
if (parent.frames[2].ignore.indexOf("|gator|") == -1) {document.write('gator - in facxt some areas in Iowa had some snow flurries down around interstate 80 moving east'); }
if (parent.frames[2].ignore.indexOf("|gator|") == -1) {document.write('gator - almot midnight and the weather here in iowa feels like thanksfiving week'); }

What I want to do is get rid of this wrapper:

if (parent.frames[2].ignore.indexOf("|whatever|") == -1) {document.write('whatever - says');}

What I don't want to lose is the text between the single quotes in document.write. For example, from document.write('whatever - says'); I want to keep whatever - says, just not all the crud around it. Any suggestions?

#2 Demonslay

    P2L Jedi

  • Members
  • PipPipPip
  • 970 posts
  • Gender:Male
  • Location:A strange world where water falls out of the sky... for no reason.
  • Interests:Graphic Design, Coding, Splinter Cell, Cats

Posted 14 October 2009 - 07:21 PM

So you want to get rid of any JavaScript so that it isn't executed by the browser? Simply add a pattern for anything in <script> tags to the match array you pass to preg_replace(). There may be other ways to make sure you aren't exposed to XSS attacks, but that should get you started. I would also look at how other CMSs filter user content; I know some of their patterns get rather complicated.
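A minimal sketch of that suggestion, assuming the strip_html_tags() approach from post #1. The function name strip_html_and_scripts() and the <noscript> pattern are illustrative additions, not something from the thread:

```php
<?php
// Sketch only: extends the strip_html_tags() idea from post #1.
// strip_html_and_scripts() is a hypothetical name used for illustration.
function strip_html_and_scripts($string) {
    $string = preg_replace(
        array(
            '@<head[^>]*?>.*?</head>@siu',         // head element and contents
            '@<script[^>]*?>.*?</script>@siu',     // inline and external scripts
            '@<noscript[^>]*?>.*?</noscript>@siu', // noscript fallbacks
        ),
        '',   // a single replacement string is applied to every pattern
        $string
    );

    // preg_replace() returns null on error (for example, invalid UTF-8
    // combined with the u flag), so cast before stripping the other tags.
    return strip_tags((string) $string);
}
```

Be aware that this removes each script block completely, including whatever text is passed to document.write() inside it.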

#3 Hayden

    P2L Jedi

  • Members
  • PipPipPip
  • 716 posts
  • Gender:Male
  • Location:Texas

Posted 18 October 2009 - 03:23 PM

Here's what I found out. I couldn't figure out a way to do it without multiple replace calls, but... oh well.

$curl = new CURL;

$html = $curl->get('http://thechristianchat.com/echat45/public/rmessages.html');

// ereg_replace() and eregi_replace() were removed in PHP 7, so preg_replace() is used instead.
// The s flag lets . span newlines, i ignores case, and .*? stops at the first closing tag.
$html = preg_replace('@<script[^>]*>.*?</script>@si','',$html);
$html = preg_replace('@<noscript[^>]*>.*?</noscript>@si','',$html);

cURL Class Source
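If the goal is to keep the message text rather than drop the scripts altogether, one option is to unwrap each document.write() call instead. The sketch below assumes every line has exactly the shape shown in the output in post #1; unwrap_document_write() is a hypothetical helper name, not part of any library:

```php
<?php
// Sketch only: replaces each
//   if (parent.frames[2].ignore.indexOf("|name|") == -1) {document.write('text'); }
// wrapper with just the text between the single quotes.
// unwrap_document_write() is a hypothetical helper for illustration.
function unwrap_document_write($text) {
    $pattern = '@if\s*\(parent\.frames\[2\]\.ignore\.indexOf\("\|[^|"]*\|"\)\s*==\s*-1\)\s*'
             . '\{\s*document\.write\(\'(.*?)\'\);\s*\}@s';

    // $1 is the captured message text inside document.write('...').
    return preg_replace($pattern, '$1', $text);
}
```

It would be run on the text after the tags have been stripped, for example $utf8_text = unwrap_document_write($utf8_text); in the code from post #1. Text outside the wrappers, such as "gator has timed out.", is left untouched.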




