How to Check for Banned Words on a Page with JavaScript?

sunny_pro

New member
Joined
Jun 18, 2017
Messages
86
Points
0
Folks,

I'm trying to add a content filter (a banned-words filter) to a web proxy. When the JavaScript detects any of the banned words on the page, it should:

* Stop checking as soon as the first banned word is detected and immediately display a warning to the user naming that word:
"Banned word "blah" found on page. You will be redirected to Google".
* Then redirect to Google.
* Write to the database the banned word found and the URL of the page it was found on. E.g.:

Banned Word Found|Url|Date & Time of the Server
--------------------------------------------------------
ass|donkey-ass.com|25-01-2007, 03:00:00



The only requirement is that I should be able to feed it the list of banned words to check for.
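
The steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`escapeRegExp`, `findFirstBannedWord`, `formatTimestamp`, `buildWarning`, `buildLogRow`) are hypothetical, the timestamp is formatted in UTC, and the actual database write and redirect are left out.

```javascript
// Hypothetical helper names; none of these come from the thread.

// Escape regex metacharacters so each banned word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the first banned word that appears in the text (lowercased), or null.
// The search stops at the first match, so no further words are checked.
function findFirstBannedWord(text, bannedWords) {
  if (bannedWords.length === 0) return null;
  const pattern = "\\b(" + bannedWords.map(escapeRegExp).join("|") + ")\\b";
  const match = new RegExp(pattern, "i").exec(text);
  return match ? match[1].toLowerCase() : null;
}

// Format a date as DD-MM-YYYY, HH:MM:SS (UTC), matching the sample row above.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return pad(date.getUTCDate()) + "-" + pad(date.getUTCMonth() + 1) + "-" +
    date.getUTCFullYear() + ", " + pad(date.getUTCHours()) + ":" +
    pad(date.getUTCMinutes()) + ":" + pad(date.getUTCSeconds());
}

// Build the warning text shown to the user.
function buildWarning(word) {
  return 'Banned word "' + word + '" found on page. You will be redirected to Google.';
}

// Build one pipe-separated log row: word|url|timestamp.
function buildLogRow(word, url, date) {
  return [word, url, formatTimestamp(date)].join("|");
}
```

The banned-word list is an ordinary array, so it can be fed in from a file or a database table without changing the functions.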

What would the code be to do all that?
I'd like to see code samples.
I'm trying to build one in PHP, but the PHP gurus say it is best to do the banned-word checking on the client side. Otherwise, the page would take too long to load (high CPU usage) if the checking is done on the server side. I have no experience writing JavaScript code.

Thank You!
 

Rob Whisonant

Moderator
Joined
May 24, 2016
Messages
2,489
Points
113
I would not take this approach. Many people surf with Java and JavaScript disabled, and nothing would run for those surfers. Plus, anything you do client side can be faked. In other words, they could remove all banned words on the fly before your JavaScript ran, so a page with a banned word would appear fine to you.
 

sunny_pro

You see, I want to run a public web proxy, but I do not want my users viewing sites containing banned words. If I have 1k banned words and a page contains 1k words, then my current PHP filter has to loop 1k*1k = 1m times on a single page to check for banned words. Now imagine the CPU usage. My host is likely to object. On top of that, when thousands of users use my web proxy simultaneously, you can imagine how slowly my service would load pages.
That is why the PHP guru at Stack Overflow who is helping me build the PHP filter suggests I look into a JS filter instead.
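
The 1k*1k estimate assumes a nested loop that compares every page word against every banned word. One common alternative, sketched here with a hypothetical function name, is to put the banned words in a hash set so each page word costs a single lookup. A 1k-word page then needs about 1k lookups instead of 1m comparisons. The sketch assumes single-word entries only; banned phrases would need a different approach.

```javascript
// Hypothetical sketch: one pass over the page, one Set lookup per word.
function findFirstBannedWordWithSet(pageText, bannedWords) {
  // Build the lookup set once; membership tests are constant time on average.
  const banned = new Set(bannedWords.map((w) => w.toLowerCase()));

  // Split the page into lowercase words on anything that is not
  // a letter, a digit or an apostrophe.
  const pageWords = pageText.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);

  let lookups = 0; // counted only to show how much work is done
  for (const word of pageWords) {
    lookups += 1;
    if (banned.has(word)) {
      return { word, lookups }; // stop at the first banned word
    }
  }
  return { word: null, lookups };
}
```

The same idea carries over to PHP, for example by flipping the word list with `array_flip` and testing each page word with `isset`, so the lookup cost is not a reason in itself to move the check to the client.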

This is shocking news for me indeed! Client-side can be faked? How? Would you JS experts care to elaborate? I have to learn about this loophole to prevent malicious use in the future.
So, what is the solution then? I mean, the PHP filter would be too slow, pages would load slowly, and my users would get fed up and ditch the free service. What would you do in my position?

I do not know JS at all. I'm struggling with PHP and can't take on two giants at once. If I can find a ready-made script online, then with common sense I may be able to customise it to my needs, asking a little here and there on forums and reading a few tutorials (I have neither the time nor the patience to learn the full language). Then I'll share the final version on all the forums, and so contribute back to the forums that helped me.
I'll google and see if I can find any free samples. I might find a few on Stack Overflow.

You know, my PHP project is nearly finished. I added a link click logger to a free web proxy to log what users are browsing (with their permission). I wanted to add a word filter (banned words) so that users don't view forbidden sites and my proxy doesn't log them. However, the PHP forums failed to advise me that the filtering is best done on the client side rather than the server side, or else the service will become slow. I learnt this at Stack Overflow. Even though my PHP word filter is finished, I reckon the JS word filter would be best for my project.

The owner of this fiddle just brought it to my attention and suggested I check it out:
http://jsfiddle.net/rMJxR/33/

Or, she says, I can use this jQuery library:
https://github.com/ChaseFlorell/jQuery.ProfanityFilter

Frankly, I do not know what a fiddle is. Checking it out now.
Nor do I have any experience with the jQuery library.

I hope it does the job. You experts are welcome to check it and give your opinions as feedback to me and to all the future newbies finding their way here. :)
 

Rob Whisonant

Anything client side can be defeated or bypassed. No exceptions. For example, to bypass your JavaScript check, I simply go into my browser's settings and turn off (disable) Java and JavaScript.

You can also create browser plugins that intercept your JavaScript calls and modify the data as they see fit. For example, I can have a page change all banned words to clean words as it loads. Then your proxy program never sees them.
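
As a rough illustration of that kind of tampering, the sketch below runs in plain JavaScript rather than a real browser. The `page` object is a hypothetical stand-in for the page's global scope, and the last lines play the role of a browser console command, extension or userscript.

```javascript
// Hypothetical page script: the site's own filter, as shipped to the browser.
const page = {
  // Returns true when the text is clean, false when a banned word is found.
  checkSwear(text) {
    return !/\b(blow|nut|asshole)\b/i.test(text);
  },
};

const text = "this page contains the word nut";

// The filter works as intended when nobody interferes.
const before = page.checkSwear(text); // false: banned word detected

// Any other code running in the same page can simply replace the function
// before the site calls it.
page.checkSwear = () => true;

const after = page.checkSwear(text); // true: the filter now reports "clean"
```

Nothing in the page can reliably detect this, because the detection code would run in the same environment and could be replaced in the same way.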

To run the type of proxy you are talking about, you are going to need a dedicated server just to start. If it gets very popular, you will eventually need a server farm.
 

sunny_pro

Mmm. Is there a way to get the system to detect whether the JS is being changed to turn the banned words into clean words?
I mean, I can provide the banned words in an array unencrypted (so it is easy for me to edit, add and delete banned words), and in another section I can provide the same list encrypted. The JS can decrypt the encrypted list and cross-check. If the two lists do not match, the JS will know the banned words have been changed and act on that.
Would this work?

Do not worry. When my proxy gets popular, I can migrate to a dedicated host. In the meantime, if you know of any shared host that will allow me to run a web proxy without charging an arm and a leg, you're welcome to make recommendations.

For the time being, let us work on this JS. We can think about how to prevent code injection or JS modifications later.
Now, why do you reckon this code does not work? I only see a blank page.


Code:
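<?php
// Fetch the target page server-side with cURL.
$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

$result = curl_exec($curl);
if ($result === false) {
    echo 'Error: ' . curl_error($curl);
}
$httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($result !== false && $httpCode === 200) {
    // Strip tags so only the page text is handed to the checker.
    $pageText = strip_tags($result);
?>
<script>
// Page text passed from PHP; json_encode produces a safe JavaScript string literal.
var pageData = <?php echo json_encode($pageText, JSON_HEX_TAG | JSON_HEX_AMP); ?>;

// Returns the first banned word found in the text, or null if there is none.
function checkSwear(sentence) {
  var swearWords = ["blow", "nut", "asshole"];
  var regex = new RegExp('\\b(' + swearWords.join('|') + ')\\b', 'i');
  var match = regex.exec(sentence);
  return match ? match[1] : null;
}

// Plain JavaScript. The original wrapped this in $(function(){...}) without
// loading jQuery, so the script failed with a ReferenceError.
var bannedWord = checkSwear(pageData);
if (bannedWord !== null) {
  alert('Banned word "' + bannedWord + '" found on page. You will be redirected to Google.');
  window.location.href = 'https://www.google.com/';
}
</script>
<?php
    // Output the fetched page; without this the browser shows a blank page.
    echo $result;
}
?>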
 

Rob Whisonant

In one word... nope. You can edit the DOM as it loads in a browser from the client side. They could even remove all the words if they wanted and send you the index page of Disney instead.

Or they just disable JavaScript and your JavaScript does not even load.

Never trust client side processing for anything. It can always be changed and faked.

If you need something that can be trusted, it has to be done server side before you send the page to the client.
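
A minimal sketch of that server-side decision is shown below. It is written in JavaScript only to match the rest of the thread, and the same logic could be written in PHP. The function name `decideResponse` and the shape of the returned object are assumptions, and the tag stripping is deliberately crude. It reports the first match in list order, not in page order.

```javascript
// Hypothetical sketch: decide on the server what to send, before the
// client receives anything it could tamper with.
function decideResponse(html, bannedWords) {
  // Drop script/style bodies and all tags so only visible text is checked.
  // (A real HTML parser would be more reliable than these regexes.)
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ");

  // Collect the distinct lowercase words on the page.
  const words = new Set(text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean));

  // First banned word (in list order) that occurs on the page, if any.
  const hit = bannedWords.find((w) => words.has(w.toLowerCase()));

  if (hit !== undefined) {
    // The page is never sent; the caller can log `bannedWord` and redirect.
    return { status: 302, location: "https://www.google.com/", bannedWord: hit };
  }
  return { status: 200, body: html };
}
```

Because the page body is only returned when the check passes, disabling JavaScript or editing the DOM in the browser has no effect on the outcome.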
 