How to Weed Out Empty Values From an Array?

sunny_pro

PHP folks,

How do I weed out empty values from an array?

PHP:
print_r(array_filter($keywords_array, 'strlen'));
The above example, taken from the following link, did not work for me:
https://stackoverflow.com/questions/3654295/remove-empty-array-elements
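One likely reason it appears not to work: array_filter() returns a filtered copy and leaves the original array untouched, so a later loop over $keywords_array still sees the empty entries. A minimal sketch, assuming a made-up input string (the sample text and the $filtered / $keywords_clean names are only for illustration):

```php
<?php
// Hypothetical sample input: words separated by single, double, and trailing spaces.
$keywords_array = explode(" ", "alpha  beta gamma ");

// array_filter() does not modify $keywords_array; it returns a filtered copy.
// The 'strlen' callback drops entries whose length is 0. Original keys are preserved.
$filtered = array_filter($keywords_array, 'strlen');

// array_values() renumbers the keys from 0 so there are no gaps.
$keywords_clean = array_values($filtered);

print_r($keywords_clean);
```

Looping over $keywords_clean instead of $keywords_array should then skip the blanks. Note that explode(" ", ...) only splits on spaces, so entries made of tabs or newlines have a non-zero length and would survive this filter.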

Here is my code so far. I am building a web crawler that crawls your page, notes the keywords and links, and counts them. It is not finished yet.
Look at the attached image and you will notice blank values in the "keywords" column. They come from empty values in the array.
So I need to weed out the empty values before inserting the array values into the MySQL table.

PHP:
<?php 

//Required PHP Files.
include 'config.php';
include 'header.php';

//1). Set Banned Words.
$banned_words = array("asshole", "nut", "bullshit");

$url = 'https://www.york.ac.uk/teaching/cws/wws/webpage1.html';
// 2). $curl is going to be data type curl resource.
$curl = curl_init();

// 3). Set cURL options.
curl_setopt($curl, CURLOPT_URL, "$url");
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// 4). Run cURL (execute http request).
$result = curl_exec($curl);

if (curl_errno($curl))
{
	echo 'Error:' . curl_error($curl);
}

$response = curl_getinfo( $curl );

//If page is fetched then replace banned words found on page.
if($response['http_code'] == '200' )
{   
	$regex = '/\b';
	$regex .= implode('\b|\b', $banned_words);
	$regex .= '\b/i';
	$substitute = 'BANNED WORD REPLACED';
	$clean_result = preg_replace($regex, $substitute, $result);
	//Present the banned words filtered webpage.
	echo $clean_result;
}
else
{
	//Show error if page fetching fails.
	echo "Page fetching problem!";
	echo "$response[http_code]";
	exit();
}

curl_close($curl);

//Define Variables
	$keywords_number = "0";
	$keywords_count = "0";
	$links_count = "0";
	$keywords_links_count = "0";
	$images_count = "0";
	$keywords_images_count = "0";
	$keywords_internal_links_count = "0";
	$keywords_external_links_count = "0";

//Link Extractor starts here. It will extract all links present on the page.
function linkExtractor($clean_result)
{	
    $linkArray = array();
    if(preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i', $clean_result, $link_matches, PREG_SET_ORDER))
	{
		foreach ($link_matches as $link_match) 
		{   	   
			GLOBAL $url,$links_count,$keywords_links_count,$images_count,$keywords_images_count,$keywords_internal_links_count,$keywords_external_links_count;
	   
			echo "url: $url<br>";
			echo "link_match: $link_match[links_count]<br>";
			$links_count++;
			echo "links_count: $links_count++<br>";
			$keywords_links_count++;
			echo "keywords_links_count: $keywords_links_count++<br>";
			$images_count++;
			echo "images_count: $images_count++<br>";
			$keywords_images_count++;
			echo "keywords_images_count: $keywords_images_count++<br>";
			$keywords_internal_links_count++;
			echo "keywords_internal_links_count: $keywords_internal_links_count++<br>";
			$keywords_external_links_count++;
			echo "keywords_external_links_count: $keywords_external_links_count++<br>";		  
       }
	}
    return $linkArray;
}
echo '<pre>' . print_r(linkExtractor($clean_result), true) . '</pre>';


//Content Filter starts here to check for banned words present on the page.
$keywords_array = explode(" ", $clean_result);

$keywords_count = "0";
foreach($keywords_array as $keyword) 
{   
	echo $keyword."\n";
	echo "keyword: $keyword<br>";
	$keywords_count++;
	echo "Keywords_count: $keywords_count++<br>";
	
	print_r(array_filter($keywords_array, 'strlen'));
}



foreach($keywords_array as $keyword) 
{   
	$keywords_number++;
		
	//Insert the user's inputs into Mysql database using php's sql injection prevention method "Prepared Statements".
	$stmt = mysqli_prepare($conn, "INSERT INTO searchengine_index(url,keywords,keywords_number,keywords_count,links,links_count,keywords_links_count,images_count,keywords_images_count,keywords_internal_links_count,keywords_external_links_count) VALUES (?,?,?,?,?,?,?,?,?,?,?)");
	
	GLOBAL $url,$keywords_number,$links_count,$keywords_links_count,$images_count,$keywords_images_count,$keywords_internal_links_count,$keywords_external_links_count;
	
	mysqli_stmt_bind_param($stmt, 'ssisiiiiiii', $url,$keyword,$keywords_number,$keywords_count,$link_match[$keywords_links_count],$links_count,$keywords_links_count,$images_count,$keywords_images_count,$keywords_internal_links_count,$keywords_external_links_count);
	mysqli_stmt_execute($stmt);
			
	//Check if data was successfully submitted or not.
	if(!$stmt)
	{
		echo "Sorry! Our system is currently experiencing a problem indexing your website. We will try some other time!";
		exit();
	}	
}

?>
And I get this error:

Notice: Undefined index: links_count in C:\xampp\htdocs\test\crawler.php on line 71

How do I get rid of this error? I want to echo each array value in the foreach loop.
Line 71:
PHP:
echo "link_match: $link_match[links_count]<br>";
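For what it is worth, with PREG_SET_ORDER each $link_match is a numerically indexed array: [0] is the whole tag, [1] the href value and [2] the anchor text. It has no 'links_count' key, which would explain the notice. A rough sketch under that assumption, using a made-up HTML snippet and a hypothetical $links array:

```php
<?php
// Hypothetical sample HTML standing in for the fetched page.
$html = '<a href="https://example.com/a">First</a> <a href="/b">Second</a>';

$links = array();
$pattern = '/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i';

if (preg_match_all($pattern, $html, $link_matches, PREG_SET_ORDER)) {
    foreach ($link_matches as $link_match) {
        // [0] = whole <a> tag, [1] = href value, [2] = anchor text.
        $links[] = array('href' => $link_match[1], 'text' => $link_match[2]);
        echo "link_match: {$link_match[1]}<br>\n";
    }
}

// Count once after the loop instead of incrementing inside a string.
$links_count = count($links);
echo "links_count: $links_count<br>\n";
```

Also, inside a double-quoted string "$links_count++" does not increment anything; PHP prints the current value followed by a literal "++".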
Also, I don't know why the "url_indexing_date" column is showing zero values. I have another table that shows the dates in such a column.

EMPTY_ROWS.jpg

I will also need to find a regex to strip out the HTML tags, so that only the keywords from the page content the visitor actually sees get stored in the "keywords" column, not the markup.
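Instead of a hand-written regex, PHP's built-in strip_tags() may already be enough here. The sketch below is only one possible approach, assuming a made-up page fragment; it removes the tags, decodes entities, then splits on any run of whitespace while discarding empty pieces, which would also take care of the blank keywords:

```php
<?php
// Hypothetical sample page fragment standing in for $clean_result.
$html = "<h1>R&amp;D news</h1><p>Web   crawler\n<b>test</b></p>";

// Put a space before each tag so "</h1><p>" does not glue two words together.
$spaced = str_replace('<', ' <', $html);

// strip_tags() removes the markup; html_entity_decode() turns &amp; into &.
$text = html_entity_decode(strip_tags($spaced), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Split on spaces, tabs and newlines; PREG_SPLIT_NO_EMPTY drops empty pieces.
$keywords = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($keywords);
```

One caveat: strip_tags() keeps the text inside script and style elements, so those would need to be removed first if the crawled pages contain them.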
 