Fatal error: Uncaught Error: Call to a member function find() on string in C:\xampp\

sunny_pro

PHP Buddies,

I am trying to learn how to build a simple web crawler.
First, I will feed it a URL to start with.
It will then fetch that page and extract all the links into a single array.
Then it will fetch each of the linked pages and extract all their links into a single array in the same way. It will keep doing this until it reaches its maximum link depth.
Here is how I coded it:

PHP:
<?php 

include('simple_html_dom.php'); 

$current_link_crawling_level = 0; 
$link_crawling_level_max = 2;

if($current_link_crawling_level == $link_crawling_level_max)
{
	exit(); 
}
else
{
	$url = 'https://www.yahoo.com'; 
	$curl = curl_init($url); 
	curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 
	curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); 
	curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); 
	curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); 
	$html = curl_exec($curl); 
	
	$current_link_crawling_level++;	

	//to fetch all hyperlinks from the webpage 
	$links = array(); 
	foreach($html->find('a') as $a) 
	{ 
		$links[] = $a->href; 
		echo "Value: $value<br />\n"; 
		print_r($links); 
		
		$url = '$value'; 
		$curl = curl_init($value); 
		curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 
		curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); 
		curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); 
		curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); 
		$html = curl_exec($curl); 

		//to fetch all hyperlinks from the webpage 
		$links = array(); 
		foreach($html->find('a') as $a) 
		{ 
			$links[] = $a->href; 
			echo "Value: $value<br />\n";
			print_r($links); 
			$current_link_crawling_level++;
		} 
	} // closes the outer foreach
	echo "Value: $value<br />\n";
	print_r($links);  
}

?>
I have a feeling I got confused and messed up the foreach loops by nesting them too deeply. Is that the case? A hint about where I went wrong would help.

I am unable to test the script because I first have to sort out this error:
Fatal error: Uncaught Error: Call to a member function find() on string in C:\xampp\h

After that, I will be able to test it. Anyway, just from looking at the script, do you think I got it right?

Thanks
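
On the nesting question: one common way to avoid a foreach inside a foreach is to keep a queue of (URL, depth) pairs and loop over that instead. The following is only a sketch under assumptions, not a drop-in fix: extract_links() and crawl() are hypothetical helper names, the link extraction uses PHP's built-in DOMDocument rather than simple_html_dom, and the fetcher is passed in as a callable so the example can run offline against made-up example.com pages. Relative links are not resolved here.

```php
<?php
// Hypothetical helper: pull href values out of an HTML string with the built-in DOM extension.
function extract_links(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse (also avoids loadHTML() complaining about empty input)
    }
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical helper: breadth-first crawl up to $maxDepth levels.
// $fetch is any callable that takes a URL and returns its HTML ('' on failure).
function crawl(string $startUrl, int $maxDepth, callable $fetch): array
{
    $seen  = [$startUrl => 0];   // url => depth at which it was first found
    $queue = [[$startUrl, 0]];

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        if ($depth >= $maxDepth) {
            continue; // deep enough: the URL is recorded but not fetched
        }
        foreach (extract_links($fetch($url)) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = $depth + 1;
                $queue[] = [$link, $depth + 1];
            }
        }
    }
    return $seen;
}

// Stand-in for cURL so the sketch runs offline (made-up pages).
$pages = [
    'http://example.com/'  => '<a href="http://example.com/a">A</a><a href="http://example.com/b">B</a>',
    'http://example.com/a' => '<a href="http://example.com/c">C</a><a href="http://example.com/">Home</a>',
];
$fetch = function (string $url) use ($pages): string {
    return $pages[$url] ?? '';
};

$result = crawl('http://example.com/', 2, $fetch);
print_r($result);
```

In a real run, $fetch would wrap the cURL calls from the script above and return the response string (or an empty string on failure). The $seen array also keeps the crawler from fetching the same page twice.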

I just replaced:

PHP:
//$html = file_get_html('http://example.com');
with:

PHP:
$url = 'https://www.yahoo.com'; 
$curl = curl_init($url); 
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); 
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); 
$html = curl_exec($curl);
That is all!
That should not result in that error! :eek:
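
For what it's worth, the error message can be reproduced without any network access. With CURLOPT_RETURNTRANSFER set, curl_exec() returns the page as a plain string (or false on failure), and a string has no find() method. A minimal sketch, assuming PHP 7 or later and a made-up HTML snippet standing in for the cURL response:

```php
<?php
// Stand-in for the string that curl_exec() returns when CURLOPT_RETURNTRANSFER is set.
$html = '<a href="https://example.com/">Example</a>';

var_dump(is_string($html)); // the "page" is just text, not a DOM object

try {
    $html->find('a'); // same call as in the crawler script
    $message = 'no error';
} catch (Error $e) {
    $message = $e->getMessage();
}
echo $message, "\n"; // prints "Call to a member function find() on string"
```

So the string has to be handed to an HTML parser first; only the object the parser returns can be searched for links.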

UPDATE:

I have been given this sample code just now ...
PHP:
// Possible solution with str_get_html:

$url = 'https://www.yahoo.com'; 
$curl = curl_init($url); 
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); 
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); 
$response_string = curl_exec($curl); 

$html = str_get_html($response_string);

//to fetch all hyperlinks from a webpage 
$links = array(); 
foreach($html->find('a') as $a) { 
    $links[] = $a->href; 
} 
print_r($links); 
echo "<br />";
Gonna experiment with it.
Just sharing it here for other future newbies! :)

I am told:
"file_get_html is a special function from simple_html_dom library. If you open source code for simple_html_dom you will see that file_get_html() does a lot of things that your curl replacement does not. That's why you get your error."

Anyway, folks, I really don't want to use the limited file_get_html(), so let's replace it with cURL. I gave cURL my best shot here. What about you? Care to show how to fix this?

I searched the PHP manual for str_get_html to be sure what the function does, but it shows no results.
So I ask: just what does it do?

PHP Buddies,

Look at these 2 updates. They both succeed in fetching the PHP manual page but fail to fetch the Yahoo homepage. Why is that?
The 2nd script is the same as the 1st except for one small change: look at the commented-out part in script 2 to see the difference. The added code comes right after the commented-out part.

SCRIPT 1
PHP:
<?php 

//HALF WORKING

include('simple_html_dom.php'); 

$url = 'http://php.net/manual-lookup.php?pattern=str_get_html&scope=quickref'; // WORKS ON URL
//$url = 'https://yahoo.com'; // FAILS ON URL

$curl = curl_init($url); 
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); 
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); 
$response_string = curl_exec($curl); 

$html = str_get_html($response_string);

//to fetch all hyperlinks from a webpage 
$links = array(); 
foreach($html->find('a') as $a) { 
    $links[] = $a->href; 
} 
print_r($links); 
echo "<br />"; 


?>
SCRIPT 2
PHP:
<?php 

//HALF WORKING

include('simple_html_dom.php'); 

$url = 'http://php.net/manual-lookup.php?pattern=str_get_html&scope=quickref'; // WORKS ON URL
//$url = 'https://yahoo.com'; // FAILS ON URL
$curl = curl_init($url); 
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); 
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); 
$response_string = curl_exec($curl); 

$html = str_get_html($response_string);

/*
//to fetch all hyperlinks from a webpage 
$links = array(); 
foreach($html->find('a') as $a) { 
    $links[] = $a->href; 
} 
print_r($links); 
echo "<br />"; 
*/

// Hide HTML warnings
libxml_use_internal_errors(true);
$dom = new DOMDocument;
if($dom->loadHTML($html, LIBXML_NOWARNING)){
    // echo Links and their anchor text
    echo '<pre>';
    echo "Link\tAnchor\n";
    foreach($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        $anchor = $link->nodeValue;
        echo $href,"\t",$anchor,"\n";
    }
    echo '</pre>';
}else{
    echo "Failed to load html.";

}

?>
Don't forget my previous post!

Cheers!
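
I can't say for certain why Yahoo fails, but two things seem worth ruling out before blaming the link loop: curl_exec() returns false when the request itself fails, and, as far as I know, str_get_html() in simple_html_dom returns false when the input is empty or longer than the library's MAX_FILE_SIZE constant. In either case the later ->find() call runs on false rather than on a DOM object. The sketch below uses a hypothetical helper, diagnose_response(), and the 600000 byte default is an assumption to check against your copy of simple_html_dom.php:

```php
<?php
// Hypothetical helper: says why a fetched response may not be parseable.
// $sizeLimit mirrors the MAX_FILE_SIZE idea in simple_html_dom (assumed value, check your copy).
function diagnose_response($response, string $curlError = '', int $sizeLimit = 600000): string
{
    if ($response === false) {
        return 'cURL failed: ' . ($curlError !== '' ? $curlError : 'unknown error');
    }
    if ($response === '') {
        return 'empty response body';
    }
    if (strlen($response) > $sizeLimit) {
        return 'response is ' . strlen($response) . " bytes, over the $sizeLimit byte limit";
    }
    return 'ok';
}

// In the scripts above it would be called right after the fetch:
//   $response_string = curl_exec($curl);
//   echo diagnose_response($response_string, curl_error($curl));

// Offline demonstration with made-up inputs:
echo diagnose_response(false, 'Could not resolve host'), "\n";
echo diagnose_response(str_repeat('x', 700000)), "\n";
echo diagnose_response('<a href="/">home</a>'), "\n";
```

If the helper reports anything other than ok for the Yahoo URL, the parsing step never had usable input, which would explain why both scripts behave the same way on that page.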
 