Using robots.txt to block URLs with *

Marc van Leeuwen

Premium Member
Joined
May 29, 2016
Messages
1,127
Points
63
Can you tell me how to block this type of URL using robots.txt?

example.com/topic/this-is-web-page-1/reply
example.com/topic/this-is-web-page-2/reply
example.com/topic/this-is-web-page-3/reply
example.com/topic/this-is-web-page-4/reply

I mean it should block every URL that ends in "reply"

while still allowing Google's spiders to crawl these pages:

example.com/topic/this-is-web-page-1/
example.com/topic/this-is-web-page-2/
example.com/topic/this-is-web-page-3/
example.com/topic/this-is-web-page-4/

What is your solution?
 

Rob Whisonant

Moderator
Joined
May 24, 2016
Messages
2,485
Points
113
One thing to keep in mind when using wildcards in robots.txt files: not all crawlers and bots know how to interpret them.
 

Marc van Leeuwen

I understand: wildcards only work with Google and the few other search engines that support them.
I tested and found the answer to my question above. I just put this code into robots.txt and it worked.
Code:
User-agent: *
Disallow: /threads/*/reply
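
For anyone who wants to check a rule like this before deploying it, here is a minimal Python sketch of how Google-style wildcard matching behaves: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names `rule_to_regex` and `is_blocked` are made up for this illustration, and the rule uses the `/topic/` path from the example URLs in the question rather than `/threads/`. It only approximates how a wildcard-aware crawler reads a single Disallow line; it is not how any particular search engine is implemented.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule with Google-style wildcards to a regex."""
    # A trailing '$' means "the URL path must end here".
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then join them with '.*' where each '*' was.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rule: str) -> bool:
    """Return True if the path matches the Disallow rule (hypothetical helper)."""
    # re.match anchors at the start, so an unanchored rule acts as a prefix match.
    return rule_to_regex(disallow_rule).match(path) is not None


# Rule adapted to the example URLs from the question (an assumption).
RULE = "/topic/*/reply"

for path in ["/topic/this-is-web-page-1/reply", "/topic/this-is-web-page-1/"]:
    print(path, "blocked" if is_blocked(path, RULE) else "allowed")
```

Note that without a trailing `$` the rule is a prefix match, so it would also block something like `/topic/page/reply/extra`. Use `/topic/*/reply$` if only URLs ending exactly in `reply` should be blocked.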
 