{"id":9958,"date":"2025-08-06T09:55:55","date_gmt":"2025-08-06T09:55:55","guid":{"rendered":"http:\/\/localhost\/?p=9958"},"modified":"2025-08-06T09:55:55","modified_gmt":"2025-08-06T09:55:55","slug":"perplexity-ai-ignores-no-crawling-rules-on-websites-crawls-them-anyway","status":"publish","type":"post","link":"https:\/\/zero.redgem.net\/?p=9958","title":{"rendered":"Perplexity AI ignores no-crawling rules on websites, crawls them anyway"},"content":{"rendered":"<h2>Security Update News<\/h2>\n<h3>Update Information<\/h3>\n<table style=\"width:100%; border-collapse: collapse; margin-bottom: 20px;\">\n<tr>\n<th style=\"text-align: left; padding: 8px; border: 1px solid #ddd; \">Title<\/th>\n<td style=\"padding: 8px; border: 1px solid #ddd;\">Perplexity AI ignores no-crawling rules on websites, crawls them anyway<\/td>\n<\/tr>\n<tr>\n<th style=\"text-align: left; padding: 8px; border: 1px solid #ddd; \">Update ID<\/th>\n<td style=\"padding: 8px; border: 1px solid #ddd;\">MALWAREBYTES:84799232104A5D1CC024F253C1B9123E<\/td>\n<\/tr>\n<tr>\n<th style=\"text-align: left; padding: 8px; border: 1px solid #ddd; \">Type<\/th>\n<td style=\"padding: 8px; border: 1px solid #ddd;\">malwarebytes<\/td>\n<\/tr>\n<tr>\n<th style=\"text-align: left; padding: 8px; border: 1px solid #ddd; \">Published<\/th>\n<td style=\"padding: 8px; border: 1px solid #ddd;\">2025-08-06T12:45:05<\/td>\n<\/tr>\n<tr>\n<th style=\"text-align: left; padding: 8px; border: 1px solid #ddd; \">Last Updated<\/th>\n<td style=\"padding: 8px; border: 1px solid #ddd;\">2025-08-06T12:45:05<\/td>\n<\/tr>\n<\/table>\n<h3>Security Impact<\/h3>\n<table style=\"width:100%; border-collapse: collapse; margin-bottom: 20px;\">\n<tr>\n<th style=\"text-align: left; padding: 8px; border: 1px solid #ddd; \">Severity<\/th>\n<td style=\"padding: 8px; border: 1px solid #ddd; color: #666666; font-weight: bold;\">NONE<\/td>\n<\/tr>\n<\/table>\n<h3>Update Details<\/h3>\n<div style=\"; padding: 15px; border-left: 4px solid #4CAF50; 
margin-bottom: 20px;\">\nImagine putting up a no-trespassing sign for people walking their dogs, and then finding out that one person dresses up their Great Dane as a calf and walks it on your grounds.<\/p>\n<p>Well that&#8217;s sort of what AI answer engine Perplexity has been doing, by evading the no-crawl directives of websites, according to Cloudflare.<\/p>\n<p>The no-trespassing sign in this case would be a robots.txt file\u2014a small text file placed on a website that tells search engines and other automated tools (often called &#8220;bots&#8221; or &#8220;crawlers&#8221;) which pages or sections of the site they are allowed to access and which parts they should not visit.<\/p>\n<p>Cloudflare started an investigation after customers complained that Perplexity accessed their content despite the fact that they had disallowed Perplexity crawling activity in their robots.txt files. They had also created Web Application Firewall (WAF) rules to specifically block both of Perplexity\u2019s declared crawlers: PerplexityBot and Perplexity-User.<\/p>\n<p>So Cloudflare set up some test domains and queried Perplexity about them to see if it would return the information that it shouldn\u2019t be able to access. What the researchers found was surprising, because it looks like Perplexity uses a very deliberate system to bypass the no-crawling directives.<\/p>\n<p>The researchers discovered the use of a user-agent designed to impersonate Google Chrome on macOS (to look like a regular user) when their declared crawler was blocked. The User Agent String shows the website you\u2019re visiting some basic information about your browser and operating system, so the site can provide an experience optimized for that setup. 
Camouflaged by this User Agent String, the Perplexity crawler used regularly changing IP addresses outside of Perplexity\u2019s official IP range, allowing it to access content it wasn&#8217;t supposed to reach.<\/p>\n<p>I wondered about Perplexity&#8217;s view on the whole thing, so I asked it whether it knew what a robots.txt file was for and why it was bad to ignore one. It returned the following:<\/p>\n<p>![Perplexity answers why it&#8217;s wrong to bypass robots.txt files](https:\/\/www.malwarebytes.com\/wp-content\/uploads\/sites\/2\/2025\/08\/It_knows.png)<\/p>\n<p>> Why is it wrong to ignore robots.txt?<br \/>> <br \/>>   * Respect for Privacy and Security: Some parts of a website may contain private, sensitive, or unfinished content not meant for public search engines. Ignoring robots.txt can expose data that should remain private.<br \/>>   * Fair Resource Usage: Crawling websites takes up bandwidth and server resources. By telling robots where not to go, website owners can prevent wasteful use of these resources, ensuring the site runs smoothly for real users.<br \/>>   * Legal and Ethical Standards: Ignoring robots.txt can be considered unethical, and in certain circumstances, it might even break terms of service or data protection regulations depending on what information is accessed and how it&#8217;s used.<br \/>>   * Trust and Reputation: If a robot repeatedly ignores robots.txt, it can be banned or blacklisted by websites, and its reputation could suffer in the public and professional community.<br \/>> <\/p>\n<p>The official reaction of Perplexity is somewhat different. It hides behind the difference between a regular bot and an AI agent that fetches pages on a user&#8217;s behalf to answer a specific question.<\/p>\n<p>> The bluster around this issue reveals that Cloudflare\u2019s leadership is either dangerously misinformed on the basics of AI, or simply more flair than cloud. 
https:\/\/t.co\/NgliGZCspP<br \/>> <br \/>> &#8212; Perplexity (@perplexity_ai) August 5, 2025<\/p>\n<p>Perplexity reasons that:<\/p>\n<p>> \u201cModern AI assistants work fundamentally differently from traditional web crawling. When you ask Perplexity a question that requires current information\u2014say, &#8220;What are the latest reviews for that new restaurant?&#8221;\u2014the AI doesn&#8217;t already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question.<br \/>> <br \/>> This is fundamentally different from traditional web crawling, in which crawlers systematically visit millions of pages to build massive databases, whether anyone asked for that specific information or not.\u201d<\/p>\n<p>I see Perplexity&#8217;s point: there is a big difference between crawling websites to gather as much information as you can and seeking to answer a specific question for one user. But the decision whether to allow either is up to the website owner. And there should be no need for sneaking around.<\/p>\n<p>So why not create a User Agent String that tells website owners \u201cthis is just a short visit to find some specific information\u201d to distinguish it from actual crawlers that siphon up every bit they can find, and then let the website owners decide whether they will allow them or not?<\/p>\n<p>Either way, this discussion seems far from over, and with the rise of AI agents we will probably see problems arise that were not on the radar before we all started using AI.<\/p>\n<p>* * *<\/p>\n<p>**We don&#8217;t just report on data privacy\u2014we help you remove your personal information**<\/p>\n<p>Cybersecurity risks should never spread beyond a headline. 
With Malwarebytes Personal Data Remover, you can scan to find out which sites are exposing your personal information, and then delete that sensitive data from the internet.\n<\/p><\/div>\n<p><a href=\"https:\/\/www.malwarebytes.com\/blog\/news\/2025\/08\/perplexity-ai-ignores-no-crawling-rules-on-websites-crawls-them-anyway\" target=\"_blank\" style=\"display: inline-block; color: white; padding: 10px 20px; text-decoration: none; border-radius: 4px;\">View Advisory Details<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Security Update News Update Information Title Perplexity AI ignores no-crawling rules on websites, crawls them anyway Update ID MALWAREBYTES:84799232104A5D1CC024F253C1B9123E Type malwarebytes Published 2025-08-06T12:45:05 Last Updated&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[6,8,12,115,13,33,7,11,5],"class_list":["post-9958","post","type-post","status-publish","format-standard","hentry","category-category_news","tag-cve","tag-cvss","tag-exploit","tag-malwarebytes","tag-news","tag-none","tag-security","tag-tapic","tag-vulnerability"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Perplexity AI ignores no-crawling rules on websites, crawls them anyway - zero redgem<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/zero.redgem.net\/?p=9958\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Perplexity AI ignores no-crawling rules on websites, crawls them anyway - zero redgem\" \/>\n<meta property=\"og:description\" content=\"Security Update News Update Information Title 
Perplexity AI ignores no-crawling rules on websites, crawls them anyway Update ID MALWAREBYTES:84799232104A5D1CC024F253C1B9123E Type malwarebytes Published 2025-08-06T12:45:05 Last Updated...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/zero.redgem.net\/?p=9958\" \/>\n<meta property=\"og:site_name\" content=\"zero redgem\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-06T09:55:55+00:00\" \/>\n<meta name=\"author\" content=\"invoker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"invoker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958\"},\"author\":{\"name\":\"invoker\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#\\\/schema\\\/person\\\/fbfeae8dfad117ac08a7621bee1a1dca\"},\"headline\":\"Perplexity AI ignores no-crawling rules on websites, crawls them anyway\",\"datePublished\":\"2025-08-06T09:55:55+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958\"},\"wordCount\":899,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#organization\"},\"keywords\":[\"CVE\",\"CVSS\",\"exploit\",\"malwarebytes\",\"news\",\"NONE\",\"Security\",\"tapic\",\"Vulnerability\"],\"articleSection\":[\"category_news\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/zero.redgem.net\\\/?p=9958#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958\",\"url\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958\",\"name\":\"Perplexity AI ignores 
no-crawling rules on websites, crawls them anyway - zero redgem\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#website\"},\"datePublished\":\"2025-08-06T09:55:55+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/zero.redgem.net\\\/?p=9958\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/?p=9958#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/zero.redgem.net\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Perplexity AI ignores no-crawling rules on websites, crawls them anyway\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#website\",\"url\":\"https:\\\/\\\/zero.redgem.net\\\/\",\"name\":\"zero redgem\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/zero.redgem.net\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#organization\",\"name\":\"zero redgem\",\"url\":\"https:\\\/\\\/zero.redgem.net\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"\",\"contentUrl\":\"\",\"width\":191,\"height\":188,\"caption\":\"zero 
redgem\"},\"image\":{\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/zero.redgem.net\\\/#\\\/schema\\\/person\\\/fbfeae8dfad117ac08a7621bee1a1dca\",\"name\":\"invoker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f17c01d7338e6932bcde121cf83569393df3374625d25afd62677cfb528f2e3e?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f17c01d7338e6932bcde121cf83569393df3374625d25afd62677cfb528f2e3e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f17c01d7338e6932bcde121cf83569393df3374625d25afd62677cfb528f2e3e?s=96&d=mm&r=g\",\"caption\":\"invoker\"},\"sameAs\":[\"https:\\\/\\\/zero.redgem.net\"],\"url\":\"https:\\\/\\\/zero.redgem.net\\\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Perplexity AI ignores no-crawling rules on websites, crawls them anyway - zero redgem","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/zero.redgem.net\/?p=9958","og_locale":"en_US","og_type":"article","og_title":"Perplexity AI ignores no-crawling rules on websites, crawls them anyway - zero redgem","og_description":"Security Update News Update Information Title Perplexity AI ignores no-crawling rules on websites, crawls them anyway Update ID MALWAREBYTES:84799232104A5D1CC024F253C1B9123E Type malwarebytes Published 2025-08-06T12:45:05 Last Updated...","og_url":"https:\/\/zero.redgem.net\/?p=9958","og_site_name":"zero redgem","article_published_time":"2025-08-06T09:55:55+00:00","author":"invoker","twitter_card":"summary_large_image","twitter_misc":{"Written by":"invoker","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/zero.redgem.net\/?p=9958#article","isPartOf":{"@id":"https:\/\/zero.redgem.net\/?p=9958"},"author":{"name":"invoker","@id":"https:\/\/zero.redgem.net\/#\/schema\/person\/fbfeae8dfad117ac08a7621bee1a1dca"},"headline":"Perplexity AI ignores no-crawling rules on websites, crawls them anyway","datePublished":"2025-08-06T09:55:55+00:00","mainEntityOfPage":{"@id":"https:\/\/zero.redgem.net\/?p=9958"},"wordCount":899,"commentCount":0,"publisher":{"@id":"https:\/\/zero.redgem.net\/#organization"},"keywords":["CVE","CVSS","exploit","malwarebytes","news","NONE","Security","tapic","Vulnerability"],"articleSection":["category_news"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/zero.redgem.net\/?p=9958#respond"]}]},{"@type":"WebPage","@id":"https:\/\/zero.redgem.net\/?p=9958","url":"https:\/\/zero.redgem.net\/?p=9958","name":"Perplexity AI ignores no-crawling rules on websites, crawls them anyway - zero redgem","isPartOf":{"@id":"https:\/\/zero.redgem.net\/#website"},"datePublished":"2025-08-06T09:55:55+00:00","breadcrumb":{"@id":"https:\/\/zero.redgem.net\/?p=9958#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/zero.redgem.net\/?p=9958"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/zero.redgem.net\/?p=9958#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/zero.redgem.net\/"},{"@type":"ListItem","position":2,"name":"Perplexity AI ignores no-crawling rules on websites, crawls them anyway"}]},{"@type":"WebSite","@id":"https:\/\/zero.redgem.net\/#website","url":"https:\/\/zero.redgem.net\/","name":"zero 
redgem","description":"","publisher":{"@id":"https:\/\/zero.redgem.net\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/zero.redgem.net\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/zero.redgem.net\/#organization","name":"zero redgem","url":"https:\/\/zero.redgem.net\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/zero.redgem.net\/#\/schema\/logo\/image\/","url":"","contentUrl":"","width":191,"height":188,"caption":"zero redgem"},"image":{"@id":"https:\/\/zero.redgem.net\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/zero.redgem.net\/#\/schema\/person\/fbfeae8dfad117ac08a7621bee1a1dca","name":"invoker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/f17c01d7338e6932bcde121cf83569393df3374625d25afd62677cfb528f2e3e?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/f17c01d7338e6932bcde121cf83569393df3374625d25afd62677cfb528f2e3e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f17c01d7338e6932bcde121cf83569393df3374625d25afd62677cfb528f2e3e?s=96&d=mm&r=g","caption":"invoker"},"sameAs":["https:\/\/zero.redgem.net"],"url":"https:\/\/zero.redgem.net\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/zero.redgem.net\/index.php?rest_route=\/wp\/v2\/posts\/9958","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/zero.redgem.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/zero.redgem.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/zero.redgem.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/zero.redgem.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9958"}],"version-history":[{"count":0,"href":"https
:\/\/zero.redgem.net\/index.php?rest_route=\/wp\/v2\/posts\/9958\/revisions"}],"wp:attachment":[{"href":"https:\/\/zero.redgem.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9958"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/zero.redgem.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9958"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/zero.redgem.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9958"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}