How to block AI Crawler Bots using robots.txt file (www.cyberciti.biz)
Posted by Cynicus Rex to Privacy@lemmy.ml · English · 2 months ago · 64 comments
Cynicus Rex (OP) · 2 months ago
TL;DR:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Omgilibot
Disallow: /
User-Agent: FacebookBot
Disallow: /
User-Agent: Applebot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Omgili
Disallow: /
User-agent: YouBot
Disallow: /
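A quick way to sanity-check rules like the ones above before deploying them is to feed them to Python's standard-library `urllib.robotparser`. This is only a minimal sketch: the `ROBOTS_TXT` excerpt, the `is_allowed` helper, and the `example.com` URL are illustrative assumptions, not part of the linked article, and the parser's matching may differ in detail from what any given crawler implements.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules from the post; blank lines separate the groups.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit `user_agent` to fetch `url`.

    Hypothetical helper: parses the robots.txt text in memory instead of
    downloading it from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

Agents named in the file are refused everywhere under `/`, while an agent with no matching group (and no `User-agent: *` group) is allowed by default.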
mox · 2 months ago
Of course, nothing stops a bot from sending a user-agent string that exactly matches a web browser's.

JackbyDev · 2 months ago
Nothing stops a bot from choosing not to read robots.txt.

mox · 2 months ago (edited)
Indeed, as has already been said repeatedly in other comments.
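Both caveats come down to robots.txt being checked by the crawler, not enforced by the server. The sketch below illustrates that with two hypothetical fetch policies; the function names, the one-rule `ROBOTS_TXT`, and the browser-style user-agent string are assumptions made for the example, not the behaviour of any real crawler.

```python
from urllib.robotparser import RobotFileParser

# A single rule, in the style of the list in the original post.
ROBOTS_TXT = "User-agent: GPTBot\nDisallow: /\n"

# Hypothetical browser-like string a bot could send instead of its own name.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/126.0"


def polite_should_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """A well-behaved crawler consults robots.txt under the name it sends."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_should_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """A crawler that never reads robots.txt: the rules have no effect on it."""
    return True


if __name__ == "__main__":
    url = "https://example.com/article"
    print("honest GPTBot, polite:", polite_should_fetch("GPTBot", url))
    print("spoofed browser UA, polite:", polite_should_fetch(BROWSER_UA, url))
    print("GPTBot, ignores robots.txt:", rude_should_fetch("GPTBot", url))
```

Only the first case is blocked. A bot that sends a browser-like string matches no rule, and a bot that skips the check is unaffected by the file altogether.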