Sitemap: https://hyperborea.org/sitemap.xml

User-Agent: lmspider
Disallow: /

# It's a shopping search engine. Nothing's relevant except maybe the comics list
User-Agent: BecomeBot
Disallow: /

# AI bots
User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: aiHitBot
User-agent: Amazonbot
User-agent: Andibot
User-agent: anthropic-ai
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: bedrockbot
User-agent: Brightbot 1.0
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
User-agent: Cotoyogi
User-agent: Crawlspace
User-agent: Diffbot
User-agent: DuckAssistBot
User-agent: EchoboxBot
User-agent: FacebookBot
User-agent: Factset_spyderbot
User-agent: FirecrawlAgent
User-agent: FriendlyCrawler
User-agent: Google-CloudVertexBot
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: GPTBot
User-agent: iaskspider/2.0
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: ISSCyberRiskCrawler
User-agent: Kangaroo Bot
User-agent: meta-externalagent
User-agent: Meta-ExternalAgent
User-agent: meta-externalfetcher
User-agent: Meta-ExternalFetcher
User-agent: MistralAI-User/1.0
User-agent: MyCentralAIScraperBot
User-agent: NovaAct
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: Operator
User-agent: PanguBot
User-agent: Panscient
User-agent: panscient.com
User-agent: Perplexity-User
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: PhindBot
User-agent: Poseidon Research Crawler
User-agent: QualifiedBot
User-agent: QuillBot
User-agent: quillbot.com
User-agent: SBIntuitionsBot
User-agent: Scrapy
User-agent: SemrushBot
User-agent: SemrushBot-BA
User-agent: SemrushBot-CT
User-agent: SemrushBot-OCOB
User-agent: SemrushBot-SI
User-agent: SemrushBot-SWA
User-agent: Sidetrade indexer bot
User-agent: TikTokSpider
User-agent: Timpibot
User-agent: VelenPublicWebCrawler
User-agent: Webzio-Extended
User-agent: wpbot
User-agent: YandexAdditional
User-agent: YandexAdditionalBot
User-agent: YouBot
Disallow: /journal
Disallow: /writing
Disallow: /temp/
Disallow: /stuff/
Disallow: /utils/
Disallow: /usage/
Disallow: /cgi-bin/
Disallow: /selling/
Disallow: /ebay/
Disallow: /flash/bigimage.php
Disallow: /flash/image.php
Disallow: /flash/drzoom.cgi
Disallow: /flash/drzoom.php
Disallow: /journal/archives
Disallow: /journal/wp-comments-popup.php
Disallow: /journal/wp-commentsrss2.php
Disallow: /journal/wp-trackback.php
Disallow: /journal/wp-login.php
Disallow: /journal/wp-mobile.php
Disallow: /mirror/
Disallow: /latest.html

User-agent: *
Disallow: /temp/
Disallow: /stuff/
Disallow: /utils/
Disallow: /usage/
Disallow: /cgi-bin/
Disallow: /selling/
Disallow: /ebay/
Disallow: /flash/bigimage.php
Disallow: /flash/image.php
Disallow: /flash/drzoom.cgi
Disallow: /flash/drzoom.php
Disallow: /journal/archives
Disallow: /journal/wp-comments-popup.php
Disallow: /journal/wp-commentsrss2.php
Disallow: /journal/wp-trackback.php
Disallow: /journal/wp-login.php
Disallow: /journal/wp-mobile.php
Disallow: /mirror/
Disallow: /latest.html
# URLs that keep getting requested despite being invalid. Causes include:
# - Broken spiders that don't handle links/image maps/scripts/base correctly
# - Typos in links on other sites
# - Old removed files that don't need redirects
# - Typos and bugs on this site that have since been corrected
# - Most importantly, crawlers that won't remove bad URLs from their database.
Disallow: /journal/b2login
Disallow: /journal/b2comments
Disallow: /flash/%28
Disallow: /flash/0
Disallow: /flash/1
Disallow: /flash/2
Disallow: /flash/3
Disallow: /flash/4
Disallow: /flash/5
Disallow: /flash/6
Disallow: /flash/7
Disallow: /flash/8
Disallow: /flash/9
# Google has picked up a strange typo... and Apache just serves up the file,
# relative links and all.
Disallow: /humor/comiccon2004.phtml/
# Googlebot is trying to load the Social login buttons. This should take care of it.
Disallow: /journal/index.php?social_controller