Are We Open? — Junghwan Park

The motivation is concrete and civic. When a citizen asks an AI assistant how to apply for unemployment benefits, the assistant needs to read the responsible ministry's current pages to answer correctly. If that site blocks AI crawlers in robots.txt, the assistant is left guessing from stale or wrong information. The project reframes "is your site AI-accessible" as a measurable public-accountability question and answers it transparently across the whole apparatus of Korean government: 19 ministries, the agencies and commissions under them, independent and constitutional bodies, the judiciary and prosecution, police, the legislature and regional councils, education offices, metropolitan governments and 227 basic local governments, and 344 public institutions. The list of institutions is sourced from the Government Organization Act and the ALIO public-institution registry.

The evaluation has three stages. In the first, a Node.js analyzer collects each site's robots.txt, llms.txt, and sitemap.xml, and inspects the homepage for metadata and technical accessibility (HTTPS, response speed, server-side rendering). A robots parser checks a specific set of AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, CCBot, and Bytespider) against reference crawlers. The crucial second stage is a browser-based deep analysis: it renders each homepage in a real browser so that client-side-rendered sites are handled, detects the press-release, notice, and policy links in the DOM, follows each link through its redirects to the final URL, and then judges whether that destination is disallowed by robots.txt. The worked example is the Health Ministry, whose press-release link redirects to a board endpoint that robots.txt disallows, so the content is ruled blocked even though the homepage itself is open. The third stage scores each site out of 100 points across six weighted categories, with hard ceilings: a blanket disallow caps a site at 25 points, and content paths found blocked in deep analysis score zero for that category.
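The deep-analysis verdict and the two scoring ceilings can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's code: the project uses a Node.js analyzer, while this sketch swaps in Python's standard `urllib.robotparser`. The robots.txt content, the `ministry.example` URLs, the function names, and the `content_access` category key are all hypothetical. The actual six categories and their weights are not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# AI crawler user agents named in the text.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "Google-Extended", "CCBot", "Bytespider",
]


def blocked_crawlers(robots_txt, final_url):
    """Return the AI crawlers that robots.txt forbids from fetching final_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS if not parser.can_fetch(ua, final_url)]


def apply_ceilings(category_scores, blanket_disallow, content_blocked,
                   content_key="content_access"):
    """Apply the two hard ceilings from the text to per-category scores.

    category_scores maps a category name to its points; the names and the
    content_key default are assumptions made for this sketch.
    """
    scores = dict(category_scores)
    if content_blocked:
        scores[content_key] = 0      # blocked content paths score zero
    total = sum(scores.values())
    if blanket_disallow:
        total = min(total, 25)       # blanket disallow caps the site at 25
    return total


# Hypothetical robots.txt: homepage open, board endpoint closed to everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /board/
"""

homepage = "https://ministry.example/"
# Hypothetical URL reached after following the press-release link's redirects.
press_final = "https://ministry.example/board/list.do"

homepage_blocked = blocked_crawlers(ROBOTS_TXT, homepage)
press_blocked = blocked_crawlers(ROBOTS_TXT, press_final)
```

In this sketch `homepage_blocked` is empty while `press_blocked` lists every crawler, which mirrors the Health Ministry pattern: the check has to run against the post-redirect URL, because a check of the homepage alone would report the site as open.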

The findings are sobering and grounded in the published data. Across 734 institutions the mean score is 56 out of 100, and the grade distribution is bottom-heavy: 8 A+, 50 A, 197 B+, 151 B, 105 C, 218 D, and 5 F, with D the single largest grade. Ninety-four institutions block all crawlers outright with a blanket robots.txt disallow, and named ministries block their press releases and notices at the board-endpoint level. The site itself models the openness it advocates, serving its own permissive robots.txt, an llms.txt, and a sitemap.
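The figures above can be cross-checked with a little arithmetic. The sketch below uses only the counts quoted in the text. The variable names are illustrative, and grouping C, D, and F as the "low" band is an assumption made here, not a definition from the project.

```python
# Grade counts as quoted in the text.
GRADE_COUNTS = {"A+": 8, "A": 50, "B+": 197, "B": 151, "C": 105, "D": 218, "F": 5}
BLANKET_DISALLOW = 94  # institutions blocking all crawlers outright

total = sum(GRADE_COUNTS.values())  # 734, matching the stated institution count

# Share of each grade, in percent, rounded to one decimal place.
shares = {grade: round(100 * n / total, 1) for grade, n in GRADE_COUNTS.items()}

# Assumed grouping for illustration: C, D, and F together as the low band.
low_band = sum(GRADE_COUNTS[g] for g in ("C", "D", "F"))
low_share = round(100 * low_band / total, 1)

# Share of institutions with a blanket robots.txt disallow.
blanket_share = round(100 * BLANKET_DISALLOW / total, 1)

print(total, shares, low_share, blanket_share)
```

The counts sum to the stated 734, and D alone accounts for roughly three in ten institutions.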

As honest context, this is a snapshot: version 1, dated to its scan, and reproducible through a documented re-analysis procedure. It is not a live continuous monitor. The site is a static front end of plain HTML, CSS, and JavaScript on GitHub Pages over a JSON and CSV dataset, with the heavy lifting done in offline Node and browser-automation scripts. The scoring weights are a deliberate editorial judgment about what matters for AI accessibility, not an official standard, and browser-based link detection can miss sites with unusual navigation, so the deep-analysis verdicts are best read as well-evidenced indicators rather than infallible per-site rulings.