Navigating the Future of Federal Recordkeeping – How NARA Should Handle AI-Generated Records and Decisional Models

By Greg Godbout

As artificial intelligence (AI) becomes deeply embedded in how the U.S. federal government operates, questions around records management are gaining urgency. A recent article from the Texas State Library and Archives Commission reminds agencies that AI-assisted records must be treated like any other record—classified, retained, and managed under existing frameworks.

This insight is especially important for the National Archives and Records Administration (NARA), which oversees how federal agencies create, manage, and preserve records. As agencies deploy AI tools—such as machine learning models, generative language tools, and automated decision systems—NARA must evolve its guidance and oversight accordingly.

NARA’s Core Role in the Federal Recordkeeping Ecosystem

NARA is responsible for ensuring that all federal records—regardless of their format or method of creation—are appropriately managed. Under the Federal Records Act (FRA), records include “all recorded information… made or received by a Federal agency… in connection with the transaction of public business.”

That means AI-generated materials—forecasts, summaries, chat responses, even policy recommendations—are considered records if they contribute to government operations or decisions.

AI-Generated Outputs Are Federal Records

Following the Texas State Library’s logic, outputs created with AI (e.g., via tools like ChatGPT, Claude, or Gemini) must be retained if:

  • They support decisions or actions.
  • They represent work done on behalf of an agency.
  • They document public policy or agency processes.

NARA should explicitly reaffirm this in updated guidance, stating that outputs from AI tools fall within existing record definitions.

Recordkeeping Requirements for AI Models

When agencies use AI models for tasks like forecasting, eligibility screening, or risk scoring, NARA should ensure the models themselves are recorded as systems of decision logic.

Essential record elements include:

  • Model Description: Purpose, objectives, and scope.
  • Training Data Sources: Including curation steps and exclusions.
  • Model Artifacts: Code, parameters, and version logs.
  • Testing & Validation: Performance, accuracy, bias audits.
  • Change Logs: Modifications, retraining events, and rationale.
  • Operational Logs: Use cases and outcomes over time.
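To make these elements concrete, here is a minimal sketch of what a machine-readable model record might look like in Python. The ModelRecord structure, field names, and example values are illustrative assumptions, not a NARA-prescribed schema:

    from dataclasses import dataclass, field, asdict
    from datetime import datetime, timezone
    import json

    @dataclass
    class ModelRecord:
        """Hypothetical record capturing the essential elements listed above."""
        model_name: str
        purpose: str                      # Model Description: purpose, objectives, scope
        training_data_sources: list[str]  # Sources, curation steps, and exclusions
        artifact_uri: str                 # Pointer to code, parameters, and version logs
        version: str
        validation_results: dict          # Performance, accuracy, and bias-audit metrics
        change_log: list[str] = field(default_factory=list)  # Modifications, retraining, rationale
        recorded_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    # Example: serialize a record for transfer to an agency recordkeeping system.
    record = ModelRecord(
        model_name="eligibility-screener",
        purpose="Flag benefit applications for manual review",
        training_data_sources=["FY2022 application data (PII removed)"],
        artifact_uri="s3://agency-models/eligibility-screener/v1.3",
        version="1.3",
        validation_results={"accuracy": 0.91, "bias_audit": "passed 2024-Q2 review"},
    )
    print(json.dumps(asdict(record), indent=2))

Capturing these elements in a structured, serializable form means the record can travel with the model across retraining cycles and system migrations.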

This helps ensure transparency, reproducibility, and alignment with federal guidance such as OMB’s Guidance for Regulation of Artificial Intelligence Applications (M-21-06), the NIST AI Risk Management Framework, and the AI in Government Act of 2020.

Managing Prompts and Outputs from LLMs

Prompt-based tools like ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google), LLaMA (Meta), Command R (Cohere), and Mistral present unique recordkeeping challenges. Since their responses are dynamically generated, agencies must preserve:

  • The Prompt: Especially if it initiated work on policy, research, or public communication.
  • The Output: When relied upon in decision-making or communication.
  • Contextual Metadata: Session info, user ID, model version, and date/time.

Agencies using such tools should log sessions and outputs using secure APIs or enterprise platforms to maintain continuity and auditability.
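As a rough illustration, session logging of this kind might look like the following Python sketch. The call_model function is a placeholder for an agency-approved endpoint, and the field names are assumptions rather than any vendor’s actual API:

    import json
    import uuid
    from datetime import datetime, timezone

    def call_model(prompt: str) -> tuple[str, str]:
        """Placeholder for the agency-approved LLM endpoint; returns (output, model_version)."""
        raise NotImplementedError("Wire this to your agency's enterprise LLM platform.")

    def logged_completion(prompt: str, user_id: str, log_path: str = "llm_sessions.jsonl") -> str:
        """Call the model, then append prompt, output, and contextual metadata to a session log."""
        output, model_version = call_model(prompt)
        entry = {
            "session_id": str(uuid.uuid4()),
            "user_id": user_id,
            "model_version": model_version,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,   # retain if it initiated work on policy, research, or communication
            "output": output,   # retain when relied upon in decision-making
        }
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")  # JSON Lines: one append-only record per call
        return output

Writing each session as a JSON Lines entry keeps the log append-only and straightforward to transfer into an agency records system.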

AI Making Decisions Without Humans in the Loop

When AI systems operate autonomously—approving applications, issuing warnings, or flagging noncompliance—NARA must ensure agencies document:

  • The Decision Event: Input data, timestamp, and action taken.
  • Automated vs. Human Override: If any occurred.
  • Decision Logic: How the AI aligns with policy or legal frameworks.
  • Error and Appeal Logs: Accuracy, rejections, overrides, or complaints.

This becomes vital in systems where rights, services, or funding are affected—such as in benefits determination or public safety applications.
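For illustration, the decision elements above could be captured as an auditable event record. The following Python sketch is a hypothetical structure, not an established standard; the DecisionEvent fields and example values are assumptions:

    from dataclasses import dataclass, field, asdict
    from datetime import datetime, timezone
    from typing import Optional
    import json

    @dataclass
    class DecisionEvent:
        """Hypothetical record of one autonomous decision, supporting audit and appeal trails."""
        system_name: str
        input_data: dict              # The Decision Event: inputs the system acted on
        action_taken: str             # e.g., "approved", "flagged", "rejected"
        policy_reference: str         # Decision Logic: traceability to policy or legal framework
        human_override: bool = False  # Automated vs. Human Override
        override_reason: Optional[str] = None
        timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    event = DecisionEvent(
        system_name="benefits-eligibility-v2",
        input_data={"application_id": "A-1002", "income_verified": True},
        action_taken="approved",
        policy_reference="Agency Directive 45-3, Section 2",
    )
    print(json.dumps(asdict(event), indent=2))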

A Proposed Framework for NARA AI Recordkeeping

Use Case                        | Examples                                   | Required Records
--------------------------------|--------------------------------------------|-------------------------------------------------------
AI-Generated Outputs            | Summaries, decisions, communications       | Prompt, output, metadata
Predictive & ML Models          | Risk scores, fraud detection               | Model code, training data, test results, documentation
Prompt-Based LLM Sessions       | ChatGPT, Gemini, Claude                    | Full session logs, context, outputs
Automated Decisions (No Human)  | Eligibility approvals, compliance scoring  | Input/output logs, decision maps, policy traceability

Conclusion: Why NARA Must Lead

As more federal agencies adopt AI, NARA must modernize its guidance to ensure AI-produced knowledge is as traceable and accountable as human work. By extending existing frameworks, introducing new model retention standards, and providing clarity on generative tools, NARA can secure the integrity of public records in the algorithmic age.

As the Texas guidance rightly puts it, “AI-generated records are still records.” Now it’s NARA’s turn to define what that means for the future of federal governance.

About Greg Godbout

Greg Godbout is the CEO of Flamelit, a data science and AI/ML consultancy. He was formerly the Chief Growth Officer at Fearless and, before that, the Chief Technology Officer (CTO) and U.S. Digital Services Lead at the EPA. Greg was the first Executive Director and Co-Founder of 18F, a 2013 Presidential Innovation Fellow, a Day One Accelerator Fellow, a GSA Administrator’s Award recipient, and a Federal 100 and FedScoop 50 award recipient. He holds a master’s in Management of IT from the University of Virginia and a master’s in Business Analytics and AI from NYU.
