GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents
GateMem is a benchmark that evaluates LLM agents deployed in multi-user institutional settings (hospitals, offices, schools) against three competing goals: utility for legitimate requests, role-based access control, and reliable data deletion. None of the methods tested achieves all three at once, exposing a critical gap that needs closing before real institutional deployment.
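To make the three goals concrete, here is a minimal Python sketch of a shared memory store with role-gated reads and owner-scoped deletion. It is an illustration only: `MemoryEntry`, `SharedMemory`, the role names, and the sample records are all hypothetical and are not taken from GateMem's tasks, data, or evaluation harness.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    owner: str                 # principal who wrote the entry (hypothetical field)
    text: str                  # stored content
    allowed_roles: frozenset   # roles permitted to read the entry


@dataclass
class SharedMemory:
    entries: list = field(default_factory=list)

    def write(self, owner, text, allowed_roles):
        self.entries.append(MemoryEntry(owner, text, frozenset(allowed_roles)))

    def read(self, role):
        # Access control: return only the entries this role may see.
        return [e.text for e in self.entries if role in e.allowed_roles]

    def delete_owner(self, owner):
        # Deletion: drop every entry written by this principal
        # and report how many entries were removed.
        before = len(self.entries)
        self.entries = [e for e in self.entries if e.owner != owner]
        return before - len(self.entries)


# Invented sample data, for illustration only.
memory = SharedMemory()
memory.write("patient_17", "allergic to penicillin", {"doctor", "nurse"})
memory.write("patient_17", "billing address on file", {"billing"})
memory.write("patient_22", "follow-up in two weeks", {"doctor"})

doctor_view = memory.read("doctor")      # utility: a legitimate role gets its data
billing_view = memory.read("billing")    # access control: no clinical notes leak
removed = memory.delete_owner("patient_17")
doctor_after = memory.read("doctor")     # deletion: patient_17 entries are gone
```

In this toy store the three properties are easy to satisfy together because memory is an explicit list with explicit permissions. An LLM agent's memory may also live in summaries, embeddings, or conversation context, which is plausibly where the tension among the three goals arises.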
Why it matters
First systematic benchmark for memory governance in shared-agent deployments; directly relevant to enterprise safety and compliance as agentic systems enter regulated environments
Importance: 3/5
63 upvotes on Hugging Face (HF) Daily Papers; a novel benchmark for a previously unmeasured safety dimension of production agents