Background
While working on LLM-for-software-security research, I hit a core pain point in PentestGPT: in multi-step penetration tests, the model tends to “forget” earlier context, producing incoherent commands and redundantly re-exploring paths it has already covered.
Approach
Two modules were introduced to improve context retention:
1. RAG (Retrieval-Augmented Generation)
Store historical pentest steps and discovered vulnerability info in a vector database. Before generating each new command, retrieve relevant history and inject it into the prompt context.
```python
# Retrieve the k most relevant past steps from the vector store
# (LangChain-style API) and inject them into the prompt context.
results = vector_store.similarity_search(current_state, k=5)
context = "\n".join([r.page_content for r in results])
prompt = f"History context:\n{context}\n\nCurrent task: {task}"
```
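To make the retrieve-and-inject pattern concrete without depending on a specific vector database, here is a minimal, self-contained sketch. The bag-of-words embedding and the sample findings are purely illustrative assumptions; a real deployment would use a proper embedding model and vector store:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding" (illustrative only); a real setup
    # would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector database of past pentest steps."""
    def __init__(self):
        self.docs = []  # list of (embedding, text) pairs

    def add(self, text):
        self.docs.append((embed(text), text))

    def similarity_search(self, query, k=5):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Hypothetical findings from earlier steps
store = ToyVectorStore()
store.add("nmap scan found port 22 open (OpenSSH 7.2) on 10.0.0.5")
store.add("gobuster found /admin directory on the web server")
store.add("port 80 runs Apache 2.4.18, possibly vulnerable")

# Before generating the next command, pull the most relevant history
history = store.similarity_search("brute force ssh on port 22 of 10.0.0.5", k=2)
prompt = "History context:\n" + "\n".join(history) + "\n\nCurrent task: gain SSH access"
```

The key design point is the same as in the real pipeline: retrieval happens on every step, so the prompt always carries the few most relevant findings instead of the full (and ever-growing) history.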
2. Knowledge Graph Reasoning
Model the network topology, discovered services, and CVE relationships as a graph, allowing the model to perform multi-hop reasoning when planning attack paths — rather than starting from scratch each step.
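The multi-hop idea can be sketched with a plain adjacency-list graph and a BFS path search. The hosts, services, and CVE IDs below are illustrative assumptions, and a real system would likely use a graph database or a library like networkx:

```python
from collections import deque

# Hypothetical attack-knowledge graph: nodes are hosts, services, and CVEs;
# edges encode "reaches", "runs", and "vulnerable_to" relations.
graph = {
    "attacker":      ["web-01"],
    "web-01":        ["apache-2.4.18", "db-01"],  # service it runs + host it reaches
    "apache-2.4.18": ["CVE-2017-15715"],
    "db-01":         ["mysql-5.5"],
    "mysql-5.5":     ["CVE-2016-6662"],
}

def attack_path(graph, start, target):
    """BFS: shortest chain of hops from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Multi-hop reasoning: how can the attacker reach the MySQL CVE?
path = attack_path(graph, "attacker", "CVE-2016-6662")
# path == ["attacker", "web-01", "db-01", "mysql-5.5", "CVE-2016-6662"]
```

Because the graph persists across steps, each new plan extends known paths rather than rediscovering the topology from scratch.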
Takeaways
- RAG noticeably reduces redundant probing
- Knowledge graphs show greater advantage in complex network scenarios
- Local deployment via Ollama enables fully offline operation — useful for sensitive testing environments
Lots of room for improvement, more experiments to come 🔧