It's been a wild few months building out Ava, my Telegram assistant. I wanted to share where things stand, what I've learned, and some patterns that have really paid off.

This is a follow-up to my earlier post on Ava's architecture, if you want to see how things started.

The Big Changes

Switched to Kimi K2.5
I moved off my previous AI provider and I'm now running on Kimi K2.5. The quality jump is noticeable, and at $0.60 per million input tokens, the pricing is pretty reasonable. It's been rock solid for both quick responses and longer reasoning tasks.

Supermemory for the Win
I integrated supermemory.ai for memory, RAG, user profiles, connectors, and extractors. It's super plug-and-play and let me move away from my limited "Memory as JSON" structure. Having semantic search for past conversations has changed how the bot feels. Context actually works now.
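To give a sense of the interface shape, here's a minimal sketch of the write/search calls the bot makes. The base URL, endpoint paths, and payload fields below are illustrative placeholders, not the actual supermemory API:

```python
import httpx

# Placeholder endpoint: stands in for whatever memory service you use.
BASE_URL = "https://api.example-memory.dev"

async def remember(client: httpx.AsyncClient, user_id: str, text: str) -> None:
    # Store a piece of conversation for later semantic retrieval.
    await client.post(f"{BASE_URL}/memories",
                      json={"user": user_id, "content": text})

async def recall(client: httpx.AsyncClient, user_id: str, query: str) -> list[str]:
    # Semantic search over everything previously stored for this user.
    resp = await client.post(f"{BASE_URL}/search",
                             json={"user": user_id, "q": query})
    return [hit["content"] for hit in resp.json()["results"]]
```

The key design point is that the bot only ever sees a handful of relevant snippets per turn instead of a giant JSON blob of everything it knows.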

Architecture Patterns That Work

Queue Per User
I use asyncio queues for message processing, one queue per user. This prevents race conditions and ensures conversations flow sequentially. No more weird overlapping responses when someone sends multiple messages quickly.
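Here's a minimal sketch of the pattern; handle_message is a stand-in for the real pipeline (model call, Telegram reply, and so on):

```python
import asyncio

queues: dict[int, asyncio.Queue] = {}
workers: dict[int, asyncio.Task] = {}

async def handle_message(user_id: int, text: str) -> None:
    ...  # call the model, send the Telegram reply, etc.

async def worker(user_id: int) -> None:
    # Drain this user's queue forever; messages are handled one at a time.
    queue = queues[user_id]
    while True:
        text = await queue.get()
        try:
            await handle_message(user_id, text)
        finally:
            queue.task_done()

async def enqueue(user_id: int, text: str) -> None:
    # Lazily create the queue and worker the first time a user writes in.
    if user_id not in queues:
        queues[user_id] = asyncio.Queue()
        workers[user_id] = asyncio.create_task(worker(user_id))
    await queues[user_id].put(text)
```

Each user's messages are strictly ordered, but different users still run fully concurrently.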

Skill Registry Pattern
Skills are markdown files with YAML frontmatter in a skills/ directory. I load metadata cheaply and only pull full content when needed. I also track which skills were used in each conversation turn, which helps with debugging and optimization.
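A minimal sketch of the lazy-loading idea, assuming the standard ---delimited frontmatter layout and PyYAML; the function names are mine, not Ava's actual code:

```python
from pathlib import Path
import yaml  # PyYAML

SKILLS_DIR = Path("skills")

def load_skill_metadata() -> dict[str, dict]:
    # Parse only the YAML frontmatter of each skill file; skip the body.
    registry = {}
    for path in SKILLS_DIR.glob("*.md"):
        text = path.read_text(encoding="utf-8")
        if text.startswith("---"):
            frontmatter = text.split("---", 2)[1]
            meta = yaml.safe_load(frontmatter) or {}
            registry[path.stem] = {"meta": meta, "path": path}
    return registry

def load_skill_body(registry: dict, name: str) -> str:
    # Pull the full markdown body only when the skill is actually invoked.
    text = registry[name]["path"].read_text(encoding="utf-8")
    return text.split("---", 2)[2] if text.startswith("---") else text
```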

Smart Memory Search
The bot only searches memory when a message contains questions or memory-related keywords. This prevents unnecessary API calls while still maintaining context when it actually matters.
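A sketch of the gating heuristic; the keyword list here is a hypothetical starting point, not the exact triggers I use:

```python
import re

# Hypothetical trigger phrases; tune these to how your users actually talk.
MEMORY_KEYWORDS = re.compile(
    r"\b(remember|recall|last time|you said|we talked|earlier|before)\b",
    re.IGNORECASE,
)

def should_search_memory(message: str) -> bool:
    # Only hit the memory API for questions or memory-flavored phrases.
    return "?" in message or bool(MEMORY_KEYWORDS.search(message))
```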

Provider Abstraction
I built support for multiple AI providers with automatic fallback. Using OpenAI-compatible APIs wherever possible makes swapping providers much easier. When one goes down or gets expensive, I can route traffic elsewhere without rewriting everything.
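A minimal sketch of the fallback loop using the openai Python SDK, which works against any OpenAI-compatible endpoint; the base URLs, model IDs, and env var names below are illustrative:

```python
import os
from openai import AsyncOpenAI

# Preference-ordered providers. Base URLs, model IDs, and env var
# names are illustrative, not the exact values Ava uses.
PROVIDERS = [
    {"base_url": "https://api.moonshot.ai/v1",
     "api_key": os.getenv("MOONSHOT_API_KEY", ""),
     "model": "kimi-k2.5"},
    {"base_url": "https://api.openai.com/v1",
     "api_key": os.getenv("OPENAI_API_KEY", ""),
     "model": "gpt-4o-mini"},
]

async def chat(messages: list[dict]) -> str:
    last_error: Exception | None = None
    for provider in PROVIDERS:
        client = AsyncOpenAI(base_url=provider["base_url"],
                             api_key=provider["api_key"])
        try:
            resp = await client.chat.completions.create(
                model=provider["model"], messages=messages)
            return resp.choices[0].message.content
        except Exception as exc:  # timeouts, 5xx, rate limits, ...
            last_error = exc  # fall through to the next provider
    raise RuntimeError("all providers failed") from last_error
```

Because every provider speaks the same API shape, the fallback is just a loop over a config list rather than per-provider glue code.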

Configuration & Error Handling

Environment-Based Config
Everything lives in dotenv files. I also use feature flags, read from the environment at startup, to enable or disable integrations. This makes testing and deployment way cleaner.
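A minimal sketch, assuming python-dotenv; the variable names are hypothetical:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # pull .env into the process environment

def flag(name: str, default: bool = False) -> bool:
    # Feature flags are plain env vars, resolved once at startup.
    return os.getenv(name, str(default)).lower() in ("1", "true", "yes")

TELEGRAM_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]  # hypothetical var name
ENABLE_MEMORY = flag("ENABLE_MEMORY", True)
ENABLE_VOICE = flag("ENABLE_VOICE", False)
```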

Logging & Resilience
I log everything with timestamps and context. On orphaned tool_use errors, the bot resets state gracefully. For webhooks, I use a sliding window rate limiter per IP. It handles abuse without being overly aggressive.
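A minimal sliding-window limiter sketch; the window size and request cap are illustrative:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 30  # illustrative limits

_hits: dict[str, deque] = defaultdict(deque)

def allow_request(ip: str) -> bool:
    # True if this IP is still under the limit for the sliding window.
    now = time.monotonic()
    window = _hits[ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop timestamps that fell out of the window
    if len(window) >= MAX_REQUESTS:
        return False
    window.append(now)
    return True
```

Unlike fixed buckets, the sliding window can't be gamed by bursting right at a bucket boundary.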

What I'd Avoid Next Time

In-memory-only state: Use Redis if you're running multiple instances. I learned this the hard way (see the sketch after this list).

Loading full context every request: Cache your system instructions. Those tokens add up fast.

Single-threaded message processing: Use queues and async. The performance difference is massive.

Hardcoded configuration: Just use environment variables. You'll thank yourself later.
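Since the first point bit me hardest, here's a minimal sketch of Redis-backed per-user state using redis-py; the key naming and TTL are illustrative:

```python
import json
import redis  # redis-py

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def save_state(user_id: int, state: dict) -> None:
    # Shared state survives restarts and is visible to every instance.
    r.set(f"ava:state:{user_id}", json.dumps(state), ex=86400)

def load_state(user_id: int) -> dict:
    raw = r.get(f"ava:state:{user_id}")
    return json.loads(raw) if raw else {}
```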

One Big Lesson

Work with an AI to improve your test coverage and set up a proper testing framework. When the bot can run its own tests, and even write new ones, it's far less likely to break itself later. This has been a huge force multiplier for stability.
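For flavor, this is the kind of small, fast check that setup enables; the module name is hypothetical:

```python
# test_memory_gate.py
from memory_gate import should_search_memory  # hypothetical module

def test_questions_trigger_memory_search():
    assert should_search_memory("What did we talk about yesterday?")

def test_small_talk_skips_memory_search():
    assert not should_search_memory("good morning!")
```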

What's Next

I'm excited to keep pushing on voice capabilities and see how the memory system evolves. The possibilities are making my head spin.

Good luck with your own builds, and have fun!