Unleashing autonomous agents: Is the power of brainy bots worth it?

Autonomous agents (ChatGPT on steroids) have taken the world by storm with their promise of being a more intelligent, high-agency version of traditional agents. The concept is fairly simple:

Traditional agents (ChatGPT) use an LLM as a reasoning engine, tools (APIs, internet search) for execution and memory for context. Here, completing a task requires back-and-forth interaction between a human (or primary API) and the agent, which means a) the tasks stay relatively simple and b) the agent needs constant external feedback.

Auto agents (AutoGPT, BabyAGI, Jarvis by Microsoft) are AI agents that can generate their own prompts (self-prompting) instead of relying on human feedback or inputs, making them autonomous thinkers and task executors. This powerful theme gives them the ability to:

Process complex tasks by breaking them into sub-tasks and identifying a method to solve each
Create better quality output, as the agent itself is responsible for iterating / triangulating
Execute tasks automatically if linked to APIs (permissionless execution)
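
To make the self-prompting idea concrete, here is a minimal sketch of the loop, not the actual AutoGPT or BabyAGI code. It assumes a hypothetical stub_planner function standing in for the LLM call; the task names and the max_steps cap are illustrative choices, not taken from any of these projects.

```python
from collections import deque
from typing import Callable, Deque, List


def stub_planner(task: str) -> List[str]:
    """Hypothetical stand-in for an LLM call that splits a task into sub-tasks.

    Returns an empty list when the task is small enough to execute directly.
    """
    canned = {
        "book dinner": ["find restaurants", "check availability", "send invites"],
    }
    return canned.get(task, [])


def run_agent(goal: str, plan: Callable[[str], List[str]], max_steps: int = 10) -> List[str]:
    """Run a self-prompting loop and return the leaf tasks in execution order."""
    queue: Deque[str] = deque([goal])
    executed: List[str] = []
    # The step cap is a guard against the agent prompting itself forever.
    for _ in range(max_steps):
        if not queue:
            break
        task = queue.popleft()
        subtasks = plan(task)
        if subtasks:
            # Self-prompt: the agent's own output becomes its next inputs.
            queue.extend(subtasks)
        else:
            # Leaf task: a real agent would call a tool or API here.
            executed.append(task)
    return executed


if __name__ == "__main__":
    print(run_agent("book dinner", stub_planner))
```

The key design point is that the queue is fed by the agent's own output rather than by a human, which is also why some form of step limit is needed.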

Because an auto agent can self-prompt and run a recursive loop of prompts, a superior version of it could theoretically think, learn and act on its own feedback. E.g.,

ChatGPT: Help me find a Michelin 3-star restaurant in New York
AutoGPT: Help me find a Michelin 3-star restaurant in New York, search for availability, book a table for the week of 15th May and send a mail invite to John and Peter

Use cases: Self-prompting gives auto agents the ability to solve complex problems and to “think” for themselves (deciding which sub-tasks to execute, prioritizing, critiquing) with limited human handholding. Currently most applications seem like magic but are still early-version toys, restricted to personal assistants that help with research work (what positioning should a new-age IT services company in India have?) or shopping automation. Variations of these are being built specifically for channels like Slack, Discord or web browsing, but the use cases are still limited.

Is it worth the hype? The concept of self-prompting is a powerful one, as it gives agents “high agency” and thinking capability, but the current models haven’t been effective. With a success rate below 15% on complex tasks, they are plagued with challenges:

Foundational models (GPT 3.5/ 4) are trained on generic, comprehensive data and give probabilistic rather than deterministic results. Additionally, these models often hallucinate. In AutoGPT’s recursive loop, this leads to hallucinated answers compounding and to the risk of getting stuck in an infinite loop.
High cost resulting from the large volume of API calls to LLMs
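
A rough back-of-the-envelope sketch shows why both challenges bite. It assumes, purely for illustration, that each step in a chain succeeds independently with the same probability; the per-step accuracy, call counts and token price below are hypothetical numbers, not measurements from AutoGPT or any provider's price list.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes steps are independent and equally reliable (a simplification).
    """
    return step_accuracy ** steps


def run_cost(calls: int, tokens_per_call: int, usd_per_1k_tokens: float) -> float:
    """Total LLM spend for one agent run, given a flat per-token price."""
    return calls * tokens_per_call / 1000 * usd_per_1k_tokens


if __name__ == "__main__":
    # Hypothetical: 90% per-step accuracy over a 20-step task.
    print(f"chain success: {chain_success(0.9, 20):.1%}")
    # Hypothetical: 50 calls of 4,000 tokens each at $0.03 per 1k tokens.
    print(f"run cost: ${run_cost(50, 4000, 0.03):.2f}")
```

Under these assumptions, even a model that is right 90% of the time at each step finishes a 20-step task correctly only about 12% of the time, and every extra step adds to the bill.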

But let’s not forget that it’s been only five weeks since the release of these projects and they’ve been the fastest to reach 100k+ stars on GitHub. The AutoGPT hype is not unreasonable; it is forward-looking, and it reflects how swiftly the project was created and how powerful the theme of self-prompting (AI-based thinking) is!

What can we expect going forward? Everything said at this stage is calculated crystal-ball gazing. Nevertheless, auto agent use cases should only get stronger as we see more specialization (LLMs trained for specific tasks on focused vertical datasets), which in turn would improve model accuracy and efficiency. Some potential applications:

Consumer focused: Enabling complex decision making or automating mundane, repetitive tasks

For example, wealth management for the mass / mass affluent segment has been a big, hairy, unsolved problem that has failed to scale because of the high cost of servicing (high-touch trust building is critical for enabling large AUM). Auto agents can theoretically be trained on stock market fundamentals and technicals under a specific investment philosophy, personalized to your risk-return appetite, to deterministically suggest what each portfolio should look like. Cognosys / AutoGPT already do a half-decent job at this!
Purchases in categories where supply isn’t differentiated and performance (pricing, availability, seller rating, delivery timing) can be easily quantified, e.g., Aashirvaad Atta at the cheapest price delivered within 2 days, an Uber at the cheapest price arriving within 10 mins with a rating of >4, or a pizza with jalapenos within 45 mins at <INR 300 from a place rated >4.5. If done accurately and with limited friction, this solution has the potential to drive a platform shift based on 10x better convenience / time saving
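
The shopping example above boils down to filtering offers on hard constraints and then taking the cheapest. Here is a minimal sketch of that selection step, assuming a hypothetical Offer record and made-up sellers; a real agent would first have to fetch these offers from each platform's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """Hypothetical record for one seller's offer."""
    seller: str
    price_inr: float
    rating: float
    eta_minutes: int


def pick_offer(offers: List[Offer], max_price: float,
               min_rating: float, max_eta: int) -> Optional[Offer]:
    """Return the cheapest offer priced below max_price, rated above
    min_rating and arriving within max_eta minutes, or None if none qualify."""
    eligible = [
        o for o in offers
        if o.price_inr < max_price
        and o.rating > min_rating
        and o.eta_minutes <= max_eta
    ]
    return min(eligible, key=lambda o: o.price_inr, default=None)


if __name__ == "__main__":
    # Made-up offers for the pizza example: <INR 300, rated >4.5, within 45 mins.
    offers = [
        Offer("Pizzeria A", 280, 4.6, 40),
        Offer("Pizzeria B", 250, 4.2, 30),  # fails the rating constraint
        Offer("Pizzeria C", 295, 4.8, 35),
    ]
    print(pick_offer(offers, max_price=300, min_rating=4.5, max_eta=45))
```

The selection logic itself is trivial; the hard part for an agent is gathering accurate, comparable offer data across platforms.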

Enterprise focused: Specialized agents adept at replacing entry-level staff across high-frequency, low-complexity jobs, e.g., market research, social media analysis, customer service, legal, accounting and what not! ChatGPT has been seen more as an enabler of employee productivity than as a replacement, a view that auto agents can potentially challenge (driven by their thinking abilities, potential to solve complex tasks and permissionless execution). We are already seeing early signs of marketplaces for agents. One early but promising specialized example is AutoRPG (based on BabyAGI), which is automating game development (e.g., level creation in mid/ hard core games) in a faster (5 mins vs 24 hrs), cheaper (<$1 vs $25-100) and equally creative way!

I am excited about the next wave of improvements and the applications that can unfold. The purpose here was to spark a discussion rather than give deterministic answers on a space that’s moving at non-deterministic lightning speed. Thoughts are welcome!

Authors

Priyal Motwani