Microfusion Technology: Focusing on Customer Pain Points, Driving Tangible AI Benefits with Proof of Concept
At Microfusion Technology, we deeply understand that AI has progressively transformed from a stage of potential-filled experimentation into a crucial driving force capable of creating real impact on enterprise core business. Therefore, when we assist clients with Proof of Concept (POC), our focus is not merely “What can AI do?” but more importantly, “How well can AI perform?” — because this directly relates to whether the client’s AI adoption can bring substantial business transformation benefits.
To address the urgent need for AI adoption in enterprises, Microfusion Technology offers a comprehensive set of rigorous evaluation services focused on actual customer pain points. We not only provide professional POC validation but also conduct detailed interviews and Q&A sessions to fully understand your unique challenges and expectations. All of this is aimed at ensuring your AI applications deliver high quality, high reliability, and strong security, which is essential to any digital transformation strategy today.
To guide you toward success, you need a complete evaluation mechanism: a beacon that continuously verifies your direction throughout the entire development lifecycle. From carefully designing prompts and selecting the most suitable models, to deciding whether fine-tuning is worthwhile, and even evaluating complex AI agents, a robust evaluation service provides the key answers.
In addition to having Microfusion Technology assist with POC implementation, you can also run initial evaluations yourself with Google Cloud's Generative AI evaluation service. The service can evaluate a wide range of models, covering Google's foundation models, open-source models, proprietary foundation models, and even custom models. It supports online evaluation with both pointwise and pairwise criteria, using computation-based metrics as well as autorater (model-based) methods. This article takes a closer look at the new features of the Generative AI evaluation service, which are designed to help you scale your evaluations, evaluate your autoraters, customize your autoraters with rubrics, and evaluate your agents in production environments.

Framework for Evaluating Generative AI
1. Scaling Your Evaluations with Generative AI Batch Evaluation
One of the most pressing questions for AI developers is: “How do I perform evaluations at scale?” In the past, large-scale evaluation could be resource-intensive, difficult to maintain, and costly. You had to build your own batch evaluation pipelines by combining multiple Google Cloud services. The new batch evaluation feature simplifies this process, providing a single API for large datasets. This means you can efficiently evaluate large amounts of data, supporting all methods and metrics of the Generative AI evaluation service in Vertex AI. It is designed to be cheaper and more efficient than previous methods. You can learn more about how to perform batch evaluation using the Gemini API in Vertex AI through this tutorial.
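To make this concrete, below is a minimal sketch of what a scaled evaluation run can look like with the Vertex AI GenAI evaluation SDK in Python (the vertexai.evaluation module in recent google-cloud-aiplatform releases). The project ID, dataset rows, and metric choices are placeholders, and the dedicated batch-evaluation API may expose options beyond the standard EvalTask interface shown here.

```python
# Minimal sketch: evaluating a dataset of prompt/response pairs at scale.
# Project ID, dataset contents, and metric choices are illustrative only.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="your-project-id", location="us-central1")

# In a real batch run this DataFrame would typically be thousands of rows
# exported from BigQuery or Cloud Storage rather than defined inline.
eval_df = pd.DataFrame(
    {
        "prompt": ["Summarize our refund policy.", "Draft a weekly status update."],
        "response": ["Refunds are issued within 14 days...", "The project is on track..."],
    }
)

eval_task = EvalTask(
    dataset=eval_df,
    metrics=[
        MetricPromptTemplateExamples.Pointwise.FLUENCY,
        MetricPromptTemplateExamples.Pointwise.COHERENCE,
    ],
    experiment="batch-eval-demo",  # results are tracked in Vertex AI Experiments
)

result = eval_task.evaluate()
print(result.summary_metrics)       # aggregate scores across the whole dataset
print(result.metrics_table.head())  # per-row scores and autorater explanations
```

Because every row is scored in a single evaluate() call and logged to the same experiment, you can rerun the task as your dataset grows and compare runs side by side.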
2. Scrutinizing Your Autoraters and Building Trust
A common and critical concern we often hear from developers is: “How can I customize and truly evaluate my autoraters?” While using Large Language Models (LLMs) to evaluate LLM-based applications offers scale and efficiency, it also raises legitimate questions about their limitations, robustness, and potential biases. The fundamental challenge lies in building trust in their results. We believe trust does not come from thin air; it is built through transparency and control, and the service's features are designed to let you rigorously scrutinize and refine your autoraters through two key capabilities.

First, you can evaluate the quality of your autorater. By creating a baseline dataset of human-rated examples, you can directly compare the autorater’s judgments with your “ground truth.” This allows you to calibrate its performance, measure its alignment with your expectations, and clearly identify areas for improvement.

Second, you can actively improve its alignment. Several methods are available to customize the autorater’s behavior: you can refine its prompts with specific criteria, chain-of-thought reasoning, and detailed scoring guidelines. Advanced settings, plus the ability to bring your own reference data and fine-tune the autorater, ensure it meets your specific needs and captures unique use cases.
Please refer to the official documentation series on advanced judge model customization for more information on how to evaluate and configure the judge model. For a practical example, here is a tutorial on how to customize evaluations using the Vertex AI Generative AI evaluation service.
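As a practical starting point, here is a minimal sketch of both ideas, assuming the vertexai.evaluation SDK: a custom autorater prompt with explicit criteria, chain-of-thought instructions, and a scoring guideline, followed by a comparison of its scores against human ratings. The metric name, prompt wording, column names, and the tiny baseline dataset are all illustrative, and the SDK's advanced judge-model settings (such as choosing or tuning the judge model itself) are not shown.

```python
# Minimal sketch: customize an autorater's judging prompt and check its
# alignment against human-rated examples. Names and data are placeholders.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric

vertexai.init(project="your-project-id", location="us-central1")

# Custom judging instructions: explicit criteria, step-by-step reasoning,
# and a detailed scoring guideline for the autorater to follow.
helpfulness_metric = PointwiseMetric(
    metric="custom_helpfulness",
    metric_prompt_template=(
        "You are grading an AI assistant's answer.\n"
        "Criteria: the answer is accurate, directly addresses the prompt, "
        "and states no unsupported facts.\n"
        "Think step by step about each criterion before deciding.\n"
        "Scoring: 1 = fails most criteria, 3 = partially meets them, "
        "5 = fully meets all criteria.\n\n"
        "Prompt: {prompt}\nResponse: {response}\n"
        "Return the score and a short justification."
    ),
)

# Baseline dataset with human ratings to calibrate the autorater against.
eval_df = pd.DataFrame(
    {
        "prompt": ["Explain what a proof of concept is."],
        "response": ["A proof of concept is a small pilot that tests feasibility..."],
        "human_rating": [5],
    }
)

result = EvalTask(dataset=eval_df, metrics=[helpfulness_metric]).evaluate()

# Compare autorater scores with the human "ground truth" to measure alignment.
# Score columns follow the "<metric_name>/score" convention; adjust if your
# SDK version names them differently.
scores = result.metrics_table
agreement = (scores["custom_helpfulness/score"] == eval_df["human_rating"]).mean()
print(f"Exact agreement with human raters: {agreement:.0%}")
```

With a larger human-rated baseline, you could swap the exact-match check for correlation or per-criterion breakdowns to see exactly where the autorater drifts from your expectations.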
3. Rubrics-driven Evaluation
Evaluating complex AI applications can present a frustrating challenge: how do you use a fixed set of criteria when every input is different? One-size-fits-all evaluation criteria often fail to capture the nuances of complex multimodal use cases (e.g., image understanding). To address this, the rubrics-driven evaluation feature breaks the evaluation down into a two-step approach.

Step 1 – Rubric Generation: Instead of requiring users to provide a static list of criteria, the system acts like a tailored test generator. For each individual data point in the evaluation set, it automatically generates a unique set of rubrics: specific, measurable criteria adjusted to the content of that entry. You can review and customize these rubrics if needed.

Step 2 – Targeted Autorating: The autorater then uses these custom-generated rubrics to evaluate the AI’s response. This is like a teacher writing unique test questions for each student’s essay topic, rather than using the same generic questions for the entire class.

This process ensures that each evaluation is contextually relevant and insightful. It enhances interpretability by linking each score directly to criteria specific to the task, allowing you to more accurately measure the model’s true performance. The Generative AI evaluation service on Vertex AI also lets you generate rubric-based pairwise evaluations, comparing two candidate responses against the same per-example rubrics.
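To illustrate the two-step pattern, here is a conceptual sketch that emulates it with the vertexai.evaluation SDK rather than the service's built-in rubric feature: a generator model writes per-example rubrics, and a custom pointwise metric then grades each response against the rubrics generated for that exact input. The model name, prompts, and the rubrics column are assumptions for illustration.

```python
# Conceptual sketch of rubric-driven evaluation (not the built-in rubric API):
# Step 1 generates input-specific rubrics, Step 2 grades against them.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, PointwiseMetric
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

prompts = ["Write release notes for version 2.1 of our billing API."]
responses = ["Version 2.1 adds webhook retries and fixes invoice rounding..."]

# Step 1 - Rubric generation: produce specific, measurable criteria per input.
generator = GenerativeModel("gemini-1.5-pro")  # model name is a placeholder
rubrics = [
    generator.generate_content(
        f"List 3 specific, measurable criteria for judging a response to: {p}"
    ).text
    for p in prompts
]

# Step 2 - Targeted autorating: the judge scores each response against the
# rubrics generated for that same input (read from a dataset column).
rubric_metric = PointwiseMetric(
    metric="rubric_adherence",
    metric_prompt_template=(
        "Evaluate the response strictly against these rubrics:\n{rubrics}\n\n"
        "Prompt: {prompt}\nResponse: {response}\n"
        "Address each rubric separately, then give an overall score from 1 to 5."
    ),
)

eval_df = pd.DataFrame({"prompt": prompts, "response": responses, "rubrics": rubrics})
result = EvalTask(dataset=eval_df, metrics=[rubric_metric]).evaluate()
print(result.metrics_table[["rubrics", "rubric_adherence/score"]])
```

Reviewing the generated rubrics column before running Step 2 is the natural place to apply the human review and customization described above.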
4. Agent Evaluation
We are at the dawn of the agent era, where agents can reason, plan, and use tools to complete complex tasks. However, evaluating these agents presents unique challenges. Merely evaluating the final response is no longer sufficient; we need to validate the entire decision-making process. “Did the agent choose the right tool?” “Did it follow a logical sequence of steps?” “Did it effectively store and use information to provide personalized answers?” These are some of the key questions that determine an agent’s reliability.

To address these challenges, the Generative AI evaluation service in Vertex AI introduces features specifically for agent evaluation. You can not only evaluate the agent’s final output but also gain insight into its “trajectory”: the sequence of actions and tool calls it took. With trajectory-specific metrics, you can evaluate the agent’s reasoning path. Whether you are building with the Agent Development Kit, LangGraph, CrewAI, or another framework, and hosting the agent locally or on Vertex AI Agent Engine, you can analyze whether its actions are logical and whether it used the right tools at the right time. All results are integrated with Vertex AI Experiments, providing a powerful system to track, compare, and visualize performance so you can build more reliable and effective AI agents. (A minimal code sketch of trajectory evaluation appears at the end of this article.)

Google Cloud has also launched Google Agentspace, which connects search across internal applications to help enterprises break down data silos. Read more about What is Google Agentspace? Breaking Data Silos, Unleashing Enterprise Internal Intelligence.

This article is adapted from the Google Blog. As a Google Cloud Premier Partner, Microfusion Technology will continue to help enterprise clients effectively implement cutting-edge AI capabilities, seamlessly adopt Google’s latest AI technologies, and move together towards an intelligent future. If you have any questions or needs, please feel free to contact Microfusion Technology. If you are interested in the diverse applications of Google Cloud, keep an eye on Google’s event information; we look forward to meeting you at those events!
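Finally, here is the trajectory-evaluation sketch referenced in the agent evaluation section above, assuming the trajectory metrics of the Vertex AI GenAI evaluation service (vertexai.evaluation). The tool names, tool inputs, and reference trajectory are illustrative; in practice the predicted trajectory would be captured from your agent framework's run logs.

```python
# Minimal sketch: comparing an agent's tool-call trajectory against a
# reference trajectory. Tool names and inputs are placeholders.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask

vertexai.init(project="your-project-id", location="us-central1")

# Each trajectory is the ordered list of tool calls the agent made (predicted)
# or should have made (reference).
reference_trajectory = [
    {"tool_name": "search_orders", "tool_input": {"customer_id": "C-42"}},
    {"tool_name": "get_refund_policy", "tool_input": {}},
]
predicted_trajectory = [
    {"tool_name": "search_orders", "tool_input": {"customer_id": "C-42"}},
    {"tool_name": "get_refund_policy", "tool_input": {}},
]

eval_df = pd.DataFrame(
    {
        "predicted_trajectory": [predicted_trajectory],
        "reference_trajectory": [reference_trajectory],
    }
)

result = EvalTask(
    dataset=eval_df,
    metrics=["trajectory_exact_match", "trajectory_in_order_match"],
    experiment="agent-trajectory-eval",  # tracked in Vertex AI Experiments
).evaluate()
print(result.summary_metrics)
```

The same dataset can also carry the agent's final responses, so response quality and trajectory quality can be scored in a single run.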