
Semantic Kernel - Function calling as a planner replacement

Semantic Kernel 1.0 has introduced new features to manage AI orchestration. Let's review them!

In the previous posts, we have learned about a few of the changes introduced in Semantic Kernel 1.0: the new setup, prompt functions and the new attributes to define a native function. So far, however, we have mostly seen refactoring and renaming of classes and methods, which required us to change our code, but not the way we architect our applications. Things are going to change in this post, since Semantic Kernel 1.0 has introduced some deep changes to one of its most powerful features: planners, which bring the capability to automatically orchestrate AI workflows by figuring out which plugins to use to satisfy the ask of the user.

Why does Semantic Kernel 1.0 bring such a deep change? The reason is that, after the Semantic Kernel team introduced the planner, OpenAI released a new feature called function calling which, in a way, tries to solve the same problem the planner was trying to solve: orchestrating complex AI workflows, which might require interaction with 3rd party APIs.

Let’s learn more about this feature.

# Function calling

Function calling overlaps with one of the most interesting features of the planner in Semantic Kernel: the ability, given a series of available functions, to figure out which ones to call to satisfy the ask of the user. However, function calling is much more efficient than the planner, since it’s a native feature of the LLM, while the planner was a custom implementation of the Semantic Kernel team, which required performing multiple LLM calls to create the plan.

With function calling, you can include in the request you send to an OpenAI model not just the prompt, but also a series of functions which are available to call. If the model realizes that, to provide a response to the prompt, it needs to call one or more of the available functions, it automatically generates a request with the needed JSON payload. It’s very important to highlight that the model won’t call the functions on your behalf, but it will return everything you need to call the function in the proper way. It’s up to the developer to use the generated JSON payload to call the needed API, get a response and use it to generate the final answer to the user.

Since this feature was added at a later stage, you will need to use one of the most recent versions of the LLMs, like gpt-3.5-turbo or gpt-4. You can see a full list of the supported models here (the documentation is about OpenAI, but the same models are also available in Azure OpenAI and include support for function calling).

The official documentation includes a very detailed step-by-step overview of what happens when you use function calling:

  1. Call the model with the user query and a set of functions defined in the functions parameter.
  2. The model can choose to call one or more functions; if so, the content will be a stringified JSON object adhering to your custom schema.
  3. Parse the string into JSON in your code, and call your function with the provided arguments if they exist.
  4. Call the model again by appending the function response as a new message, and let the model summarize the results back to the user.

In the OpenAI blog, you can find a very good example to understand function calling in action. Let’s say that you want to use an OpenAI model to get the current weather in a specific city. We can use the Chat Completion APIs, which offer support for function calling. This is how the JSON payload of the request you send to OpenAI might look:

{
  "model": "gpt-3.5-turbo-0613",
  "messages": [
    {"role": "user", "content": "What is the weather like in Boston?"}
  ],
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
}

The request includes:

  • The prompt of the user, which is What is the weather like in Boston?. The Chat Completion APIs, instead of using a single prompt, support including multiple messages so that you can manage the chat history, each of them with a role property to specify who generated the message (the user, the LLM, etc.). As such, we add the prompt as a message with user as a role.
  • The list of available functions that the LLM can use to generate a response. In this case, we have a single function called get_current_weather. The JSON includes the description of the function and the parameters it accepts. The goal of the function is to get the current weather in a given location; it accepts two parameters in input: one required (the city) and one optional (the unit of measure).

When you send this request to the LLM, you will get a response like this one:

{
  "id": "chatcmpl-123",
  ...
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "get_current_weather",
        "arguments": "{ \"location\": \"Boston, MA\"}"
      }
    },
    "finish_reason": "function_call"
  }]
}

You’re getting a response from the LLM (the role is assistant), but the content property is empty. However, the function_call property contains a value: the LLM has realized that, in order to provide a response to the user, it needs to call the get_current_weather function, passing a JSON with the location property set to Boston, MA. With this information, it’s now up to you as a developer to call the function using the parameters provided by the LLM. For example, you could perform a REST call to the Weather APIs and get a response like the following one:

{
  "temperature": 22,
  "unit": "celsius",
  "description": "Sunny"
}
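
To make this step more concrete, here is a minimal C# sketch of how you might parse the arguments string returned by the model and use it to call your own weather service. GetWeatherAsync is a hypothetical stand-in for your actual API call; it's not part of the OpenAI SDK or Semantic Kernel:

using System.Text.Json;

// The "arguments" value returned by the model is itself a JSON string
string arguments = "{ \"location\": \"Boston, MA\"}";

// Extract the parameters the model wants us to use
using var parsedArguments = JsonDocument.Parse(arguments);
string location = parsedArguments.RootElement.GetProperty("location").GetString()!;
string unit = parsedArguments.RootElement.TryGetProperty("unit", out var unitElement)
    ? unitElement.GetString()! : "celsius";

// Hypothetical stand-in for the real call to a weather API
Task<string> GetWeatherAsync(string city, string measureUnit) =>
    Task.FromResult($"{{\"temperature\": 22, \"unit\": \"{measureUnit}\", \"description\": \"Sunny\"}}");

string weatherResponse = await GetWeatherAsync(location, unit);

In a real application, of course, you would perform an HTTP call here; the hardcoded response just mirrors the JSON shown above.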

Once you have the response, you can send the original request again. However, this time, in the messages collection you’re also going to include the information returned by the Weather API, as in the following example:

{
  "model": "gpt-3.5-turbo-0613",
  "messages": [
    {"role": "user", "content": "What is the weather like in Boston?"},
    {"role": "assistant", "content": null, "function_call": {"name": "get_current_weather", "arguments": "{ \"location\": \"Boston, MA\"}"}},
    {"role": "function", "name": "get_current_weather", "content": "{\"temperature\": "22", \"unit\": \"celsius\", \"description\": \"Sunny\"}"}
  ],
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
}

You can see how the messages collection now includes:

  • The original prompt of the user, with role user.
  • The request to call a function, with role assistant, since it was generated by the LLM.
  • The JSON data you have obtained from the Weather API, with role function, since it’s the response we retrieved by calling the requested function.

Now the LLM has all the information it needs to generate a final answer to the user, so you’re going to get the following JSON response back:

{
  "id": "chatcmpl-123",
  ...
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The weather in Boston is currently sunny with a temperature of 22 degrees Celsius.",
    },
    "finish_reason": "stop"
  }]
}

As you can see, this time the content property contains a value, while the function_call property is gone. This means that the LLM has determined that it doesn’t need to call any more functions to generate a response to the user, so the content property contains the final answer. If the LLM had determined, instead, that another function was needed, it would have returned another function_call property, with the name of the function and the parameters to pass to it, and we would have repeated the same process.
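
To summarize the raw flow, here is a rough C# sketch of the client-side loop. SendChatRequestAsync and ExecuteFunctionAsync are hypothetical helpers (standing in for the HTTP call to the Chat Completion APIs and for your own function execution logic), and functions is the list of function definitions we included in the request payload:

// Start the conversation with the user prompt
var messages = new List<object>
{
    new { role = "user", content = "What is the weather like in Boston?" }
};

// Hypothetical helper that posts messages + functions to the Chat Completion APIs
var response = await SendChatRequestAsync(messages, functions);

// Keep executing functions until the model returns a plain text answer
while (response.FinishReason == "function_call")
{
    // Run the requested function with the arguments generated by the model (hypothetical helper)
    string functionResult = await ExecuteFunctionAsync(response.FunctionName, response.FunctionArguments);

    // Append both the model's request and the function result to the history...
    messages.Add(new { role = "assistant", content = (string?)null,
        function_call = new { name = response.FunctionName, arguments = response.FunctionArguments } });
    messages.Add(new { role = "function", name = response.FunctionName, content = functionResult });

    // ...and ask the model again with the enriched history
    response = await SendChatRequestAsync(messages, functions);
}

Console.WriteLine(response.Content);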

Function calling and planners help to achieve a similar goal: we give the LLM the tools we have, we define the ask and then we let the LLM figure out which tools are needed to satisfy it. The main difference is that, with function calling, we must call the functions on our own, while with the planner, Semantic Kernel took care of calling the functions on our behalf. Semantic Kernel 1.0 brings the best of both worlds: instead of reinventing the wheel, the team has decided to use the same approach, but to add an extra layer that simplifies the workflow we have seen so far.

Semantic Kernel supports two ways to manage functions:

  • Manually: you have full control over the function calling process. It’s the same flow we have seen in the previous example, but made easier thanks to the Semantic Kernel classes and methods.
  • Automatically: this is basically a replacement of the planner. With this approach, you let Semantic Kernel call the functions on your behalf and use the response to process the result.

Let’s see both of them in action. We’re going to use the native plugin we have migrated in the previous post: the one called UnitedStatesPlugin, which provides a GetPopulation function to retrieve the population of the US in a given year. We described the creation of this native plugin in this post.

# Function calling: the manual approach

When you use the manual approach, you’re going to implement the following flow:

  1. You send the prompt to the LLM.
  2. You evaluate whether the response includes a function call.
  3. If it does, you call the function and get the response.
  4. You add the response to the chat history.
  5. You repeat from step 1 until there are no more functions to call.

Let’s start by setting up the kernel and importing the plugins we have seen in the previous posts:

string apiKey = configuration["AzureOpenAI:ApiKey"];
string deploymentName = configuration["AzureOpenAI:DeploymentName"];
string endpoint = configuration["AzureOpenAI:Endpoint"];

var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(deploymentName, endpoint, apiKey)
    .Build();

kernel.ImportPluginFromType<UnitedStatesPlugin>();

We have set up the service to use the Chat Completion APIs provided by Azure OpenAI and we have imported the plugin into the kernel using the new ImportPluginFromType<T>() method. Make sure you’re using a recent version of gpt-3.5-turbo or gpt-4, which support function calling. The next step is to get a reference to the Chat Completion service provided by Semantic Kernel, which is a wrapper around the Chat Completion APIs.

var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

We use the dependency injection approach to retrieve it. When we created the kernel with the AddAzureOpenAIChatCompletion() method, Semantic Kernel registered an IChatCompletionService object for us, so we just need to retrieve it using the GetRequiredService<T>() method exposed by the kernel. Then, we define the prompt and store it in the chat history:

string prompt = @"Write a paragraph to share the population of the United States in 2015.";

var chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.User, prompt);

ChatHistory is a collection that simplifies the management of the message history you send to the LLM. It’s a wrapper around the messages collection we have seen in the JSON request to the Chat Completion APIs, and it offers various methods to add the different types of messages the LLM supports. In this case, we’re using the AddMessage() method, passing AuthorRole.User as the first parameter, which translates into the following JSON entry in the history:

{
  "role": "user",
  "content": "Write a paragraph to share the population of the United States in 2015."
}
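
As a side note, ChatHistory also exposes role-specific shortcut methods, so the same entry can be added without passing the role explicitly (a quick sketch, reusing the prompt and chatHistory variables defined above):

// Equivalent to chatHistory.AddMessage(AuthorRole.User, prompt)
chatHistory.AddUserMessage(prompt);

// Similar helpers exist for the other roles, for example a system message:
chatHistory.AddSystemMessage("You are an assistant that answers questions about the United States.");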

Before executing our prompt, there is one final step: enabling function calling. This is done by setting up an OpenAIPromptExecutionSettings object. It’s the same object we learned about in the previous posts to customize the execution parameters, like temperature and maximum number of tokens. In this case, however, we’re going to use it to enable function calling by setting the ToolCallBehavior property to EnableKernelFunctions:

OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
};

Now we’re ready to use the Chat Completion service to send the request to Azure OpenAI:

var result = await chatCompletionService.GetChatMessageContentAsync(chatHistory, settings, kernel);

We call the GetChatMessageContentAsync() method, passing as parameters the chat history, the execution settings and the kernel. We have completed the first part of our workflow. Now let’s start the fun :-) We will get a response and we need to check if the LLM has determined that it needs to call one of the available plugins to generate a response.

var functionCalls = ((OpenAIChatMessageContent)result).GetOpenAIFunctionToolCalls();

We convert the result we receive from the service (which is a generic ChatMessageContent) into the specific OpenAIChatMessageContent object. We do this because this class offers a method called GetOpenAIFunctionToolCalls(), which returns a reference to the functions to call, if the LLM has determined that it needs to call any of the available ones. Since the LLM might need to call multiple functions, the method returns a collection, which we need to iterate over to make sure we process all of them before we execute the prompt again. This is the complete code:

var functionCalls = ((OpenAIChatMessageContent)result).GetOpenAIFunctionToolCalls();
foreach (var functionCall in functionCalls)
{
    // Resolve the requested function and its arguments among the plugins registered in the kernel
    KernelFunction pluginFunction;
    KernelArguments arguments;
    kernel.Plugins.TryGetFunctionAndArguments(functionCall, out pluginFunction, out arguments);

    // Invoke the function, then serialize its result so that it can be added to the chat history
    var functionResult = await kernel.InvokeAsync(pluginFunction!, arguments!);
    var jsonResponse = functionResult.GetValue<object>();
    var json = JsonSerializer.Serialize(jsonResponse);
    chatHistory.AddMessage(AuthorRole.Tool, json);
}

result = await chatCompletionService.GetChatMessageContentAsync(chatHistory, settings, kernel);

Console.WriteLine(result.Content);
Console.ReadLine();

The Plugins collection of the kernel offers a method called TryGetFunctionAndArguments() which, given a function call definition, returns a reference to a function registered in the kernel. If the method is successful, we get as output the function and the arguments, in both cases mapped to the Semantic Kernel classes we have learned about in the previous posts: KernelFunction and KernelArguments. Thanks to these objects, we can simply call the function using the InvokeAsync() method of the kernel.

Once we obtain a result, we need to store it in the ChatHistory collection, so that the LLM can use it the next time. We use the AddMessage() method again, but this time with the value AuthorRole.Tool, which is specific to storing the result of a function. Since ChatHistory can store only string values, we first need to serialize the result of the function into a JSON string, using the System.Text.Json serializer.

Once we have processed all the functions, we call the GetChatMessageContentAsync() method again to get a new response from the LLM. This time, we don’t get a function call anymore, because the information in the chat history is enough for the LLM to elaborate a response, so we get a value in the Content property of the result. And the final response is the one we expect, which we have already seen when we worked with the planner:

In 2015, the population of the United States was 316,515,021

# Supporting multiple functions

The previous example required the usage of just a single function: GetPopulation. However, the exact same code works fine also when the LLM needs to execute more than one function to achieve the result. Let’s see this with an example, by changing the UnitedStatesPlugin class to include a second function, which is able to retrieve the population by gender:

[KernelFunction, Description("Get the United States population who identifies with a specific gender in a given year")]
public async Task<UnitedStatesResponse> GetPopulationByGender([Description("The year")] int year, [Description("The gender")] string gender)
{
    // Query the DataUSA APIs for the population grouped by year and gender
    string request = "https://datausa.io/api/data?drilldowns=Year,Gender&measures=Total+Population";
    HttpClient client = new HttpClient();
    var result = await client.GetFromJsonAsync<GenderResult>(request);

    // Pick the entry that matches the requested year and gender (case-insensitive)
    var populationData = result.data.FirstOrDefault(x => x.Year == year.ToString() && x.Gender.ToLower() == gender.ToLower());

    var response = new UnitedStatesResponse
    {
        Gender = gender,
        TotalNumber = populationData.TotalPopulation,
        Year = year
    };

    return response;
}

The code is very similar to the previous function; we’re just using a different endpoint provided by the DataUSA APIs. In addition, this function accepts two input parameters instead of just one: the gender, in addition to the year.

Now let’s change our prompt in the following way:

string prompt = @"Write a paragraph to share the population of the United States in 2015. 
Make sure to specify how many people, among the population, identify themselves as male and female. 
Don't share approximations, please share the exact numbers.";

Now run the application again. This time, you will see that the collection returned by GetOpenAIFunctionToolCalls() contains three function calls:

  1. The GetPopulation function, to get the total population number for 2015.
  2. The GetPopulationByGender function, to get the population who identifies as male in 2015.
  3. The GetPopulationByGender function, this time to get the population who identifies as female in 2015.

To verify this, we can add a statement to print each function’s output in the console at every iteration of the loop.
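
A minimal way to do it is to print the json variable right after it gets serialized, inside the existing foreach loop:

    var json = JsonSerializer.Serialize(jsonResponse);
    Console.WriteLine(json);
    chatHistory.AddMessage(AuthorRole.Tool, json);

With this change, we will see the following output in the console: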

{"year":2015,"totalNumber":316515021,"gender":null}
{"year":2015,"totalNumber":155728568,"gender":"male"}
{"year":2015,"totalNumber":160786456,"gender":"female"}

Finally, after these three function calls, the LLM has all the information it needs to generate a response, so the second call to GetChatMessageContentAsync() returns the final result, which we can display:

In 2015, the population of the United States was 316,515,021. Out of this total, 155,728,568 individuals identified themselves as male, and 160,786,456 identified themselves as female. These figures represent the exact count of the population by gender for that year.

As you can see, the LLM is able not just to call the functions it needs from the plugins registered in the kernel, but also to call the same function multiple times if needed (like in this case, in which it needed to call the GetPopulationByGender function twice, once for males and once for females).

# Function calling: the automatic approach

Compared to the REST-based implementation provided by OpenAI, Semantic Kernel simplifies the function calling process even when we use the manual approach. We are still in charge of detecting whether the LLM needs to call a function, retrieving it, executing it and then passing back the response, but instead of working with plain APIs and JSON, we can use classes and objects. Additionally, the whole function registration is simplified thanks to the usage of plugins.

However, Semantic Kernel 1.0 also offers an automatic approach, which is even simpler. This approach replaces the planner that was available in the previous versions of Semantic Kernel, since it allows Semantic Kernel not just to figure out which functions to call, but also to call them and to manage the whole chat history automatically. Let’s see how it works.

First, we have to change the OpenAIPromptExecutionSettings object, by setting the ToolCallBehavior property to AutoInvokeKernelFunctions.

1
2
3
4
OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

Now, since everything is managed automatically, we don’t need to use the IChatCompletionService object anymore; we can just invoke the prompt. The following example uses the streaming approach:

string prompt = @"Write a paragraph to share the population of the United States in 2015. 
Make sure to specify how many people, among the population, identify themselves as male and female. 
Don't share approximations, please share the exact numbers.";

var streamingResult = kernel.InvokePromptStreamingAsync(prompt, new KernelArguments(settings));
await foreach (var streamingResponse in streamingResult)
{
    Console.Write(streamingResponse);
}

The only difference is that, as the second parameter, we must pass a new KernelArguments instance with the execution settings, which tells the kernel to use the registered functions and to invoke them automatically when needed.
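
As a side note, if you don’t need streaming, the same request can be performed with the non-streaming InvokePromptAsync() method; a minimal sketch:

var result = await kernel.InvokePromptAsync(prompt, new KernelArguments(settings));
Console.WriteLine(result.GetValue<string>());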

That’s it. Now if you execute this code, you will get the same output as the manual approach:

In 2015, the population of the United States was 316,515,021. Out of this total, 155,728,568 individuals identified themselves as male, and 160,786,456 identified themselves as female. These figures represent the exact count of the population by gender for that year.

Behind the scenes, Semantic Kernel, like before, has figured out that it needs to call the GetPopulation function once and the GetPopulationByGender function twice to generate the response. This time, however, it has also called the functions on our behalf, so we get the final response without having to write any additional code.

# Wrapping up

This was a long post! We have learned what function calling is and how it works. Then, we have seen how Semantic Kernel 1.0 supports it and how we can use it in two different ways: manually and automatically. We have also learned how function calling has replaced the need for the planner in many scenarios, since it was built to solve the same problem: giving the LLM access to a series of tools and letting it figure out which ones to use to satisfy the ask of the user. Semantic Kernel takes this feature to the next level, by enabling the LLM not just to figure out which functions to call, but also to call them on our behalf.

Does this mean that the planner is dead? Not at all. There are scenarios, in fact, in which you need to perform more complex tasks, which require more reasoning and more loops from the LLM in order to figure out the actions to perform. In the next post, we’re going to learn more about two new types of planners that have been introduced in 1.0.

In the meantime, you can find the whole sample code demonstrated in this post in this GitHub repository, more precisely in the SemanticFunction.FunctionCalling project.

Happy coding!
