AWS ECS and Chime Voice Connector

Aurora 2024 Ottawa Image

AWS offers a wide range of services and tools that simplify the work of developers and solution architects. However, there are instances where certain limitations or missing features can be frustrating or challenging. In this post, I’ll share some workarounds I’ve implemented to make SIP trunks function smoothly on ECS Fargate with AWS Chime Voice Connector.

First, if you’re not familiar with Voice Connector, it’s a SIP trunking service for call origination and termination, part of the Chime SDK. It can be used to build conferencing solutions and other voice-enabled applications. For more information, you can explore the details here: AWS Voice Connectors

Problem: An important point about Voice Connector is that its proxy or edge SIP servers are only accessible via the internet using public IPs. While AWS offers a solution using Direct Connect, it doesn’t work if your servers are hosted within AWS itself. This creates the first limitation: your SIP servers must connect to Voice Connector through the internet, meaning they either need a public IP or must be behind a NAT (although NAT doesn’t work in this case). To address this limitation and the security concerns around call origination, AWS provides an Allowed Hosts List in the termination settings. You must add your server’s or NAT’s IPs to this list, which allows your servers to send SIP traffic to Voice Connector.

AWS Chime Voice Connector Termination Settings Image

If your SIP servers are running on EC2, you’re in luck—it’s easier to assign Elastic IPs to your instances. You simply need to add those IPs to the allowed hosts list and make a few adjustments to your security groups to allow incoming traffic from Chime. You can find the AWS IP ranges here: network-config

But if you want the benefits of containers and cost efficiency and choose ECS Fargate (NOT ECS on EC2), you’re in trouble. You can let your Fargate task get a dynamic public IP, but how do you deal with that Allowed Hosts List? Maybe hire someone to just sit there and update the IP list whenever a task goes down or a new one comes up :))

This is where AWS doesn’t currently offer a viable solution. I spoke with support, and their suggestion was to use EC2 or NAT. Recommending EC2 to a team that’s looking for a Fargate-based solution feels more like avoiding the problem than solving it. So, what’s the issue with NAT? NAT works when your server is in a subnet behind a NAT gateway and only needs to initiate requests to Chime. However, for incoming requests—like a BYE message at the end of a call—your server won’t receive it because it’s behind NAT.

What about using a network load balancer (NLB)? While it’s another potential approach, it doesn’t work either. If you put an NLB in front of your ECS cluster and use its IP in the SIP Contact header, you can receive incoming SIP requests from Chime. However, you still need to register your tasks’ public IPs with Chime to allow tasks to send SIP requests—complicating things further. Sound complex? Don’t worry—stay happy! 😉

The great thing about AWS is that there’s almost always a way to work around limitations, thanks to the extensive APIs available for most services. This flexibility allows you to build custom solutions to tackle challenges—just like the approach that worked for me in this scenario.

Solution: One solution I implemented was delegating the task of updating the IP whitelist for Voice Connectors to a simple service. When an ECS task starts, it calls this service and provides key information about itself. The service then takes care of updating the Voice Connector’s termination settings and allowed IP list automatically.

Step 1: Integrating a call to an API within your container’s entry point is straightforward. You can send any relevant information that the service needs to understand details about the ECS cluster, service, region, IPs, and more regarding that task.

In our scenario, our SIP servers already maintain a WebSocket connection to the API Gateway, so we leveraged that same connection. This approach eliminates the need for additional authentication steps typically required for calling a REST API, although managing authentication isn’t overly complicated.
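As a rough illustration of Step 1, here is a minimal sketch of how a task could collect information about itself before notifying the service. It assumes Node.js 18+ (for the global `fetch`), reads the ECS task metadata endpoint exposed through `ECS_CONTAINER_METADATA_URI_V4`, and asks `https://checkip.amazonaws.com` for the public IP. The function names and the payload field names (`action`, `cluster`, `taskArn`, `region`, `publicIp`) are hypothetical, not the exact message we send.

```javascript
// Hypothetical payload builder: the field names of the returned object are
// illustrative and not part of any AWS API.
function buildTaskNotification(taskMetadata, publicIp) {
    // Task ARN format: arn:aws:ecs:<region>:<account>:task/<cluster>/<task-id>
    const arnParts = taskMetadata.TaskARN.split(":");
    return {
        action: "taskStarted",
        cluster: taskMetadata.Cluster,
        taskArn: taskMetadata.TaskARN,
        region: arnParts[3],
        publicIp: publicIp.trim(),
    };
}

// Reads the task metadata (endpoint v4) and looks up the task's public IP.
async function collectTaskInfo() {
    const metadataUri = process.env.ECS_CONTAINER_METADATA_URI_V4;
    const metadata = await (await fetch(`${metadataUri}/task`)).json();
    // Assumption: the task discovers its own public IP through an external
    // echo service, since the metadata describes the task's private network.
    const publicIp = await (await fetch("https://checkip.amazonaws.com")).text();
    return buildTaskNotification(metadata, publicIp);
}
```

The object returned by `collectTaskInfo()` can then be serialized with `JSON.stringify` and sent over whatever channel the task already has, a WebSocket connection in our case.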

Step 2: Behind the API Gateway, we have a Lambda function that receives the notifications and processes them. This Lambda function is responsible for updating the Voice Connector’s termination settings and managing the IP list.
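To make Step 2 a bit more concrete, here is a minimal sketch of the receiving side. It assumes the notification arrives as a JSON string in `event.body` and carries hypothetical `cluster` and `publicIp` fields. `updateAllowedHosts` is also a hypothetical function that you would implement with the Voice Connector calls; it is injected so the parsing logic stays easy to test.

```javascript
// Checks that a value looks like a dotted IPv4 address (four octets, 0-255).
function isIPv4(value) {
    const parts = String(value).split(".");
    return (
        parts.length === 4 &&
        parts.every((part) => /^\d{1,3}$/.test(part) && Number(part) <= 255)
    );
}

// Extracts and validates the task notification from the API Gateway event.
// The payload shape (cluster, publicIp) is an assumption for this sketch.
function parseTaskNotification(event) {
    let body;
    try {
        body = JSON.parse(event.body || "{}");
    } catch {
        throw new Error("Body is not valid JSON");
    }
    if (typeof body.cluster !== "string" || !isIPv4(body.publicIp)) {
        throw new Error("Missing cluster or invalid publicIp");
    }
    return { cluster: body.cluster, publicIp: body.publicIp };
}

// Builds the Lambda handler around a hypothetical updateAllowedHosts function.
function createHandler(updateAllowedHosts) {
    return async (event) => {
        let notification;
        try {
            notification = parseTaskNotification(event);
        } catch (err) {
            // Reject malformed notifications before touching the Voice Connector
            return { statusCode: 400, body: err.message };
        }
        await updateAllowedHosts(notification);
        return { statusCode: 200, body: "ok" };
    };
}
```

Validating the IP before doing anything else matters here, because whatever ends up in the allowed list is permitted to originate calls through your Voice Connector.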

Let’s include some snippets here to illustrate what this Lambda function does. I haven’t covered how to find the Voice Connector’s ID here, so assume the Lambda function already has it.

We can get the Voice Connector’s termination settings like this:

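    const {
        ChimeSDKVoiceClient,
        GetVoiceConnectorTerminationCommand,
        PutVoiceConnectorTerminationCommand,
    } = require("@aws-sdk/client-chime-sdk-voice");

    // Assumption: the Voice Connector lives in the Lambda function's own region
    const REGION = process.env.AWS_REGION;
    // ID of the Voice Connector to update (assumed to be known to the Lambda)
    const voiceConnectorId = "";
    const voiceClient = new ChimeSDKVoiceClient({ region: REGION });
    const params = {
        VoiceConnectorId: voiceConnectorId,
    };

    // Fetch the current termination settings, including the allowed hosts
    const result = await voiceClient.send(
        new GetVoiceConnectorTerminationCommand(params)
    );

    const terminationSettings = result.Termination;
    // Fall back to an empty list if no allowed hosts are configured yet
    const cidrList = terminationSettings.CidrAllowedList ?? [];

    // we update cidrList with new IPs and remove old IPs here
    const cleanedCidrs = [];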

Updating is just as simple. Assuming you have updated the CIDR list:

    terminationSettings.CidrAllowedList = cleanedCidrs;
    const putParams = {
        VoiceConnectorId: voiceConnectorId,
        Termination: terminationSettings,
    };

    const updateResult = await voiceClient.send(
        new PutVoiceConnectorTerminationCommand(putParams)
    );

Let’s assume the event we receive from the Fargate task includes its public IP, which we can easily add to the allowed CIDR list. But what about removing invalid IPs that are no longer in use by our tasks? To handle this, we can retrieve the ENI (Elastic Network Interface) IDs for all tasks in the service. Then, we can get the public IPs assigned to these ENIs and remove any IPs that are no longer present in the task list from the Voice Connector’s CidrAllowedList.

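    const {
        ECSClient,
        ListTasksCommand,
        DescribeTasksCommand,
    } = require("@aws-sdk/client-ecs");
    const {
        EC2Client,
        DescribeNetworkInterfacesCommand,
    } = require("@aws-sdk/client-ec2");

    // clusterName and REGION are assumed to be defined elsewhere in the Lambda
    const ecsClient = new ECSClient({ region: REGION });
    const ec2Client = new EC2Client({ region: REGION });
    const tasksPublicIps = [];

    // Note: without a serviceName filter this lists every task in the cluster
    const tasks = await ecsClient.send(
        new ListTasksCommand({ cluster: clusterName })
    );

    for (const taskArn of tasks.taskArns) {
        const task = await ecsClient.send(
            new DescribeTasksCommand({ cluster: clusterName, tasks: [taskArn] })
        );

        // Collect the ENI IDs from the task's network attachment
        const enis = task.tasks[0].attachments[0].details.filter(
            (detail) => detail.name === "networkInterfaceId"
        );

        for (const eni of enis) {
            // Look up the ENI to find the public IP associated with it, if any
            const niDetails = await ec2Client.send(
                new DescribeNetworkInterfacesCommand({
                    NetworkInterfaceIds: [eni.value],
                })
            );

            const publicIp = niDetails.NetworkInterfaces[0].Association?.PublicIp;

            if (publicIp) {
                tasksPublicIps.push(publicIp);
            }
        }
    }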

Now that you have an updated list of valid public IPs for your tasks (SIP servers), the final step is simply updating the appropriate Voice Connector they’re connected to. And that’s it—you’re done!
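For that final step, the list of task IPs still has to be turned into the value for `CidrAllowedList`. Here is a minimal sketch of one way to do it, assuming each task IP is written as a `/32` CIDR entry. The helper names and the optional `staticCidrs` parameter (entries that do not belong to tasks but should stay allowed) are hypothetical additions, not part of the original Lambda.

```javascript
// Builds the new allowed list from the public IPs of the running tasks.
// staticCidrs is a hypothetical list of extra entries that must always stay.
function buildAllowedCidrs(taskPublicIps, staticCidrs = []) {
    // Assumption: each task IP is expressed as a single-host /32 CIDR block
    const taskCidrs = taskPublicIps.map((ip) => `${ip}/32`);
    // A Set removes duplicates; sorting keeps the result stable between runs
    return [...new Set([...staticCidrs, ...taskCidrs])].sort();
}

// Returns true when the two lists differ, ignoring order and duplicates.
function cidrListChanged(currentCidrs, nextCidrs) {
    const current = new Set(currentCidrs);
    const next = new Set(nextCidrs);
    return (
        current.size !== next.size ||
        [...next].some((cidr) => !current.has(cidr))
    );
}
```

With these helpers, `cleanedCidrs` could be computed as `buildAllowedCidrs(tasksPublicIps)`, and the `PutVoiceConnectorTerminationCommand` call can be skipped whenever `cidrListChanged(cidrList, cleanedCidrs)` returns `false`, which avoids needless writes when several tasks report in at once.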

I hope this post gave you some useful insights for handling similar scenarios in the future.

Take care! ;)

Updated: 2024-10-12