AWS ECS Windows introspection bug

2024-01-27

TL;DR: The following is just a copy-paste from the issue I opened on the Amazon ECS Agent GitHub repository.

The issue

Summary

I started noticing inconsistent behavior when trying to reach the EC2 machine’s ECS Agent from Linux containers and Windows containers through the containers’ respective gateways, via HTTP.

Description

I am using a mixed cluster with Windows and Linux instances on ECS EC2, using the ECS-optimized images from AWS. Windows containers in bridge network mode cannot reach the machine’s ECS Agent via the container’s gateway over HTTP (unless the machine’s Windows Firewall is relaxed). This is not the case on Linux instances, where containers in bridge network mode can reach the machine’s ECS Agent via the container’s gateway over HTTP.

The Windows containers will be able to reach the machine’s ECS Agent if I explicitly add an allow rule to the Windows Firewall.
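The exact rule is not shown in this post. A minimal sketch of such an allow rule, run on the Windows instance itself in an elevated session, could look like the following. The display name is made up for illustration, and the remote subnet is an assumption taken from the container route table further down (172.24.160.0 with netmask 255.255.240.0); it will likely differ on other instances.

```powershell
# Sketch only: allow inbound TCP traffic to the ECS Agent introspection port.
# Assumptions: the display name is hypothetical, and 172.24.160.0/20 is the
# container NAT subnet seen in this post's route output, not a fixed value.
$rule = @{
    DisplayName   = 'Allow ECS Agent introspection from containers (example)'
    Direction     = 'Inbound'
    Action        = 'Allow'
    Protocol      = 'TCP'
    LocalPort     = 51678
    RemoteAddress = '172.24.160.0/20'
}
New-NetFirewallRule @rule
```

Limiting RemoteAddress to the container subnet keeps the port closed to other hosts, which seems preferable to opening 51678 to any source.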

Expected Behavior

Both platforms should deny or allow traffic consistently (?)

Observed Behavior

From a Linux container:

/app # ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 scope link  src 172.17.0.7
/app # curl http://172.17.0.1:51678/
{"AvailableCommands":["/v1/metadata","/v1/tasks","/license"]}
/app #

From a Windows container:

PS C:\> route print
===========================================================================
Interface List
 18...........................Software Loopback Interface 2
 19...00 15 5d ca 12 db ......Hyper-V Virtual Ethernet Container Adapter
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0     172.24.160.1   172.24.172.125   5256
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
     172.24.160.0    255.255.240.0         On-link    172.24.172.125   5256
   172.24.172.125  255.255.255.255         On-link    172.24.172.125   5256
   172.24.175.255  255.255.255.255         On-link    172.24.172.125   5256
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link    172.24.172.125   5256
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link    172.24.172.125   5256
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0     172.24.160.1  Default
          0.0.0.0          0.0.0.0     172.24.160.1  Default
===========================================================================

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
 18    331 ::1/128                  On-link
 19   5256 fe80::/64                On-link
 19   5256 fe80::4447:b1a9:a5af:28d/128
                                    On-link
 18    331 ff00::/8                 On-link
 19   5256 ff00::/8                 On-link
===========================================================================
Persistent Routes:
  None

PS C:\> curl http://172.24.160.1:51678/
curl: (28) Failed to connect to 172.24.160.1 port 51678 after 21026 ms: Couldn't connect to server

Environment Details

Windows:
ECS Agent version: "Version":"Amazon ECS Agent - v1.74.1 (a23f2935)"
AMI: Windows_Server-2022-English-Core-ECS_Optimized-2023.08.09
Instance type: t3a.2xlarge

Linux:
ECS Agent version: "Version":"Amazon ECS Agent - v1.75.0 (*e978160b)"
AMI: amzn2-ami-ecs-hvm-2.0.20230809-x86_64-ebs
Instance type: t3a.large

The fix (from AWS engineering)

The fix for this issue was released with the November 2023 AMIs.

During instance bootstrap, you can pass the -ECSAllowOffHostIntrospectionAccess switch to the Initialize-ECSAgent command. Alternatively, you can set the ECS_ALLOW_OFFHOST_INTROSPECTION_ACCESS environment variable to true to enable access to the introspection API. This variable is documented here: https://github.com/aws/amazon-ecs-agent#environment-variables

The new user data would look similar to:

<powershell>
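Import-Module ECSTools
# Machine-level variable: have the agent log at debug level on the instance.
[Environment]::SetEnvironmentVariable("ECS_LOGLEVEL_ON_INSTANCE", "debug", "Machine")
# The -ECSAllowOffHostIntrospectionAccess switch allows off-host access to the introspection API.
Initialize-ECSAgent -Cluster 'windows' -EnableTaskENI -EnableTaskIAMRole -AwsvpcBlockIMDS -ECSAllowOffHostIntrospectionAccess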
</powershell>

or

<powershell>
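Import-Module ECSTools
# Machine-level variable: have the agent log at debug level on the instance.
[Environment]::SetEnvironmentVariable("ECS_LOGLEVEL_ON_INSTANCE", "debug", "Machine")
# Same effect as the switch, set through the agent's environment variable instead.
[Environment]::SetEnvironmentVariable("ECS_ALLOW_OFFHOST_INTROSPECTION_ACCESS", "true", "Machine")
Initialize-ECSAgent -Cluster 'windows' -EnableTaskENI -EnableTaskIAMRole -AwsvpcBlockIMDS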
</powershell>
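To check that either variant took effect, the request from the "Observed Behavior" section can be repeated from inside a Windows container on the newly bootstrapped instance. The sketch below is one way to do that. It assumes the Get-NetRoute cmdlet (NetTCPIP module) is available in the container image, which may not hold for every base image; if it is not, the gateway can be read from `route print` as shown earlier.

```powershell
# Sketch only: look up the container's default gateway, then query the
# ECS Agent introspection API on port 51678 through it.
# Assumption: Get-NetRoute is present in the container image.
$gateway = (Get-NetRoute -DestinationPrefix '0.0.0.0/0' |
    Sort-Object -Property RouteMetric |
    Select-Object -First 1).NextHop

# ${gateway} is braced so PowerShell does not read "$gateway:51678" as a
# scope-qualified variable name.
Invoke-RestMethod -Uri "http://${gateway}:51678/v1/metadata" -TimeoutSec 10 -UseBasicParsing
```

If off-host introspection access is enabled, the call should return the agent metadata instead of timing out the way the curl request above did.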