The hack that made Spectrum possible

[This is a Korean translation of an earlier post by Marek Majkowski.]

We recently released Spectrum: a new Cloudflare feature that provides DDoS protection, load balancing and content acceleration for any TCP-based protocol.

Image by Staffan Vilcans, CC BY-SA 2.0

When we started building Spectrum, we soon ran into a major technical challenge: Spectrum must accept connections on any valid TCP port, from 1 to 65535. On our Linux edge servers it is impossible to "accept inbound connections on any port number". This is not a Linux-only limitation: it is largely a property of the BSD sockets API, which is the basis for network applications in the operating system. Internally, there were two overlapping problems that we had to solve in order to deliver Spectrum:

- How to accept TCP connections on all port numbers from 1 to 65535
- How to configure a single Linux server to accept connections on a very large number of IP addresses (we have a lot of IP addresses in our anycast ranges)

Assigning millions of IPs to the server

Cloudflare's edge servers have almost identical configurations. In the early days, we assigned specific /32 (and /128) IP addresses to the loopback network interface [1]. This worked well when we had only dozens of IP addresses, but it did not scale as we grew.

Then the "AnyIP" trick came along. AnyIP allows you to assign an entire IP prefix (subnet), rather than a single address, to the loopback interface. In fact, AnyIP is already in widespread use: 127.0.0.0/8 is assigned to the loopback interface on your computer. From the computer's perspective, all addresses from 127.0.0.1 to 127.255.255.254 belong to the local machine. This trick is applicable beyond the 127.0.0.0/8 range.
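The loopback case can be checked without changing any configuration. The following is a minimal sketch, assuming a Linux host with the loopback interface up; the helper name `can_bind` is made up for this illustration. It simply tries to bind a TCP socket to several addresses inside 127.0.0.0/8, none of which were assigned individually.

```python
import socket


def can_bind(address: str) -> bool:
    """Return True if a TCP socket can bind to `address` on an ephemeral port.

    `can_bind` is a hypothetical helper for this illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the kernel to pick any free port.
            sock.bind((address, 0))
        except OSError:
            # The address is not considered local (or binding is not permitted).
            return False
        return True


if __name__ == "__main__":
    # Only 127.0.0.1 is usually named in configuration files, yet on Linux
    # the whole 127.0.0.0/8 prefix is routed to the local machine.
    for address in ("127.0.0.1", "127.1.2.3", "127.255.255.254"):
        print(address, can_bind(address))
```

On other operating systems the result may differ, since not every system treats the whole loopback prefix as local.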
To make the entire 192.0.2.0/24 appear locally assigned:

    ip route add local 192.0.2.0/24 dev lo

After that, you can bind to port 8080 on any of these IP addresses:

    nc -l 192.0.2.1 8080

Making IPv6 work the same way is a bit more difficult:

    ip route add local 2001:db8::/64 dev lo

Unfortunately, you cannot simply bind to these v6 addresses as in the v4 example. To do so, you need to use the IP_FREEBIND socket option, which requires additional privileges. For completeness: there is also a sysctl, net.ipv6.ip_nonlocal_bind, but modifying it is not recommended.

This AnyIP trick allows each server to have millions of IP addresses assigned locally:

    $ ip addr show
    1: lo: mtu 65536
        inet 1.1.1.0/24 scope global lo
           valid_lft forever preferred_lft forever
        inet 104.16.0.0/16 scope global lo
           valid_lft forever preferred_lft forever
    …

Binding to all ports

The second major problem is opening TCP sockets on any port number. On Linux, and on systems that support the BSD sockets API in general, a single bind system call can bind to only one specific TCP port. It is not possible to bind to multiple ports in one operation.

The naive approach would be to call bind 65535 times, once for each of the 65535 possible ports. This is conceivable, but it can have terrible consequences. Internally, the Linux kernel stores listening sockets in a hash table indexed by port number, LHTABLE, which uses 32 buckets:

    /* Yes, really, this is all you need. */
    #define INET_LHTABLE_SIZE 32

Lookups in this table would be very slow with all 65,535 ports open: each hash table bucket would contain about 2,000 items.

Another way to solve this problem is to use the rich NAT features of iptables. The destination of each incoming packet is rewritten to one specific address/port, and the application binds to that. We did not try this, however, because it requires the conntrack module of iptables. In the past we have found conntrack to be a performance problem.
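Returning to the bucket estimate for the approach of calling bind on every port: the figure of roughly 2,000 items per bucket follows from simple arithmetic. The sketch below assumes, as a simplification, that a port's bucket is just the port number masked to the table size; the real kernel hash may mix in other values, so `bucket_for_port` is an illustrative stand-in and not the kernel's function.

```python
from collections import Counter

# Table size quoted from the kernel source in the text.
INET_LHTABLE_SIZE = 32


def bucket_for_port(port: int) -> int:
    """Map a port to a bucket (simplifying assumption: mask to table size)."""
    return port & (INET_LHTABLE_SIZE - 1)


def bucket_sizes(first_port: int = 1, last_port: int = 65535) -> Counter:
    """Count how many listening sockets land in each bucket."""
    return Counter(
        bucket_for_port(port) for port in range(first_port, last_port + 1)
    )


if __name__ == "__main__":
    sizes = bucket_sizes()
    # 65535 sockets spread over 32 buckets: about 2048 entries per bucket,
    # each of which must be walked linearly on lookup.
    print(len(sizes), min(sizes.values()), max(sizes.values()))
```

Under this assumption every bucket holds 2047 or 2048 sockets, so each lookup for a new inbound connection would have to scan a chain of about two thousand entries.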


