Building for Users Behind the Great Firewall: Why We Replaced Geolocation With Backend Racing

Most apps figure out where a user is with IP geolocation or a CDN's edge logic, pick the nearest region, and move on. It's a solved problem almost everywhere. Then you start serving users inside mainland China, and every assumption you had quietly falls apart.

I build MyChinaGuide, an app for foreign travelers in China. We run two backends: one in Singapore for everyone outside China, and one inside mainland China for users on the ground there. The whole game is getting each user's app to talk to the right one. Pick wrong, and a traveler standing in Shanghai is bouncing requests off a server across a border and a firewall, watching everything time out.

Here's why the usual tricks don't work here, and what we did instead.

IP geolocation lies in China

IP geolocation is the obvious starting point, and it lies constantly here. Travelers show up on roaming SIMs, on hotel WiFi routed through who-knows-where, and on VPNs that may or may not be on at any given moment. The same person can look Singaporean at breakfast and Chinese by lunch.

Worse, the network decides which backend is actually reachable, not just which one is closest. "Nearest" and "reachable" are different questions in a country with a firewall, and reachable is the one that matters. A server can be 30ms away on paper and completely unreachable in practice.

So we race our own backends

We stopped trying to figure out where the user is and started measuring what they can actually reach. On a cold start the app fires the same lightweight request at both backends at once, the Singapore one and the China one, and whichever answers first wins. Not the one with the better-looking IP. The one that actually responded, fastest, right now.

    final china = pingGeo(chinaEndpoint).then((_) => Region.china);
    final overseas = pingGeo(sgEndpoint).then((_) => Region.overseas);

    // whichever backend answers first decides the region
    // (Future.any completes with the first future to finish, value or error,
    // so this relies on pingGeo not failing fast on an unreachable backend)
    final region = await Future.any([china, overseas])
        .timeout(const Duration(seconds: 4));

That one change fixed most of our misrouting. We don't care about the user's reported location. We care which of our servers is closest in the only sense that counts: round-trip time over their actual network, right now.

When nothing answers, let the firewall tell you

Sometimes neither backend responds, and you need a fallback. This is where it gets specific to China.

The firewall has a tell. If you request a blocked host like google.com from inside the mainland, you don't get a clean timeout. You get a fast TCP reset, often in around a hundred milliseconds. From outside the wall, the same request just succeeds. So our tiebreaker is a quick HEAD request to a Google endpoint:

    try {
      await dio.head('https://www.google.com/generate_204');
      return Region.overseas; // Google reachable, not behind the wall
    } on DioException catch (e) {
      if (e.type == DioExceptionType.connectionError ||
          e.type == DioExceptionType.connectionTimeout) {
        return Region.china; // RST or SNI-drop: classic GFW signature
      }
      return null; // genuinely unsure, let other signals decide
    }

We only reach for this when the server race comes up empty. On its own a blocked request is a noisy signal, but as a tiebreaker it's surprisingly reliable. The firewall, in trying to block you, tells you exactly where you are.

Detection can never block startup

There's a catch with all of this. Detection takes time, and you cannot make a user watch a spinner while you probe the network, especially on the patchy connection of someone who just landed. So the rule is that detection never blocks startup. The app reads the last known region from cache and boots immediately on that.
The race runs in the background and silently corrects course if the cache was wrong. A first launch with no cache gets a hard 3-second timeout, and if that blows, we fall back to the cheapest hint available: the device timezone. UTC+8 leans China. It's a guess, but it's a fast guess, and the background check fixes it within seconds.

Failover, and the bug that taught me to respect it

The last piece is failover. Connections here are flaky, and the "right" backend can go unreachable mid-session. So the client counts consecutive failures, and after three in a row it flips to the other backend and starts a timer to re-check the original a few minutes later. Simple enough.

Except this is exactly where I got burned. The router was a global singleton, and so was the failure counter. One screen with a buggy reload loop started throwing request errors in a tight cycle. The counter blew past the threshold in seconds, the app "failed over" the entire session to the wrong region, and a single broken screen turned into every-request-is-failing across the whole app. A local glitch became a global outage because the thing that was supposed to add resilience amplified it instead.

The fix wasn't a clever one. It was being far stricter about what actually counts as a failure worth tripping a region-wide switch. A timeout on one flaky call is not the same as a backend being down, and only the second kind should ever move everyone to another continent.

What carries over

If you ever build for a hostile or restricted network, a few of these hold up well beyond China:

- Don't trust where the network says the user is. Measure what they can actually reach.
- Treat a blocked request as information, not just an error. Sometimes the thing trying to stop you is the clearest signal you've got.
- Never let detection sit on your startup path. Cache, boot, verify in the background.
- Be careful with global failover. A counter that's too eager will happily turn one local hiccup into a total outage.
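
To make that last point concrete, here is a minimal sketch of a less eager failover counter. It is an illustration under stated assumptions, not the app's actual code: the `RegionFailover` class, the `FailureKind` categories, and the rule that failures must come from at least two distinct request paths are all hypothetical choices. Only the threshold of three consecutive failures comes from the story above.

```dart
enum Region { china, overseas }

/// How a request failed (hypothetical categories). Only [unreachable] says
/// anything about the backend as a whole; with [httpError] a server did
/// answer, and [slowResponse] may just be one flaky call.
enum FailureKind { unreachable, slowResponse, httpError }

class RegionFailover {
  RegionFailover(this.active, {this.threshold = 3, this.minDistinctPaths = 2});

  Region active;
  final int threshold; // consecutive failures before switching
  final int minDistinctPaths; // assumption: one screen alone can't trip it

  int _streak = 0;
  final Set<String> _paths = {};

  /// Any successful request proves the backend is up, so start over.
  void recordSuccess() {
    _streak = 0;
    _paths.clear();
  }

  /// Returns true if this failure flipped [active] to the other region.
  bool recordFailure(FailureKind kind, String path) {
    // Ignore failures that don't indicate an unreachable backend.
    if (kind != FailureKind.unreachable) return false;
    _streak++;
    _paths.add(path);
    if (_streak < threshold || _paths.length < minDistinctPaths) return false;
    active = active == Region.china ? Region.overseas : Region.china;
    recordSuccess(); // count afresh against the new backend
    return true;
  }
}
```

With these rules, a reload loop hammering a single path can fail as often as it likes without moving the session, while connection-level failures spread across two or more paths still trigger the switch. The timer that re-checks the original backend a few minutes later is left out of the sketch.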

None of this is in the standard playbook, because the standard playbook assumes an open internet. Build for the other kind once, and you stop taking "just use geolocation" for granted.
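
For readers who want the "cache, boot, verify in the background" rule in code, the following is a rough sketch under assumptions. The `bootRegion` function, its `detect` and `onCorrected` callbacks, and `regionFromUtcOffset` are hypothetical names; `detect` stands in for whatever race or probe the app runs. The 3-second first-launch timeout and the UTC+8 hint come from the article.

```dart
import 'dart:async';

enum Region { china, overseas }

/// Cheapest available hint: a device clock on UTC+8 leans China.
/// This is only a guess; other places share that offset.
Region regionFromUtcOffset(Duration offset) =>
    offset == const Duration(hours: 8) ? Region.china : Region.overseas;

/// Picks a region to boot on without waiting longer than [firstLaunchTimeout].
/// [detect] runs the real detection; [onCorrected] fires later if the region
/// the app booted on turns out to be wrong.
Future<Region> bootRegion({
  required Region? cached,
  required Future<Region> Function() detect,
  required void Function(Region) onCorrected,
  Duration? utcOffset,
  Duration firstLaunchTimeout = const Duration(seconds: 3),
}) async {
  // Compare the eventual detection result with what we booted on.
  void verifyLater(Future<Region> detection, Region bootedOn) {
    unawaited(detection.then((found) {
      if (found != bootedOn) onCorrected(found);
    }, onError: (_) {})); // a failed check just leaves the region as-is
  }

  if (cached != null) {
    // Warm start: boot on the cache right away, verify in the background.
    verifyLater(detect(), cached);
    return cached;
  }

  // First launch: wait briefly, then fall back to the timezone guess.
  final detection = detect();
  try {
    return await detection.timeout(firstLaunchTimeout);
  } catch (_) {
    final guess =
        regionFromUtcOffset(utcOffset ?? DateTime.now().timeZoneOffset);
    verifyLater(detection, guess);
    return guess;
  }
}
```

The point of the shape is that no branch waits on the network for longer than the first-launch timeout, and a warm start does not wait at all. Persisting the corrected region back to the cache would belong in `onCorrected`.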

View original source — Hacker Noon ↗
