How to use undocumented web APIs


https://jvns.ca/blog/2022/03/10/how-to-use-undocumented-web-apis/ · scraped


— 1272 words · 2026-02-14 17:38:57 UTC ·

As an example, let’s use Google Hangouts. I’m picking this not because it’s the most useful example (I think there’s an official API which would be much more practical to use), but because many sites where this is actually useful are smaller sites that are more vulnerable to abuse. So we’re just going to use Google Hangouts because I’m 100% sure that the Google Hangouts backend is designed to be resilient to this kind of poking around.

Let’s get started!

### step 1: look in developer tools for a promising JSON response

I start out by going to https://hangouts.google.com, opening the network tab in Firefox developer tools and looking for JSON responses. You can use Chrome developer tools too.

Here’s what that looks like:

[Screenshot: the network tab in Firefox developer tools]

The request is a good candidate if it says “json” in the “Type” column.

I had to look around for a while until I found something interesting, but eventually I found a “people” endpoint that seems to return information about my contacts. Sounds fun, let’s take a look at that.

### step 2: copy as cURL

Next, I right click on the request I’m interested in, and click “Copy” -> “Copy as cURL”.

Then I paste the curl command in my terminal and run it. Here’s what happens.

```plain text
$ curl 'https://people-pa.clients6.google.com/v2/people/?key=REDACTED' -X POST ........ (a bunch of headers removed)
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
```

You might be thinking – that’s weird, what’s this “binary output can mess up your terminal” error? That’s because by default, browsers send an `Accept-Encoding: gzip, deflate` header to the server, to get compressed output.

We could decompress it by piping the output to `gunzip`, but I find it simpler to just not send that header. So let’s remove some irrelevant headers.

### step 3: remove irrelevant headers

Here’s the full curl command line that I got from the browser. There’s a lot here!
I start out by splitting up the request with backslashes (`\`) so that each header is on a different line to make it easier to work with:

```plain text
curl 'https://people-pa.clients6.google.com/v2/people/?key=REDACTED' \
-X POST \
-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0' \
-H 'Accept: */*' \
-H 'Accept-Language: en' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'X-HTTP-Method-Override: GET' \
-H 'Authorization: SAPISIDHASH REDACTED' \
-H 'Cookie: REDACTED' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'X-Goog-AuthUser: 0' \
-H 'Origin: https://hangouts.google.com' \
-H 'Connection: keep-alive' \
-H 'Referer: https://hangouts.google.com/' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Site: same-site' \
-H 'Sec-GPC: 1' \
-H 'DNT: 1' \
-H 'Pragma: no-cache' \
-H 'Cache-Control: no-cache' \
-H 'TE: trailers' \
--data-raw 'personId=101777723309&personId=1175339043204&personId=1115266537043&personId=116731406166&extensionSet.extensionNames=HANGOUTS_ADDITIONAL_DATA&extensionSet.extensionNames=HANGOUTS_OFF_NETWORK_GAIA_GET&extensionSet.extensionNames=HANGOUTS_PHONE_DATA&includedProfileStates=ADMIN_BLOCKED&includedProfileStates=DELETED&includedProfileStates=PRIVATE_PROFILE&mergedPersonSourceOptions.includeAffinity=CHAT_AUTOCOMPLETE&coreIdParams.useRealtimeNotificationExpandedAcls=true&requestMask.includeField.paths=person.email&requestMask.includeField.paths=person.gender&requestMask.includeField.paths=person.in_app_reachability&requestMask.includeField.paths=person.metadata&requestMask.includeField.paths=person.name&requestMask.includeField.paths=person.phone&requestMask.includeField.paths=person.photo&requestMask.includeField.paths=person.read_only_profile_info&requestMask.includeField.paths=person.organization&requestMask.includeField.paths=person.location&requestMask.includeField.paths=person.cover_photo&requestMask.includeContainer=PROFILE&requestMask.includeContainer=DOMAIN_PROFILE&requestMask.includeContainer=CONTACT&key=REDACTED'
```

This can seem like an overwhelming amount of stuff at first, but you don’t need to think about what any of it means at this stage. You just need to delete irrelevant lines.

I usually just figure out which headers I can delete with trial and error – I keep removing headers until the request starts failing. In general you probably don’t need `Accept*`, `Referer`, `Sec-*`, `DNT`, `User-Agent`, and caching headers though.

In this example, I was able to cut the request down to this:

```plain text
curl 'https://people-pa.clients6.google.com/v2/people/?key=REDACTED' \
-X POST \
-H 'Authorization: SAPISIDHASH REDACTED' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Origin: https://hangouts.google.com' \
-H 'Cookie: REDACTED' \
--data-raw 'personId=101777723309&personId=1175339043204&personId=1115266537043&personId=116731406166&extensionSet.extensionNames=HANGOUTS_ADDITIONAL_DATA&extensionSet.extensionNames=HANGOUTS_OFF_NETWORK_GAIA_GET&extensionSet.extensionNames=HANGOUTS_PHONE_DATA&includedProfileStates=ADMIN_BLOCKED&includedProfileStates=DELETED&includedProfileStates=PRIVATE_PROFILE&mergedPersonSourceOptions.includeAffinity=CHAT_AUTOCOMPLETE&coreIdParams.useRealtimeNotificationExpandedAcls=true&requestMask.includeField.paths=person.email&requestMask.includeField.paths=person.gender&requestMask.includeField.paths=person.in_app_reachability&requestMask.includeField.paths=person.metadata&requestMask.includeField.paths=person.name&requestMask.includeField.paths=person.phone&requestMask.includeField.paths=person.photo&requestMask.includeField.paths=person.read_only_profile_info&requestMask.includeField.paths=person.organization&requestMask.includeField.paths=person.location&requestMask.includeField.paths=person.cover_photo&requestMask.includeContainer=PROFILE&requestMask.includeContainer=DOMAIN_PROFILE&requestMask.includeContainer=CONTACT&key=REDACTED'
```

So I just need 4 headers: `Authorization`, `Content-Type`, `Origin`, and `Cookie`. That’s a lot more manageable.

### step 4: translate it into Python

Now that we know what headers we need, we can translate our curl command into a Python program! This part is also a pretty mechanical process: the goal is just to send exactly the same data with Python as we were with curl.

Here’s what that looks like. This is exactly the same as the previous curl command, but using Python’s `requests`. I also broke up the very long request body string into an array of tuples to make it easier to work with programmatically.

```plain text
import requests
import urllib.parse

# The form body as (name, value) pairs; repeated names are intentional
data = [
    ('personId','101777723'), # I redacted these IDs a bit too
    ('personId','117533904'),
    ('personId','111526653'),
    ('personId','116731406'),
    ('extensionSet.extensionNames','HANGOUTS_ADDITIONAL_DATA'),
    ('extensionSet.extensionNames','HANGOUTS_OFF_NETWORK_GAIA_GET'),
    ('extensionSet.extensionNames','HANGOUTS_PHONE_DATA'),
    ('includedProfileStates','ADMIN_BLOCKED'),
    ('includedProfileStates','DELETED'),
    ('includedProfileStates','PRIVATE_PROFILE'),
    ('mergedPersonSourceOptions.includeAffinity','CHAT_AUTOCOMPLETE'),
    ('coreIdParams.useRealtimeNotificationExpandedAcls','true'),
    ('requestMask.includeField.paths','person.email'),
    ('requestMask.includeField.paths','person.gender'),
    ('requestMask.includeField.paths','person.in_app_reachability'),
    ('requestMask.includeField.paths','person.metadata'),
    ('requestMask.includeField.paths','person.name'),
    ('requestMask.includeField.paths','person.phone'),
    ('requestMask.includeField.paths','person.photo'),
    ('requestMask.includeField.paths','person.read_only_profile_info'),
    ('requestMask.includeField.paths','person.organization'),
    ('requestMask.includeField.paths','person.location'),
    ('requestMask.includeField.paths','person.cover_photo'),
    ('requestMask.includeContainer','PROFILE'),
    ('requestMask.includeContainer','DOMAIN_PROFILE'),
    ('requestMask.includeContainer','CONTACT'),
    ('key','REDACTED')
]

response = requests.post('https://people-pa.clients6.google.com/v2/people/?key=REDACTED',
    headers={
        'X-HTTP-Method-Override': 'GET',
        'Authorization': 'SAPISIDHASH REDACTED',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'https://hangouts.google.com',
        'Cookie': 'REDACTED',
    },
    data=urllib.parse.urlencode(data),
)

print(response.text)
```

I ran this program and it works – it prints out a bunch of JSON! Hooray!

You’ll notice that I replaced a bunch of things with `REDACTED`. That’s because if I included those values you could access the Google Hangouts API for my account, which would be no good.

### and we’re done!

Now I can modify the Python program to do whatever I want, like passing different parameters or parsing the output.

I’m not going to do anything interesting with it because I’m not actually interested in using this API at all, I just wanted to show what the process looks like.

But we get back a bunch of JSON that you could definitely do something with.

### this always works (in theory)

Some of you might be wondering – can you always do this?

The answer is sort of yes – browsers aren’t magic! All the information browsers send to your backend is just HTTP requests. So if I copy all of the HTTP headers that my browser is sending, there’s literally no way for the backend to tell that the request isn’t sent by my browser and is actually being sent by a random Python program.

Of course, we removed a bunch of the headers the browser sent so theoretically the backend could tell, but usually they won’t check.

There are some caveats though – for example a lot of Google services have backends that communicate with the frontend in a totally inscrutable (to me) way, so even though in theory you could mimic what they’re doing, in practice it might be almost impossible.

Now that we’ve seen how to use undocumented APIs like this, let’s talk about some things that can go wrong.
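Looking back at step 3, the “keep removing headers until the request starts failing” loop could be sketched in Python roughly like this. This is only an illustration, not something from the original workflow: `minimize_headers`, `request_ok`, and `fake_backend` are hypothetical names, and the fake backend stands in for a real HTTP call so the example runs without touching any server.

```python
from typing import Callable, Dict


def minimize_headers(
    headers: Dict[str, str],
    request_ok: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Try dropping each header once; keep it only if the request fails without it.

    `request_ok` is a hypothetical callback that sends the request with the
    given headers and returns True if the response still looks right.
    """
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if request_ok(trial):
            # The request still works without this header, so leave it out.
            needed = trial
    return needed


def fake_backend(headers: Dict[str, str]) -> bool:
    # Assumption for the demo: pretend the server only checks these four headers.
    required = {"Authorization", "Content-Type", "Origin", "Cookie"}
    return required <= set(headers)


browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Authorization": "SAPISIDHASH REDACTED",
    "Cookie": "REDACTED",
    "Content-Type": "application/x-www-form-urlencoded",
    "Origin": "https://hangouts.google.com",
    "DNT": "1",
}

print(sorted(minimize_headers(browser_headers, fake_backend)))
# prints ['Authorization', 'Content-Type', 'Cookie', 'Origin']
```

In real use, `request_ok` would send the actual request (for example with `requests.post`) and check that the response is still the JSON you expect. Since every trial is a real request to someone’s server, it would be worth pausing between attempts.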
### problem 1: expiring session cookies

One big problem here is that I’m using my Google session cookie for authentication, so this script will stop working whenever my browser session expires.

That means that this approach wouldn’t work for a long running program (I’d want to use a real API), but if I just need to quickly grab a little bit of data as a one-time thing, it can work great!

### problem 2: abuse

If I’m using a small website, there’s a chance that my little Python script could take down their service because it’s making way more requests than they’re able to handle. So when I’m doing this I try to be respectful and not make too many requests too quickly.

This is especially important because a lot of sites which don’t have official APIs are smaller sites with fewer resources.

In this example obviously this isn’t a problem – I think I made 20 requests total to the Google Hangouts backend while writing this blog post, which they can definitely handle.

Also if you’re using your account credentials to access the API in an excessive way and you cause problems, you might (very reasonably) get your account suspended.

I also stick to downloading data that’s either mine or that’s intended to be publicly accessible – I’m not searching for vulnerabilities.

### remember that anyone can use your undocumented APIs
