What browsing with ChatGPT taught me
--
There are websites out there which aren’t ready for the primitive browsing ChatGPT affords
ChatGPT plugins are how GPT-4, the LLM backing ChatGPT, pulls real-world information into its responses.
If you are a ChatGPT Plus subscriber, you should have a special plugin which browses URLs you supply during a chat session to add more context to the conversation.
Here’s the thing about this browser plugin.
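It is essentially a headless browser which, as far as I can tell, uses Selenium to interact with the given URL.
Selenium, great as it is, is not itself a JavaScript runtime. It drives a browser that is, and it only sees client-rendered content if the automation waits for the page's scripts to finish. The plugin does not appear to do that.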
This means that if your page is only rendered after the client has downloaded and executed a bunch of JavaScript, ChatGPT is not going to be able to read your website.
You will end up with errors like this:
This is especially problematic because a lot of modern websites are single-page apps (SPAs) or a hybrid of static content hydrated with live data pulled by JavaScript running inside the browser.
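To make the difference concrete, here is a minimal sketch in Python of what a reader that never executes JavaScript can extract from two pages. Both HTML samples, the `VisibleText` class and the `visible_text` helper are made up for illustration. This is not how the ChatGPT plugin is implemented, only a stand-in for any client that reads the raw HTML and stops there.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a client sees if it never runs JavaScript."""

    SKIPPED = {"script", "style"}  # contents are code or styling, not page text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page text available without executing any script."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page whose content is already in the HTML sent by the server.
SERVER_RENDERED = """
<html><body>
  <h1>Pricing</h1>
  <p>The basic plan costs $5 per month.</p>
</body></html>
"""

# Hypothetical SPA shell: the same content only exists after the script runs.
SPA_SHELL = """
<html><body>
  <noscript>You need to enable JavaScript to run this app.</noscript>
  <div id="root"></div>
  <script>
    document.getElementById("root").textContent =
      "The basic plan costs $5 per month.";
  </script>
</body></html>
"""

print("server-rendered:", visible_text(SERVER_RENDERED))
print("SPA shell:      ", visible_text(SPA_SHELL))
```

Running `visible_text` on your own page's raw HTML is a quick way to see what a non-JavaScript reader gets. If all that comes back is the `noscript` fallback, the page depends on client-side rendering.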
How come this does not affect Google/SEO?
Google’s web crawling infrastructure has had years and years to mature. It can wait for the page to hydrate, just like a typical user would, before indexing the page.
To be fair, this makes sense. Correctly indexing content is pretty much the only way Google can properly “organize the world’s information”, as they say in their mission statement.
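The "wait for the page to hydrate" idea can be sketched as a simple polling loop. This is a toy illustration, not Google's crawler, whose internals I do not know: `FakePage` and `read_when_hydrated` are hypothetical names, and the fake page simply returns nothing for its first few reads to mimic content that appears only after scripts have run. Selenium's `WebDriverWait` offers explicit waits in a similar spirit for real browsers.

```python
import time


class FakePage:
    """Stand-in for a client-rendered page: empty until 'hydration' completes."""

    def __init__(self, hydrate_after_polls: int, content: str):
        self._remaining = hydrate_after_polls
        self._content = content

    def text(self) -> str:
        # Each call represents one look at the rendered page.
        if self._remaining > 0:
            self._remaining -= 1
            return ""
        return self._content


def read_when_hydrated(page, timeout: float = 2.0, interval: float = 0.01) -> str:
    """Poll until the page has text or the timeout passes; return what was found."""
    deadline = time.monotonic() + timeout
    while True:
        text = page.text()
        if text or time.monotonic() >= deadline:
            return text
        time.sleep(interval)


impatient = FakePage(hydrate_after_polls=3, content="Live prices: ...").text()
patient = read_when_hydrated(FakePage(hydrate_after_polls=3, content="Live prices: ..."))

print("single read:", repr(impatient))
print("with wait:  ", repr(patient))
```

The trade-off is the timeout: every page that never hydrates costs the full wait, which is one reason a young crawling setup might skip this step altogether.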
ChatGPT is not a search engine and its crawling infrastructure is very nascent.
At this stage of its life, handling SPAs and other client-side rendered pages is out of scope.
Conclusion
This means that we are at an impasse.
On the one hand, there are millions of SPAs and pages rendered client-side. On the other, ChatGPT is not going to immediately implement every gizmo that search engines have added to their web crawler bots.
To get around this, it is likely that users who really, really, really want ChatGPT to ingest the content of an SPA will copy and paste it into their chat conversation.
But that leaves a lot of useful-but-not-essential websites which are SPAs unusable by ChatGPT.
Maybe some startup will emerge whose job is to index SPAs as Google would index them and make them available to ChatGPT.