As promised, I have made my embeddings Chrome extension and all of its source code public. The extension lets you create, compare and visualize embeddings for any text data that can be imported via CSV.
Some things I think make this different and worthwhile:
• The extension is immediately usable for some real-world SEO problems, such as redirect mapping or comparing and clustering data to find content gaps. That said, I would recommend building a production system rather than relying on this extension.
• There are some great articles and Python code out there, but not everyone feels comfortable with Python or code in general. That is why this is a more hands-on approach with a UI.
• Most code relies on commercial APIs, which is an additional hurdle for some people. That is why this extension uses the free services of HuggingFace.
• Additionally, using HuggingFace lets you experiment with different embedding models. (Disclaimer: while it does work, there is still some work to do to iron out the bugs.)
• I could have deployed this on my own domain (e.g. on ), but then it wouldn’t work offline as easily, and people might worry that they are uploading their private data to the internet or to me. A real application would reduce that concern even more, but building one wouldn’t be feasible for me. That’s how I ended up with a Chrome extension.
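To make the redirect mapping use case a bit more concrete, here is a minimal sketch of the idea in Python. It is not the extension's code: the URLs and the three-dimensional vectors are made up for illustration (real embeddings come from a model and have hundreds of dimensions), and `map_redirects` is a hypothetical helper name. Each old URL is simply matched to the new URL whose embedding has the highest cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def map_redirects(old_pages, new_pages):
    """For every old URL, pick the new URL with the most similar embedding.

    Both arguments are dicts of the form {url: embedding_vector}.
    Returns {old_url: (best_new_url, similarity)}.
    """
    mapping = {}
    for old_url, old_vec in old_pages.items():
        best_url = max(
            new_pages,
            key=lambda url: cosine_similarity(old_vec, new_pages[url]),
        )
        mapping[old_url] = (best_url, cosine_similarity(old_vec, new_pages[best_url]))
    return mapping


# Toy data: hypothetical URLs with made-up 3-dimensional "embeddings".
old_pages = {
    "/old/running-shoes": [0.9, 0.1, 0.0],
    "/old/contact-us": [0.0, 0.2, 0.9],
}
new_pages = {
    "/shop/shoes/running": [0.8, 0.2, 0.1],
    "/about/contact": [0.1, 0.1, 0.95],
}

redirects = map_redirects(old_pages, new_pages)
for old_url, (new_url, score) in redirects.items():
    print(f"{old_url} -> {new_url} (similarity {score:.2f})")
```

This brute-force comparison checks every old page against every new page, which is fine for a few thousand URLs. For larger sets, an approximate nearest neighbour index becomes worthwhile. In practice you would also review low-similarity matches by hand instead of redirecting blindly.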
Some things I’d love to talk about:
• Quality of different models for different use cases
• Approximate nearest neighbour algorithms like HNSW (used in this extension)
• Pros & cons of vector quantization
• …
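As a starting point for the quantization discussion, here is a small Python sketch of the trade-off. It uses plain scalar quantization to 8-bit integers, which is the simplest variant and not necessarily what the extension or any particular library does; the function names and the example vector are made up for illustration. The upside is storage: one byte per dimension instead of four for a 32-bit float. The downside is a rounding error of at most half a quantization step per dimension.

```python
def quantize_int8(vec):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


# Made-up example vector; real embeddings have hundreds of dimensions.
embedding = [0.12, -0.53, 0.98, 0.0]

codes, scale = quantize_int8(embedding)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))

print("codes:", codes)
print("largest reconstruction error:", max_error)
```

Whether that error matters depends on the model and the task, so it is worth comparing search results before and after quantizing.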