The Enterprise Brain is powered by our proprietary vector similarity engine called TopicBrain.

  • TopicBrain encodes data as high-dimensional vectors that can be concatenated, indexed, searched, and compared using a high-performance approximate nearest neighbors algorithm.

  • It finds clusters within large datasets and extracts human-readable topics from text, enabling previously impossible workflows for information discovery.

  • The API offers a wide range of functionality for data scientists who want to integrate their own tools.

In machine learning, there is “No Free Lunch”: no one-size-fits-all model solves every problem. Likewise, no single vector captures the “meaning” of a given piece of text. So we designed TopicBrain to be flexible. General-purpose and domain-specific BERT models are provided out of the box, and any model with a transform() method can be integrated and used to create vectors. TopicBrain also supports weighted vector concatenation, so multiple models (and multiple modalities) can be applied to the same data simultaneously.
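
As an illustration of weighted vector concatenation, here is a minimal sketch in plain Python. It is not TopicBrain's implementation: the two toy models, the function names, and the choice to L2-normalize each model's output and scale it by the square root of its normalized weight are all assumptions made for the example. Under that scaling, the dot product of two combined vectors equals the weighted average of the per-model cosine similarities.

```python
import math
from typing import List, Protocol, Sequence


class Vectorizer(Protocol):
    """Anything with a transform() method that maps texts to vectors."""

    def transform(self, texts: Sequence[str]) -> List[List[float]]: ...


class VowelModel:
    """Toy stand-in for a model: [vowel count, consonant count]."""

    def transform(self, texts):
        out = []
        for text in texts:
            letters = [c for c in text.lower() if c.isalpha()]
            vowels = sum(c in "aeiou" for c in letters)
            out.append([float(vowels), float(len(letters) - vowels)])
        return out


class LengthModel:
    """Toy stand-in for a second model: [word count, 1.0]."""

    def transform(self, texts):
        return [[float(len(text.split())), 1.0] for text in texts]


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def weighted_concat(models, weights, texts):
    """Concatenate each model's normalized vector, scaled by sqrt(weight share)."""
    if len(models) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    per_model = [model.transform(texts) for model in models]
    combined = []
    for i in range(len(texts)):
        row = []
        for vectors, weight in zip(per_model, weights):
            scale = math.sqrt(weight / total)
            row.extend(scale * x for x in l2_normalize(vectors[i]))
        combined.append(row)
    return combined


vectors = weighted_concat(
    [VowelModel(), LengthModel()],
    [3.0, 1.0],  # the first model counts three times as much as the second
    ["hello world", "vector search engine"],
)
```

Because each block is unit length before scaling, every combined vector is also unit length, so changing the weights shifts the influence of each model without changing the overall scale.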

Vectors of all kinds are made searchable by building Approximate Nearest Neighbors (ANN) indexes. TopicBrain uses ScaNN, which offers one of the best recall/speed tradeoffs among ANN algorithms, to provide fast search over billions of documents. To build indexes quickly and keep them highly available, TopicBrain leverages auto-scaling and shards its indexes across multiple nodes. This combination of speed and accuracy means that users get relevant results faster and can build new workflows on top of them more easily.
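
The sharding idea can be sketched as follows. This is a simplified, single-process illustration, not TopicBrain's code: each shard performs an exact brute-force search as a stand-in for a per-node ANN index, and the `Shard` and `ShardedIndex` classes and the modulo routing rule are hypothetical. The point is the scatter-gather pattern: every shard returns its own top k, and the results are merged into a global top k.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class Shard:
    """One partition of the index; exact search stands in for an ANN index."""

    def __init__(self):
        self.items = []  # list of (primary_key, vector)

    def add(self, key, vector):
        self.items.append((key, vector))

    def search(self, query, k):
        scored = ((dot(query, vector), key) for key, vector in self.items)
        return heapq.nlargest(k, scored)


class ShardedIndex:
    """Routes vectors to shards by primary key and merges per-shard results."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, key, vector):
        self.shards[key % len(self.shards)].add(key, vector)

    def search(self, query, k):
        candidates = []
        for shard in self.shards:  # in a real cluster this fan-out runs in parallel
            candidates.extend(shard.search(query, k))
        return [key for _, key in heapq.nlargest(k, candidates)]


index = ShardedIndex(num_shards=3)
for key, vector in [
    (1, [1.0, 0.0]),
    (2, [0.0, 1.0]),
    (3, [0.9, 0.1]),
    (4, [-1.0, 0.0]),
    (5, [0.5, 0.5]),
]:
    index.add(key, vector)

nearest = index.search([1.0, 0.0], k=2)
```

Asking each shard for k candidates guarantees that the merged result contains the true global top k whenever the per-shard searches are exact; with approximate per-shard indexes, overall recall depends on each shard's recall.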

Organizations need their data to be discoverable the day that it enters the system, and their resources need to scale with the amount of data streaming in. TopicBrain stores content and vectors in a multi-tenant SQL database with a custom graph schema. Boards created through the user interface are linked, through this graph, to the pieces of content that define them. New content added to the database is automatically vectorized and integrated into existing indexes.
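
To make the graph idea concrete, here is a small sketch using SQLite. The table names, columns, and helper functions are hypothetical and are not TopicBrain's actual schema; the sketch only shows how boards and content can live as nodes in a relational table, with edges linking them and a tenant column isolating each organization's data.

```python
import sqlite3

# Hypothetical schema: nodes hold boards and content, edges link them.
SCHEMA = """
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    kind      TEXT NOT NULL CHECK (kind IN ('board', 'content')),
    body      TEXT
);
CREATE TABLE edges (
    tenant_id TEXT NOT NULL,
    src       INTEGER NOT NULL REFERENCES nodes(id),
    dst       INTEGER NOT NULL REFERENCES nodes(id),
    PRIMARY KEY (src, dst)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_node(conn, tenant_id, kind, body):
    cur = conn.execute(
        "INSERT INTO nodes (tenant_id, kind, body) VALUES (?, ?, ?)",
        (tenant_id, kind, body),
    )
    return cur.lastrowid


def link(conn, tenant_id, board_id, content_id):
    conn.execute(
        "INSERT INTO edges (tenant_id, src, dst) VALUES (?, ?, ?)",
        (tenant_id, board_id, content_id),
    )


def board_content(conn, tenant_id, board_id):
    """Return primary keys of the content linked to a board, scoped to one tenant."""
    rows = conn.execute(
        """
        SELECT n.id
        FROM edges AS e
        JOIN nodes AS n ON n.id = e.dst
        WHERE e.src = ? AND e.tenant_id = ? AND n.tenant_id = ?
          AND n.kind = 'content'
        ORDER BY n.id
        """,
        (board_id, tenant_id, tenant_id),
    )
    return [row[0] for row in rows]


conn = open_db()
board = add_node(conn, "acme", "board", "Pricing research")
doc_a = add_node(conn, "acme", "content", "Competitor pricing memo")
doc_b = add_node(conn, "acme", "content", "Q3 pricing survey")
other = add_node(conn, "globex", "content", "Unrelated tenant document")
link(conn, "acme", board, doc_a)
link(conn, "acme", board, doc_b)
```

Filtering on the tenant column in every query is one simple way to keep tenants separate in a shared database; the primary keys returned here are the kind of identifiers a vector index would store alongside each vector.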

The TopicBrain API is available to application developers with access to the cluster endpoints. This API can infer vectors and topics for new text, list available indexes, build new distributed indexes, and find nearest neighbors for primary keys in the database. Because the API exposes only vectors and primary keys for content, and not the content itself, developers can build on it even with limited database permissions.