Learning From Structure: Discord’s Entity-Relationship Embeddings

https://discord.com/blog/learning-from-structure-discords-entity-relationship-embeddings

What is DERE?

DERE is the mechanism Discord uses to build meaningful representations from raw data. At its core, DERE pre-trains embeddings for each user, guild, game, and various other entities. Effectively, it maps entity IDs, like guild IDs or game IDs, to vectors which can then be used in various ways. DERE relies solely on social graph-based features, such as relationships between users and their interactions within the platform (e.g. what guilds you’re in).

If you re-imagine the NLP example above and tilt your head slightly, you could sort of make a sentence out of this… maybe something like: “Nelly is friends with Clyde.” In DERE, our setup is pretty much exactly like this! Nelly->is_friend->Clyde. While simple, this is very powerful at scale.

Under the hood, DERE uses an unsupervised machine learning technique known as “contrastive learning,” which trains on triplets of head-relation-tail (h, r, t) examples. The data used in DERE is broken down into these triples, which our ML models can use to unravel the relationships and build useful representations.

An example of an (h, r, t) triple, as the model sees it at training time, is:

(h, r, t) = (661027446241361930, 17, 974519864045756446)

This particular example is the edge between my user ID and the OpenAI server ID. Relation 17 (at the time of writing) is the “user_in_guild” relationship.

Training thus starts with two embedding lookups: one for my user ID and one for the server ID. The relation ID is then used to choose which model we’ll use to transform these entities into the same space.

Our positive examples are all of the edges that exist between any two entities in our graph, such as my user ID and a guild that I’m in. Negative examples are constructed on the fly at training time by randomly corrupting positive examples. Continuing the example above, a negative example would be my user ID paired with guild <xyz>, where I’m not actually in that guild.
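The lookup, transform, and corruption steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Discord's implementation: the relation-specific transform is assumed to be a simple element-wise translation, the embedding dimension is tiny, and the helper names (`init_table`, `transform`, `corrupt_tail`) are hypothetical. Because this toy graph is so small, the sketch also filters out real edges when sampling a negative.

```python
import random

USER_IN_GUILD = 17  # relation ID quoted in the post

# Entity IDs quoted in the post; the constant names are hypothetical.
ME = 661027446241361930
OPENAI_GUILD = 974519864045756446
RUST_GUILD = 560127830160048128


def init_table(ids, dim, rng):
    """Create one small random vector per ID (entity or relation)."""
    return {i: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for i in ids}


def transform(relation_offsets, relation_id, vec):
    """Relation-specific transform. Assumption: an element-wise translation."""
    offset = relation_offsets[relation_id]
    return [v + o for v, o in zip(vec, offset)]


def corrupt_tail(triple, candidate_tails, positives, rng):
    """Build a negative triple by swapping in a random tail.

    Real edges are filtered out here because the toy graph is tiny.
    """
    h, r, _ = triple
    pool = [c for c in candidate_tails if (h, r, c) not in positives]
    if not pool:
        raise ValueError("no valid negative tail available")
    return (h, r, rng.choice(pool))


rng = random.Random(0)
DIM = 4
entity_emb = init_table([ME, OPENAI_GUILD, RUST_GUILD], DIM, rng)
relation_offsets = init_table([USER_IN_GUILD], DIM, rng)

positive = (ME, USER_IN_GUILD, OPENAI_GUILD)
positives = {positive}
negative = corrupt_tail(positive, [OPENAI_GUILD, RUST_GUILD], positives, rng)

h_vec = entity_emb[positive[0]]  # lookup 1: head embedding
t_vec = entity_emb[positive[2]]  # lookup 2: tail embedding
h_in_tail_space = transform(relation_offsets, positive[1], h_vec)

print("positive:", positive)
print("negative:", negative)
```

With only two candidate guilds and one real edge, the sampled negative is always the Rust server triple.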
Because our training data is massive, corrupting positive edges is a safe operation: a randomly corrupted edge is very unlikely to coincide with a real one. To give an idea of how big these graphs can get, we operate on billions of entities and tens of billions of relationships.

Our loss function during training is a ranking loss called triplet margin loss, which pulls related entities close together in the embedding space and pushes unrelated entities further apart. We could also use logistic or softmax loss, depending on the use case.

Continuing from the above example, my (h, r, t) triple is considered a positive example since it exists in the training data. If we corrupted the edges within the batch, we could wind up with a negative example like:

(h, r, t) = (661027446241361930, 17, 560127830160048128)

Here the tail was randomly selected and is a Rust server I’m not a member of.

Now we can take the positive and negative examples and calculate our loss. We use this loss to update the learned transformation as well as the embeddings themselves.
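To make the loss step concrete, here is a minimal sketch of a triplet margin loss and one gradient step. It is not taken from the post. The common form of the loss is max(0, d(a, p) - d(a, n) + m), where a is the transformed head, p the positive tail, n the negative tail, and m the margin. The squared Euclidean distance, the translation-style relation transform, the margin, the learning rate, and all function names are assumptions chosen to keep the gradients easy to follow.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_margin_loss(anchor, pos, neg, margin=1.0):
    """max(0, d(anchor, pos) - d(anchor, neg) + margin)."""
    return max(0.0, sq_dist(anchor, pos) - sq_dist(anchor, neg) + margin)


def sgd_step(head, rel, pos, neg, margin=1.0, lr=0.1):
    """One in-place SGD step; returns the loss measured before the update.

    The anchor is head + rel (assumed translation transform), so the
    gradient with respect to the anchor flows to both the head embedding
    and the relation parameters.
    """
    anchor = [h + r for h, r in zip(head, rel)]
    loss = triplet_margin_loss(anchor, pos, neg, margin)
    if loss == 0.0:
        return loss  # margin already satisfied, so the gradient is zero
    for i in range(len(head)):
        g_anchor = 2.0 * (neg[i] - pos[i])
        g_pos = -2.0 * (anchor[i] - pos[i])
        g_neg = 2.0 * (anchor[i] - neg[i])
        head[i] -= lr * g_anchor
        rel[i] -= lr * g_anchor
        pos[i] -= lr * g_pos
        neg[i] -= lr * g_neg
    return loss


# Toy 2-d vectors: the positive and negative tails start equally far away.
head = [0.0, 0.0]       # user embedding
rel = [0.0, 0.0]        # relation offset
pos_tail = [1.0, 0.0]   # guild the user is in
neg_tail = [0.0, 1.0]   # randomly sampled guild

loss_before = sgd_step(head, rel, pos_tail, neg_tail)
anchor_after = [h + r for h, r in zip(head, rel)]
loss_after = triplet_margin_loss(anchor_after, pos_tail, neg_tail)
print(loss_before, loss_after)
```

A single step moves the transformed head toward the positive tail and away from the negative one, so the second loss value is lower than the first. In this sketch the update touches the head embedding, both tail embeddings, and the relation parameters, mirroring the statement that the loss updates the learned transformation as well as the embeddings.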
