The Tale of Identity Graph and Identity Resolution | RudderStack

By neub9


Game Analytics for Mobile using RudderStack

Our previous article, Game Analytics for Mobile, showed how to build an open-source analytics solution with RudderStack and stressed how much analysis depends on understanding user behavior. That understanding rests on tying each event or activity to the individual user who generated it, and analytics platforms play a crucial role in collecting this data. The difficulty arises when users browse anonymously, or use multiple identities with the same product or across different devices and channels.

One of the central challenges in analytics is ID resolution: the ability to tie a user's different identities together in a privacy-preserving way. An identity graph makes these relationships explicit. This post is the first of a two-part series. It describes the problem; the second part will cover possible approaches to solving it.

A Real-Life Scenario

To further illustrate the problem, let’s consider a sample user journey on an eCommerce platform.

  • The user visits the website on a laptop, browses anonymously, and leaves without making a purchase.
  • The user installs the eCommerce application on a mobile device, logs in with a phone number but doesn’t make a purchase.
  • Finally, the user returns to the website, makes a purchase, registers with an email address, and provides a phone number to receive notifications.

Different User Identifiers

When the eCommerce platform uses RudderStack to collect user activity data, every event carries a user identifier. The identifier is either set explicitly by the application or auto-assigned by RudderStack. Across the sample journey, the identifier attached to the user's events changes as the user moves from anonymous browsing to the mobile app and then to a registered account.
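
To make the changing identifiers concrete, here is a minimal Python sketch of the three steps of the sample journey as event records. The field names (`anonymous_id`, `phone`, `email`), the values, and the `identifiers` helper are all hypothetical and chosen for illustration; they are not RudderStack's actual event schema.

```python
# Hypothetical event records for the sample journey.
# Field names and values are illustrative assumptions, not a real schema.
events = [
    # Step 1: anonymous browsing on the laptop
    {"step": 1, "channel": "web", "anonymous_id": "anon-web-1"},
    # Step 2: mobile app, logged in with a phone number
    {"step": 2, "channel": "mobile", "anonymous_id": "anon-mob-1",
     "phone": "+1-555-0100"},
    # Step 3: back on the website, registered with email and phone
    {"step": 3, "channel": "web", "anonymous_id": "anon-web-1",
     "email": "user@example.com", "phone": "+1-555-0100"},
]

ID_FIELDS = ("anonymous_id", "phone", "email")


def identifiers(event):
    """Return the (type, value) identifier pairs present on one event."""
    return [(field, event[field]) for field in ID_FIELDS if field in event]


for event in events:
    print(event["step"], identifiers(event))
```

Note that no single event carries every identifier; the link between the laptop and the mobile device only becomes visible once the phone number shows up on both.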

The Identity Graph

An identity graph represents the associations among these identifiers: each identifier is a node, and an edge connects two identifiers that are known to belong to the same user. The graph shows how different identities are tied together, which leads to a better understanding of the user's behavior across devices and channels.
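
One way to picture such a graph is as an adjacency map built from identifiers that appear together on the same event. The sketch below assumes each event has already been reduced to a list of identifier strings; the `type:value` naming and the `build_identity_graph` function are illustrative assumptions, not part of any RudderStack API.

```python
from itertools import combinations


def build_identity_graph(events):
    """Build an undirected graph: identifiers seen on the same event are linked."""
    graph = {}
    for ids in events:
        for node in ids:
            # Register every identifier, even one that has no links yet.
            graph.setdefault(node, set())
        for a, b in combinations(ids, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Identifiers observed at each step of the sample journey (hypothetical values).
journey = [
    ["anon:web-1"],
    ["anon:mob-1", "phone:+15550100"],
    ["anon:web-1", "email:user@example.com", "phone:+15550100"],
]

graph = build_identity_graph(journey)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph the phone number is the node that bridges the laptop's anonymous ID and the mobile device's anonymous ID.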

The Identity Graph is not Static

The associations within the identity graph are not static. New identifiers and new associations appear whenever a user interacts with the platform from a different device, so the graph must be tracked and updated continuously.
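
Because associations keep arriving, it helps to have a structure that can absorb one new link at a time. A union-find (disjoint-set) structure is one common choice; it is shown here as a possible approach under that assumption, not as the method this series settles on. The class and method names are hypothetical.

```python
class IdentityGraph:
    """Union-find over identifiers; supports adding associations incrementally."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        """Return the representative identifier of the group containing node."""
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[node] != root:
            next_node = self.parent[node]
            self.parent[node] = root
            node = next_node
        return root

    def link(self, a, b):
        """Record that identifiers a and b belong to the same user."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_user(self, a, b):
        return self.find(a) == self.find(b)


g = IdentityGraph()
g.find("anon:web-1")                        # step 1: anonymous laptop visit
g.link("anon:mob-1", "phone:+15550100")     # step 2: mobile login with phone
before = g.same_user("anon:web-1", "anon:mob-1")

g.link("anon:web-1", "phone:+15550100")     # step 3: phone provided on the web
g.link("anon:web-1", "email:user@example.com")
after = g.same_user("anon:web-1", "anon:mob-1")

print(before, after)
```

The laptop and mobile identifiers are unrelated after step 2 and only merge into one group once step 3 introduces the shared phone number, which is exactly the kind of late-arriving association the graph has to accommodate.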

Assigning Virtual IDs

The goal of ID resolution is to assign a virtual user ID to every node in the identity graph, such that nodes connected to each other, directly or indirectly, receive the same virtual ID. This can be achieved by running a connected components algorithm on the identity graph.
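
As a small in-memory illustration of the idea, the sketch below labels connected components with a breadth-first search and uses the component label as the virtual ID. The graph contents, the `v1`/`v2` label format, and the `assign_virtual_ids` function are assumptions made for this example; Part II covers the SQL-based approach.

```python
from collections import deque


def assign_virtual_ids(graph):
    """Give every node in the same connected component the same virtual ID."""
    virtual_id = {}
    next_id = 0
    for start in sorted(graph):  # sorted only to make the labels deterministic
        if start in virtual_id:
            continue
        next_id += 1
        label = f"v{next_id}"
        virtual_id[start] = label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in virtual_id:
                    virtual_id[neighbor] = label
                    queue.append(neighbor)
    return virtual_id


# Hypothetical identity graph: the sample journey plus one unrelated visitor.
graph = {
    "anon:web-1": {"email:user@example.com", "phone:+15550100"},
    "anon:mob-1": {"phone:+15550100"},
    "phone:+15550100": {"anon:web-1", "anon:mob-1", "email:user@example.com"},
    "email:user@example.com": {"anon:web-1", "phone:+15550100"},
    "anon:web-2": set(),
}

ids = assign_virtual_ids(graph)
for node in sorted(ids):
    print(node, ids[node])
```

All four identifiers from the sample journey end up with one virtual ID, while the unrelated anonymous visitor gets its own.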

Conclusion

Handling the identity graph in real-life applications can be complex because of the large number of nodes and the evolving nature of the associations. Running a connected components algorithm, expressed in a query language such as SQL, can be an effective way to manage and analyze the identity graph at scale.

For more on this topic, stay tuned for Part II, where we will discuss a solution for finding connected components using SQL.


