Lens Content Recommendations

Recommendations for relevant and engaging content to help with discovery

Karma3 Labs (K3L) has implemented a set of algorithms to help the Lens Protocol community build amazing user experiences with highly engaging and relevant content. The algorithms are implemented on a highly scalable compute infrastructure that leverages EigenTrust, a foundational algorithm co-authored by one of K3L's co-founders, Sep Kamvar.

We offer social applications a collection of ‘feed’ algorithms designed to provide users with relevant and personalized content suggestions. Developers can choose the algorithms that best fit their needs and audience. Soon, developers will also be able to tune a specific algorithm to their needs by choosing various parameters.

We combine graph-based reputation algorithms like EigenTrust with machine learning to create a set of custom feeds. Our feed algorithms use EigenTrust-based reputation scores of users/profiles as foundational signals to surface good-quality posts while filtering out spammy users and content.

Our current APIs provide global content and personalized algorithms:

  1. Recent - Recent posts sorted by time of posting.

  2. Popular - Posts sorted by engagement from top-ranked profiles.

  3. Recommended - New and interesting content powered by AI + EigenTrust.

  4. Crowdsourced (Hubs and Authorities) - Posts that garnered significant interest, with interactions weighted by the reputation of the interacting parties.

  5. Following - Posts authored or engaged with by people you follow.

Non-personalized Algorithms

Apps can choose from a collection of algorithms that generate a list of the most relevant posts to recommend to all users irrespective of their network or activity on Lens protocol.

1. Recent

This strategy offers users a real-time view of the latest content shared across the platform.

We provide a feed of the most recent posts, arranged in descending order of their posting time. We query the Lens Public BigQuery dataset every hour and fetch posts ordered by their block timestamp.
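
To make the pull concrete, the sketch below shows such a fetch using the google-cloud-bigquery Python client. The dataset, table, and column names are illustrative placeholders, not the actual Lens dataset identifiers:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers -- consult the Lens Public BigQuery dataset
# documentation for the real dataset, table, and column names.
QUERY = """
SELECT post_id, profile_id, block_timestamp
FROM `lens-public-data.polygon.posts`
ORDER BY block_timestamp DESC
LIMIT 100
"""

for row in client.query(QUERY).result():
    print(row.post_id, row.block_timestamp)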

Configurable Parameters

Depending on the specific needs, we can tune the feed by making any of the following changes:

  1. Frequency of the pull from Lens BigQuery, from as often as every 15 minutes to once daily

  2. Filtering by type of post, e.g., a combination of memes, photos, text, etc.

  3. Filtering for profiles, based on the age of the author's profile (time since creation)

2. Popular

This strategy is designed to highlight the most recent viral posts by the most popular profiles. We compute EigenTrust scores for every profile. We then combine each profile's trust score with the publication stats (number of comments, mirrors, collects, and age) of every post it authored to produce a recommendation score for every post.

The top-scoring posts created within the last 14 days are then recommended to users. The formula used to combine scores is a linear equation:

10 * profile_global_trust_score \\
+ 2 * comments_count/max(comments_count) \\
+ 5 * mirrors_count/max(mirrors_count) \\ 
+ 3 * collects_count/max(collects_count) \\
- 2 * age_in_days/max(age_in_days)

# limit posts per profileId < 10
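
As a minimal sketch, the scoring and the per-profile cap could be written in pandas as follows, assuming a posts DataFrame whose columns match the names in the formula:

import pandas as pd

def popular_scores(posts: pd.DataFrame) -> pd.DataFrame:
    # Normalize each stat by its global maximum, per the formula above.
    norm = lambda col: posts[col] / posts[col].max()
    scored = posts.assign(
        score=10 * posts["profile_global_trust_score"]
              + 2 * norm("comments_count")
              + 5 * norm("mirrors_count")
              + 3 * norm("collects_count")
              - 2 * norm("age_in_days")
    )
    # Highest scores first, keeping at most 10 posts per profile.
    return (scored.sort_values("score", ascending=False)
                  .groupby("profile_id", sort=False)
                  .head(10))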

Configurable Parameters

The algorithm can be customized to fit the needs of a specific front-end by changing any of the following:

  1. Weights of features (coefficients in the linear equation above)

  2. Normalization method (max normalizer in the equation above)

  3. EigenTrust scoring strategy (refer to this doc)

3. Recommended

This strategy uses machine learning (ML) to help users discover new and interesting posts that may not be viral.

We start with the EigenTrust scores of profiles and fetch the most engaging posts of the top-scoring profiles. We assume that these posts are of high quality because they are the most popular posts of the most reputable profiles in the network. We then use the features of the posts and the profiles to train an ML model (a gradient-boosted decision tree) that is used to classify all posts.

Model Training

Feature Computation:

Before training the model, we compute a number of features. The following pseudo-code explains some of them:

# max(x, profile_id): the maximum of x across all posts by the same profile
post_score = 1 * mirrors_count/max(mirrors_count, profile_id) \\
           + 1 * collects_count/max(collects_count, profile_id) \\
           + 1 * comments_count/max(comments_count, profile_id) \\
           - 5 * age/max(age, profile_id)

followship_score = eigentrust_score(profile, strategy='followship')
followship_rank = eigentrust_rank(profile, strategy='followship')

is_original = post.is_related_to_post.isNull() \\
            & post.is_related_to_comment.isNull()
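
Sketched in pandas, the per-profile normalization is a grouped transform; the column names here (including related_post_id and related_comment_id) are assumptions for illustration:

import pandas as pd

def compute_features(posts: pd.DataFrame) -> pd.DataFrame:
    # max(x, profile_id) from the pseudo-code: the max of x over all
    # posts by the same profile (zero-max guards omitted for brevity).
    pmax = lambda col: posts.groupby("profile_id")[col].transform("max")
    posts["post_score"] = (
        posts["mirrors_count"] / pmax("mirrors_count")
        + posts["collects_count"] / pmax("collects_count")
        + posts["comments_count"] / pmax("comments_count")
        - 5 * posts["age"] / pmax("age")
    )
    # A post is original if it is not related to another post or comment.
    posts["is_original"] = (
        posts["related_post_id"].isna() & posts["related_comment_id"].isna()
    )
    return posts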

Training Set

To train the model, we create a training set using the following pseudo-code:

select a random dataset of posts
top_1000_posts = sort by post_score, pick the top 1000 where followship_rank <= 1000

for each post:
    if rank of author >= 50000: recommend = 'NO'
    if rank of author <= 1000:
        if post is in top_1000_posts: recommend = 'YES'
        else: recommend = 'MAYBE'
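
Read literally, the heuristic leaves posts by authors ranked between 1,000 and 50,000 unlabeled; a direct Python rendering (under that assumption) looks like this:

def training_label(post_score: float, author_rank: int,
                   top_1000_scores: set) -> str | None:
    # Authors ranked 50,000 or worse contribute negative examples.
    if author_rank >= 50_000:
        return "NO"
    # Top-1,000 authors: strong positive if the post is among the
    # top-1,000 by score, weak positive otherwise.
    if author_rank <= 1_000:
        return "YES" if post_score in top_1000_scores else "MAYBE"
    # Everyone in between is left out of the training set.
    return None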

Model Predictions

With the trained model, we evaluate all posts to determine whether a post should be recommended or not, classifying each post into one of three classes: YES, MAYBE, and NO.
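
As a sketch of this step, a gradient-boosted classifier such as XGBoost can be fit on the labeled set; the toy features and hyper-parameters below are illustrative only:

import numpy as np
import xgboost as xgb

# Toy stand-ins for the real feature matrix (post_score, followship_score,
# followship_rank, is_original, ...) and the three labels.
X = np.random.rand(1_000, 4)
y = np.random.randint(0, 3, size=1_000)   # 0 = NO, 1 = MAYBE, 2 = YES

model = xgb.XGBClassifier(objective="multi:softprob",
                          n_estimators=200, max_depth=6)
model.fit(X, y)

predicted_class = model.predict(X)        # one of NO/MAYBE/YES per post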

With posts classified, we then sample recommendable (YES and MAYBE) posts from different time windows (pseudo-code below):

sample(1day_old_post, 100)
sample(7days_old_post, 50)
sample(30days_old_post, 50)
sample(gt_30days_old_post, 50)

With the samples taken, we assign a letter grade (A, B, C) to each post using the following heuristic:

engagement_score = (1 * num_upvotes) + (3 * num_mirrors) + (5 * num_comments)
grade_A = top(engagement_score, 33_pct) -> high engagement metrics
grade_B = middle(engagement_score, 33_pct) -> medium engagement metrics
grade_C = bottom(engagement_score, 33_pct) -> low engagement metrics

Finally, we take another sample of 100 posts using a combination of the engagement grade (A/B/C) and the recommendation class (YES/MAYBE). The sample has the following composition:

40% of posts with a YES recommendation and A grade
30% of posts with a YES recommendation and B grade 
10% of posts with a YES recommendation and C grade
20% of posts with a MAYBE recommendation irrespective of grade
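
A pandas sketch of the grade binning and this stratified sample, assuming a recommend column holding the YES/MAYBE/NO class:

import pandas as pd

def final_sample(posts: pd.DataFrame) -> pd.DataFrame:
    # Engagement score and tercile letter grades (A = top third).
    posts["engagement"] = (1 * posts["num_upvotes"]
                           + 3 * posts["num_mirrors"]
                           + 5 * posts["num_comments"])
    posts["grade"] = pd.qcut(posts["engagement"], q=3, labels=["C", "B", "A"])

    def take(rec, grade, n):
        pool = posts[posts["recommend"] == rec]
        if grade is not None:
            pool = pool[pool["grade"] == grade]
        return pool.sample(n=min(n, len(pool)))

    # 40/30/10 posts from YES by grade, 20 from MAYBE regardless of grade.
    return pd.concat([take("YES", "A", 40),
                      take("YES", "B", 30),
                      take("YES", "C", 10),
                      take("MAYBE", None, 20)])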

Configurable Parameters

To fit the needs of different clients that we partner with, we can tune the algorithm with the following changes:

  1. Adjust the criteria for the training set (refer to this section)

  2. Change the sampling percentages to bias towards newer or older posts

  3. Change the engagement score heuristic and/or the binning logic that assigns letter grades to posts

  4. Change the sampling logic that combines recommendation class and letter grade

  5. Change the hyper-parameters used to train the XGBoost model

4. Crowdsourced

The Crowdsourced approach, based on Hubs and Authorities, assigns trust scores to both profiles and publications (posts, mirrors, and comments). It uses some of the same input as the Popular approach, namely the relationships established between profiles and publications via the following actions:

  • Authoring (writing)

  • Commenting

  • Mirroring

  • Liking

Unlike the Popular approach above, where profile scores are first calculated (see the ℹ️ note below) and then used to recommend publications by highly scored profiles, Crowdsourced treats publications as first-class trust peers for the purposes of EigenTrust: publications can both gain trust from and impart trust to profiles.

ℹ️ Profile score calculations are done by aggregating and reducing profile–publication relationships into profile–profile relationships, using the publication-to-author relationship as the reduction path.

The profile–publication actions are no longer aggregated, but are used directly as peer-to-peer local trust arcs, since profiles and publications are both peers in this case.

The key mechanism for recursive trust calculation here is credit trust, the trust a publication back-channels to its author. This makes both publications and profiles a trust-gathering vehicle for each other:

  • A publication gains interaction trust from other profiles, then imparts (back-channels) the gained trust to its author as credit trust.

  • A profile gains back-channeled credit trust from the popular (higher-trust) publications it authored, then imparts the gained trust to its less popular (lower-trust) publications as authorship trust.
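
A highly simplified sketch of how these arcs could be assembled; the weight values are placeholders, since the actual per-action weights are tuning parameters not listed on this page:

def build_local_trust(actions, authorship):
    # actions: (actor_profile, kind, publication) triples for comments,
    # mirrors, etc.; authorship: (publication, author_profile) pairs.
    ACTION_WEIGHT = {"comment": 1.0, "mirror": 1.0}  # placeholder weights
    CREDIT_WEIGHT = 1.0      # publication -> author back-channel
    AUTHORSHIP_WEIGHT = 1.0  # author -> their publications

    arcs = []
    for actor, kind, pub in actions:
        # Interaction trust: the acting profile trusts the publication.
        arcs.append((actor, pub, ACTION_WEIGHT[kind]))
    for pub, author in authorship:
        # Credit trust: the publication back-channels trust to its author.
        arcs.append((pub, author, CREDIT_WEIGHT))
        # Authorship trust: the author imparts trust to the publication.
        arcs.append((author, pub, AUTHORSHIP_WEIGHT))
    return arcs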

Likes are special in that each publication records only the number of likes it has received, not the actual list of likers, so likes cannot be modeled as local trust arcs (the actor side is missing). Instead, we use likes as a basis for pre-trust: highly liked posts receive high pre-trust.

To reduce the computation burden, likes have a threshold of 10: only likes above that threshold matter. Publications with up to 10 likes receive no pre-trust; 11 likes result in a unit weight of pre-trust, 12 likes in double the unit weight, and so on. The alpha value of EigenTrust is used to limit the efficacy of pre-trust compared to the other local trust arcs.
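
The thresholding itself is a one-liner (threshold value taken from the description above):

LIKE_THRESHOLD = 10

def pretrust_weight(likes: int) -> int:
    # 0-10 likes -> 0 units of pre-trust; 11 -> 1 unit; 12 -> 2 units; ...
    return max(0, likes - LIKE_THRESHOLD)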

The main tuning parameters for this approach are the like threshold (10 above) and the EigenTrust alpha value.

Personalized Algorithms

Non-personalized algorithms are very good at helping users discover not only new content but also, through that content, new profiles to follow and interact with. However, users sometimes want to engage with content that closely matches their preferences and interests. For situations like this, we provide a personalized ‘following’ feed that front-ends can integrate with.

5. Following

This strategy aims to surface content posted by the profiles that a given user follows.

We collect all posts created within the last 14 days and filter out posts that do not have any engagement (defined as mirrors + comments + collects). We then compute a score for each post combining the different publication stats about the post and the EigenTrust score of the profile that authored the post.

The post score computation uses the following equation:

post_score = 2 * comments_count/max(comments_count) \\ 
           + 5 * mirrors_count/max(mirrors_count) \\ 
           + 3 * collects_count/max(collects_count) \\
           - 2 * age/max(age) \\
           + 10 * profile_trust_score

# limit posts per profileId < 10

We then order the posts by post_score, giving preference to posts by profiles that the given user follows:

ORDER BY following_or_not DESC, post_score DESC
-- following_or_not is 1 if the user is a follower and 0 otherwise
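
Equivalently in Python, assuming each post record carries a following_or_not flag alongside its post_score:

# Followed-profile posts first, then by score within each group.
recommended = sorted(posts,
                     key=lambda p: (p["following_or_not"], p["post_score"]),
                     reverse=True)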

Configurable Parameters

This strategy can be tuned to a front-end’s specific needs by changing the following:

  1. Weights of features (coefficients in the post_score equation above)

  2. Normalization method (max normalizer in the post_score equation above)

  3. EigenTrust scoring strategy (refer to this doc)

For a deep dive into all of the algorithms discussed above, check out our blog at https://karma3labs.com
