Tumblr Engineering — Hello! I’ve noticed that Tumblr now uses React. I…


The reblog is a beautiful thing unique to Tumblr – often imitated, but never successfully reproduced elsewhere. The reblog puts someone else’s post on your own Tumblr blog, acting as a kind of signal boost, and also giving you the ability to add your own comment to it, which your followers and anyone looking at the post’s notes will see. Reblogs can also be reblogged themselves, creating awesome evolving reblog trails that are the source of so many memes we love. But what is a reblog trail versus a reblog tree, and how does it all work under the hood?

A “reblog tree” starts at the original post (we call it the “root post” internally at Tumblr) and extends outwards to each of its reblogs, and then each reblog of those reblogs, forming a tree-like structure with branches of “reblog trails”. As an example, you can imagine @staff​ making a post, and then someone reblogging it, and then others reblogging those reblogs. I can even come through and reblog one of the reblogs:

A “reblog trail” is one of those branches, starting at the original post and extending one at a time down to another post. In the reblog trail, there may actually be some reblogs that added their own content and some that didn’t – reblogs that added content are visible in the trail, while the intermediate ones that didn’t may not be visible.

image

You’ll notice that the reblog trail you’re viewing somewhere (like on your dashboard) doesn’t show all of this reblog tree – only part of it. If you open up the notes on any wildly popular post, you’ll probably see lots of reblogs in there that you aren’t seeing in your current view of the post’s reblog trail. The above diagram shows the whole reblog tree (which you don’t see) and the current reblog trail you’re actually viewing (in orange). If you want to visualize a post’s entire reblog tree, the reblog graphs Tumblr Labs experiment shows off these reblog trees and trails as kind of big floppy organisms. They’re a useful visualization of how content percolates around Tumblr via reblogs. You can turn on the experiment and see it on web only right now, but here’s an example:

image

The tiny orange dot is the post we’re viewing, and the green line is a reblog trail showing how the post got reblogged along many blogs. And there are tons of other branches/trails from the original post, making dozens of different reblog trails. This is a much larger, more realistic example than my simplified diagrams above. You can imagine that my diagram above is just the start of one of these huge reblog trees, after more and more people have reblogged parts of the existing tree.

Storing Reblog Trail Information

The way we actually store the information about a reblog and its trail has changed significantly over the last year. For all posts made before this year, all of a post’s content was stored as a combination of HTML and properties specific on our Post data model. A specific reblog also stored all of the contents of its entire reblog trail (but not the whole reblog tree). If you have ever built a theme on Tumblr or otherwise dug around the code on a reblog, you’ll be familiar with this classic blockquote structure:

<p><a class="tumblr_blog" href="http://maria.tumblr.com/post/5678">maria</a>:</p>
<blockquote>
    <p><a class="tumblr_blog" href="http://cyle.tumblr.com/post/1234">cyle</a>:</p>
    <blockquote>
        <!-- original post content -->
        <p>look at my awesome original content</p>
    </blockquote>
    <!-- the reblog of the original post's content -->
    <p>well, it's just okay original content</p>
</blockquote>
<!-- this is the new content, added in our reblog of the reblog -->
<p>jeez. thanks a lot.</p>

This HTML represents a (fake) old text post. The original post is the blockquote most deeply nested in the HTML: “look at my awesome original content” and it was created by cyle. There’s a reference to the original post’s URL in the anchor tag above its blockquote tag. Moving out one level to the next blockquote is a reblog of that original post, made by maria, which itself adds some of its own commentary to the reblog trail. Moving out furthest, to the bottom of the HTML, is the latest reblog content being added in the post we’re viewing. With this structure, we have everything we need to show the post and its reblog trail without having to load those posts in between the original and this reblog.

If this looks and sounds confusing, that’s because it is quite complex. We’re right there with you, but the reasons behind using this structure were sound at the time. In a normal, traditional relational database, you’d expect something like the reblog trail to be represented as a series of references: a reblog post references its parent post, root post, and any intermediate posts, and we’d load those posts’ contents at runtime with a JOIN query or something very normalized and relational like that, making sure we don’t copy any data around, only reference it.

However, the major drawback of that traditional approach, especially at Tumblr’s scale, is that loading a reblog could go from just one query to several queries, depending on how many posts are in the reblog trail. Some of the reblog trails on Tumblr are thousands of posts long. Having to load a thousand other posts to load one reblog would be devastating. Instead, by actually copying the reblog trail content every time a reblog is made, we keep the number of queries needed constant: just one per post! A dashboard of 20 reblogs loads those 20 posts, not a variable amount based on how many reblogs are in each post’s trail. This is still an oversimplification of what Tumblr is really doing under the hood, but this core strategy is real.

Broken Reblog Trails

There is another obvious problem with the above blockquote/HTML strategy, one that you may have not realized you were seeing but you’ve probably experienced it before. If the only reference we have in the reblog trail above is a trail post’s permalink URL, what happens if that blog changes its name? Tumblr does not go through all posts and update that name in every copy of every reblog that blog has ever been involved in. Instead, it gracefully fails, and you may see a default avatar there as a placeholder. We literally don’t have any other choice, since no other useful information is stored with the old post content.

At worst, someone else takes the name of a blog used in the trail. Imagine if, in the above example, oli changed his blog name to british-oli and someone else snagged the name oli afterwards. Thankfully in that case, the post URL still does not work, as the post ID is tied to the old oli blog. The end result is that it looks like there’s a “broken” item in the reblog trail, usually manifesting as the blog looking deactivated or otherwise not accessible. This isn’t great.

As a part of the rollout of the Neue Post Format (NPF), we changed how we store the reblog trail on each post. For fully NPF reblog trails, we actually do store an immutable reference to each blog and post in the trail, instead of just the unreliable post URL. This allows us to have a much lower failure rate when someone changes their blog name or otherwise becomes unavailable. We keep the same beneficial strategy of usually having all the information we need so we don’t need to load any of those posts along the trail, but the option to load the individual post or blog is there if we absolutely need it, especially in cases like if one of those blogs is somebody you’re blocking.

If you’ve played around with reblog trails in NPF, you’ll see the result of this change. The reblog trail is no longer a messy nested blockquote chain, but instead a friendly and easy to parse JSON array, always starting with the original post and working down the trail. This includes a special case when an item in the trail is broken in a way we can’t recover from, which happens sometimes with very old posts.

The same reblog trail and new content as seen above, but in the Neue Post Format:

{
    "trail": [
        {
            "post": {
                "id": "1234",
            },
            "blog": {
                "name": "cyle"
            },
            "content": [
                {
                    "type": "text",
                    "text": "look at my awesome original content"
                }
            ],
            "layout": []
        },
        {
            "post": {
                "id": "3456",
            },
            "blog": {
                "name": "maria"
            },
            "content": [
                {
                    "type": "text",
                    "text": "well, it's just okay original content"
                }
            ],
            "layout": []
        }
    ],
    "content": [
        {
            "type": "text",
            "text": "jeez. thanks a lot."
        }
    ]
}

Got questions?

If you’ve ever wondered how something works on Tumblr behind the scenes, feel free to send us an ask!

– @cyle 



Source link