Derecho: Group communication at the speed of light

Version 0.9.1

Derecho is a new distributed system out of Cornell University

Categorizer tier functionality

The final file we’ll inspect is src/derecho-component/categorizer_tier.cpp, which implements the categorizer tier. Because we only care about the Derecho-specific functionality, we’ll concentrate on install_model and ordered_install_model. We’ll also briefly explain the inference method, which calls out to MxNet to do the heavy lifting.

Install model

This class maintains a simple std::map<uint32_t, Model> associating each Model’s user-specified tag with the Model data. Ultimately, all that install_model needs to do is update this map. First, we’ll look at the P2P entry point, install_model:

int CategorizerTier::install_model(const uint32_t& tag,
                                   const ssize_t& synset_size,
                                   const ssize_t& symbol_size,
                                   const ssize_t& params_size,
                                   const BlobWrapper& model_data) {
    int ret = 0;
    auto& subgroup_handler = group->template get_subgroup<CategorizerTier>();
    // pass it to all replicas
    derecho::rpc::QueryResults<int> results = subgroup_handler.ordered_send<RPC_NAME(ordered_install_model)>(
            tag, synset_size, symbol_size, params_size, model_data);
    // check results
    decltype(results)::ReplyMap& replies = results.get();
    for(auto& reply_pair : replies) {
        int one_ret = reply_pair.second.get();
        if(one_ret != 0) {
            ret = one_ret;
        }
    }
    return ret;
}

This takes quite a few parameters! The first interesting line creates a subgroup_handler:

 auto& subgroup_handler = group->template get_subgroup<CategorizerTier>();

Similar to the logic from the function tier, this handler will serve as an entry point for RPC calls. Unlike the external handler that the function tier acquired, this internal group handler can only be used by nodes which belong to the selected group, and can be used to invoke ordered_send. We’ll use it to forward our received P2P request to the entire group.

We go ahead and immediately use this handler to call ordered_install_model:

    derecho::rpc::QueryResults<int> results = subgroup_handler.ordered_send<RPC_NAME(ordered_install_model)>(

This code uses the subgroup_handler to consistently invoke ordered_install_model on every member of this shard. It relies on Derecho’s Paxos-like virtual synchrony feature to ensure that every node of the shard receives the invocation at exactly the same logical time, and that the invocation cannot run concurrently with any other ordered_send invocation. The QueryResults of this invocation are stored in the results object.
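To see why a single agreed order matters, here is a minimal sketch that uses only the standard library and no Derecho types. The names Op, State, and apply_in_order are hypothetical and exist only for illustration. For simplicity, a later install under the same tag overwrites the earlier one, which is an assumption of this sketch rather than the demo's behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One "install" operation: a tag and the payload to store under it.
// (Hypothetical types for illustration; not part of Derecho or the demo.)
using Op = std::pair<std::uint32_t, std::string>;
using State = std::map<std::uint32_t, std::string>;

// Applies the operations in the order given and returns the resulting state.
// In this sketch a later install under the same tag overwrites the earlier one.
State apply_in_order(const std::vector<Op>& ops) {
    State state;
    for(const auto& op : ops) {
        state[op.first] = op.second;
    }
    return state;
}
```

Two replicas that apply the same sequence end up with equal State values, while replicas that see two conflicting installs in different orders can diverge. An ordered multicast such as ordered_send is meant to rule out the second case.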

It next inspects these results to see if any of the members of this shard returned an unexpected result:

    decltype(results)::ReplyMap& replies = results.get();
    for(auto& reply_pair : replies) {
        int one_ret = reply_pair.second.get();
        if(one_ret != 0) {
            ret = one_ret;
        }
    }

This QueryResults<int> object will contain a future corresponding to each shard member’s individual returned result. These should all be identical; as our demo has been designed to only mutate the categorizer tier via ordered_send, all of our shard members should have identical state. Nonetheless, errors can happen (for example, a shard member running out of memory or a non-deterministic internal MxNet error), so we would be remiss not to check the return results, just in case.
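The reply-checking loop can be tried in isolation with standard futures. The sketch below assumes a stand-in ReplyMap built from std::map and std::future<int>; it is not Derecho's actual type, and node_id_t, collect_return_code, and ready_reply are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <map>

// Stand-in for a reply map: node ID -> future holding that node's return code.
// (Hypothetical alias for illustration; not Derecho's actual ReplyMap.)
using node_id_t = std::uint32_t;
using ReplyMap = std::map<node_id_t, std::future<int>>;

// Returns 0 if every replica reported 0; otherwise returns the code of the
// last failing replica in map order, mirroring the loop in install_model.
int collect_return_code(ReplyMap& replies) {
    int ret = 0;
    for(auto& reply_pair : replies) {
        int one_ret = reply_pair.second.get();  // blocks until this reply is available
        if(one_ret != 0) {
            ret = one_ret;
        }
    }
    return ret;
}

// Builds an already-satisfied future, standing in for one replica's reply.
std::future<int> ready_reply(int code) {
    std::promise<int> p;
    p.set_value(code);
    return p.get_future();
}
```

Because each get() blocks, the caller waits for every replica before returning. When several replicas fail with different codes, only the last one in map order is reported.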

ordered_install_model

We’ve finally reached the end of the forwarding chain for the install_model call. Let’s look at the implementation of ordered_install_model (with debugging and printing stripped out):

int CategorizerTier::ordered_install_model(const uint32_t& tag,
                                           const ssize_t& synset_size,
                                           const ssize_t& symbol_size,
                                           const ssize_t& params_size,
                                           const BlobWrapper& model_data) {
    Model model;
    model.synset_size = synset_size;
    model.symbol_size = symbol_size;
    model.params_size = params_size;
    model.model_data = Blob(model_data.bytes, model_data.size);
    raw_models.emplace(tag, model);
    return 0;
}

That’s pretty simple! All we’re doing is initializing a Model struct and placing it into the raw_models map.
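One detail of std::map is worth keeping in mind here: emplace leaves an existing entry untouched, so installing a second model under a tag that is already in use does not replace the first. The sketch below contrasts that with insert_or_assign (C++17). The Model struct is a simplified stand-in, and install_keep_existing and install_or_replace are hypothetical helper names, not functions from the demo.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for the demo's Model struct (hypothetical; the real one
// holds three sizes and a Blob).
struct Model {
    std::string data;
};

using ModelMap = std::map<std::uint32_t, Model>;

// Same call as ordered_install_model: emplace keeps any existing entry.
// Returns true only if the model was newly inserted.
bool install_keep_existing(ModelMap& models, std::uint32_t tag, const Model& m) {
    return models.emplace(tag, m).second;
}

// Alternative: insert_or_assign overwrites whatever is stored under the tag.
// Returns true if a new entry was inserted, false if an existing one was replaced.
bool install_or_replace(ModelMap& models, std::uint32_t tag, const Model& m) {
    return models.insert_or_assign(tag, m).second;
}
```

Which behavior is preferable depends on whether re-installing under an existing tag should be treated as an update or ignored; the demo's code does the latter.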

Inference

By examining the install model code, we’ve seen examples of every Derecho feature used in this demo. But for completeness, we’ll briefly explain how the inference function works.

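Guess CategorizerTier::inference(const Photo& photo) {
    std::shared_lock read_lock(inference_engines_mutex);
    // is there already an initialized engine for this tag?
    if(inference_engines.find(photo.tag) == inference_engines.end()) {
        // no engine yet; has a model with this tag been installed at all?
        if(raw_models.find(photo.tag) == raw_models.end()) {
            Guess guess;
            return guess;
        }
        // release the read lock before the expensive engine construction
        read_lock.unlock();
        try {
            std::unique_ptr<InferenceEngine> engine = std::make_unique<InferenceEngine>(raw_models[photo.tag]);
            // take the write lock only to publish the new engine
            std::unique_lock write_lock(inference_engines_mutex);
            inference_engines[photo.tag] = std::move(engine);
            return inference_engines[photo.tag]->inference(photo);
        } catch(...) {
            // initialization or inference failed; fall back to the default guess
            Guess guess;
            return guess;
        }
    }

    // engine already exists; run inference under the read lock
    return inference_engines[photo.tag]->inference(photo);
}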

This demo is designed to lazily initialize the underlying MxNet models. The inference code first checks whether an inference engine has already been created for the photo’s tag. If not, it checks whether a model with a matching tag has been installed at all; if none has, it returns the default guess. Otherwise it initializes the installed model as an MxNet object, and if anything goes wrong, it just returns the default guess. Finally, if all has gone well to this point, it calls out to the MxNet model to actually perform the inference.
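The lazy-initialization pattern can be sketched on its own with std::shared_mutex. This is a simplified variant written for illustration, not the demo's code: Engine is a hypothetical stand-in for the MxNet-backed InferenceEngine, LazyEngines is a hypothetical class, and unlike the demo it constructs the engine while holding the write lock and re-checks the map first, so that each tag's engine is built at most once.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the MxNet-backed InferenceEngine.
struct Engine {
    explicit Engine(const std::string& raw) : description("engine:" + raw) {}
    std::string description;
};

class LazyEngines {
public:
    // Stores raw model data under a tag (overwrites any previous data).
    void install(std::uint32_t tag, std::string raw) {
        std::unique_lock lock(mutex_);
        raw_models_[tag] = std::move(raw);
    }

    // Returns the engine for `tag`, building it on first use.
    // Returns nullptr if no model has been installed under that tag.
    std::shared_ptr<Engine> get(std::uint32_t tag) {
        {
            std::shared_lock read_lock(mutex_);  // fast path: readers share the lock
            auto it = engines_.find(tag);
            if(it != engines_.end()) return it->second;
        }
        std::unique_lock write_lock(mutex_);
        // Re-check: another thread may have built the engine while we waited.
        auto it = engines_.find(tag);
        if(it != engines_.end()) return it->second;
        auto raw = raw_models_.find(tag);
        if(raw == raw_models_.end()) return nullptr;
        auto engine = std::make_shared<Engine>(raw->second);
        ++constructions_;
        engines_.emplace(tag, engine);
        return engine;
    }

    // Number of engines built so far (useful for checking laziness).
    int constructions() const {
        std::shared_lock lock(mutex_);
        return constructions_;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::uint32_t, std::string> raw_models_;
    std::map<std::uint32_t, std::shared_ptr<Engine>> engines_;
    int constructions_ = 0;
};
```

Building under the write lock trades some concurrency for simplicity: readers of other tags wait while an engine is constructed. Constructing outside the lock, as the demo does, avoids that wait but allows two threads to build the same engine at once.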

And that’s it! You now understand the Derecho components of this demo code. If you have any questions, please feel free to post an issue on the demo’s GitHub page or email one of the Derecho paper’s authors.

Last updated on 22 Oct 2019
Published on 22 Oct 2019