Tutorial

Create a function to crawl a web page and extract links

In this tutorial, you will learn how to write a simple Rust function that crawls a web page and extracts its links, and how to run it on the Planetr decentralized cloud using the Planetr gateway CLI.

This tutorial is aimed at readers who have basic knowledge of Function-as-a-Service platforms (AWS Lambda, Google Cloud Functions, etc.) and want a quick, basic understanding of Planetr decentralized functions. We will not go into depth about the intricacies of developing on Planetr, but we hope to satisfy your curiosity so that you continue your journey!

This tutorial should take you about 10 minutes to complete.

We only expect that:

  • You are generally familiar with software development and command line interfaces.
  • You are generally familiar with the Rust programming language.
  • You are open to learning about the bleeding edge of decentralized development.

Install and run Planetr Gateway on your computer

To install Planetr gateway on your computer, please read the installation section. This can be achieved on a laptop, a desktop computer, or a cloud instance.

Make sure Planetr gateway is installed and running on your computer.

$ planetr -v

Create a WebAssembly binary

Follow this link to create a WebAssembly binary (.wasm file). Use web-crawler as the project name.

The Rust code to crawl the web page

Edit web-crawler/src/handler.rs and add the following Rust code.


use serde::{Deserialize, Serialize};
use planetr::wasm::Context;
use planetr::wasm::PlanetrError;
use std::collections::HashSet;
use select::document::Document;
use select::predicate::Name;

// JSON input of the function: the URL of the page to crawl.
#[derive(Deserialize, Serialize)]
pub struct InputPayload {
    url: String,
}

// JSON output of the function: the unique links found on the page.
#[derive(Deserialize, Serialize)]
pub struct OuputPayload {
    links: HashSet<String>,
}

pub fn handle_req(args: InputPayload, ctx: Context) -> Result<OuputPayload, PlanetrError> {
    // Reject an empty URL before making any request.
    if args.url.is_empty() {
        return Err(PlanetrError::new("url cannot be empty"));
    }

    // Write the requested URL to the function log.
    ctx.log(args.url.to_string());

    // Fetch the page through the Planetr context and return early on failure.
    let html = ctx.http_get(args.url)?;

    if html.is_empty() {
        return Err(PlanetrError::new("HTML empty"));
    }

    // Collect the href attribute of every <a> element; the set removes duplicates.
    let found_urls = Document::from(html.as_str())
        .find(Name("a"))
        .filter_map(|n| n.attr("href"))
        .map(str::to_string)
        .collect::<HashSet<String>>();

    Ok(OuputPayload {
        links: found_urls,
    })
}
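
The handler collects the links into a HashSet, so a link that appears several times on the page is returned once, and the order of the output is not the order on the page. The following std-only sketch illustrates that behavior with a hypothetical helper, extract_hrefs, which is not part of the Planetr or select APIs. It scans for double-quoted href attributes on any tag instead of parsing HTML, so it only approximates the find(Name("a")) chain used in the handler.

```rust
use std::collections::HashSet;

/// Collects the values of double-quoted `href="..."` attributes into a set.
/// Hypothetical helper for illustration: a naive string scan, not an HTML
/// parser. Single-quoted and unquoted attribute values are ignored.
fn extract_hrefs(html: &str) -> HashSet<String> {
    let marker = "href=\"";
    let mut links = HashSet::new();
    let mut rest = html;

    while let Some(start) = rest.find(marker) {
        // Move to the first character of the attribute value.
        rest = &rest[start + marker.len()..];
        match rest.find('"') {
            Some(end) => {
                // Inserting an already-present value leaves the set unchanged.
                links.insert(rest[..end].to_string());
                rest = &rest[end + 1..];
            }
            // Unterminated attribute value: stop scanning.
            None => break,
        }
    }
    links
}
```

For example, a page with two anchors pointing to index.php and one pointing to docs/index.html yields a set with two entries.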

Now build the wasm binary.

$ cd web-crawler
$ wasm-pack build

After completing these steps, you should have a wasm file named web-crawler.wasm.

Deploy the 'web-crawler' function

Deploy the web-crawler function on the Planetr network using the func-create command. You can give the function any name with the -n parameter.

$ planetr func-create web-crawler.wasm -n web-crawler
INSTANCE ID            NAME         ENDPOINT                                       ERROR  
c2mfref2hraufcj56qeg   web-crawler  http://localhost:7001/f/c2mfref2hraufcj56qeg          

Note the endpoint URL; it is the address used to call the function through the API.

Run the 'web-crawler' function using CLI

You can run the web-crawler function using the planetr func-run command.

$ planetr func-run c2mfref2hraufcj56qeg -a "{\"url\":\"https://planetr.io\"}" -p
{
    "code": 200,
    "log": "begin\nhttps://planetr.io\nend\n",
    "payload": {
        "links": [
            "index.php",
            "why-planetr.php",
            "getstarted.php",
            "docs/usecases.html",
            "https://planetr.io",
            "docs/providers.html",
            "https://github.com/planetrio",
            "pricing.php",
            "products.php",
            "docs/index.html"
        ]
    }
}

Here c2mfref2hraufcj56qeg is the instance ID of the web-crawler function created earlier. It will be different for you.
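
The value passed with -a is a JSON object whose fields match InputPayload. When the URL comes from another program, writing the escaped quotes by hand is error-prone, so it can help to build the string in code. The sketch below uses only the standard library; escape_json and build_args are hypothetical helper names introduced here, not part of the Planetr CLI or SDK.

```rust
/// Escapes the characters that must not appear raw inside a JSON string.
/// Hypothetical helper, written for illustration.
fn escape_json(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the JSON object expected by `InputPayload`: {"url":"..."}.
fn build_args(url: &str) -> String {
    format!("{{\"url\":\"{}\"}}", escape_json(url))
}
```

Calling build_args("https://planetr.io") produces the same JSON object that is passed to -a in the command above, before shell escaping.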

Run the 'web-crawler' function using the API

curl \
  --header "Content-type: application/json" \
  --request POST \
  --data '{"url" : "https://planetr.io"}' \
  http://localhost:7001/f/c2mfref2hraufcj56qeg
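
The same call can be made from Rust. The sketch below sends an equivalent POST request with only the standard library, assuming the gateway listens on localhost:7001 as in the endpoint shown earlier. build_post_request and post_json are hypothetical helper names, and the raw response (status line, headers, and body) is returned unparsed; a real client would more likely use an HTTP library.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Formats a minimal HTTP/1.1 POST request carrying a JSON body.
/// Hypothetical helper, written for illustration.
fn build_post_request(host: &str, path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        path = path,
        host = host,
        // Content-Length counts bytes, which is what `str::len` returns.
        len = body.len(),
        body = body
    )
}

/// Sends the request to `host` (for example "localhost:7001") and returns
/// the raw response text, headers included.
fn post_json(host: &str, path: &str, body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(build_post_request(host, path, body).as_bytes())?;

    // "Connection: close" asks the server to close the socket when done,
    // so reading to the end terminates.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the gateway running, post_json("localhost:7001", "/f/c2mfref2hraufcj56qeg", "{\"url\":\"https://planetr.io\"}") would send the same JSON body as the curl command, with your own instance ID in the path.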

Thank you for trying the tutorial.