Using Vue.js in a Jupyter notebook

Objectives

  • Generate data by crawling the Canada's Top 100 website
  • Use of Scrapy
  • Use of Pandas
  • Integrate VueJS in a notebook
  • Create a simple table with filter functionality

Scraping data

Approach

To scrape the data, we will use the Scrapy library. For this tutorial it is faster to rely on a library that was built for scraping than to write our own scraper. The crawl consists of four steps:

  1. Load the main page
  2. Find all company links
  3. For each company link, open the corresponding page
  4. For each company page, find all ratings

Markup for company links

<div id="winners" class="page-section">
...
  <li><span><a target="_blank" href="http://content.eluta.ca/top-employer-3m-canada">3M Canada Company</a></span></li>
...
</div>

This corresponds to the following line in the parse method of the CompanySpider class:

for href in response.css('div#winners a::attr(href)').extract():
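To see what this selector picks up without running a crawl, the same idea can be tried on the markup shown above. The snippet below is a minimal sketch that uses Python's built-in html.parser as a stand-in for Scrapy's CSS selectors. The class name WinnerLinkParser, the surrounding list tag and the extra footer block in the sample are assumptions added for illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup above; the <ul> wrapper and the footer div are
# hypothetical additions to show that links outside div#winners are skipped.
SAMPLE = """
<div id="winners" class="page-section">
  <ul>
  <li><span><a target="_blank" href="http://content.eluta.ca/top-employer-3m-canada">3M Canada Company</a></span></li>
  </ul>
</div>
<div id="footer"><a href="/about">About</a></div>
"""

class WinnerLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div id="winners">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside div#winners (0 = outside)
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # start counting at div#winners, keep counting for nested divs
            if self.depth or attrs.get("id") == "winners":
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

parser = WinnerLinkParser()
parser.feed(SAMPLE)
links = parser.links
print(links)
```

In the spider itself, Scrapy's selector does this work in a single expression, which is the main reason for using the library here.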

Markup for ratings

<h3 class="rating-row">
    <span class="nocolor">Physical Workplace</span>
    <span class="rating">
        <span class="score" title="Great-West Life Assurance Company, The's physical workplace is rated as exceptional. ">A+</span>
    </span>
</h3>
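The two values we are after are the text of the span with class nocolor and the text of the span with class score. As a rough sketch, they can be read from this fragment with the standard library, under the assumption that the fragment is well-formed XML (real pages often are not, which is why the spider uses Scrapy's selectors instead).

```python
import xml.etree.ElementTree as ET

# The rating markup from above, copied verbatim
FRAGMENT = """
<h3 class="rating-row">
    <span class="nocolor">Physical Workplace</span>
    <span class="rating">
        <span class="score" title="Great-West Life Assurance Company, The's physical workplace is rated as exceptional. ">A+</span>
    </span>
</h3>
"""

row = ET.fromstring(FRAGMENT.strip())

# Same two lookups as the spider: category name and letter grade
title = row.find("span[@class='nocolor']").text
score = row.find("span[@class='rating']/span[@class='score']").text

rating = {"title": title, "value": score}
print(rating)
```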

Python crawler

The crawler in Scrapy is defined in the following code snippet.

import logging
import scrapy
from scrapy.crawler import CrawlerProcess

class CompanySpider(scrapy.Spider):
    name = "companies"
    start_urls = [
        "http://www.canadastop100.com/national/"
    ]
    custom_settings = {
        'LOG_LEVEL': logging.CRITICAL,
        'FEED_FORMAT':'json',               
        'FEED_URI': 'canadastop100.json' 
    }

    def parse(self, response):
        for href in response.css('div#winners a::attr(href)').extract():
            yield scrapy.Request(response.urljoin(href),
                                 callback=self.parse_company)

    def parse_company(self, response):
        name = response.css('div.side-panel-wrap div.widget h4::text').extract_first()
        for rating in response.css('h3.rating-row')[1:]:
            yield {
                'name': name,
                'title': rating.css('span.nocolor::text').extract_first(),
                'value': rating.css('span.rating span.score::text').extract_first(),
            }

Make sure the output file does not exist in the directory where the script is going to be executed.

rm canadastop100.json
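The rm command reports an error when the file is not there yet. A possible alternative from within Python, assuming Python 3.8 or newer for the missing_ok argument, is sketched below.

```python
from pathlib import Path

output_file = Path("canadastop100.json")  # same name as FEED_URI in the spider

# missing_ok=True (Python 3.8+) makes this a no-op when the file is absent
output_file.unlink(missing_ok=True)
```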

Next we need to define the crawler process and start it:

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(CompanySpider)
process.start()

Executing this will give the following result:

2017-10-06 12:09:45 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-10-06 12:09:45 [scrapy.utils.log] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}

Preparing data

import pandas as pd

Read the output file from the scraper.

df = pd.read_json('canadastop100.json')
df.head()
   name         title                              value
0  Bell Canada  Physical Workplace                 A+
1  Bell Canada  Work Atmosphere & Communications   A
2  Bell Canada  Financial Benefits & Compensation  A
3  Bell Canada  Health & Family-Friendly Benefits  B
4  Bell Canada  Vacation & Personal Time-Off       B

Count the unique company names in the dataset.

len(df['name'].unique())
101

Filter out the rows without a title.

df = df[df['title'].notnull()]

The unique elements in the value column are given by

df['value'].unique()

and result in

array(['A+', 'A', 'B', 'B+', 'B-', 'A-', 'C+'], dtype=object)

Let's map these values to numbers to make them easier to work with in the dataset. Define the mapping:

mapping = {'A+': 10,
           'A': 9,
           'A-': 8,
           'B+': 7,
           'B': 6,
           'B-': 5,
           'C+': 4}

and apply the mapping to the value column:

df['value'] = df['value'].map(mapping)
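Values that have no entry in the dictionary become NaN after map, which would quietly lower the total score computed later. A small sketch of a check that could be run before applying the mapping is given below. The helper name unmapped_grades and the extra grade 'C' are hypothetical and only serve the example.

```python
mapping = {'A+': 10, 'A': 9, 'A-': 8, 'B+': 7, 'B': 6, 'B-': 5, 'C+': 4}

def unmapped_grades(grades, mapping):
    """Return the grades that have no numeric score in the mapping, sorted."""
    return sorted(set(grades) - set(mapping))

# The grades reported by df['value'].unique() above
observed = ['A+', 'A', 'B', 'B+', 'B-', 'A-', 'C+']

missing = unmapped_grades(observed, mapping)
# 'C' is a made-up grade to show what an uncovered value looks like
missing_with_c = unmapped_grades(observed + ['C'], mapping)
print(missing, missing_with_c)
```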

Now we need to pivot the dataframe, since we want a matrix with one company per row and the different scores as columns.

df = df.pivot(index='name', columns='title', values='value')
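To make the reshaping concrete, here is a minimal sketch on toy data with the same three columns. The company names Acme and Globex and their scores are made up. The second half illustrates that pivot raises a ValueError when a (name, title) pair occurs more than once, which is worth knowing if the crawl ever yields duplicate rows.

```python
import pandas as pd

# Toy long-format data with the same three columns as the scraped records
long_df = pd.DataFrame({
    'name':  ['Acme', 'Acme', 'Globex', 'Globex'],
    'title': ['Physical Workplace', 'Community Involvement'] * 2,
    'value': [10, 9, 7, 6],
})

# One row per company, one column per rating category
wide_df = long_df.pivot(index='name', columns='title', values='value')
print(wide_df)

# pivot() refuses duplicate (name, title) pairs instead of picking one
duplicated = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    duplicated.pivot(index='name', columns='title', values='value')
    raised = False
except ValueError:
    raised = True
print(raised)
```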

We add a column to get the total score:

df['Total Score'] = df.sum(axis=1)

The dataframe has the following layout after adding the extra column:

df.head()
title | Community Involvement | Employee Engagement & Performance | Financial Benefits & Compensation | Health & Family-Friendly Benefits | Physical Workplace | Training & Skills Development | Vacation & Personal Time-Off | Work Atmosphere & Communications | Total Score
name
3M Canada Company | 10 | 7 | 9 | 9 | 10 | 9 | 6 | 10 | 70
Aboriginal Peoples Television Network Inc. / APTN | 9 | 6 | 7 | 9 | 7 | 9 | 9 | 9 | 65
Accenture Inc. | 10 | 9 | 7 | 9 | 7 | 7 | 6 | 9 | 64
Agrium Inc. | 10 | 7 | 7 | 6 | 10 | 10 | 8 | 9 | 67
Air Canada | 10 | 6 | 9 | 7 | 9 | 10 | 4 | 6 | 61
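One thing to watch in a notebook: if the cell that adds Total Score is run a second time, the existing total is included in the new sum. A possible way around this, sketched on toy data with made-up names and scores, is to sum only the rating columns. The helper add_total is hypothetical.

```python
import pandas as pd

# Toy wide-format frame; company names and scores are made up
scores = pd.DataFrame(
    {'Community Involvement': [10, 9], 'Physical Workplace': [10, 7]},
    index=pd.Index(['Acme', 'Globex'], name='name'),
)

def add_total(df, total_col='Total Score'):
    """Return a copy with total_col equal to the row sum of all other columns."""
    rating_cols = [c for c in df.columns if c != total_col]
    out = df.copy()
    out[total_col] = out[rating_cols].sum(axis=1)
    return out

once = add_total(scores)
twice = add_total(once)   # re-running does not double count
print(twice)
```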

As a last step we need to attach the dataframe to the body of the notebook by using some JavaScript. We import the proper libraries

from IPython.display import HTML, Javascript, display

and attach the dataframe, after converting, to the window.

Javascript("""
           window.companyData={};
           """.format(df.reset_index().to_json(orient='records')))
<IPython.core.display.Javascript object>
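The {} in the string is the placeholder of str.format, so it is replaced by the JSON text and the browser ends up executing an assignment of an array literal. The sketch below rebuilds that string outside the notebook to show what is sent. The records are made up, and json.dumps is used here as a stand-in for df.reset_index().to_json(orient='records').

```python
import json

# Hand-written records in the same shape as the dataframe export (made up)
records = [
    {'name': 'Acme', 'Physical Workplace': 10, 'Total Score': 20},
    {'name': 'Globex', 'Physical Workplace': 7, 'Total Score': 16},
]

# json.dumps stands in for df.reset_index().to_json(orient='records')
payload = json.dumps(records)
script = "window.companyData={};".format(payload)
print(script)

# Everything between '=' and the final ';' should parse back to the records
embedded = script[len("window.companyData="):-1]
roundtrip = json.loads(embedded)
```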

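Optionally, write the data to a JSON file on disk. This file can then be moved to the server where the VueJS application will be deployed. Note that the file name below is the same as the scraper's output file, so that file is overwritten.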

df.reset_index().to_json('canadastop100.json', orient='records')

Visualizing data

The next step is to visualize the data using VueJS. VueJS can be included from https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.0/vue. This notebook uses the grid component example from the official documentation to create a table representing the crawled data.

Add the requirement to the notebook.

%%javascript
require.config({
    paths: {
        vue: "https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.0/vue"
    }
});
<IPython.core.display.Javascript object>

Define the template for displaying the data in a table using the x-template script type and the VueJS syntax.

%%html
<script type="text/x-template" id="data-template">
  <table class="canada">
    <thead>
      <tr>
        <th v-for="key in columns"
          @click="sortBy(key)"
          :class="{ active: sortKey == key }">
          {{ key | capitalize }}
          <span class="arrow" :class="sortOrders[key] > 0 ? 'asc' : 'dsc'">
          </span>
        </th>
      </tr>
    </thead>
    <tbody>
      <tr v-for="entry in filteredData">
        <td v-for="key in columns">
          {{entry[key]}}
        </td>
      </tr>
    </tbody>
  </table>
</script>

Define the main HTML that contains the template we defined earlier.

%%html
<div id="vue-app">
  <form id="search">
    Search <input name="query" v-model="searchQuery">
  </form>
  <data-grid
    :data="gridData"
    :columns="gridColumns"
    :filter-key="searchQuery">
  </data-grid>
</div>

Initialize the VueJS application using JavaScript: extract the data from the window, register the component that renders the table, and create a new Vue instance.

%%javascript
require(['vue'], function(Vue) {
    console.log(Vue.version);
    var companyData = window.companyData;
    console.log(JSON.stringify(companyData));
    Vue.component('data-grid', {
      template: '#data-template',
      props: {
        data: Array,
        columns: Array,
        filterKey: String
      },
      data: function () {
        var sortOrders = {}
        this.columns.forEach(function (key) {
          sortOrders[key] = 1
        })
        return {
          sortKey: '',
          sortOrders: sortOrders
        }
      },
      computed: {
        filteredData: function () {
          var sortKey = this.sortKey
          var filterKey = this.filterKey && this.filterKey.toLowerCase()
          var order = this.sortOrders[sortKey] || 1
          var data = this.data
          if (filterKey) {
            data = data.filter(function (row) {
              return Object.keys(row).some(function (key) {
                return String(row[key]).toLowerCase().indexOf(filterKey) > -1
              })
            })
          }
          if (sortKey) {
            data = data.slice().sort(function (a, b) {
              a = a[sortKey]
              b = b[sortKey]
              return (a === b ? 0 : a > b ? 1 : -1) * order
            })
          }
          return data
        }
      },
      filters: {
        capitalize: function (str) {
          return str.charAt(0).toUpperCase() + str.slice(1)
        }
      },
      methods: {
        sortBy: function (key) {
          this.sortKey = key
          this.sortOrders[key] = this.sortOrders[key] * -1
        }
      }
    })

    var vueApp = new Vue({
      el: '#vue-app',
      data: {
        searchQuery: '',
        gridColumns: Object.keys(companyData[0]),
        gridData: companyData
      }
    })

});
<IPython.core.display.Javascript object>
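The filteredData computed property does two things: it keeps the rows in which any cell contains the search text (case-insensitive), and it sorts the remaining rows by the selected column in the current direction. For readers more at home in Python, the same logic could be approximated as follows. The function name filtered_data and the sample rows are made up for illustration.

```python
def filtered_data(rows, filter_key='', sort_key='', order=1):
    """Approximate Python rendering of the component's filteredData property."""
    needle = filter_key.lower()
    if needle:
        # keep rows where any cell, converted to text, contains the search string
        rows = [r for r in rows
                if any(needle in str(v).lower() for v in r.values())]
    if sort_key:
        # order is 1 for ascending and -1 for descending, as in sortOrders
        rows = sorted(rows, key=lambda r: r[sort_key], reverse=(order < 0))
    return rows

# Made-up rows in the same shape as window.companyData
rows = [
    {'name': 'Acme', 'Total Score': 20},
    {'name': 'Globex', 'Total Score': 16},
    {'name': 'Initech', 'Total Score': 18},
]

by_score_desc = filtered_data(rows, sort_key='Total Score', order=-1)
only_globex = filtered_data(rows, filter_key='GLO')
print(by_score_desc, only_globex)
```

Because every cell is searched, typing a number in the search box also matches rows by score, not only by company name.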

Attach a style to make the table more attractive.

%%html
<style>
table.canada {
  border: 2px solid rgb(102, 153, 255);
  border-radius: 3px;
  background-color: #fff;
}

table.canada th {
  background-color: rgb(102, 153, 255);
  color: rgba(255,255,255,0.66);
  cursor: pointer;
  -webkit-user-select: none;
  -moz-user-select: none;
  -ms-user-select: none;
  user-select: none;
}

table.canada td {
  background-color: #f9f9f9;
}

table.canada th, table.canada td {
  min-width: 120px;
  padding: 10px 20px;
}

table.canada th.active {
  color: #fff;
}

table.canada th.active .arrow {
  opacity: 1;
}

.arrow {
  display: inline-block;
  vertical-align: middle;
  width: 0;
  height: 0;
  margin-left: 5px;
  opacity: 0.66;
}

.arrow.asc {
  border-left: 4px solid transparent;
  border-right: 4px solid transparent;
  border-bottom: 4px solid #fff;
}

.arrow.dsc {
  border-left: 4px solid transparent;
  border-right: 4px solid transparent;
  border-top: 4px solid #fff;
}
</style>

The result can also be tested on the jsfiddle that I have created. The source for the page can be found in my Vue repository and is visible on my bl.ocks.org. The notebook can be found on my Github and the final result is shown on this page.