Using Vue.js in a Jupyter notebook

Thu 05 October 2017

Objectives

Generate data by web crawling Canada's Top 100 (canadastop100.com)
Use of Scrapy
Use of Pandas
Integrate VueJS in a notebook
Create a simple table with filter functionality

Scraping data

Approach

To scrape the data, we will use the Scrapy library. Rather than writing a scraper from scratch, it is faster for this tutorial to use a library that was built for the job. The crawl consists of four steps:

Load the main page
Find all company links
For each company link, open the corresponding page
For each company page, find all ratings

Markup for company links

<div id="winners" class="page-section">
...
  <li><span><a target="_blank" href="http://content.eluta.ca/top-employer-3m-canada">3M Canada Company</a></span></li>
...
</div>

This corresponds to the following line in the CompanySpider class:

for href in response.css('div#winners a::attr(href)').extract():

Markup for ratings

<h3 class="rating-row">
    <span class="nocolor">Physical Workplace</span>
    <span class="rating">
        <span class="score" title="Great-West Life Assurance Company, The's physical workplace is rated as exceptional. ">A+</span>
    </span>
</h3>

Python crawler

The crawler in Scrapy is defined in the following code snippet.

import logging
import scrapy
from scrapy.crawler import CrawlerProcess

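class CompanySpider(scrapy.Spider):
    name = "companies"
    # Page listing all national winners
    start_urls = [
        "http://www.canadastop100.com/national/"
    ]
    # Silence the log and export every yielded item to canadastop100.json
    # (Scrapy 1.x settings; Scrapy 2.1+ prefers the FEEDS setting)
    custom_settings = {
        'LOG_LEVEL': logging.CRITICAL,
        'FEED_FORMAT': 'json',
        'FEED_URI': 'canadastop100.json'
    }

    def parse(self, response):
        # Follow every company link in the winners section
        for href in response.css('div#winners a::attr(href)').extract():
            yield scrapy.Request(response.urljoin(href),
                                 callback=self.parse_company)

    def parse_company(self, response):
        # Company name, taken from the heading of the side panel widget
        name = response.css('div.side-panel-wrap div.widget h4::text').extract_first()
        # Skip the first h3.rating-row and yield one item per remaining rating
        for rating in response.css('h3.rating-row')[1:]:
            yield {
                'name': name,
                'title': rating.css('span.nocolor::text').extract_first(),
                'value': rating.css('span.rating span.score::text').extract_first(),
            }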

Make sure the output file does not exist in the directory where the script will be run: Scrapy appends to an existing feed file, which would leave invalid JSON behind.

rm canadastop100.json
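If the file does not exist yet (for example on the very first run), `rm` reports an error. A minimal cross-platform alternative using only the standard library; `remove_if_exists` is a hypothetical helper, not part of the original notebook:

```python
from pathlib import Path


def remove_if_exists(filename):
    """Delete filename if present; return True when a file was removed."""
    path = Path(filename)
    if path.exists():
        path.unlink()
        return True
    return False


# Clear the output of a previous crawl before starting a new one
remove_if_exists('canadastop100.json')
```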

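Next, define the crawler process and start it. The Twisted reactor used by CrawlerProcess cannot be restarted, so running this cell a second time in the same kernel raises ReactorNotRestartable; restart the kernel before crawling again.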

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(CompanySpider)
process.start()

Executing this will give the following result:

2017-10-06 12:09:45 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-10-06 12:09:45 [scrapy.utils.log] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}

Preparing data

import pandas as pd

Read the output file from the scraper.

df = pd.read_json('canadastop100.json')
df.head()

	name	title	value
0	Bell Canada	Physical Workplace	A+
1	Bell Canada	Work Atmosphere & Communications	A
2	Bell Canada	Financial Benefits & Compensation	A
3	Bell Canada	Health & Family-Friendly Benefits	B
4	Bell Canada	Vacation & Personal Time-Off	B

Count the number of unique company names in the data.

len(df['name'].unique())

Remove the rows without a title (rating category).

df = df[df['title'].notnull()]
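On a small frame these two steps look as follows. The rows are made up for illustration (only the column names match the scraped data), and the frame is called `toy` so it does not overwrite the real `df`:

```python
import pandas as pd

# Made-up rows in the same long format as the scraper output
toy = pd.DataFrame({
    'name': ['Company A', 'Company A', 'Company B', 'Company C'],
    'title': ['Physical Workplace', 'Work Atmosphere & Communications',
              'Physical Workplace', None],
    'value': ['A+', 'A', 'B', None],
})

print(len(toy['name'].unique()))  # 3

# Keep only the rows that carry a rating category
toy = toy[toy['title'].notnull()]
print(len(toy))                   # 3
print(len(toy['name'].unique()))  # 2, since Company C had no rating at all
```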

The unique elements in the value column are given by

df['value'].unique()

and result in

array(['A+', 'A', 'B', 'B+', 'B-', 'A-', 'C+'], dtype=object)

Let's map these grades to numbers so they are easier to work with. Define the mapping:

mapping = {'A+': 10,
           'A': 9,
           'A-': 8,
           'B+': 7,
           'B': 6,
           'B-': 5,
           'C+': 4}

and apply the mapping to the value column:

df['value'] = df['value'].map(mapping)
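`Series.map` with a dict returns `NaN` for any value that is not a key of the mapping, so it is worth checking for unmapped grades after this step. A sketch on made-up grades, where `'C'` is deliberately absent from the mapping:

```python
import pandas as pd

mapping = {'A+': 10, 'A': 9, 'A-': 8, 'B+': 7, 'B': 6, 'B-': 5, 'C+': 4}

# Made-up grades; 'C' is not in the mapping
grades = pd.Series(['A+', 'B-', 'C'])
scores = grades.map(mapping)

print(scores.tolist())                   # [10.0, 5.0, nan]
print(scores.isnull().sum())             # 1 grade could not be mapped
print(grades[scores.isnull()].tolist())  # ['C']
```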

Now we need to pivot the dataframe from long to wide format, since we want a matrix with one row per company and one column per rating category.

df = df.pivot(index='name', columns='title', values='value')
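To illustrate what `pivot` does, here is a made-up long-format frame reshaped into one row per company and one column per category. `pivot` raises a `ValueError` if a (name, title) pair occurs more than once; `pivot_table` with an aggregation function is the usual fallback in that case.

```python
import pandas as pd

# Made-up ratings in long format: one row per (company, category)
long_df = pd.DataFrame({
    'name': ['Company A', 'Company A', 'Company B', 'Company B'],
    'title': ['Physical Workplace', 'Vacation & Personal Time-Off',
              'Physical Workplace', 'Vacation & Personal Time-Off'],
    'value': [10, 6, 7, 9],
})

# Wide format: companies as rows, rating categories as columns
wide = long_df.pivot(index='name', columns='title', values='value')
print(wide)
```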

We add a column to get the total score:

df['Total Score'] = df.sum(axis=1)
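One thing to keep in mind: `DataFrame.sum` skips missing values by default (`skipna=True`), so a company with a missing category gets a lower total rather than `NaN`. A sketch on made-up scores that also counts how many categories each total is based on:

```python
import numpy as np
import pandas as pd

# Made-up wide-format scores; Company B has no 'Physical Workplace' rating
wide = pd.DataFrame(
    {'Physical Workplace': [10, np.nan], 'Vacation & Personal Time-Off': [6, 9]},
    index=pd.Index(['Company A', 'Company B'], name='name'),
)

# Default behaviour: missing values are ignored in the row sum
wide['Total Score'] = wide.sum(axis=1)
print(wide['Total Score'].tolist())  # [16.0, 9.0]

# Number of rated categories per company makes the gap visible
print(wide.drop(columns='Total Score').notnull().sum(axis=1).tolist())  # [2, 1]
```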

The dataframe has the following layout after adding the extra column:

df.head()

title	Community Involvement	Employee Engagement & Performance	Financial Benefits & Compensation	Health & Family-Friendly Benefits	Physical Workplace	Training & Skills Development	Vacation & Personal Time-Off	Work Atmosphere & Communications	Total Score
name
3M Canada Company	10	7	9	9	10	9	6	10	70
Aboriginal Peoples Television Network Inc. / APTN	9	6	7	9	7	9	9	9	65
Accenture Inc.	10	9	7	9	7	7	6	9	64
Agrium Inc.	10	7	7	6	10	10	8	9	67
Air Canada	10	6	9	7	9	10	4	6	61

As a last step, we pass the dataframe to the JavaScript side of the notebook. We import the display helpers

from IPython.display import HTML, Javascript, display

and attach the dataframe, converted to a JSON array of records, to the window object as window.companyData:

Javascript("""§
           window.companyData={};
           """.format(df.reset_index().to_json(orient='records')))

<IPython.core.display.Javascript object>
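The `Javascript` cell works by plain string formatting: `str.format` replaces the `{}` placeholder with the JSON array, so the browser receives a plain assignment. A sketch on a made-up one-row frame; because `{}` is a format placeholder, any literal braces in the JavaScript would have to be doubled (`{{` and `}}`).

```python
import pandas as pd

# Made-up stand-in for the pivoted frame (index = company name)
sample = pd.DataFrame({'Total Score': [70]},
                      index=pd.Index(['Company A'], name='name'))

# Same conversion as above: index back into a column, then a list of row objects
payload = sample.reset_index().to_json(orient='records')
script = "window.companyData={};".format(payload)

print(script)  # window.companyData=[{"name":"Company A","Total Score":70}];
```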

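Optionally, write the data to a JSON file on disk. The file can then be copied to the server where the VueJS application will be deployed. Note that this reuses the name canadastop100.json and therefore overwrites the raw scraper output.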

df.reset_index().to_json('canadastop100.json', orient='records')

Visualizing data

The next step is to visualize the data with VueJS. Vue can be loaded from https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.0/vue (RequireJS appends the .js extension itself). This notebook reuses the grid component example from the official Vue documentation to create a table of the crawled data.

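Register Vue with RequireJS. This relies on the RequireJS loader that ships with the classic Jupyter Notebook; JupyterLab does not provide it.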

%%javascript
require.config({
    paths: {
        vue: "https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.0/vue"
    }
});

<IPython.core.display.Javascript object>

Define the template that renders the data as a table, using a script of type text/x-template and Vue's template syntax.

%%html
<script type="text/x-template" id="data-template">
  <table class="canada">
    <thead>
      <tr>
        <th v-for="key in columns"
          @click="sortBy(key)"
          :class="{ active: sortKey == key }">
          {{ key | capitalize }}
          <span class="arrow" :class="sortOrders[key] > 0 ? 'asc' : 'dsc'">
          </span>
        </th>
      </tr>
    </thead>
    <tbody>
      <tr v-for="entry in filteredData">
        <td v-for="key in columns">
          {{entry[key]}}
        </td>
      </tr>
    </tbody>
  </table>
</script>

Define the main HTML container, which holds a search box and the data-grid component that uses the template defined above.

%%html
<div id="vue-app">
  <form id="search">
    Search <input name="query" v-model="searchQuery">
  </form>
  <data-grid
    :data="gridData"
    :columns="gridColumns"
    :filter-key="searchQuery">
  </data-grid>
</div>

Initialize the Vue application in JavaScript: read the data from the window object, register the data-grid component that renders the table, and create a new Vue instance mounted on #vue-app.

%%javascript
require(['vue'], function(Vue) {
    console.log(Vue.version);
    var companyData = window.companyData;
    console.log(JSON.stringify(companyData));
    Vue.component('data-grid', {
      template: '#data-template',
      props: {
        data: Array,
        columns: Array,
        filterKey: String
      },
      data: function () {
        var sortOrders = {}
        this.columns.forEach(function (key) {
          sortOrders[key] = 1
        })
        return {
          sortKey: '',
          sortOrders: sortOrders
        }
      },
      computed: {
        filteredData: function () {
          var sortKey = this.sortKey
          var filterKey = this.filterKey && this.filterKey.toLowerCase()
          var order = this.sortOrders[sortKey] || 1
          var data = this.data
          if (filterKey) {
            data = data.filter(function (row) {
              return Object.keys(row).some(function (key) {
                return String(row[key]).toLowerCase().indexOf(filterKey) > -1
              })
            })
          }
          if (sortKey) {
            data = data.slice().sort(function (a, b) {
              a = a[sortKey]
              b = b[sortKey]
              return (a === b ? 0 : a > b ? 1 : -1) * order
            })
          }
          return data
        }
      },
      filters: {
        capitalize: function (str) {
          return str.charAt(0).toUpperCase() + str.slice(1)
        }
      },
      methods: {
        sortBy: function (key) {
          this.sortKey = key
          this.sortOrders[key] = this.sortOrders[key] * -1
        }
      }
    })

    var vueApp = new Vue({
      el: '#vue-app',
      data: {
        searchQuery: '',
        gridColumns: Object.keys(companyData[0]),
        gridData: companyData
      }
    })

});

<IPython.core.display.Javascript object>
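The `filteredData` computed property is the core of the grid: keep the rows where any value contains the search text (case-insensitive), then sort by the selected column in ascending (`1`) or descending (`-1`) order. Below is the same logic transcribed to Python as a sketch, for reasoning about it outside the browser; `filter_and_sort` is a hypothetical name and the rows are made up.

```python
def filter_and_sort(rows, filter_key='', sort_key='', order=1):
    """Python transcription of the grid's filteredData logic (hypothetical helper)."""
    data = rows
    needle = filter_key.lower() if filter_key else ''
    if needle:
        # Keep a row when any of its values contains the search text (case-insensitive)
        data = [row for row in data
                if any(needle in str(value).lower() for value in row.values())]
    if sort_key:
        # order is 1 (ascending) or -1 (descending); assumes comparable values in sort_key
        data = sorted(data, key=lambda row: row[sort_key], reverse=order < 0)
    return data


rows = [
    {'name': 'Company A', 'Total Score': 70},
    {'name': 'Company B', 'Total Score': 64},
    {'name': 'Another Co', 'Total Score': 67},
]

print([r['name'] for r in filter_and_sort(rows, filter_key='company')])
# ['Company A', 'Company B']
print([r['name'] for r in filter_and_sort(rows, sort_key='Total Score', order=-1)])
# ['Company A', 'Another Co', 'Company B']
```

Like the JavaScript version, the filter compares string forms of the values, so typing a number such as `70` also matches numeric columns.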

Attach a style to make the table more attractive.

%%html
<style>
table.canada {
  border: 2px solid rgb(102, 153, 255);
  border-radius: 3px;
  background-color: #fff;
}

table.canada th {
  background-color: rgb(102, 153, 255);
  color: rgba(255,255,255,0.66);
  cursor: pointer;
  -webkit-user-select: none;
  -moz-user-select: none;
  -ms-user-select: none;
  user-select: none;
}

table.canada td {
  background-color: #f9f9f9;
}

table.canada th, table.canada td {
  min-width: 120px;
  padding: 10px 20px;
}

table.canada th.active {
  color: #fff;
}

table.canada th.active .arrow {
  opacity: 1;
}

.arrow {
  display: inline-block;
  vertical-align: middle;
  width: 0;
  height: 0;
  margin-left: 5px;
  opacity: 0.66;
}

.arrow.asc {
  border-left: 4px solid transparent;
  border-right: 4px solid transparent;
  border-bottom: 4px solid #fff;
}

.arrow.dsc {
  border-left: 4px solid transparent;
  border-right: 4px solid transparent;
  border-top: 4px solid #fff;
}
</style>

The result can also be tested on the JSFiddle I created. The source for the page is in my Vue repository and can be viewed on bl.ocks.org. The notebook is available on my GitHub, and the final result is shown on this page.
