{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fetching and Cleaning HTML Text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NLP 2018 - HW1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "November 2017 - [NLP17](http://www.cs.bgu.ac.il/~elhadad/nlp18.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will compare two different methods for converting raw HTML into plain text.\n", "HTML pages contain many \"non-textual\" elements: HTML tags, JavaScript code, advertisements and, more generally, repetitive content that we will refer to as \"boilerplate\" (menus, navigation, etc.).\n", "\n", "We are interested in extracting the non-boilerplate textual content from an arbitrary HTML page.\n", "\n", "We will compare two libraries that achieve this.\n", "\n", "First, let us get raw HTML from a URL:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "url = \"http://www.bbc.com/news/technology-26415021\"\n", "html = requests.get(url).text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us inspect the resulting raw HTML string we obtained:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", " \n", " \n", " An\n" ] } ], "source": [ "print(html[:200])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Too much whitespace and too many empty lines; let us clean it up a bit:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "# Collapse runs of line breaks into a single newline\n", "html = re.sub(\"[\\r\\n]+\", \"\\n\", html)\n", "html = re.sub(\"[\\n]+\", \"\\n\", html)\n", "# Collapse runs of tabs, commas and spaces into a single space\n", "# (the comma in the character class means commas are dropped from the text too)\n", "html = re.sub(\"[\\t, ]+\",\" \", html)\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'<!DOCTYPE html>\\n<html lang=\"en\" id=\"responsive-news\">\\n<head 
prefix=\"og: http://ogp.me/ns#\">\\n <meta charset=\"utf-8\">\\n <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge chrome=1\">\\n <title>An hour to catch the coding bug - BBC News\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n\\n\\n\\n\\n \\n\\n\\n\\n \\n \\n\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n\\n\\n
\\n \\n
\\n

Accessibility links

\\n \\n Notifications\\n \\n \\n \\n \\n
Search\\n
\\n
\\n \\n \\n \\n \\n \\n
\\n
\\n
\\n
\\n
\\n
\\n
\\n \\n \\n \\n \\n News\\n

BBC News Navigation

\\n \\n Sections\\n \\n
\\n
\\n \\n
\\n \\n
\\n
\\n \\n \\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n \\n Technology\\n \\n \\n \\n
\\n
\\n \\n
\\n
\\n
\\n \\n
\\n \\n Technology\\n \\n \\n \\n
\\n
\\n \\n
\\n
\\n
\\n \\n
\\n

An hour to catch the coding bug

\\n
\\n By Mark Ward\\n Technology correspondent BBC News\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n \\n \"Child\\n \\n \\n \\n \\n \\n
\\n Image caption\\n \\n We did our Hour of Code on a lazy Sunday morning\\n \\n
\\n \\n

Okay 60 minutes and counting. Here we go.

I\\'ve got an hour to convince my kids that programming is for them.

Well sort of.

I\\'m putting my twin 10-year-old boys Toby and Callum through the Hour of Code - a campaign that seeks to ignite an interest in programming - the part we\\'re doing using specially created web-based exercises.

The campaign begun in the US has landed in the UK where it also coincides with government calls for as many children as possible to get coding.

\\n
\\n \\n
\\n

Programming is being pushed because in an ever more technological world it can only be a good thing to give people a peep into what goes on behind the touch screen cash point and website.

The Hour of Code is supposed to be the start of that journey and I like many other parents feel it\\'s one my children should be embarking on. I do feel like a clock somewhere is ticking and unless they get started with this essential skill they\\'ll be left behind.

\"In the future kids are going to be doing programming \" said Callum when I asked him why it was worth learning how to code. \"We need to learn so we can do stuff with the computer otherwise it will be a blank page and never work.\"

Zombie dance

\\n \\n \\n \\n
\\n \\n \\n Image copyright\\n Code.org\\n \\n
\\n \\n
\\n Image caption\\n \\n The coding exercises involve familiar characters such as the Angry Birds\\n \\n
\\n \\n

It\\'s the getting started for both parents and their offspring that the Hour is intended to help with. I\\'m probably like most parents in that I have no significant qualification in computer science. All I can offer my boys is a lifetime of tinkering backed up over the past 18 months with online courses in Python HTML CSS Javascript and the like.

There\\'s good precedents for the value of something like the Hour of Code. I\\'ve quizzed every chief technology officer (CTO) developer or IT worker I\\'ve met in the past couple of weeks about what got them started and every one knew exactly the moment they got it. Their eyes lit up as they talked about typing in code listings from magazines working through every Dos command or designing their own blocky 8-bit game characters.

And finally at the back of my mind are all those statistics about the shortages of skilled IT staff and my hope that if this works then their future career options will be much wider than they would be without this skill.

So no pressure then.

Yesterday morning we gave it a go. I was keen for it to be fun rather than feel like school as there\\'s nothing more likely to turn them off than for it to be sold as good for them. I think we struck the right note of informality - Cal did it in his onesie.

It went OK no better than that. Pretty good. We worked through all the exercises getting angry birds to mash pigs and zombies to chomp sunflowers. The coding is done by dragging blocks representing different commands into a work area and building the blocks into a tower of coherent instructions.

The block-building system is based on the Scratch programming language build by MIT\\'s Mitch Resnick.

The exercises start easy - just getting an angry bird to hop on to a pig. By the end we were using \"If… else\" statements and loops to help a zombie navigate a tricky maze to reach the sunflower.

This went down well with Callum. \"Cool! It\\'s scanning for a path \" he said as the zombie worked its way towards the hapless flower.

Mistakes were made but we learned from them we debated over which way to make the birds and zombies turn and the time went really fast.

Toby was surprised to find that this counted as programming.

\"I thought coding was just a lot of people tapping in letter and numbers until they got it right \" he said.

Next steps

And yet I felt it was a bit too easy. I wanted to make the coding connection to real life more tangible. So as we had about 15 minutes of our Hour of Code left we went further.

This time we used the MIT App Inventor to build a basic program that would run on the tablets they own.

The app inventor uses the same \"drag the block\" method to build a program and following the instructions we had soon created an app that turned anything typed in text into speech.

\\n \\n \\n \\n
\\n \\n \\n Image copyright\\n MIT\\n \\n
\\n \\n
\\n Image caption\\n \\n The App Inventor uses the same block dragging system as the Hour of Code exercises\\n \\n
\\n \\n

We got it working on Cal\\'s tablet and soon they were getting that gadget to call out lots of phrases. Almost inevitably as they are 10 years old a lot of these phrases involved the words \"willy\" and \"bum\".

But they had a lot of fun with it and it brought home to them how straight-forward coding can be. In just over an hour they went from being pretty much novices to creating an Android app - a basic one that trades on the expertise of the people that built the coding tools but it was an accomplishment nonetheless.

Did they catch the coding bug as a result?

Maybe later in the day they were programming each other after one of their regular wrestling matches left Toby lying exhausted on the floor. Suddenly Cal called out commands such as \"roll left\" and Toby started obeying even to the point of crashing into the sofa when too many roll commands were given.

Then Toby had his turn and did the same they even worked out that they had to compensate for the changes in left and right as they rolled.

So I think that hour started something. Both with them and with me. Building that Android app made me realise that it is straight-forward. That my lack of formal qualifications do not matter as much as I thought. And maybe that\\'s the point of the hour. Making people realise that it is not scary and difficult. You just have to find an hour and give it a try. You can even do it in your onesie.

\\n
\\n
\\n
\\n
\\n
\\n

\\n Share this story About sharing\\n

\\n \\n
\\n \\n \\n
\\n \\n

The BBC is not responsible for the content of external Internet sites

\\n
\\n
\\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n
\\n \\n
\\n
\\n \\n\\n \\n
\\n
\\n \\n
\\n
\\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n
\\n
\\n
\\n \\n
\\n \\n \\n \\n
\\n
\\n \\n
\\n
\\n

BBC News Services

\\n \\n
\\n
\\n \\n
\\n
\\n \\n \"\"
\\n\\n\\n \\n\\n\\n \\n'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is becoming a mess, let us get line breaks back:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", " \n", " \n", " An hour to catch the coding bug - BBC News\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "
\n", " \n", "
\n", "

Accessibility links

\n", " \n", " Notifications\n", " \n", " \n", " \n", " \n", "
Search\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", " \n", "
\n", "
\n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " Technology\n", " \n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", " Technology\n", " \n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", "

An hour to catch the coding bug

\n", "
\n", " By Mark Ward\n", " Technology correspondent BBC News\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
    \n", "
  • 3 March 2014
    \n", "
  • \n", "
  • From the section Technology
  • \n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", " \n", " \n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \"Child\n", " \n", " \n", " \n", " \n", " \n", "
\n", " Image caption\n", " \n", " We did our Hour of Code on a lazy Sunday morning\n", " \n", "
\n", " \n", "

Okay 60 minutes and counting. Here we go.

I've got an hour to convince my kids that programming is for them.

Well sort of.

I'm putting my twin 10-year-old boys Toby and Callum through the Hour of Code - a campaign that seeks to ignite an interest in programming - the part we're doing using specially created web-based exercises.

The campaign begun in the US has landed in the UK where it also coincides with government calls for as many children as possible to get coding.

\n", "
\n", " \n", "
\n", "

Programming is being pushed because in an ever more technological world it can only be a good thing to give people a peep into what goes on behind the touch screen cash point and website.

The Hour of Code is supposed to be the start of that journey and I like many other parents feel it's one my children should be embarking on. I do feel like a clock somewhere is ticking and unless they get started with this essential skill they'll be left behind.

\"In the future kids are going to be doing programming \" said Callum when I asked him why it was worth learning how to code. \"We need to learn so we can do stuff with the computer otherwise it will be a blank page and never work.\"

Zombie dance

\n", " \n", " \n", " \n", "
\n", " \n", " \n", " Image copyright\n", " Code.org\n", " \n", "
\n", " \n", "
\n", " Image caption\n", " \n", " The coding exercises involve familiar characters such as the Angry Birds\n", " \n", "
\n", " \n", "

It's the getting started for both parents and their offspring that the Hour is intended to help with. I'm probably like most parents in that I have no significant qualification in computer science. All I can offer my boys is a lifetime of tinkering backed up over the past 18 months with online courses in Python HTML CSS Javascript and the like.

There's good precedents for the value of something like the Hour of Code. I've quizzed every chief technology officer (CTO) developer or IT worker I've met in the past couple of weeks about what got them started and every one knew exactly the moment they got it. Their eyes lit up as they talked about typing in code listings from magazines working through every Dos command or designing their own blocky 8-bit game characters.

And finally at the back of my mind are all those statistics about the shortages of skilled IT staff and my hope that if this works then their future career options will be much wider than they would be without this skill.

So no pressure then.

Yesterday morning we gave it a go. I was keen for it to be fun rather than feel like school as there's nothing more likely to turn them off than for it to be sold as good for them. I think we struck the right note of informality - Cal did it in his onesie.

It went OK no better than that. Pretty good. We worked through all the exercises getting angry birds to mash pigs and zombies to chomp sunflowers. The coding is done by dragging blocks representing different commands into a work area and building the blocks into a tower of coherent instructions.

The block-building system is based on the Scratch programming language build by MIT's Mitch Resnick.

The exercises start easy - just getting an angry bird to hop on to a pig. By the end we were using \"If… else\" statements and loops to help a zombie navigate a tricky maze to reach the sunflower.

This went down well with Callum. \"Cool! It's scanning for a path \" he said as the zombie worked its way towards the hapless flower.

Mistakes were made but we learned from them we debated over which way to make the birds and zombies turn and the time went really fast.

Toby was surprised to find that this counted as programming.

\"I thought coding was just a lot of people tapping in letter and numbers until they got it right \" he said.

Next steps

And yet I felt it was a bit too easy. I wanted to make the coding connection to real life more tangible. So as we had about 15 minutes of our Hour of Code left we went further.

This time we used the MIT App Inventor to build a basic program that would run on the tablets they own.

The app inventor uses the same \"drag the block\" method to build a program and following the instructions we had soon created an app that turned anything typed in text into speech.

\n", " \n", " \n", " \n", "
\n", " \n", " \n", " Image copyright\n", " MIT\n", " \n", "
\n", " \n", "
\n", " Image caption\n", " \n", " The App Inventor uses the same block dragging system as the Hour of Code exercises\n", " \n", "
\n", " \n", "

We got it working on Cal's tablet and soon they were getting that gadget to call out lots of phrases. Almost inevitably as they are 10 years old a lot of these phrases involved the words \"willy\" and \"bum\".

But they had a lot of fun with it and it brought home to them how straight-forward coding can be. In just over an hour they went from being pretty much novices to creating an Android app - a basic one that trades on the expertise of the people that built the coding tools but it was an accomplishment nonetheless.

Did they catch the coding bug as a result?

Maybe later in the day they were programming each other after one of their regular wrestling matches left Toby lying exhausted on the floor. Suddenly Cal called out commands such as \"roll left\" and Toby started obeying even to the point of crashing into the sofa when too many roll commands were given.

Then Toby had his turn and did the same they even worked out that they had to compensate for the changes in left and right as they rolled.

So I think that hour started something. Both with them and with me. Building that Android app made me realise that it is straight-forward. That my lack of formal qualifications do not matter as much as I thought. And maybe that's the point of the hour. Making people realise that it is not scary and difficult. You just have to find an hour and give it a try. You can even do it in your onesie.

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

\n", " Share this story About sharing\n", "

\n", " \n", "
\n", " \n", " \n", "
\n", " \n", "

The BBC is not responsible for the content of external Internet sites

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", "

\n", " \n", " Features\n", "

\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", " \n", " \n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", "

BBC News Services

\n", " \n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", " \"\"
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n" ] } ], "source": [ "html = html.split(\"\\n\") # html is now a list of lines\n", "html = \"\\n\".join(html) # we turn it back into a single string\n", "print(html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We give up -- too much noise in this page! How can we get just the text out of this?\n", "\n", "Let us use existing libraries.\n", "\n", "The first one we try is BeautifulSoup, a general-purpose library for parsing \"noisy\" HTML.\n", "Once parsed, the HTML string can be navigated in a convenient manner.\n", "Make sure you install beautifulsoup4 by running:\n", "\n", "% pip install beautifulsoup4\n", "\n", "We can then run this:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "def clean_html1(html):\n", "    # No parser is named here, so BeautifulSoup picks the best one installed\n", "    soup = BeautifulSoup(html)\n", "    return soup.get_text()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us try this version of clean_html:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "An hour to catch the coding bug - BBC News\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " {\"@context\":\"http:\\/\\/schema.org\" \"@type\":\"ReportageNewsArticle\" \"url\":\"http:\\/\\/www.bbc.com\\/news\\/technology-26415021\" \"publisher\":{\"@type\":\"NewsMediaOrganization\" \"name\":\"BBC News\" \"logo\":{\"@type\":\"ImageObject\" \"url\":\"http:\\/\\/www.bbc.co.uk\\/news\\/special\\/2015\\/newsspec_10857\\/bbc_news_logo.png?cb=1\"}} \"datePublished\":\"2014-03-03T10:22:55+00:00\" \"dateModified\":\"2014-03-03T10:22:55+00:00\" \"headline\":\"How to get your children coding\" \"image\":{\"@type\":\"ImageObject\" \"width\":720 
\"height\":450 \"url\":\"https:\\/\\/ichef-1.bbci.co.uk\\/news\\/720\\/media\\/images\\/73325000\\/jpg\\/_73325163_olly009.jpg\"} \"thumbnailUrl\":\"https:\\/\\/ichef.bbci.co.uk\\/news\\/208\\/media\\/images\\/73325000\\/jpg\\/_73325163_olly009.jpg\" \"author\":{\"@type\":\"Person\" \"name\":\"Mark Ward\"} \"mainEntityOfPage\":\"http:\\/\\/www.bbc.com\\/news\\/technology-26415021\"}\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "var _sf_startpt=(new Date()).getTime()\n", "\n", " (function() {\n", " if (navigator.userAgent.match(/IEMobile\\/10\\.0/)) {\n", " var msViewportStyle = document.createElement(\"style\");\n", " msViewportStyle.appendChild(\n", " document.createTextNode(\"@-ms-viewport{width:auto!important}\")\n", " );\n", " document.getElementsByTagName(\"head\")[0].appendChild(msViewportStyle);\n", " }\n", " })();\n", " \n", "window.fig = window.fig || {}; window.fig.async = true;\n", "\n", " window.bbcredirection={geo:true}\n", "\n", "\n", "\n", "\n", " bbcRequireMap = {\"jquery-1\":\"http://static.bbci.co.uk/frameworks/jquery/0.4.1/sharedmodules/jquery-1.7.2\" \"jquery-1.4\":\"http://static.bbci.co.uk/frameworks/jquery/0.4.1/sharedmodules/jquery-1.4\" \"jquery-1.9\":\"http://static.bbci.co.uk/frameworks/jquery/0.4.1/sharedmodules/jquery-1.9.1\" \"jquery-1.12\":\"http://static.bbci.co.uk/frameworks/jquery/0.4.1/sharedmodules/jquery-1.12.0.min\" \"jquery-2.2\":\"http://static.bbci.co.uk/frameworks/jquery/0.4.1/sharedmodules/jquery-2.2.0.min\" \"istats-1\":\"//nav.files.bbci.co.uk/nav-analytics/0.1.0-43/js/istats-1\" \"swfobject-2\":\"http://static.bbci.co.uk/frameworks/swfobject/0.1.10/sharedmodules/swfobject-2\" \"demi-1\":\"http://static.bbci.co.uk/frameworks/demi/0.10.1/sharedmodules/demi-1\" \"gelui-1\":\"http://static.bbci.co.uk/frameworks/gelui/0.9.13/sharedmodules/gelui-1\" \"cssp!gelui-1/overlay\":\"http://static.bbci.co.uk/frameworks/gelui/0.9.13/sharedmodules/gelui-1/overlay.css\" 
\"relay-1\":\"http://static.bbci.co.uk/frameworks/relay/0.2.6/sharedmodules/relay-1\" \"clock-1\":\"http://static.bbci.co.uk/frameworks/clock/0.1.9/sharedmodules/clock-1\" \"canvas-clock-1\":\"http://static.bbci.co.uk/frameworks/clock/0.1.9/sharedmodules/canvas-clock-1\" \"cssp!clock-1\":\"http://static.bbci.co.uk/frameworks/clock/0.1.9/sharedmodules/clock-1.css\" \"jssignals-1\":\"http://static.bbci.co.uk/frameworks/jssignals/0.3.6/modules/jssignals-1\" \"jcarousel-1\":\"http://static.bbci.co.uk/frameworks/jcarousel/0.1.10/modules/jcarousel-1\" \"bump-3\":\"//emp.bbci.co.uk/emp/bump-3/bump-3\"}; require({ baseUrl: 'http://static.bbci.co.uk/' paths: bbcRequireMap waitSeconds: 30 }); /*-1){e(t);return true}return false} get:function(){return document.cookie} getCrumb:function(t){if(!t){return null}return decodeURIComponent(document.cookie.replace(new RegExp(\"(?:(?:^|.*;)\\\\s*\"+encodeURIComponent(t).replace(/[\\-\\.\\+\\*]/g \"\\\\$&\")+\"\\\\s*\\\\=\\\\s*([^;]*).*$)|^.*$\") \"$1\"))||null} policyRequiresRefresh:function(){var u=new Date();u.setHours(0);u.setMinutes(0);u.setSeconds(0);u.setMilliseconds(0);if(bbccookies.POLICY_REFRESH_DATE_MILLIS<=u.getTime()){var t=bbccookies.getCrumb(bbccookies.POLICY_EXPIRY_COOKIENAME);if(t){t=new Date(parseInt(t));t.setYear(t.getFullYear()-1);return bbccookies.POLICY_REFRESH_DATE_MILLIS>=t.getTime()}else{return true}}else{return false}} _setPolicy:function(t){return f.apply(this arguments)} readPolicy:function(){return l.apply(this arguments)} _deletePolicy:function(){s(m \"\" q)} _isConfirmed:function(){return n()!==null} _acceptsAll:function(){var t=l();return t&&!(j(t).indexOf(\"0\")>-1)} _getCookieName:function(){return b.apply(this arguments)} _showPrompt:function(){var t=((!this._isConfirmed()||this.policyRequiresRefresh())&&window.cta_enabled&&this.cookiesEnabled()&&!window.bbccookies_disable);return(window.orb&&window.orb.fig)?t&&(window.orb.fig(\"no\")||window.orb.fig(\"ck\")):t} _getPolicy:this.readPolicy};function 
b(u){var t=(\"\"+u).match(/^([^=]+)(?==)/);return(t&&t.length?t[0]:\"\")}function j(t){return\"\"+(t.ads?1:0)+(t.personalisation?1:0)+(t.performance?1:0)}function f(x){if(typeof x===\"undefined\"){x=i}if(typeof arguments[0]===\"string\"){var u=arguments[0] w=arguments[1];if(u===\"necessary\"){w=true}x=l();x[u]=w}else{if(typeof arguments[0]===\"object\"){x.necessary=true}}var v=new Date();v.setYear(v.getFullYear()+1);bbccookies.set(m+\"=\"+j(x)+\";domain=bbc.co.uk;path=/;expires=\"+v.toUTCString()+\";\");bbccookies.set(m+\"=\"+j(x)+\";domain=bbc.com;path=/;expires=\"+v.toUTCString()+\";\");bbccookies.set(m+\"=\"+j(x)+\";domain=bbci.co.uk;path=/;expires=\"+v.toUTCString()+\";\");var t=new Date(v.getTime());t.setMonth(t.getMonth()+1);bbccookies.set(bbccookies.POLICY_EXPIRY_COOKIENAME+\"=\"+v.getTime()+\";domain=bbc.co.uk;path=/;expires=\"+t.toUTCString()+\";\");bbccookies.set(bbccookies.POLICY_EXPIRY_COOKIENAME+\"=\"+v.getTime()+\";domain=bbc.com;path=/;expires=\"+t.toUTCString()+\";\");bbccookies.set(bbccookies.POLICY_EXPIRY_COOKIENAME+\"=\"+v.getTime()+\";domain=bbci.co.uk;path=/;expires=\"+t.toUTCString()+\";\");return x}function o(t){if(t===null){return null}var u=t.split(\"\");return{ads:!!+u[0] personalisation:!!+u[1] performance:!!+u[2] necessary:true}}function n(){var t=new RegExp(\"(?:^|; ?)\"+m+\"=(\\\\d\\\\d\\\\d)($|;)\") u=document.cookie.match(t);if(!u){return null}return u[1]}function l(t){var u=o(n());if(!u){u=i}if(t){return u[t]}else{return u}}function e(t){return document.cookie=t+\"=;expires=\"+q+\";\"}var g=!(window.bbccookies_flag===\"ON\"&&!bbccookies._acceptsAll()&&!window.bbccookies_disable);var k={} 
d={\"personalisation\":\"ckps_.+|X-AB-iplayer-.+|ACTVTYMKR|BBC_EXAMPLE_COOKIE|BBCIplayer|BBCiPlayerM|BBCIplayerSession|BBCMediaselector|BBCPostcoder|bbctravel|CGISESSID|ed|food-view|forceDesktop|h4|IMRID|locserv|MyLang|myloc|NTABS|ttduserPrefs|V5|WEATHER|BBCScienceDiscoveryPlaylist_.+|bitratePref|correctAnswerCount|genreCookie|highestQuestionScore|incorrectAnswerCount|longestStreak|MSCSProfile|programmes-oap-expanded|quickestAnswer|score|servicePanel|slowestAnswer|totalTimeForAllFormatted|v|BBCwords|score|correctAnswerCount|highestQuestionScore|hploc|BGUID|BBCWEACITY|mstouch|myway|BBCNewsCustomisation|cbbc_anim|cbeebies_snd|bbcsr_usersx|cbeebies_rd|BBC-Latest_Blogs|zh-enc|pref_loc|m|bbcEmp.+|recs-.+|_lvd2|_lvs2|tick|_fcap_CAM1|_rcc2\" \"performance\":\"ckpf_.+|optimizely.*|BBCLiveStatsClick|id|_em_.+|cookies_enabled|mbox|mbox-admin|mc_.+|omniture_unique|s_.+|sc_.+|adpolicyAdDisplayFrequency|s1|ns_session|ns_cookietest|ns_ux|NO-SA|tr_pr1|gvsurvey|bbcsurvey|si_v|sa_labels|obuid|mm_.+|mmid|mmcore.+|mmpa.+\" \"ads\":\"ckad_.+|rsi_segs|c\" \"necessary\":\"ckns_.+|BBC-UID|blq\\\\.dPref|SSO2-UID|BBC-H2-User|rmRpDetectReal|bbcComSurvey|IDENTITY_ENV|IDENTITY|IDENTITY-HTTPS|IDENTITY_SESSION|BBCCOMMENTSMODULESESSID|bbcBump.+|IVOTE_VOTE_HISTORY|pulse|BBCPG|BBCPGstat|ecos\\\\.dt\"};function r(){var x=document.cookie.replace(/; +/g \";\").split(\";\") u v=[];for(var w=0 t=x.length;w*/ define('orb/cookies' function() { return window.bbccookies; }); /*<'+\"/script>\")}else{j.write('