Exploratory Data Analysis I: The McClay Library

McClay Library (source: RPP Architects)

McClay Library (source: RPP Architects)

The McClay Library at Queen's University is an award winning library. It contains more than a million volumes and more than 2,000 reader spaces. As the exam period approaches the number of occupants gradually rise. At its peak during exams, finding a PC or a seat can be quite difficult. The library provides occupancy data, showing the different floors, the number of available PCs on each, and the total number of occupants in the library. This service is updated once every 10 seconds and can be a useful guide when finding a place to study on a very busy day. I suppose that the information on the website is sourced from the library's controlled entry system.

Having been a student at Queen's quite some time now, I am well aware of how the library can become saturated with rather disconcerted-looking students when occupancy peaks. I was curious to know how useful the occupancy information could be at such times. So I began exploring the website and later found a way to collect the data on it. 

Over the course of the summer exams period, I aim to gather as many datapoints as possible and play with the dataset. I hope to find something which can be benefial to every Queen's student, in or out of exam season. In this post I shall discuss what I have done so far with the ever-growing dataset, and how I'm using the Wolfram Language and other technologies such as Wolfram Data Drop and Wolfram|Alpha to store and analyse it. I hope you find this both interesting and useful.

 

1. Simple display

I will begin by discussing how I accessed the occupancy information and visualised it the way I wanted. Two URLs are of interest here: http://go.qub.ac.uk/mcclay-availablepcs and http://go.qub.ac.uk/mcclay-occupancy. The former shows the number of available PCs on all floors while the later shows the total number of people in the building.

1.1 Create dispay

Combining data from both sources, we can create a simple interface similar to the one on the occupancy site. We'll first extract the data from our two sources and process it, and then represent occupancy and PC usage in the form of a horizontal gauge.

display = Block[{
  url = "https://www.qub.ac.uk/directorates/InformationServices/pcs/impero/", 
  url2 = "https://www.qub.ac.uk/directorates/InformationServices/pcs/sentry",
  rawDataPCs, dataPCs, dataOccupancy, timeAndDate, mcclayLib, 
  currentOccupancy, maxOccupancy, gaugesPCs, gaugesOccupancy},
 
 (* import info. from sites *)
 rawDataPCs = Import[url];
 (* extract useful info. *)
 dataPCs = ImportString[rawDataPCs, "RawJSON"];
 dataOccupancy = Import[url2];
 {currentOccupancy, maxOccupancy} = 
  First /@ ( 
    StringCases[
       dataOccupancy, ___ ~~ "<" <> # <> ">" ~~ occ___ ~~ 
         "</" <> # <> ">" ~~ ___ -> occ] & /@ {"occupancy", 
      "maxoccupancy"});
 Row[{"Occupancy  ", Row[{currentOccupancy, maxOccupancy}, "/"]}];
 
 (* extract time and date *)
 timeAndDate = dataPCs["dateTimeNow"];
 (* associate each floor with its info and sort list *)
 mcclayLib = 
  With[{order = {"Ground Floor Atrium", "Ground Floor Extended Hours",
       "First Floor", "Second Floor", "Third Floor"}},
   {#[[1]], Row[{#[[3]] - #[[2]], #[[3]]}, " / "]} & /@ 
    SortBy[Values[dataPCs["rooms"][[1 ;;, 2 ;; 4]]], 
     Position[order, #[[1]]] &]];
 
 (* template for horizontal gauges *)
 makeGauge[value_] := 
  HorizontalGauge[value, {0, 1}, 
   GaugeMarkers -> Placed[Graphics[Disk[]], "DivisionInterval"],
   GaugeFrameSize -> None, GaugeFrameStyle -> None,
   GaugeStyle -> {Orange, Lighter[Brown, .8]},
   GaugeFaceStyle -> None, 
   ScaleRangeStyle -> {Lighter[Brown, #] & /@ {.9, .5}},
   GaugeLabels -> {None, None},
   TicksStyle -> None,
   ScaleDivisions -> None, ScaleRanges -> None, ScalePadding -> 0
   ];
 
 (* gauges for PC usage on each floor *)
 gaugesPCs = Table[
   makeGauge[
    N[(mcclayLib[[fl, 2, 1, 1]]/mcclayLib[[fl, 2, 1, 2]]), 2]],
   {fl, Range@Length@mcclayLib}];
 
 (* gauge for total number of occupants *)
 gaugesOccupancy = 
  makeGauge[(FromDigits@currentOccupancy/FromDigits@maxOccupancy)];
 
 (* put everything together *)
 Column[{
   Framed[timeAndDate,FrameStyle -> Directive[Black, Thin]],
   Panel@Grid[
     Prepend[
      Reverse[Flatten /@ 
        Transpose[{mcclayLib, gaugesPCs}]], {"Total occupancy", 
       Row[{currentOccupancy, maxOccupancy}, " / "], gaugesOccupancy}],
     Alignment -> {"/", Center}, Dividers -> {None, {False, True}}, 
     Spacings -> {Automatic, 2 -> 2}]
   }, Alignment -> Center]
 ]

Note that the values of all gauges are normalised between 0 and 1. This is what we get:

Occupancy and PC usage on all floors.

Occupancy and PC usage on all floors.

1.2 Deploy display to cloud

We can go ahead and deploy this to the Wolfram Cloud to make it accessible to every student.

CloudDeploy[Delayed[
  ExportForm[display,
   "PNG", ImageFormattingWidth -> 400, ImageResolution -> 250], 
  UpdateInterval -> 20
  ], "McClayLibraryOccupancy", Permissions -> "Public"]

The display then gets deployed to this website which automatically refreshes every 20 seconds:

https://www.wolframcloud.com/objects/user-8fa71eda-c6a8-4555-a394-46fd67a27d48/McClayLibraryOccupancy

and here, also:

http://go.qub.ac.uk/mcclay-usagedisplay.

And of course, you can scan the QR code below!

Click or scan image to visit site.

Click or scan image to visit site.

 

2. Collecting data

2.1 Create databin

To analyse the occupancy and PC usage over a period of time, we'll have to store the data somewhere. I chose to use the Wolfram Data Drop to do this. It allows you to easily create a databin and periodically add data to it using a custom API, email, Arduino, Raspberry Pi, IFTTT and even Twitter!

I used the Wolfram Langauge to create a databin, but you can also do this online:

(* connect to Wolfram Cloud first *)
If[$CloudConnected == False, CloudConnect["username","password"], 
  Print["Connected to Wolfram Cloud."]];

(* create bin *)
libraryBin = 
 CreateDatabin[<|
   "Name" -> "McClay Library",
   "Interpretation" ->
    {
     "datetime" -> "DateTime",
     "totalNum" -> Restricted["StructuredQuantity", "People"],
     "grndFloorATR" -> Restricted["StructuredQuantity", "People"],
     "grndFloorEXT" -> Restricted["StructuredQuantity", "People"],
     "floor1" -> Restricted["StructuredQuantity", "People"],
     "floor2" -> Restricted["StructuredQuantity", "People"],
     "floor3" -> Restricted["StructuredQuantity", "People"],
     "totalUsingPCs" -> Restricted["StructuredQuantity", "People"],
     "totalNotUsingPCs" -> Restricted["StructuredQuantity", "People"]
     },
   "Administrator" -> $WolframID,
   Permissions -> "Public"|>]
Properties of databin.

Properties of databin.

You can also view the databin here.

You can also view the databin here.

The databin is available to the "Public" and it automatically interpretes and associates input values with their appropriate quantites, e.g., number of "People" on each floor. Note that this databin is only temporary and expires a month from now. Data can not be added to it after it expires.

The following table explains what the inputs are:

datetime date and time
totalNum current library occupancy
grndFloorATR Ground Floor Atrium
grndFloorEXT Ground Floor (Extended Hours)
floor1 First Floor
floor2 Second Floor
floor3 Third Floor
totalUsingPCs total no. of students using PCs
totalNotUsingPCs total no. of students NOT using PCs

2.2 Properties of databin and metadata

Properties of databin:

Grid[Options[libraryBin] /. (property___ -> value___) -> {property, value},Frame -> All]

Metadata of databin:

Column@Normal@libraryBin["Information"]
Properties of databin.

Properties of databin.

Metadata of databin.

Metadata of databin.

2.3 Prepare data for upload

Using the Wolfram Language, data is usually added to the bin in the form of an Association. Since we already know what the required inputs for our databin are, we can extract the occupance and immediately process it into our desired association.

getLibraryData[] := Block[
   {
 urlPCs = 
  "https://www.qub.ac.uk//directorates/InformationServices/pcs/impero/",urlOccupancy = 
  "https://www.qub.ac.uk//directorates/InformationServices/pcs/sentry",
 rawDataPCs, PCsUsageData,OccupancyData, occ,
 datetime, f0atr, f0ext, f1, f2, f3, totalUsingPCs, totalNotUsingPCs
 },

   rawDataPCs = Import[urlPCs];
   PCsUsageData = ImportString[rawDataPCs, "RawJSON"];
   datetime = PCsUsageData["dateTimeNow"];
   
   (* occupancy *)
OccupancyData = Import[urlOccupancy];
occ = First@StringCases[OccupancyData, ___ ~~ "<occupancy>" ~~ occ___ ~~"</occupancy>" ~~ ___ :> FromDigits@occ];

   (* PC usage on all floors *)
{f0atr, f0ext, f1, f2, f3} = 
  Abs@With[{order = {"Ground Floor Atrium", 
       "Ground Floor Extended Hours", "First Floor", "Second Floor", 
       "Third Floor"}},
    (#[[3]] - #[[2]]) & /@ 
     SortBy[Values[PCsUsageData["rooms"][[1 ;;, {2, 3, 4}]]], 
      Position[order, #[[1]]] &]];
totalUsingPCs = f0atr + f0ext + f1 + f2 + f3;
totalNotUsingPCs = occ - totalUsingPCs;
   
   (* our desired association *)
<|"datetime" -> datetime, "totalNum" -> occ, "grndFloorATR" -> f0atr, 
 "grndFloorEXT" -> f0ext, "floor1" -> f1, "floor2" -> f2, 
 "floor3" -> f3, "totalUsingPCs" -> totalUsingPCs, 
 "totalNotUsingPCs" -> totalNotUsingPCs|>
  ];

This version handles failure which arises from lack of internet connection, etc.

getLibraryData2[] := Block[
   {
    urlPCs = "https://www.qub.ac.uk//directorates/InformationServices/pcs/impero/", 
    urlOccupancy = "https://www.qub.ac.uk//directorates/InformationServices/pcs/sentry",
    rawDataPCs, PCsUsageData, PCsUsageDataAssoc,
    OccupancyData, occ,
    datetime, f0atr, f0ext, f1, f2, f3, totalUsingPCs, totalNotUsingPCs
    },
   
   (* import all data *)
   rawDataPCs = Check[Import[urlPCs], $Failed];
   OccupancyData = Check[Import[urlOccupancy], $Failed];
   
   (* import failure criteria *)
   ImportFailedQ[dataToImport_] := FailureQ[dataToImport];
   
   (* proceed (if data is imported successfully; all are free from failure) *)
   If[FreeQ[{ImportFailedQ@rawDataPCs, ImportFailedQ@OccupancyData}, 
     True],
    (
     (* extract PC usage, timestamp & occupancy data *)
     
     PCsUsageData = ImportString[rawDataPCs, "RawJSON"];
     datetime = PCsUsageData["dateTimeNow"];
     occ = 
      First@ StringCases[
        OccupancyData, ___ ~~ "<occupancy>" ~~ occupancy___ ~~ 
          "</occupancy>" ~~ ___ :> 
         If[occupancy == "NA", Missing["NotAvailable"], 
          FromDigits@occupancy]];
     
     (* processing *)
     PCsUsageDataAssoc = 
      Cases[PCsUsageData[["rooms"]][[;; , 2 ;; 4]],
        KeyValuePattern[{"roomDescription" -> x__, 
           "freeCount" -> y_ /; y >= 0, 
           "totalCount" -> z_ /; z >= 0}] :> (x -> (z - y))] // 
       Association;
     
     {f0atr, f0ext, f1, f2, f3} = 
      With[{order = {"Ground Floor Atrium", 
          "Ground Floor Extended Hours", "First Floor", 
          "Second Floor", "Third Floor"}},
       Lookup[PCsUsageDataAssoc, #, Missing["NotAvailable"]] & /@ 
        order];
     
     totalUsingPCs = With[{floors = {f0atr, f0ext, f1, f2, f3}},
         If[FreeQ[floors, #], Total@floors, #]
         ] &@Missing["NotAvailable"];
     totalNotUsingPCs = 
      If[totalUsingPCs === #, #, occ - totalUsingPCs] &@
       Missing["NotAvailable"];
     
     (* our desired association *)
     <|"datetime" -> datetime, 
      "totalNum" -> occ, "grndFloorATR" -> f0atr, 
      "grndFloorEXT" -> f0ext, "floor1" -> f1, "floor2" -> f2, 
      "floor3" -> f3, "totalUsingPCs" -> totalUsingPCs, 
      "totalNotUsingPCs" -> totalNotUsingPCs|>
     )
    (* else do nothing - upload skipped *)
    ]
   ];

Running this gets the data and creates an association we can directly send to our databin. For example, evaluating getLibraryData[] gives:

<|"datetime" -> "Saturday, 23rd April 2016 14:40:05", 
 "totalNum" -> 630, "grndFloorATR" -> 88, "grndFloorEXT" -> 50, 
 "floor1" -> 24, "floor2" -> 25, "floor3" -> 33, "totalUsingPCs" -> 220, "totalNotUsingPCs" -> 410|>

 

3. Uploading data

The are various ways to upload extracted data to our bin. We'll so this using the Wolfram Language, since our data will fisrt need to be extracted from a website.

3.1 Using a scheduled task

Scheduled tasks are great for running code in the background. QUB updates the data on the occupancy website once every 10 seconds. However, a basic Wolfram Data Drop only allows a maximum of 60 data entries per hours. So we'll upload our data once every minute to add as much as possible to the databin each hour. Note that the short ID for our databin is cbeUoury.

libraryBin = Databin["cbeUoury"]
RunScheduledTask[(
  assoc = getLibraryData[];
  DatabinAdd[libraryBin, assoc]), 60]
We've set up a task to upload info to our databin once every 60 seconds.

We've set up a task to upload info to our databin once every 60 seconds.

As long as we allow this to run without interrupting the task or closing Mathematica, it'll run forever. This is fantastic, given that every Raspberry Pi comes with Mathematica pre-installed! We can also add data to the bin using a custom API, Arduino and other different methods.

After a few minutes (and entries) our data can be viewed on the databin site. You can deploy your scheduled task to the Wolfram Cloud and it'll keep running until you choose to stop it. Although a paid subscription is required to use this service. This will be ideal for gathering data over a lengthy period of time, which is what I aim to do . Unfortunately, I do not have a paid subscription at the moment but will look into getting one. For now, data uploads will be made from Mathematica on my PC or using a Raspberry Pi, or an Intel Edison.

After a few minutes (and uploads) we can view our data.

After a few minutes (and uploads) we can view our data.

3.2 Cloud-based scheduled task

If we're deploying our scheduled task to run autonomously in the cloud, we'd use the following code:

dataUploadCloudObject = Block[{shortid = "cbeUoury", databin, assoc},
  databin = Databin[shortid];
  CloudDeploy[ScheduledTask[
  (assoc = getLibraryData[]; DatabinAdd[databin, assoc]), 
   600],"McClayLibraryDataUpload", Permissions -> "Private"
   ]]

 

4. Data Analysis

I ran the program pretty much throughout the exam period with short intervals of inactivity. The entries amassed into a dataset large enough to spot trends and run some analyses.

4.1 Dataset

Let's see an overview of the data we have collected so far:

(* Data collected so far *)
libraryBin = Databin["cK8HWZj4"];
dataEntriesThusFar := libraryBin["Entries"];

(* Delete Null, Missing and Negative cases... Sometimes we get negative cases when the number of available PCs isn't updated correctly. *)
dataEntriesThusFar =  DeleteCases[dataEntriesThusFar, Null | getLibraryData2[], {1}];
dataEntriesThusFar = DeleteCases[dataEntriesThusFar, _Missing | _?Negative | getLibraryData2[], {2}]
List of all entries, thus far.

List of all entries, thus far.

View the dataset.
dataset = Dataset[dataEntriesThusFar]

Dataset

Dataset

Now we have a continuously growing dataset — we can run some analysis and see if we find anything meaningful/useful out of it.

4.2 Basic Stats

Find the minimum, maximum, standard deviation, etc of each library location:

Module[{funcs = {"Location", Mean, StandardDeviation, Median, Max, 
    Min}},
 Prepend[Flatten@{#, 
       Table[dataset[
         IntegerPart@func[Cases[#, _?QuantityQ]] &, #1], {func, 
         Rest@funcs}]} & /@ {"totalNum", "totalUsingPCs","totalNotUsingPCs", "grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"},
   ToString /@ funcs] // Dataset
 ]

Visualise the stats using box charts.

makeBoxChart[labels_] := Block[{data},
  data = Table[DeleteMissing@Normal@dataset[[All, i]], {i, labels}];
  BoxWhiskerChart[data,
   {{"Outliers", Large}, {"Whiskers", Thin}, {"Fences", Thin}},
   AspectRatio -> 1/2, ChartLabels -> labels, ChartStyle -> "StarryNightColors", BarOrigin -> Left, 
   BarSpacing -> .5, PerformanceGoal -> "Quality", ImageSize -> 400]];

   makeBoxChart[#] & /@ {{"totalNum", "totalUsingPCs", 
    "totalNotUsingPCs"}, {"grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"}} // gridOfTwoItems
Basic stats.

Basic stats.

Box charts.

Box charts.

Visualise mean number of people using PCSs on each floor.

Table[(dataset[
    BarChart[Floor@Mean[QuantityMagnitude[DeleteMissing[#, 1, Infinity]]], 
      BarOrigin -> Left, ChartStyle -> "StarryNightColors", ChartLegends -> Automatic, PlotLabel -> "Mean number of people\n", PlotTheme -> "Detailed"] &, i]),
  {i, {{"grndFloorATR", "grndFloorEXT", "floor1", "floor2", 
     "floor3"}, {"totalUsingPCs", "totalNotUsingPCs"}}}] // gridOfTwoItems
Mean number of PCs in use and free on all levels.

Mean number of PCs in use and free on all levels.

To find when the library had the highest number of people in it:

Query[MaximalBy[#totalUsingPCs &]] @ dataset

On 17 May, at about 11.20 am was there was the highest number of occupants over the exam period.

On 17 May, at about 11.20 am was there was the highest number of occupants over the exam period.

4.3 Time series analysis

Extract timeseries of library floors and remove the last timeseries in the list. This timeseries holds failed entries which were uploaded to the databin before I made getLibraryData2.

tseries = Delete[libraryBin["TimeSeries"], -1];

makeDatePlot[series_List] := 
  Block[{colourScheme = Take[ColorData[89, "ColorList"], Length[series]]},
   DateListPlot[(QuantityMagnitude[tseries[#1]] &) /@ series,
    AspectRatio -> 1/\[Pi], PlotStyle -> ({#1, 
         Directive[AbsoluteThickness[0.75`], Opacity[1]]} &) /@ colourScheme, 
    PlotLabels -> MapThread[Style, {(tseries[#1]["LastValue"] &) /@ series, colourScheme}], Filling -> Bottom, 
    PlotLegends -> Placed[SwatchLegend[colourScheme, series, LegendLayout -> "Row"], Bottom],
    FrameLabel -> Automatic, Frame -> True,
    FrameTicks -> {{Automatic, None}, {All, Automatic}},
    GridLines -> Automatic, ImageSize -> 800,
    PlotRangePadding -> {0, {0, Automatic}},
    TargetUnits -> {"DimensionlessUnit", "People"}]];

{makeDatePlot[{"floor3", "floor2", "floor1", "grndFloorEXT", "grndFloorATR"}], 
  makeDatePlot[{"totalNum", "totalUsingPCs", "totalNotUsingPCs"}]} // Column
Library occupancy and PC usage trends. (Note that I stopped running the program for a while after exams, hence the flat gap. I ran it again to after some days to carry out some tests.)

Library occupancy and PC usage trends. (Note that I stopped running the program for a while after exams, hence the flat gap. I ran it again to after some days to carry out some tests.)

Let's see the timespan of entries, and the latest entry to the databin with its metadata.

Column@Normal@libraryBin["Latest"]
Grid@Transpose[{{"First Entry", "Last Entry"}, libraryBin["TimeInterval"]}]

4.4 Location

See where entries were uploaded from.

(* list of geo locations for all entries in the databin *)
geoLocations = libraryBin["GeoLocations"];
geoLocTallyAssoc = Tally[geoLocations] /. {g_GeoPosition, c_} :> {g -> c} // Flatten //Association;

(* plot geo locations on a map *)
pinToMap[highText_] := Column[{highText,
    Graphics[
     GraphicsGroup[{FaceForm[{Orange, Opacity[.8]}], 
       FilledCurve[{{Line[
           Join[{{0, 0}}, ({Cos[#1], 3 + Sin[#1]} &) /@ Range[-((2 \[Pi])/20), \[Pi] + (2 \[Pi])/20, \[Pi]/20], {{0, 0}}]]}, 
              {Line[(0.5 {Cos[#1], 6 + Sin[#1]} &) /@
             Range[0, 2 \[Pi], \[Pi]/20]]}}]}], ImageSize -> 15]}, Alignment -> Center];

findNearestCity[geoCoords_] :=
  Row[{First[GeoNearest[Entity["City"], geoCoords]], " (", 
    geoLocTallyAssoc@geoCoords, 
    If[geoLocTallyAssoc@geoCoords == 1, " entry", " entries"], ")"}];

GeoGraphics[(GeoMarker[#[[1, 1]], pinToMap[#], "Alignment" -> Bottom, 
         "Scale" -> Scaled[1]] &[findNearestCity@#1]) & /@ (First /@ 
      Tally[geoLocations]) & /@ (First /@ Tally[geoLocations]), 
 GeoRange -> "World", GeoProjection -> "Equirectangular", 
 PlotRangePadding -> Scaled[.05], ImageSize -> 750]

Most of the entries were from my PC (Belfast) while a few were uploaded by calling a custom API I created (I assume the server(s) that handled these are based in Seattle).

 

5. Distributions

In this section I will discuss my attempt in finding the daily distributions of people using the library and the PCs in it. I followed William Sehorn's methods in this article to find and visualise the distributions.

5.1 Date-value pairs

Shown below are the timestamps and values of entries.

(* convert date-objects to date-strings *)
dateValuePairs = (tseries["totalNum"]["DatePath"]) /. {x_, 
     y_} :> {DateString@x, QuantityMagnitude@y};
RandomSample[dateValuePairs, 10]

(* Reformat date-strings to date-lists *)
dateValuePairs = 
 dateValuePairs /. {date_, value_} :> {DateList@date, value}

(* total number of entries *)
Length@dateValuePairs
We have 4866 date-value pairs in total.

We have 4866 date-value pairs in total.

5.2 Daily occupancy distributions

We’ll group all entries into 5-minute intervals over a 24-hour period:

timeIntervals = Tuples[{Range[0, 23], Range[0, 55, 5]}];
timeIntervals[[;; 12]]

Find number of people within each time interval, over the entire data gathering period:

Clear[addOccupancyToTimeInterval, OccupancyAtTimeInterval];
OccupancyAtTimeInterval[_] = 0;

addOccupancyToTimeInterval[{__, hour_, minute_, _}, occupancyCount_] := OccupancyAtTimeInterval[{hour, minute}] += occupancyCount;

addOccupancyToTimeInterval @@@ dateValuePairs;

For example, we can find the total number of people for 5-past-noon, over the entire period.

OccupancyAtTimeInterval[{12, 5}]
We can see that the total number of people between noon and 5-past-noon is 1575.

We can see that the total number of people between noon and 5-past-noon is 1575.

We can now plot the number of people over 5-minute intervals for the entire day.

OccupancyDistributionData = OccupancyAtTimeInterval /@ timeIntervals;
OccupancyDistributionData = 
  RotateLeft[OccupancyDistributionData, 12*6];

(* averaged daily distribution of occupancy *)
BarChart[OccupancyDistributionData]
Barchart plot of intervals over 24 hours.

Barchart plot of intervals over 24 hours.

Here's a nicer plot with labels:

barChartStyleOptions = With[{yticks = 
     Map[{#*12, 
        DateString[3600*(# + 6), {"Hour12Short", " ", "AMPM"}]} &, 
      Range[0, 24, 2]]},
   {
    AspectRatio -> 1/5, Frame -> True, FrameTicks -> {{None, None}, {yticks, None}},
     FrameStyle -> GrayLevel[0.75], FrameTicksStyle -> Black,
    GridLines -> {Range[24, 11*24, 24], None}, GridLinesStyle -> GrayLevel[0.87],
    Ticks -> None,
    BarSpacing -> 0,
    PlotRange -> {{1, 288}, All},
    PlotRangePadding -> {{1, 0}, {1, Scaled[.15]}},
    ImagePadding -> {{12, 12}, {20, 0}}
    ChartElementFunction -> 
     ChartElementDataFunction["SegmentScaleRectangle", 
      "Segments" -> 20, "ColorScheme" -> "DeepSeaColors"],
    PerformanceGoal -> "Speed", ImageSize -> 750,
    }
   ];

BarChart[OccupancyDistributionData, barChartStyleOptions]
Distribution of occupants over a 24-hour period.

Distribution of occupants over a 24-hour period.

So… Is it the same pattern everyday? Apparently not.

(* remove June entries *)
totalNumTimeSeries = 
 TimeSeriesWindow[tseries["totalNum"], {{2016, 5, 16}, {2016, 5, 31}}];
wd = WeightedData[totalNumTimeSeries["Dates"], 
   totalNumTimeSeries["Values"]];
DateHistogram[wd, "Day", DateReduction -> "Week", 
 ColorFunction -> "StarryNightColors"]
Oddly, there’s a deviation from the decreasing-in-number trend on Saturday.

Oddly, there’s a deviation from the decreasing-in-number trend on Saturday.

If we reduce the histogram down to the span of a day, we get a clearer picture of the flow of people, in and out of the library:

DateHistogram[wd, “Hour”, DateReduction -> “Day”, ColorFunction -> “StarryNightColors”]

We see a smoother increase and fall here, as compared to the previous plot because the data is binned hourly here, and by minute in the previous one.

We see a smoother increase and fall here, as compared to the previous plot because the data is binned hourly here, and by minute in the previous one.

5.3 Array plot of occupants

arrayData = dateValuePairs[[All, 2]];
arrayData = Partition[arrayData, 12*24];
arrayData = Transpose[Reverse /@ arrayData];

arrayPlotStyleOptions = Module[{startDate, endDate, xticks, yticks},
   startDate = dateValuePairs[[1, 1, ;; 3]];
   endDate = dateValuePairs[[-1, 1, ;; 3]];
   yticks = {#*12, 
       DateString[(24 - #)*3600, {"Hour12Short", " ", "AMPM"}]} & /@ 
     Range[0, 24, 4];
   xticks = 
    With[{month = Take[DatePlus[startDate, {#, "Month"}], 2]},
       {QuantityMagnitude@DateDifference[startDate, month], 
        DateString[month, "MonthName"]}
       ] & /@ 
     Range[0, First@DateDifference[startDate, endDate, "Month"] + 1];
   {
    AspectRatio -> 1/4,
    Frame -> True, FrameTicks -> {{yticks, None}, {xticks, None}},
    FrameStyle -> GrayLevel[.75], FrameTicksStyle -> Black,
    ImageSize -> 450, PlotRangePadding -> {{0, 0}, {0, 1}},
    ColorFunction -> "StarryNightColors"
    }
   ];

ArrayPlot[arrayData, arrayPlotStyleOptions]
Array plot.

Array plot.

5.4 Daily and monthly distributions

(* Auxiliary functions *)
occupancyDataEntryToDay[{{year_, month_, day_, __}, _}] := {year, 
   month, day};
occupancyDataEntryToMonth[{{year_, month_, __}, _}] := {year, month};
getDailyTotals[data_] :=
 Module[{groupedByDay},
  groupedByDay = GatherBy[data, occupancyDataEntryToDay];
  Map[Total@#[[All, 2]] &, groupedByDay]
  ]
  
(* daily distribution *)
dailyOccupancyTotals = getDailyTotals[dateValuePairs];
days = DeleteDuplicates[occupancyDataEntryToDay /@ dateValuePairs];
dailyPlotData = Transpose[{days, dailyOccupancyTotals}];

(* plotting style options common to daily and monthly distribution plots *)
sharedDateListPlotStyleOptions =
  {
   Joined -> True, Filling -> Axis, ImageSize -> 450, Frame -> True,
   FrameStyle -> GrayLevel[.75], FrameTicksStyle -> Black,
   FillingStyle -> Automatic, PlotStyle -> Directive[Thickness[.002], Black],
   PlotRange -> {libraryBin["TimeInterval"], {0, All}},
   GridLines -> {DayRange[First@#, Last@#] &@
      libraryBin["TimeInterval"](*{2016,#,1}&/@Range[1,12]*), None},
   GridLinesStyle -> GrayLevel[.8]
   };
   
(* plotting options for daily distribution *)
dailyDateListPlotStyleOptions =
  With[{padding = .1},
   {
    FrameTicks -> {{Rest@
        FindDivisions[{0, (1 + padding)*Max@dailyOccupancyTotals, 1}, 8], None}, 
       {DateString[#, {"DayShort", " ", "MonthNameShort"}] & /@ 
          DateRange[First@#, Last@#, Quantity[3, "Days"]] &@
        libraryBin["TimeInterval"][[;; , 1]], None}},
    PlotRangePadding -> {{0, 0}, {0, Scaled[padding]}}, ImagePadding -> Automatic, AspectRatio -> 1/4, ColorFunction -> "StarryNightColors"
    }
   ];
   
(* daily distribution plot *)
dailyPlot = DateListPlot[dailyPlotData, sharedDateListPlotStyleOptions, dailyDateListPlotStyleOptions]
Daily distribution.

Daily distribution.

For monthly distribution:

groupedByMonth = GatherBy[dateValuePairs, occupancyDataEntryToMonth];
monthlyOccupancyAverages = 
  Map[Mean@getDailyTotals@# &, groupedByMonth];
months = DeleteDuplicates[occupancyDataEntryToMonth /@ dateValuePairs];
monthlyPlotData = Transpose[{months, monthlyOccupancyAverages}];

(* monthly distribution plotting options *)
monthlyDateListPlotStyleOptions =
  With[{padding = .3},
   {
    FrameTicks -> {{Most@
        FindDivisions[{0, (1 + padding)*Max@monthlyOccupancyAverages, 
          1}, 4], None}, {months, None}},
    PlotRangePadding -> {{0, 0}, {0, Scaled[padding]}},
    ImagePadding -> Automatic, AspectRatio -> 1/4, ColorFunction -> "StarryNightColors"
    }
   ];

monthlyPlot = DateListPlot[monthlyPlotData, sharedDateListPlotStyleOptions, monthlyDateListPlotStyleOptions]
Monthly distribution.

Monthly distribution.

Daily and monthly distributions stacked together:
Column[{dailyPlot, monthlyPlot}, Spacings -> 0]

5.5 Distributions for each floor

Lastly, we want to see how the PC usage evolves over a day on each floor. To generate the following plots, I combined pieces of code from previous sections and formed a function which could be used repeatedly.

plotDailyDistribution[timesries_] := Block[
  {dateValuePairs, timeIntervals, OccupancyAtTimeInterval, 
   addOccupancyToTimeInterval, OccupancyDistributionData, 
   barChartStyleOptions},

  (* convert date-objects to date-strings *)

  dateValuePairs = (tseries[timesries]["DatePath"]) /. {x_, 
      y_} :> {DateString@x, y};
  (* Reformat date-strings to date-lists *)

  dateValuePairs[[All, 1]] = DateList /@ dateValuePairs[[All, 1]];

  (* create 5 minute intervals over a 24-hour period *)

  timeIntervals = Tuples[{Range[0, 23], Range[0, 55, 5]}];

  OccupancyAtTimeInterval[_] = 0;
  addOccupancyToTimeInterval[{__, hour_, minute_, _}, stepCount_] :=

     OccupancyAtTimeInterval[{hour, minute}] += stepCount;
  addOccupancyToTimeInterval @@@ dateValuePairs;

  OccupancyDistributionData = OccupancyAtTimeInterval /@ timeIntervals;
  OccupancyDistributionData = 
   RotateLeft[OccupancyDistributionData, 12*6];
  (* averaged daily distribution of occupancy *)

  (* barchart styling options *)
  barChartStyleOptions = With[{yticks = 
      Map[{#*12, 
         DateString[3600*(# + 6), {"Hour12Short", " ", "AMPM"}]} &, 
       Range[0, 24, 2]]},
    {
     AspectRatio -> 1/5, Frame -> True, FrameTicks -> {{None, None}, {yticks, None}}, FrameTicksStyle -> Black, GridLines -> {Range[24, 11*24, 24], None},
     GridLinesStyle -> GrayLevel[0.87], PlotRange -> {{1, 288}, All}, Ticks -> None, BarSpacing -> 0,
     PlotRangePadding -> {{1, 0}, {1, Scaled[.15]}}, ChartElementFunction -> 
      ChartElementDataFunction["SegmentScaleRectangle", 
       "Segments" -> 20, "ColorScheme" -> "StarryNightColors"], PerformanceGoal -> "Quality", ImageSize -> 450, FrameStyle -> GrayLevel[0.75], ImagePadding -> {{12, 12}, {20, 0}}
     }
    ];

  Column[{timesries, BarChart[OccupancyDistributionData, barChartStyleOptions]}, Alignment -> Center]
  ]

(* plot distributions *)
  {
  Column[plotDailyDistribution[#] & /@ {"totalNum", "totalUsingPCs", "totalNotUsingPCs"}, Spacings -> 0],
  Column[plotDailyDistribution[#] & /@ {"grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"}, Spacings -> 0]
  } // gridOfTwoItems
Daily distribution for each floor of the library, and for PC usage.

Daily distribution for each floor of the library, and for PC usage.

Conclusion

This post demonstates (I hope) the enormous power that data holds. This so-called “data” is now seemingly ubiquitous — in fact, theres’s more of it than ever before. Nearly all devices we carry or surround ourselves with produces a copiuos amount of data — some of which, we don’t have direct access to or know what to do with.

Running an analysis such as this over a lengthy period of time, say, years, could reveal a very interesting insight — other than the expected rise and fall of occupants during and outside term times— and indeed useful for Queen’s students. I originally started this out of the frustration of not fining a seat in the library during the summer exams.

These plots only give an idea of the flow of people in and out of the library, but not the full picture. A group of 5 friends may leave the library for lunch — and counter goes minus 5— but perhaps they’ve left their stuff where they were sat. So net seats available is still zero.

Furthermore, when a PC is logged off automatically (due to prolonged inactivity, for instance), the counter goes -1, but the user could still have their stuff there, or indeed, still be sat in front of that very PC!

However, looking at the totalNum plots, it appears one ought to be there before 11am — one ought to follow the trend.

First-come, first-served, I guess.