# Exploratory Data Analysis I: The McClay Library

The McClay Library at Queen's University is an award winning library. It contains more than a million volumes and more than 2,000 reader spaces. As the exam period approaches the number of occupants gradually rise. At its peak during exams, finding a PC or a seat can be quite difficult. The library provides occupancy data, showing the different floors, the number of available PCs on each, and the total number of occupants in the library. This service is updated once every 10 seconds and can be a useful guide when finding a place to study on a very busy day. I suppose that the information on the website is sourced from the library's controlled entry system.

Having been a student at Queen's quite some time now, I am well aware of how the library can become saturated with rather disconcerted-looking students when occupancy peaks. I was curious to know how useful the occupancy information could be at such times. So I began exploring the website and later found a way to collect the data on it.

Over the course of the summer exams period, I aim to gather as many datapoints as possible and play with the dataset. I hope to find something which can be benefial to every Queen's student, in or out of exam season. In this post I shall discuss what I have done so far with the ever-growing dataset, and how I'm using the Wolfram Language and other technologies such as Wolfram Data Drop and Wolfram|Alpha to store and analyse it. I hope you find this both interesting and useful.

# 1. Simple display

I will begin by discussing how I accessed the occupancy information and visualised it the way I wanted. Two URLs are of interest here: `http://go.qub.ac.uk/mcclay-availablepcs`

and `http://go.qub.ac.uk/mcclay-occupancy`

. The former shows the number of available PCs on all floors while the later shows the total number of people in the building.

## 1.1 Create dispay

Combining data from both sources, we can create a simple interface similar to the one on the occupancy site. We'll first extract the data from our two sources and process it, and then represent occupancy and PC usage in the form of a horizontal gauge.

display = Block[{ url = "https://www.qub.ac.uk/directorates/InformationServices/pcs/impero/", url2 = "https://www.qub.ac.uk/directorates/InformationServices/pcs/sentry", rawDataPCs, dataPCs, dataOccupancy, timeAndDate, mcclayLib, currentOccupancy, maxOccupancy, gaugesPCs, gaugesOccupancy}, (* import info. from sites *) rawDataPCs = Import[url]; (* extract useful info. *) dataPCs = ImportString[rawDataPCs, "RawJSON"]; dataOccupancy = Import[url2]; {currentOccupancy, maxOccupancy} = First /@ ( StringCases[ dataOccupancy, ___ ~~ "<" <> # <> ">" ~~ occ___ ~~ "</" <> # <> ">" ~~ ___ -> occ] & /@ {"occupancy", "maxoccupancy"}); Row[{"Occupancy ", Row[{currentOccupancy, maxOccupancy}, "/"]}]; (* extract time and date *) timeAndDate = dataPCs["dateTimeNow"]; (* associate each floor with its info and sort list *) mcclayLib = With[{order = {"Ground Floor Atrium", "Ground Floor Extended Hours", "First Floor", "Second Floor", "Third Floor"}}, {#[[1]], Row[{#[[3]] - #[[2]], #[[3]]}, " / "]} & /@ SortBy[Values[dataPCs["rooms"][[1 ;;, 2 ;; 4]]], Position[order, #[[1]]] &]]; (* template for horizontal gauges *) makeGauge[value_] := HorizontalGauge[value, {0, 1}, GaugeMarkers -> Placed[Graphics[Disk[]], "DivisionInterval"], GaugeFrameSize -> None, GaugeFrameStyle -> None, GaugeStyle -> {Orange, Lighter[Brown, .8]}, GaugeFaceStyle -> None, ScaleRangeStyle -> {Lighter[Brown, #] & /@ {.9, .5}}, GaugeLabels -> {None, None}, TicksStyle -> None, ScaleDivisions -> None, ScaleRanges -> None, ScalePadding -> 0 ]; (* gauges for PC usage on each floor *) gaugesPCs = Table[ makeGauge[ N[(mcclayLib[[fl, 2, 1, 1]]/mcclayLib[[fl, 2, 1, 2]]), 2]], {fl, Range@Length@mcclayLib}]; (* gauge for total number of occupants *) gaugesOccupancy = makeGauge[(FromDigits@currentOccupancy/FromDigits@maxOccupancy)]; (* put everything together *) Column[{ Framed[timeAndDate,FrameStyle -> Directive[Black, Thin]], Panel@Grid[ Prepend[ Reverse[Flatten /@ Transpose[{mcclayLib, gaugesPCs}]], {"Total occupancy", Row[{currentOccupancy, maxOccupancy}, " / "], gaugesOccupancy}], Alignment -> {"/", Center}, Dividers -> {None, {False, True}}, Spacings -> {Automatic, 2 -> 2}] }, Alignment -> Center] ]

Note that the values of all gauges are normalised between `0`

and `1`

. This is what we get:

## 1.2 Deploy display to cloud

We can go ahead and deploy this to the Wolfram Cloud to make it accessible to every student.

CloudDeploy[Delayed[ ExportForm[display, "PNG", ImageFormattingWidth -> 400, ImageResolution -> 250], UpdateInterval -> 20 ], "McClayLibraryOccupancy", Permissions -> "Public"]

The display then gets deployed to this website which automatically refreshes every `20`

seconds:

`https://www.wolframcloud.com/objects/user-8fa71eda-c6a8-4555-a394-46fd67a27d48/McClayLibraryOccupancy`

and here, also:

`http://go.qub.ac.uk/mcclay-usagedisplay`

.

And of course, you can scan the QR code below!

# 2. Collecting data

## 2.1 Create databin

To analyse the occupancy and PC usage over a period of time, we'll have to store the data somewhere. I chose to use the Wolfram Data Drop to do this. It allows you to easily create a databin and periodically add data to it using a custom API, email, Arduino, Raspberry Pi, IFTTT and even Twitter!

I used the Wolfram Langauge to create a databin, but you can also do this online:

(* connect to Wolfram Cloud first *) If[$CloudConnected == False, CloudConnect["username","password"], Print["Connected to Wolfram Cloud."]]; (* create bin *) libraryBin = CreateDatabin[<| "Name" -> "McClay Library", "Interpretation" -> { "datetime" -> "DateTime", "totalNum" -> Restricted["StructuredQuantity", "People"], "grndFloorATR" -> Restricted["StructuredQuantity", "People"], "grndFloorEXT" -> Restricted["StructuredQuantity", "People"], "floor1" -> Restricted["StructuredQuantity", "People"], "floor2" -> Restricted["StructuredQuantity", "People"], "floor3" -> Restricted["StructuredQuantity", "People"], "totalUsingPCs" -> Restricted["StructuredQuantity", "People"], "totalNotUsingPCs" -> Restricted["StructuredQuantity", "People"] }, "Administrator" -> $WolframID, Permissions -> "Public"|>]

The databin is available to the `"Public"`

and it automatically interpretes and associates input values with their appropriate quantites, e.g., number of `"People"`

on each floor. Note that this databin is only temporary and expires a month from now. Data can not be added to it after it expires.

The following table explains what the inputs are:

datetime | date and time |
---|---|

totalNum | current library occupancy |

grndFloorATR | Ground Floor Atrium |

grndFloorEXT | Ground Floor (Extended Hours) |

floor1 | First Floor |

floor2 | Second Floor |

floor3 | Third Floor |

totalUsingPCs | total no. of students using PCs |

totalNotUsingPCs | total no. of students NOT using PCs |

## 2.2 Properties of databin and metadata

Properties of databin:

`Grid[Options[libraryBin] /. (property___ -> value___) -> {property, value},Frame -> All]`

Metadata of databin:

`Column@Normal@libraryBin["Information"]`

## 2.3 Prepare data for upload

Using the Wolfram Language, data is usually added to the bin in the form of an Association. Since we already know what the required inputs for our databin are, we can extract the occupance and immediately process it into our desired association.

getLibraryData[] := Block[ { urlPCs = "https://www.qub.ac.uk//directorates/InformationServices/pcs/impero/",urlOccupancy = "https://www.qub.ac.uk//directorates/InformationServices/pcs/sentry", rawDataPCs, PCsUsageData,OccupancyData, occ, datetime, f0atr, f0ext, f1, f2, f3, totalUsingPCs, totalNotUsingPCs }, rawDataPCs = Import[urlPCs]; PCsUsageData = ImportString[rawDataPCs, "RawJSON"]; datetime = PCsUsageData["dateTimeNow"]; (* occupancy *) OccupancyData = Import[urlOccupancy]; occ = First@StringCases[OccupancyData, ___ ~~ "<occupancy>" ~~ occ___ ~~"</occupancy>" ~~ ___ :> FromDigits@occ]; (* PC usage on all floors *) {f0atr, f0ext, f1, f2, f3} = Abs@With[{order = {"Ground Floor Atrium", "Ground Floor Extended Hours", "First Floor", "Second Floor", "Third Floor"}}, (#[[3]] - #[[2]]) & /@ SortBy[Values[PCsUsageData["rooms"][[1 ;;, {2, 3, 4}]]], Position[order, #[[1]]] &]]; totalUsingPCs = f0atr + f0ext + f1 + f2 + f3; totalNotUsingPCs = occ - totalUsingPCs; (* our desired association *) <|"datetime" -> datetime, "totalNum" -> occ, "grndFloorATR" -> f0atr, "grndFloorEXT" -> f0ext, "floor1" -> f1, "floor2" -> f2, "floor3" -> f3, "totalUsingPCs" -> totalUsingPCs, "totalNotUsingPCs" -> totalNotUsingPCs|> ];

This version handles failure which arises from lack of internet connection, etc.

getLibraryData2[] := Block[ { urlPCs = "https://www.qub.ac.uk//directorates/InformationServices/pcs/impero/", urlOccupancy = "https://www.qub.ac.uk//directorates/InformationServices/pcs/sentry", rawDataPCs, PCsUsageData, PCsUsageDataAssoc, OccupancyData, occ, datetime, f0atr, f0ext, f1, f2, f3, totalUsingPCs, totalNotUsingPCs }, (* import all data *) rawDataPCs = Check[Import[urlPCs], $Failed]; OccupancyData = Check[Import[urlOccupancy], $Failed]; (* import failure criteria *) ImportFailedQ[dataToImport_] := FailureQ[dataToImport]; (* proceed (if data is imported successfully; all are free from failure) *) If[FreeQ[{ImportFailedQ@rawDataPCs, ImportFailedQ@OccupancyData}, True], ( (* extract PC usage, timestamp & occupancy data *) PCsUsageData = ImportString[rawDataPCs, "RawJSON"]; datetime = PCsUsageData["dateTimeNow"]; occ = First@ StringCases[ OccupancyData, ___ ~~ "<occupancy>" ~~ occupancy___ ~~ "</occupancy>" ~~ ___ :> If[occupancy == "NA", Missing["NotAvailable"], FromDigits@occupancy]]; (* processing *) PCsUsageDataAssoc = Cases[PCsUsageData[["rooms"]][[;; , 2 ;; 4]], KeyValuePattern[{"roomDescription" -> x__, "freeCount" -> y_ /; y >= 0, "totalCount" -> z_ /; z >= 0}] :> (x -> (z - y))] // Association; {f0atr, f0ext, f1, f2, f3} = With[{order = {"Ground Floor Atrium", "Ground Floor Extended Hours", "First Floor", "Second Floor", "Third Floor"}}, Lookup[PCsUsageDataAssoc, #, Missing["NotAvailable"]] & /@ order]; totalUsingPCs = With[{floors = {f0atr, f0ext, f1, f2, f3}}, If[FreeQ[floors, #], Total@floors, #] ] &@Missing["NotAvailable"]; totalNotUsingPCs = If[totalUsingPCs === #, #, occ - totalUsingPCs] &@ Missing["NotAvailable"]; (* our desired association *) <|"datetime" -> datetime, "totalNum" -> occ, "grndFloorATR" -> f0atr, "grndFloorEXT" -> f0ext, "floor1" -> f1, "floor2" -> f2, "floor3" -> f3, "totalUsingPCs" -> totalUsingPCs, "totalNotUsingPCs" -> totalNotUsingPCs|> ) (* else do nothing - upload skipped *) ] ];

Running this gets the data and creates an association we can directly send to our databin. For example, evaluating `getLibraryData[]`

gives:

```
<|"datetime" -> "Saturday, 23rd April 2016 14:40:05",
"totalNum" -> 630, "grndFloorATR" -> 88, "grndFloorEXT" -> 50,
"floor1" -> 24, "floor2" -> 25, "floor3" -> 33, "totalUsingPCs" -> 220, "totalNotUsingPCs" -> 410|>
```

# 3. Uploading data

The are various ways to upload extracted data to our bin. We'll so this using the Wolfram Language, since our data will fisrt need to be extracted from a website.

## 3.1 Using a scheduled task

Scheduled tasks are great for running code in the background. QUB updates the data on the occupancy website once every 10 seconds. However, a basic Wolfram Data Drop only allows a maximum of 60 data entries per hours. So we'll upload our data once every minute to add as much as possible to the databin each hour. Note that the short ID for our databin is **cbeUoury**.

libraryBin = Databin["cbeUoury"] RunScheduledTask[( assoc = getLibraryData[]; DatabinAdd[libraryBin, assoc]), 60]

As long as we allow this to run without interrupting the task or closing Mathematica, it'll run forever. This is fantastic, given that every Raspberry Pi comes with Mathematica pre-installed! We can also add data to the bin using a custom API, Arduino and other different methods.

After a few minutes (and entries) our data can be viewed on the databin site. You can deploy your scheduled task to the Wolfram Cloud and it'll keep running until you choose to stop it. Although a paid subscription is required to use this service. This will be ideal for gathering data over a lengthy period of time, which is what I aim to do . Unfortunately, I do not have a paid subscription at the moment but will look into getting one. For now, data uploads will be made from Mathematica on my PC or using a Raspberry Pi, or an Intel Edison.

## 3.2 Cloud-based scheduled task

If we're deploying our scheduled task to run autonomously in the cloud, we'd use the following code:

```
dataUploadCloudObject = Block[{shortid = "cbeUoury", databin, assoc},
databin = Databin[shortid];
CloudDeploy[ScheduledTask[
(assoc = getLibraryData[]; DatabinAdd[databin, assoc]),
600],"McClayLibraryDataUpload", Permissions -> "Private"
]]
```

# 4. Data Analysis

I ran the program pretty much throughout the exam period with short intervals of inactivity. The entries amassed into a dataset large enough to spot trends and run some analyses.

## 4.1 Dataset

Let's see an overview of the data we have collected so far:

```
(* Data collected so far *)
libraryBin = Databin["cK8HWZj4"];
dataEntriesThusFar := libraryBin["Entries"];
(* Delete Null, Missing and Negative cases... Sometimes we get negative cases when the number of available PCs isn't updated correctly. *)
dataEntriesThusFar = DeleteCases[dataEntriesThusFar, Null | getLibraryData2[], {1}];
dataEntriesThusFar = DeleteCases[dataEntriesThusFar, _Missing | _?Negative | getLibraryData2[], {2}]
```

View the dataset.`dataset = Dataset[dataEntriesThusFar]`

Now we have a continuously growing dataset — we can run some analysis and see if we find anything meaningful/useful out of it.

## 4.2 Basic Stats

Find the minimum, maximum, standard deviation, etc of each library location:

```
Module[{funcs = {"Location", Mean, StandardDeviation, Median, Max,
Min}},
Prepend[Flatten@{#,
Table[dataset[
IntegerPart@func[Cases[#, _?QuantityQ]] &, #1], {func,
Rest@funcs}]} & /@ {"totalNum", "totalUsingPCs","totalNotUsingPCs", "grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"},
ToString /@ funcs] // Dataset
]
```

Visualise the stats using box charts.

```
makeBoxChart[labels_] := Block[{data},
data = Table[DeleteMissing@Normal@dataset[[All, i]], {i, labels}];
BoxWhiskerChart[data,
{{"Outliers", Large}, {"Whiskers", Thin}, {"Fences", Thin}},
AspectRatio -> 1/2, ChartLabels -> labels, ChartStyle -> "StarryNightColors", BarOrigin -> Left,
BarSpacing -> .5, PerformanceGoal -> "Quality", ImageSize -> 400]];
makeBoxChart[#] & /@ {{"totalNum", "totalUsingPCs",
"totalNotUsingPCs"}, {"grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"}} // gridOfTwoItems
```

Visualise mean number of people using PCSs on each floor.

```
Table[(dataset[
BarChart[Floor@Mean[QuantityMagnitude[DeleteMissing[#, 1, Infinity]]],
BarOrigin -> Left, ChartStyle -> "StarryNightColors", ChartLegends -> Automatic, PlotLabel -> "Mean number of people\n", PlotTheme -> "Detailed"] &, i]),
{i, {{"grndFloorATR", "grndFloorEXT", "floor1", "floor2",
"floor3"}, {"totalUsingPCs", "totalNotUsingPCs"}}}] // gridOfTwoItems
```

To find when the library had the highest number of people in it:

`Query[MaximalBy[#totalUsingPCs &]] @ dataset`

## 4.3 Time series analysis

Extract timeseries of library floors and remove the last timeseries in the list. This timeseries holds failed entries which were uploaded to the databin before I made `getLibraryData2`

.

```
tseries = Delete[libraryBin["TimeSeries"], -1];
makeDatePlot[series_List] :=
Block[{colourScheme = Take[ColorData[89, "ColorList"], Length[series]]},
DateListPlot[(QuantityMagnitude[tseries[#1]] &) /@ series,
AspectRatio -> 1/\[Pi], PlotStyle -> ({#1,
Directive[AbsoluteThickness[0.75`], Opacity[1]]} &) /@ colourScheme,
PlotLabels -> MapThread[Style, {(tseries[#1]["LastValue"] &) /@ series, colourScheme}], Filling -> Bottom,
PlotLegends -> Placed[SwatchLegend[colourScheme, series, LegendLayout -> "Row"], Bottom],
FrameLabel -> Automatic, Frame -> True,
FrameTicks -> {{Automatic, None}, {All, Automatic}},
GridLines -> Automatic, ImageSize -> 800,
PlotRangePadding -> {0, {0, Automatic}},
TargetUnits -> {"DimensionlessUnit", "People"}]];
{makeDatePlot[{"floor3", "floor2", "floor1", "grndFloorEXT", "grndFloorATR"}],
makeDatePlot[{"totalNum", "totalUsingPCs", "totalNotUsingPCs"}]} // Column
```

Let's see the timespan of entries, and the latest entry to the databin with its metadata.

```
Column@Normal@libraryBin["Latest"]
Grid@Transpose[{{"First Entry", "Last Entry"}, libraryBin["TimeInterval"]}]
```

## 4.4 Location

See where entries were uploaded from.

```
(* list of geo locations for all entries in the databin *)
geoLocations = libraryBin["GeoLocations"];
geoLocTallyAssoc = Tally[geoLocations] /. {g_GeoPosition, c_} :> {g -> c} // Flatten //Association;
(* plot geo locations on a map *)
pinToMap[highText_] := Column[{highText,
Graphics[
GraphicsGroup[{FaceForm[{Orange, Opacity[.8]}],
FilledCurve[{{Line[
Join[{{0, 0}}, ({Cos[#1], 3 + Sin[#1]} &) /@ Range[-((2 \[Pi])/20), \[Pi] + (2 \[Pi])/20, \[Pi]/20], {{0, 0}}]]},
{Line[(0.5 {Cos[#1], 6 + Sin[#1]} &) /@
Range[0, 2 \[Pi], \[Pi]/20]]}}]}], ImageSize -> 15]}, Alignment -> Center];
findNearestCity[geoCoords_] :=
Row[{First[GeoNearest[Entity["City"], geoCoords]], " (",
geoLocTallyAssoc@geoCoords,
If[geoLocTallyAssoc@geoCoords == 1, " entry", " entries"], ")"}];
GeoGraphics[(GeoMarker[#[[1, 1]], pinToMap[#], "Alignment" -> Bottom,
"Scale" -> Scaled[1]] &[findNearestCity@#1]) & /@ (First /@
Tally[geoLocations]) & /@ (First /@ Tally[geoLocations]),
GeoRange -> "World", GeoProjection -> "Equirectangular",
PlotRangePadding -> Scaled[.05], ImageSize -> 750]
```

Most of the entries were from my PC (Belfast) while a few were uploaded by calling a custom API I created (I assume the server(s) that handled these are based in Seattle).

# 5. Distributions

In this section I will discuss my attempt in finding the daily distributions of people using the library and the PCs in it. I followed William Sehorn's methods in this article to find and visualise the distributions.

## 5.1 Date-value pairs

Shown below are the timestamps and values of entries.

```
(* convert date-objects to date-strings *)
dateValuePairs = (tseries["totalNum"]["DatePath"]) /. {x_,
y_} :> {DateString@x, QuantityMagnitude@y};
RandomSample[dateValuePairs, 10]
(* Reformat date-strings to date-lists *)
dateValuePairs =
dateValuePairs /. {date_, value_} :> {DateList@date, value}
(* total number of entries *)
Length@dateValuePairs
```

## 5.2 Daily occupancy distributions

We’ll group all entries into 5-minute intervals over a 24-hour period:

```
timeIntervals = Tuples[{Range[0, 23], Range[0, 55, 5]}];
timeIntervals[[;; 12]]
```

Find number of people within each time interval, over the entire data gathering period:

```
Clear[addOccupancyToTimeInterval, OccupancyAtTimeInterval];
OccupancyAtTimeInterval[_] = 0;
addOccupancyToTimeInterval[{__, hour_, minute_, _}, occupancyCount_] := OccupancyAtTimeInterval[{hour, minute}] += occupancyCount;
addOccupancyToTimeInterval @@@ dateValuePairs;
```

For example, we can find the total number of people for 5-past-noon, over the entire period.

`OccupancyAtTimeInterval[{12, 5}]`

We can now plot the number of people over 5-minute intervals for the entire day.

```
OccupancyDistributionData = OccupancyAtTimeInterval /@ timeIntervals;
OccupancyDistributionData =
RotateLeft[OccupancyDistributionData, 12*6];
(* averaged daily distribution of occupancy *)
BarChart[OccupancyDistributionData]
```

Here's a nicer plot with labels:

```
barChartStyleOptions = With[{yticks =
Map[{#*12,
DateString[3600*(# + 6), {"Hour12Short", " ", "AMPM"}]} &,
Range[0, 24, 2]]},
{
AspectRatio -> 1/5, Frame -> True, FrameTicks -> {{None, None}, {yticks, None}},
FrameStyle -> GrayLevel[0.75], FrameTicksStyle -> Black,
GridLines -> {Range[24, 11*24, 24], None}, GridLinesStyle -> GrayLevel[0.87],
Ticks -> None,
BarSpacing -> 0,
PlotRange -> {{1, 288}, All},
PlotRangePadding -> {{1, 0}, {1, Scaled[.15]}},
ImagePadding -> {{12, 12}, {20, 0}}
ChartElementFunction ->
ChartElementDataFunction["SegmentScaleRectangle",
"Segments" -> 20, "ColorScheme" -> "DeepSeaColors"],
PerformanceGoal -> "Speed", ImageSize -> 750,
}
];
BarChart[OccupancyDistributionData, barChartStyleOptions]
```

So… Is it the same pattern everyday? Apparently not.

```
(* remove June entries *)
totalNumTimeSeries =
TimeSeriesWindow[tseries["totalNum"], {{2016, 5, 16}, {2016, 5, 31}}];
wd = WeightedData[totalNumTimeSeries["Dates"],
totalNumTimeSeries["Values"]];
DateHistogram[wd, "Day", DateReduction -> "Week",
ColorFunction -> "StarryNightColors"]
```

If we reduce the histogram down to the span of a day, we get a clearer picture of the flow of people, in and out of the library:

`DateHistogram[wd, “Hour”, DateReduction -> “Day”, ColorFunction -> “StarryNightColors”]`

## 5.3 Array plot of occupants

```
arrayData = dateValuePairs[[All, 2]];
arrayData = Partition[arrayData, 12*24];
arrayData = Transpose[Reverse /@ arrayData];
arrayPlotStyleOptions = Module[{startDate, endDate, xticks, yticks},
startDate = dateValuePairs[[1, 1, ;; 3]];
endDate = dateValuePairs[[-1, 1, ;; 3]];
yticks = {#*12,
DateString[(24 - #)*3600, {"Hour12Short", " ", "AMPM"}]} & /@
Range[0, 24, 4];
xticks =
With[{month = Take[DatePlus[startDate, {#, "Month"}], 2]},
{QuantityMagnitude@DateDifference[startDate, month],
DateString[month, "MonthName"]}
] & /@
Range[0, First@DateDifference[startDate, endDate, "Month"] + 1];
{
AspectRatio -> 1/4,
Frame -> True, FrameTicks -> {{yticks, None}, {xticks, None}},
FrameStyle -> GrayLevel[.75], FrameTicksStyle -> Black,
ImageSize -> 450, PlotRangePadding -> {{0, 0}, {0, 1}},
ColorFunction -> "StarryNightColors"
}
];
ArrayPlot[arrayData, arrayPlotStyleOptions]
```

## 5.4 Daily and monthly distributions

(* Auxiliary functions *) occupancyDataEntryToDay[{{year_, month_, day_, __}, _}] := {year, month, day}; occupancyDataEntryToMonth[{{year_, month_, __}, _}] := {year, month}; getDailyTotals[data_] := Module[{groupedByDay}, groupedByDay = GatherBy[data, occupancyDataEntryToDay]; Map[Total@#[[All, 2]] &, groupedByDay] ] (* daily distribution *) dailyOccupancyTotals = getDailyTotals[dateValuePairs]; days = DeleteDuplicates[occupancyDataEntryToDay /@ dateValuePairs]; dailyPlotData = Transpose[{days, dailyOccupancyTotals}]; (* plotting style options common to daily and monthly distribution plots *) sharedDateListPlotStyleOptions = { Joined -> True, Filling -> Axis, ImageSize -> 450, Frame -> True, FrameStyle -> GrayLevel[.75], FrameTicksStyle -> Black, FillingStyle -> Automatic, PlotStyle -> Directive[Thickness[.002], Black], PlotRange -> {libraryBin["TimeInterval"], {0, All}}, GridLines -> {DayRange[First@#, Last@#] &@ libraryBin["TimeInterval"](*{2016,#,1}&/@Range[1,12]*), None}, GridLinesStyle -> GrayLevel[.8] }; (* plotting options for daily distribution *) dailyDateListPlotStyleOptions = With[{padding = .1}, { FrameTicks -> {{Rest@ FindDivisions[{0, (1 + padding)*Max@dailyOccupancyTotals, 1}, 8], None}, {DateString[#, {"DayShort", " ", "MonthNameShort"}] & /@ DateRange[First@#, Last@#, Quantity[3, "Days"]] &@ libraryBin["TimeInterval"][[;; , 1]], None}}, PlotRangePadding -> {{0, 0}, {0, Scaled[padding]}}, ImagePadding -> Automatic, AspectRatio -> 1/4, ColorFunction -> "StarryNightColors" } ]; (* daily distribution plot *) dailyPlot = DateListPlot[dailyPlotData, sharedDateListPlotStyleOptions, dailyDateListPlotStyleOptions]

For monthly distribution:

```
groupedByMonth = GatherBy[dateValuePairs, occupancyDataEntryToMonth];
monthlyOccupancyAverages =
Map[Mean@getDailyTotals@# &, groupedByMonth];
months = DeleteDuplicates[occupancyDataEntryToMonth /@ dateValuePairs];
monthlyPlotData = Transpose[{months, monthlyOccupancyAverages}];
(* monthly distribution plotting options *)
monthlyDateListPlotStyleOptions =
With[{padding = .3},
{
FrameTicks -> {{Most@
FindDivisions[{0, (1 + padding)*Max@monthlyOccupancyAverages,
1}, 4], None}, {months, None}},
PlotRangePadding -> {{0, 0}, {0, Scaled[padding]}},
ImagePadding -> Automatic, AspectRatio -> 1/4, ColorFunction -> "StarryNightColors"
}
];
monthlyPlot = DateListPlot[monthlyPlotData, sharedDateListPlotStyleOptions, monthlyDateListPlotStyleOptions]
```

Daily and monthly distributions stacked together:`Column[{dailyPlot, monthlyPlot}, Spacings -> 0]`

## 5.5 Distributions for each floor

Lastly, we want to see how the PC usage evolves over a day on each floor. To generate the following plots, I combined pieces of code from previous sections and formed a function which could be used repeatedly.

```
plotDailyDistribution[timesries_] := Block[
{dateValuePairs, timeIntervals, OccupancyAtTimeInterval,
addOccupancyToTimeInterval, OccupancyDistributionData,
barChartStyleOptions},
(* convert date-objects to date-strings *)
dateValuePairs = (tseries[timesries]["DatePath"]) /. {x_,
y_} :> {DateString@x, y};
(* Reformat date-strings to date-lists *)
dateValuePairs[[All, 1]] = DateList /@ dateValuePairs[[All, 1]];
(* create 5 minute intervals over a 24-hour period *)
timeIntervals = Tuples[{Range[0, 23], Range[0, 55, 5]}];
OccupancyAtTimeInterval[_] = 0;
addOccupancyToTimeInterval[{__, hour_, minute_, _}, stepCount_] :=
OccupancyAtTimeInterval[{hour, minute}] += stepCount;
addOccupancyToTimeInterval @@@ dateValuePairs;
OccupancyDistributionData = OccupancyAtTimeInterval /@ timeIntervals;
OccupancyDistributionData =
RotateLeft[OccupancyDistributionData, 12*6];
(* averaged daily distribution of occupancy *)
(* barchart styling options *)
barChartStyleOptions = With[{yticks =
Map[{#*12,
DateString[3600*(# + 6), {"Hour12Short", " ", "AMPM"}]} &,
Range[0, 24, 2]]},
{
AspectRatio -> 1/5, Frame -> True, FrameTicks -> {{None, None}, {yticks, None}}, FrameTicksStyle -> Black, GridLines -> {Range[24, 11*24, 24], None},
GridLinesStyle -> GrayLevel[0.87], PlotRange -> {{1, 288}, All}, Ticks -> None, BarSpacing -> 0,
PlotRangePadding -> {{1, 0}, {1, Scaled[.15]}}, ChartElementFunction ->
ChartElementDataFunction["SegmentScaleRectangle",
"Segments" -> 20, "ColorScheme" -> "StarryNightColors"], PerformanceGoal -> "Quality", ImageSize -> 450, FrameStyle -> GrayLevel[0.75], ImagePadding -> {{12, 12}, {20, 0}}
}
];
Column[{timesries, BarChart[OccupancyDistributionData, barChartStyleOptions]}, Alignment -> Center]
]
(* plot distributions *)
{
Column[plotDailyDistribution[#] & /@ {"totalNum", "totalUsingPCs", "totalNotUsingPCs"}, Spacings -> 0],
Column[plotDailyDistribution[#] & /@ {"grndFloorATR", "grndFloorEXT", "floor1", "floor2", "floor3"}, Spacings -> 0]
} // gridOfTwoItems
```

# Conclusion

This post demonstates (I hope) the enormous power that data holds. This so-called “data” is now seemingly ubiquitous — in fact, theres’s more of it than ever before. Nearly all devices we carry or surround ourselves with produces a copiuos amount of data — some of which, we don’t have direct access to or know what to do with.

Running an analysis such as this over a lengthy period of time, say, years, could reveal a very interesting insight — other than the expected rise and fall of occupants during and outside term times— and indeed useful for Queen’s students. I originally started this out of the frustration of not fining a seat in the library during the summer exams.

Furthermore, when a PC is logged off automatically (due to prolonged inactivity, for instance), the counter goes -1, but the user could still have their stuff there, or indeed, still be sat in front of that very PC!

However, looking at the **totalNum** plots, it appears one ought to be there before 11am — one ought to follow the trend.

*First-come, first-served*, I guess.