Data Gator blog

Sunday 26 February 2017

Proof of Concept Project Report

This project can be described as aggregating IoT time series data and creating Candlestick visualizations using that data.

The main open technologies and standards used in this project are HTTP for communication and the OpenAPI Specification (www.openapis.org), which lets others make use of the project in a standard way.

Candlestick visualizations could serve many industries because they make potentially bad or missing observations in IoT data easier to spot than the line charts drawn by commonly available JavaScript libraries.

Another use could be data exploration and visualization applications that run directly on a gateway device, using something like Kweb on the Raspberry Pi. Because these applications need no JavaScript at all, they can be very lightweight and leave the normal gateway duties unaffected; JavaScript can have a negative impact even on the largest Pi model currently available. Other application developers could use the OpenAPI specification to create more specialized applications tuned to different devices.

Finally, a third use would be inserting data visualizations into blog posts, forum boards and social media messages where JavaScript cannot be used at all. Each visualization is rendered with the latest IoT data available at the time it is viewed, so an updated chart is shown automatically every time the post or message is reloaded.

One of the lessons I learned is that working with IoT data can be difficult. You always have to keep checking for bad data and decide how to handle it depending on where it is found. The many different ways IoT solutions choose to structure their data make this even more difficult.

The communication protocol used by this project was changed from the original CoAP to HTTP. I still believe that was a good decision, and it should make the project more accessible to a larger audience by using a protocol they probably already use daily. All of the objectives stated in the project proposal have been achieved (taking into consideration the change from CoAP to HTTP).

Instead of making a video of the project, I have decided to make the service available live for anyone who would like to try it out. In the blog posts below I have tried to explain exactly how that can be done, as simply and concisely as possible.

A Live demo of the Project (part 1 of 4)

IMPORTANT: The service as described below is not available online at the moment. If anyone is seriously interested in testing this proof of concept demo, just let me know in the comments and I will set up a live secure (https) tunnel connection to the service. Sat Mar 11 2017 02:28:47 

First, a disclaimer: this project currently uses alpha-level code, and many things can, and even should be expected to, go wrong. The service is currently hosted on an old laptop with only a slow mobile internet connection. The laptop is located where the electricity usually goes out for several hours at least once a week, and brownouts occur even more often. Sometimes the laptop keeps going through a brownout, depending on how low the power dips and for how long. On top of all that, the internet connection can come and go or slow down to a crawl.

If all you want is to just see the end result of a rendered Candlestick chart visualization this can be achieved by simply copying and pasting a link into your web browser address bar. This is described in the blog posts below titled "Visualizing Aggregated Data".

I will try to make this demo available for a week or so. The steps for the demo use a web form page at api.datagator.tech/api, which works as an API browser. At the end of each step (except step 3) click the "Try it out!" button; for brevity I only mention it once here.

The web service uses tokens, and one will be needed in later steps. To get a token you first need to create an account so that you can log in.

In the next parts of this post series I will explain how to do that. If something doesn't work or you have a question, just reply to the post and I will try to answer.

A Live demo of the Project (part 2 of 4)

Step 1: Create an Account

  1. Browse to the API web page at api.datagator.tech/api.
  2. On the web page open the user list by clicking on "user".
  3. Click the purple "POST" button with "/users" to the right of it.
  4. In the "data" box enter a unique email address and a password so that you can get a token later. The email address will not be verified.
  5. Copy the value of the "id" key in the box labeled "Response Body" and also save a copy of the email and password sent in the request.
Account request example:
{
"email": "anybody@anywhere.com",
"password": "mySecret"
}

Step 2: Get an API Token

  1. Near the end of the "user" list click the purple "POST" button with "/users/login" to the right of it.
  2. Paste in the same email and password used earlier into the "credentials" box.
  3. Copy the value of the "id" key from the response without the quotes and paste it into the "accessToken" box on the top of the page.
  4. Click the "Set Access Token" button. This will save you from needing to worry about the token while using this web page, but the token will be needed to render the Candlestick chart in the last step.
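
Steps 1 and 2 could also be scripted instead of clicked through. The following is only a rough Python sketch: it builds the two POST requests without sending them. The base URL is assumed from the curl examples elsewhere on this blog, and the helper name `json_post` is made up for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.datagator.tech"  # assumption, taken from the blog's example links


def json_post(path, payload):
    """Build (but do not send) a JSON POST request for the given API path."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


credentials = {"email": "anybody@anywhere.com", "password": "mySecret"}

create_account = json_post("/users", credentials)   # step 1
login = json_post("/users/login", credentials)      # step 2

# To actually send a request (the service must be online):
#   from urllib.request import urlopen
#   with urlopen(login) as response:
#       token = json.load(response)["id"]   # the "id" of the login response is the token
```

Keeping the request building separate from the sending makes it easy to inspect what would be sent while the service is offline.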

Step 3: Choose the Topic Data Source

  1. Select the data source for the Topic from any available at https://thingspeak.com/channels/public (Topics are called fields there).
  2. On that web page left click the link at the top of the box that you have chosen and copy the link address.
  3. Click on the link that was just copied to open its web page.
  4. On the page that opens note the number of the Field you would like to use as Topic data.
  5. Next edit that saved link from the top of the box by appending "/fields/" then the number of the noted Field and adding a ".json" to the very end.
Your end result should look similar to https://thingspeak.com/channels/172857/fields/4.json. You can test it by pasting the link into the address bar of a web browser and checking that the broker returns some JSON data.
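
The link editing in step 3 can be expressed as a tiny Python helper. This is just an illustration of the string manipulation described above; the function name `field_json_url` is made up.

```python
def field_json_url(channel_url, field_number):
    """Append "/fields/<n>.json" to a ThingSpeak channel link."""
    return f"{channel_url.rstrip('/')}/fields/{int(field_number)}.json"


print(field_json_url("https://thingspeak.com/channels/172857", 4))
# https://thingspeak.com/channels/172857/fields/4.json
```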

A Live demo of the Project (part 3 of 4)

Step 4: Create the Topic

  1. On the DataGator API web page under user click the purple "POST" button with "/users/{id}/topics" to the right of it near the bottom.
  2. In the "id" box paste in a copy of your user "id" saved at the end of step 1 without quotes.
  3. In the "data" box supply the link created in step 3 as the value of the "topicURI" key, formatted as JSON. Example at the end of this step.
  4. From the box labeled "Response Body" make a copy of the value of "id".
Topic request example:
{
"topicURI": "https://thingspeak.com/channels/172857/fields/4.json"
}

Step 5: Add Aggregates to the Topic.

  1. Click on the Topic heading near the top to open the topic list.
  2. Near the top click the purple "POST" button with "/Topics/{id}/aggregates" to the right of it.
  3. In the "id" box paste in a copy of the Topic "id" without quotes from the end of step 4.
  4. In the "data" box enter in JSON providing values for the keys "aggregateName", "aggregateMethod" and "period".
Aggregate request example:
{
"aggregateName": "Open",
"aggregateMethod": "Open",
"period": 1357
}

A note about "period": The value of "period" can be any number of seconds. For example, I chose 1357, which is 22 minutes 37 seconds. In this demo, using public data from this broker, I recommend something in the range of 1200 to 14,400 (20 minutes to 4 hours).

IMPORTANT: You must use the exact same value for "period" in all 4 of the different "Open", "High", "Low" and "Close" Aggregates.

Step 6: Create additional Aggregates.

  1. Repeat sub-steps 3 and 4 of step 5, changing the values of "aggregateName" and "aggregateMethod" to "High".
  2. Repeat sub-steps 3 and 4 of step 5, changing the values of "aggregateName" and "aggregateMethod" to "Low".
  3. Repeat sub-steps 3 and 4 of step 5, changing the values of "aggregateName" and "aggregateMethod" to "Close".
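
Since the four Aggregates must share exactly the same "period", it can help to generate the four request bodies from one value rather than typing them by hand. A small Python sketch, assuming nothing beyond the request format shown in step 5 (the function name is made up):

```python
import json

OHLC_METHODS = ("Open", "High", "Low", "Close")


def ohlc_aggregate_payloads(period):
    """Return the four Aggregate request bodies, all sharing one period (seconds)."""
    if period <= 0:
        raise ValueError("period must be a positive number of seconds")
    return [
        {"aggregateName": method, "aggregateMethod": method, "period": period}
        for method in OHLC_METHODS
    ]


# Print one JSON body per Aggregate, ready to paste into the "data" box.
for payload in ohlc_aggregate_payloads(1357):
    print(json.dumps(payload))
```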

A Live demo of the Project (part 4 of 4)

Step 7: Create a Candlestick

  1. Click on the user heading to open its list.
  2. Near the middle click the purple "POST" button with "/users/{id}/candlesticks" to the right of it.
  3. In the "id" box paste in a copy of your user "id" saved at the end of step 1 without quotes.
  4. In the "data" box enter JSON providing values for the keys "topicId" and "period". The value of "topicId" is the value saved at the end of step 4, and "period" must be exactly the same as the value supplied for all 4 Aggregates in steps 5 and 6. Example at the end.
Candlestick request example:
{
"topicId": "58b2408c11ced21375a51beb",
"period": 1357
}

Step 8: Update Topic Data

  1. Click on the Topic heading to open its list.
  2. Near the end click the brown "PUT" button with "/Topics/{id}/updateData" to the right of it.
  3. In the "id" box paste in a copy of the Topic "id" without quotes from the end of step 4.
A note about "updateData": "updateData" will need to be called at least a few dozen times to gather enough Topic data to create a Candlestick chart visualization. In the box labeled "Curl" you can see the curl command that will update the Topic data. If you know how to automate that command using your OS, or if there is any other way you can automate a PUT request to the link in the curl command, then great. If all else fails, you can click "Try it out!" every few minutes until there is enough data. I also have a Node-RED flow that can update the data; if anyone wants it, just ask in a response to this post.
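
One possible way to automate the PUT request is a short Python loop. This is only a sketch: the base URL is assumed from the example links on this blog, the function names are made up, and the count and interval in the usage comment are arbitrary picks for "a few dozen times" and "every few minutes".

```python
import time
from urllib.request import Request, urlopen

BASE_URL = "https://api.datagator.tech"  # assumption, taken from the blog's example links


def update_request(topic_id, token):
    """Build the PUT request for a Topic's updateData end point."""
    url = f"{BASE_URL}/Topics/{topic_id}/updateData?access_token={token}"
    return Request(url, method="PUT")


def poll_updates(topic_id, token, times, interval_seconds, send=urlopen, sleep=time.sleep):
    """Send the PUT request `times` times, pausing between requests."""
    for attempt in range(times):
        response = send(update_request(topic_id, token))
        if hasattr(response, "close"):
            response.close()
        if attempt < times - 1:
            sleep(interval_seconds)


# Example (the service must be online): 36 updates, one every 5 minutes.
# poll_updates("58b2408c11ced21375a51beb", "YOUR_TOKEN", times=36, interval_seconds=300)
```

The `send` and `sleep` parameters exist only so the loop can be tried out without a network connection or real waiting.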

Step 9: Create a Candlestick Time Series Visualization

  1. Construct a link by starting with "api.datagator.tech/image/candlestick/", then append a copy of the Topic "id" without quotes from the end of step 4, then put "/chart" at the end of that.
  2. The link needs at least one parameter, "access_token". Place a "?" before that first parameter, then append it, followed by "=" and your token, to the end of the link.
  3. Paste the link just created into the address bar of your web browser and press Enter.
Candlestick time series request example:
https://api.datagator.tech/image/candlestick/58b2897926389b3391378e75/chart?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Thursday 23 February 2017

Visualizing Aggregated Data (part 1)

IMPORTANT: The services described below are not currently available online. If anyone is seriously interested in testing this proof of concept demo, just let me know in the comments and I will set up a live secure (https) tunnel connection to the service. Sat Mar 11 2017 02:36:10

To create a Candlestick chart of the aggregated data, all that is needed is a request to the chart end point. Before that will work, however, the basic details of the chart need to be configured. This is very similar to how the other services that I have written about in previous posts work.

As shown in the example below, the mutable parameters can be set in the URL of the chart request. The link below is functional and can be pasted directly into the address bar of any web browser. The parameters can also be edited to try out different things.

api.datagator.tech/image/candlestick/58a860759b20b239b3378e7e/chart?access_token=TOKEN&width=480&height=300&imageFormat=png&imageQuality=80
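
When trying out different parameter values, building the link in code avoids typos in the query string. Below is a minimal Python sketch; the "https://" prefix and the function name `chart_url` are assumptions, and the parameter names are the ones from the example link above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.datagator.tech"  # assumption, taken from the blog's example links


def chart_url(candlestick_id, access_token, **params):
    """Build a chart link; extra keyword arguments become query parameters."""
    query = {"access_token": access_token, **params}
    # quote_via=quote encodes a space as %20 instead of "+"
    return (
        f"{BASE_URL}/image/candlestick/{candlestick_id}/chart?"
        + urlencode(query, quote_via=quote)
    )


print(chart_url("58a860759b20b239b3378e7e", "TOKEN",
                width=480, height=300, imageFormat="png", imageQuality=80))
```

This prints the same link as the example above, with the scheme added in front.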

Visualizing Aggregated Data (part 2)

The only required configuration keys for the Candlestick chart are topicId (the id of the Topic) and period (the time period of the aggregated data). For this to work correctly, the chosen Topic must have four Aggregates, one for each of the methods Open, High, Low and Close, all aggregated to the same period that is specified in the chart configuration.

Only the topicId and period keys are needed because the service automatically searches the Topic for the needed Aggregates (Open, High, Low and Close) with the specified period.
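
The lookup described above might work roughly like the following Python sketch. This is not the service's actual code: the function name, the error handling and the Open/High/Low/Close ordering of the result are all assumptions made for illustration. The input is a list of Aggregate objects using the "id", "aggregateMethod" and "period" keys.

```python
REQUIRED_METHODS = ("Open", "High", "Low", "Close")


def find_ohlc_aggregates(aggregates, period):
    """Pick one Aggregate id per OHLC method for the given period.

    `aggregates` is a list of dicts with the keys "id", "aggregateMethod"
    and "period", all belonging to the same Topic.
    """
    by_method = {
        a["aggregateMethod"]: a["id"]
        for a in aggregates
        if a["period"] == period and a["aggregateMethod"] in REQUIRED_METHODS
    }
    missing = [m for m in REQUIRED_METHODS if m not in by_method]
    if missing:
        raise LookupError(f"no Aggregate with period {period} for: {', '.join(missing)}")
    return [by_method[m] for m in REQUIRED_METHODS]
```

If any of the four methods has no Aggregate with the requested period, the sketch refuses to continue, which mirrors the requirement that all four must exist with the same period.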

The different configuration keys available for a Candlestick chart are shown below.

Candlestick {
topicId (string),
period (number),
candlestickName (string, optional),
candlestickDesc (string, optional),
candlestickUnit (string, optional),
ohlcAggIds (Array[string], optional),
startingDate (string, optional),
width (number, optional),
height (number, optional),
imageFormat (string, optional),
imageDPI (number, optional),
imageQuality (number, optional),
theme (string, optional),
showInfo (boolean, optional),
timeZone (string, optional),
locale (string, optional)
}

I haven't had time to implement all the above parameters, but most should work.

Visualizing Aggregated Data (part 3)

Next, create an instance of a Candlestick chart configuration using curl.

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
"topicId": "58a80d7d9b20b239b3378e6e",
"period": 3600
}' 'http://api.datagator.tech/users/5769d43bc577a38c3ff66/candlesticks?access_token=TOKEN'

The response below shows the Aggregate objects that were automatically found for the Topic. There are no default values set for candlestickName, candlestickDesc or candlestickUnit, as these are looked up dynamically from the Topic each time a chart image is generated on the server.

{
"topicId": "58a80d7d9b20b239b3378e6e",
"period": 3600,
"ohlcAggIds": [
"58a80ea79b20b239b3378e72",
"58a80e6f9b20b239b3378e71",
"58a80e3c9b20b239b3378e70",
"58a80e059b20b239b3378e6f"
],
"width": 1280,
"height": 720,
"imageFormat": "svg",
"imageDPI": 120,
"imageQuality": 80,
"theme": "dark",
"showInfo": true,
"timeZone": "utc",
"locale": "en_US.UTF-8",
"id": "58a85f5c9b20b239b3378e7d"
}

Visualizing Aggregated Data (part 4)

I have a tunnel connection set up to an old laptop that has been configured like a server. The data connection is only mobile internet from a very remote location, so it will be very slow. Spaces can be inserted into parameters that accept strings by using "%20". Using the special access_token=TOKEN will provide read access to 3 different Topics for creating Candlestick charts. The data goes back to 2014 if I remember correctly, but there are many spots where the data from the broker gets rough (missing or bad observations). The data seems to be getting worse as time goes on; maybe some maintenance is needed on the IoT device.

The 3 available Candlestick object ids are shown below, each followed by its originating data URL.

  1. 58a85f5c9b20b239b3378e7d        http://api.thingspeak.com/channels/12397/fields/3.json
  2. 58a860759b20b239b3378e7e        http://api.thingspeak.com/channels/12397/fields/4.json
  3. 58a861009b20b239b3378e7f        http://api.thingspeak.com/channels/12397/fields/6.json

The service uses many different VM containers, and it would not be worth moving everything onto a big server somewhere just for this small proof of concept demo, although I'm sure the images would be rendered much faster there. Locally on the laptop, HD SVG images are rendered in about 150 ms, and other vector formats take about the same time. Raster formats (PNG, JPG, etc.) take about 3 times as long because they are currently transcoded from an SVG. These times are measured at the last stage entering and leaving the laptop and include verifying the token, gathering all the data, etc. The code that renders the images is currently all synchronous and single threaded. It is better to first get things implemented and running correctly before working on speed optimizations.

The data for the Topics is requested from the broker about once every 5 minutes, so there will be no new data to render any sooner than that. Historic data can be requested with the startingDate parameter, using either just a date (startingDate=2015-03-23) or a fuller, more precise format down to seconds (startingDate=2015-03-23T07:23:09). Different image formats can be returned by using the imageFormat parameter; the types currently supported should be bitmap, gif, jpeg, pdf, png, ps, svg, tiff and webp. I have limited the size of the images, mostly for the raster formats, because large ones use a lot of the limited data currently available.
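
The two startingDate forms match Python's ISO date formatting, so they can be produced from date objects rather than typed by hand. A small sketch with made-up helper names; the format list is copied from the paragraph above and may not match exactly what the service accepts.

```python
from datetime import date, datetime

SUPPORTED_FORMATS = {"bitmap", "gif", "jpeg", "pdf", "png", "ps", "svg", "tiff", "webp"}


def starting_date_param(when):
    """Format a date or datetime the way the startingDate examples are written."""
    if isinstance(when, datetime):
        return when.isoformat(timespec="seconds")
    return when.isoformat()


def check_image_format(name):
    """Return the format name if it is in the list from this post, else raise."""
    if name not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported imageFormat: {name}")
    return name


print(starting_date_param(date(2015, 3, 23)))                # 2015-03-23
print(starting_date_param(datetime(2015, 3, 23, 7, 23, 9)))  # 2015-03-23T07:23:09
```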

*** Update *** I have slowed down the interval for requesting new Topic data from the broker from every 5 minutes to about 20 minutes. This was done because the behavior of the broker seems different after the horizon has been reached in requesting Topic data. I am now busy with the Final Report and do not have time to look into it.

There is probably much I haven't mentioned, so if there are any questions just ask in the replies of any of the posts.

Monday 13 February 2017

Updating Topic Data

In this post I will describe one way the Topic data can be updated and thus aggregated. Updating is done strictly on an on-demand basis, which allows flexible configuration choices depending on the needs of the user. To update the data, all one needs to do is make an HTTP PUT request to the updateData end point of the Topic. After receiving this request, the DataGator web service requests data from the broker that was configured when the Topic was created. Once the data is received from the broker, it is aggregated by the Topic using the Aggregate(s) that have been configured for it. This is described in more detail in the post below titled Aggregates.

Here is a simple example of getting the Topic data to update using curl.

curl -X PUT 'http://api.datagator.tech/Topics/588b99fbfba65a25640ccc58/updateData?access_token=TOKEN'

A sample response after making the above request would look similar to this.

{
"updated": {
"Open@3600": "Aggregated 1859 observations in 0.00978148 seconds with 0 bad observations. ",
"High@3600": "Aggregated 1859 observations in 0.0095574 seconds with 0 bad observations. ",
"Low@3600": "Aggregated 1859 observations in 0.0108046 seconds with 0 bad observations. ",
"Close@3600": "Aggregated 1859 observations in 0.0123292 seconds with 0 bad observations. ",
"Total": "7436 aggregate operations performed in 0.0424727 seconds. (175077 agg ops per sec)"
}
}

Some status and timings are currently included in the results, mainly to get a relative idea of the impact of any changes. The timings cover just opening a connection to where the Aggregate data is stored, aggregating the data and storing it in its database. Each Aggregate uses a separate and segregated data store.
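
The numbers in the sample response are consistent with the "Total" line being the sum of the four per-Aggregate lines. That relationship is an assumption read off the sample rather than something the service documents, but it can be checked with a few lines of Python:

```python
observations = 1859
seconds = {"Open": 0.00978148, "High": 0.0095574, "Low": 0.0108046, "Close": 0.0123292}

total_ops = observations * len(seconds)
total_seconds = sum(seconds.values())

print(total_ops)                       # 7436
print(round(total_seconds, 7))         # 0.0424727
print(int(total_ops / total_seconds))  # 175077
```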

There are various ways of making the HTTP PUT request that will update the Topic data. I am currently using a Node-RED flow that makes the request periodically. Using a Webhook service triggered by just about anything is also a possibility. Ideally, the IoT gateway that collects the data should have a good idea of when it would be a good time to have it aggregated.

Later I will discuss visualizing the aggregated data.

Sunday 5 February 2017

Aggregates

After a Topic has been created as described in my previous posts, the next thing needed is a way to aggregate its data. In this next set of posts I will discuss how that can be done.

First it should be helpful to understand that each new Aggregate created will have a Topic as a parent. Each Aggregate can have only one parent Topic and they are otherwise unique individual entities. Other than its parent the main properties of an Aggregate are its method and the time period it will use to aggregate the data. It is not possible to change these properties after an Aggregate has been created and in fact the only property that can be changed after it has been created is its name. 

When a Topic gets new data it will send a complete copy of this data to each of its Aggregates. Each Aggregate will then aggregate this data  according to the configured options then store the aggregated data for later use.
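
To make the idea concrete, here is a rough Python sketch of aggregating observations into fixed periods. It is not the DataGator implementation: in the service each Aggregate applies a single method, whereas this sketch computes Open, High, Low and Close together for brevity. The observation format (Unix seconds, value) and the `begin` parameter, standing in for the beginDate setting, are assumptions.

```python
def aggregate_ohlc(observations, period, begin=0):
    """Group (unix_seconds, value) pairs into period-long buckets.

    Returns a dict mapping each bucket start time to its
    "Open", "High", "Low" and "Close" values.
    """
    buckets = {}
    for ts, value in sorted(observations, key=lambda o: o[0]):
        if ts < begin:
            continue  # ignore observations before the begin time
        start = begin + ((ts - begin) // period) * period
        bucket = buckets.setdefault(
            start, {"Open": value, "High": value, "Low": value, "Close": value}
        )
        bucket["High"] = max(bucket["High"], value)
        bucket["Low"] = min(bucket["Low"], value)
        bucket["Close"] = value  # the last observation seen in the bucket
    return buckets


sample = [(0, 10.0), (1200, 12.5), (2400, 9.0), (3600, 11.0), (5000, 11.5)]
for start, ohlc in aggregate_ohlc(sample, 3600).items():
    print(start, ohlc)
```

With a period of 3600 seconds the five sample observations fall into two buckets, one starting at 0 and one starting at 3600.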

Creating a new Aggregate is very similar to creating a Topic, as discussed earlier. Below is the JSON configuration structure for an Aggregate, with some brief descriptions.

Aggregate {
aggregateMethod (string): The aggregation method to be applied to the Topic data at each period duration.
period (number): The duration of time, specified in seconds, over which to aggregate the Topic data.
beginDate (string, optional): The starting time and date to use as the beginning of the aggregated Topic data.
aggregateName (string, optional): A name to associate with this method of aggregated Topic data.
}

There are two required values (aggregateMethod and period) that need to be set in order to create a new Aggregate. If beginDate is not set, the brokerCreate date from the Topic to which this Aggregate belongs will be used. The aggregateName stores a more human-readable way to identify this Aggregate and can also be used as a search criterion to find available Aggregates.

Here is an example using curl that will create a new Aggregate.

curl -X POST --header 'Content-Type: application/json' -d '{
"aggregateName": "Close",
"aggregateMethod": "Close",
"period": 3600
}' 'http://api.datagator.tech/Topics/587f7c45aa784d4a3743667d/aggregates?access_token=TOKEN'

The Topic that this Aggregate will belong to is in the URI path used to create it. This is why it is not necessary to declare it in the configuration JSON.

The response from creating the example Aggregate is below.

{
"aggregateMethod": "Close",
"period": 3600,
"beginDate": "2014-05-20T21:50:32.000Z",
"aggregateName": "Close",
"id": "588be68d58a8a828986379b4",
"created": "2017-01-28T00:32:13.674Z",
"modified": "2017-01-28T00:32:13.674Z"
}

In a future post I will discuss how to actually get the data aggregated.

Sunday 22 January 2017

Create a Topic

The curl command below will create a new Topic that will aggregate the IoT data of the supplied topicURI.

curl -X POST --header 'Content-Type: application/json' -d '{ "topicURI": "http://api.thingspeak.com/channels/12397/fields/4.json" }' 'http://api.datagator.tech/users/123456789/topics?access_token=TOKEN'

Below is the data portion of the response. The "errorWait" and "errorRetryMax" are just default values and the "brokerCreate" date will be calculated on the backend. The "id" is used to uniquely reference the Topic on DataGator.

{
"topicURI": "http://api.thingspeak.com/channels/12397/fields/4.json",
"errorWait": 20,
"errorRetryMax": 3,
"brokerCreate": "1970-01-01T00:00:00.000Z",
"id": "587f7c45aa784d4a3743667d",
"created": "2017-01-18T14:31:33.520Z",
"modified": "2017-01-18T14:31:33.520Z"
}

Verify Topic Configuration

A GET request can be used to verify the values that the Topic was configured with on the backend.

curl 'http://api.datagator.tech/Topics/587f7c45aa784d4a3743667d?access_token=TOKEN'

Below is the data portion of the response.

{
"topicURI": "http://api.thingspeak.com/channels/12397/fields/4.json",
"topicName": "WeatherStation",
"topicDescription": "MathWorks Weather Station, West Garage, Natick, MA 01760, USA",
"topicUnit": "Temperature (F)",
"latitude": "42.299676",
"longitude": "-71.350525",
"elevation": "60",
"errorWait": 20,
"errorRetryMax": 3,
"brokerCreate": "2014-05-20T21:50:32.000Z",
"id": "587f7c45aa784d4a3743667d",
"created": "2017-01-18T02:55:41.091Z",
"modified": "2017-01-18T02:55:47.000Z"
}

All the optional values for the Topic were filled by the backend. To really leave a value empty just supply an empty string "" for it when creating the Topic.

Notes

The URLs shown for the DataGator service are not yet publicly available and are subject to change. They were used just to give a clearer explanation of what interaction with the service would look like.

In the future I plan to create integrations with more IoT data brokers and to allow direct bulk uploading of data. Also planned are a bridge to CoAP and direct uploading of IoT data.

Saturday 21 January 2017

Configure a Topic

Here is the JSON configuration data structure with some brief descriptions for a Topic.

Topic {
topicURI (string): The location on the internet to make requests for Topic data.
topicKey (string, optional): An API key, if needed, to make requests for Topic data.
topicName (string, optional): A name for this Topic.
topicDescription (string, optional): A description for this Topic.
topicUnit (string, optional): The units of measure of this Topic data.
latitude (string, optional): The latitude geo location of this Topic.
longitude (string, optional): The longitude geo location of this Topic.
elevation (string, optional): The elevation geo location of this Topic.
timeTag (string, optional): A path to locate the time stamp component in the data structure of this Topic.
valueTag (string, optional): A path to locate the value of interest in the data structure of this Topic.
sepTag (string, optional): A single character used to separate the path components into the data structure of this Topic.
errorWait (number, optional): The number of seconds to wait between failed requests to the broker for Topic data.
errorRetryMax (number, optional): The total number of retries of failed attempts to request Topic data from the broker.
brokerCreate (string, optional): The starting time and date to use for retrieving Topic data from the broker.
}

Almost all values are optional; the only required value is topicURI, which gives access to the data. The DataGator service will detect the broker from the topicURI and deal with the specifics of how to request data from the different brokers. The DataGator service attempts to be as user friendly and simple to use as possible.
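
As an illustration of how timeTag, valueTag and sepTag could fit together, the Python sketch below walks a nested JSON structure along a path split on a separator character. The exact path syntax used by DataGator is not documented here, so this is only one plausible reading, and the sample data is invented for the example.

```python
def resolve_path(data, path, sep="."):
    """Walk nested dicts and lists following `path`, split on the character `sep`."""
    current = data
    for part in path.split(sep):
        if isinstance(current, list):
            current = current[int(part)]  # list positions are given as numbers
        else:
            current = current[part]
    return current


# Invented sample data, not an actual broker response.
sample = {"feeds": [{"created_at": "2017-01-18T14:31:33Z", "field4": "71.2"}]}

print(resolve_path(sample, "feeds/0/field4", sep="/"))  # 71.2
print(resolve_path(sample, "feeds.0.created_at"))       # 2017-01-18T14:31:33Z
```

A configurable separator matters when the keys themselves contain the default separator character.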

Friday 20 January 2017

HTTP IoT broker integration

I have implemented an HTTP data broker integration with ThingSpeak. There seems to be a large amount of public IoT data available there, and it should be possible for anyone who wants to use their service to do so. ThingSpeak uses a slightly different naming convention from what I have noticed some other brokers using: it seems to use "channels" and "fields" where other brokers may use "devices" and "topics" respectively. The DataGator service simply makes different aggregated topic-level data available.

Some broker integration has also been done with OpenSensors. This integration has proved to be more challenging, as there is more variety in the way the data is structured on the broker. This variety in the data storage structure made it necessary to construct different parse paths into the structures to describe exactly where the data of interest is.

I think DataGator will complement what the brokers offer. The visualizations provided by ThingSpeak seem to use client-side JavaScript, whereas those offered by the DataGator service are rendered server side. Server-side rendering should provide an advantage for constrained clients and make it easier to display visualizations directly on these devices.

Thursday 5 January 2017

A new year and a change to the project proposal

This project was submitted to the Eclipse IoT Challenge 3.0 and is number 29 in the public list of proposals. I have decided to make a change from the original proposal that was submitted. Instead of making the service available primarily on CoAP, I will be using HTTP. CoAP could still be supported using a protocol bridge, so nothing is lost by this. The change should make the service available to a bigger audience.

I am now researching my integration options for the project.

Tuesday 3 January 2017

Hello and Welcome

On this blog I plan to post about the new IoT data project I am working on. This project will be about data aggregation and visualization. I will break the project into smaller parts and blog about the separate pieces here. Everyone who reads the blog is welcome to comment and start a discussion.