{"id":9689,"date":"2019-08-22T13:30:54","date_gmt":"2019-08-22T13:30:54","guid":{"rendered":"https:\/\/blog.uruit.com\/?p=9689"},"modified":"2023-05-31T08:16:47","modified_gmt":"2023-05-31T11:16:47","slug":"machine-learning-with-tensorflow","status":"publish","type":"post","link":"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/","title":{"rendered":"Evolution of Machine Learning, from Server to the Edge"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_17 counter-hierarchy counter-decimal ez-toc-grey\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" style=\"display: none;\"><i class=\"ez-toc-glyphicon ez-toc-icon-toggle\"><\/i><\/a><\/span><\/div>\n<nav><ul class=\"ez-toc-list ez-toc-list-level-1\"><li class=\"ez-toc-page-1 ez-toc-heading-level-1\"><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#Evolution_of_Machine_Learning_from_Server_to_the_Edge\" title=\"Evolution of Machine Learning, from Server to the Edge\">Evolution of Machine Learning, from Server to the Edge<\/a><ul class=\"ez-toc-list-level-3\"><li class=\"ez-toc-heading-level-3\"><ul class=\"ez-toc-list-level-3\"><li class=\"ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#Introducing_Tensorflowjs_the_brain_for_your_smart_apps\" title=\"Introducing Tensorflow.js: the brain for your smart apps\u00a0\">Introducing Tensorflow.js: the brain for your smart apps\u00a0<\/a><\/li><li class=\"ez-toc-page-1 ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#How_to_use_Tensorflowjs_models\" title=\"How to use Tensorflow.js models\u00a0\u00a0\">How to use Tensorflow.js 
models\u00a0\u00a0<\/a><\/li><li class=\"ez-toc-page-1 ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#Using_poseNET_model_to_analyze_video_streams\" title=\"Using poseNET model to analyze video streams\u00a0\">Using poseNET model to analyze video streams\u00a0<\/a><\/li><li class=\"ez-toc-page-1 ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#PoseNET_use_cases_and_alternatives\" title=\"PoseNET use cases and alternatives\">PoseNET use cases and alternatives<\/a><\/li><li class=\"ez-toc-page-1 ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#PoseNET_alternatives_and_more_pre-trained_models\" title=\"PoseNET alternatives and more pre-trained models\u00a0\">PoseNET alternatives and more pre-trained models\u00a0<\/a><\/li><li class=\"ez-toc-page-1 ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#How_does_poseNET_work\" title=\"How does poseNET work?\u00a0\">How does poseNET work?\u00a0<\/a><\/li><li class=\"ez-toc-page-1 ez-toc-heading-level-3\"><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/uruit.com\/blog\/machine-learning-with-tensorflow\/#Summary\" title=\"Summary\u00a0\">Summary\u00a0<\/a><\/li><\/ul><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h1><span class=\"ez-toc-section\" id=\"Evolution_of_Machine_Learning_from_Server_to_the_Edge\"><\/span><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Evolution of Machine Learning, from Server to the Edge&quot;}\" 
>Evolution of Machine Learning, from Server to the Edge<\/span><span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p><span style=\"font-weight: 400;\">Traditionally, Machine Learning (ML) and Deep Learning (DL) models were implemented within an application in a server-client fashion. The server provided training and inference capabilities by exposing web APIs of some sort (REST, web-socket, PUB\/SUB, etc.), while the client was used mainly to exchange data with the server and present the inference result to the user. 
<\/span><span style=\"font-weight: 400;\">Whilst this approach has been proven to work well in many cases, it involves a lot of Input\/Output (I\/O) and a subsequent slowdown of the application.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, in recent years, <strong>the concept of moving DL models to the client-side has emerged<\/strong>, which is, in most cases, referred to as the EDGE of the system. This approach has been made possible by the newest advancement in GPU\/mobile technologies, like the NVIDIA Jetson microcomputer, and by the introduction and support of ML frameworks for EDGE devices, like TensorFlow Lite and TensorFlow.js.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" class=\"aligncenter wp-image-9695 size-full\" title=\"machine learning with tensorflow\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/post-davide4.jpg\" alt=\"machine learning with tensorflow\" width=\"3780\" height=\"3780\" \/><\/p>\n<p>Due to this exciting new development in machine learning and deep learning, we figured it would be interesting to <strong>show you how you can use Tensorflow.js and a pretrained model called PoseNet to create new possibilities for real-time, human-computer interaction<\/strong> that takes on a Kinect-like style.<span style=\"font-weight: 400;\"> So, in this article, you&#8217;ll find a simple tutorial for applying ML to your project, (even if for the first time) and some use cases for it so you can gain a better understanding of why you would want to apply this technology.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Introducing_Tensorflowjs_the_brain_for_your_smart_apps\"><\/span><span style=\"font-weight: 400;\">Introducing Tensorflow.js: the brain for your smart apps\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-9700\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow.png\" alt=\"\" 
width=\"1200\" height=\"675\" srcset=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow.png 1200w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow-300x169.png 300w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow-1024x576.png 1024w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow-768x432.png 768w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow-750x422.png 750w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow-1140x641.png 1140w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/logo-tensorflow-20x11.png 20w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">TensorFlow.js is the JavaScript version of the popular machine learning framework released for Python by Google in 2015. <\/span><span style=\"font-weight: 400;\"><strong>TensorFlow provides an environment that facilitates the development of complex statistical models<\/strong> by exposing an API composed of high-level abstractions of the nuts and bolts (activation and loss functions, graph data-flow, tensor operations, etc.) of a Machine Learning\/Deep Learning model and by leveraging the power of GPU parallel computing.\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Thanks to the WebGL API, TensorFlow.js can leverage the power of the GPU even in the browser, allowing us to run complex deep learning models for training and inference.\u00a0<\/span><span style=\"font-weight: 400;\">The following picture represents the architecture of the framework. 
Most of the time, the user will access the Layers API (high-level abstraction) while the Ops API provides low-level linear algebra operations.<\/span><\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-9696\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/post-davide3.jpg\" alt=\"\" width=\"3780\" height=\"3780\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"How_to_use_Tensorflowjs_models\"><\/span><span style=\"font-weight: 400;\">How to use Tensorflow.js models\u00a0\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">If you\u2019re interested in applying machine learning to your app, the easiest way to do so is to include one of the pre-trained models for Tensorflow.js published on NPM under the scope <\/span><b>@tensorflow-models<\/b><span style=\"font-weight: 400;\">. These models can be imported into any JS application like a regular JS module.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In order to illustrate this, in this post we are going to show a concrete example. <\/span><span style=\"font-weight: 400;\"><strong>PoseNET is a machine learning model that allows human pose estimation in real-time<\/strong>. We refer to human pose estimation as the ability to identify human figures and their key body joints and parts. 
<\/span><span style=\"font-weight: 400;\">Let&#8217;s dive into the code to understand a bit more about what we can achieve with this model.\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true \"># create a directory and initialise the project\r\nmkdir fun-with-posenet &amp;&amp; cd fun-with-posenet\r\n\r\n# install the live-server module (used to spin up a web server)\r\nnpm i -g live-server\r\n\r\n# create a file called index.html with the following content and\r\n# add an image called myImage.jpg to the project directory\r\n\r\n&lt;!-- index.html file start --&gt;\r\n&lt;img id=\"myImage\" src=\"myImage.jpg\" \/&gt;\r\n&lt;script src=\"https:\/\/unpkg.com\/@tensorflow\/tfjs\"&gt;&lt;\/script&gt;\r\n&lt;script src=\"https:\/\/unpkg.com\/@tensorflow-models\/posenet\"&gt;&lt;\/script&gt;\r\n&lt;script type=\"text\/javascript\"&gt;\r\n\/\/ load the posenet model\r\nposenet.load()\r\n  .then(net =&gt; {\r\n    \/\/ model loaded: get the DOM element with the image\r\n    \/\/ (or video) and return a single pose estimation\r\n    const imgElem = document.getElementById(\"myImage\");\r\n    return net.estimateSinglePose(imgElem);\r\n  })\r\n  \/\/ print the pose estimation, if any, to the console\r\n  .then(console.log);\r\n&lt;\/script&gt;\r\n&lt;!-- index.html file end --&gt;\r\n\r\n# serve the index file using live-server\r\nlive-server<\/pre>\n<p><span style=\"font-weight: 400;\">As we can see from the code above, the actual implementation and usage of this model is very straightforward and can be done with just 10 lines of code.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The single pose estimation will return a JS object with the following parameters:\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>Score &#8211; <\/b><span style=\"font-weight: 400;\">a probability value that can be 
used to discard wrong estimations; in our case, we used a threshold value of <\/span><b>0.3<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Keypoints &#8211; <\/b><span style=\"font-weight: 400;\">a list of key-points with their name, score, and coordinates\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Using the key-points provided, <strong>it will be possible to draw the estimated pose on an overlaying <\/strong><\/span><strong>canvas<\/strong><span style=\"font-weight: 400;\">, as in the pictures below.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-9690\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40.png\" alt=\"\" width=\"1349\" height=\"730\" srcset=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40.png 1349w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40-300x162.png 300w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40-1024x554.png 1024w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40-768x416.png 768w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40-750x406.png 750w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40-1140x617.png 1140w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-31-40-20x11.png 20w\" sizes=\"(max-width: 1349px) 100vw, 1349px\" \/><\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-9691\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11.png\" alt=\"\" width=\"1317\" height=\"747\" srcset=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11.png 1317w, 
https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11-300x170.png 300w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11-1024x581.png 1024w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11-768x436.png 768w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11-750x425.png 750w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11-1140x647.png 1140w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-09-32-11-20x11.png 20w\" sizes=\"(max-width: 1317px) 100vw, 1317px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Using_poseNET_model_to_analyze_video_streams\"><\/span><span style=\"font-weight: 400;\">Using poseNET model to analyze video streams\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Since the <\/span>estimateSinglePose <span style=\"font-weight: 400;\">method takes a DOM image or video element as input, applying the same method to a webcam stream or to a video stream is painless. 
You just need to execute the estimation for every frame, or every <\/span><i><span style=\"font-weight: 400;\">n<\/span><\/i><span style=\"font-weight: 400;\"> milliseconds, which can be done using the <\/span>setInterval<span style=\"font-weight: 400;\"> or <\/span>requestAnimationFrame<span style=\"font-weight: 400;\"> functions provided by the JS language.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The following is an example of how to start the webcam feed and estimate a pose for every frame. Below the code is a short video of poseNET running on an MP4 video:\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true \">import * as posenet from \"@tensorflow-models\/posenet\";\r\n\r\nconst videoElem = document.getElementById(\"video\");\r\nvideoElem.height = 600;\r\nvideoElem.width = 600;\r\n\r\nfunction startVideo() {\r\n  \/\/ get camera access\r\n  return navigator.mediaDevices\r\n    .getUserMedia({\r\n      audio: false,\r\n      video: { height: videoElem.height, width: videoElem.width }\r\n    })\r\n    .then(stream =&gt; {\r\n      \/\/ bind the stream to the videoElem and play\r\n      videoElem.srcObject = stream;\r\n      videoElem.play();\r\n      \/\/ load the posenet model\r\n      return posenet.load({\r\n        architecture: \"MobileNetV1\",\r\n        outputStride: 16,\r\n        inputResolution: 257,\r\n        multiplier: 0.75\r\n      });\r\n    })\r\n    .then(net =&gt; {\r\n      \/\/ start a \"game loop\" that estimates a pose from the videoElem\r\n      \/\/ every 500 milliseconds\r\n      setInterval(async () =&gt; {\r\n        const pose = await net.estimateSinglePose(videoElem, {\r\n          flipHorizontal: true\r\n        });\r\n        \/\/ discard the pose if its score is less than 0.3\r\n        if (pose.score &lt; 0.3) return;\r\n        console.log(pose);\r\n      }, 500);\r\n    })\r\n    .catch(err =&gt; {\r\n      console.error(`error during startVideo: ${err.message}`);\r\n    });\r\n}\r\n\r\nstartVideo();<\/pre>\n<div style=\"width: 640px;\" class=\"wp-video\"><!--[if lt IE 9]><script>document.createElement('video');<\/script><![endif]-->\n<video class=\"wp-video-shortcode\" id=\"video-9689-1\" width=\"640\" height=\"360\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/federer.mp4?_=1\" \/><a href=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/federer.mp4\">https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/federer.mp4<\/a><\/video><\/div>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"PoseNET_use_cases_and_alternatives\"><\/span><span style=\"font-weight: 400;\">PoseNET use cases and alternatives<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Generally speaking, moving ML and DL models to the EDGE improves the perceived performance of an application and increases overall security and privacy, as the client side won\u2019t need to exchange the raw data with the server, only a representation of this data in the form of weights.<\/span><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the specific case of PoseNET, <strong>we could use the model to provide an enhanced user experience by adding another level of interaction within the application<\/strong>. 
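One simple way to build that interaction is to draw the estimated keypoints over the live video on each frame. The sketch below is our own illustration (the `visibleKeypoints` and `drawKeypoints` helpers are not part of the posenet package); it assumes the pose object shape returned by `estimateSinglePose`, with a `keypoints` array of `{ part, score, position: { x, y } }` entries:

```javascript
// Keep only the keypoints worth drawing, reusing the same
// 0.3 score threshold as the estimation loop above.
function visibleKeypoints(pose, minScore = 0.3) {
  return pose.keypoints.filter(kp => kp.score >= minScore);
}

// Draw each visible keypoint as a small circle on a 2D canvas context
// sized to match the video element.
function drawKeypoints(ctx, pose, minScore = 0.3) {
  for (const kp of visibleKeypoints(pose, minScore)) {
    ctx.beginPath();
    ctx.arc(kp.position.x, kp.position.y, 4, 0, 2 * Math.PI);
    ctx.fillStyle = "aqua";
    ctx.fill();
  }
}
```

Inside the estimation loop, you would clear an overlay canvas and call `drawKeypoints(ctx, pose)` after each estimation.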
As an example, here at UruIT, we created a simple volley game where the user can bounce a virtual ball on the screen using <\/span><a href=\"https:\/\/github.com\/liabru\/matter-js#examples\" class=\"external\" rel=\"nofollow\"><span style=\"font-weight: 400;\">Matter.js<\/span><\/a><span style=\"font-weight: 400;\"> as the physics engine and the webcam stream combined with poseNET\u2019s estimation of the eyes\u2019 key-points.\u00a0<\/span><\/p>\n<div style=\"width: 640px;\" class=\"wp-video\"><video class=\"wp-video-shortcode\" id=\"video-9689-2\" width=\"640\" height=\"360\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/React-App.mp4?_=2\" \/><a href=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/React-App.mp4\">https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/React-App.mp4<\/a><\/video><\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The same approach could be used in many different ways. An interesting example could be a web application for a sunglasses company, where the user can virtually try on its sunglasses and see how they fit in real-time.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another app that could be made with the same method is a fitness app that helps you practice yoga by giving you advice on your poses. 
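To give a flavor of how such pose advice could work, simple geometry on the estimated keypoints is often enough; for example, checking the angle formed at a joint. The following is a hypothetical sketch (`jointAngle` and `isArmStraight` are our own helper names, not a library API) operating on PoseNet-style `{ x, y }` positions:

```javascript
// Angle in degrees at `joint`, formed by the segments joint->a and joint->b.
// All three arguments are keypoint positions of the form { x, y }.
function jointAngle(a, joint, b) {
  const angle1 = Math.atan2(a.y - joint.y, a.x - joint.x);
  const angle2 = Math.atan2(b.y - joint.y, b.x - joint.x);
  const deg = Math.abs((angle1 - angle2) * 180 / Math.PI);
  return deg > 180 ? 360 - deg : deg;
}

// Example check: is the arm held roughly straight, i.e. is the
// shoulder-elbow-wrist angle within `tolerance` degrees of 180?
function isArmStraight(shoulder, elbow, wrist, tolerance = 20) {
  return Math.abs(180 - jointAngle(shoulder, elbow, wrist)) <= tolerance;
}
```

A yoga helper could run checks like this against the keypoints of each estimated pose and prompt the user to adjust the joints that are out of range.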
It could also be gamified by awarding users points when they achieve and maintain a pose for a certain amount of time.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-9694\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54.png\" alt=\"\" width=\"1072\" height=\"607\" srcset=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54.png 1072w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54-300x170.png 300w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54-1024x580.png 1024w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54-768x435.png 768w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54-750x425.png 750w, https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/Screenshot-from-2019-08-01-14-50-54-20x11.png 20w\" sizes=\"(max-width: 1072px) 100vw, 1072px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"PoseNET_alternatives_and_more_pre-trained_models\"><\/span><span style=\"font-weight: 400;\">PoseNET alternatives and more pre-trained models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">PoseNET is just one of the many deep learning models that are going to be released in the near future.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The following are some of the notable ones already published by Google and by other open-source contributors:\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">COCO-SSD (object detection) <\/span><a href=\"https:\/\/www.npmjs.com\/package\/@tensorflow-models\/coco-ssd\" class=\"external\" rel=\"nofollow\"><span 
style=\"font-weight: 400;\">COCO-SSD on npm<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Toxicity (text classification, sentiment analysis) <\/span><a href=\"https:\/\/www.npmjs.com\/package\/@tensorflow-models\/toxicity\" class=\"external\" rel=\"nofollow\"><span style=\"font-weight: 400;\">Toxicity on npm<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Face-api.js (face detection, face recognition) <\/span><a href=\"https:\/\/www.npmjs.com\/package\/face-api.js\" class=\"external\" rel=\"nofollow\"><span style=\"font-weight: 400;\">Face-api.js on npm<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Handtrack.js (hand tracking) <\/span><a href=\"https:\/\/www.npmjs.com\/package\/handtrackjs\" class=\"external\" rel=\"nofollow\"><span style=\"font-weight: 400;\">Handtrack.js on npm<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"How_does_poseNET_work\"><\/span><span style=\"font-weight: 400;\">How does poseNET work?\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">In this section, we\u2019ll discuss how the poseNET model works under the hood. To keep things brief, we\u2019ll assume that you have basic knowledge of linear algebra and knowledge of how Machine Learning and Deep Learning models are defined. 
If you need a gentle introduction on ML and DL, you can start by reading our <\/span><a href=\"https:\/\/uruit.com\/blog\/machine-learning-guide\/\"><span style=\"font-weight: 400;\">Ultimate Introduction to Machine Learning<\/span><\/a><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The poseNET model is based on the <a href=\"https:\/\/arxiv.org\/abs\/1803.08225\" class=\"external\" rel=\"nofollow\">PersonLab paper<\/a> <\/span><span style=\"font-weight: 400;\">and it\u2019s been built on top of the <a href=\"https:\/\/arxiv.org\/abs\/1704.04861\" class=\"external\" rel=\"nofollow\">\u201cMobileNet\u201d architecture<\/a><\/span><span style=\"font-weight: 400;\">\u00a0which is a type of Convolutional Neural Network used in computer vision applications and optimized to run on mobile and embedded systems.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The provided model has been trained and evaluated against the <a href=\"http:\/\/cocodataset.org\" class=\"external\" rel=\"nofollow\">Microsoft COCO dataset<\/a> which<\/span><span style=\"font-weight: 400;\">\u00a0comprises 330k total images and 250k people with key-points.<\/span><span style=\"font-weight: 400;\">\u00a0\u00a0<\/span><\/p>\n<figure id=\"attachment_9697\" aria-describedby=\"caption-attachment-9697\" style=\"width: 3780px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"wp-image-9697 size-full\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/post-davide.jpg\" alt=\"\" width=\"3780\" height=\"3780\" \/><figcaption id=\"caption-attachment-9697\" class=\"wp-caption-text\">https:\/\/arxiv.org\/pdf\/1803.08225.pdf<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The diagram above describes the components of the model that takes a raw image as input and feeds that into a Convolutional Neural Network (MobileNet or ResNet). 
The output of the latter is then used to produce the poseNet outputs in the form of heatmaps and offset vectors.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-9698\" src=\"https:\/\/uruit.com\/blog\/wp-content\/uploads\/2019\/08\/post-davide2.jpg\" alt=\"\" width=\"3780\" height=\"3780\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Heatmaps are represented internally by a 3D tensor of shape (Xres\/outputStride, Yres\/outputStride, Kn), where Xres and Yres are the resolution of the image and Kn is the number of keypoints. In the case of the PoseNET model, the resolution depends on the chosen outputStride, which defines the segmentation of the image. Given an image sized 225&#215;225 pixels, an outputStride of 16, and 17 keypoints, the heatmap tensor will have the shape (15, 15, 17).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Offset vectors are used along with heatmaps to predict the exact location of the keypoints. While the PersonLab paper provides a better description of the different offset vectors (short-, mid- and long-range), in this article we limit the analysis to the short-range vectors for simplicity\u2019s sake, but <strong>adding more offset vectors would provide better precision in the pose estimation and instance segmentation.<\/strong> Short-range offset vectors are represented as 3D tensors of shape (Xres\/outputStride, Yres\/outputStride, Kn*2).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After retrieving the heatmaps and offset vectors, we can calculate the keypoints as shown in the following pseudocode:\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true \">scores = heatmap.sigmoid()\r\nheatmapPositions = scores.argmax(x, y)\r\noffsetVectors = [offsets.get(y, x, k), offsets.get(y, x, k + Kn)]\r\nkeypointPositions = heatmapPositions * outputStride + offsetVectors\r\nposeScore = mean(scores)<\/pre>\n<h3><span class=\"ez-toc-section\" id=\"Summary\"><\/span><span 
style=\"font-weight: 400;\">Summary\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">In this article, we saw that implementing a pre-trained model in a web application using TensorFlow.js is an easy task that can be done with just 10 lines of code. This opens up new levels of Human-Computer Interaction (HCI). Also, it brings new tools to UX designers to enhance the overall UX of an application.\u00a0<\/span><\/p>\n<p>Feel free to share this post with your network! We&#8217;re happy to collaborate with other developers and tech teams.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Contents Evolution of Machine Learning, from Server to the EdgeIntroducing Tensorflow.js: the brain for your smart apps\u00a0How to use Tensorflow.js models\u00a0\u00a0Using poseNET model to analyze video streams\u00a0PoseNET use cases and&#8230;<\/p>\n","protected":false},"author":35,"featured_media":9702,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[290],"tags":[],"_links":{"self":[{"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/posts\/9689"}],"collection":[{"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/comments?post=9689"}],"version-history":[{"count":5,"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/posts\/9689\/revisions"}],"predecessor-version":[{"id":11215,"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/posts\/9689\/revisions\/11215"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/media\/9702"}],"wp:attachment":[{"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/media?parent=9689"}],"wp:term":[{"taxonomy":"category","embeddable":true,
"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/categories?post=9689"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/uruit.com\/blog\/wp-json\/wp\/v2\/tags?post=9689"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}