
Everyone in logistics is talking about IoT. But we need to talk about the DATA.


These days, it seems like everyone in the supply chain is talking about the Internet of Things – loosely defined as devices that connect to the internet and generate data, with the exception of conventional computers and smart devices.

Primarily, we’re talking about sensors that monitor “stuff” and generate data about that stuff.

All well and good, right? But data is a little bit like lumber. Having a pile of it might be nice, but it’s what you DO with the data (or the lumber) that adds value.

Of course, these devices are typically monitoring stuff and kicking out data in real-time. Multiply the number of devices by the number of assets monitored, and you can see the problem – someone or something is needed to analyze the data. That can be a time-consuming process when done manually, while the tools typically used to monitor data can’t keep up with high-velocity data.

Yeah, that’s a problem.

Let’s look at it from the carrier/3PL point of view. If you’re moving a load of Angus beef, it sure would be nice to know that the reefer is keeping it at the right temperature. Even better would be to be informed proactively that there might be a problem BEFORE anything gets loaded.

So how can that be done?

The Ideal Solution for Real-Time Logistics Data and Analytics

This is where the Cinchapi Data Platform (CDP) comes to the rescue. Our platform was purpose-built to work with ANY real-time data source, which absolutely includes IoT-generated data. We can do this because we use machine learning to make sense of what the data means while it identifies patterns, anomalies, and relationships across otherwise disconnected data sources. So, if the refrigeration unit is looking dicey in one truck, the platform can easily identify other trucks that have identical configurations so that your maintenance crews can take a look.

Since the platform can be configured to trigger or modify enterprise workflows, in this example maintenance crews can be scheduled to check all of the other trucks which could also be close to going belly up.

And here’s the kicker – you don’t have to be a data genius to use this platform. All you have to do is ask questions using everyday English phrases. Really. A user can ask “Are any of my reefers showing problems?” and get real-time results displayed as vivid visualizations along with text-based descriptions.

Because the platform is context-aware, it quickly picks up industry and company jargon. Once told that a reefer is a refrigerated truck, it will always understand what you are referring to. It also understands that “problems” means something that is not right, so it will reveal those anomalous behaviors that warrant action.

There is much more that the Cinchapi Data Platform can do in the logistics and supply chain space. If your company is a 3PL (third party logistics provider), you’re going to get data in who knows how many different formats. Where one contracted carrier might deliver its data via an API, another might be sending spreadsheets by email, while a third could be relying on a fax machine whose output needs to be run through an OCR solution before it can be imported into the 3PL’s systems. That’s a whole lot of manual processes.

With the CDP, all of this data can be streamed, examined and stored. Then that simple interface makes it easy to work with all of that data in a consistent manner.
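To make that concrete, here is a minimal sketch of the kind of normalization involved. This is not Cinchapi code – the carrier formats, field names, and regular expression below are hypothetical stand-ins for the API, spreadsheet, and OCR’d-fax feeds described above.

```python
# Hypothetical sketch: three carrier feeds (JSON API, emailed CSV, OCR'd
# fax text) normalized into one common record shape. All field names are
# invented for illustration.
import csv
import io
import json
import re

def from_api(payload: str) -> dict:
    """Carrier A delivers shipment status as JSON over an API."""
    r = json.loads(payload)
    return {"carrier": r["carrier"], "shipment_id": r["id"],
            "temp_f": float(r["reefer_temp_f"])}

def from_spreadsheet(csv_text: str) -> list:
    """Carrier B emails spreadsheets, exported here as CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"carrier": row["Carrier"], "shipment_id": row["Shipment"],
             "temp_f": float(row["Temp (F)"])} for row in rows]

def from_ocr(text: str) -> dict:
    """Carrier C faxes paperwork; an OCR pass yields loose text."""
    m = re.search(r"Shipment\s+(\S+).*?Temp\s+([\d.]+)\s*F", text, re.S)
    return {"carrier": "carrier-c", "shipment_id": m.group(1),
            "temp_f": float(m.group(2))}
```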

Still not convinced? Take a look at the 64-second video below so that you can see what we mean. Wincanton is a believer. How about you? Would you like a demonstration? Click here to request a no-obligation live demo.

Cinchapi Releases Beta Version of Data Platform


Cinchapi Releases Beta Version of Data Platform Featuring Machine Learning and a Natural Language Interface to Explore Any Data Source in Real-Time

The Cinchapi Data Platform allows data scientists and analysts to dispense with data prep. Makes data exploration and discovery conversational and actionable.

ATLANTA, GA. March 6, 2017 – Delivering on its promise to take enterprise data from cluster to clarity, Atlanta data startup, Cinchapi, today announced the beta launch of its flagship product, the Cinchapi Data Platform (CDP).  

The Cinchapi Data Platform is a real-time data discovery and analytics engine that automatically learns as humans interact with data and automates their workflows on-the-fly. Cinchapi’s data integration pipeline connects to disparate databases, APIs and IoT devices and streams information to the foundational Concourse Database in real time. Data analysts can then use the Impromptu application to perform ad hoc data exploration using a conversational interface.

The CDP’s analytics engine automatically derives additional context from data and presents the most interesting trends through beautiful visualizations that update in real-time. These visualizations can also be “rewound” to show how data looked in the past and evolved over time – even if the data has been deleted. The CDP’s automated machine intelligence empowers data analysts to immediately explore data using natural language and drill down by asking follow-up questions.

Compared to conventional data management, data teams can expect to shave 50% or more from their analytics tasks. Ostensibly a data management platform, the CDP is ideal for anyone looking to explore decentralized or disparate data in search of previously hidden relationships. No matter the nature of the data source – be it any combination of unstructured IoT data, industry standard frameworks, proprietary data, or legacy sources – in just a few minutes, interesting relationships, patterns, or anomalies will be exposed.

Just as powerfully, the Cinchapi Data Platform’s underlying database, Concourse, writes and stores definitive data across time. Like a DVR for data, users can “rewind time” to specific points in the past. They can also press play to watch as vivid visualizations illustrate how these newly discovered insights were created and how they evolved over time.

“From day one, the Cinchapi vision has been to deliver ‘computing without complexity’”, explains Cinchapi CEO and founder, Jeff Nelson. “I’ve worked with data my entire career and have been frustrated by how much of my time has been spent integrating and cleaning up disparate or decentralized data before being able to explore trends or to begin coding. We knew that by leveraging machine learning, the Cinchapi Data Platform would eliminate the drudgery of data prep. It then instantly exposes the most interesting and relevant data to use or to more fully investigate.”

The End of Data Prep and Cleanup

If asked, those who work with data will tell you that the greatest impediment to working with it is that there is too much of it, and that often, the data is messy. In other words, before an analyst can get insights from data, she has to sift through all of the data to see what she has. She has to determine what data is relevant to the task at hand, and then see how that might relate to other data points.  This data prep and cleanup process can add weeks or months to a project.

As Big Data grows ever larger with data generated by the Internet of Things, it’s a problem which will only increase in scale and complexity. By 2020, BusinessInsider.com predicts that 24 billion IoT devices will be connected to the internet. That works out to about three IoT devices for every person on the planet.  Each of these devices will be generating “messy data”, as there is no standard for what IoT data should look like.

To solve this growing problem, the Cinchapi Data Platform uses machine intelligence to comprehend data, regardless of the source or the schema. It then looks for relationships, patterns, or anomalies between otherwise decentralized, disparate data stores. The CDP was also purpose-built to neither impose nor rely upon any specific data schema.

This makes the CDP the ideal platform when working with data sources which lack a coherent structure, like IoT data or undocumented legacy or proprietary data. Of course, the Cinchapi Data Platform can also work with industry-standard SQL databases like Oracle, as well as NoSQL stores.

A Simple, Three Step Workflow

The Cinchapi Data Platform workflow consists of three simple steps: Ask, See, and Act.

Step One, ASK: Once connected to the desired data sources, the first step is to simply ask a question using common English phrases. There is no need to master cryptic data queries in an effort to “solve for x”. Instead, users can ask a question using everyday, conversational phrases. Should the user need a more specific answer, all that she needs to do is ask a follow-up question. With use, the CDP’s machine learning allows the platform to better understand the context of the question asked, further enhancing the user experience.

Step Two, SEE: After a question is asked, next come the results. Built into the CDP is a powerful analytics engine which provides hidden insights and customized visualizations. This allows users to see relationships and connections which were previously obscured. Even better, with these new relationships now exposed, users can “rewind time” to see how these relationships have evolved and impacted operations in the past.

Step Three, ACT: With the results available, users can then act on the information presented. A data analyst can automate actions with just a few button clicks. A logistics company might find enhanced efficiencies in route planning which could be shared to the fleet in real-time. A CSO in a bank might use its automation capabilities to trigger alerts to a security team when potentially fraudulent activities are detected. Frankly, the possible use cases are endless.
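To ground the three steps, here is a toy, self-contained sketch of the Ask, See, and Act loop. It is emphatically not Cinchapi’s implementation – a real natural language interface would rely on NLP and learned context – and the jargon map, fleet records, and temperature range below are all invented for illustration.

```python
# Toy Ask/See/Act loop (illustrative only). "Ask" reduces an everyday-English
# question to a structured filter, "See" selects matching assets, and "Act"
# triggers a workflow for anything anomalous.
JARGON = {"reefer": "refrigerated_truck", "reefers": "refrigerated_truck"}

FLEET = [
    {"truck": 101, "type": "refrigerated_truck", "temp_f": 34.0},
    {"truck": 102, "type": "refrigerated_truck", "temp_f": 47.5},  # too warm
]

def ask(question: str) -> dict:
    """ASK: turn a conversational question into a structured filter."""
    words = question.lower().strip("?").split()
    asset = next((JARGON[w] for w in words if w in JARGON), None)
    return {"type": asset, "anomalous": "problem" in question.lower()}

def see(query: dict) -> list:
    """SEE: select matching assets; here 'anomalous' means out of range."""
    hits = [t for t in FLEET if t["type"] == query["type"]]
    if query["anomalous"]:
        hits = [t for t in hits if not 30.0 <= t["temp_f"] <= 40.0]
    return hits

def act(hits: list) -> None:
    """ACT: trigger a downstream workflow, e.g. schedule maintenance."""
    for t in hits:
        print(f"Schedule maintenance for truck {t['truck']} ({t['temp_f']}F)")

act(see(ask("Are any of my reefers showing problems?")))
```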

CDP features include:

  • Concourse Database – An enterprise edition of Cinchapi’s open source database warehouse for transactions, search, and analytics across time. This is where streamed data is stored.
  • Sponge – A real-time change data capture and integration service for disparate data sources.
  • Impromptu – A real-time ad-hoc analytics engine that uses machine intelligence for workflow automation.

About Cinchapi, Inc.

Atlanta-based Cinchapi is transforming how data scientists, analysts, and developers explore and work with data. The Cinchapi Data Platform (CDP) and its Ask, See, and Act workflow were purpose-built to simplify data preparation, exploration, and development. Its natural language interface combined with machine learning and an analytics engine makes working with data conversational, efficient, and intuitive. Imposing no schema requirements, the CDP streams, comprehends, and stores definitive data generated in real-time by IoT devices as well as conventional, legacy, and proprietary databases. Learn more about the Cinchapi Data Platform and its #AskSeeAct workflow at https://Cinchapi.com/

###

Rewind Time with the Cinchapi Data Platform


Love it or hate it, the singer Cher had a hit single with her 1989 song “If I Could Turn Back Time”. While the song may now be stuck in your head, the truth is that developers who work with data now have the ability to rewind time, at least from a data perspective.

The Cinchapi Data Platform (CDP) allows developers to stream and store decentralized or disparate data from any connected data source. The foundation of the CDP is the open source Concourse Database, created and maintained by Cinchapi.  Since Concourse is a strongly consistent database, it stores definitive data values from connected data sources.

With versioning included, even if the original source data has been overwritten, lost, or changed, developers and analysts will always have the ability to go back and see what the values were at any specific moment in time.
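For a flavor of what a historical read can look like in code, here is a sketch using the open source Concourse driver. The connection details and record number are hypothetical, and the method signatures (in particular whether your driver version accepts natural-language time strings like “last week”) should be verified against the Concourse documentation.

```python
# Sketch of Concourse time-travel reads. Record 42 is a hypothetical record
# holding one sensor's readings; verify method signatures against your
# driver version before relying on this.
from concourse import Concourse

db = Concourse.connect(host="localhost", port=1717,
                       username="admin", password="admin")
record = 42

# The value as it stands right now.
current = db.get(key="temp_f", record=record)

# The value at an earlier point, even if it has since been overwritten.
last_week = db.get(key="temp_f", record=record, time="last week")

# The full change history for the field: the "rewind" view of the value.
history = db.audit(key="temp_f", record=record)
```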

The Benefit of Traveling Back in Time

Data is fast, and data is often messy. By that we mean that data points change and evolve from moment to moment. What was true a minute ago may no longer be true now. Worse, typically data is siloed, so it becomes increasingly difficult to see relationships between decentralized data sources.

In other words, organizations have an enormous amount of data which is constantly morphing in real time, and the sources of the data are not connected to each other. That makes finding relationships between data sets a tedious and time-consuming task. Depending upon the data, we could be talking weeks or even months of data prep and cleanup just to see what is relevant, and how the data sets relate to each other.

By leveraging the power of machine learning, the CDP can make short work of understanding what your data means, and it can uncover interesting relationships between otherwise siloed data.

That’s pretty cool, but it gets even better. With these previously hidden relationships exposed, the data developer, analyst, or scientist can explore aspects of the relationship at any point in time.

Think of this as like a DVR for data. Sports fans will often rewind a play to see it again – they want to see how the play developed, who did what right, and who did what wrong to lead to a score or a loss of possession.

Similarly, the Cinchapi Data Platform allows users to rewind data, “press play” and then watch as that data evolves to its current state. Just like a DVR, users can slow things down, fast forward, or pause at specific points in time.

This could prove valuable for a vast array of use cases. Banks and credit card issuers might use this to detect credit card fraud, and to prevent future fraud. A retailer might use it to better understand why demand for specific products rises and falls. A logistics company might use this to determine more efficient transportation routes and methods.

The Visualization Engine

Out of the box, the CDP lets a developer see relationships between her connected data sources. It doesn’t matter what the schema or the source of that data may be, because the platform doesn’t impose any schema on her. She can work with financial data, IoT-generated data, data from operations and logistics, or virtually any source to which she has access via a direct connection or an API.

Good stuff to be sure, but looking at a glorified spreadsheet with values changing over time can be a little off-putting. This is why a powerful visualization engine is included as a core component of the CDP.

Visualizations help people to see the relationships in data. But as we mentioned earlier, typically the data in one data source is independent of other sources. Vendor data might be in one silo, customer data in another, with operations and logistics in still another silo.

Factor in social media data, news events, and a host of other data, and the list of potential data silos can be mind-boggling as the size and scope of a business grows. Yet as the amount of data grows, it becomes increasingly critical to see the very relationships which could be impacting productivity, sales, operations, and much more.

It’s not just the positive things that can impact a business. We’ve all heard stories of retailers and other businesses which found out well after the fact that they had been hacked, or that fraud had occurred.

This doesn’t just hurt the bottom line, it can also have a profoundly negative effect on the reputation of a business. When retailers like Target or restaurant chains like Wendy’s had customer information stolen, how much potential business did they also lose because customers were fearful of their information also being exposed?

It’s impossible to put a specific dollar value on bad publicity, but we will suggest that there is a significant cost factor when customers shy away from a company because they fear becoming the next victim.

Data is big, and it’s only getting bigger. It’s also increasingly messy in that not all data is relevant to a specific problem or opportunity. Having the ability to uncover relationships that were hidden is compelling enough.  But being able to rewind the data and see how these relationships looked in their nascent stage can benefit anyone with an interest in data forensics.

Cher probably wasn’t thinking about data when she wondered what would change if she could turn back time. But with the Cinchapi Data Platform, anyone working with data can turn back the calendar to see when and how data relationships were established, and how they then changed and morphed over time.

Building a Recommendation System for Data Visualizations


This past year, I’ve been working as a software engineer at Cinchapi, a technology startup based in Atlanta. The company’s flagship product is the Cinchapi Data Platform (CDP). The CDP is a platform for gleaning insights from data through real-time analytics, natural language querying, and machine learning.

One of the more compelling aspects of the platform is that it provides data visualizations out of the box. The visualization engine is where I have focused my energies by developing a recommendation system for visualizations.

The Motivation

With so much data being generated by smart devices and the Internet of Things (IoT), it’s increasingly difficult to see and understand relationships and correlations across these disparate data sources – especially in real-time. At the same time, collecting too little data may lead you to miss out on important problems.

This is where the power of data visualization comes into play. On the surface, it’s a simple transformation that converts raw, unintelligible data into actionable, intuitive insights. Simple, of course, is relative to the eye of the beholder.

A data visualization – maybe not the best example

After all, there are an abundance of plots and graphs and charts and figures out there, each of which is suited for a particular kind of dataset. Do you have some categorical data indexed by frequency? A bar chart might be the best method to visualize it. However, bivariate numerical data abiding by a non-functional relationship might best be seen as a scatter plot.
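That intuition can be captured in a toy chart chooser. The rules below are deliberately crude and purely illustrative; the actual engine described later in this post uses a much richer characterization.

```python
# Toy chart chooser: the shape of the data, not user preference, drives the
# suggestion. Rules are deliberately crude.
def suggest_chart(columns: list) -> str:
    if len(columns) == 1 and all(isinstance(v, str) for v in columns[0]):
        return "bar chart"      # categorical data indexed by frequency
    if len(columns) == 2 and all(isinstance(v, (int, float))
                                 for col in columns for v in col):
        return "scatter plot"   # bivariate numerical data
    return "table"              # fall back to a plain table

print(suggest_chart([["red", "blue", "red"]]))            # bar chart
print(suggest_chart([[1.0, 2.5, 3.1], [4.2, 0.3, 9.9]]))  # scatter plot
```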

That pretty much outlines the problem – how can you get a visualization engine to determine what type of visualization is appropriate for a given set of data? That’s what I needed to figure out, and I thought the process of getting there would make for an interesting article.

Understanding the Problem

The point of all of this is to help users better understand what their data means, and to do so with visualizations. I knew that I needed a recommendation system – something that would offer up the visualizations which would best show what the data really means. Recommendation systems are a highly researched and published topic, and have seen widespread implementation. Consumers see examples of recommendation systems in products from companies like Google, Netflix, Amazon, Spotify, and Apple.

These companies implement their systems to solve the generalized problem of recommending something (whatever it may be) to the user. If this sounds ambiguous, it’s because it is. The specifics of a recommendation system often rely on the problem being solved, and differ from one use case to the next. Netflix, as an example, recommends movies which might appeal to the user. Amazon may do that as well, but it would also recommend other products related to the movie. A baseball might be displayed when looking at the movie “Field of Dreams”, as an example.

Some recommendations are dynamic while others are static. One is not necessarily better than the other, but it is useful to understand what sets them apart.

Dynamic Recommendation

Google search uses a Dynamic Recommendation system, as do Netflix, Amazon, and Spotify. These systems collect data generated by a user as they search for items or when they make a purchase. Essentially these companies are building profiles of each user. The profiles factor in prior transactions and behavior of the user and become more refined over time and usage.  These profiles can then be compared to similar profiles of other users, which allows for recommendations which are increasingly relevant.

For example, recently I was researching Apache Spark on Google. As I began to type the letters ‘ho’, Google’s search auto-completion feature provided relevant phrases which begin with those letters:

Google search: recommendations based on a user’s profile and history

As you likely know, Hortonworks is a company focused on the development of related Apache projects, such as Hadoop. Google understands the topic I’m likely interested in via my search history, and from that it offers up relevant search options related to my prior search on Apache Spark.

Following that search, I later decided to look up a recipe for Eggs Benedict. Then I typed the same ‘ho’ letters again. Now, based upon that search for Eggs Benedict, Google’s auto-completion offered new suggestions to complete my sentence:

Contextual recommendations

Google’s system is dynamic in the sense that the user’s profile is evolving as they continue to use the product. Therefore, the recommendation evolves to suit the newest relevant information.

Static Recommendations

On the other hand, the system employed by Apple’s predictive text can be described as largely static. Apple’s system can process user behavior and history; however, it does not use these (to a large extent) to influence its recommendations.

For example, observe the following stream of messages and the Predictive Text output:

Trying to get Siri’s attention

Unlike the example from Google search earlier, it seems as if Apple’s iOS Predictive Text does not completely base recommendations on user history. I say “completely”, because Predictive Text actually suggested ‘Siri’ after I had typed ‘Hi Siri’ twice, but then it reverted to a generic array of predictions after I sent the third request.

It is extremely important to note here that Predictive Text is in no way worse than Google’s search suggestions. They are both trying to solve completely different problems.

Google Search

What Google Search is offering is a way to improve the search experience for users by opening them to new, yet related, options. After looking up that recipe for Eggs Benedict, I was presented with recipes for home fries, poached eggs, hollandaise sauce, and more. This kind of system, building on the user’s cues and profile, makes perfect sense.

Predictive Text

The goal of Predictive Text is to provide rapid, relevant, and coherent sentence construction. Many individuals use abbreviations, slang, improper grammar, and unknown words when texting. To train a system to propagate language like that would lead to a broken system.

The user can be unreliable – they might enter “soz” instead of the proper “sorry”. We wouldn’t want a predictive text system to mimic these bad habits. Instead, the predictive text algorithm should offer properly spelled options, and it should employ proper grammar when it predictively completes phrases.

The User’s Behavior Can Be Misleading

For the sake of this blog, imagine a user who has been creating pie charts with her data. Time and time again, she visualizes her data with pie charts.  Does that mean that our visualization engine should always present her with visualizations as pie charts?  Absolutely not.  What our user needs is an engine which will examine her data, and then suggest the best method to visualize the data, regardless of past behavior.

Just because someone has used pie charts for earlier sets of data, it would not follow that they should always use pie charts for any and all data sets.

In other words, the past behavior of the user and her apparent love of pie charts should not be the determining factor as to what type of visualization should be used. Instead, we’ll use static recommendations based upon the data in question, and then employ the best visualization to present that data.

The Item-User Feature Matrix

It’s a mouthful, but it’s an important concept. Let’s back up a bit.

As mentioned earlier, a common way to produce recommendations is to compare the tastes of one user to other users. Let’s say User Allison is most similar to User Zubin. The system will then determine the items that Zubin liked the most which Allison has yet to see.  The system would then recommend those. The issue with this approach for our use case is that there is no community of users from which profiles can be compared.

Alternatively, recommendations can be made on the basis of comparisons between items themselves. Let’s say Allison loves a specific item; in this case, she loves peaches. Along with other fruits, peaches are given their own profile, through which they are quantifiably characterized across several ‘features’. These features could include taste, sweetness, skin type, nutrition facts, and the like.

As far as fruits are concerned, nectarines are similar to peaches. The most significant difference is the skin type: peaches have fuzz, while nectarines have smooth skin. Since Allison likes peaches, she would probably like nectarines as well. Therefore the system would display nectarines to Allison.
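As a minimal sketch of that comparison (the feature scores below are made up), each fruit gets a profile vector, and similarity between profiles can be measured:

```python
# Made-up 0-to-1 feature scores; "fuzz" is how fuzzy the skin is.
import math

ITEMS = {
    "peach":     {"sweetness": 0.9, "tartness": 0.3, "fuzz": 0.9},
    "nectarine": {"sweetness": 0.9, "tartness": 0.3, "fuzz": 0.1},
    "lemon":     {"sweetness": 0.1, "tartness": 1.0, "fuzz": 0.4},
}

def similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two item profiles."""
    keys = sorted(a)
    dot = sum(a[k] * b[k] for k in keys)
    return dot / (math.hypot(*(a[k] for k in keys)) *
                  math.hypot(*(b[k] for k in keys)))

liked = ITEMS["peach"]
ranked = sorted((f for f in ITEMS if f != "peach"),
                key=lambda f: similarity(liked, ITEMS[f]), reverse=True)
print(ranked)  # ['nectarine', 'lemon'] -- nectarine is the closer match
```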

Recommendations of this type work for more than fruit. Think about movies, as an example. While most people enjoy a good movie, “good” is relative to the viewer. Someone who loves “Star Wars” will likely enjoy “Star Trek”. But they may not like the film “A Star is Born”. So, on what would the system base its movie suggestions? The word “star” helps, but it isn’t enough.

Enter the Matrix

Example of an Item Feature Matrix

The figure above is called an item feature matrix, in which each item offered is characterized along several different features. This is closer to what we want, but it’s still not perfect. We can’t base our recommendations on what the user likes, since the user may not be right. We must incorporate another dimension.

Example of a User Feature Matrix

The above matrix is called a user feature matrix, as it depicts the preferences of each user along the same features as the items.

Combining the two concepts, we have two matrices, one for characterizing the user and one for characterizing the items. When combined, these are considered the item-user feature matrix.

At Cinchapi, where I work, we don’t characterize the user’s preferences, but we do leverage their data within the ConcourseDB database. Further, we don’t characterize items by the number of characters, action scenes, length, and rating, but by a series of data characteristics relating to data types, variable types, uniqueness, and more.

This provides a framework to quantifiably determine the similarity between the user’s data and possible visualizations. This is the aspect of the Cinchapi Data Platform which we call the DataCharacterizer. As the name implies, it serves to define the user’s data across some set of characteristics. But how do we characterize the items, which in the CDP’s case are the actual visualizations? We do so by employing a heuristic.

Heuristics

Considering the case of Predictive Text, there is some core ‘rulebook’ from which recommendations originate. For a language predictor in general, this may be in the form of an expression graph or a Markov model. When the vertices are words, a connection then represents a logical next word in a sentence, and each edge is weighted by a certain probability or likelihood.

Example of an Expression Graph

This could explain why repeatedly tapping one of the three Predictive Text suggestions on an iOS device produces something like this as a result of a cycle in the graph:

Nonsense-cycle from Predictive Text Suggestions
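A tiny Markov-style predictor makes the cycle concrete. The graph below is invented, but greedily taking the top-weighted suggestion walks the same loop that produces the word salad above:

```python
# Tiny Markov-style next-word predictor. Each word maps to weighted
# candidate next words; greedily taking the top suggestion can get stuck
# in a cycle, which is exactly what repeated tapping produces.
GRAPH = {
    "i":    {"am": 0.6, "will": 0.4},
    "am":   {"not": 0.6, "here": 0.4},
    "not":  {"sure": 0.7, "going": 0.3},
    "sure": {"i": 0.8, "that": 0.2},
}

def top_suggestion(word: str) -> str:
    nxt = GRAPH.get(word)
    return max(nxt, key=nxt.get) if nxt else "."

words = ["i"]
for _ in range(8):          # keep tapping the first suggestion
    words.append(top_suggestion(words[-1]))
print(" ".join(words))      # i am not sure i am not sure i
```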

That word salad isn’t really going to do much for us, even if it is possible to read it. Moving to our need – a visualization engine – we’re not looking to complete a sentence. There is no visualization ‘rulebook’ on which a model can be trained, at least not of a size or magnitude that would produce meaningful results.

This is where the heuristic process comes into action. Loosely defined, a heuristic is an approximation. More formally, it is an algorithm designed to find an approximate solution when an exact solution cannot be found.

This formed the basis of my recommendation system, and resolved the problem of having incomplete or unreliable data from which to learn. I developed a table, where the rows represented the same features as in the matrices above, and the columns represented different visualizations. Each visualization was then characterized based on the types of data that it would best represent.

Presently we call this aspect of the Cinchapi Data Platform a HeuristicTable.  For each potential visualization, the HeuristicTable holds pre-defined, static characterizations across the same set of characteristics as the user’s data.

Putting the Pieces Together

Much of the system is composed of these components. I’m only providing a 30,000-foot view of the DataCharacterizer. In short, it measures a series of characteristics of the user’s data, namely the percentage of Strings, Numbers, and Booleans. It also factors in whether or not there are linkages between entries, whether or not the data is bijective, the uniqueness of values, and the number of unique values (dichotomous, nominal, or continuous).
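As a rough sketch of what such a characterizer might compute for a single column of data (the real DataCharacterizer tracks a richer feature set than this):

```python
# Rough sketch: reduce one column of user data to a feature vector of the
# kind described above. The real DataCharacterizer is richer than this.
def characterize(values: list) -> list:
    n = len(values)
    pct_string = sum(isinstance(v, str) for v in values) / n
    pct_number = sum(isinstance(v, (int, float)) and not isinstance(v, bool)
                     for v in values) / n
    pct_bool = sum(isinstance(v, bool) for v in values) / n
    uniqueness = len(set(values)) / n   # 1.0 means every value is unique
    dichotomous = 1.0 if len(set(values)) == 2 else 0.0
    return [pct_string, pct_number, pct_bool, uniqueness, dichotomous]

print(characterize(["GA", "NY", "GA", "TX"]))
# [1.0, 0.0, 0.0, 0.75, 0.0] -- all strings, mostly non-unique, not binary
```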

Treating a particular characterization as a vector, a cosine similarity function is executed on the user’s data and each column of the HeuristicTable.  This in turn  measures the similarity between two vectors on a scale from zero to one.

From this point, it’s a matter of sorting the results in descending order of similarity and the recommendation set is ready.
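Putting the pieces together, the ranking step might look like the sketch below. The table values are invented, and the feature rows are assumed to line up with the characterization vector above:

```python
# Score the user's data vector against each (hypothetical) column of the
# heuristic table with cosine similarity, then sort descending.
import math

HEURISTIC_TABLE = {   # one pre-defined profile per visualization
    "bar chart":    [1.0, 0.0, 0.0, 0.3, 0.0],
    "scatter plot": [0.0, 1.0, 0.0, 0.9, 0.0],
    "pie chart":    [1.0, 0.0, 0.0, 0.2, 0.0],
}

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def recommend(data_vector: list) -> list:
    scores = {viz: cosine(data_vector, profile)
              for viz, profile in HEURISTIC_TABLE.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(recommend([1.0, 0.0, 0.0, 0.75, 0.0]))
# bar chart and pie chart outrank scatter plot for this data
```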

Below is an overview of the system’s design:

Cinchapi Data Platform Visualization Recommendation System

Closing Thoughts

Recommendation systems come in all shapes and sizes. Although the problems seem similar from a 30,000-foot view, each use case requires a unique solution to deliver the best experience for users.

This was how I built a recommendation system for visualizations from unreliable data, and I hope it inspires some new ideas.

To see an example of how Cinchapi’s visualizations from data actually work, there is a 60-second video which shows how visualizations can uncover relationships.