Selecting the right technology is an important part of any information initiative. In this talk, Andrea Zachary and Louis Spinelli provide four major areas of consideration for selecting and integrating semantic tools into your enterprise.
Transcript
[Louis Spinelli] Hi, I hope everybody is enjoying Information Architecture Day 2021. My name is Louis Spinelli. I’m an information architect and user researcher based in Seattle.
[Andrea Zachary] Hello, my name is Andrea Zachary. I’m also an information architect. I’ve been in IA for about 15 years and, for the past five, I’ve been focusing on data and information modeling. Louis and I have both worked as data and information modeling consultants in fields such as retail, manufacturing, and finance. As consultants, we are tool-agnostic, and this presentation covers the types of questions you and your team might consider when purchasing semantic tools. In this presentation, we assume general familiarity with semantic concepts but if you do have questions we'll have a Q&A at the end.
[Louis] Great. Why might your organization be considering a new semantic technology? It could be because you're starting out an initiative focused on knowledge graphs. It could be that you're going to adopt some new artificial intelligence. Document classification, enterprise search, content management are all really common reasons why people are shopping around for semantic technologies. GDPR compliance is a new one that's popped up the last few years. One thing that all of these initiatives share is that for an organization to fully benefit from them, they need to have some fundamentals in place.
[Andrea] Right, and, as Louis said, the high-level initiatives might change, but those IA fundamentals you see in the right-hand column are basic. They stay the same and can be applied to any tool.
Once we get into an organization and start trying to answer questions that Louis had, like GDPR compliance or are we AI-ready, we start looking at the data and the information architecture environment, and what we typically see is that there's no single, complete, coherent enterprise-wide view of that data. There's always data silos, there's always different data repositories, and each data repository has some information about a lot of the same things. So if we can shake those different data silos out, we see that the data is about product, customers, employees, stores, and locations. In enterprises you need to get all of that data together in order to understand what those concepts mean.
So what if we could extract all of these concepts and aggregate that into a semantic layer? This might be what it looks like. In that model in the semantic layer is where we create definitions and attributes for those concepts. We can connect them with relationships, but that depends on the tool, which we'll talk about later. Then that semantic layer---that points to where the different pieces of data about those concepts can be found. Basically, semantics means that meaning is captured in a separate area from the data and the other files.
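One way to picture that semantic layer is as a small registry that holds each concept's meaning and points to where its data lives. This is only a minimal Python sketch under stated assumptions: the definitions and repository names (`crm.customers`, `sales_db.orders`, `operations_db.stores`) are hypothetical and do not come from the talk or from any specific tool.

```python
# Hypothetical semantic layer: each concept carries its meaning plus
# pointers to the repositories that hold data about it.
# All definitions and repository names here are illustrative assumptions.
SEMANTIC_LAYER = {
    "Customer": {
        "definition": "A person who shops at one of our stores.",
        "attributes": ["customer type"],
        "sources": ["crm.customers", "sales_db.orders"],
    },
    "Store": {
        "definition": "A physical retail site.",
        "attributes": ["capacity"],
        "sources": ["operations_db.stores"],
    },
}


def where_is(concept):
    """Return the repositories that hold data about a concept."""
    return SEMANTIC_LAYER[concept]["sources"]


def concepts_in(repository):
    """Return the concepts that a given repository holds data about."""
    return sorted(
        name for name, entry in SEMANTIC_LAYER.items() if repository in entry["sources"]
    )


print(where_is("Customer"))
print(concepts_in("operations_db.stores"))
```

The meaning (definition and attributes) sits in one place, while the data stays in its original repositories.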
[Louis] I think the big story here, too, is that, you know, the overall data quality within an organization is improved. When you have a situation like this, the product team is using the same concept as the marketing team, the sales team, operations, all across the organization. And, it's less work for each individual team because they're able to go and reuse work that's already been developed. So, each team is not managing their own separate definitions. They're not having to pull in attributes. They can go to a central place and find what they're looking for.
One thing that I hadn't expected, but I've heard from a lot of business analysts, is that the ability to tell stories from a business analyst’s perspective is really improved when they know that the concepts are shared by different teams and they're not having to tape together and do a lot of transforms, or pulling things out of different repositories and duct taping things together.
Now, I know this is kind of abstract, so let's think about this business model from the bottom up. Andrea and I have been talking about starting an information architecture store where we'd sell taxonomies off the shelf. Our first store would be located in Pike Place Market in the town of Seattle. Our first customers would be Amelia and Benjamin. Now, what would this look like as a semantic model?
[Andrea] What we can do is put a model in WebProtégé that might look something like this and basically what it does is even make the concrete basic business model a little more abstract. So we've taken Amelia and Benjamin and created a type or a class for them called “customer.” Then the same thing for our stores and location. Then we've also connected those concepts with two simple relationships. That's basically what it would look like as an ontology.
[Louis] Nice. Well, let's take a look at this in an even more abstract form. We still have those same three concepts---“location”, “store”, and “customer”; and our same relationships---“locatedIn” and “shopsAt”.
Let's add a couple more concepts and a few more relationships. Now we've added “employee” and “product” and the relationships of “worksAt”, “stocks”, and “buys”, so we can already answer a new question. If we want to know what employees work at a specific location, like Seattle, we could say, “What employees work at stores that are located in Seattle?” That seems pretty intuitive.
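As a rough illustration of that question, the model can be sketched as a set of subject-relationship-object triples and queried by following the relationships. This is a minimal Python sketch, not how WebProtégé or any other tool stores a model. The relationship names follow the talk; the employee names (Priya, Marcus, Dana) are hypothetical, since the talk names no employees.

```python
# Toy triple store: each fact is (subject, relationship, object).
# Relationship names follow the talk; the employees Priya, Marcus, and
# Dana are illustrative assumptions.
TRIPLES = {
    ("Pike Place Market Store", "locatedIn", "Seattle"),
    ("Spokane Valley Store", "locatedIn", "Spokane"),
    ("Priya", "worksAt", "Pike Place Market Store"),
    ("Marcus", "worksAt", "Pike Place Market Store"),
    ("Dana", "worksAt", "Spokane Valley Store"),
    ("Amelia", "shopsAt", "Pike Place Market Store"),
}


def subjects(triples, relationship, obj):
    """Return every subject linked to obj by the given relationship."""
    return {s for (s, p, o) in triples if p == relationship and o == obj}


def employees_in(triples, location):
    """Employees who work at any store located in the given location."""
    stores = subjects(triples, "locatedIn", location)
    return sorted(e for store in stores for e in subjects(triples, "worksAt", store))


print(employees_in(TRIPLES, "Seattle"))  # ['Marcus', 'Priya']
```

The query is two hops: find the stores located in Seattle, then find who works at those stores.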
[Andrea] Yes, and this model, it shows you exactly those five concepts that we talked about earlier. But one of the benefits from using semantic models is being able to connect it to external data. So it would be easy for us to integrate an external data source called GeoNames and we could take data from our locations and integrate that into our semantic model.
Or, if there's even something completely unexpected like a global pandemic or COVID-19, we can add that concept and make a relationship between here’s how COVID affects the general locations. This should show you how easy it can be to grab external data and to pivot to meet unexpected or new environmental conditions.
[Louis] The flexibility that these types of models enable is so important in the modern world. Going back to that same question, “What employees work at a location?” Just by adding one relationship and one concept of COVID-19 phase information---and maybe even a couple attributes like store capacity---we could say, “What employees work at a store at a specific capacity level that are located somewhere under a COVID-19 phase?” That would be important for getting PPE [personal protective equipment] to the right employees in a place.
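To make that extended question concrete, here is a small Python sketch that adds one relationship (location to COVID-19 phase) and one attribute (store capacity) and then filters on both. Everything in the data is an assumption for illustration: the employee names, the phase numbers, and the Pike Place capacity are invented, and only the concept names come from the talk.

```python
# Hypothetical data: employee names, phase numbers, and the Pike Place
# capacity are invented for illustration.
STORES = {
    "Pike Place Market Store": {"locatedIn": "Seattle", "capacity": 50},
    "Spokane Valley Store": {"locatedIn": "Spokane", "capacity": 100},
}
WORKS_AT = {"Priya": "Pike Place Market Store", "Dana": "Spokane Valley Store"}

# The one new relationship: location -> COVID-19 phase.
COVID_PHASE = {"Seattle": 2, "Spokane": 3}


def employees_needing_ppe(min_capacity, phase):
    """Employees at stores of at least min_capacity in a location under the given phase."""
    return sorted(
        employee
        for employee, store in WORKS_AT.items()
        if STORES[store]["capacity"] >= min_capacity
        and COVID_PHASE.get(STORES[store]["locatedIn"]) == phase
    )


print(employees_needing_ppe(min_capacity=100, phase=3))  # ['Dana']
```

Nothing in the original store and employee data had to change; the new concept was attached to the existing locations.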
Now that we've kind of gone over the basics of modeling, let's jump into the technology and talk about those considerations.
[Andrea] Sure. For an organization to develop, maintain, and actually benefit from using a semantic model, it has to be integrated both with people---the folks who are going to use it to input and extract data from that [model]---and the tools that you're going to use to make that happen. In general, we're just going to talk about those kind of things. We're not going to cover a legion of other considerations, including documentation, costs, security, etc., but if you have questions, try to ask those in the Q&A, but we're going to keep it tightly scoped.
[Louis] Great. We're going to use three example technologies to illustrate our points throughout the presentation.
One of them is going to be WebProtégé. WebProtégé is a collaborative ontology development tool primarily intended for ontology editing and viewing.
The second one, PoolParty, is a semantic middleware platform that supports building and managing enterprise knowledge graphs in a number of different use cases.
The third [Neo4j] is a graph database known as a labeled property graph. These are very different technologies and we've chosen them to illustrate our points---and the variety in the field of what’s available.
[Andrea] What capabilities do you need to consider when looking at tools? In general, we identified four different areas. Basically, you need to think about:
[1] The semantic modeling tool---where are you going to manage your model? Where you're going to put it.
[2] What standards does your business use case require?
[3] Also, how are you going to support interaction between the model and the different groups of users, whether or not those are IAs [information architects] or data consumers. And then, of course,
[4] Technical integration with the systems as well.
[Louis] So jumping into semantic modeling... We have four bullets to run through.
[1] Collaboration ability---More and more people expect to be able to collaborate on a document, on a spreadsheet, and no differently on a model. That collaboration ability and what a tool does to support it is important.
[2] The skill set of a modeler---if you look within your organization, you might have modelers with a specific skill set. You might have modelers that would require a certain level of training depending on the tool that you are adopting.
[3] Workflow support for governance and data stewardship---When you're developing a model, people within the organization are going to want to submit different ideas, new concepts, request relationships, etc. Even something simple in a tool, like a queue for looking at those [new requests for] relationships or those new concepts, is really helpful.
[4] Lastly, we have modeling framework.
Let's just go ahead and jump into WebProtégé and talk about that within the tool. So here we are in WebProtégé and we can see it has a tabbed interface. The three tabs we're going to look at today are classes, properties, and individuals. Classes are what we were just talking about with customer, location, and store.
Properties are broken down into two areas---relationships and attributes. The relationships we've included are “locatedIn” and “shopsAt.” Attributes we've added are “capacity”, “customer type”, and “population”. There's the ultimate flexibility, right? Whatever your company is developing, you can model it in these tools.
Lastly, we have individuals. Individuals are what we're applying our model to. Let's take a closer look at the customer type individuals. We have 10 customers. These 10 customers, these names might look familiar if you have a young child. They're the 10 most popular baby names in Washington state in 2019.
[Andrea] These are the customers that we're focusing on, because who else besides babies and ontologists are obsessed with meaning?
[Louis] I hope they have a little bit of money to support our stores but maybe they'll ask their parents.
[Andrea] We have to start young.
[Louis] So when we're building our model, here's an interface where we can start adding information to Amelia, the individual. We could say Amelia is a customer type. She's a new customer, very young. She shops at Pike Place Market store.
We can also see a bit of the collaborative ability in WebProtégé where I've made a comment and then Andrea has made a comment too about this being an example we’ll use.
Now jumping out of WebProtégé and into PoolParty. I've taken a couple screen captures. In PoolParty, we can still model our classes, our relationships, and our attributes. They also have another thing which they call custom schemes. Custom schemes are a way that we can subset out our larger model into smaller subsets based on different use cases that we want to work with.
Today, I've broken out a smaller use case for our presentation, which includes three classes of customer, location, and store. Those relationships that we've been talking about and these attributes. I found custom schemes to be really helpful in large organizations when they have a big master ontology and then they want to have different use cases supported by different subsets of the model.
[Andrea] Another consideration that you should think about is standards. Standards are the key to interoperability. This prevents you from being locked into one tool or one vendor. The standards are going to affect what kind of tool that you're going to choose.
For example, if you need to model in OWL, you might select Protégé. If you need [to model in] SKOS, you might select another tool, like Smartlogic.
Regardless, you should definitely be aware of the existing W3C standards. These are the bedrock and the foundation of the semantic stack, so there's no reason to reinvent the wheel. Using these will guarantee that you can import and export your model from one tool to another.
[Louis] Now, support for model users is another big one. The first bullet here:
[1] Technology that supports using a semantic model. This might seem like table stakes, but actually tools like WebProtégé are excellent for building a model but it's not a graph database that's designed to support interaction with that model and support those use cases. Tools like PoolParty include both a modeling interface and a graph database embedded in the tool.
[2] The next bullet here, support for direct interaction with a model sounds a little bit abstract but that's when you're developing a model, how can you share it with business users within your organization? How can you show it to people to socialize it? That's going to be really important. We're going to look at a couple different examples of that in tools.
[3] Workflow---workflow support for use cases. If your organization has invested all of this money into developing a model, how are you going to use it? How are you going to support those use cases that you're developing it for?
Let's go ahead and jump back into WebProtégé. Here you can see where we've been developing the Amelia individual. Let's go ahead and look at what that looks like from a business use case.
WebProtégé has designed this great visualization feature. Now if you're sharing your model around, you can show people what it looks like. Amelia shops at Pike Place Market, which is located in Seattle. You can also see that Seattle's a location class. Pike Place Market is a store and Amelia is a customer.
This might not seem big but this [ontology model] is a lot harder to show people and impress them than this [graph visualization]. This is a lot more intuitive for a lot of people that are visual thinkers.
Now, we've included Neo4j as an example, and you might be wondering how will a database support direct interaction with a model. That might seem a little bit strange but Neo4j has developed this great application Bloom to support that direct interaction.
We're going to jump into our model in Neo4j and we're going to query---a no code query---where we can say, “Let's look at the customers that shop at a store,” and let's hit enter here and see what that looks like. It pops right up and I can attest that Neo4j is super fast.
Here we can see that Noah, Charlotte, and Emma shop at a Spokane Valley Mall location of our store. Let's actually go see if Spokane has more stores than just that. Let's see... Stores that are located in the location and hit enter. Now we can see that we've added a different node in a relationship. Let's zoom in a little bit and we can see that Spokane has two stores---the Spokane Valley store and a Riverside store. I think Riverside's downtown. Let's just click in there and we can see it has a capacity of 50. A Spokane Valley store has a capacity of 100 so it's a bit bigger but it's not downtown so that probably makes sense for us. When people have that taxonomy rush, they want to come in and buy all the stuff. Even though Neo4j is a graph database it supports direct interaction with the model in a really interesting and useful way.
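The pattern behind those two no-code searches can be sketched as a tiny labeled property graph: nodes carry a label and properties, and a search matches "label, relationship, label". This is a plain Python sketch for illustration, not Cypher and not how Neo4j or Bloom work internally. The node names and capacities are the ones mentioned in the demo; the `Node` class and `match` function are hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # e.g. "Customer", "Store", "Location"
    name: str
    properties: dict = field(default_factory=dict)


# Nodes and capacity values as described in the demo.
NODES = {
    n.name: n
    for n in [
        Node("Location", "Spokane"),
        Node("Store", "Spokane Valley", {"capacity": 100}),
        Node("Store", "Riverside", {"capacity": 50}),
        Node("Customer", "Noah"),
        Node("Customer", "Charlotte"),
        Node("Customer", "Emma"),
    ]
}

# Edges are (source name, relationship type, target name).
EDGES = [
    ("Noah", "shopsAt", "Spokane Valley"),
    ("Charlotte", "shopsAt", "Spokane Valley"),
    ("Emma", "shopsAt", "Spokane Valley"),
    ("Spokane Valley", "locatedIn", "Spokane"),
    ("Riverside", "locatedIn", "Spokane"),
]


def match(source_label, relationship, target_label):
    """Return (source, target) node pairs matching label -relationship-> label."""
    return [
        (NODES[s], NODES[t])
        for s, rel, t in EDGES
        if rel == relationship
        and NODES[s].label == source_label
        and NODES[t].label == target_label
    ]


# "Stores that are located in the location"
for store, location in match("Store", "locatedIn", "Location"):
    print(store.name, store.properties["capacity"], location.name)
```

Calling `match("Customer", "shopsAt", "Store")` corresponds to the first search in the demo, "customers that shop at a store".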
Jumping back into our deck now… We talked a little bit about supporting use cases. Once your organization has done all of this work developing a model, how are they going to use it? PoolParty has developed support for a number of different use cases, document classification being one of them. If you want to classify documents, first, you're going to need to have a model of your different classifications for what you'd want to apply to a model. You'd also want to have an algorithm that could look at those documents and say, “This is this type or that type of document,” and apply those tags.
To be really successful, you're going to need to integrate your content management system with whatever system is housing your algorithm and your model, and PoolParty provides that connectivity. As an information architect, Andrea or I could go in and set up some training sets, set up our algorithm, connect our content to our model, and apply those tags in an automated fashion.
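As a very rough picture of "a model plus an algorithm that applies tags," the sketch below tags a document with every concept whose labels appear in its text. This is a deliberately simple stand-in, plain label matching in Python, and not PoolParty's classifier or API, which the talk describes as using training sets. The concepts and labels in `MODEL` are hypothetical.

```python
import re

# Hypothetical classification model: concept -> labels that signal it.
MODEL = {
    "Product": ["taxonomy", "taxonomies", "product"],
    "Store": ["store", "pike place market"],
    "Customer": ["customer", "shopper"],
}


def tag_document(text, model=MODEL):
    """Return the concepts whose labels appear as whole words or phrases in text."""
    lowered = text.lower()
    tags = []
    for concept, labels in model.items():
        # \b keeps "store" from matching inside longer words such as "restore".
        if any(re.search(rf"\b{re.escape(label)}\b", lowered) for label in labels):
            tags.append(concept)
    return tags


print(tag_document("New taxonomies arrive at the Pike Place Market store on Friday."))
# ['Product', 'Store']
```

A trained classifier would replace `tag_document`, but the shape stays the same: documents go in, concepts from the model come out as tags.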
When you're looking at tools this is one of those things---if you want to start benefiting from them sooner than later---start looking at how the different use cases are supported at that high level. This also gets a little bit into technical integration. Integration with specific systems is important so it could be a content management system. It could even be something like PowerBI and Tableau, when we're talking about use cases, if it's an analytics project.
The second bullet here, integration with business processes. That could be something like Slack or JIRA. We want to have a way that people using the model could submit different concepts that they would want managed within the model. Then there's export flexibility. That really goes back to what Andrea was talking about with standards. If you want to develop a model in a certain tool, you want to look at how that tool can export it and use it in other systems. You can develop a model in SKOS and OWL in PoolParty and you can also export it into a different format like the Neo4j format and use it there. That's something to really keep a close eye on when you're using these tools.
Andrea, we've talked a little bit about adding a semantic layer. Can you talk us through what this would look like with a semantic technology in place?
[Andrea] Absolutely. If you remember our concept of location, let's take a specific instance of that. Specifically, how about Washington state? That's just an abstract concept---it’s a perimeter line on a geographical terrain; it's an abstract concept. It has a name.
We can ingest external data from other places, such as ISO standards, W3C standards. We can pull additional metadata or data from Wikipedia. DBpedia is the database behind that. Or from another external ontology, GeoNames. We can take existing data. We can integrate that into our semantic model of Washington that you see there. We have that model in the SKOS format and all of that is living in a semantic layer.
Those data repositories down there can search and extract the information they require from that semantic layer. Another thing you might see is that the semantic layer is focused on meaning. If you check that last box [i.e., scope note], you can see that the concept for Washington state, even though it has the same label, doesn't have the same meaning as Washington, D.C., so the meanings of those two concepts are separate even though they're called the same thing. If those downstream systems want information about Washington state, they're able to extract that and not get the information from Washington, D.C.
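The same-label, different-meaning point can be sketched in a few lines of Python. The field names loosely follow SKOS (`skos:prefLabel`, `skos:scopeNote`), but this is not a SKOS serialization, and the URIs and scope note wording are hypothetical examples rather than the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str         # unique identifier; the URIs below are hypothetical
    pref_label: str  # human-readable label, loosely after skos:prefLabel
    scope_note: str  # what the concept means, loosely after skos:scopeNote


CONCEPTS = [
    Concept(
        "https://example.org/location/washington-state",
        "Washington",
        "State in the Pacific Northwest of the United States.",
    ),
    Concept(
        "https://example.org/location/washington-dc",
        "Washington",
        "Capital city of the United States.",
    ),
]


def find_by_label(label):
    """A label alone may match several concepts."""
    return [c for c in CONCEPTS if c.pref_label == label]


def get_by_uri(uri):
    """An identifier picks out exactly one concept, so downstream systems use it."""
    return next(c for c in CONCEPTS if c.uri == uri)


print(len(find_by_label("Washington")))  # 2
print(get_by_uri("https://example.org/location/washington-state").scope_note)
```

A downstream system that asks for the state by its identifier never receives the data for Washington, D.C., even though both concepts share a label.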
[Louis] I love this. I love the idea that now you can have a technology that supports product [teams or repositories] using one term to describe the concept and operations [teams or repositories] using a different term to describe the same concept. It makes intuitive sense for me and seeing it work within organizations really is an amazing thing.
So we've reviewed these four different capabilities today. These are really important for semantic technologies. They aren't the only considerations for technologies; as Andrea mentioned, cost, documentation, and what technologies are already in use within an organization are really important. But those are important for every technology, so we didn't include them in our talk today.
I know that this is definitely kind of an abstract area for a lot of people. When I think about this stuff, I think about moving companies. I think everybody here has moved one time in their life or another. If you're a moving company and you're moving apartments, people that live in apartments all of the time and you're working in downtown Seattle, you might just have a fleet of 12-foot trucks that can go down alleys and go around dumpsters and pull right into loading docks. They can fit everything that they need to fit from somebody's apartment and get them where they need to go.
However, if you're in Issaquah, and you're moving people with four- to five-bedroom houses, you might want 26-foot trucks. Nobody, you know, looks at a moving company in downtown Seattle and says, “Hey, I think your trucks are too small.” Our trucks are 26-foot trucks, right?
Technology is no different here. You really want to be looking at the capabilities that your organization needs to meet your goals and to fit with your use cases and select those right technologies. If you're a large organization, you might need to have a couple different technologies to meet your needs. Andrea, do you have any last thoughts?
[Andrea] Yes. I really like that truck moving metaphor because I think about information and moving it in a container and it depends on the size and what needs you might have. And, as you pointed out, the semantic modeling tools have improved. W3C standards have been codified and database technology is finally catching up with graph theory.
It reminds me that 20 years ago, Tim Berners-Lee published the original article, “The Semantic Web,” in the Scientific American---those things have changed over 20 years but the approach to information architecture hasn't. That's typically what we find where data modeling or knowledge graph integration starts to get stymied when the approach to information architecture isn't consistent.
So we hope that you can use this information to help you ask the right questions and determine the right semantic tools for you and your use cases.
[Louis] I love that as a last thought. Questions?