Monthly Archives: June 2010

SPML and DSML search filters not so hard

One issue that has been raised in regards to SPML is search filters. SPML allows searches that optionally specify a starting point (in terms of an SPML container), a subset of data to return, and a search filter. In the DSML Profile, the search filter is naturally a DSML filter.

DSML filters can be arbitrarily complex, just like the LDAP filters they model. For instances a DSML filter could be something like “get everyone with the last name of smith”, or in DSML:

<filter xmlns=’urn:oasis:names:tc:DSML:2:0:core’>

<substrings name=’Name’><final>Smith</final></substrings>

</filter>

Or it could be “get everyone with last name smith not located in Orlando”:

<filter xmlns=’urn:oasis:names:tc:DSML:2:0:core’>

<and>

<substrings name=’Name’><final>Smith</final></substrings>

<not>

<substrings name=’Office’>

<final>Orlando</final>

</substrings>

</not>

</and>

</filter>

Now if your back end data is stored in LDAP, then this is pretty easy to handle. Just convert to an LDAP filter and do any attribute name mappings required. If you backend data is SQL, it just slightly more difficult to translate the DSML filter into a SQL query clause.

But what if your back end data store doesn’t support a query mechanism? What if the data is in a flat file, or a NOSQL DB? What if the data is only accessible through an API that doesn’t allow for filtering?

There are several ways to solve that problem, but the easiest is to recursively walk the DSML filter and create a decision tree where each node determines if a given instance passes the part of the filter it knows.  The code for this is pretty simple in .NET and I posted an example here. Note that this example is just a partial implementation of the SPML search request for the purposes of demonstrating this concept. It is not a full featured implementation of SPML.

The basic idea is that an abstract data provider would return a dictionary of the attribute values for each entry in the data. The interface could look like (in C#):

public interface IUnfilteredDataProvider {

List <DSMLData> DoUnfilteredSearch();

}

In this example the sample data provider reads entries from a flat file. On each search request the filter is recursively read and turned into nodes in a decision tree. Each data entry is then passed to the decision tree and if it passes the filter it is appended to the returned results:

List<DSMLData> dataList = this._unfilteredProvider.DoUnfilteredSearch();

DataFilter df = GetDataFilter(searchRequest);

List<PSOType> returnPSOs = new List<PSOType>();

// return only those entries that pass the filter

foreach (DSMLData data in dataList) {

if (df.Pass(data)) {

returnPSOs.Add(data.GetPSO());

}

}

The GetDataFilter method walks the DSML filter and constructs a decision tree (feel free to download the sample and look at the code for more details). No special meaning given to any of the attributes returned by the provider. They are all just treated as DSML attributes. Of course you will note a potential scalability issue with large data sets, but there are several tricks that can be used to minimize that (thoughts for a later post).

Oh, and this approach works great for creating a DSML service as well and the general concept would be just as easy to implement in Java.

So what does all this mean? Supporting filtered searches in an SPML or DSML service is really not that hard, even if your data is stored in a data store that does not support filtering.

Advertisements

Federated Provisioning

Nishant Kaushik has a great (and funny) slide deck on federated provisioning on his blog. He discusses some distinctions between two flavors of federated provisioning, the Just-in-time (JIT) and what he terms advanced provisioning (often referred to as bulk provisioning).

I would like to clarify a couple of points in his presentation, however. He talks about a possible SAML profile of SPML for JIT provisioning. There was already an effort (which I lead) to define a SAML profile of SPML in Project Liberty (most of the work has already been done if anyone wants to revive it). But this was not for JIT provisioning as there is really no need for SPML when doing JIT provisioning. JIT provisioning can be done by SAML alone (or OpenID+other stuff). Rather the SAML profile of SPML was intended for advanced (bulk) provisioning. While the DSML profile could be used for advanced provisioning the Liberty TEG felt that using the SAML attributes assertions as the provisioning data structure was a better fit for advance provisioning accounts that would later be used in a SAML sign-on.

Me, I see it six one way, half dozen the other.

Another point the Nishant brought up is the need for the equivalent of the SAML attribute query in SPML. That already exists in the form of the SPML Search query which supports getting a single user record and asking for a subset of attributes.

When discussion whether JIT or advanced provisioning is appropriate, the points that are usually brought up are auditing, de-provisioning, and billing. But Nishant overlooks the most important point:

Do the identities have any meaning outside of authentication?

If the answer for your service is yes, than JIT provisioning is likely not an option.This is not a case of “tightly coupled” versus “loosely coupled”. Rather it is a matter of business requirements.

Take my current employer CareMedic (now part of Ingenix). We have a suite of medical revenue cycle management web apps that we host as SaaS. We need to know the identities of our customer users for the purposes of workflow before the user actually logs into the system.

Of course there are plenty of apps where the business requirements make JIT provisioning ideal. But it still comes down to the business requirements, not the technical architecture or standards.