Features of Patent Software and Databases

From Vinodksingh

Excel

It became apparent during the preparation of this article that Excel forms the basis of many of the discussed packages’ visualisation output options, and in its own right can be identified as a well-established visualisation tool.

Of course, an advantage of presenting information in this way is the relatively low cost, as well as existing user familiarity with Excel charting and tabulation presentations, with no requirement for individual specialised visualisation packages.

There are also a number of plug-ins and add-ins for Excel which can be used to display chemical structure diagrams, and the program allows further analysis and sorting of information by a variety of criteria. An additional advantage is that data from a range of resources, both internal and external, can be merged. For example, patent data found from online searches has been combined with data from internally accessed patent gene sequence resources, an approach used successfully within AstraZeneca to present collated data.

BizInt Smart Charts for Patents

The BizInt Smart Charts [4] tools have been long-standing favourites for me, and the Smart Charts for Patents package has become an established part of our repertoire at AstraZeneca for reporting patent and chemical information.

The tool has a lot of flexibility in application to a wide range of data resources, in terms of the number of host systems and data file formats that can be imported and manipulated.

Colleagues agree that the tool is particularly straightforward to use for presenting chemical patent information and is good for producing simple, clear reports. Comments can be added as an additional column, and key elements can be highlighted by, e.g., changes in ranking.

In Smart Charts, the retention of links to records and the preservation of sub-table layouts as retrieved, for example, from patent abstract databases on STN, is superior to Excel, and it is easier to display images, including structure diagrams, in Smart Charts and to update the charts on a regular basis with new data.

Data can be exported from a number of resources containing patent and chemical information, such as MicroPatent® and IDdb, as well as the online patent abstract databases, with easy combination of data elements into charts with nested sorting options. Simple sorting by common data elements, such as the basic patent number or priority date, allows easy grouping and duplicate removal for simple presentation of large amounts of patent information from these diverse resources.
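The grouping and duplicate removal described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how Smart Charts works internally; the field names (`basic_patent`, `priority_date`, `source`), the `combine` helper and the sample records are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records exported from two different resources.
records = [
    {"basic_patent": "WO2004000001", "priority_date": "2003-06-12", "source": "host A"},
    {"basic_patent": "WO2003000002", "priority_date": "2002-01-30", "source": "host A"},
    {"basic_patent": "WO2004000001", "priority_date": "2003-06-12", "source": "host B"},
]


def combine(records):
    """Group records by basic patent number and keep one row per group,
    noting every source that reported it; rows are ordered by priority date."""
    by_number = sorted(records, key=itemgetter("basic_patent"))
    merged = []
    for number, group in groupby(by_number, key=itemgetter("basic_patent")):
        group = list(group)
        merged.append({
            "basic_patent": number,
            "priority_date": min(r["priority_date"] for r in group),
            "sources": sorted({r["source"] for r in group}),
        })
    # ISO-format dates sort correctly as plain strings.
    return sorted(merged, key=itemgetter("priority_date", "basic_patent"))


chart = combine(records)
```

Keeping the list of sources on each merged row, rather than silently dropping the duplicates, makes it possible to see which resources reported a given patent family.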

VantagePoint

In 2004, BizInt Solutions introduced the option of integration with VantagePoint [5], which provides additional data visualisation possibilities. The integration works both ways: data can be exported from Smart Charts for analysis in VantagePoint, and data from VantagePoint can be exported to Smart Charts to create various reports.

VantagePoint itself provides a number of visualisation options, ranging from a matrix presentation of data, through correlation maps with connections between data points and drill-down options to view the original records, to export to Excel for production of spreadsheets and charts. The package is applicable to patent outputs from a variety of hosts and databases, and the “import engine” is customisable to deal with data from other resources. Data clean-up tools are available, but as the volume of records grows, the subsequently generated maps can be very difficult to interpret, and users tend to resort to Excel outputs for simpler presentations.

Overall, VantagePoint’s visualisation features are interesting, but this is one of a number of tools described in this article which have a relatively high cost per user, with an additional annual subscription which is not insignificant, especially as the intention is for the tool to sit with each user so the scope of the analyses can be explored—report outputs do not allow all of the analytical features to be shared.

Derwent Analytics

When launched, Derwent Analytics℠ [6] was exclusively based on Derwent World Patent Index data, but developments now allow data from Delphion to be processed.

The List clean-up option is a unique enhancement of the basic VantagePoint tool that also underlies Derwent Analytics℠.

The different maps generated give representations with increasing levels of complexity, so they may need in-depth analysis to glean the key results.

I am told that the key to effective application of a tool of this type is to have a large number (3000–5000) of abstracts to analyze, so it is an advantage to have an “Open Access” agreement for effectively unlimited download of Derwent WPI data. If this is not part of your company’s information strategy, it can be a significant restriction: the costs are likely to be prohibitive, because the inherent “fuzziness” of a large dataset makes it inevitable that a major proportion of the downloaded full records will ultimately be discarded.

MicroPatent®

MicroPatent® worksheets are seen to have a useful range of output options, particularly for endusers, but it is felt that significant manual effort is often required to produce the desired worksheets. This feature is therefore perhaps not so straightforward for the occasional user, which inhibits people from reaching the endpoint of visualising the data.

However, if the users persist, the export to Excel, with e.g. claims text included, is a popular option. The outputs can then be merged with data from other resources, such as records from in-house patent sequence searches, allowing enhanced reports with tables and charts to be produced.

As mentioned previously, the option to export to BizInt Smart Charts is also appreciated by information professionals, for the presentation of large amounts of data in a simple format without the need for extensive preparation.

Aureka

Aureka, now an integral part of Thomson Scientific’s MicroPatent® suite, has many in-depth analysis and visualisation features, and the product is strongly targeted towards IP portfolio management and analysis.

This is one of the products that use contour map displays for data visualisation. In this sort of presentation, one must be clear about what conclusions are to be drawn from the position and proximity of data points, particularly as the peak labels may not be consistent or particularly helpful in deciding what the peaks actually represent.

IT problems made evaluation of this tool problematical in early 2004, so our licence was not pursued.

SciFinder®

SciFinder® is a key tool with data visualisation capabilities aimed at the end user, in contrast to some of the others described here, which are more likely to be used by information professionals.

Analyze and Refine commands enable data to be processed through to various categorisation options and displayed through the Panorama feature, which is similar to the cross-tabulation displays in other tools. Getting the best out of the Panorama tool is not always intuitive, but once grasped, it can provide an excellent overview of the data presented. Especially attractive is the retention of links from the tabulated data to the original results and through to full-text.

Of course, full access to these valuable features comes at a not insignificant cost, if the product is provided to the desktop of all potential users under an annual subscription model.

STN Express®

STN Express® with Discover!™ Analysis Edition [10] has features for the information professional which resemble those available through SciFinder® for the enduser. For example, the Cross-Tab outputs are obvious parallel developments to the Panorama options seen in SciFinder®. As with the SciFinder® Panorama output, we can display a cross-tab representation of data that has previously been grouped by company name, and drill down from the totals cell to the records in question.
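The idea of a cross-tab with drill-down can be illustrated with a short Python sketch. It is not a representation of the STN Express® or SciFinder® implementations; the record fields (`id`, `company`, `year`), the helper names and the sample data are hypothetical. Each cell simply keeps the identifiers of the records behind its count, so "drilling down" means reading that list back.

```python
from collections import defaultdict

# Hypothetical records, already grouped under a cleaned-up company name.
records = [
    {"id": "rec1", "company": "Company A", "year": 2003},
    {"id": "rec2", "company": "Company A", "year": 2004},
    {"id": "rec3", "company": "Company B", "year": 2004},
    {"id": "rec4", "company": "Company A", "year": 2004},
]


def cross_tab(records, row_field, col_field):
    """Map each (row, column) cell to the list of record ids behind it."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[row_field], rec[col_field])].append(rec["id"])
    return cells


def row_total(cells, row):
    """Collect every record id in one row, across all columns."""
    return [rid for (r, _), ids in cells.items() if r == row for rid in ids]


cells = cross_tab(records, "company", "year")

# The number displayed in a cell is the length of its list ...
count = len(cells[("Company A", 2004)])
# ... and drilling down from a totals cell returns the records themselves.
total_records = row_total(cells, "Company A")
```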

The Report and Table Tools are seen as particularly useful by patent information specialists, for reporting in a consistent format, which gives a common view to patent attorneys across the organisation, no matter at which site they are based.

The Data Grouping Tool is valuable to prepare the data set by automatically clustering similar terms, but this feature can also be customised to your own settings.
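As a rough illustration of what automatic clustering of similar terms involves, the sketch below groups name variants by string similarity using Python's standard `difflib` module. This is an assumed, simplified approach and not the method used by the Data Grouping Tool; the `normalise` and `group_terms` helpers, the similarity threshold and the company names are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name and drop punctuation so trivial variants match."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def group_terms(terms, threshold=0.8):
    """Greedy grouping: a term joins the first group whose representative
    (its first member) is similar enough, otherwise it starts a new group."""
    groups = []
    for term in terms:
        for group in groups:
            ratio = SequenceMatcher(None, normalise(term), normalise(group[0])).ratio()
            if ratio >= threshold:
                group.append(term)
                break
        else:
            # No existing group was close enough.
            groups.append([term])
    return groups


# Hypothetical name variants.
names = ["Acme Pharma Ltd", "ACME PHARMA LTD.", "Acme Pharma Limited", "Borealis Chemicals"]
groups = group_terms(names)
```

The threshold plays the role of a custom setting: lowering it merges more aggressively, at the risk of combining names that belong to different organisations.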

The STN Express® Analyze Plus Wizard allows the familiar ANALYZE and TABULATE commands to be used to analyze almost any field, and further refinements can be applied to the selection; it is particularly advantageous to be able to include multiple STN-hosted database outputs.

The Variable Group Analysis tool for R-group analysis is particularly attractive, and has obvious utility in summarising potentially extensive datasets in just the way a chemist requires. STN Express® appears to be one of the few tools that provide this sort of analysis on chemical information from patent data from external databases. Of course, several structure–activity relationship analysis tools exist, and can be applied to chemical data from a variety of sources, but these do not appear to be routinely linked directly to external patent information sources.

Again, outputs in Excel chart format are relatively easily generated, which can then be captured for presentations or reports, useful for supplying to internal customers who require the summary rather than access to the detailed analysis process.

One downside is that, although we are able to use STN Express® as communication software to access other hosts like DataStar and Questel.Orbit, outputs from these hosts cannot be processed in quite the same way through the Tools, as the latter are optimised for STN datasets.

Once a data set is processed it is not easy to update, in contrast to BizInt Smart Charts, so to add data or to get a different view, we often have to start again from scratch.

A recent development is the integration of the latest version of STN Express® with a new visualisation package STN® AnaVist™.

RefViz™

I wanted to mention RefViz™ briefly, as its features do show another view on data which could be used on the textual and bibliographic elements of patent abstracts, although it is currently targeted at specific literature data sets; we have already found it to work with a wider range of data sets than those explicitly suggested by the suppliers.

With RefViz™, bibliographic records held within personal databases in Reference Manager, EndNote, or ProCite can be analyzed and presented in two key views. The Galaxy View groups references conceptually, while the Matrix View groups references according to terms discussed together.

The views are quite intuitive and give different insights on the data sets concerned to suit different user requirements.

OmniViz®

OmniViz® forms the basis of the RefViz™ visualisation components. In its “native” form, the software can integrate data from multiple sources, and has additional visualisation options and specific features directed to analysis. It was last evaluated some time ago within AstraZeneca, from the point of view of both information professionals and chemists, as it is able to analyze diverse data types and integrate scientific and patent data, but was not progressed due to the costs (per concurrent user and annually).

Vivísimo

Vivísimo is a tool with potential to be applied to textual patent data, to perhaps provide simpler clustering and categorisation views on the data, rather than more advanced visualisation. Vivísimo technology also provides the advanced text clustering features in Aureka.

From their website, you can very quickly produce an hierarchical listing in “real-time”, with named clusters and hit terms highlighted, linking through to original abstracts for easy review. It seems relatively simple for endusers to drill down within the descriptive folder names to the articles, and as this package can be applied to other datasets defined by the user, this could, in principle, be applied to patent abstracts or full-text documents.

Anacubis™

When I put the original presentation together back in January 2005, I wanted to highlight anacubis™, as I had seen a number of demonstrations which I found impressive, but I heard soon after that the parent company had withdrawn development of this tool within the patents arena.

A key application area was seen to be identifying the relationships between patents through citations and other connections, in order to explore licensing offerings—the intellectual property concerned is often several generations beyond the original patents, so it is important to know who really owns what.

So the sort of visualisation provided by anacubis™ is a little different from the others we have seen so far, and has potential in linking patent information with other business and company data—a shame if it disappears completely.

Other recent additions

Seen at IPI-ConfEx 2005, PatAnalyst and PatentExaminer are two products with similar profiles with regard to their interfaces, datasets and functionality, being based on the EPOQUE package used by EPO patent examiners, made available for commercial development as of last year.

Both have excellent visualisation and viewer functionality, with colour coding to reflect the frequency of term occurrences and their proximity to other terms—a really easy way to see highly relevant sections. These will need some usage to evaluate in comparison to each other and the existing range of tools available.

STN® AnaVist™

Most recently, STN® AnaVist™, launched in July 2005, is a new visualisation component which works with STN Express® to give information professionals a variety of ways to analyze and view information found in scientific literature and patents.

Capabilities available in STN® AnaVist™ include a workspace displaying a default of four data visualisations, dynamically integrated, including cluster and contour maps, histograms, and co-occurrence matrices. A key feature is the customisable data grouping feature to combine disparate entries for, e.g., company or inventor names. Our usage of this tool is limited so far, but the product appears to have more flexible reporting options than some of the other tools previously reviewed.
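For readers unfamiliar with co-occurrence matrices, the following Python sketch shows the underlying calculation in its simplest form: counting how many records contain each pair of terms. It is a generic illustration and makes no claim about how STN® AnaVist™ computes or displays its matrices; the `co_occurrence` helper and the index terms are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records, each tagged with a set of index terms.
records = [
    {"kinase", "inhibitor", "oncology"},
    {"kinase", "inhibitor"},
    {"inhibitor", "oncology"},
]


def co_occurrence(records):
    """Count, for every unordered pair of terms, the records containing both."""
    pairs = Counter()
    for terms in records:
        # Sorting gives each pair a single canonical (a, b) ordering.
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = co_occurrence(records)
```

Each key of `pairs` corresponds to one off-diagonal cell of the matrix, and a higher count indicates terms that tend to be discussed together.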

