DBIS: Publications

TR 136: Information Extraction from the Web with Florid. Technical Report 136, Institut für Informatik, Universität Freiburg, March 2000. Information Extraction from the Web. Wolfgang May, Georg Lausen.

The goal of information extraction from the Web is to provide an integrated view on data from autonomous, heterogeneous information sources. The main problem with current wrapper/mediator approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. Additionally, most current approaches are restricted to accessing information from a fixed set of sources. On the other hand, generic Web querying approaches are restricted to purely syntactic and structural queries and do not deal with semantic issues.

In this paper, we discuss an integrated architecture for Web exploration, wrapping, mediation, and querying. Our system is based on a unified framework - i.e., data model and language - in which all tasks are performed. We regard the Web and its contents as a unit, represented in an object-oriented data model: the Web structure, given by its hyperlinks, the parse-trees of Web pages, and its contents are all included in the internal world model of the system. The advantage of this unified view is that the same data manipulation and querying language can be used for the Web structure and the application-level model. The model is complemented by a rule-based object-oriented language which is extended by Web access capabilities and structured document analysis. Thus, accessing Web pages, wrapping, mediating, and querying information can be done using the same language.

This integration also allows for data-driven Web exploration which is independent of a given network of individual predefined wrappers and mediators. Thus, in addition to the classical wrapper and mediator functionality, a system with this architecture can be equipped with Web navigation and exploration functionality. Queries to existing Web indexing and searching engines can also be integrated.

In particular, we present a methodology for reusing generic rule patterns for typical extraction, integration, and restructuring tasks using this framework. In an abstract sense, the system contains a universal wrapper, which can be applied to arbitrary Web pages that the system learns about during information processing. Equipped with suitably intelligent rules, the system can potentially explore initially unknown parts of the Web, thus coping with the steady growth of the Web.

We show the practicability of our approach by using the FLORID system. The approach is illustrated by two case-studies.

[ps-File]

Excerpts of this work have been published in

Modeling and Querying Structure and Contents of the Web, International Workshop on Internet Data Management (IDM'99), Florence, Sept. 2, 1999,
A Unified Framework for Wrapping, Mediating and Restructuring Information from the Web, International Workshop on the World-Wide Web and Conceptual Modeling (WWWCM'99), Paris, Nov. 15-18, 1999,
An Integrated Architecture for Exploring, Wrapping, Mediating and Restructuring Information from the Web, Australasian Database Conference (ADC 2000), Canberra, Jan. 31 -Feb. 3, 2000,
Slides have been presented at Dagstuhl-Seminar "Declarative Data Access on the Web", Sept. 12-17, 1999, Schloss Dagstuhl, Germany.
The MONDIAL Case Study describes a practical application.

A journal version appeared in Information Systems, 2004.

ADC 2000: An Integrated Architecture for Exploring, Wrapping, Mediating and Restructuring Information from the Web. Wolfgang May. Australasian Database Conference (ADC 2000), Canberra, Australia, Jan. 31 - Feb. 3, 2000. Australian Computer Science Communications, Vol. 2, No. 2, IEEE CS Press, pp. 82-89.

The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources. A main problem with current wrapper/mediator approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. Additionally, most current approaches are tailored to accessing information from a fixed set of sources. In this paper, we discuss an architecture where Web exploration, wrapping, mediation, and querying are done in an integrated system. Such an architecture reveals significant advantages in combination with a unified framework - i.e., data model and language - in which all tasks are done. Our approach is based on a unified model of the application-level information and the relevant fragment of the Web, and on an integrated language for accessing the Web, wrapping, mediating, and querying information. In contrast to other approaches, the relevant part of the Web becomes a part of the internal world model of the system. This allows for data-driven Web exploration which is independent of a given network of individual predefined wrappers and mediators. Thus, in addition to the classical wrapping and mediating functionality, a system with this architecture can be equipped with Web navigation and exploration functionality. In an abstract sense, the system comprises a universal wrapper which can be applied to arbitrary Web data sources that become known to the system during information processing. Equipped with suitably intelligent rules, the system can potentially explore previously unknown parts of the Web, thus coping with the steady growth of the Web. The architecture is implemented in the FLORID system.

[Slides]

IDB 2000: Handling XML with a Deductive Database System. Wolfgang May. Workshop Internet-Datenbanken, GI-Jahrestagung 2000, Berlin, Germany, Sept. 19, 2000. Published in Technical Report No. 12, Fak. f. Informatik, Univ. Magdeburg, 2000, pp. 31-45.

We propose an integration of XML with F-Logic, a deductive object-oriented database framework. The F-Logic data model is in fact a semistructured data model, exhibiting many similarities with the XML/DOM data model: there is a canonical mapping from XML to a fragment of F-Logic. The advantage of the integration is that the full expressiveness of F-Logic (and the functionality of the Florid system) can be applied to the data, providing an intuitive language for view definitions, updates, schema reasoning, Web exploration, integration (also with non-XML sources), etc. In particular, extended XML features such as XML Schema or XLink can easily be prototyped and evaluated.

[Postscript]

[Slides]

TR 149: XPath-Logic and XPathLog: A Logic-Based Approach for Declarative XML Data Manipulation. Wolfgang May. Technical Report 149, Institut für Informatik, Universität Freiburg, February 2001.

In this work, a logic-based framework for handling XML data is proposed. XPath-Logic embeds an extension of the XPath query language into first-order logic. We give a model-theoretic semantics of XPath-Logic formulas based on answer-sets. XPathLog is the Horn fragment of XPath-Logic, providing a logic-based language for manipulating and integrating XML data. Due to the close relationship with XPath, the semantics of rules is easy to grasp. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database. The formal semantics is defined with respect to a Herbrand structure which covers the XML data model. XPathLog has been implemented in LoPiX.

[ps-File]

Revised versions can be found in the Habilitation Thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

DBPL 01: A Rule-based Querying and Updating Language for XML. Wolfgang May. Workshop on Databases and Programming Languages (DBPL 2001), Frascati, Italy, Sept. 8-10, 2001. LNCS 2397, Springer, pp. 165-181. Keywords: XPathLog.

We present XPathLog as a Datalog-style extension to XPath. The querying part extends XPath by binding multiple variables to the XML nodes which are "traversed" when evaluating an XPath expression. Data manipulation is done in a rule-based way. In contrast to other approaches, the XPath-based syntax and semantics are also used for a declarative specification of how the database should be updated: XPath filters are interpreted as specifications of elements and properties which should be added to the database. In this paper, we focus on the theoretical aspects of XPathLog. XPathLog has been implemented in the LoPiX system.
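The idea of binding several variables along one path, and then using those bindings to add elements, can be approximated in plain Python. The following is only a rough sketch under stated assumptions: it does not use XPathLog syntax or the LoPiX system, the sample document and the `located_in` element are hypothetical, and `bindings` and `apply_rule` are illustrative helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the paper.
DOC = """
<mondial>
  <country code="D"><name>Germany</name>
    <city><name>Berlin</name></city>
    <city><name>Freiburg</name></city>
  </country>
  <country code="F"><name>France</name>
    <city><name>Paris</name></city>
  </country>
</mondial>
"""

def bindings(root):
    """Emulate a rule body: walk the path country/city and keep every
    node met on the way bound to a variable, not only the final one."""
    for country in root.findall("country"):      # variable Country
        for city in country.findall("city"):     # variable City
            yield {
                "Country": country,
                "City": city,
                "Code": country.get("code"),
                "CityName": city.findtext("name"),
            }

def apply_rule(root):
    """Emulate a rule head: for every binding, add a (hypothetical)
    located_in child to the bound city element."""
    for b in list(bindings(root)):
        located = ET.SubElement(b["City"], "located_in")
        located.text = b["Code"]

root = ET.fromstring(DOC)
answers = [(b["Code"], b["CityName"]) for b in bindings(root)]
apply_rule(root)
located = [city.findtext("located_in") for city in root.iter("city")]
print(answers)
print(located)
```

The nested loops make the bindings explicit; in a rule language these would be expressed declaratively in a single path expression.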

[Slides]

Extended versions can be found in the Habilitation thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

DIWeb 01: Integration of XML Data in XPathLog. Wolfgang May. CAiSE Workshop Data Integration over the Web (DIWeb'01), Interlaken, Switzerland, June 4/5, 2001. Published as Technical Report, Univ. Montpellier (LIRM), France, pp. 2-16. Keywords: XPathLog.

XPathLog is a logic-based language for manipulating and integrating XML data. It extends the XPath query language with Prolog-style variables. Due to the close relationship with XPath, the semantics of rules is easy to grasp. XPathLog defines a semantics for XPath expressions in rule heads, declaratively specifying how to create and update XML trees and nodes. In this paper, we show how XPathLog can be used to manipulate and restructure a database containing several XML trees. By linking subtrees, fusing elements and defining synonyms, data can be restructured and integrated into result trees. We illustrate the practicability of the approach by excerpts of a case study done with the LoPiX system.

[postscript]
[slides]

Extended versions can be found in the Habilitation thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

FMLDO 01: On an XML Data Model for Data Integration. Wolfgang May, Erik Behrends. Intl. Workshop on Foundations of Models and Languages for Data and Objects (FMLDO 2001), Roma, Italy, Sept. 16-18, 2001. Published as Technical Report, Dept. of Computer Science, Univ. Manchester, UK. Post-conference proceedings should have appeared with Springer LNCS. Keywords: XPathLog.

We consider the problem of integrating XML data using a warehouse strategy. In particular, we show that the DOM model and the XML Query Data Model are not suitable for data integration. We present a solution by a node-labeled graph-based data model, called XTreeGraph, for an internal XML database that represents multiple, overlapping XML trees, or tree views. The practicability of the approach is shown by a rule-based XML querying and manipulation language, implemented in the LoPiX system.

[postscript]
[pdf]
[Slides]

An extended version can be found in the Habilitation thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

W. May: Habilitation Thesis. XPath-Logic and XPathLog: A Logic-Based Approach to XML Data Manipulation. Wolfgang May. Habilitation Thesis, Institut für Informatik, Universität Freiburg, April 2001. http://user.informatik.uni-goettingen.de/~may/Habil/

IDEAS 01: XPathLog: A Declarative, Native XML Data Manipulation Language. Wolfgang May. International Database Engineering & Applications Workshop (IDEAS'01), Grenoble, France, July 16-18, 2001. IEEE Computer Society Press, pp. 123-128. Keywords: XPathLog.

XPathLog is a logic-based language for manipulating and integrating XML data. It extends the XPath query language with Prolog-style variables. Due to the close relationship with XPath, the semantics of rules is easy to grasp. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database. The formal semantics is defined with respect to a graph Herbrand structure which covers the XML tree data model. XPathLog has been implemented in LoPiX.

[Slides]

Extended versions can be found in the Habilitation thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

KRDB 01: A Framework for Generic Integration of XML Data Sources. Wolfgang May. International Workshop on Knowledge Representation meets Databases (KRDB 2001), Roma, Italy, Sept. 15, 2001. Published as Vol. 45, CEUR Workshop Proceedings, Technical University of Aachen (RWTH), pp. 73-82. Keywords: XPathLog.

We consider the situation where several XML sources that are assumed to contain complementary, overlapping contents have to be integrated. These overlaps have to be detected, and then appropriate operations have to be applied to the internal database to generate a result view. The approach uses the XPathLog language for formulating queries and updates of an XML database.

[postscript]
[pdf]
[Slides]

Further documentation can be found on the LoPiX project homepage.

VLDB01 Demo: LoPiX: A System for XML Data Integration and Manipulation. Wolfgang May. Intl. Conf. on Very Large Databases (VLDB), Demonstration Track, Rome, Italy, Sept. 11-14, 2001, p. 707. Keywords: XPathLog.

LoPiX is an implementation of XPathLog, an XML/XPath-native, rule-based programming language for manipulation and integration of XML documents. The main syntactical constructs are XPath expressions, extended with variables. Due to the close relationship with XPath, the semantics of rules is easy to grasp. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database. The LoPiX implementation provides an environment where XPathLog is complemented with schema information obtained from DTDs, a class concept, data-driven Web access, and export functionality. Binaries of LoPiX together with a detailed paper on XPathLog can be found here.

[Demo Poster]

Due to several enquiries, here is the LaTeX source code (using the a0poster package from CTAN and the dbicons.sty).

A detailed description of the XPathLog language can be found in the Habilitation Thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

Talk at Dagstuhl-Seminar 2061: Data Manipulation and Integration in XML. Wolfgang May. Dagstuhl-Seminar "Rule Markup Techniques", Schloss Dagstuhl, Germany, Feb. 3-8, 2002. Dagstuhl-Report No. 332.

XPathLog is a Logic-Programming style language for XML querying, data manipulation, and integration. It has been designed as a crossbreed between F-Logic (which has been successfully applied to semistructured data in pre-XML times) and XPath. Its main features are the extension of XPath with variable bindings and the definition of a constructive semantics for XPath atoms in rule heads. For updates and data integration, we favor a graph-based data model instead of the XML tree model: The XTreeGraph is an extension of the XML data model that allows multiple overlapping trees in a graph-like database. Result views are then defined as XML tree views over this internal database. XPathLog and the XTreeGraph have been implemented in the LoPiX system.

Post-Workshop Summary of the Talk: Due to many questions at the workshop, the talk had a second title, "From F-Logic to XPathLog". In addition to presenting XPathLog, it also describes the language's development as a reconciliation between F-Logic, a "proprietary" data model and language for knowledge representation and data integration, and the standards of the XML world.

The XPathLog language is a Datalog-like extension of XPath for querying, manipulating and integrating XML data. Based on navigation and filtering, its basic, internal semantics is closely related to F-Logic. The querying part extends XPath by binding variables to the XML nodes that are "traversed" when evaluating an XPath expression. Variables can be bound to literals, nodes, and even names, allowing for metadata reasoning. The variable bindings can be output as answers, or they can be communicated to the rule head for specifying updates in the database. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database.

The restriction that XML uses a tree data model directly affects the semantics of updates: if an update specifies that some subtree of a document should also be inserted at another place, the subtree must be copied. Thus, for references into this tree, it must be decided whether they point into the original or into the copy; "sharing" subtrees - as in graph data models such as OEM or F-Logic - is not possible. On the data integration level, when restructuring trees, fusing nodes, and introducing synonyms, this problem becomes even more pronounced.
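The copy problem can be made concrete with a small Python sketch. This is only an illustration on hypothetical sample data using the standard `xml.etree.ElementTree` module; it is not how LoPiX handles updates.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document and an initially empty result tree.
source = ET.fromstring("<org><dept id='d1'><head>Ann</head></dept></org>")
result = ET.Element("report")

dept = source.find("dept")

# In a tree model, a subtree that should also appear elsewhere is copied.
result.append(copy.deepcopy(dept))

# A later update to the original is not visible in the copy.
dept.find("head").text = "Bob"
original_head = source.findtext("dept/head")
copied_head = result.findtext("dept/head")

# A reference by id is now ambiguous: both nodes carry id 'd1'.
matches = [
    elem
    for tree in (source, result)
    for elem in tree.iter("dept")
    if elem.get("id") == "d1"
]
print(original_head, copied_head, len(matches))
```

After the update the two subtrees have diverged, and any reference to `d1` has to choose between the original and the copy.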

Thus, as a data manipulation and integration language, XPathLog - and the LoPiX implementation - internally use a graph-based, edge-labeled model, called XTreeGraph. The XTreeGraph extends the basic XML data model by modeling multiple overlapping trees, and thus allows for restructuring existing XML trees into a densely connected graph database. XML result trees are then defined as XML tree views by projections from this database.
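A toy version of an edge-labeled graph with overlapping trees and projected tree views might look as follows. This is a sketch under assumptions, not the XTreeGraph definition from the papers: the class `EdgeLabeledGraph`, its methods, and the sample nodes are hypothetical, and the projection assumes that the selected edges contain no cycles.

```python
from collections import defaultdict

class EdgeLabeledGraph:
    """Nodes are plain ids; each edge carries the element name under
    which the child appears below the parent."""

    def __init__(self):
        self.edges = defaultdict(list)   # parent id -> [(label, child id)]
        self.text = {}                   # node id -> text content

    def add_edge(self, parent, label, child):
        self.edges[parent].append((label, child))

    def project(self, node, label, allowed):
        """Serialize the tree view rooted at `node`, following only edges
        whose label is in `allowed` (assumed acyclic)."""
        inner = "".join(
            self.project(child, child_label, allowed)
            for child_label, child in self.edges[node]
            if child_label in allowed
        )
        return f"<{label}>{self.text.get(node, '')}{inner}</{label}>"

g = EdgeLabeledGraph()
g.text["n1"] = "Berlin"
g.add_edge("berlin", "name", "n1")
# The same node 'berlin' is a child in two overlapping trees.
g.add_edge("germany", "city", "berlin")
g.add_edge("org1", "seat", "berlin")

country_view = g.project("germany", "country", {"city", "name"})
org_view = g.project("org1", "organization", {"seat", "name"})
print(country_view)
print(org_view)
```

Because the node is shared rather than copied, a change to `g.text["n1"]` would show up in both projected views.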

Thus, the "pure" XPathLog language provides an intuitive Datalog-style extension of XPath for querying, data manipulation, and data integration which is very close to the standard syntax and semantics of XPath. Extended features of the language provide a class hierarchy (including nonmonotonic inheritance), a lightweight signature formalism (which is also used for defining tree views from the XTreeGraph), and data-driven Web access to extend the database with further XML documents. The expressiveness and flexibility of the full language makes it a candidate for combined handling of data, schema-metadata, and semantical metadata such as ontologies.

Slides: [postscript] [pdf]

A detailed description of the XPathLog language can be found in the Habilitation thesis and in Theory and Practice of Logic Programming, 2004.
Further documentation can be found on the LoPiX project homepage.

WebS'02: Linking the Semantic Web with Existing Sources. Wolfgang May. DEXA Workshop on Web Semantics (WebS'02), Aix-en-Provence, France, Sept. 3, 2002. Proc. DEXA 2002 Workshop, IEEE Computer Society Press, pp. 93-97.

The Semantic Web aims at providing Web data sources on a semantic level. On the other hand, most of the Web data itself is not suitably prepared (e.g., by annotations). In this paper, we describe a semantic layer that integrates existing data sources with the Semantic Web by combining semantic modeling with links that associate the semantic notions with actual data on the Web. The semantic level consists of specialized service providers - which can be seen as agents - for each application domain. Each agent contains ontological knowledge represented in XML where the links to the actual data are embedded as XPath expressions, similar to XLink. The agent uses its knowledge with an internal reasoning mechanism to combine the links for translating a Semantic Web query into a Web query that is then evaluated against the individual sources.

[Slides (postscript)]

[Slides (pdf)]

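Talk: SSD and CL. Datenmanipulation und -integration in XML (Data Manipulation and Integration in XML). Wolfgang May. DFG-Rundgespräch "Theorie und Anwendung semistrukturierter Daten im Schnittpunkt von Informatik und Computerlinguistik" (DFG round-table discussion "Theory and Application of Semistructured Data at the Intersection of Computer Science and Computational Linguistics"), Herrsching/Ammersee, Germany, Feb. 21/22, 2002.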

XPathLog is a Datalog-style extension of XPath that serves as a query, data manipulation, and data integration language for XML data. XPath is extended with variable bindings: the XML nodes traversed while evaluating an XPath expression can be bound to variables. Variables can be bound to literals and nodes as well as to names, so that queries can be posed at both the data and the schema level. Variable bindings can either be output as answers or passed on to the rule head in order to carry out changes to the database. In contrast to other approaches, XPath syntax and semantics are also used for the declarative specification of updates: XPath expressions in the rule head are given a constructive semantics by interpreting them as descriptions of the elements and attributes that are to be added to the database.

The restriction that XML is based on a tree-like data model has direct consequences for the semantics of update operations: if an update specifies that a subtree of an XML instance (obtained by a query) is to be inserted at a particular position, this subtree must be copied. This raises the question of whether references that point into the original subtree should be adjusted; it is not possible to include the same subtree a second time (as in graph-based models such as OEM or F-Logic). This problem becomes even more apparent for data integration tasks such as restructuring XML instances, fusing nodes from different trees, or handling synonyms.

For this reason, XPathLog - and its implementation in LoPiX - internally uses a graph-based, edge-labeled data model called XTreeGraph. The XTreeGraph extends the familiar XML data model so that multiple overlapping trees can be modeled, thereby supporting the update and integration operations sketched above. To integrate different XML trees, an internal XTreeGraph is created from the input trees, on which the result trees are defined as XML views by projection.

The "pure" XPathLog language thus offers an intuitive Datalog-style extension of XPath into a query, data manipulation, and data integration language whose syntax and semantics stay close to the well-known XPath standard. Language extensions additionally provide a class hierarchy (with nonmonotonic value inheritance), a simple concept for describing the database schema (which is used to define XML views over the XTreeGraph), and access to further data sources on the Web during the evaluation of a program. The flexibility and expressiveness of the full XPathLog language allow combined handling of data, schema data, and semantic metadata such as annotations and ontologies.

Slides: [postscript] [pdf]

TODS: Referential Integrity. Understanding the Global Semantics of Referential Actions using Logic Rules. Wolfgang May, Bertram Ludäscher. ACM Transactions on Database Systems (TODS) 27(4), pp. 343-397, December 2002. Keywords: Referential Integrity.

Referential actions are specialized triggers for automatically maintaining referential integrity in databases. While the local effects of referential actions can be grasped easily, it is far from obvious what the global semantics of a set of interacting referential actions should be. In particular, when using procedural execution models, ambiguities due to the execution ordering can occur. No global, declarative semantics of referential actions has been defined yet.

We show that the well-known logic programming semantics provide a natural global semantics of referential actions that is based on their local characterization: To capture the global meaning of a set RA of referential actions, we first define their abstract (but non-constructive) intended semantics. Next, we formalize RA as a logic program P_RA. The declarative, logic programming semantics of P_RA then provide the constructive, global semantics of the referential actions. So, we do not define a semantics for referential actions, but we show that there exists a unique natural semantics if one is ready to accept (i) the intuitive local semantics of local referential actions, (ii) the formalization of those and of the local "effect-propagating" rules, and (iii) the well-founded or stable model semantics from logic programming as "reasonable" global semantics for local rules.

We first focus on the subset of referential actions for deletions only. We prove the equivalence of the logic programming semantics and the abstract semantics via a game-theoretic characterization, which provides additional insight into the meaning of interacting referential actions. In this case a unique maximal admissible solution exists, computable by a PTIME algorithm.
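For intuition, a heavily simplified version of the deletion-only case can be sketched in Python. This is not the logic program P_RA or the game-theoretic characterization from the paper. It assumes only two actions, `CASCADE` and `RESTRICT`, and assumes that `RESTRICT` blocks a deletion whenever a referencing child exists; the function names and sample tuples are hypothetical. Under these assumptions each delete request can be checked on its own, so the admissible requests together form the largest admissible set, and the whole check runs in polynomial time.

```python
from collections import defaultdict

def cascade_closure(start, children):
    """All tuples deleted when `start` is deleted, following CASCADE references."""
    seen, stack = {start}, [start]
    while stack:
        parent = stack.pop()
        for child, action in children[parent]:
            if action == "CASCADE" and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def admissible_requests(requests, references):
    """references: iterable of (child, parent, action) triples with action
    in {'CASCADE', 'RESTRICT'}. A request is kept if no tuple in its
    cascade closure is referenced by a RESTRICT child."""
    children = defaultdict(list)
    for child, parent, action in references:
        children[parent].append((child, action))
    admissible = set()
    for request in requests:
        deleted = cascade_closure(request, children)
        blocked = any(
            action == "RESTRICT"
            for tup in deleted
            for _child, action in children[tup]
        )
        if not blocked:
            admissible.add(request)
    return admissible

# Hypothetical sample: deleting dept1 cascades to emp1, which proj1
# references with RESTRICT; deleting dept2 only cascades to emp2.
references = [
    ("emp1", "dept1", "CASCADE"),
    ("proj1", "emp1", "RESTRICT"),
    ("emp2", "dept2", "CASCADE"),
]
result = admissible_requests({"dept1", "dept2"}, references)
print(result)
```

Actions such as NO ACTION or SET NULL, and the interaction with modifications, are deliberately left out of this sketch.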

Second, we investigate the general case, i.e., including modifications. We show that in this case there can be multiple maximal admissible subsets and that all maximal admissible subsets can be characterized as 3-valued stable models of P_RA. We show that for a given set of user requests, in the presence of referential actions of the form ON UPDATE CASCADE, the admissibility check, the computation of the subsequent database state, and (for non-admissible updates) the derivation of debugging hints are all in PTIME. Thus, full referential actions can be implemented efficiently.

The paper is based on previous publications and reports:

Referential Actions: From Logical Semantics to Implementation, Bertram Ludäscher, Wolfgang May, EDBT'98 (6th Intl. Conference on Extending Database Technology),
Referential Actions as Logic Rules, Bertram Ludäscher, Wolfgang May, Georg Lausen, Proc. of 16th ACM Symposium on Principles of Database Systems (PODS'97).
Towards a Logical Semantics for Referential Actions in SQL, Bertram Ludäscher, Wolfgang May, Joachim Reinert, Proc. 6th Intl. Workshop on Foundations of Models and Languages for Data and Objects: Integrity in Databases, Dagstuhl, Germany, 1996.

WWW'02: Querying Linked XML Document Networks in the Web. Wolfgang May. 11th International World Wide Web Conference (WWW 2002), Honolulu, Hawaii, May 7-11, 2002. CDROM/online at http://www2002.org/CDROM/alternate/166/

The W3C XML Linking Language (XLink) provides a powerful means for interlinking XML documents all over the world. From the database (and in general, querying) point of view, elements with linking semantics as specified by XLink can be seen as embedded views in an XML instance. Compared with classical databases, i.e., SQL and relational data, the situation of having links inside the data is new and raises new issues for query languages: using such documents requires strategies for handling links. There is not yet an official proposal on how interlinked XML documents that use the XLink language interact with navigation and querying. We investigate a model where interlinked documents are regarded as virtual XML subtrees, i.e., XML views. We present several strategies for handling such subtrees, which differ in the point in time when the link is evaluated and in the evaluation and caching strategies. The evaluation strategies are influenced by the capabilities of the linked XML servers. So far, the approach is independent of the actual querying language. The approach is under implementation as an extension of LoPiX, a web-aware system for XML data manipulation.

[Slides (postscript)]

[Slides (pdf)]

BTW 2003: A Logical, Transparent Model for Querying Linked XML Documents. Wolfgang May, Dimitrio Malheiro. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW 2003), Leipzig, Germany, February 26-28, 2003.

The W3C XML Linking Language (XLink) provides a powerful means for interlinking XML documents all over the world. While the effects of browsing through linked XML documents are well-defined, there is not yet any proposal for how to handle interlinked XML documents that make use of the XLink language from the database point of view, i.e., considering the data model and navigation/querying aspects. From the database (and in general, querying) point of view, elements with linking semantics can be seen as virtual XML subtrees, i.e., XML views. Compared with classical databases, i.e., SQL and relational data, the situation of having links inside the data is new. We define a logical, transparent data model for linked documents. Queries are then formulated in standard XPath against the logical model. We propose additional attributes using the dbxlink (database-xlink) namespace for specifying the mapping from XLinks to the logical model.
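The notion of a transparent logical model can be illustrated with a small Python sketch that expands links into embedded subtrees before querying. Several things here are assumptions made for illustration: the documents live in an in-memory dictionary instead of on the Web, link targets are resolved by a simple `id` lookup rather than by XPointer, only one level of links is expanded, and the dbxlink attributes proposed in the paper are not modeled at all.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical documents standing in for remote XML sources.
DOCS = {
    "countries.xml": (
        '<countries xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<country name="Germany"><capital xlink:href="cities.xml#berlin"/></country>'
        "</countries>"
    ),
    "cities.xml": '<cities><city id="berlin"><name>Berlin</name></city></cities>',
}

def resolve(href):
    """Return the element addressed by 'document#id' (simplified lookup)."""
    doc_name, _, fragment = href.partition("#")
    root = ET.fromstring(DOCS[doc_name])
    for elem in root.iter():
        if elem.get("id") == fragment:
            return elem
    raise KeyError(href)

def logical_view(root):
    """Copy the tree and embed each link target below its link element,
    so that queries can navigate through links transparently."""
    view = copy.deepcopy(root)
    for elem in list(view.iter()):   # snapshot: one level of links only
        href = elem.attrib.pop(XLINK_HREF, None)
        if href is not None:
            elem.append(copy.deepcopy(resolve(href)))
    return view

physical = ET.fromstring(DOCS["countries.xml"])
view = logical_view(physical)
capital_name = view.findtext("country/capital/city/name")
print(capital_name)
```

The query path crosses the document boundary without mentioning the link; the physical document itself is left unchanged.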

[Slides (postscript)]
[Slides (pdf)]

A Logic-Based Approach to XML Data Integration with "lazy materialization". Wolfgang May. CoLogNET Workshop on Logic-based Methods for Information Integration, Vienna, Austria, August 23, 2003.

XPathLog is a Datalog-like extension of XPath for querying, manipulating and integrating XML data. The querying part extends XPath by binding variables to the XML nodes that are "traversed" when evaluating an XPath expression. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database. Special operations for data integration include additional cross links (subelement relationships and attributes) between fragments of the original database, declaring synonym relationships between notions from different sources, and XML element fusion.

As a data manipulation and integration language, XPathLog is originally based on a graph-based, edge-labeled model, called XTreeGraph. The XTreeGraph extends the basic XML data model by modeling multiple overlapping trees, and thus allows for restructuring existing XML trees into a densely connected graph database. XML result trees are then defined as XML tree views by projections from this database. The LoPiX implementation follows a "warehouse approach" where the integrated XTreeGraph is materialized.

In the present talk, a "lazy materialization" strategy is proposed that does not materialize the complete internal database and that is based on the original XML tree model combined with XLink: data items that are not (yet) changed are integrated into the internal database as references, represented by XLinks. The approach uses our recent proposal for a logical, transparent data model for XLinked data. Only if a referenced data item is actually modified in the course of the integration process is it (partially) loaded into the database; its unchanged fragments are still represented lazily by XLinks.

[Slides (postscript)]
[Slides (pdf)]

DB-Spektrum 2003: Data Integration. Datenintegration in XML - ein regelbasierter Ansatz (Data Integration in XML - a Rule-Based Approach). Wolfgang May. Datenbank-Spektrum, 6/3, pp. 23-32, June 2003.

This article describes the experience gained in the LoPiX project with the integration of XML data. XPathLog, the language used in this system, is a Datalog-style extension of XPath that allows changes to the database to be specified in this extended XPath syntax. Since such changes - especially in the course of data integration, such as restructuring data, fusing nodes, and introducing synonyms - are not possible on the XML data model, LoPiX is based on the XTreeGraph data model, which can manage not just one but several overlapping XML tree structures. As a result, XML trees can then be generated as projections of this graph (e.g., by means of DTDs).

Anfragen, Ändern und Publizieren von XML Web & Datenbanken (Kapitelübersicht) Erhard Rahm und Gottfried Vossen, editors dpunkt-Verlag ISBN 3-89864-189-9 65-100 January 2003 1 Anfragen, Ändern und Publizieren von XML Georg Lausen Wolfgang May XML

This chapter covers query languages, data manipulation, and transformation of XML data. It describes the development of the concepts and gives an introduction to the currently most popular languages, XPath and XQuery. XML query languages go back to the early XSL Patterns and XQL, from which XPath evolved as an addressing formalism that forms the basis for more powerful XML languages. The chapter describes the query language XML-QL as well as the XPath-based Quilt, from which XQuery subsequently emerged. In the meantime, concepts for data manipulation have also been proposed for these previously pure query languages, and they have already been implemented as language extensions to XQuery. Furthermore, the chapter describes the transformation of XML data, which ultimately forms a basis for presentation in HTML.

IS 2004: Information Integration Information Systems 29 1 59-91 January 2004 1 A Uniform Framework for Integration of Information from the Web Wolfgang May Georg Lausen

We discuss a system that implements an integrated framework for Web exploration, wrapping, data integration, and querying. Here, "integration" applies in three respects: the data model, the functionality, and the architecture. The core of the approach is a unified framework - i.e., data model and language - in which all tasks are performed. We regard the Web and its contents as a unit, represented in a semi-structured, object-oriented data model: the Web structure, given by its hyperlinks, the parse-trees of Web pages, and its contents are all included in the internal world model of the system. Additionally, the application-level model is immediately generated as an overlay of this source-level model. The model is complemented by a rule-based object-oriented language which is extended by Web access capabilities and structured document analysis. This language is implemented by a central reasoning engine. The advantage of our unified approach is that the same data manipulation and query language can be used for all tasks, i.e., accessing Web pages, wrapping, data integration, and querying information. Thus, these tasks are not necessarily separated, but can be closely intertwined. Additionally, by reusing the source-level model for generating the application-level model, there is no overhead for communication and mapping between different data formats. In particular, we present a methodology for reusing generic rule patterns for typical extraction, integration, and restructuring tasks. In an abstract sense, the system contains a universal wrapper, which can be applied to arbitrary Web pages that the system considers during information processing. Equipped with suitably intelligent rules, the system can potentially explore initially unknown parts of the Web, thus coping with the steady growth of the Web. We show the practicability of our approach by using the FLORID system.

TPLP 2004: XPathLog Theory and Practice of Logic Programming 4 3 499-526 December 2004 12 XPath-Logic and XPathLog: A Logic-Programming Style XML Data Manipulation Language Wolfgang May

We define XPathLog as a Datalog-style extension of XPath. XPathLog provides a clear, declarative language for querying and manipulating XML; its main prospective application is XML data integration.

In our characterization, the formal semantics is defined wrt. an edge-labeled graph-based model which covers the XML data model. We give a complete, logic-based characterization of XML data and the main language concept for XML, XPath. XPath-Logic extends the XPath language with variable bindings and embeds it into first-order logic. XPathLog is then the Horn fragment of XPath-Logic, providing a Datalog-style, rule-based language for querying and manipulating XML data. The model-theoretic semantics of XPath-Logic serves as the base of XPathLog as a logic-programming language; an equivalent answer-set semantics for evaluating XPathLog queries is also given. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database.

The paper is available online at arXiv.org.

The paper is an excerpt of the Habilitation Thesis A Logic-Based Approach to XML Data Integration (2001).
XPathLog has been implemented in the LoPiX project.

Preliminary results have been presented in

Intl. Conf. on Very Large Databases (VLDB) 2001 - Demonstration Track,
Workshop on Databases and Programming Languages (DBPL 2001),
Workshop on Foundations of Models and Languages for Data and Objects (FMLDO 2001).

W.May: Diploma Thesis Diploma Thesis Fakultät für Informatik Universität Karlsruhe February 1995 2 Protokollverifikation in Temporallogik: Evolving Algebras und ein Tableaukalkül Wolfgang May Dynamics Formal Methods Temporal Logics http://user.informatik.uni-goettingen.de/~may/Diplom/

Abstract FAPR'96 International Conference on Formal and Applied Practical Reasoning (FAPR'96) Bonn, Germany June 3-7 1996 6 3 LNCS 1085, Springer 399-413 A Tableau Calculus For First-Order Branching Time Logic Wolfgang May Peter H. Schmitt Temporal Logic Dynamics

Tableau-based proof systems have been designed for many logics extending classical first-order logic. This paper proposes a sound tableau calculus for temporal logics of the first-order CTL-family. Until now, a tableau calculus has only been presented for the propositional version of CTL. The calculus considered operates with prefixed formulas and may be regarded as an instance of a labelled deductive system. The prefixes allow an explicit partial description of states and paths of a potential Kripke counter model in the tableau. It is possible in particular to represent path segments of finite but arbitrary length which are needed to process reachability formulas. Furthermore, we show that by using prefixed formulas and explicit representation of paths it becomes possible to express and process fairness properties without having to resort to full CTL^*. The approach is suitable for use in interactive proof-systems.

[PS-File]
[Slides]

IWS: Referential Integrity 6th Intl. Workshop on Foundations of Models and Languages for Data and Objects (FMLDO'96) Dagstuhl, Germany September 16-20 1996 9 16 Proceedings: University of Magdeburg, Faculty of Computer Science, Preprint No. 4, 1996 57-72 Towards a Logical Semantics for Referential Actions in SQL Bertram Ludäscher Wolfgang May Joachim Reinert Referential Integrity

We investigate a logical semantics which unambiguously specifies the meaning of SQL-like referential actions of the form ON DELETE CASCADE and ON DELETE RESTRICT. The semantics is given by a translation of referential actions into logical rules. The proposed semantics is less restrictive than the standard SQL semantics, yet preserves all referential integrity constraints. First, a preliminary set of rules is introduced which rejects a whole set of user requests if a single request is rejected. Subsequently, a refined translation is presented using Statelog, a state-oriented Datalog extension in which active and deductive rules can be defined within a unified framework. We show that our semantics yields the maximal admissible subset of a given set of user requests. Apart from the Statelog formalization, a three-valued formalization based on the well-founded semantics and an equivalent game-theoretic specification are presented, which give further insight into the problem of ambiguity of triggers.
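The notion of a maximal admissible subset of delete requests can be illustrated with a small sketch. This is not the Statelog translation from the paper; it is a plain fixpoint iteration written for illustration, assuming that only deletions occur and that a RESTRICT reference blocks the deletion of a parent only if the referencing child itself survives. The tuple names and the `REFS` table are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical reference graph: (child, parent, action) means that
# `child` holds a foreign key to `parent` with the given ON DELETE action.
Reference = Tuple[str, str, str]

REFS: List[Reference] = [
    ("emp1", "dept1", "CASCADE"),
    ("emp2", "dept2", "CASCADE"),
    ("proj1", "emp1", "RESTRICT"),
]


def cascade_closure(requests: Set[str], refs: List[Reference]) -> Set[str]:
    """Tuples deleted when `requests` are executed and CASCADE edges are followed."""
    deleted = set(requests)
    changed = True
    while changed:
        changed = False
        for child, parent, action in refs:
            if action == "CASCADE" and parent in deleted and child not in deleted:
                deleted.add(child)
                changed = True
    return deleted


def maximal_admissible(requests: Set[str], refs: List[Reference]) -> Set[str]:
    """Largest subset of `requests` whose cascaded deletions violate no RESTRICT."""
    accepted = set(requests)
    while True:
        deleted = cascade_closure(accepted, refs)
        # A parent is blocked if a RESTRICT child would survive its deletion.
        blocked = {
            parent
            for child, parent, action in refs
            if action == "RESTRICT" and parent in deleted and child not in deleted
        }
        if not blocked:
            return accepted
        # Reject every request whose own cascade reaches a blocked tuple.
        accepted -= {r for r in accepted if cascade_closure({r}, refs) & blocked}
```

With the hypothetical data above, requesting the deletion of `dept1` and `dept2` keeps only `dept2`, because deleting `dept1` cascades to `emp1`, which is still referenced by `proj1`. If `proj1` is requested for deletion as well, all three requests are admissible.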

[PS-File]
[Slides of the talk]

Followup papers have been published in PODS 1997 and TODS 27(4), 2002.
An extended version containing these results can be found in Understanding the Global Semantics of Referential Actions using Logic Rules, Wolfgang May, Bertram Ludäscher, ACM Transactions on Database Systems (TODS), 27(4), 2002.

LID: Nested Transactions Intl. Workshop on Logic in Databases (LID'96) San Miniato, Italy July 1-3 1996 7 1 LNCS 1154, Springer 197-222 Nested Transactions in a Logical Language for Active Rules Bertram Ludäscher Wolfgang May Georg Lausen Statelog Logic Programming Active Databases Dynamics

We present a hierarchically structured transaction-oriented concept for a rule-based active database system. In previous work, we have proposed Statelog as a unified framework for active and deductive rules. Following the need for better structuring capabilities, we introduce procedures as a means to group semantically related rules and to encapsulate their behavior. In addition to executing elementary updates, procedures can be called, thereby defining (sub)transactions which may perform complex computations. A Statelog procedure is a set of ECA-style Datalog rules together with an import/export interface. System-immanent frame and procedure rules ensure both propagation of facts and processing of results of committed subtransactions. Thus, Statelog programs specify a nested transaction model which allows a much more structured and natural modeling of complex transactions than previous approaches. Two equivalent semantics for a Statelog program P are given: (i) a logic programming style semantics by a compilation into a logic program, and (ii) a model-theoretic Kripke-style semantics. While (ii) serves as a conceptual model of active rule behavior and allows reasoning about properties of the specified transactions, (i) -- together with the appropriate execution model -- yields an operational semantics and can be used as an implementation of P.

[PS-File]
[Slides of the talk]

Abstract DOOD'97 5th Intl. Conf. on Deductive and Object-Oriented Databases (DOOD'97) Montreux, Switzerland December 8-12 1997 12 8 LNCS 1341, Springer 320-336 Well-Founded Semantics for Deductive Object-Oriented Database Languages Wolfgang May Bertram Ludäscher Georg Lausen Well-Founded

We present a well-founded semantics for deductive object-oriented database languages by applying the alternating-fixpoint characterization of the well-founded model to them. In order to compute the state sequence, states are explicitly integrated by making them first-class citizens of the underlying language. The concept is applied to Florid, an implementation of F-Logic, previously supporting only inflationary negation. Using our approach, well-founded models of F-Logic programs can be computed. The method is also applicable to arbitrary languages which provide a sufficiently flexible syntax and semantics. Given an implementation of the underlying database language, any program given in this language can be evaluated wrt. the well-founded semantics.
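The alternating-fixpoint characterization mentioned above can be sketched for ground (propositional) normal logic programs. The sketch below is a generic textbook-style illustration, not the F-Logic encoding with explicit states described in the paper; the rule representation and the example program `PROGRAM` are assumptions made for this illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# A ground rule: (head, positive body atoms, negated body atoms).
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]


def gamma(rules: List[Rule], assumed_true: Set[str]) -> Set[str]:
    """Least model of the program after evaluating `not a` against `assumed_true`."""
    derived: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head not in derived and pos <= derived and not (neg & assumed_true):
                derived.add(head)
                changed = True
    return derived


def well_founded(rules: List[Rule]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the (true, false, undefined) atoms of the well-founded model."""
    atoms: Set[str] = set()
    for head, pos, neg in rules:
        atoms |= {head} | pos | neg
    true: Set[str] = set()
    while True:
        possible = gamma(rules, true)      # overestimate of the true atoms
        new_true = gamma(rules, possible)  # underestimate of the true atoms
        if new_true == true:
            break
        true = new_true
    return true, atoms - possible, possible - true


# Hypothetical example: a ground win-move game plus a mutually negative pair.
PROGRAM: List[Rule] = [
    ("win_b", frozenset(), frozenset({"win_c"})),  # b can move to c
    ("win_a", frozenset(), frozenset({"win_b"})),  # a can move to b
    ("win_b", frozenset(), frozenset({"win_a"})),  # b can move to a
    ("p", frozenset(), frozenset({"q"})),
    ("q", frozenset(), frozenset({"p"})),
]
```

For the hypothetical `PROGRAM`, `well_founded(PROGRAM)` classifies `win_b` as true, `win_a` and `win_c` as false, and leaves `p` and `q` undefined, since each of them depends negatively on the other.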

[PS-File]
[Extended set of slides]

Abstract FAPR'97 International Joint Conference on Qualitative and Quantitative Practical Reasoning (ECSQARU/FAPR '97) Bad Honnef, Germany June 9-12 1997 6 9 LNCS 1244, Springer 436-450 Process Modeling with Different Qualities of Knowledge Wolfgang May Knowledge Modeling Dynamics Formal Methods Temporal Logics

For modeling structured processes with different levels of atomic actions, classical linear or branching time Kripke structures with first-order states are insufficient: they do not provide any means for modeling independent parallel threads of activity which have to be joined at some point, action refinement, or procedure concepts. In all three cases, it is important to distinguish facts or knowledge derived in the current thread of activity from facts which are not affected by this activity. This problem is also closely related to the frame problem.

In this paper, hierarchical Kripke structures are introduced for modeling hierarchically structured processes, also coping with the different qualities of knowledge arising in this context: every state consists of a total first-order interpretation, which gives the state as-is, and a partial first-order interpretation containing all procedure knowledge which has been derived in the current thread of activity.

A correct and complete set of axioms is presented for reasoning about hierarchical Kripke structures.

[PS-File]
[Slides]

PODS: Referential Integrity 16th. ACM Symposium on Principles of Database Systems (PODS 97) Tucson, Arizona, USA May 12-14 1997 5 12 ACM Press 217-224 Referential Actions as Logical Rules Bertram Ludäscher Wolfgang May Georg Lausen Referential Integrity

Referential actions are specialized triggers used to automatically maintain referential integrity. While their local behavior can be grasped easily, it is far from clear what the combined effect of a set of referential actions, i.e., their global semantics should be. For example, different execution orders may lead to ambiguities in determining the final set of updates to be applied. To resolve these problems, we propose an abstract logical framework for rule-based maintenance of referential integrity: First, we identify desirable abstract properties like admissibility of updates which lead to a non-constructive global semantics of referential actions. We obtain a constructive definition by formalizing a set of referential actions RA as logical rules, and show that the declarative semantics of the resulting logic program P_RA captures the intended abstract semantics: The well-founded model of P_RA yields a unique set of updates, which is a safe, sceptical approximation of the set of all maximal admissible updates; the third truth-value undefined is assigned to all controversial updates. Finally, we show how to obtain a characterization of all maximal admissible subsets of a given set of updates using certain maximal stable models.

[PS-File]
[Slides of the talk]

An extended version can be found in

Understanding the Global Semantics of Referential Actions using Logic Rules, Wolfgang May, Bertram Ludäscher, ACM Transactions on Database Systems (TODS), 27(4), 2002.

Abstract RIDS'97 3rd International Workshop on Rules in Database Systems (RIDS'97) Skövde, Sweden June 26-28 1997 6 26 LNCS 1312, Springer 20-34 Integrating Dynamic Aspects into Deductive Object-Oriented Databases Wolfgang May Christian Schlepphorst Georg Lausen Dynamics

We show how the dynamics of database systems can be modeled by making states first-class citizens in an object-oriented deductive database language. With states at the same time acting as objects, methods, or classes, several concepts of dynamic entities can be implemented, allowing an intuitive, declarative modeling of the application domain. Exploiting the natural stratification induced by the state sequence, the approach also provides an implementable operational semantics. The method is applicable to arbitrary object-oriented deductive database languages which provide a sufficiently flexible syntax and semantics. Provided an implementation of the underlying database language, any system specification in the presented framework is directly executable, thus unifying specification, implementation, and metalanguage for proving properties of a system. The concept is applied to F-Logic. Besides the declarative semantics given by the rules of a State-F-Logic program, the use of F-Logic's inheritance semantics for modeling states provides an effective operational semantics exploiting the naturally given state-stratification. State-F-Logic programs can be executed using the Florid implementation.

[PS-File]
[Slides]

Abstract TABLEAUX'97 International Conference on Analytic Tableaux and Related Methods (TABLEAUX'97) Pont-à-Mousson, France May 13-16 1997 5 13 LNCS 1227, Springer 261-275 Proving Correctness of Labeled Transition Systems by Semantic Tableaux Wolfgang May Temporal Logics Dynamics Formal Methods

The paper presents a method for formally proving correctness of processes specified by transition systems which is based on a tableau calculus for an extended temporal logic. The model-theoretic semantics is given by labeled Kripke structures, incorporating information about the actions performed in transitions. Extending first-order CTL for handling action labels, the multi-modal logic MCTL is defined which is well-suited for specifying transition systems and their properties. For MCTL, a tableau semantics and a tableau calculus are presented, allowing formal verification.

[PS-File]
[Slides]

Abstract TAPSOFT'97 7th International Joint Conference on the Theory and Practice of Software Development (TAPSOFT'97) Lille, France April 14-18 1997 4 14 LNCS 1214, Springer 535-549 Specifying Complex and Structured Systems with Evolving Algebras Wolfgang May Formal Methods

This paper presents an approach for specifying complex, structured systems with Evolving Algebras by means of aggregation and composition. Evolving algebras provide a formal method for executable specifications which has been employed for specifying several algorithms and programming languages. With its transition system-like rule-based syntax, the concept is both very intuitive and well-suited for formal reasoning and verification.

Following the need for structuring capabilities in specification frameworks, the paper proposes a concept for hierarchically structuring Evolving Algebras corresponding to the semantics of the system to be modeled, so that complex systems can be built up from simpler ones by several combinators. The concept can be generalized to arbitrary rule-based state-oriented formalisms.

In such systems, transitions regarded as atomic on the corresponding level are allowed to be specified by computations performed by sub-Evolving-Algebras instead of single rules. The subsystems provide a natural way of encapsulating data and behaviour while a computation is running. Communication is done via distinguished locations accessible to the participating systems.

[PS-File]
[Slides]

The paper is based on

[Slides] given at "Workshop on Evolving Algebras", May 17-19, 1996, Schloß Eringerfeld, Germany
[Report]

IS 23(8) FLORID Information Systems, Special Issue on Semistructured Data, 23 8 589-612 December 1998 12 Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective Bertram Ludäscher Rainer Himmeröder Georg Lausen Wolfgang May Christian Schlepphorst Information Extraction

The closely related research areas management of semistructured data and languages for querying the Web have recently attracted a lot of interest. We argue that languages supporting deduction and object-orientation (dood languages) are particularly suited in this context: Object-orientation provides a flexible common data model for combining information from heterogeneous sources and for handling partial information. Techniques for navigating in object-oriented databases can be applied to semistructured databases as well, since the latter may be viewed as (very simple) instances of the former. Deductive rules provide a powerful framework for expressing complex queries in a high-level, declarative programming style.
We elaborate on the management of semistructured data and show how reachability queries involving general path expressions and the extraction of data paths in the presence of cyclic data can be handled. We then propose a formal model for querying structure and contents of Web data and present its declarative semantics. A main advantage of our approach is that it brings together the above-mentioned issues in a unified, formal framework and - using the FLORID system - supports rapid prototyping and experimenting with all these features. Concrete examples illustrate the concise and elegant programming style supported by FLORID and substantiate the above-mentioned claims.
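Reachability over cyclic data terminates under a bottom-up fixpoint evaluation because no pair is ever derived twice. The following sketch shows the idea in Python rather than in F-Logic; the edge set `EDGES` is a hypothetical example and edge labels are omitted for brevity.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]

# Hypothetical link graph with a cycle a -> b -> c -> a and an exit c -> d.
EDGES: Set[Edge] = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}


def reachable_pairs(edges: Set[Edge]) -> Set[Edge]:
    """Transitive closure computed as a naive bottom-up fixpoint.

    Mirrors the two deductive rules
        reach(X, Y) <- edge(X, Y)
        reach(X, Z) <- reach(X, Y), edge(Y, Z)
    and stops as soon as an iteration derives no new pair.
    """
    closure = set(edges)
    while True:
        new = {
            (x, z)
            for (x, y) in closure
            for (y2, z) in edges
            if y == y2
        } - closure
        if not new:
            return closure
        closure |= new
```

On the hypothetical graph, every node on the cycle reaches every node on the cycle (including itself) and the exit node `d`, whereas `d` reaches nothing.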

Search, Analysis, and Integration of Web Documents: A Case Study with FLORID Intl. Workshop on Deductive Databases and Logic Programming (DDLP'98) Manchester, UK June 20 1998 6 20 GMD Report 22/1998 47-58 Search, Analysis, and Integration of Web Documents: A Case Study with FLORID Rainer Himmeröder Paul-Th. Kandzia Bertram Ludäscher Wolfgang May Georg Lausen Information Extraction

Languages supporting deduction and object-orientation seem particularly promising for querying and reasoning about structure and contents of the Web, and for the integration of information from heterogeneous sources. FLORID, an implementation of the deductive object-oriented language F-logic, has been extended to provide a declarative semantics for querying the Web. This extension allows extraction and restructuring of data from the Web and a seamless integration with local data. Since the functionality of wrappers and mediators is integrated into a single declarative language, the development of advanced applications based on the Web as an information source is significantly simplified. This claim is substantiated using a comprehensive example.

[PS-File]
[Slides]

W.May: PhD Thesis Dissertation/PhD Thesis Institut für Informatik Universität Freiburg May 1998 5 Integrated Static and Dynamic Modeling of Processes Wolfgang May Dynamics Formal Methods Temporal Logics http://user.informatik.uni-goettingen.de/~may/Diss/

EDBT: Referential Integrity 6th Intl. Conference on Extending Database Technology (EDBT'98) Valencia, Spain March 23-27 1998 3 23 LNCS 1377, Springer 404-418 Referential Actions: From Logical Semantics to Implementation Bertram Ludäscher Wolfgang May Referential Integrity

Referential actions (rac's) are specialized triggers used to automatically maintain referential integrity. While their local effects can be grasped easily, it is far from obvious what the global semantics of a set RA of interacting rac's should be. To capture the intended meaning of RA, we first present an abstract non-constructive semantics. By formalizing RA as a logic program P_RA, a constructive semantics is obtained. The equivalence of the logic programming semantics and the abstract semantics is proven using a game-theoretic characterization, which provides additional insight into the meaning of rac's. As shown in previous work, for general rac's it may be infeasible to compute all maximal admissible solutions. Therefore, we focus on a tractable subset, i.e., rac's without modifications. We show that in this case a unique maximal admissible solution exists, and derive a PTIME algorithm for computing this solution. In case a set U of user requests is not admissible, a maximal admissible subset of U is suggested.

[edbt98.ps.gz]

This paper elaborates on the practical aspects of

Referential Actions as Logical Rules, Bertram Ludäscher, Wolfgang May, Georg Lausen, Proc. of 16th ACM Symposium on Principles of Database Systems (PODS'97).

An extended version can be found in

Understanding the Global Semantics of Referential Actions using Logic Rules, Wolfgang May, Bertram Ludäscher, ACM Transactions on Database Systems (TODS), 27(4), 2002.

KI 98: Querying Web Data with FLORID Workshop Deklarative KI-Methoden zur Implementierung und Nutzung von Systemen in Netzen, KI-98 Bremen, Germany September 16 1998 9 16 GMD Report 29/1998 65-79 Techniques and Rule Patterns for Declaratively Querying Web Data with FLORID Bertram Ludäscher Rainer Himmeröder Wolfgang May Information Extraction

FLORID is an implementation of the deductive object-oriented database language F-logic and has recently been extended to provide a declarative semantics for querying the Web. By means of several illustrative examples, we show how FLORID's rule-based logical language can be used to extract, query, and analyze data from the Web.

[PS-File]
[Slides]

On Logical Foundations of Active Databases Transactions and Change in Logic Databases Hendrik Decker, Burkhard Freitag, Michael Kifer, and Andrei Voronkov, editors Springer LNCS 1472 69-106 1998 On Active Deductive Databases: The Statelog Approach Georg Lausen Bertram Ludäscher Wolfgang May On Active Deductive Databases: The Statelog Approach Active Databases

After briefly reviewing the basic notions and terminology of active rules and relating them to production rules and deductive rules, respectively, we survey a number of formal approaches to active rules. Subsequently, we present our own state-oriented logical approach to active rules which combines the declarative semantics of deductive rules with the possibility to define updates in the style of production rules and active rules. The resulting language Statelog is surprisingly simple, yet captures many features of active rules including composite event detection and different coupling modes. Thus, it can be used for the formal analysis of rule properties like termination and expressive power. Finally, we show how nested transactions can be modeled in Statelog, both from the operational and the model-theoretic perspective.

ps-file

On Logical Foundations of Active Databases Logics for Databases and Information Systems Jan Chomicki and Gunter Saake, editors Kluwer Academic Publishers ISBN 0-7923-8129-7 389-422 1998 On Logical Foundations of Active Databases Georg Lausen Bertram Ludäscher Wolfgang May Active Databases

In this chapter, we present work on logical foundations of active databases. After introducing the basic notions and terminology, we give a short overview of research on foundations of active rules. Subsequently, we present a specific state-oriented logical approach to active rules which aims at combining the declarative semantics of deductive rules with the possibility to define updates in the style of production rules. The resulting language Statelog models (flat) transactions as a sequence of intermediate transitions, where each transition is defined using deductive rules. Since Statelog programs correspond to a specific class of locally stratified logic programs, they have a unique intended model. After studying further fundamental properties like expressive power and termination behavior, a Statelog framework for active rules is presented. Although the framework is surprisingly simple, it can model many essential features of active rules, including immediate and deferred rule execution, and composite events. Different alternatives for enforcing termination are proposed, leading to tractable subclasses of the language. Finally, we show that certain classes of Statelog programs correspond to Datalog programs with production rule semantics (i.e., with inflationary or noninflationary fixpoint semantics).

[Book]

Abstract WLP'98 13. Workshop logische Programmierung (WLP'98) Vienna, Austria October, 6-8 1998 10 06 Technical Report 1843-1998-10, Institut für Informationssysteme, TU Wien Nonmonotonic Inheritance in Object-Oriented Deductive Database Languages Wolfgang May Paul-Thomas Kandzia Inheritance

Deductive object-oriented frameworks integrate logic rules and inheritance. There, specific problems arise: due to the combination of deduction and inheritance, (a) deduction can take place depending on inherited facts, thus raising indirect conflicts, and (b) the class hierarchy and class membership are also subject to deduction.

From this point of view, we investigate the application of the extension semantics of Default Logic to deductive object-oriented database languages.

By restricting the problem to Horn programs and a special type of defaults tailored to the semantics of inheritance, a forward-chaining construction of a Herbrand-style representation of extensions is possible.

This construction is compared with a solution as implemented in the F-Logic system Florid which is based on a combination of classical deductive fixpoints and an inheritance-trigger mechanism.

From the F-Logic point of view, the main contribution of the report is to provide a connection between inheritance-canonic models as defined in [Kifer-Lausen-Wu-JACM-95] and classical AI frameworks.

[ps-File]
[Slides]

The full paper is available as Technical Report 114, Institut für Informatik, Universität Freiburg, Jan. 1999.

A revised version is published in Journal of Logic and Computation, 11(4), 2001.

TR 114: Inheritance Technical Report 114 Freiburg, Germany Universität Freiburg Institut für Informatik 114 January 1999 1 Nonmonotonic Inheritance in Object-Oriented Deductive Database Languages Wolfgang May Paul-Thomas Kandzia Inheritance

From this point of view, we investigate the application of the extension semantics of Default Logic to deductive object-oriented database languages.

This construction is compared with a solution as implemented in the F-Logic system FLORID which is based on a combination of classical deductive fixpoints and an inheritance-trigger mechanism.

From the F-Logic point of view, the main contribution of the report is to investigate the relationship between inheritance-canonic models as defined in [Kifer-Lausen-Wu-JACM-95] and classical AI frameworks: we show that the semantics which is defined and implemented for F-Logic coincides with the standard semantics of Default Logic and Inheritance Networks. In this report, we restrict ourselves to scalar methods.

[ps-File]

A preliminary version has been presented at 13. Workshop logische Programmierung - WLP'98, Vienna, Oct. 6-8, 1998.

A revised version is published in Journal of Logic and Computation, 11(4), 2001.

TR 131: The Mondial Case Study Technical Report 131 Freiburg, Germany Universität Freiburg Institut für Informatik 131 December 1999 12 Information Extraction and Integration with FLORID: The MONDIAL Case Study Wolfgang May

For accessing and processing the information provided on the Web, there is a need for integration of data from different, heterogeneous sources. Languages for this purpose have to support querying the Web, extracting information from semistructured data, and restructuring the results. In [LHL+98] we argue that languages supporting deduction and object-orientation are particularly suited in this context; we proposed a formal model for querying structure and contents of Web data. A main advantage of our approach is that it brings together the above-mentioned issues in a unified, formal framework. The approach is implemented in the FLORID system [HLL98] which is an implementation of the deductive object-oriented database language F-Logic [KLW95].

This report substantiates the above claims by a case-study using FLORID: We show how several information sources on the Web containing political and geographical data are integrated into a geographical database using FLORID. The case study illustrates the benefits gained from an integrated Web-querying and data manipulation language, supporting a concise and elegant programming style. Using a deductive language, a process of rapid prototyping and refinement of the program -- implementing both a wrapper and a mediator -- can be easily followed: the program consists of a skeleton of generic wrapping rules [MHL+99], augmented by refining rules and application-specific rules.

[ps-File]

The report Information Extraction from the Web describes the underlying framework.
The homepage of the Mondial database contains the F-Logic programs and the database in F-Logic, Oracle, and XML formats.

Dagstuhl-Seminar: Web Dagstuhl-Seminar "Declarative Data Access on the Web" Schloss Dagstuhl, Germany September 12-17 1999 9 12 Dagstuhl-Report No. 251 Information Extraction from the Web with FLORID Wolfgang May Information Extraction from the Web with FLORID

The talk presents an integrated architecture where Web exploration, wrapping, mediation, and querying are done in a monolithic system. The system is based on a unified framework -- i.e., data model and language -- in which all tasks are done. We regard the Web and its contents as a unit, represented in an object-oriented data model: the Web structure, given by its hyperlinks, the parse-trees of Web pages, and its contents all become part of the internal world model of the system. The advantage of this unified view is that the same data manipulation and querying language can be used for the Web structure and the application-semantic model: The model is complemented by a rule-based object-oriented language which is extended by Web access capabilities and structured document analysis and allows for accessing the Web, wrapping, mediating, and querying information. Due to this integration, a system in this architecture can be equipped with Web navigation and exploration functionality. We present generic rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system. The approach is illustrated by two case-studies.

[Slides]

Wolfgang May: Modeling and Querying Structure and Contents of the Web. International Workshop on Internet Data Management (IDM'99), Firenze, Italy, September 2, 1999. Proc. DEXA 99 Workshop, IEEE Computer Society Press, pp. 721-725. Keywords: Information Extraction.

For accessing and processing the information provided on the Web, there is a need for extraction, restructuring, and integration of semistructured data from autonomous, heterogeneous sources. In this paper, we regard the Web and its contents as a unit, represented in an object-oriented data model: the Web structure (inter-document level), given by its hyperlinks, the parse-trees of Web pages (intra-document level), and their contents. The model is complemented by a rule-based object-oriented language which is extended by Web access capabilities and allows for querying and navigation in the unified model. We show the practicability of our approach by using the FLORID system.
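The two levels of the model can be pictured with a small sketch. The following is not FLORID or F-Logic code; it is a minimal Python illustration, assuming a hypothetical in-memory dictionary `PAGES` in place of real Web access and invented sample pages. The names `Node`, `TreeBuilder`, `links`, and `build_model` are made up for this example.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    """One parse-tree node (intra-document level)."""
    tag: str
    attrs: dict
    children: list = field(default_factory=list)
    text: str = ""


class TreeBuilder(HTMLParser):
    """Builds a small parse tree; assumes well-formed HTML without void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("document", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def links(node):
    """Yield every hyperlink target in a parse tree (inter-document level)."""
    if node.tag == "a" and "href" in node.attrs:
        yield node.attrs["href"]
    for child in node.children:
        yield from links(child)


def build_model(pages, start):
    """Follow hyperlinks from `start`; return a dict mapping URL to parse tree."""
    model, todo = {}, [start]
    while todo:
        url = todo.pop()
        if url in model or url not in pages:
            continue
        builder = TreeBuilder()
        builder.feed(pages[url])
        builder.close()
        model[url] = builder.root
        todo.extend(links(builder.root))
    return model


# Hypothetical stand-in for the Web: two invented pages about a fictitious country.
PAGES = {
    "index.html": '<html><body><a href="utopia.html">Utopia</a></body></html>',
    "utopia.html": "<html><body><h1>Utopia</h1><p>Capital: Amaurot</p></body></html>",
}
MODEL = build_model(PAGES, "index.html")
```

Here `MODEL` holds both levels in one structure: its keys and the `href` values form the hyperlink graph, while each value is the parse tree of one page, so a single traversal can navigate between documents and within them.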

[The Paper (IEEE Digital Library)]
[Slides]

Wolfgang May: Information Extraction from the Web with FLORID. Guest talk, Stony Brook, NY, June 14, 1999. Keywords: Information Extraction.

FLORID is an implementation of F-Logic by the database group at the University of Freiburg (Germany). In the talk, the Web extension of FLORID is presented. It allows for wrapping, restructuring, and integrating data from the Web in a unified framework, using F-Logic rules as the single language for programming and querying. The object-oriented Web model is based on the classes url and webdoc, which represent the skeleton of a relevant Web fragment. The intra-document structure is represented by parse-trees which are integrated into the Web skeleton. In the information retrieval task, objects in the extended Web skeleton are identified and restructured into an object-oriented model of the application domain. The wrapping task is done by analyzing the F-Logic representation of the parse-tree and by matching with Perl regular expressions. The approach is illustrated by a case study which integrates geographical data from different sources.
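The wrapping step, in which text found in a parse-tree is matched against regular expressions and restructured into application-level objects, might look roughly as follows. This is a Python sketch, not the F-Logic rules used in FLORID; the `Country` class, the patterns, and the sample leaves (a fictitious country) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Country:
    """Application-level object (hypothetical target schema)."""
    name: str
    capital: Optional[str]
    population: Optional[int]


# Text leaves as they might come out of one page's parse tree (invented data).
LEAVES = ["Utopia", "Capital: Amaurot", "Population: 1,000"]

CAPITAL = re.compile(r"Capital:\s*(\w[\w\s-]*)")
POPULATION = re.compile(r"Population:\s*([\d,]+)")


def wrap(leaves):
    """Build a Country from text leaves; the first leaf is assumed to be the name."""
    capital = population = None
    for leaf in leaves[1:]:
        if m := CAPITAL.fullmatch(leaf):
            capital = m.group(1).strip()
        elif m := POPULATION.fullmatch(leaf):
            # Drop thousands separators before converting to an integer.
            population = int(m.group(1).replace(",", ""))
    return Country(leaves[0], capital, population)


COUNTRY = wrap(LEAVES)
```

Leaves that match no pattern are simply ignored, so the same wrapper tolerates pages that carry extra text.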

Structure of the talk:

The Web Model: What we can do with FLORID for Web Data Extraction
Implementation: How is this implemented in FLORID?
Practice: How is it used?
Demonstration: The Mondial Case Study
Lessons we have learnt and further ideas.

[Slides]

Wolfgang May: A Tableau Calculus for a Temporal Logic with Temporal Connectives. International Conference on Analytic Tableaux and Related Methods (TABLEAUX'99), Albany, NY, June 7-11, 1999. LNCS 1617, Springer, pp. 232-246. Keywords: Dynamics, Formal Methods, Temporal Logics.

The paper presents a tableau calculus for a linear time temporal logic for reasoning about processes and events in concurrent systems. The logic is based on temporal connectives in the style of Transaction Logic and explicit quantification over states. The language extends first-order logic with sequential and parallel conjunction, parallel disjunction, and temporal implication. Explicit quantification over states via state variables makes it possible to express temporal properties which cannot be formulated in modal logics. Using the tableau representation of temporal Kripke structures presented for CTL, which represents states by prefix terms, explicit quantification over states is integrated into the tableau calculus by adapting the delta-rule from first-order tableau calculi to the linear ordering of the universe of states. Complementing the CTL calculus, the paper shows that this tableau representation is suitable both for modal temporal logics and for logics using temporal connectives.

[postscript]
[Slides]

Wolfgang May: Information Extraction from the Web with FLORID. Guest talk, Database and Artificial Intelligence Group at TU Vienna, Austria, November 5, 1999. Keywords: Information Extraction.

FLORID is an implementation of F-Logic by the database group at the University of Freiburg (Germany). In the talk, the Web extension of FLORID is presented. It allows for wrapping, restructuring, and integrating data from the Web in a unified framework, using F-Logic rules as the single language for programming and querying. The object-oriented Web model is based on the classes url and webdoc, which represent the skeleton of a relevant Web fragment. The intra-document structure is represented by parse-trees which are integrated into the Web skeleton. In the information retrieval task, objects in the extended Web skeleton are identified and restructured into an object-oriented model of the application domain. The wrapping task is done by analyzing the F-Logic representation of the parse-tree and by matching with Perl regular expressions. The approach is illustrated by two case studies.

[Slides]

Wolfgang May, Bertram Ludäscher, Georg Lausen, Rainer Himmeröder: A Unified Framework for Wrapping, Mediating and Restructuring Information from the Web. International Workshop on the World-Wide Web and Conceptual Modeling (WWWCM'99), Paris, France, November 15-18, 1999. LNCS 1727, Springer, pp. 307-320. Keywords: World-Wide Web and Conceptual Modeling.

The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system.
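To give a feel for what a reusable rule pattern for integration could look like, here is a small Python sketch of rules evaluated bottom-up over a shared fact base. It is only an analogy under stated assumptions: the triple representation, the rule names `same_country` and `merge_attributes`, and the sample facts (a fictitious country from two hypothetical sources) are invented for this example and are not taken from the paper.

```python
def fixpoint(facts, rules):
    """Apply all rules repeatedly until no new fact is derived (naive bottom-up)."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= set(rule(facts))
        if new <= facts:
            return facts
        facts |= new


def same_country(facts):
    """Integration pattern: objects from different sources with equal names are linked."""
    names = [(obj, val) for (obj, attr, val) in facts if attr == "name"]
    for obj1, name1 in names:
        for obj2, name2 in names:
            if obj1 != obj2 and name1 == name2:
                yield (obj1, "same_as", obj2)


def merge_attributes(facts):
    """Restructuring pattern: copy attributes across objects linked by same_as."""
    linked = [(obj, val) for (obj, attr, val) in facts if attr == "same_as"]
    for obj1, obj2 in linked:
        for (obj, attr, val) in facts:
            if obj == obj2 and attr != "same_as":
                yield (obj1, attr, val)


# Facts as (object, attribute, value) triples produced by two hypothetical wrappers.
SOURCE_A = {("a1", "name", "Utopia"), ("a1", "capital", "Amaurot")}
SOURCE_B = {("b7", "name", "Utopia"), ("b7", "area", 1000)}
RESULT = fixpoint(SOURCE_A | SOURCE_B, [same_country, merge_attributes])
```

Because wrapped facts and mediation rules live in the same fact base, the object `a1` ends up carrying the `area` value that only the second source provided, without a separate mediator formalism.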

[postscript]
[Slides]