Lehigh CSE 2003 Technical Reports

LU-CSE-03-001
A Semi-supervised Algorithm for Pattern Discovery in Information Extraction from Textual Data
Tianhao Wu and William M. Pottenger

In this article we present a semi-supervised algorithm for pattern discovery in information extraction from textual data. The patterns that are discovered take the form of regular expressions that generate regular languages. The regular languages consist of various representations of assorted features that are useful in information extraction. An example of such a regular language is the set of textual representations used to express a suspect's height in a collection of police incident reports. We term our approach 'semi-supervised' because it requires significantly less effort to develop a training set than other approaches. Instead of labeling the exact location of features in a training set, the training-set developer need only record whether a specific feature of interest occurs in a sentence segment. From this training data our algorithm automatically generates regular expressions that can be used on previously unseen data for information extraction. Our experiments show that the algorithm has good testing performance on many features that are important in the fight against terrorism.

PDF (14 pages, 70KB)

LU-CSE-03-002
Classification of Emotions in Internet Chat: An Application of Machine Learning Using Speech Phonemes
Lars E. Holzman and William M. Pottenger

This article reports our progress in the classification of expressions of emotion in network-based chat conversations. Emotion detection of this nature is currently an active area of research. We detail a linguistic approach to tagging chat conversations with appropriate emotion tags. In our approach, textual chat messages are automatically converted into speech, and instance vectors are then generated from frequency counts of the speech phonemes present in each message. In combination with other statistically derived attributes, the instance vectors are used in various machine-learning frameworks to build classifiers for emotional content. Based on the standard metrics of precision and recall, we report results exceeding 90% accuracy when employing k-nearest-neighbor learning. Our approach has thus shown promise in discriminating emotional from non-emotional content in independent testing.

PDF (8 pages, 253KB)

LU-CSE-03-003
Enforcing a lips Usage Policy for CORBA Components
Wayne DePrince jr. and Christine Hofmeister

Software components promise easy reuse, dependability, and simplified development. Problems arise when implicit assumptions about the use of a component are encoded in the implementation but not communicated to the user. One solution to this problem is to formally specify these constraints on a component's use. Once specified, these usage constraints can be statically verified or dynamically enforced. The dynamic enforcement code can either be written by the developer or be generated automatically. Our research project, lips, is a language for specifying usage constraints and a toolset for automatically generating dynamic code to enforce them. In this paper we present the notion of a virtual client and show how it is critical for ensuring correct usage of a component. We discuss our experiences providing automatic enforcement of usage constraints for CORBA components: while much of the needed support can be provided easily using a container concept, support for virtual clients requires more fundamental changes in a component model such as CORBA.

PDF (9 pages, 65KB)

LU-CSE-03-004
Specifying Architectural Constraints on Components
Wayne DePrince jr. and Christine Hofmeister

Research to improve component reuse has focused on specifying various behavioral properties. In this paper we present our approach to this problem, which focuses not so much on specifying the behavior of the component as on its architectural constraints. We introduce our research project "lips", a language for formally capturing these usage constraints and a toolset for automatically enforcing them at runtime. Our approach captures the architectural constraints that are local to a particular component. In this way we express restrictions on its reuse independently of any actual client or application. We then embed these constraints within the component's specification and use that specification to generate code that enforces the constraints at runtime.

PDF (6 pages, 48KB)

LU-CSE-03-005
Overcoming Misconceptions About Computer Science With Multimedia
Sally Hiestand, Fang Wei and Glenn D. Blank

Preconceived ideas about computer science may discourage students, especially women and minorities, from pursuing study in the field. Many of these common but negative stereotypes are misconceptions. We address these misconceptions in multimedia courseware developed for a CS0 or CS1 course covering a breadth of topics in computer science. Experimental results show that the multimedia overcomes negative stereotypes, including a couple that are more pronounced for women. We discuss implications of these results for computer science curricula.

PDF (5 pages, 863KB)

LU-CSE-03-006
Personalized Web Prefetching In Mozilla
Wei Zhang, David B. Lewanda, Christopher D. Janneck and Brian D. Davison

This paper presents the design and implementation of a Web prefetching module in Mozilla, an open-source and cross-platform browser. We have incorporated two kinds of predictors: a history-based predictor and a content-based predictor. These two predictors can analyze a user's behavior and the contents of recent HTML pages to predict likely next links. They thus provide personalized predictions, which are then used to determine which resources should be prefetched.

PDF (12 pages, 205KB)
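The abstract above does not say how either predictor works internally. The Python sketch below shows one plausible way to combine a history-based and a content-based next-link predictor. It is an illustration under stated assumptions, not the report's method or any Mozilla API. The history predictor is assumed to be a first-order Markov model over the user's page-to-page transitions. The content predictor is assumed to score each link by the word overlap between its anchor text and recently read text. The names `HistoryPredictor`, `content_scores`, `choose_prefetch` and the 0.3 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


class HistoryPredictor:
    """Hypothetical first-order Markov predictor over one user's page transitions."""

    def __init__(self):
        # transitions[a][b] = number of times this user went from page a to page b
        self.transitions = defaultdict(Counter)
        self.last_url = None

    def record_visit(self, url):
        """Update transition counts with a newly visited URL."""
        if self.last_url is not None:
            self.transitions[self.last_url][url] += 1
        self.last_url = url

    def predict(self, current_url, k=2):
        """Return up to k (url, probability) pairs, most likely first."""
        counts = self.transitions.get(current_url)  # .get does not create an entry
        if not counts:
            return []
        total = sum(counts.values())
        return [(u, c / total) for u, c in counts.most_common(k)]


def content_scores(links, recent_text):
    """Hypothetical content predictor: fraction of anchor words seen in recent text.

    links: list of (url, anchor_text) pairs taken from the current HTML page.
    """
    recent_words = set(recent_text.lower().split())
    scores = {}
    for url, anchor in links:
        words = set(anchor.lower().split())
        scores[url] = len(words & recent_words) / len(words) if words else 0.0
    return scores


def choose_prefetch(history_preds, content, threshold=0.3):
    """Merge both predictors, keeping each URL's best score; return URLs at or above threshold."""
    combined = dict(content)
    for url, p in history_preds:
        combined[url] = max(combined.get(url, 0.0), p)
    return sorted(u for u, s in combined.items() if s >= threshold)


# Example: a short browsing history plus the links on the current page.
h = HistoryPredictor()
for url in ["/a", "/b", "/a", "/c", "/a", "/b"]:
    h.record_visit(url)
links = [("/news", "election news"), ("/sports", "football scores")]
print(choose_prefetch(h.predict("/a"), content_scores(links, "latest election coverage")))
# → ['/b', '/c', '/news']
```

Taking the maximum of the two scores lets either predictor nominate a resource on its own. A real prefetcher would also need to cap bandwidth and avoid fetching non-idempotent URLs, and this sketch does neither.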