Gjør som tusenvis av andre bokelskere
Abonner på vårt nyhetsbrev og få rabatter og inspirasjon til din neste leseopplevelse.
Ved å abonnere godtar du vår personvernerklæring.Du kan når som helst melde deg av våre nyhetsbrev.
This book provides a principled data-driven framework that progressively constructs, enriches, and applies taxonomies without leveraging massive human annotated data. Traditionally, people construct domain-specific taxonomies by extensive manual curations, which is time-consuming and costly. In today's information era, people are inundated with the vast amounts of text data. Despite their usefulness, people haven't yet exploited the full power of taxonomies due to the heavy curation needed for creating and maintaining them. To bridge this gap, the authors discuss automated taxonomy discovery and exploration, with an emphasis on label-efficient machine learning methods and their real-world usages. Taxonomy organizes entities and concepts in a hierarchy way. It is ubiquitous in our daily life, ranging from product taxonomies used by online retailers, topic taxonomies deployed by news outlets and social media, as well as scientific taxonomies deployed by digital libraries across various domains. When properly analyzed, these taxonomies can play a vital role for science, engineering, business intelligence, policy design, e-commerce, and more. Intuitive examples are used throughout enabling readers to grasp concepts more easily.
The "e;big data"e; era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge, to social media, news, and everyone's daily life. Examples of such collections include scientific publications, enterprise logs, news articles, social media, and general web pages. Valuable knowledge about multi-typed entities is often hidden in the unstructured or loosely structured, interconnected data. Mining latent structures around entities uncovers hidden knowledge such as implicit topics, phrases, entity roles and relationships. In this monograph, we investigate the principles and methodologies of mining latent entity structures from massive unstructured and interconnected data. We propose a text-rich information network model for modeling data in many different domains. This leads to a series of new principles and powerful methodologies for mining latent structures, including (1) latent topical hierarchy, (2) quality topical phrases, (3) entity roles in hierarchical topical communities, and (4) entity relations. This book also introduces applications enabled by the mined structures and points out some promising research directions.
The real-world data, though massive, is largely unstructured, in the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data, without extensive human annotation and labeling. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora. Departing from many existing structure extraction methods that have heavy reliance on human annotated data for model training, our effort-light approach leverages human-curated facts stored in external knowledge bases as distant supervision and exploits rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including (1) entity recognition, typing and synonym discovery, (2) entity relation extraction, and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.
Unstructured text, as one of the most important data forms, plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to scientific research and healthcare informatics. In many emerging applications, people's information need from text data is becoming multidimensional-they demand useful insights along multiple aspects from a text corpus. However, acquiring such multidimensional knowledge from massive text data remains a challenging task.This book presents data mining techniques that turn unstructured text data into multidimensional knowledge. We investigate two core questions. (1) How does one identify task-relevant text data with declarative queries in multiple dimensions? (2) How does one distill knowledge from text data in a multidimensional space? To address the above questions, we develop a text cube framework. First, we develop a cube construction module that organizes unstructured data into a cube structure, by discovering latent multidimensional and multi-granular structure from the unstructured text corpus and allocating documents into the structure. Second, we develop a cube exploitation module that models multiple dimensions in the cube space, thereby distilling from user-selected data multidimensional knowledge. Together, these two modules constitute an integrated pipeline: leveraging the cube structure, users can perform multidimensional, multigranular data selection with declarative queries; and with cube exploitation algorithms, users can extract multidimensional patterns from the selected data for decision making.The proposed framework has two distinctive advantages when turning text data into multidimensional knowledge: flexibility and label-efficiency. First, it enables acquiring multidimensional knowledge flexibly, as the cube structure allows users to easily identify task-relevant data along multiple dimensions at varied granularities and further distill multidimensional knowledge. Second, the algorithms for cube construction and exploitation require little supervision; this makes the framework appealing for many applications where labeled data are expensive to obtain.
A lot of digital ink has been spilled on "e;big data"e; over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide.A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications?In this book, we investigated one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.
Abonner på vårt nyhetsbrev og få rabatter og inspirasjon til din neste leseopplevelse.
Ved å abonnere godtar du vår personvernerklæring.