
Data mining

Here you will find exciting books on data mining. Below is a fine selection of more than 394 books on the subject.
  • by Josep Carmona
    542,95 kr.

    This is an open access book. This book comprises all the single courses given as part of the First Summer School on Process Mining, PMSS 2022, which was held in Aachen, Germany, during July 4-8, 2022. This volume contains 17 chapters organized into the following topical sections: Introduction; process discovery; conformance checking; data preprocessing; process enhancement and monitoring; assorted process mining topics; industrial perspective and applications; and closing.

  • by Bogumil Kaminski
    609,95 kr.

  • by Zygmunt Vetulani, Marek Kubis & Patrick Paroubek
    1.207,95 kr.

  • by Michael A. Bekos
    609,95 kr.

    This book focusses on techniques for automating the procedure of creating external labelings, also known as callout labelings. In this labeling type, the features within an illustration are connected by thin leader lines (called leaders) with their labels, which are placed in the empty space surrounding the image. In general, textual labels describing graphical features in maps, technical illustrations (such as assembly instructions or cutaway illustrations), or anatomy drawings are an important aspect of visualization that convey information on the objects of the visualization and help the reader understand what is being displayed. Most labeling techniques can be classified into two main categories depending on the "distance" of the labels to their associated features. Internal labels are placed inside or in the direct neighborhood of features, while external labels, which form the topic of this book, are placed in the margins outside the illustration, where they do not occlude the illustration itself. Both approaches form well-studied topics in diverse areas of computer science with several important milestones. The goal of this book is twofold. The first is to serve as an entry point for the interested reader who wants to get familiar with the basic concepts of external labeling, as it introduces a unified and extensible taxonomy of labeling models suitable for a wide range of applications. The second is to serve as a point of reference for more experienced people in the field, as it brings forth a comprehensive overview of a wide range of approaches to produce external labelings that are efficient either in terms of different algorithmic optimization criteria or in terms of their usability in specific application domains. The book mostly concentrates on algorithmic aspects of external labeling, but it also presents various visual aspects that affect the aesthetic quality and usability of external labeling.

  • by Alvitta Ottley
    440,95 kr.

    There is ample evidence in the visualization community that individual differences matter. These prior works highlight various personality traits and cognitive abilities that can modulate the use of visualization systems and demonstrate a measurable influence on speed, accuracy, process, and attention. Perhaps the most important implication of this body of work is that we can use individual differences as a mechanism for estimating when a design is effective or for identifying when people may struggle with visualization designs. These effects can have a critical impact on consequential decision-making processes. One study that appears in this book investigated the impact of visualization on medical decision-making and showed that visual aids tended to be most beneficial for people with high spatial ability, a metric that measures a person's ability to represent and manipulate two- or three-dimensional representations of objects mentally. The results showed that participants with low spatial ability had difficulty interpreting and analyzing the underlying medical data when they used visual aids. Overall, approximately 50% of the studied population were unsupported by the visualization tools when making a potentially life-critical decision. As data fluency continues to become an essential skill for our everyday lives, we must embrace the growing need to understand the factors that may render our tools ineffective and identify concrete steps for improvement. This book presents my current understanding of how individual differences in personality interact with visualization use and draws from recent research in the Visualization, Human-Computer Interaction, and Psychology communities. We focus on the specific designs and tasks for which there is concrete evidence of performance divergence due to personality. Additionally, we highlight an exciting research agenda that is centered around creating tailored visualization systems that are aligned with people's abilities.
The purpose of this book is to underscore the need to consider individual differences when designing and evaluating visualization systems and to call attention to this critical research direction.

  • by Ross Maciejewski
    315,95 kr.

    Analytical reasoning techniques are methods by which users explore their data to obtain insight and knowledge that can directly support situational awareness and decision making. Recently, the analytical reasoning process has been augmented through the use of interactive visual representations and tools which utilize cognitive, design and perceptual principles. These tools are commonly referred to as visual analytics tools, and the underlying methods and principles have roots in a variety of disciplines. This chapter gives young researchers an overview of common visual representations and statistical analysis methods utilized in a variety of visual analytics systems. The application and design of visualization and analytical algorithms are subject to design decisions, parameter choices, and many conflicting requirements. As such, this chapter attempts to provide an initial set of guidelines for the creation of the visual representation, including pitfalls and areas where the graphics can be enhanced through interactive exploration. Basic analytical methods are explored as a means of enhancing the visual analysis process, moving from visual analysis to visual analytics. Table of Contents: Data Types / Color Schemes / Data Preconditioning / Visual Representations and Analysis / Summary

  • by Fintan McGee
    563,95 kr.

    The emergence of multilayer networks as a concept from the field of complex systems provides many new opportunities for the visualization of network complexity, and has also raised many new exciting challenges. The multilayer network model recognizes that the complexity of relationships between entities in real-world systems is better embraced as several interdependent subsystems (or layers) rather than a simple graph approach. Despite only recently being formalized and defined, this model can be applied to problems in the domains of life sciences, sociology, digital humanities, and more. Within the domain of network visualization there already are many existing systems, which visualize data sets having many characteristics of multilayer networks, and many techniques, which are applicable to their visualization. In this Synthesis Lecture, we provide an overview and structured analysis of contemporary multilayer network visualization. This is not only for researchers in visualization, but also for those who aim to visualize multilayer networks in the domain of complex systems, as well as those solving problems within application domains. We have explored the visualization literature to survey visualization techniques suitable for multilayer network visualization, as well as tools, tasks, and analytic techniques from within application domains. We also identify the research opportunities and examine outstanding challenges for multilayer network visualization along with potential solutions and future research directions for addressing them.

  • by Alex Endert
    394,95 kr.

    This book discusses semantic interaction, a user interaction methodology for visual analytic applications that more closely couples the visual reasoning processes of people with the computation. This methodology affords user interaction on visual data representations that are native to the domain of the data. User interaction in visual analytics systems is critical to enabling visual data exploration. Interaction transforms people from mere viewers to active participants in the process of analyzing and understanding data. This discourse between people and data enables people to understand aspects of their data, such as structure, patterns, trends, outliers, and other properties that ultimately result in insight. Through interacting with visualizations, users engage in sensemaking, a process of developing and understanding relationships within datasets through foraging and synthesis. The book provides a description of the principles of semantic interaction, providing design guidelines for the integration of semantic interaction into visual analytics, examples of existing technologies that leverage semantic interaction, and a discussion of how to evaluate these technologies. Semantic interaction has the potential to increase the effectiveness of visual analytic technologies and opens possibilities for a fundamentally new design space for user interaction in visual analytics systems.

  • by Tamara Munzner & Heidi Lam
    319,95 kr.

    Displaying multiple levels of data visually has been proposed to address the challenge of limited screen space. Although many previous empirical studies have addressed different aspects of this question, the information visualization research community does not currently have a clearly articulated consensus on how, when, or even if displaying data at multiple levels is effective. To shed more light on this complex topic, we conducted a systematic review of 22 existing multi-level interface studies to extract high-level design guidelines. To facilitate discussion, we cast our analysis findings into a four-point decision tree: (1) When are multi-level displays useful? (2) What should the higher visual levels display? (3) Should the different visual levels be displayed simultaneously, or one at a time? (4) Should the visual levels be embedded in a single display, or separated into multiple displays? Our analysis resulted in three design guidelines: (1) the number of levels in display and data should match; (2) high visual levels should only display task-relevant information; (3) simultaneous display, rather than temporal switching, is suitable for tasks with multi-level answers. Table of Contents: Introduction / Terminology / Methodology / Summary of Studies / Decision 1: Single or Multi-level Interface? / Decision 2: How to Create the High-Level Displays? / Decision 3: Simultaneous or Temporal Displays of the Multiple Visual Levels / Decision 4: How to Spatially Arrange the Visual Levels, Embedded or Separate? / Limitations of Study / Design Recommendations / Discussion and Future Work

  • by Kamran Sedig
    616,95 kr.

    Interest in visualization design has increased in recent years. While there is a large body of existing work from which visualization designers can draw, much of the past research has focused on developing new tools and techniques that are aimed at specific contexts. Less focus has been placed on developing holistic frameworks, models, and theories that can guide visualization design at a general level, one that transcends domains, data types, users, and other contextual factors. In addition, little emphasis has been placed on the thinking processes of designers, including the concepts that designers use, while they are engaged in a visualization design activity. In this book we present a general, holistic framework that is intended to support visualization design for human-information interaction. The framework is composed of a number of conceptual elements that can aid in design thinking. The core of the framework is a pattern language (consisting of a set of 14 basic, abstract patterns) and a simple syntax for describing how the patterns are blended. We also present a design process, made up of four main stages, for creating static or interactive visualizations. The 4-stage design process places the patterns at the core of designers' thinking, and employs a number of conceptual tools that help designers think systematically about creating visualizations based on the information they intend to represent. Although the framework can be used to design static visualizations for simple tasks, its real utility can be found when designing visualizations with interactive possibilities in mind; in other words, designing to support a human-information interactive discourse. This is especially true in contexts where interactive visualizations need to support complex tasks and activities involving large and complex information spaces.
The framework is intended to be general and can thus be used to design visualizations for diverse domains, users, information spaces, and tasks in different fields such as business intelligence, health and medical informatics, digital libraries, journalism, education, scientific discovery, and others. Drawing from research in multiple disciplines, we introduce novel concepts and terms that can positively contribute to visualization design practice and education, and will hopefully stimulate further research in this area.

  • by Martin Falk
    396,95 kr.

    Prevalent types of data in scientific visualization are volumetric data, vector field data, and particle-based data. Particle data typically originates from measurements and simulations in various fields, such as life sciences or physics. The particles are often visualized directly, that is, by simple representations such as spheres. Interactive rendering facilitates the exploration and visual analysis of the data. With increasing data set sizes in terms of particle numbers, interactive high-quality visualization is a challenging task. This is especially true for dynamic data or abstract representations that are based on the raw particle data. This book covers direct particle visualization using simple glyphs as well as abstractions that are application-driven such as clustering and aggregation. It targets visualization researchers and developers who are interested in visualization techniques for large, dynamic particle-based data. Its explanations focus on GPU-accelerated algorithms for high-performance rendering and data processing that run in real-time on modern desktop hardware. Consequently, the implementation of said algorithms and the required data structures to make use of the capabilities of modern graphics APIs are discussed in detail. Furthermore, it covers GPU-accelerated methods for the generation of application-dependent abstract representations. This includes various representations commonly used in application areas such as structural biology, systems biology, thermodynamics, and astrophysics.

  • by Ron Metoyer
    486,95 kr.

    At the 2016 IEEE VIS Conference in Baltimore, Maryland, a panel of experts from the Scientific Visualization (SciVis) community gathered to discuss why the SciVis component of the conference had been shrinking significantly for over a decade. As the panelists concluded and opened the session to questions from the audience, Annie Preston, a Ph.D. student at the University of California, Davis, asked whether the panelists thought diversity or, more specifically, the lack of diversity was a factor. This comment ignited a lively discussion of diversity: not only its impact on Scientific Visualization, but also its role in the visualization community at large. The goal of this book is to expand and organize the conversation. In particular, this book seeks to frame the diversity and inclusion topic within the Visualization community, illuminate the issues, and serve as a starting point to address how to make this community more diverse and inclusive. This book acknowledges that diversity is a broad topic with many possible meanings. Expanded definitions of diversity that are relevant to the Visualization community and to computing at large are considered. The conversation on inclusion and diversity is framed within the broader sociological context in which it must be considered. Solutions to recruit and retain a diverse research community and strategies for supporting inclusion efforts are presented. Additionally, community members present short stories detailing their "non-inclusive" experiences in an effort to facilitate a community-wide conversation surrounding very difficult situations. It is important to note that this is by no means intended to be a comprehensive, authoritative statement on the topic. Rather, this book is intended to open the conversation and begin to build a framework for diversity and inclusion in this specific research community.
While intended for the Visualization community, ideally, this book will provide guidance for any computing community struggling with similar issues and looking for solutions.

  • by Nicola Barbieri
    404,95 kr.

    The importance of accurate recommender systems has been widely recognized by academia and industry, and recommendation is rapidly becoming one of the most successful applications of data mining and machine learning. Understanding and predicting the choices and preferences of users is a challenging task: real-world scenarios involve users behaving in complex situations, where prior beliefs, specific tendencies, and reciprocal influences jointly contribute to determining the preferences of users toward huge amounts of information, services, and products. Probabilistic modeling represents a robust formal mathematical framework to model these assumptions and study their effects in the recommendation process. This book starts with a brief summary of the recommendation problem and its challenges and a review of some widely used techniques. Next, we introduce and discuss probabilistic approaches for modeling preference data. We focus our attention on methods based on latent factors, such as mixture models, probabilistic matrix factorization, and topic models, for explicit and implicit preference data. These methods represent a significant advance in the research and technology of recommendation. The resulting models allow us to identify complex patterns in preference data, which can be exploited to predict future purchases effectively. The extreme sparsity of preference data poses serious challenges to the modeling of user preferences, especially in the cases where few observations are available. Bayesian inference techniques elegantly address the need for regularization, and their integration with latent factor modeling helps to boost the performance of the basic techniques. We summarize the strengths and weaknesses of several approaches by considering two different but related evaluation perspectives, namely, rating prediction and recommendation accuracy.
Furthermore, we describe how probabilistic methods based on latent factors enable the exploitation of preference patterns in novel applications beyond rating prediction or recommendation accuracy. We finally discuss the application of probabilistic techniques in two additional scenarios, characterized by the availability of side information besides preference data. In summary, the book categorizes the myriad probabilistic approaches to recommendations and provides guidelines for their adoption in real-world situations.

  • by Manish Gupta
    397,95 kr.

    Outlier (or anomaly) detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. Initial research in outlier detection focused on time series-based outliers (in statistics). Since then, outlier detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general outlier detection, we focus on outlier detection for temporal data in this book. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc., are all temporal. This stresses the need for an organized and detailed study of outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data including consecutive data snapshots, series of data snapshots and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general outlier detection, techniques for temporal outlier detection are very different. In this book, we will present an organized picture of both recent and past research in temporal outlier detection. We start with the basics and then lead the reader up to the main ideas in state-of-the-art outlier detection techniques. We motivate the importance of temporal outlier detection and briefly describe the challenges beyond usual outlier detection. Then, we lay out a taxonomy of proposed techniques for temporal outlier detection.
Such techniques broadly include statistical techniques (like AR models, Markov models, histograms, neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community detection), network-based approaches, and spatio-temporal outlier detection approaches. We summarize by presenting a wide collection of applications where temporal outlier detection techniques have been applied to discover interesting outliers. Table of Contents: Preface / Acknowledgments / Figure Credits / Introduction and Challenges / Outlier Detection for Time Series and Data Sequences / Outlier Detection for Data Streams / Outlier Detection for Distributed Data Streams / Outlier Detection for Spatio-Temporal Data / Outlier Detection for Temporal Network Data / Applications of Outlier Detection for Temporal Data / Conclusions and Research Directions / Bibliography / Authors' Biographies

  • by Chi Wang
    564,95 kr.

    The "big data" era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge, to social media, news, and everyone's daily life. Examples of such collections include scientific publications, enterprise logs, news articles, social media, and general web pages. Valuable knowledge about multi-typed entities is often hidden in the unstructured or loosely structured, interconnected data. Mining latent structures around entities uncovers hidden knowledge such as implicit topics, phrases, entity roles and relationships. In this monograph, we investigate the principles and methodologies of mining latent entity structures from massive unstructured and interconnected data. We propose a text-rich information network model for modeling data in many different domains. This leads to a series of new principles and powerful methodologies for mining latent structures, including (1) latent topical hierarchy, (2) quality topical phrases, (3) entity roles in hierarchical topical communities, and (4) entity relations. This book also introduces applications enabled by the mined structures and points out some promising research directions.

  • by Geoffrey Barbier
    296,95 kr.

    Social media shatters the barriers to communicating anytime, anywhere, for people from all walks of life. The publicly available, virtually free information in social media poses a new challenge to consumers who have to discern whether a piece of information published in social media is reliable. For example, it can be difficult to understand the motivations behind a statement passed from one user to another, without knowing the person who originated the message. Additionally, false information can be propagated through social media, resulting in embarrassment or irreversible damage. Provenance data associated with a social media statement can help dispel rumors, clarify opinions, and confirm facts. However, provenance data about social media statements is not readily available to users today. Currently, providing this data to users requires changing the social media infrastructure or offering subscription services. Taking advantage of social media features, research in this nascent field spearheads the search for a way to provide provenance data to social media users, thus leveraging social media itself by mining it for the provenance data. Searching for provenance data reveals an interesting problem space requiring the development and application of new metrics in order to provide meaningful provenance data to social media users. This lecture reviews the current research on information provenance, explores exciting research opportunities to address pressing needs, and shows how data mining can enable a social media user to make informed judgements about statements published in social media. Table of Contents: Information Provenance in Social Media / Provenance Attributes / Provenance via Network Information / Provenance Data

  • by James M. McCracken
    514,95 kr.

    Many scientific disciplines rely on observational data of systems for which it is difficult (or impossible) to implement controlled experiments. Data analysis techniques are required for identifying causal information and relationships directly from such observational data. This need has led to the development of many different time series causality approaches and tools including transfer entropy, convergent cross-mapping (CCM), and Granger causality statistics. A practicing analyst can explore the literature to find many proposals for identifying drivers and causal connections in time series data sets. Exploratory causal analysis (ECA) provides a framework for exploring potential causal structures in time series data sets and is characterized by a myopic goal to determine which data series from a given set of series might be seen as the primary driver. In this work, ECA is used on several synthetic and empirical data sets, and it is found that all of the tested time series causality tools agree with each other (and intuitive notions of causality) for many simple systems but can provide conflicting causal inferences for more complicated systems. It is proposed that such disagreements between different time series causality tools during ECA might provide deeper insight into the data than could be found otherwise.

  • by Xiang Ren
    664,95 kr.

    Real-world data, though massive, is largely unstructured, in the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data, without extensive human annotation and labeling. In this book, we investigate the principles and methodologies of mining structures of factual knowledge (e.g., entities and their relationships) from massive, unstructured text corpora. Departing from many existing structure extraction methods that rely heavily on human-annotated data for model training, our effort-light approach leverages human-curated facts stored in external knowledge bases as distant supervision and exploits rich data redundancy in large text corpora for context understanding. This effort-light mining approach leads to a series of new principles and powerful methodologies for structuring text corpora, including (1) entity recognition, typing and synonym discovery, (2) entity relation extraction, and (3) open-domain attribute-value mining and information extraction. This book introduces this new research frontier and points out some promising research directions.

  • by Kai Shu
    609,95 kr.

    In the past decade, social media has become increasingly popular for news consumption due to its easy access, fast dissemination, and low cost. However, social media also enables the wide propagation of "fake news," i.e., news with intentionally false information. Fake news on social media can have significant negative societal effects. Therefore, fake news detection on social media has recently become an emerging research area that is attracting tremendous attention. This book, from a data mining perspective, introduces the basic concepts and characteristics of fake news across disciplines, reviews representative fake news detection methods in a principled way, and illustrates challenging issues of fake news detection on social media. In particular, we discuss the value of news content and social context, and important extensions to handle early detection, weakly-supervised detection, and explainable detection. The concepts, algorithms, and methods described in this lecture can help harness the power of social media to build effective and intelligent fake news detection systems. This book is an accessible introduction to the study of detecting fake news on social media. It is essential reading for students, researchers, and practitioners to understand, manage, and excel in this area. This book is supported by additional materials, including lecture slides, the complete set of figures, key references, datasets, tools used in this book, and the source code of representative algorithms. The readers are encouraged to visit the book website for the latest information: http://dmml.asu.edu/dfn/

  • by Guozhu Dong
    659,95 kr.

    This book presents pattern-based problem-solving methods for a variety of machine learning and data analysis problems. The methods are all based on techniques that exploit the power of group differences. They make use of group differences represented using emerging patterns (aka contrast patterns), which are patterns that match significantly different numbers of instances in different data groups. A large number of applications outside of the computing discipline are also included. Emerging patterns (EPs) are useful in many ways. EPs can be used as features, as simple classifiers, as subpopulation signatures/characterizations, and as triggering conditions for alerts. EPs can be used in gene ranking for complex diseases since they capture multi-factor interactions. The length of EPs can be used to detect anomalies, outliers, and novelties. Emerging/contrast pattern based methods for clustering analysis and outlier detection do not need distance metrics, avoiding pitfalls of the latter in exploratory analysis of high dimensional data. EP-based classifiers can achieve good accuracy even when the training datasets are tiny, making them useful for exploratory compound selection in drug design. EPs can serve as opportunities in opportunity-focused boosting and are useful for constructing powerful conditional ensembles. EP-based methods often produce interpretable models and results. In general, EPs are useful for classification, clustering, outlier detection, gene ranking for complex diseases, prediction model analysis and improvement, and so on. EPs are useful for many tasks because they represent group differences, which have extraordinary power.
Moreover, EPs represent multi-factor interactions, whose effective handling is of vital importance and is a major challenge in many disciplines. Based on the results presented in this book, one can clearly say that patterns are useful, especially when they are linked to issues of interest. We believe that many effective ways to exploit group differences' power still remain to be discovered. Hopefully this book will inspire readers to discover such new ways, besides showing them existing ways, to solve various challenging problems.

  • by Huiji Gao
    395,95 kr.

    In recent years, there has been a rapid growth of location-based social networking services, such as Foursquare and Facebook Places, which have attracted an increasing number of users and greatly enriched their urban experience. Typical location-based social networking sites allow a user to "check in" at a real-world POI (point of interest, e.g., a hotel, restaurant, theater, etc.), leave tips about the POI, and share the check-in with their online friends. The check-in action bridges the gap between the real world and online social networks, resulting in a new type of social network, namely location-based social networks (LBSNs). Compared to traditional GPS data, location-based social network data contains unique properties with abundant heterogeneous information to reveal human mobility, i.e., "when and where a user (who) has been to for what," corresponding to an unprecedented opportunity to better understand human mobility from spatial, temporal, social, and content aspects. The mining and understanding of human mobility can further lead to effective approaches to improve current location-based services from mobile marketing to recommender systems, providing users with a more convenient life experience than before. This book takes a data mining perspective to offer an overview of studying human mobility in location-based social networks and illuminate a wide range of related computational tasks. It introduces basic concepts, elaborates associated challenges, reviews state-of-the-art algorithms with illustrative examples and real-world LBSN datasets, and discusses effective evaluation methods in mining human mobility. In particular, we illustrate unique characteristics and research opportunities of LBSN data, and present representative tasks of mining human mobility on location-based social networks, including capturing user mobility patterns to understand when and where a user commonly goes (location prediction), and exploiting user preferences and location profiles to investigate where and when a user wants to explore (location recommendation), along with studying a user's check-in activity in terms of why a user goes to a certain location.

  • by Danai Koutra
    665,95 kr.

    Graphs naturally represent information ranging from links between web pages, to communication in email networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect anomalies that indicate critical events, such as an attack on a computer system, disease formation in the human brain, or the fall of a company? This book presents scalable, principled discovery algorithms that combine globality with locality to make sense of one or more graphs. In addition to fast algorithmic methodologies, we also contribute graph-theoretical ideas and models, and real-world applications in two main areas: Individual Graph Mining: We show how to interpretably summarize a single graph by identifying its important graph structures. We complement summarization with inference, which leverages information about few entities (obtained via summarization or other methods) and the network structure to efficiently and effectively learn information about the unknown entities. Collective Graph Mining: We extend the idea of individual-graph summarization to time-evolving graphs, and show how to scalably discover temporal patterns. Apart from summarization, we claim that graph similarity is often the underlying problem in a host of applications where multiple graphs occur (e.g., temporal anomaly detection, discovery of behavioral patterns), and we present principled, scalable algorithms for aligning networks and measuring their similarity. The methods that we present in this book leverage techniques from diverse areas, such as matrix algebra, graph theory, optimization, information theory, machine learning, finance, and social science, to solve real-world problems. We present applications of our exploration algorithms to massive datasets, including a Web graph of 6.6 billion edges, a Twitter graph of 1.8 billion edges, brain graphs with up to 90 million edges, collaboration and peer-to-peer networks, and browser logs, all spanning millions of users and interactions.

  • by Chao Zhang
    616,95 kr.

    Unstructured text, as one of the most important data forms, plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to scientific research and healthcare informatics. In many emerging applications, people's information needs from text data are becoming multidimensional: they demand useful insights along multiple aspects from a text corpus. However, acquiring such multidimensional knowledge from massive text data remains a challenging task. This book presents data mining techniques that turn unstructured text data into multidimensional knowledge. We investigate two core questions. (1) How does one identify task-relevant text data with declarative queries in multiple dimensions? (2) How does one distill knowledge from text data in a multidimensional space? To address the above questions, we develop a text cube framework. First, we develop a cube construction module that organizes unstructured data into a cube structure, by discovering latent multidimensional and multi-granular structure from the unstructured text corpus and allocating documents into the structure. Second, we develop a cube exploitation module that models multiple dimensions in the cube space, thereby distilling multidimensional knowledge from user-selected data. Together, these two modules constitute an integrated pipeline: leveraging the cube structure, users can perform multidimensional, multi-granular data selection with declarative queries; and with cube exploitation algorithms, users can extract multidimensional patterns from the selected data for decision making. The proposed framework has two distinctive advantages when turning text data into multidimensional knowledge: flexibility and label-efficiency. First, it enables acquiring multidimensional knowledge flexibly, as the cube structure allows users to easily identify task-relevant data along multiple dimensions at varied granularities and further distill multidimensional knowledge. Second, the algorithms for cube construction and exploitation require little supervision; this makes the framework appealing for many applications where labeled data are expensive to obtain.

  • by Jialu Liu
    393,95 kr.

    A lot of digital ink has been spilled on "big data" over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide. A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications? In this book, we investigate one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.

  • by Deepayan Chakrabarti
    405,95 kr.

    What does the Web look like? How can we find patterns, communities, outliers, in a social network? Which are the most central nodes in a network? These are the questions that motivate this work. Networks and graphs appear in many diverse settings, for example in social networks, computer-communication networks (intrusion detection, traffic management), protein-protein interaction networks in biology, document-text bipartite graphs in text retrieval, person-account graphs in financial fraud detection, and others. In this work, first we list several surprising patterns that real graphs tend to follow. Then we give a detailed list of generators that try to mirror these patterns. Generators are important, because they can help with "what if" scenarios, extrapolations, and anonymization. Then we provide a list of powerful tools for graph analysis, and specifically spectral methods (Singular Value Decomposition (SVD)), tensors, and case studies like the famous "PageRank" algorithm and the "HITS" algorithm for ranking web search results. Finally, we conclude with a survey of tools and observations from related fields like sociology, which provide complementary viewpoints. Table of Contents: Introduction / Patterns in Static Graphs / Patterns in Evolving Graphs / Patterns in Weighted Graphs / Discussion: The Structure of Specific Graphs / Discussion: Power Laws and Deviations / Summary of Patterns / Graph Generators / Preferential Attachment and Variants / Incorporating Geographical Information / The RMat / Graph Generation by Kronecker Multiplication / Summary and Practitioner's Guide / SVD, Random Walks, and Tensors / Tensors / Community Detection / Influence/Virus Propagation and Immunization / Case Studies / Social Networks / Other Related Work / Conclusions

  • by Elena Zheleva
    317,95 kr.

    This synthesis lecture provides a survey of work on privacy in online social networks (OSNs). This work encompasses concerns of users as well as service providers and third parties. Our goal is to approach such concerns from a computer-science perspective, building upon existing work on privacy, security, statistical modeling and databases to provide an overview of the technical and algorithmic issues related to privacy in OSNs. We start our survey by introducing a simple OSN data model and describing common statistical-inference techniques that can be used to infer potentially sensitive information. Next, we describe some privacy definitions and privacy mechanisms for data publishing. Finally, we describe a set of recent techniques for modeling, evaluating, and managing individual users' privacy risk within the context of OSNs. Table of Contents: Introduction / A Model for Online Social Networks / Types of Privacy Disclosure / Statistical Methods for Inferring Information in Networks / Anonymity and Differential Privacy / Attacks and Privacy-preserving Mechanisms / Models of Information Sharing / Users' Privacy Risk / Management of Privacy Settings

  • by Nitin Agarwal
    301,95 kr.

    This book offers a comprehensive overview of the various concepts and research issues about blogs or weblogs. It introduces techniques and approaches, tools and applications, and evaluation methodologies with examples and case studies. Blogs allow people to express their thoughts, voice their opinions, and share their experiences and ideas. Blogs also facilitate interactions among individuals, creating a network with unique characteristics. Through the interactions, individuals experience a sense of community. We elaborate on approaches that extract communities and cluster blogs based on information of the bloggers. Open standards and low barrier to publication in the Blogosphere have transformed information consumers to producers, generating an overwhelming amount of ever-increasing knowledge about the members, their environment and symbiosis. We elaborate on approaches that sift through humongous blog data sources to identify influential and trustworthy bloggers leveraging content and network information. Spam blogs or "splogs" are an increasing concern in the Blogosphere and are discussed in detail with the approaches leveraging supervised machine learning algorithms and interaction patterns. We elaborate on data collection procedures, provide resources for blog data repositories, mention various visualization and analysis tools in the Blogosphere, and explain conventional and novel evaluation methodologies, to help perform research in the Blogosphere. The book is supported by additional material, including lecture slides as well as the complete set of figures used in the book, and the reader is encouraged to visit the book website for the latest information. Table of Contents: Modeling Blogosphere / Blog Clustering and Community Discovery / Influence and Trust / Spam Filtering in Blogosphere / Data Collection and Evaluation

  • by Francesco Cafaro
    611,95 kr.

    When you picture human-data interactions (HDI), what comes to mind? The datafication of modern life, along with open data initiatives advocating for transparency and access to current and historical datasets, has fundamentally transformed when, where, and how people encounter data. People now rely on data to make decisions, understand current events, and interpret the world. We frequently employ graphs, maps, and other spatialized forms to aid data interpretation, yet the familiarity of these displays causes us to forget that even basic representations are complex, challenging inscriptions and are not neutral; they are based on representational choices that impact how and what they communicate. This book draws on frameworks from the learning sciences, visualization, and human-computer interaction to explore embodied HDI. This exciting sub-field of interaction design is based on the premise that every day we produce and have access to quintillions of bytes of data, the exploration and analysis of which are no longer confined within the walls of research laboratories. This volume examines how humans interact with these data in informal (not work or school) environments, particularly in museums. The first half of the book provides an overview of the multi-disciplinary, theoretical foundations of HDI (in particular, embodied cognition, conceptual metaphor theory, embodied interaction, and embodied learning) and reviews socio-technical theories relevant for designing HDI installations to support informal learning. The second half of the book describes strategies for engaging museum visitors with interactive data visualizations, presents methodologies that can inform the design of hand gestures and body movements for embodied installations, and discusses how HDI can facilitate people's sensemaking about data. This cross-disciplinary book is intended as a resource for students and early-career researchers in human-computer interaction and the learning sciences, as well as for more senior researchers and museum practitioners who want to quickly familiarize themselves with HDI.

  • by Mark W. Spong, Daniel J. Block & Karl J. Astrom
    395,95 kr.

    This monograph describes the Reaction Wheel Pendulum, the newest inverted-pendulum-like device for control education and research. We discuss the history and background of the reaction wheel pendulum and other similar experimental devices. We develop mathematical models of the reaction wheel pendulum in depth, including linear and nonlinear models, and models of the sensors and actuators that are used for feedback control. We treat various aspects of the control problem, from linear control of the motor, to stabilization of the pendulum about an equilibrium configuration using linear control, to the nonlinear control problem of swingup control. We also discuss hybrid and switching control, which is useful for switching between the swingup and balance controllers. We also discuss important practical issues such as friction modeling and friction compensation, quantization of sensor signals, and saturation. This monograph can be used as a supplement for courses in feedback control at the undergraduate level, courses in mechatronics, or courses in linear and nonlinear state space control at the graduate level. It can also be used as a laboratory manual and as a reference for research in nonlinear control.
