This work presents a discourse-aware Text Simplification approach that splits and rephrases complex English sentences within the semantic context in which they occur. Based on a linguistically grounded transformation stage, complex sentences are transformed into shorter utterances with a simple canonical structure that can be easily analyzed by downstream applications. To avoid breaking down the input into a disjointed sequence of statements that is difficult to interpret, the author incorporates the semantic context between the split propositions in the form of hierarchical structures and semantic relationships. This generates a novel representation of complex assertions that puts a semantic layer on top of the simplified sentences. In a second step, she leverages the semantic hierarchy of minimal propositions to improve the performance of Open IE frameworks. She shows that such systems benefit in two dimensions. First, the canonical structure of the simplified sentences facilitates the extraction of relational tuples, leading to improved precision and recall of the extracted relations. Second, the semantic hierarchy can be leveraged to enrich the output of existing Open IE approaches with additional meta-information, resulting in a novel lightweight semantic representation for complex text data in the form of normalized and context-preserving relational tuples.
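To make the idea of context-preserving relational tuples more concrete, here is a minimal Python sketch of one possible data structure. It assumes a complex sentence has already been split into minimal propositions, each already reduced to an (argument, relation, argument) triple, and that core propositions are linked to contextual ones by labelled semantic relations. All class names, field names, and relation labels (`Proposition`, `Link`, `ContextualTuple`, "CAUSE", "CONTRAST") are hypothetical illustrations, not the author's actual schema or output format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema for illustration only; not the author's actual output format.


@dataclass
class Proposition:
    arg1: str
    rel: str
    arg2: str
    is_core: bool  # True for core facts, False for contextual information


@dataclass
class Link:
    core: int      # index of the core proposition
    context: int   # index of the contextual proposition
    relation: str  # semantic relation label (illustrative, e.g. "CAUSE")


@dataclass
class ContextualTuple:
    arg1: str
    rel: str
    arg2: str
    context: List[Tuple[str, str]] = field(default_factory=list)  # (relation, text)


def to_contextual_tuples(props: List[Proposition], links: List[Link]) -> List[ContextualTuple]:
    """Emit one tuple per core proposition, annotated with its linked context."""
    tuples = {i: ContextualTuple(p.arg1, p.rel, p.arg2)
              for i, p in enumerate(props) if p.is_core}
    for link in links:
        if link.core in tuples:
            c = props[link.context]
            # Render the contextual proposition as plain text, skipping empty slots.
            text = " ".join(part for part in (c.arg1, c.rel, c.arg2) if part)
            tuples[link.core].context.append((link.relation, text))
    return [tuples[i] for i in sorted(tuples)]


# Invented example: "Although the company reported record profits,
# it cut 500 jobs because demand fell."
props = [
    Proposition("the company", "cut", "500 jobs", is_core=True),
    Proposition("the company", "reported", "record profits", is_core=False),
    Proposition("demand", "fell", "", is_core=False),
]
links = [Link(0, 1, "CONTRAST"), Link(0, 2, "CAUSE")]

for t in to_contextual_tuples(props, links):
    print(t)
```

Keeping the context as a list of (relation, text) pairs, rather than folding it into the arguments, leaves the core triple normalized while still recording the conditions under which it holds.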
This book constitutes the refereed proceedings of the 20th International Conference on Formal Modeling and Analysis of Timed Systems, FORMATS 2022, held in Warsaw, Poland, in September 2022. The 12 full papers and 2 short papers presented in this volume were carefully reviewed and selected from 30 submissions; the volume also includes 3 full-length papers associated with invited/anniversary talks. The papers focus on topics such as the modelling, design, and analysis of timed computational systems. The conference addresses real-time issues in hardware design, performance analysis, real-time software, scheduling, and the semantics and verification of real-time, hybrid, and probabilistic systems.
This book constitutes the proceedings of the 26th International Conference on Developments in Language Theory, DLT 2022, held in Tampa, FL, USA, in May 2022. The conference took place in a hybrid format with both in-person and online participation. The 21 full papers included in these proceedings were carefully reviewed and selected from 32 submissions. The DLT conference series provides a forum for presenting current developments in formal languages and automata.
This book constitutes the refereed proceedings of the 13th International Conference of the CLEF Association, CLEF 2022, held in Bologna, Italy, in September 2022. The conference has a clear focus on experimental information retrieval, with special attention to the challenges of multimodality, multilinguality, and interactive search across unstructured, semi-structured, and structured data. The 7 full papers and 3 short papers presented in this volume were carefully reviewed and selected from 14 submissions. This year, the contributions addressed the following challenges: authorship attribution, fake news detection and news tracking, noise detection in automatically transferred relevance judgments, the impact of online education on children's conversational search behavior, analysis of multimodal social media content, knowledge graphs for sensitivity identification, a fusion of deep learning and logic rules for sentiment analysis, medical concept normalization, and domain-specific information extraction. In addition, the volume presents 7 "best of the labs" papers, which were reviewed as full paper submissions under the same review criteria. Fourteen lab overview papers were accepted; they represent scientific challenges based on new datasets and real-world problems in multimodal and multilingual information access.
Most intermediate-level machine learning books focus on how to optimize models by increasing accuracy or decreasing prediction error. But this approach often overlooks the importance of understanding why and how your ML model makes the predictions that it does. Explainability methods provide an essential toolkit for better understanding model behavior, and this practical guide brings together best-in-class techniques for model explainability. Experienced machine learning engineers and data scientists will learn hands-on how these techniques work, so that they can apply these tools more easily in their daily workflow. This essential book provides:
- A detailed look at some of the most useful and commonly used explainability techniques, highlighting pros and cons to help you choose the best tool for your needs
- Tips and best practices for implementing these techniques
- A guide to interacting with explainability and how to avoid common pitfalls
- The knowledge you need to incorporate explainability in your ML workflow to help build more robust ML systems
- Advice about explainable AI techniques, including how to apply techniques to models that consume tabular, image, or text data
- Example implementation code in Python using well-known explainability libraries for models built in Keras and TensorFlow 2.0, PyTorch, and HuggingFace
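The blurb does not name specific techniques, so as an illustration here is a framework-agnostic sketch of permutation feature importance, a widely used model-agnostic explainability method: shuffle one feature column at a time and measure how much the model's accuracy drops. The toy model, the data, and the function names are hypothetical, and this plain-Python version is not the API of any of the libraries listed in the blurb.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], int], X: List[Row], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace feature j with its shuffled values, leaving other features intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    # Hypothetical "model" that only looks at feature 0.
    return int(row[0] > 0.5)


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores the second feature, its importance comes out as exactly zero, while shuffling the first feature drops accuracy sharply; this kind of sanity check is one reason permutation-based methods are popular.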
This book constitutes the proceedings of the 26th International Conference on Implementation and Application of Automata, CIAA 2022, held in Rouen, France, in June/July 2022. The 16 regular papers presented in this book, together with 3 invited lectures, were carefully reviewed and selected from 26 submissions. The papers cover various fields in the application, implementation, and theory of automata and related structures.
Labelling data is one of the most fundamental activities in science, and has underpinned practice, particularly in medicine, for decades, as well as research in corpus linguistics since at least the development of the Brown corpus. With the shift towards Machine Learning in Artificial Intelligence (AI), the creation of datasets to be used for training and evaluating AI systems, also known in AI as corpora, has become a central activity in the field as well. Early AI datasets were created on an ad-hoc basis to tackle specific problems. As larger and more reusable datasets were created, requiring greater investment, the need for a more systematic approach to dataset creation arose to ensure increased quality. A range of statistical methods was adopted, often but not exclusively from the medical sciences, to ensure that the labels used were not subjective, or to choose among different labels provided by the coders. A wide variety of such methods is now in regular use. This book provides a survey of the most widely used of these statistical methods supporting annotation practice. As far as the authors know, this is the first book attempting to cover the two families of methods in wider use. The first family of methods is concerned with the development of labelling schemes and, in particular, with ensuring that sufficient agreement can be observed among the coders who apply them. The second family includes methods developed to analyze the output of coders once the scheme has been agreed upon, particularly, although not exclusively, to identify the most likely label for an item among those provided by the coders. The focus of this book is primarily on Natural Language Processing, the area of AI devoted to the development of models of language interpretation and production, but many, if not most, of the methods discussed here are also applicable to other areas of AI, or indeed to other areas of Data Science.
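As a small illustration of the two families, here is a Python sketch of Cohen's kappa, a classic chance-corrected agreement coefficient for two coders, kappa = (p_o - p_e) / (1 - p_e), and of majority voting, the simplest way to pick one label for an item from several coders' annotations. The blurb does not specify which coefficients or aggregation models the book covers; these are standard textbook choices used here only as examples, and the labels are made up.

```python
from collections import Counter
from collections.abc import Hashable
from typing import List


def cohen_kappa(a: List[Hashable], b: List[Hashable]) -> float:
    """Cohen's kappa for two coders: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    # Expected chance agreement, from each coder's own label distribution.
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def majority_label(labels: List[Hashable]) -> Hashable:
    """Most frequent label for one item; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


coder1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohen_kappa(coder1, coder2), 3))  # 0.333
print(majority_label(["pos", "neg", "pos"]))  # pos
```

Here the coders agree on 4 of 6 items (p_o = 0.667), but with balanced label distributions half of that agreement is expected by chance (p_e = 0.5), so kappa is only 0.333.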
Empirical methods are means of answering methodological questions of empirical sciences by statistical techniques. The methodological questions addressed in this book include the problems of validity, reliability, and significance. In the case of machine learning, these correspond to the questions of whether a model predicts what it purports to predict, whether a model's performance is consistent across replications, and whether a performance difference between two models is due to chance, respectively. The goal of this book is to answer these questions by concrete statistical tests that can be applied to assess the validity, reliability, and significance of data annotation and machine learning prediction in the fields of NLP and data science. Our focus is on model-based empirical methods where data annotations and model predictions are treated as training data for interpretable probabilistic models from the well-understood families of generalized additive models (GAMs) and linear mixed effects models (LMEMs). Based on the interpretable parameters of the trained GAMs or LMEMs, the book presents model-based statistical tests such as a validity test that allows detecting circular features that circumvent learning. Furthermore, the book discusses a reliability coefficient using variance decomposition based on random effect parameters of LMEMs. Finally, a significance test based on the likelihood ratio of nested LMEMs trained on the performance scores of two machine learning models is shown to naturally allow variations in meta-parameter settings to be included in hypothesis testing, and to facilitate a refined system comparison conditioned on properties of the input data. This book can be used as an introduction to empirical methods for machine learning in general, with a special focus on applications in NLP and data science.
The book is self-contained, with an appendix on the mathematical background on GAMs and LMEMs, and with an accompanying webpage including R code to replicate experiments presented in the book.
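The likelihood ratio test described above reduces to a short computation once the two nested models have been fitted. The sketch below assumes the log-likelihoods of the two LMEMs are already available (for example from an R fit using maximum likelihood rather than REML, since the models differ in fixed effects) and that the full model adds exactly one parameter, such as a fixed effect for system identity, so the statistic is compared against a chi-square distribution with one degree of freedom. The variance-ratio function shows one common intraclass-correlation-style form of a reliability coefficient; the book's exact definition may differ. Function names and numbers are invented for illustration.

```python
import math


def likelihood_ratio_test(loglik_nested: float, loglik_full: float) -> tuple:
    """LRT for nested models whose parameter counts differ by exactly one (df = 1)."""
    stat = 2.0 * (loglik_full - loglik_nested)
    if stat < 0:
        raise ValueError("the full model should not fit worse than the nested one")
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def variance_ratio_reliability(var_target: float, var_residual: float) -> float:
    """ICC-style share of total variance attributable to the random effect of interest.

    Assumed form for illustration; not necessarily the book's coefficient.
    """
    return var_target / (var_target + var_residual)


# Invented log-likelihoods of two nested LMEMs fitted to the same performance scores.
stat, p = likelihood_ratio_test(loglik_nested=-1200.0, loglik_full=-1195.0)
print(stat, round(p, 4))                     # 10.0 0.0016
print(variance_ratio_reliability(3.0, 1.0))  # 0.75
```

Restricting the sketch to one degree of freedom keeps the p-value computable with the standard library alone; comparisons that add several parameters would need the general chi-square survival function.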