NLP Reading notes #2 (EMNLP 2021 Highlights)
As some of you may know, in early November 2021 I had the great honor of publishing at and attending the EMNLP 2021 conference in the Dominican Republic. There, I met a lot of interesting people and saw many interesting papers. As a first-time attendee, I quickly learned that the most interesting places to be are the poster sessions: in a large room, dozens of researchers compete for attention while presenting their research. This was an opportunity to talk with them one on one and get a deeper understanding of what they did. In this article, I will present my selection of the most interesting research projects that I came across during these sessions.
Traditional word embeddings give us a way to translate words into numeric vectors so that computers can manipulate them algebraically. This essentially allows us to do math with words (e.g. KING – MAN + WOMAN = QUEEN).
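The word-analogy arithmetic above can be sketched with toy vectors. The 2-D embeddings below are hand-picked for illustration only; real models learn hundreds of dimensions from large corpora:

```python
import numpy as np

# Hand-picked toy "embeddings": dimension 0 ~ royalty, dimension 1 ~ maleness.
emb = {
    "king":  np.array([0.9, 0.9]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
    "queen": np.array([0.9, 0.1]),
}

def nearest(vec, vocab):
    """Return the word whose embedding is most similar (cosine) to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vec, vocab[w]))

# KING - MAN + WOMAN lands closest to QUEEN in this toy space.
result = nearest(emb["king"] - emb["man"] + emb["woman"], emb)
```

With real, learned embeddings the arithmetic is noisier, but the nearest-neighbor lookup works the same way.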
This paper proposes another technique for word embeddings. Instead of translating words into vectors (points in space), they translate them into n-dimensional boxes. The advantage of boxes over points to represent words is that boxes can contain one another. Hence, box embeddings provide us with a simple way of creating hierarchical embeddings. For example, the box for “mammals” would contain the boxes for “dogs” and “cats”. This provides a better representation of words that can then be leveraged by machine learning models.
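The containment idea can be sketched with axis-aligned boxes given by min and max corners. The coordinates here are invented for illustration; actual box embeddings are learned from data:

```python
# A box is an axis-aligned hyper-rectangle: (min_corner, max_corner).
def contains(outer, inner):
    """True if `inner` lies entirely inside `outer` in every dimension."""
    (omin, omax), (imin, imax) = outer, inner
    return all(o <= i for o, i in zip(omin, imin)) and \
           all(i <= o for i, o in zip(imax, omax))

# Hand-picked 2-D boxes: "mammal" encloses both "dog" and "cat".
mammal = ((0.0, 0.0), (1.0, 1.0))
dog    = ((0.1, 0.1), (0.4, 0.4))
cat    = ((0.5, 0.5), (0.9, 0.9))

contains(mammal, dog)  # True: a dog is a mammal
contains(dog, cat)     # False: neither contains the other
```

Because containment is asymmetric, the geometry directly encodes the "is-a" hierarchy, which point embeddings can only approximate.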
The Akkadian Empire, which rose in the 24th century BC, was one of the first great civilizations in human history. Today, the Akkadian language is long forgotten, but ancient written tablets have been found in Mesopotamia. However, many of these relics have suffered the wrath of time and are badly damaged.
This paper proposes using modern language modelling technology to predict the missing words on these tablets. The authors showed impressive results and demonstrated that training their model on related modern Semitic languages, such as Arabic, yields better performance than using only the few available tablets as a training set. Such a model could help historians uncover the secrets of our past.
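As a toy illustration of the gap-filling idea, the sketch below picks the word most often observed between the gap's neighbours in a small invented corpus. The paper itself uses a neural language model; this count-based stand-in only shows the shape of the task:

```python
from collections import Counter

# Invented mini-corpus standing in for transliterated tablet text.
corpus = [
    "the king sent grain to the temple",
    "the king sent silver to the temple",
    "the king sent grain to the city",
]

def fill_gap(sentence_with_gap, corpus, gap="[?]"):
    """Guess the missing word from the words seen between its neighbours."""
    tokens = sentence_with_gap.split()
    i = tokens.index(gap)
    left, right = tokens[i - 1], tokens[i + 1]
    counts = Counter()
    for line in corpus:
        words = line.split()
        for j in range(1, len(words) - 1):
            if words[j - 1] == left and words[j + 1] == right:
                counts[words[j]] += 1
    return counts.most_common(1)[0][0] if counts else None

fill_gap("the king sent [?] to the temple", corpus)  # → "grain"
```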
Language models are one of the most important developments in NLP. These are models capable of generating text and word embeddings. They are the basis of many downstream systems. As such, understanding their behavior is essential.
This paper studies the long-term contextual memory of such models: when predicting the next word, how far back (how many previous words) does the model look to decide what to generate? The authors discovered that the type of text matters. A continuous narrative such as fiction leads the model to rely more on distant past words than a technical text does. This demonstrates that such models are flexible and can differentiate between various kinds of texts.
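One way to probe this is to compare the model's score for the next word given the full history versus a truncated one. Below is a minimal sketch of that measurement; `toy_log_prob` is an invented stand-in for a real language model's scoring function, and the story is made up:

```python
import math

def context_usage(log_prob, history, next_word, window):
    """How much does context beyond `window` words help? Positive = it helps."""
    full = log_prob(history, next_word)
    truncated = log_prob(history[-window:], next_word)
    return full - truncated

def toy_log_prob(history, word):
    # Toy scorer: words already seen in the history are more likely
    # to be repeated (add-one smoothed).
    return math.log((history.count(word) + 1) / (len(history) + 2))

story = "the wizard found a ring and hid the ring in a cave".split()
# The earlier mentions of "ring" only help when the full history is kept.
gain = context_usage(toy_log_prob, story, "ring", window=3)  # positive
```

With a real model, averaging this gain over many texts and window sizes gives the kind of memory profile the paper reports, and lets narrative and technical texts be compared.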
Generating good text using Language Models is difficult. However, even if the text is legible, it might not be appropriate for every user. This paper discusses the importance and challenges of generating personalized text that is user-specific. This can mean adapting the lexical register used to align with the user. Examples include varying the level of formality in the text or the level of depth with respect to the user’s knowledge and ability.
This paper asks a simple question: does the order of words in a sentence matter? The authors used language models to try to reorder shuffled sentences and demonstrated that the models can recover the order of words with minimal error. This implies that word order conveys little information independent of the information provided by the words themselves, which can be explained by the fact that syntax (which the language models have learned) limits the ways words can be ordered. This calls into question the relevance of word position as a useful signal for text analysis.
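A toy version of the reordering experiment: score every permutation of the shuffled words with bigram counts from a tiny invented corpus and keep the best-scoring order. The paper does this with a neural language model at scale; this sketch only illustrates the mechanism:

```python
from itertools import permutations
from collections import Counter

# Tiny invented corpus to learn bigram statistics from.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
bigrams = Counter(
    pair for line in corpus
    for pair in zip(line.split(), line.split()[1:])
)

def unshuffle(words):
    """Return the permutation whose adjacent pairs are most corpus-like."""
    return max(
        permutations(words),
        key=lambda p: sum(bigrams[b] for b in zip(p, p[1:])),
    )

unshuffle(["sat", "the", "mat", "on", "cat", "the"])
# → ("the", "cat", "sat", "on", "the", "mat")
```

Brute-force permutation scoring only works for short sentences (n! candidates); the point is that the word *identities* plus learned syntax suffice to recover the order.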
Language models do not function like us. When deciding which word to generate next, they compute a probability distribution over a vocabulary of words. Consequently, if multiple words are valid, they compete for probability mass, since all probabilities must sum to 1. This makes raw probability a poor measure of correctness.
This paper proposes using pointwise mutual information (PMI) instead of probability as a scoring function, as PMI is not subject to such competition. Using PMI shows significant improvement over raw probabilities for text generation. This simple trick could have a large impact on future language modelling research.
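The difference between the two scoring functions can be shown with made-up numbers (these probabilities are invented for illustration, not taken from a real model):

```python
import math

p_word  = {"good": 0.30, "rare": 0.01}   # unconditional p(w): how common the word is
p_given = {"good": 0.35, "rare": 0.05}   # p(w | context): the model's prediction

def pmi(word):
    """Pointwise mutual information: how much the context boosts the word."""
    return math.log(p_given[word] / p_word[word])

# "good" wins on raw probability simply because it is a frequent word,
# but "rare" gains far more from this particular context, so PMI prefers it.
by_probability = max(p_given, key=p_given.get)  # "good"
by_pmi = max(p_given, key=pmi)                  # "rare"
```

Dividing out the unconditional frequency removes the advantage that common words get for free, which is exactly the competition effect described above.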
Language models are really good at understanding human language, but they have difficulties with numbers, specifically numbers in ranges that they haven’t encountered during training. Moreover, current methods treat numbers the same way as words, splitting them into common subword pieces, which is not appropriate and can hurt performance significantly.
This paper proposes a simple yet elegant solution. They pre-process text to transform numbers into another surface form that leads to better performance for language models: each digit is paired with its order of magnitude, so “421” becomes “4e2 2e1 1e0”. Consequently, their model is better able to generalize to a larger range of numbers.
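A minimal sketch of such a pre-processing step, assuming the place-value notation shown in the example (the paper's exact token format may differ):

```python
def to_scientific_tokens(n):
    """Rewrite a non-negative integer digit by digit with its place value.

    Each digit becomes a "<digit>e<exponent>" token, so the model sees
    the magnitude of every digit explicitly instead of an opaque subword.
    """
    digits = str(n)
    k = len(digits)
    return " ".join(f"{d}e{k - 1 - i}" for i, d in enumerate(digits))

to_scientific_tokens(421)  # → "4e2 2e1 1e0"
to_scientific_tokens(7)    # → "7e0"
```

Because every magnitude now has an explicit token, a model that has seen "e5" and "e6" during training has some handle on six-digit numbers even if it never saw those exact values.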