Title: qdap: qdap Version 2.2.0

Type: Software

Tyler Rinker, Colin Gillespie, Craig Citro (2014): qdap: qdap Version 2.2.0. Zenodo. Software. https://zenodo.org/record/11960

Authors: Tyler Rinker (University at Buffalo); Colin Gillespie (Newcastle University); Craig Citro (Google)

Summary

NEWS

Versioning

Releases will be numbered with the following semantic versioning format:

<major>.<minor>.<patch>

And constructed with the following guidelines:

  • Breaking backward compatibility bumps the major (and resets the minor and patch)
  • New additions without breaking backward compatibility bump the minor (and reset the patch)
  • Bug fixes and misc. changes bump the patch

CHANGES IN qdap VERSION 2.2.0

BUG FIXES

bag_o_words did not make use of the bag_o_words2 helper function that has finer grained control of the output. Arguments passed through ... were ignored but are now respected.

fry threw an error if a group contained < 300 words but had enough text to generate 2 text chunks of 100 words each, caught by S. Enrico P. Indiogine. The bug has been fixed: these groups are now dropped and a warning is given.

phrase_net threw an error caused by dplyr's (0.3) approach to subsetting columns. Previously a vector was returned, now a tbl_df object is returned: https://github.com/hadley/dplyr/issues/587. This was addressed by using explicit df[[index]] rather than df[, index].

NEW FEATURES

chunker added to break text, optionally by grouping variables, into equal chunks. The chunk size can be specified by giving number of words to be in each chunk or the number of chunks.
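The two ways of specifying chunk size described above can be illustrated with a short Python sketch. This is not qdap's R implementation: the function name chunk_words, its parameters, and the handling of a remainder (the last chunk may be shorter) are assumptions made for illustration, and grouping variables are left out.

```python
def chunk_words(text, n_words=None, n_chunks=None):
    """Split text into word chunks, given a chunk size or a chunk count.

    Hypothetical helper for illustration; not part of qdap.
    """
    if (n_words is None) == (n_chunks is None):
        raise ValueError("give exactly one of n_words or n_chunks")
    words = text.split()
    if n_chunks is not None:
        # Ceiling division, so that at most n_chunks chunks are produced.
        n_words = max(1, -(-len(words) // n_chunks))
    # The final chunk may hold fewer words than the others (an assumption).
    return [" ".join(words[i:i + n_words])
            for i in range(0, len(words), n_words)]


print(chunk_words("a b c d e f", n_words=2))
print(chunk_words("a b c d e f", n_chunks=2))
```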

IMPROVEMENTS

all_words gains char.keep and char2space arguments to enable retention of characters and multi word phrases. These features are passed to freq_terms as well. Suggested by stackoverflow's lawyeR (http://stackoverflow.com/a/26162401/1000343).

CHANGES

rm_url has been moved into its own canned regex pattern extraction/replacer package named qdapRegex.

name2sex now uses the gender package to predict sex. This makes the function slightly slower but much more accurate than previous versions. Because of this increased accuracy and dependence on gender, the arguments pred.sex, fuzzy.match, and database are no longer necessary and have been removed.

CHANGES IN qdap VERSION 2.1.1

BUG FIXES

syllable_count returned the sentence (recycled) in the words column of the output. This behavior has been fixed. See GitHub issue #188 for details.

syn returned antonyms for some words. This was caused by the dictionary: qdapDictionaries::key.syn contained antonyms and elements that were error messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)

The pres_debates2012 data set contained three errors in speech attribution. This has been corrected and the turn of talk (tot) as well.

word_stats would throw an error if no poly-syllable words existed. This has been corrected (reported by Nicolas Turenne).

NEW FEATURES

qdap_df and %&% added to mimic some of the functionality of dplyr's tbl_df and chaining pipe in a more specific, less flexible, qdap oriented way.

Text added to view and change the text.var attribute of a data.frame of the class qdap_df.

cumulative generic method added to view cumulative scores over time.

formality picks up a cumulative method.

polarity picks up a cumulative method.

end_mark picks up a class (end_mark), plot method, and a cumulative method.

syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a class, plot method, and a cumulative method.

wfm becomes a generic method currently applied to a text.var that is: character, factor (coerced to character), or wfdf.

unbag added as a complement to bag_o_words and friends for undoing string splitting. A convenience wrapper for paste(collapse = " ").
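The idea of undoing a word split by pasting the pieces back together with a single space can be shown in a few lines of Python. This is only an analogy to the R call paste(collapse = " "): the Python function below is hypothetical, and str.split stands in for bag_o_words as a simplifying assumption.

```python
def unbag(words, sep=" "):
    """Join a sequence of words back into one string (illustrative analogue)."""
    return sep.join(words)


# str.split is a stand-in for a bag-of-words splitter (an assumption).
bag = "the quick brown fox".split()
print(bag)
print(unbag(bag))
```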

as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and as.Corpus.wfm added to convert a matrix format to a tm::Corpus.

exclude becomes a generic method for various classes. Functionality is the same but with improved code readability.

check_spelling_interactive, check_spelling, which_misspelled, and correct allow the user to identify potentially misspelled words and optionally suggest replacements.

random_data & random_sent added to generate random sentence data sets and vectors.

comma_spacer added to ensure strings with commas contain a space after them.
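A minimal Python sketch of this kind of normalization uses a regular expression with a lookahead. It is not qdap's implementation, and it makes no attempt to treat numbers such as "1,000" specially; whether qdap does so is not stated here.

```python
import re


def comma_spacer(text):
    """Insert a space after every comma that is directly followed by a
    non-whitespace character. Illustrative sketch only."""
    return re.sub(r",(?=\S)", ", ", text)


print(comma_spacer("apples,pears, and plums"))
```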

check_text added to identify potential problems in text.

replace_ordinal added to convert ordinal representations of 1 through 100 to strictly ordinal text (e.g., "1st" becomes "first").
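The conversion described above can be sketched in Python as follows. This is an illustration under stated assumptions, not qdap's R code: the helper names are hypothetical, the hyphenated spelling of compound ordinals ("twenty-first") and the rendering of 100 as "one hundredth" are choices made here, and values outside 1 through 100 are left untouched.

```python
import re

ONES = ["", "first", "second", "third", "fourth", "fifth",
        "sixth", "seventh", "eighth", "ninth"]
TEENS = ["tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
         "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth"]
TENS_CARDINAL = ["", "", "twenty", "thirty", "forty", "fifty",
                 "sixty", "seventy", "eighty", "ninety"]
TENS_ORDINAL = ["", "", "twentieth", "thirtieth", "fortieth", "fiftieth",
                "sixtieth", "seventieth", "eightieth", "ninetieth"]


def ordinal_word(n):
    """Return the ordinal word for an integer from 1 through 100."""
    if not 1 <= n <= 100:
        raise ValueError("only 1 through 100 are supported")
    if n == 100:
        return "one hundredth"
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS_ORDINAL[tens]
    return TENS_CARDINAL[tens] + "-" + ONES[ones]


def replace_ordinal(text):
    """Replace tokens such as '1st' or '22nd' with ordinal words."""
    def swap(match):
        n = int(match.group(1))
        # Leave anything outside the supported range as it was.
        return ordinal_word(n) if 1 <= n <= 100 else match.group(0)
    return re.sub(r"\b(\d{1,3})(?:st|nd|rd|th)\b", swap, text)


print(replace_ordinal("I came 1st and she came 22nd"))
```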

A vignette: Cleaning Text & Debugging was added to assist users with cleaning and debugging problems in qdap.

pronoun_type, subject_pronoun_type, and object_pronoun_type added to examine usage of subject/object pronouns by grouping variable.

MINOR FEATURES

dplyr's chaining pipe imported for convenience. See http://www.rdocumentation.org/packages/magrittr/functions/magrittr for details.

IMPROVEMENTS

wfm gains a speedup through generic classes and tm package integration (strip is no longer used in wfm).

as.tdm.character and as.dtm.character gain a speed boost with a tm package integration.

Added message to as.data.frame.Corpus for missing end-marks suggesting the use of: sent.split = FALSE.

The as.Corpus family of functions didn't necessarily respect document names and sometimes used a numeric sequence instead. The introduction of a reader via tm::readTabular has fixed this.

sentSplit now gives warnings for text that may contain anomalies such as: non-ASCII characters, factors, missing punctuation, empty cells, and no alphabetic characters found.

read.transcript now gives a warning when reading from a .docx file and the separator (sep) used is still found in the text as this may indicate the data did not split correctly.

dispersion_plot now takes a named list of vectors of terms as the argument to match.terms. The vectors are combined as a unified theme named with the names of the list supplied to match.terms.

CHANGES

as.data.frame.Corpus's default value for sent.split is now FALSE.

The state column in the qdap::DATA2 data-set is now character (previously factor).

CHANGES IN qdap VERSION 2.1.0

BUG FIXES

new_project did not copy the .Rprofile over into the new project. This has been fixed. Reference issue #184.

sentiment_frame coerced words to factor. stringsAsFactors = FALSE has been added to prevent this.

polarity did not work on > 1 grams due to a bug in sentiment_frame converting character to factor (thanks for the find @chewth). See GitHub issue #185 for details.

NEW FEATURES

unique_by added to allow the user to find terms unique to individual elements of a grouping variable.
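What "terms unique to individual elements of a grouping variable" means can be illustrated with a small Python sketch built on set differences. The function below is a hypothetical stand-in rather than qdap's unique_by: tokenization by lowercasing and whitespace splitting is an assumption, as is the dictionary return type.

```python
from collections import defaultdict


def unique_by(texts, groups):
    """Return, per group, the sorted terms that no other group uses."""
    vocab = defaultdict(set)
    for text, group in zip(texts, groups):
        vocab[group].update(text.lower().split())
    result = {}
    for group, words in vocab.items():
        # Union of every other group's vocabulary.
        others = set().union(*(w for g, w in vocab.items() if g != group))
        result[group] = sorted(words - others)
    return result


texts = ["I like cats", "I like dogs", "cats purr"]
groups = ["sam", "greg", "sam"]
print(unique_by(texts, groups))
```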

build_qdap_vignette replaces the temporary place holder version of the Introduction to qdap vignette. This function will replace the (1) HTML, (2) source, & (3) R code found in browseVignettes(package = 'qdap').

MINOR FEATURES

sub_holder picks up an alpha.type argument that allows the user to specify whether alpha or numeric keys should be used.

replace_number picks up a remove argument that removes numbers from text.

IMPROVEMENTS

qheat becomes a generic method. This means some of the internal function class checking has been moved to individual methods for those classes. Additionally, qheat now works with logical matrices/data.frames.

The tm package compatibility functions have been renamed in a more R-ish way and take the form of generic methods for specific classes. For example, df2tm_corpus becomes as.Corpus. Here is a complete list of changes:

  • df2tm_corpus is now as.Corpus
  • tm_corpus2df is now as.data.frame
  • as.wfm is now a generic method
  • tm_corpus2wfm is now as.wfm
  • tm2qdap is now as.wfm
  • tdm is now as.tdm or as.TermDocumentMatrix
  • dtm is now as.dtm or as.DocumentTermMatrix

CHANGES

colsplit2df and colpaste2df no longer convert character columns to factor.

df2tm_corpus is deprecated. It will be removed in a subsequent version of qdap. Use as.Corpus instead.

tm_corpus2df is deprecated. It will be removed in a subsequent version of qdap. Use as.data.frame instead.

tm2qdap is deprecated. It will be removed in a subsequent version of qdap. Use as.wfm instead.

tm_corpus2wfm is deprecated. It will be removed in a subsequent version of qdap. Use as.wfm instead.

tdm is deprecated. It will be removed in a subsequent version of qdap. Use as.tdm or as.TermDocumentMatrix instead.

dtm is deprecated. It will be removed in a subsequent version of qdap. Use as.dtm or as.DocumentTermMatrix instead.

The Introduction to qdap .Rmd vignette has been moved to an internal directory. The HTML version is not built by default. This saves CRAN space and time checking the package source. The file has been replaced with a temporary place holder that contains instructions for building the actual vignette. The user may also use the build_qdap_vignette directly.

qdap incorporates the changes from the tm package version: 0.6: http://cran.r-project.org/web/packages/tm/news.html Reference issue #187.

CHANGES IN qdap VERSION 2.0.0

The qdapTools package now houses several former qdap functions. While qdapTools is a Dependency and all of these functions will be accessible to the qdap user there is a break in backward compatibility if these functions are included in code. For this reason this release is a major bump of qdap.

BUG FIXES

replace_number did not replace single-digit numbers. Spotted by Ben Bolker. This behavior has been fixed and unit testing added for this function. See issue #178.

NEW FEATURES

sub_holder added; this function holds the place for particular character values, allowing the user to manipulate the vector and then revert the place holders back to the original values.

Network method added to make network plots of select qdap objects.

qtheme, theme_nightheat, theme_duskheat, theme_norah, theme_cafe, theme_grayscale, theme_badkitchen, and theme_hipster added to style Network plots.

polarity picks up a Network method.

formality picks up a Network method.

qdap officially begins utilizing the testthat package for unit testing. Only a few functions have begun the process; more will be added over time.

MINOR FEATURES

IMPROVEMENTS

CHANGES

The qdapTools package now houses the following former qdap functions: hash, %ha%, hash_look, hms2sec, id, lookup, %l%, %l+%, %l*%, repo2github, sec2hms, text2color, url_dl, v_outer, list2df, matrix2df, vect2df, list_df2df, list_vect2df, counts2list, vect2list, & mtabulate. These functions will continue to be available to qdap users in interactive mode (qdapTools is a Dependency and thus these functions are loaded into the workspace by default). This will allow this bundle of functions to be used outside of qdap without calling the larger qdap package per the request of Kirill Muller (see issue #165).

As scheduled, the dissimilarity function has been removed from the qdap package to avoid conflict with the tm package. Use the Dissimilarity function instead.

CHANGES IN qdap VERSION 1.3.6

MINOR FEATURES

polarity picks up a constrain argument that constrains the polarity values to be between -1 and 1.
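One way to squeeze an unbounded score into the range from -1 to 1 is a rescaled logistic curve. The Python sketch below uses that transformation as an assumption for illustration; this entry does not say which transformation qdap applies, and hard clipping would be another option.

```python
import math


def constrain(polarity):
    """Map any real-valued score into [-1, 1] with a rescaled logistic curve.

    Algebraically equal to (1 - 1 / (1 + exp(polarity))) * 2 - 1, written
    with tanh so that large inputs do not overflow. Illustrative only.
    """
    return math.tanh(polarity / 2.0)


for score in (-3.0, 0.0, 0.5, 3.0):
    print(score, round(constrain(score), 4))
```

The mapping keeps the sign and ordering of the original scores and leaves 0 at 0, so neutral text stays neutral.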

IMPROVEMENTS

polarity's equation now uses primes on the de-amplifiers before they're confined to be >= -1. This avoids confusion in the indicator function that took the de-amplifiers variable and returned the same variable.

dist_tab's frequency columns used a capital F in Freq. This was not consistent across all column names and has been changed to lower case.

CHANGES

polarity_frame is deprecated and will be removed in a subsequent release. Please use sentiment_frame instead.

CHANGES IN qdap VERSION 1.3.5

BUG FIXES

The An Introduction to qdap vignette contained a broken link in the tm Package Compatibility section. This has been fixed. Also the reliance on Rgraphviz from the vignette has been removed. This will eliminate CRAN WARN in CRAN checks (for some OS) but not the note for tm's reliance on Rgraphviz.

polarity reported the incorrect number of words for sentences containing commas. This has been fixed (Max Ghenis).

NEW FEATURES

formality picks up an Animate method.

end_mark_by function added as an aggregated grouping version of end_mark.

MINOR FEATURES

raj.act.1POS added. raj.act.1POS is a data set for Romeo and Juliet: Act 1 broken into parts of speech.

IMPROVEMENTS

discourse_map picks up a pause argument that enables the user to pause between plots in interactive mode.

CHANGES

CHANGES IN qdap VERSION 1.3.4

BUG FIXES

NEW FEATURES

gantt and gantt_wrap (single facet) pick up an Animate method.

polarity picks up an Animate method.

vertex_apply and edge_apply added to make uniform changes to lists of igraph objects.

MINOR FEATURES

IMPROVEMENTS

discourse_map picks up a condense argument that allows the user to condense sequential rows for like grouping variable sub groups.
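Condensing sequential rows that share a grouping value amounts to merging consecutive runs. A short Python sketch of that idea follows; the function name, the (group, text) row shape, and joining text with a space are assumptions for illustration and do not reflect how discourse_map is implemented.

```python
from itertools import groupby


def condense(rows):
    """Merge consecutive (group, text) rows that share the same group."""
    return [
        (group, " ".join(text for _, text in run))
        for group, run in groupby(rows, key=lambda row: row[0])
    ]


turns = [("A", "hi"), ("A", "there"), ("B", "hello"), ("A", "bye")]
print(condense(turns))
```

Only adjacent rows are merged, so a speaker who returns later in the dialogue still gets a separate row.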

list_df2df names now use a zero padded numeric portion for default names. For example c("L1", "L2", "L3", ... "L10"), becomes c("L01", "L02", "L03", ... "L10").
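The effect of zero padding on default names can be reproduced with a short Python sketch. The function name and the rule that the pad width follows the largest index are assumptions for illustration, not a description of qdap's code.

```python
def default_names(n, prefix="L"):
    """Build names such as L01 ... L10, padded to the width of n."""
    width = len(str(n))
    return [f"{prefix}{i:0{width}d}" for i in range(1, n + 1)]


names = default_names(10)
print(names)
# With padding, alphabetical order matches numeric order.
print(sorted(names) == names)
```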

CHANGES

CHANGES IN qdap VERSION 1.3.3

BUG FIXES

colpaste2df dropped the column name for a single retained column when keep.orig = FALSE. See GitHub issue #157 for more.

multigsub (mgsub) would return NA for replacement of length 1 after the addition of the order.pattern (used to prevent substrings from replacing meta-strings) in version 1.3.2.

NEW FEATURES

phrase_net function provides functionality similar to the Many Eyes Phrase Net plot.

discourse_map function provides a network mapping of the flow of discourse between social actors. Function output is Animate ready as well. See ?discourse_map and http://trinker.github.io/qdap_examples/animation_dialogue for more.

Animate function added to convert select qdap outputs to an animated sequence. See ?Animate.discourse_map for more.

MINOR FEATURES

synonyms_fr

More information

  • DOI: 10.5281/zenodo.11960

Dates

  • Publication date: 2014
  • Issued: October 04, 2014

Rights

  • info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

  • IsSupplementTo: https://github.com/trinker/qdap/tree/qdapVersion2.2.0
  • IsVersionOf: https://doi.org/10.5281/zenodo.592474
  • IsPartOf: https://zenodo.org/communities/zenodo