NEWS.md
Fix for datetimes when running as.data.frame(force = TRUE) (#666, thanks @rossellhayes)
The crunchy gadgets have been removed.
Variables can now be created as materialized by default instead of derived, by setting environment variable R_CRUNCH_DEFAULT_DERIVED or option crunch.default.derived to FALSE. See ?toVariable for more information (#648).
The concept of a personal folder is being removed from the API imminently. This has a few implications for rcrunch:
All datasets must be created with a project (e.g., via the project argument of newDataset() or forkDataset()).
Because loading datasets by name doesn’t work for datasets in projects, it’s not really possible to load a dataset by name without specifying the full project path.
To make things easier, it is possible to set a default project path with environment variable R_CRUNCH_DEFAULT_PROJECT or option crunch.default.project. This will be used as the default project folder when creating and loading datasets. Forks will still be put next to parents.
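As a minimal sketch of the settings named in the entries above, the snippet below sets both options for the current session and shows the environment-variable equivalents. The option and variable names come from the entries themselves. The path `"path/to/project"` is a placeholder, and writing the environment-variable value as the string `"FALSE"` is an assumption; see ?toVariable for the accepted forms.

```r
# Session-level options; names are taken from the changelog entries above.
options(
  crunch.default.derived = FALSE,              # create variables as materialized by default
  crunch.default.project = "path/to/project"   # placeholder path, not a real project
)

# Environment-variable equivalents (e.g. for non-interactive scripts).
# Assumption: the value is read from a string such as "FALSE".
Sys.setenv(
  R_CRUNCH_DEFAULT_DERIVED = "FALSE",
  R_CRUNCH_DEFAULT_PROJECT = "path/to/project"
)

# Inspect what is currently set.
getOption("crunch.default.project")
Sys.getenv("R_CRUNCH_DEFAULT_DERIVED")
```

Setting these in .Rprofile or .Renviron would make them persistent across sessions.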
crunch.warn.hidden & crunch.warn.private options are set (#619).
names() to avoid issues with stuttering by RStudio auto-complete with option crunch.names.includes.hidden.private.variables (#619).
crunch.order.var.catalog (#619).
ScriptCatalog (and removal of the ScriptCatalog method for ScriptBody the full body, subset to the particular script if you need the body text with vapply(scripts(ds), function(x) scriptBody(x), character(1))).
login() is no longer supported. See the vignette("crunch") or ?crunch-api-key for more details on authenticating with API keys.
exportDataset() (#595)
vignette("crunch") or ?crunch-api-key for more details. The login() authentication flow is deprecated and will be removed from an upcoming release.
interactVariables() now uses server-side logic to create categories so that it’s faster, but the sep argument is no longer supported (it’s always set to " and ") and the category order will not be the same.
as.data.frame() and as.vector() work with numeric arrays now (#558)
options(crunch.show.progress.url = TRUE) to show the URL checked for progress (#565)
as.data.frame() where it would not respect the include.hidden argument (#560)
forceVariableCatalog() or options(crunch.lazy.variable.catalog = FALSE)).
searchDatasets() gained an argument f that allows you to pass R objects to filter on.
deriveArray() and makeArray(), and you can perform calculations on them via crtabs() and tabBook().
analyses argument of newSlide() or set analysis<- to a list, allowing all slide customizations from R.
?runCrunchAutomation for more information
expropriateUser() function is now accessed through the function call reassignUser(). Functionality of the call has not changed.
makeCaseWhenVariable() that helps with many common recoding needs.
alterArrayExpr() which allows adding, removing, reordering and renaming subvariables in a derived Array.
keyring package to store your credentials. See ?login() for details (thanks @mainwaringb!)
Add several new expressions that let you create derived variables in more flexible ways than was previously possible.
crunch::filter() now falls back to the next filter on your searchpath when no method is defined.
tabBook() by default uses a new endpoint, which allows for more options. The old endpoint is deprecated, but while the server supports it, you can still use it. See ?tabBook for more details.
weight() and weight()<- (#440)
filter or filter object when using tabBook(). Filtering by expression in the dataset argument is also supported again.
newMultitable() now correctly passes ..., so arguments like is_public work (#424)
hiddenVariables() works when the hidden variables folder has subdirectories (#372).
deriveArray() using expressions to create the subvariables.
slideCategories() helps you create overlapping categorical variables (#396).
stringsAsFactors defaults (#402).
newSlide includes examples of vizType settings and other improvements.
copyFolders() function that copies folders and variable order from one dataset to another (similar to copyOrder() which was deprecated)
filters(slide_object) <- NULL or filters(slide_object) <- filter_object)
expect_either())
... removed from documentation
(un)hideVariables() functions are upgraded to use folder operations.
~, as in a *nix file system. cd(projects(), "~") takes you there; mv(projects(), ds, "~") moves ds into your personal folder.
listDatasets() now by default only prints datasets in your personal folder, not a combination of your personal datasets and some of the datasets that have been shared with you.
loadDataset(), it now searches to find datasets exactly matching that name unless you specify a project to load from. If you have multiple datasets with the same name in different locations, loadDataset("your dataset name") may return a different one than it did previously. If you want to identify a dataset precisely in loadDataset(), either specify the dataset URL (most effective but not as human friendly) or provide project = "path/to/folder".
loadDataset(<integer>) is no longer supported.
is.public(multitables[[i]]) <- TRUE and several other similar assignments of attributes on an element of a catalog, which previously successfully updated the value on the server but errored when returning to R (#303, #367)
subvariables() on non-array variables returns NULL instead of an error (#237)
prop.tables
newExampleDataset() creates a sample dataset for you to explore
exportDeck() can now write to PowerPoint with format = "pptx"
newDataset() now supports importing data in Triple-S format, providing a schema file in addition to the row data.
resolution() lets you see the data units of a datetime variable (“Y”, “M”, “D”, “ms”, etc.); resolution<- lets you set it (#234)
deleteDataset() accepts web app URLs, just as loadDataset() already did (#279)
options(crunch.warn.hidden=FALSE) to suppress the “Variable x is hidden” messages when accessing hidden variables (#172)
team(deck) <-
upsert argument to appendDataset() to allow datasets to be updated based on the primary-key variable; see pk() for details on primary keys (#49)
combineCategories() and combineResponses() are aliases for combine(), providing a way to avoid accidental clashes with dplyr::combine() (#359)
teams()<- on them. View which teams can access them by calling teams() on them.
"." as a folder path/segment, referencing the current folder. cd(project, ".") returns project; mv(project, ds, ".") moves ds into project.
datasetReference()
listDatasets() and makeArrayGadget() have been moved to the crunchy package. Wiring for them, including RStudio add-in configuration, remains here, but you’ll have to install that package to use them.
mv() and the other folder operations. These functions will be removed in December 2018.
cd(), mv(), mkdir(), rmdir()) for organizing datasets within projects, following the pattern of variable folders. See vignette("projects", package = "crunch").
setName() and setNames() for renaming folders and folder contents, respectively.
makeWeight() is now correct for categorical variables with non-sequential IDs.
write.csv or as.data.frame(force = TRUE) if requested.
index.table() to better reflect analysts’ intentions. Now, index.table() calculates the index with respect to the marginal proportion of the margin given, so for index.table(cube, 2) the column proportions of the table are indexed to the marginal row proportions. In other words: for each column how much larger or smaller is the proportion in that column when compared to the proportions for the row variable alone.
haven package and its new haven_labelled and haven_labelled_spss object classes.
margin.table, prop.table, etc.)
mv() to move them to a folder.
deleteVariables() no longer tries to delete duplicate variables.
as.data.frame(..., force = TRUE) with numeric variables that have missing values.
Suggests reference for test packages, following new check requirement.
getDimTypes() returns a richer set of cube dimension types differentiating multiple response from categorical array dimensions.
alias, description, and notes on VariableTuples
vignette("crunch")
changeCategoryID() tries to unset then reset the dataset exclusion if that impacts its progress. Best practice is to disable exclusions before running changeCategoryID() if at all possible.
ordering<- of datasets within a project will now drop any invalid entries with a warning, rather than error.
NA data.
streamRows() for case when sending only one row (#253).
getDimTypes() returns a richer set of cube dimension types differentiating multiple response from categorical array dimensions.
alias, description, and notes on VariableTuples
makeArrayGadget() launches an RStudio gadget to help you build valid categorical arrays and multiple response variables.
CrunchCubes can now be subset just like R arrays using the [ method.
numeric_values). See ?addSummaryStat for more information.
index.table() to return tables indexed to a margin.
subtotals(var) <- NULL when it already was NULL (#231).
"" for variable metadata fields if no value is set (#232).
makeMRFromText() with a categorical variable.
crunch* packages can use it.
%in% and == on Crunch objects now follow R semantics more closely with regard to missing data.
cd(), mv(), mkdir(), rmdir(). These functions use a new API for variable folders (unlike the experimental versions of some that were introduced in the 1.19.0 package release). This API is currently in a beta testing phase. See vignette("variable-order", package="crunch") for examples and details.
listDatasets(shiny = TRUE) launches an RStudio addin which allows you to select your dataset in order to generate a valid loadDataset() call. You can also associate this addin with a hotkey in RStudio through Tools > Modify Keyboard Shortcuts.
webApp() now works for Crunch variables: it will take you to the “browse” view of the web application with the given variable card loaded on screen.
ds$id_var_numeric <- as.Numeric(ds$id_var). There are as.* methods for all Crunch data types except for array-like variables.
haven’s labelled class when converting to Crunch variable types.
makeMRFromText() to take a variable imported as delimited strings, parse the multiple-response options, and return a (derived) multiple_response variable.
setPopulation(ds, size = 24.13e6, magnitude = 3) and for getting population sizes (or magnitudes) with popSize(ds) and popMagnitude(ds) respectively.
rollupResolution(ds$datetime) and set with rollupResolution(ds$datetime) <- "M".
options(crunch.show.progress) to govern whether to report progress of long-running requests. Default is TRUE, but set it to FALSE to run quietly.
pollProgress() and recommend using that when a long-running request fails to complete within the local timeout.
subtotals(variable) <- Subtotal(name = 'subtotal', categories = c(1, 2), after = 2). Use subtotals(variable) to see what subtotals are set for a variable.
subtotalArray([cube])
?subtotals or vignette("subtotals", package="crunch") for more information.
as_selected function instead of selected_array, which is now deprecated).
options(crunch.mr.selection = "selected_array").
conditionalTransform()
conditionalTransform() now has a formulas argument to specify a list of conditions to be used.
conditionalTransform().
refresh() for Datasets is now more efficient.
ordering(ds)[[c("Top folder", "Nested folder")]]) or a single string with nested folders separated by a delimiter (as in ordering(ds)[["Top folder/Nested folder"]]). “/” is the default path delimiter, and this is configurable via options(crunch.delimiter). If you have folders that actually contain “/” in the folder name, this may be a breaking change. If so, set options(crunch.delimiter="|") or some other string so that folder names are not incorrectly interpreted as paths.
mv() and mkdir() functions for creating variable folders and moving variables into them. These take a Dataset as their argument and can be chained together for convenience/readability.
folder() and folder<- to locate a variable in the folder hierarchy and to move it to a new folder. folder(ds$var) <- "New folder/subfolder" is equivalent to ds <- mv(ds, "var", c("New folder", "subfolder")).
conditionalTransform() (#64, #153)
collapseCategories() allows you to combine categories in place without creating a new variable
copy() has been made more efficient
CrunchDataFrames have been improved to act more data.frame-like. You can now access and overwrite values with standard data.frame methods like crdf$variable1 or crdf[,"variable1"] and crdf$variable1 <- 1 or crdf[,"variable1"] <- 1. CrunchDataFrames now also support adding arbitrary columns, although it should be noted that these columns are not stored on the Crunch server, so if you want to keep that data outside of your current R session, you should send it back to your Dataset as a new variable.
is.selected() is now vectorized to work with Categories, as is.na() has always been. You can also now assign into the function (#123)
addSubvariable() now accepts variable definitions directly (#72)
makeCaseVariable() has better errors when a user doesn’t name all of their case definitions (#158).
as.data.frame() when force = TRUE has been removed (#150)
as.data.frame() method.
modifyWeightVariables(), weightVariables(ds) <- ds$newWeight or is.weightVariables(ds$var) <- TRUE
expropriateUser() to transfer datasets, projects, and other objects owned by one user to another, as when that user has left your organization.
UserCatalogs by email (e.g. catalog[["you@example.com"]]) by default. All catalog extract methods ([ and [[) now also accept a secondary argument for setting an index to match against to change that default.
R_CRUNCH_EMAIL and R_CRUNCH_PW respectively.
as_selected multiple-response variables have margin and prop.table methods
variables() now contain additional metadata, including “type”
bases() when called on a univariate statistic (#124)
testthat
makeWeight() allows you to generate new weighting variables based on categorical variables (#80).
cut(), equivalent to base::cut, allows you to generate a derived categorical variable based on a numeric variable (#93).
newDataset() directly instead of newDatasetFromFile. Also, you can now create a dataset from a hosted file passing its URL to newDataset(FromFile).
as.data.frame() method for VariableCatalog for a view of variable metadata (#75)
crunchBox() now allows you to specify colors for branding or even category-specific coloring.
login() in a way that conceals the input.
changeCategoryID() to only update numeric values of the category having its id changed when the id and the numeric value are the same.
autorollback argument of appendDataset() has been deprecated. The option no longer has any effect and a warning will be printed to notify users about the deprecation.
newDatasetByCSV was removed.
geo() on a variable to see if there is already associated geographic data.
addGeoMetadata() function to match a text or categorical variable with available geodata based on the contents of the variable and metadata associated with Crunch-hosted geographic data.
derivation()
derivation() <- NULL
resetPassword() function
copyOrder() to copy the ordering of variables from one dataset to another.
loadDataset() and it will now load the same dataset in your R session.
webApp() function to go the other way: open the dataset from your R session in your web browser.
categoriesFromLevels() is now exported (#77)
deleteSubvariable() by index instead deleted the parent variable
methods package so that Rscript works (#90)
CrunchDataFrames with standard data.frames
Two attempts to fix download issues introduced by 1.17.4:
crGET with httr::write_disk() to hopefully work around issues caused by utils::download.file with method “libcurl”.
retry for downloads to hopefully work around a delay in CDN population.
searchDatasets() to use the Crunch search API.
digits() (useful when exporting to SPSS files).
crtabs and table where a dimension is a CrunchLogicalExpr now return a boolean dimension with names “FALSE” and “TRUE”, rather than the previous behavior of dropping the dimension and only returning the TRUE value.
makeCaseVariable() takes a sequence of case statements to derive a new variable based on the values from other variables.
interactVariables() takes two or more categorical variables and derives a new variable with the combination of each.
options(download.file.method="curl").
pendingStream(); append that pending stream data to the dataset with appendStream() (#40)
multitables(ds)[["Multitable name"]] <- ~ var1 + var2 syntax. Similarly, multitables can be deleted with multitables(ds)[["Multitable name"]] <- NULL. Multitables also have new name() and delete() methods.
toVariable() now accepts (and then strips) arguments of class AsIs (#44)
changeCategoryID() failed on multiple response variables.
dashboard and dashboard<- methods to view and set a dashboard URL on a dataset
changeCategoryID function to map categorical data to a new “id” and value in the data (#38, #47)
importMultitable() to copy a multitable from one dataset to another. Additionally, Multitables now have a show method showing its name and column variables.
appendDataset() now truly appends a dataset and no longer upserts if there is a primary key set. This is accomplished by removing the primary key before appending. (#35)
pk(dataset) and set with pk(dataset) <- variable.
inst/ so that other packages that depend on crunch can use the same setup.
prop.table computations line up with those not containing array variables (i.e. move subvariables to the third array dimension in the result).
names, aliases, and descriptions methods to CrunchCube (corresponding to variables of the dimensions in the cube), MultitableResult (corresponding to the “column” variables of the cubes in the result), and TabBookResult (corresponding to the “row”/“sheet” variables in each multitable result).
names method for TabBookResults following an API change.
crtabs formula parsing to support multiple, potentially named, measures
weightVariables method to display the set of variables designated as valid weights. (Works like hiddenVariables.)
appendDataset, allow specifying a subset of rows to append (in addition to the already supported selection of variables)
loadDataset can now load a dataset by its URL.
?with_consent for more details.
inst/ so that other packages depending on this package can access them more easily.
is.derived method for Variables
TabBookResults when the row variable is a categorical array
multitables method to access catalog from a Dataset. newMultitable to create one. See ?multitables and ?newMultitable for docs and examples.
tabBook to compute a tab book with a multitable. If format="json" (the default), returns a TabBookResult containing CrunchCube objects with which further analysis or formatting can be done.
bases method for cubes and tab book responses to access unweighted counts and margin tables.
saveVersion when there are no changes since the last saved version.
roxygen2 6.0.0 release
newFilter and newProject functions to create those objects more directly, rather than by assigning into their respective catalogs.
mergeFork.
with_consent as an alternative to with(consent(), ...)
delete in favor of the consent context manager.
httptest for mocking HTTP and the Crunch API.
embedCrunchBox to generate embeddable HTML markup for CrunchBoxes
duplicated method for Crunch variables and expressions
as.vector and as.data.frame methods by smarter pagination of requests.
ordering print aliases.
is.na<- to set missing values on a variable, equivalent to assigning NA
settings(ds)$weight and not just its self URL.
crunchBox to make a public, embeddable analysis widget
settings and settings<- to view and modify dataset-level controls, such as default “weight” and viewer permissions (“viewers_can_change_weight”, “viewers_can_export”)
flattenOrder to strip out nested groups from an order
mean, median, and sd, now respect filter expressions, as does the summary method.
crtabs
loadDataset from a nonexistent project.
dedupeOrder, removeEmptyGroups
appendDataset can now append a subset of variables
flipArrays function to generate derived views of array subvariables
autorollback argument to appendDataset, defaulted to TRUE, which ensures that a failed append leaves the dataset in a clean state.
allVariables is now ordered by the variable catalog’s order, just as variables has always been.
mergeFork.
as_array (pseudo-)function in crtabs that allows crosstabbing a multiple-response variable as if it were a categorical array.
merge) a subset of variables and/or rows of a dataset.
moveToGroup function and setter for easier adding of variables to existing groups.
locateEntity function to find a variable or dataset within a potentially deeply nested order.
hiddenVariables from “name” to “alias”, governed by options(crunch.namekey.dataset) as elsewhere
options(crunch.check.updates=FALSE).
session() that lazily fetches catalogs rather than when instantiated.
as.vector on a categorical-array or multiple-response variable now returns a data.frame. While a matrix is a more accurate representation of the data type, using data.frame allows for more intuitive accessing of subvariables by $, just as they are from the Crunch dataset.
joinDatasets with its (new) default copy=TRUE argument.
addSubvariable to PATCH rather than unbind and rebind; also extend it to accept more than one (sub)variable to add to the array.
pattern matching argument from makeArray, makeMR, deleteVariables, and hideVariables, deprecated since 1.9.6.
deleteSubvariable to follow model of deleteVariable, including requiring consent to delete.
options(crunch.namekey.array="name") in your script or in your .Rprofile.
deleteSubvariable now follows “crunch.namekey.array” and will take either subvariable names or aliases, depending on the value of the setting.
extendDataset function, also aliased as merge, to allow you to add columns from one dataset to another, joining on a key variable from each.
compareDatasets now checks the subvariable matching across array variables in the datasets to identify additional conflicts.
notes and notes<- methods for datasets, variables, and variable catalogs to view and edit those new metadata fields.
name<- on NULL (i.e. when you reference a variable in a dataset using $ and the variable does not exist) returns a helpful message.
newDataset when passing a data.frame or similar that has spaces in the column names.
toVariable
as.character if you have a factor and want it to be imported as type Text.
cleanseBatches function to remove batch records from failed append attempts. Remove deprecated code around batch conflict reporting.
datasets and projects functions to get dataset and project catalogs. (datasets previously existed only as a method for Project entities.)
project argument to listDatasets and add project and refresh to loadDataset to facilitate viewing and loading datasets that belong to projects.
compareDatasets that shows how datasets will line up when appending. A summary method on its return value prints a report that highlights areas of possible mismatch.
crtabs
NULL assignment into Variable/DatasetGroups to remove elements
CrunchExpr, Variable, and Dataset objects
DatetimeVariable and a character vector, assumed to be ISO-8601 formatted.
permissions method for Datasets to work directly with sharing privileges.
as.data.frame/as.environment for CrunchDataset when a variable alias contained an apostrophe.
MemberCatalog.
jsonlite API in its v0.9.22
exportDataset to download a CSV or SAV file of a dataset. write.csv convenience method for CSV export.
icon and icon<- methods for Projects to read the project’s current icon URL and to set a new icon by supplying a local file name to upload.
is.archived, is.draft, and is.published (the inverse of is.draft). See ?publish for more.
draft argument to forkDataset
owner and owner<- for datasets to read and modify the owner
owners and ownerNames for DatasetCatalog
is.editor and is.editor<- for project MemberCatalog
me function to get the user entity for yourself
pattern argument for functions including makeArray, makeMR, deleteVariables, and hideVariables is being deprecated. The help pages for those functions advise you to grep for or otherwise identify your variables outside of these functions.
unshare to revoke access of a user or a team to a dataset.
type<- assignment is safe.
CrunchExprs) for greater reliability
session() or returned from login(), containing the various catalog resources (Datasets, etc.).
names<-.
loadDataset with a dataset catalog tuple, allowing some degree of tab completion by dataset name. (Example: cr <- login(...); ds <- loadDataset(cr$datasets$My_Dataset_Name))
testthat.
useAlias attribute of datasets and move it to a global option, “crunch.namekey.dataset”, defaulted to “alias”. Implement the same for array variables, “crunch.namekey.array”, and default to “name” for consistency with previous versions. This default will change in a future release.
as.vector for CrunchExpr to GET rather than POST.
forkDataset to make a fork (copy) of a dataset; mergeFork to merge changes from a fork back to its parent (or vice versa)
digest package (httpcache depends on it instead).
combine categories of categorical and categorical-array variables, and responses of multiple-response variables, into new derived variables
startDate and endDate attributes and setters for dataset entities (#10, #11)
CrunchFilter)
ncol(ds) by removing a server request
CrunchExpr): prints an R formula-like expression
digest package.
name(ds$var$subvar) <- value
share
addSubvariable function to add to array and multiple response variables (#7)
dropRows to permanently delete rows from a dataset.
catalogToDataFrame function.
shojiURL, batches)
NULL in cube dimension when referencing subvariable that does not exist (as when using alias instead of name) and return a useful message.
%in% expression translation.
addVariables function to add multiple variables to a dataset efficiently
CrunchExprs and filtered variables in table
crtabs when requesting a crosstab of three or more dimensions.
VariableDefinition (or VarDef) function and class for creating variable definitions with more metadata (rather than assigning R vectors into a dataset and having to add metadata after).
copy, makeArray, and makeMR, to return VariableDefinitions rather than creating the new variables themselves. Creation happens on assignment into the dataset.
NA for categoricals) even if No Data doesn’t already exist
?startLog and ?logMessage.
copy of a variable. See ?copyVariable.
NULL into a dataset when the referenced variable (alias) does not exist.
NA assignment into variables.
/batches/ while waiting for an append to complete. Improves the performance of the append operation.
c method for Categories, plus support for creating and adding new categories to variables. See ?Categories and ?"c-categories"
as.vector by specifying a “mode” of “id” or “numeric”, respectively. See ?"variable-to-R"
NA into variables.
margin.table on CrunchCube objects.
with statements. Use it to give consent() to delete things.
<- NULL into a dataset (like removing a column from a data.frame). Requires consent. Also create deleteVariable(s) functions that also return the dataset object. Use either method to prevent your dataset from getting out of sync with the server when you delete variables.
deleteSubvariable(s).
crtabs to allow you to crosstab array subvariables.
[ or subset
exclusion filters on datasets to drop certain rows
(un)lock datasets for editing when there are multiple editors
saveVersion and restoreVersion for dataset versioning
httr 1.0; remove dependency on RCurl in favor of curl
appendDataset.
duplicates parameter, which is FALSE by default, adding new Groups to an Order “moves” the variable references to the new Group, rather than creating copies. See the variable order vignette for more details.
share function for sharing a dataset with other users.
New vignettes for deriving variables and analyzing datasets.
Update appending workflow to support new API.
Add query cache, on by default.
as.data.frame now does not return an actual data.frame unless given the argument force=TRUE. Instead, it returns a CrunchDataFrame, an environment containing unevaluated promises. This allows R functions, particularly those of the form function(formula, data), to work with CrunchDatasets without copying the entire dataset from the server to local memory. Only the variables referenced in the formula fetch data, when their promises are evaluated.
Remove RJSONIO dependency in favor of jsonlite for toJSON.
crunch. Update all docs to reflect that. Make amendments to pass CRAN checks.
newDataset2 renamed to newDatasetByCSV and made to be the default strategy in newDataset. The old newDataset has been moved to newDatasetByColumn.
Support for NA and NaN in crtabs response.
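The as.data.frame entry above describes an environment of unevaluated promises. The following base-R sketch illustrates that general mechanism only; it is not the package's implementation. `fetch_column()` is a hypothetical stand-in for a server request, and the data values are made up for illustration.

```r
# Hypothetical stand-in for a server fetch: records which columns were pulled.
fetched <- character(0)
fetch_column <- function(name, values) {
  fetched <<- c(fetched, name)
  values
}

# Build an environment of promises; nothing is fetched at this point.
lazy_data <- new.env()
delayedAssign("x", fetch_column("x", c(1, 2, 3, 4)), assign.env = lazy_data)
delayedAssign("y", fetch_column("y", c(2.1, 3.9, 6.2, 7.8)), assign.env = lazy_data)
delayedAssign("z", fetch_column("z", c(9, 9, 9, 9)), assign.env = lazy_data)

# A function of the form function(formula, data) forces only the promises
# for the variables named in the formula; "z" is never evaluated.
fit <- lm(y ~ x, data = lazy_data)
fetched
```

The design point is that the cost of retrieving a column is paid only when a model or summary actually references it.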