This method is defined principally so that you can use a CrunchDataset as a data argument to other R functions (such as stats::lm()) without needing to download the whole dataset. You can, however, choose to download a true data.frame.
# S3 method for class 'CrunchDataset'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  force = FALSE,
  categorical.mode = "factor",
  row.order = NULL,
  include.hidden = TRUE,
  ...
)

# S3 method for class 'CrunchDataFrame'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  include.hidden = attr(x, "include.hidden"),
  array_strategy = c("alias", "qualified_alias", "packed"),
  verbose = TRUE,
  ...
)
x: a CrunchDataset or CrunchDataFrame.
row.names: part of the as.data.frame signature. Ignored.
optional: part of the as.data.frame signature. Ignored.
force: logical; whether to actually coerce the dataset to a data.frame, or leave the columns as unevaluated promises. Default is FALSE.
categorical.mode: the mode in which categorical variables are pulled. One of "factor", "numeric", or "id" (default: "factor").
row.order: vector of indices giving which rows of the dataset to present, and in what order (default: NULL). If NULL, the Crunch Dataset order is used.
include.hidden: logical; should hidden variables be included? (default: TRUE)
...: additional arguments passed to as.data.frame (default method).
array_strategy: strategy to import array variables. "alias" (the default) reads them as flat variables with the subvariable aliases, unless there are duplicate aliases, in which case they are qualified in brackets after the array alias, like "array_alias[subvar_alias]". "qualified_alias" always uses the bracket notation. "packed" reads them in what the tidyverse calls "packed" data.frame columns, with the alias from the array variable, and subvariables as the columns of the data.frame.
verbose: whether to output a message to the console when subvariable aliases are qualified under array_strategy = "alias" (default: TRUE).
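The three array_strategy layouts can be pictured with a small base R sketch. This does not call Crunch at all: it builds the column shapes by hand, and the array alias "pets" and subvariable aliases "cat" and "dog" are hypothetical names chosen for illustration.

```r
# Base R only: a hand-built illustration of the three column layouts.
# "pets", "cat" and "dog" are hypothetical aliases, not from a real dataset.
subvars <- data.frame(
    cat = c("yes", "no"),
    dog = c("no", "no"),
    stringsAsFactors = FALSE
)

# "alias": subvariables appear as flat columns named by their own aliases
flat <- data.frame(id = 1:2, subvars, stringsAsFactors = FALSE)

# "qualified_alias": flat columns named array_alias[subvar_alias]
qualified <- flat
names(qualified) <- c("id", "pets[cat]", "pets[dog]")

# "packed": a single data.frame-valued column named after the array
packed <- data.frame(id = 1:2)
packed$pets <- subvars

names(flat)
names(qualified)
names(packed)
packed$pets$cat   # subvariables are reached through the array column
```

The packed layout keeps the subvariables grouped under one name, which can be convenient when several arrays share subvariable aliases.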
When called on a CrunchDataset, the method returns an object of class CrunchDataFrame unless force = TRUE, in which case the return is a data.frame. For CrunchDataFrame, the method returns a data.frame.
By default, the as.data.frame method for CrunchDataset does not return a data.frame but instead a CrunchDataFrame, which behaves like a data.frame without bringing the whole dataset into memory. When you access the variables of a CrunchDataFrame, you get an R vector rather than a CrunchVariable. This allows modeling functions that require only select columns of a dataset to retrieve just those variables from the remote server, rather than pulling the entire dataset into local memory.
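As a minimal sketch of that usage, the helper below passes whatever it is given straight through as the data argument of stats::lm(). The helper name fit_linear is hypothetical. With the crunch package attached and a CrunchDataset in hand, the same call shape is the one the description refers to; here it is exercised with the built-in mtcars data.frame so that it runs without a Crunch account.

```r
# Hypothetical helper: `ds` may be a CrunchDataset (with crunch attached)
# or an ordinary data.frame; `formula` names the variables to model.
fit_linear <- function(ds, formula) {
    stats::lm(formula, data = ds)
}

# With a CrunchDataset the call would look like this
# (the aliases `outcome` and `predictor` are placeholders):
# fit <- fit_linear(ds, outcome ~ predictor)

# Same call shape with a local data.frame
fit <- fit_linear(mtcars, mpg ~ wt)
class(fit)
```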
If you call as.data.frame() on a CrunchDataset with force = TRUE, you will instead get a true data.frame. You can also get this data.frame by calling as.data.frame on a CrunchDataFrame (effectively calling as.data.frame on the dataset twice).
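The two routes to a true data.frame might be sketched as follows. The helper name materialize is hypothetical, and the code assumes the crunch package is attached and that ds is a CrunchDataset you have already loaded; it is not runnable against Crunch without a session.

```r
# Hypothetical helper. Assumes library(crunch) has been attached and `ds`
# is a CrunchDataset obtained from an authenticated session.
materialize <- function(ds) {
    lazy  <- as.data.frame(ds)                # CrunchDataFrame: columns fetched on access
    full  <- as.data.frame(ds, force = TRUE)  # true data.frame, downloaded immediately
    twice <- as.data.frame(lazy)              # also a true data.frame
    list(lazy = lazy, full = full, twice = twice)
}
```

Per the text above, full and twice should both be ordinary data.frame objects, while lazy stays a CrunchDataFrame.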
When a data.frame is returned, the function coerces Crunch Variable values into their R equivalents using the following rules:
- Numeric variables become numeric vectors.
- Text variables become character vectors.
- Datetime variables become either Date or POSIXt vectors.
- Categorical variables become either factors with levels matching the Crunch Variable's categories (the default), or, if categorical.mode is specified as "id" or "numeric", a numeric vector of category ids or numeric values, respectively.
- Array variables (Categorical Array, Multiple Response) can be decomposed into their constituent categorical subvariables or put in 'packed' data.frame columns; see the array_strategy argument.
Column names in the data.frame are the variable/subvariable aliases.
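The effect of the three categorical.mode settings can be imitated in base R. The sketch below is an illustration of the rule, not the package's implementation: the category table and the stored responses are made-up values.

```r
# Hypothetical category table for one categorical variable
cats <- data.frame(
    id = c(1L, 2L, 3L),
    name = c("Low", "Medium", "High"),
    numeric_value = c(0, 50, 100),
    stringsAsFactors = FALSE
)

# Hypothetical stored responses, expressed as category ids
responses <- c(3L, 1L, 2L, 3L)
idx <- match(responses, cats$id)

# categorical.mode = "factor": levels follow the categories
as_factor <- factor(cats$name[idx], levels = cats$name)

# categorical.mode = "id": the category ids themselves
as_id <- cats$id[idx]

# categorical.mode = "numeric": the categories' numeric values
as_numeric <- cats$numeric_value[idx]
```

The factor form suits modeling functions that expect categorical predictors, while the "id" and "numeric" forms are useful when the codes or scale values themselves are what you want to compute on.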