In the web application, variables in a dataset are displayed in a list on the left side of the screen.
Typically, when you import a dataset, the variable list is flat, but it can be organized into an accordion-like hierarchy. The variable organizer in the Crunch GUI allows you to organize your variables visually, but you can also manage this metadata from R using the
This variable hierarchy can be thought of like a file system on your computer, with files (variables) organized into directories (folders). As such, the main functions you use to manage it are reminiscent of a file system.
cd(), changes directories, i.e. selects a folder
mkdir()makes a directory, i.e. creates a folder
mv()moves variables and folders to a different folder
rmdir()removes a directory, i.e. deletes a folder
Like a file system, you can express the “path” to a folder as a string separated by a “/” delimiter, like this:
mkdir(ds, "Brands/Cars and Trucks/Domestic")
If your folder names should legitimately have a “/” in them, you can set a different character to be the path separator. See
?mkdir or any of the other functions’ help files for details.
Paths can be expressed relative to the current object—a folder, or in this case, the dataset, which translates to its top-level
"/" root folder in path specification—and the file system’s special path segments
".." (go up a level) and
"." (this level) are also supported. We’ll use those in examples below.
You can also specify paths as a vector of path segments, like
which is equivalent to the previous example. One or the other way may be more convenient, depending on what you’re trying to accomplish.
These four functions all take a dataset or a folder as the first argument, and they return the same object passed to it, except for
cd, which returns the selected folder. As such, they are designed to work with
magrittr-style piping (
%>%) for convenience in chaining together steps, though they don’t require that you do.
To get started, let’s pick up the dataset we used in the array variables vignette and view its starting layout. We can do that by selecting the root folder (“/”) and printing it
print() isn’t strictly necessary here as
cd will return the folder and thus it will print by default, but we’ll use different
It’s flat—there are no folders here, only variables. If you’re importing data from a
data.frame or a file, like an SPSS file, this is where you’ll begin.
Let’s make some folders and move some variables into them. To start, I know that the demographic variables are at the back of the dataset, so let’s make a “Demos” folder and move variables 21 to 37 into it:
Now when I print the top-level directory again, I see a “Demos” folder and don’t see those demographic variables:
mv() can reference variables or folders within a level in several ways. Numeric indices like we just did probably won’t be the most common way you’ll do it: names work just as well and are more transparent. Let’s move the first variable,
perc_skipped, into “Demos” as well
A side note: although the last step of that chain was
cd(), we haven’t changed state in our R session. There is no “working folder” set globally.
cd()is a function that returns a folder; if we had assigned the return from the function (pipeline) to some object, we could then pass that in to another function to “start” in that folder.
Another way we can identify variables is by using the
contains. Let’s use
matches to move all of the questions about Edward Snowden or Bradley (Chelsea) Manning to a folder for the topical questions in this week’s survey:
We can also select all variables in a folder using the
variables function (or all folders within a folder using
folders). Let’s move all remaining variables from the top level folder to a folder called “Tracking questions”. To do this, we do need to explicitly change to the top level folder.
(Curious about the “dot” notation? See the magrittr docs.)
The reason we change to the top level folder here is that there is a subtle difference between passing
cd(ds, "/"). Whatever object, dataset or folder, that is passed into
mv() determines the scope from which the objects to move are selected. If you pass the dataset in, you can select any variables in the dataset, regardless of what folder they’re in. If you pass in a folder, you’re selecting just from that folder’s contents. It can be convenient to find all variables that match some criteria across the whole dataset to move them, but sometimes we don’t want that. In this case, we wanted only the variables sitting in the top level folder, not nested in other folders, so we wanted
variables(cd(ds, "/")) and not
Now, our variable tree has some structure. Let’s use
print(folder, depth = 1) to see these folders and their contents one level deep:
We can create folders within folders as well. In the “This week” folder, we have a set of questions about Edward Snowden. Let’s nest them inside their own subfolder inside “This week”:
Note how we used
".." to change folders up a level, as you can in a file system . We did that just so we can print the folder structure at the top level (and to illustrate that you can specify relative paths :).
You could also do this using the full path segments.
mkdir will recursively make all path segments it needs in order to ensure that the target folder exists.
Folders themselves have names, which we can set with
We can also set the names of the objects contained in a folder with
ds %>% cd("Demos") %>% setNames(c("Birth Year", "Gender", "Political Ideo. (3 category)", "Political Ideo. (7 category)", "Political Ideo. (7 category; other)", "Race", "Education", "Marital Status", "Phone", "Family Income", "Region", "State", "Weight", "Voter Registration (new)", "Is a voter?", "Voter Registration (old)", "Voter Registration"))
Unlike files in a file system, variables within folders are ordered.
Let’s move “Demographics” to the end. One way to do that is with the
setOrder function. This lets you provide a specific order, but it requires you to specify all of the folder’s contents. Let’s use that function to put “Tracking questions” first: