## Using R’s ggplot2 with Stata

Stata is great, but it’s true that R makes prettier graphs, especially when you use the excellent ggplot2 package. Thanks to Roger Newson, we can have both. This post walks you through using ggplot2 directly from Stata. I’ve tested this with both Stata 13 and Stata 14 on Windows 7, on two different computers.

Here are the steps:

• First, you need the foreign and ggplot2 packages installed in R. Install them using the install.packages() command, e.g. install.packages("ggplot2")
• Second, you need the rsource package installed in Stata. You can do this with the ssc inst rsource command.
• Third, you need to find the R terminal program, named Rterm.exe. For me, this is located in C:/Program Files/R/R-3.3.0/bin/x64/Rterm.exe. You then need to change the line in the code that begins with global Rterm_path to wherever Rterm.exe is on your computer.

Then you can use this code (click here for download). It’s well commented below so you should be able to understand what it’s doing. It produces the graph you can see above.

** Open up R interactively through Stata
** Boyd Center / Economics
** University of Tennessee, July 2016

** For sample purposes, let's use the auto dataset. Obviously you change this to your data.
clear
sysuse auto

***********************************************
** You need to adjust this. Find the location of Rterm.exe on your machine
***********************************************
global Rterm_path `"C:/Program Files/R/R-3.3.0/bin/x64/Rterm.exe"'

** This records Stata's present working directory in R-compatible format
local r_pwd = subinstr("`c(pwd)'","\","/",.)

** Temporarily make a copy of the dataset in a format R will probably understand
saveold holderfile.dta, version(12) replace

** Start R via rsource, pass your present working directory to it
rsource, terminator("end_r_stata") roptions(`" --vanilla --args "`r_pwd'" "')

## We're now in R, so switching the comment designation from star to hash
## Stata may mention an error anytime you include an R comment. Don't worry about it.

## Use the argument (i.e. pwd) passed via Stata and move to it
stata_pwd = commandArgs(trailingOnly=TRUE);
setwd(stata_pwd[1]);

library("foreign");
library("ggplot2");

## Read and then delete the data
df = read.dta("holderfile.dta");
file.remove("holderfile.dta")

## Draw the graph
ggplot(df, aes(x=mpg, y=price, color=foreign)) +
  geom_point() +
  ggtitle("Ahh, lovely R graphs through Stata")

## Save both a PDF and PNG version
ggsave("ggplot_stata.pdf")
ggsave("ggplot_stata.png")

## And now you stop using R
end_r_stata

And there we have it: the data is transferred to R and a nice ggplot is generated without ever leaving Stata. Of course, you don’t need to restrict yourself to ggplot2. With this basic idea you can use any of R’s capabilities directly from Stata.

## Fitting tables to the width of a page

Ever have the problem of Stata regression output being a little too wide? Worry no more.

I use Ben Jann’s excellent esttab to export Stata regressions into LaTeX documents.

My only problem with esttab is that the tables can be too wide, i.e. wider than the width of the text in the PDF. So I made a few edits to esttab that automatically scale the tables to the text-width.

I have called this program estwide. You can download it here. As it is based on estout, Ben Jann should be considered a co-author. Click here to see an example of its effect. (If you wish to replicate the above example, you can download the associated do-file here and the TeX file here.)

To use estwide:
1. Make sure estout is installed. To do this, in Stata type ssc inst estout, replace
2. Save estwide.ado to the same folder that estout is now installed in. You can check the folder by typing which estout
3. Restart Stata.
4. Rather than exporting your tables using the esttab command, simply replace esttab with estwide, e.g. estwide using hello.tex, style(tex) replace
5. Make sure you have both the adjustbox and booktabs LaTeX packages installed.
6. Make sure you have called both of these packages up by including \usepackage{booktabs} and \usepackage{adjustbox} in the header of your LaTeX file.
7. Include your tables as normal. You can copy and paste the output into your TeX file, or have the tables update automatically when you make changes by using \input{myfilename}.

Update, September 2017: after some emails from readers, I have two things to add. Firstly, estwide seems to work much better if you include a caption for the table.

Secondly, if you have a problem with the caption appearing on one page and the table itself on another, wrap the input in a LaTeX table. For example, this code works well for me:
\begin{table}[ht]
\input{myfilename}
\end{table}

## Stata’s default background colour

In case you want to know the name of Stata’s default background colour for graphs, as I did recently, it is “ltbluishgray”. According to the Stata Daily blog, its Red Green Blue (RGB) value is 234,242,243, which is #EAF2F3 in hexadecimal.
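
As a quick check of that conversion, here is a minimal JavaScript sketch that turns an RGB triple into its hexadecimal form. The function name rgbToHex is my own hypothetical helper, not something provided by Stata or by any library.

```javascript
// Convert an RGB triple (each channel 0-255) to a #RRGGBB hex string.
// rgbToHex is a hypothetical helper name used only for this illustration.
function rgbToHex(r, g, b) {
    return "#" + [r, g, b]
        .map(function (channel) {
            // Two uppercase hex digits per channel, zero-padded
            return channel.toString(16).toUpperCase().padStart(2, "0");
        })
        .join("");
}

console.log(rgbToHex(234, 242, 243)); // prints "#EAF2F3"
```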

## Cohort grouping script

Suppose you have data on year of birth, but you want to group several years together, e.g. group 1950, 1951 and 1952 births together; 1953, 1954, and 1955 together, etc.

Below is some JavaScript code I wrote to generate the relevant Stata commands without much fuss. You only have to make minor adjustments: enter the start year (e.g. 1950), the end year (e.g. 1955), and the interval length (e.g. 3 years).

<html><body><script>
//*** Generate Stata Code to replace cohort groups *****
//*** Just replace the following three variables and refresh the page

var start_year = 1881;
var end_year = 1990;
var interval = 5;

//*** You're done. Or at least you should be.

// Number of groups needed to cover start_year through end_year inclusive
var groups = Math.ceil((end_year - start_year + 1) / interval);
var lower; // first birth year in the current group
var upper; // last birth year in the current group

// Step 1: assign each cohort to a numbered group
document.write("gen cohort_group = 0 <br />");
for (var i = 1; i <= groups; i++) {
    lower = start_year + (i - 1) * interval;
    upper = lower + interval - 1;
    document.write("replace cohort_group = " + i + " if cohort > " + (lower - 1) + " & cohort < " + (upper + 1) + "<br />");
}

// Step 2: recode the numbered groups into a labelled variable
document.write("<br />recode cohort_group ");
document.write("( 0 = 0 \"Other\" ) ///<br />");
for (var j = 1; j <= groups; j++) {
    lower = start_year + (j - 1) * interval;
    upper = lower + interval - 1;
    var rule = "( " + j + " = " + j + " \"" + lower + " - " + upper + "\" )";
    if (j < groups) {
        // More rules follow, so continue the Stata command on the next line
        document.write(rule + " ///<br />");
    } else {
        // Last rule: close the recode command and label the new variable
        document.write(rule + ", gen(cohort_clean)<br />la var cohort_clean \"Birth Cohort\"<br />");
    }
}
</script></body></html>


I wrote this with a five-year interval in mind, so if your range of years does not divide evenly by the interval, check that the last group lines up with your end year. However, it should get you most of the way there. Enjoy!
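
The same idea can also be sketched as a plain function that returns the commands as text, which is easier to check outside a browser. This is only an illustration, not the script above. The name cohortCommands is hypothetical. It uses Stata's inrange() function instead of two inequalities, and it clamps the last group to the end year, which the script above does not do.

```javascript
// Hypothetical helper: builds the Stata commands as an array of lines
// instead of writing them into the page.
function cohortCommands(startYear, endYear, interval) {
    // Groups needed to cover startYear through endYear inclusive
    const groups = Math.ceil((endYear - startYear + 1) / interval);
    const lines = ["gen cohort_group = 0"];
    const labels = ['recode cohort_group ( 0 = 0 "Other" ) ///'];

    for (let g = 1; g <= groups; g++) {
        const lower = startYear + (g - 1) * interval;
        // Assumption: clamp the last group so it never runs past endYear
        const upper = Math.min(lower + interval - 1, endYear);
        lines.push("replace cohort_group = " + g +
                   " if inrange(cohort, " + lower + ", " + upper + ")");
        const rule = "( " + g + " = " + g + ' "' + lower + " - " + upper + '" )';
        // Every rule but the last continues the recode command
        labels.push(g < groups ? rule + " ///" : rule + ", gen(cohort_clean)");
    }

    return lines.concat(labels, ['la var cohort_clean "Birth Cohort"']);
}

// Example from the text: births from 1950 to 1955 in three-year groups
console.log(cohortCommands(1950, 1955, 3).join("\n"));
```

Paste the printed lines into a do-file, since the `///` line continuations do not work when typed interactively in Stata's command window.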