Currently in a PhD program?

You’ve survived coursework. Now comes the fun (and sometimes messy) part: doing research, managing data and teams, writing papers, giving talks, and eventually going on the job market.

A lot of what I wrote below comes from Marc Bellemare’s “Doing Economics”. I would strongly recommend reading the book right after you are done with your coursework.

This post draws on my own experience as I near the end of my PhD program (depending on when you are reading this blog). I have gathered the resources below, which I found very useful in my own journey, in the hope that they will be useful to fellow PhD students.

Thinking formally about your research question

Pick a question you can answer credibly with data you can actually get. Write down (in plain text) your estimand (what parameter you want), your identification strategy (why your design is persuasive), and your early diagnostics (variation, sample size, power). Early research wins come from doable questions with clean designs.

Two excellent toolkits for thinking formally about design (and for simulating power/robustness before you collect data) are DeclareDesign and randomizr. They help you specify designs, simulate data, diagnose power/bias, and even generate reproducible treatment assignments.
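To make the "simulate before you collect" idea concrete, here is a minimal sketch in plain Python. It is not DeclareDesign or randomizr (both are R packages) and does not use their APIs; it only mimics the logic of declaring a design, simulating it many times, and diagnosing power and bias. The function names, the two-arm design, the normal outcome model, and the default values are all assumptions made for illustration.

```python
import random
import statistics
from statistics import NormalDist

CRITICAL_VALUE = NormalDist().inv_cdf(0.975)  # two-sided test at the 5% level


def assign_treatment(n, seed):
    """Complete random assignment: exactly n // 2 units treated, reproducible via seed."""
    rng = random.Random(seed)
    assignment = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(assignment)
    return assignment


def simulate_once(n, true_effect, sd, rng):
    """Simulate one dataset; return (difference-in-means estimate, rejected null?)."""
    z = assign_treatment(n, seed=rng.getrandbits(32))
    # Assumed outcome model: normal noise plus a constant treatment effect.
    y = [rng.gauss(0, sd) + true_effect * zi for zi in z]
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return estimate, abs(estimate / se) > CRITICAL_VALUE


def diagnose(n, true_effect, sd=1.0, sims=1000, seed=1):
    """Monte Carlo diagnosis: share of rejections (power) and mean error (bias)."""
    rng = random.Random(seed)
    draws = [simulate_once(n, true_effect, sd, rng) for _ in range(sims)]
    power = sum(rejected for _, rejected in draws) / sims
    bias = statistics.fmean(estimate for estimate, _ in draws) - true_effect
    return {"power": power, "bias": bias}


if __name__ == "__main__":
    print(diagnose(n=200, true_effect=0.5))
```

Changing `n`, `true_effect`, or `sd` and rerunning `diagnose` shows how power moves before any data are collected. Setting `true_effect=0` gives a rough check that the test rejects about 5% of the time.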

If you’re learning (or refreshing) causal inference, Scott Cunningham’s free, web-based Causal Inference: The Mixtape chapters on panel data and DiD are a fast way to orient yourself and your RA(s).

Designing your study

When you sketch a design, make the threat model explicit (selection, omitted variables, spillovers, bad counterfactuals). For difference-in-differences and event studies (now standard in applied micro), be aware of problems with two-way fixed effects under staggered adoption. Reach for modern estimators and diagnostics, e.g., Callaway–Sant’Anna’s did package (R) and Stata’s csdid, and read concise cautions (Borusyak–Jaravel–Spiess; Sun–Abraham).
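To see why staggered adoption causes trouble, here is a small, noise-free toy example in Python. It does not implement Callaway–Sant’Anna or any of the estimators above. It only shows the "forbidden comparison" that two-way fixed effects implicitly relies on, where an already-treated cohort serves as the control group. The two cohorts, three periods, the trend, and the effect that grows with exposure are all hypothetical numbers chosen for illustration.

```python
# Hypothetical setup: two cohorts, three periods, no noise, parallel trends hold.
TREND = {1: 0.0, 2: 1.0, 3: 2.0}        # common time trend in untreated outcomes
ADOPTION = {"early": 2, "late": 3}      # first treated period for each cohort


def effect(cohort, period):
    """Treatment effect that grows by 1 with each period of exposure."""
    exposure = period - ADOPTION[cohort] + 1
    return float(max(exposure, 0))


def outcome(cohort, period):
    """Observed cohort mean: common trend plus the cohort's treatment effect."""
    return TREND[period] + effect(cohort, period)


def did(treated, control, pre, post):
    """Plain 2x2 difference-in-differences on cohort means."""
    return ((outcome(treated, post) - outcome(treated, pre))
            - (outcome(control, post) - outcome(control, pre)))


# True effect on the late cohort in its first treated period.
true_att_late = effect("late", 3)

# Forbidden comparison: the early cohort is already treated in both periods,
# so the change in its treatment effect contaminates the counterfactual trend.
forbidden = did("late", "early", pre=2, post=3)

# Clean comparison: the late cohort is not yet treated in periods 1 and 2.
clean = did("early", "late", pre=1, post=2)

if __name__ == "__main__":
    print("true ATT (late cohort, period 3):", true_att_late)
    print("late vs. already-treated early:  ", forbidden)
    print("early vs. not-yet-treated late:  ", clean)
```

In this toy setup the forbidden comparison misses the late cohort's effect entirely, because the early cohort's own effect is still growing. The clean comparison against a not-yet-treated cohort recovers the early cohort's first-period effect. Heterogeneity-robust estimators are built to avoid the first kind of comparison.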

The World Bank’s DIME and Development Impact blog maintain curated, field-tested methods posts across topics. Use implementations that handle treatment effect heterogeneity. If you’re lost (like I often am) among all the new papers, you can re-orient yourself with Asjad Naqvi’s roundup.

If you’re designing an experiment, start early with power calculations. Power depends on variance, intra-cluster correlation, and take-up, all things you can often learn from pilots or administrative data. For checklists, see J-PAL’s quick guide and Stata/R tools; EGAP’s “10 things to know about power”; and the World Bank’s DIME Wiki pages on power and design trade-offs. When clustering or stratifying, I would recommend revisiting McKenzie/Bruhn’s practical posts here, here, and here.
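As a back-of-the-envelope complement to those guides, here is a minimal Python sketch of the textbook minimum detectable effect (MDE) calculation. It shows how variance, intra-cluster correlation, and take-up enter the formula. It assumes equal-sized clusters, a normal approximation, and that `take_up` is the difference in take-up between treatment and control. The function and parameter names are my own, not taken from any of the tools above.

```python
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC for clusters of size m."""
    return 1 + (cluster_size - 1) * icc


def mde(n, sd, share_treated=0.5, cluster_size=1, icc=0.0,
        take_up=1.0, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-arm design with n total individuals.

    Normal-approximation formula:
        (z_{1-alpha/2} + z_{power}) * sqrt(DEFF * sd^2 / (P * (1 - P) * n)) / take_up
    where P is the share assigned to treatment and DEFF is the design effect.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = sd ** 2 / (share_treated * (1 - share_treated) * n)
    itt_mde = z * (variance * design_effect(cluster_size, icc)) ** 0.5
    # Imperfect take-up dilutes the intention-to-treat effect, so the effect
    # on those who actually take up must be larger to be detectable.
    return itt_mde / take_up


if __name__ == "__main__":
    print("individual randomization:", mde(n=400, sd=1.0))
    print("clusters of 20, ICC 0.05:", mde(n=400, sd=1.0, cluster_size=20, icc=0.05))
    print("same, with 50% take-up:  ", mde(n=400, sd=1.0, cluster_size=20, icc=0.05, take_up=0.5))
```

Plugging in the variance and ICC from a pilot or from administrative data gives a quick sense of whether a design is feasible before you move to a full simulation.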

If you run an experiment (or even a high-stakes observational analysis; this is one of mine as an example), consider a short pre-analysis plan and public registration. Economics journals increasingly expect it; the AEA RCT Registry is the main venue, with OSF as a complement (note that EGAP has stopped accepting registrations). J-PAL has step-by-step registry guidance; EGAP’s “10 Things to Know about PAPs” keeps plans realistic.

Building a reproducible data workflow and data governance

The earlier you start with this, the easier it is down the road (2-3 years after you’ve started your project). Adopt a “push-button” project: a clean repo with a README, data dictionary, environment file, and scripts that rebuild everything from raw to final. For an end-to-end playbook, start with Gentzkow & Shapiro’s Code and Data for the Social Sciences; pair it with IPA’s Best Practices for Data and Code Management, and the DIME Wiki guide on data encryption for PII or sensitive data. If you use Stata, Julian Reif’s Stata Coding Guide and Tal Gross’s “Best Practices” are concrete, opinionated templates that I found very useful when I started my first project as a co-PI.
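To illustrate what "push-button" can mean in practice, here is a minimal Python sketch of a single entry point that rebuilds a result from raw data. The folder layout (`data/raw`, `data/clean`, `output`), the file names, and the column names (`id`, `treated`, `outcome`) are hypothetical. A real project would have more steps and would more likely follow one of the templates above.

```python
import csv
from pathlib import Path

COLUMNS = ["id", "treated", "outcome"]  # hypothetical variable names


def clean(raw_path, clean_path):
    """Step 1: read raw data, drop rows with a missing outcome, write clean data."""
    with open(raw_path, newline="") as f:
        rows = [row for row in csv.DictReader(f) if row["outcome"] != ""]
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    with open(clean_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def analyze(clean_path, table_path):
    """Step 2: compute the difference in mean outcomes by treatment status."""
    with open(clean_path, newline="") as f:
        rows = list(csv.DictReader(f))
    means = {}
    for arm in ("0", "1"):
        values = [float(row["outcome"]) for row in rows if row["treated"] == arm]
        means[arm] = sum(values) / len(values)
    table_path.parent.mkdir(parents=True, exist_ok=True)
    table_path.write_text(f"difference_in_means,{means['1'] - means['0']}\n")


def run_all(root):
    """Rebuild everything from raw to final; raw data are read, never modified."""
    root = Path(root)
    raw = root / "data" / "raw" / "survey.csv"
    cleaned = root / "data" / "clean" / "survey.csv"
    table = root / "output" / "table1.csv"
    clean(raw, cleaned)
    analyze(cleaned, table)
    return table


if __name__ == "__main__":
    print("rebuilt", run_all("."))
```

The point of the design is that a collaborator (or you, three years later) can delete `data/clean` and `output`, run one command, and get the same table back.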

Speaking of collaborating with other researchers, I found this handbook by the World Bank’s DIME very useful. I learned a lot in terms of data setup, coding, and data sharing best practices.

Protect people and your future self. Classify your data risk level, set access rules, and scan for PII before pushing code. Start with Harvard’s Data Risk Classification, J-PAL’s Data Security Procedures, their PII scanning tool for repos, and the AEA Data Editor guidance on what journals expect for reproducibility and archiving. It is not fun policing IRB protocols when you have several different collaborators and your institution is the main IRB institution. Believe me! I’ve done it for 3 years in one of my projects. But if you ever have to, the resources above helped me a lot, and I hope they will help you too.
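To give a flavor of what a pre-push PII scan does, here is a deliberately crude Python sketch that flags lines that look like they contain emails, US-style phone numbers, or US Social Security numbers. This is not J-PAL's tool and does not reproduce how it works. The patterns are my own assumptions, they are US-centric, and they will miss names, addresses, GPS coordinates, and many other identifiers. Treat it as a first tripwire, not a guarantee.

```python
import re
from pathlib import Path

# Hypothetical, intentionally simple patterns; extend them for your own context.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scan_text(text):
    """Return (line_number, kind, matched_text) for every suspicious match."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((line_number, kind, match.group()))
    return hits


def scan_file(path):
    """Scan one file, skipping bytes that cannot be decoded as text."""
    return scan_text(Path(path).read_text(errors="ignore"))


if __name__ == "__main__":
    sample = "id,contact\n1,jane@example.org\n2,612-555-0199\n"
    for line_number, kind, value in scan_text(sample):
        print(f"line {line_number}: possible {kind}: {value}")
```

Running something like `scan_file` over the scripts, logs, and small data extracts in a repo before each push is cheap, and a single hit is a reason to stop and check before anything leaves your machine.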

Writing (from idea to paper) and presenting

My mentors often remind me that my dissertation work is probably not going to be my best work, which makes sense! I always tell myself that my JMP is probably not going to cure cancer. So get started with your idea and work on it, improve it, present it, get feedback, and start over if needed.

Great papers are clear papers. I was lucky to learn about writing from Marc Bellemare. Over the years I also found several resources that helped me a lot when writing my dissertation. For introductions, Keith Head’s Introduction Formula, CGD’s “How to Write the Introduction of Your Development Economics Paper”, and Bellemare’s Conclusion and “Middle Bits” formulas are concise, concrete starting points. For sustained writing momentum, Jason Kerwin’s “How to get your paper done” is always a great read. Ryan B. Edwards also curates a terrific list of writing resources.

On presenting: I’ve come to learn that your first talks should sell the question, the design, and the main result. For structure and slide craft, see Jesse Shapiro’s “How to Give an Applied Micro Talk,” AEA’s Top 10 Presentation Tips (CSWEP), and AEA’s Best Practices for Economists, particularly the part on running constructive seminars. DIME’s “Gotcha!?” post is a smart list of common seminar traps and how to handle them.