General data management and efficiency best practices
- Consider reviewing the Strongly Recommended References on our Other Data Resources page.
- For large projects, keep a README file in the top-level directory with a project summary, including who was involved, dates, and a listing of the directory structure and important files within that project folder.
- Avoid unnecessary creation of data sets: combine multiple data steps into a single step if possible.
- Keep files zipped or compressed if you aren't using them.
- Check for duplicate files when sharing a project folder with multiple users.
- Do not keep duplicate copies of raw data in different software formats.
- Avoid keeping unnecessary interim data sets.
- Store common sub-expressions in variables rather than re-computing them.
- Identify which portions of the program are using the most time. In Stata, "set rmsg on" causes the run time to be displayed after each command; in MATLAB, use the "tic" and "toc" functions to compute elapsed time.
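
The last two points can be sketched in Python, used here only for illustration. The function names and the sample data are hypothetical, and time.perf_counter stands in for the role that "tic"/"toc" or "set rmsg on" play in MATLAB and Stata. The first function recomputes the mean and standard deviation for every element; the second stores them in variables once.

```python
import math
import time

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def standardize_recompute(values):
    # Recomputes the mean and standard deviation for every element.
    return [(v - mean(values)) / std_dev(values) for v in values]

def standardize_cached(values):
    # Computes the common sub-expressions once and stores them in variables.
    m = mean(values)
    sd = std_dev(values)
    return [(v - m) / sd for v in values]

def timed(func, *args):
    # Returns the function result and the elapsed wall-clock time in seconds.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Hypothetical data, only to have something to time.
values = [float(i % 97) for i in range(1000)]
slow_result, slow_seconds = timed(standardize_recompute, values)
fast_result, fast_seconds = timed(standardize_cached, values)
print(f"recompute: {slow_seconds:.4f}s  cached: {fast_seconds:.4f}s")
```

Both functions return the same values; only the amount of repeated work differs. Timing each candidate section this way shows where optimization effort is likely to pay off.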
Optimization and maximum likelihood (any language)
- Supply the analytic gradient and Hessian if possible, rather than relying on numerical approximations.
- Supply good starting values (for example, if bootstrapping, use the parameter values from the original data set as starting values for the bootstrap samples).
- If calculations don't depend on the parameters being estimated, move them outside the likelihood or objective function so they are done only once, and save the results in global variables.
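
A minimal sketch of these three points follows, assuming a Poisson regression with one covariate fitted by Newton-Raphson in plain Python. The model, the simulated data, and every function name are illustrative assumptions, not part of the original guidance. Parameter-independent sums are computed once in precompute, the gradient and Hessian are analytic, and the bootstrap fit starts from the original estimates.

```python
import math
import random

def precompute(x, y):
    # Quantities that do not depend on (a, b): computed once per data set.
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    log_fact = sum(math.lgamma(yi + 1) for yi in y)
    return sum_y, sum_xy, log_fact

def loglik_grad_hess(a, b, x, pre):
    # Poisson log-likelihood with analytic gradient and Hessian.
    sum_y, sum_xy, log_fact = pre
    mu = [math.exp(a + b * xi) for xi in x]
    s0 = sum(mu)
    s1 = sum(xi * m for xi, m in zip(x, mu))
    s2 = sum(xi * xi * m for xi, m in zip(x, mu))
    ll = a * sum_y + b * sum_xy - s0 - log_fact
    grad = (sum_y - s0, sum_xy - s1)
    hess = ((-s0, -s1), (-s1, -s2))
    return ll, grad, hess

def fit_poisson(x, y, start, tol=1e-10, max_iter=50):
    # Newton-Raphson; returns (a, b, iterations used).
    pre = precompute(x, y)
    a, b = start
    for iteration in range(1, max_iter + 1):
        _, (g0, g1), ((h00, h01), (_, h11)) = loglik_grad_hess(a, b, x, pre)
        det = h00 * h11 - h01 * h01
        # Solve H * step = grad for the 2x2 case, then subtract the step.
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a, b = a - step_a, b - step_b
        if abs(step_a) + abs(step_b) < tol:
            return a, b, iteration
    raise RuntimeError("Newton-Raphson did not converge")

def poisson_draw(rng, lam):
    # Knuth's multiplication method; adequate for small lam.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

# Simulated data (assumed true values: a = 0.5, b = 0.8).
rng = random.Random(12345)
n = 200
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
y = [poisson_draw(rng, math.exp(0.5 + 0.8 * xi)) for xi in x]

# Original sample: start from the intercept-only estimate.
a_hat, b_hat, iters = fit_poisson(x, y, start=(math.log(sum(y) / n), 0.0))

# One bootstrap sample: start from the original estimates.
idx = rng.choices(range(n), k=n)
xb, yb = [x[i] for i in idx], [y[i] for i in idx]
a_boot, b_boot, boot_iters = fit_poisson(xb, yb, start=(a_hat, b_hat))
print(f"original: {iters} iterations, bootstrap: {boot_iters} iterations")
```

In this sketch the precomputed sums are passed to the objective as a tuple instead of being stored in global variables; either approach keeps the work out of the function that the optimizer calls repeatedly.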