Software-specific Data Recommendations

SAS

  • For large data sets, use a LENGTH statement to reduce the storage size of variables.
  • If you have long character strings, consider leaving them out or using a FORMAT to convert between strings and shorter codes.
  • In PROC GLM, if you have categorical variables with a large number of levels, use the ABSORB statement when appropriate.
  • In PROC MIXED, speed can depend on how the model is specified. For example, using RANDOM INTERCEPT/SUBJECT=xxx can be faster than RANDOM xxx.
  • For large multilevel models in PROC MIXED, consider using specialized software such as MLwiN or HLM instead.
  • Determine whether you need all of your variables in the working dataset. You can save space and computing time by keeping long character strings and extraneous variables in a separate dataset and merging them back into the main dataset only when needed.
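
The SAS tips above can be sketched roughly as follows. This is a minimal illustration, not a tuned program: every data set and variable name (work.big, work.small, firm_id, region_cd, year, x, y) and the format name regionf are hypothetical, and the models are placeholders chosen only to show where each statement goes.

```sas
/* All data set, variable, and format names below are hypothetical. */

/* Map short numeric codes to long labels instead of storing the strings. */
proc format;
    value regionf 1 = 'Northeast sales region'
                  2 = 'Southeast sales region';
run;

/* Keep only the variables the analysis needs; the rest stay in work.big
   and can be merged back by firm_id later.
   LENGTH is placed before SET so the shorter lengths apply.
   Shorten only numeric variables that hold small integers:
   large or fractional values lose precision when truncated. */
data work.small;
    length region_cd 3 year 4;
    set work.big (keep=firm_id region_cd year x y);
    format region_cd regionf.;
run;

/* ABSORB requires the data to be sorted by the absorbed variable. */
proc sort data=work.small;
    by firm_id;
run;

/* Absorb the many-level effect rather than listing it in CLASS and MODEL. */
proc glm data=work.small;
    absorb firm_id;
    class year;
    model y = x year / solution;
run;
quit;

/* Random intercept written with SUBJECT= rather than as RANDOM firm_id. */
proc mixed data=work.small;
    class firm_id;
    model y = x / solution;
    random intercept / subject=firm_id;
run;
```

With ABSORB, PROC GLM adjusts for firm_id without estimating a separate parameter for each level, so the absorbed effect is not reported in the parameter estimates. Whether that trade-off is acceptable depends on the analysis.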

Stata

  • Use built-in commands rather than commands implemented in ado-files if a built-in command is available with the appropriate functionality.
  • On the research grid, use stata-large or stata-xl only when you need more memory than the standard stata wrapper will provide you.
  • If you are getting unneeded output (e.g. with "by" group processing), use "quietly".
  • Avoid loops over macro variables where possible; use commands that operate on the whole data set at once instead.

MATLAB

  • Use sparse matrices where applicable.
  • Use the profiler to identify sections of code that are using the most execution time and optimize those.
  • Use vector and matrix operations rather than loops.