- For large data sets, use a LENGTH statement to reduce the storage size of variables.
- If you have long character strings, consider leaving them out or using a FORMAT to convert between strings and shorter codes.
- In PROC GLM, if you have categorical variables with large numbers of levels, use the ABSORB statement when appropriate.
- In PROC MIXED, speed can depend on how the model is specified. For example, using RANDOM INTERCEPT/SUBJECT=xxx can be faster than RANDOM xxx.
- For large multilevel models, consider using specialized software such as MLwiN or HLM instead of PROC MIXED.
- Determine whether you need all of your variables in the working dataset. You may save space and computing time by keeping long character strings and extraneous variables in a separate dataset. Non-essential variables can be merged back into the main dataset when needed.
- Use built-in commands rather than commands implemented in ado-files if a built-in command is available with the appropriate functionality.
- On the research grid, use stata-large or stata-xl only when you need more memory than the standard stata wrapper will provide you.
- If a command produces output you do not need (e.g. with "by" group processing), prefix it with "quietly".
- Avoid macro variable loops where possible; substitute vector-oriented data set processing.
- Use sparse matrices where applicable.
- Use the profiler to identify the sections of code that take the most execution time, and optimize those first.
- Use vector and matrix operations rather than loops.
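
The last tip applies to any array-oriented language. As a minimal sketch, assuming Python with NumPy (neither is named in the list above, and the function names are purely illustrative), the same sum of squares can be written as an explicit loop or as a single vector operation:

```python
import numpy as np


def sum_of_squares_loop(values):
    """Explicit loop: one interpreted iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Same computation as a single dot product on the whole array."""
    arr = np.asarray(values, dtype=float)
    return float(arr @ arr)


# Illustrative data: the integers 1..1000 stored as floats.
data = np.arange(1, 1001, dtype=float)

loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorized(data)
```

Both functions return the same value. The vectorized form hands the whole array to compiled library code, which is typically where the speed-up on large inputs comes from; a profiler run is the way to confirm the gain in a given program.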