There are a fair few things that can be done to optimise a particle engine. Off the top of my head I would recommend some of these...
Calculate a bounding volume (a bounding box or sphere) for your particle system so that you can cull it against your view frustum. If it is outside the view then simply don't render the particle system, although you may still wish to update its particles (or not, depending on the effect).
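As a rough sketch of the culling test, assuming a bounding sphere and a frustum stored as six inward-facing planes (the Vec3, Plane, Sphere and sphereInFrustum names are made up for illustration, not taken from any particular engine):

```cpp
#include <array>

// Hypothetical minimal math types; a real engine would use its own.
struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with n pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 centre; float radius; };

// Returns false only when the sphere lies wholly behind at least one plane.
// This is conservative: a sphere near a frustum corner may be reported
// as visible even though it is just outside.
inline bool sphereInFrustum(const Sphere& s, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum) {
        const float dist = p.n.x * s.centre.x
                         + p.n.y * s.centre.y
                         + p.n.z * s.centre.z
                         + p.d;
        if (dist < -s.radius) {
            return false; // entirely on the outside of this plane
        }
    }
    return true;
}
```

You would call this once per particle system per frame and skip the draw call (and optionally the update) when it returns false.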
Use a fixed array size for the particles in a given system (as IndirectX hints at in the post above). Alternatively, you can use two linked lists to keep track of the particles: one tracking live particles, the other tracking 'dead' ones. Whichever method you use, you do not want to be creating and deleting particles on the fly too often (i.e. using, say, the new and delete operators) as this can seriously dent performance. Easy solution? Use a fixed array and flag particles as dead or alive.
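Something along these lines would do for the fixed array approach. It is only a minimal sketch: the Particle fields and the ParticlePool class are my own invention for illustration, and the linear search for a free slot is the simplest option rather than the fastest.

```cpp
#include <array>
#include <cstddef>

// Hypothetical particle layout; add colour, size etc. as needed.
struct Particle {
    float x = 0.0f, y = 0.0f, z = 0.0f;
    float vx = 0.0f, vy = 0.0f, vz = 0.0f;
    float life = 0.0f;   // seconds remaining
    bool  alive = false;
};

template <std::size_t N>
class ParticlePool {
public:
    // Reuses the first dead slot. Returns nullptr when the pool is full,
    // so nothing is ever allocated with new at run time.
    Particle* spawn(float life)
    {
        for (Particle& p : particles_) {
            if (!p.alive) {
                p = Particle{};
                p.life = life;
                p.alive = true;
                return &p;
            }
        }
        return nullptr;
    }

    // Ages and moves live particles; expired ones are flagged dead, not deleted.
    void update(float dt)
    {
        for (Particle& p : particles_) {
            if (!p.alive) continue;
            p.life -= dt;
            if (p.life <= 0.0f) { p.alive = false; continue; }
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
        }
    }

    std::size_t liveCount() const
    {
        std::size_t n = 0;
        for (const Particle& p : particles_) {
            if (p.alive) ++n;
        }
        return n;
    }

private:
    std::array<Particle, N> particles_{};
};
```

If the search for a dead slot ever shows up in a profile, that is where the two-list scheme (or a simple free index stack) earns its keep.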
Do you use quads (two triangles) for rendering your particles? If you do, then try mapping your particle texture onto a single triangle instead. You will need to modify the UVs to display the texture correctly, but you immediately halve the number of triangles you need to render.
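One way to set up the single triangle, as a sketch only: make it big enough to contain the original quad and run the UVs from 0 to 2 so that the 0..1 range still lands on the quad's area. The Vertex2D type and particleTriangle function are hypothetical, and this assumes the texture addressing mode is clamp or border with transparent edge texels, so that everything outside 0..1 draws as nothing. Bear in mind the triangle covers twice the area of the quad, so you trade vertices for extra fill.

```cpp
#include <array>

// Hypothetical vertex: a 2D offset in the billboard plane plus texture coords.
struct Vertex2D { float x, y, u, v; };

// Builds one triangle that fully contains the square of half-size h
// centred on (cx, cy). The square's corners map to UVs (0,0)..(1,1);
// the far corner (cx + h, cy + h) sits exactly on the triangle's long edge.
inline std::array<Vertex2D, 3> particleTriangle(float cx, float cy, float h)
{
    return {{
        { cx - h,        cy - h,        0.0f, 0.0f },
        { cx + 3.0f * h, cy - h,        2.0f, 0.0f },
        { cx - h,        cy + 3.0f * h, 0.0f, 2.0f },
    }};
}
```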
Ensure you do not call SetTexture for *each* and every particle. Group particles by texture. It is common for a particle system to produce particles that use only one texture (for example, a snowflake within a snowflake particle system). This way you set the texture once for the system and then render the individual particles contained within that system.
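To illustrate the grouping, here is a small sketch that sorts systems by texture and only "binds" when the texture actually changes. The ParticleSystem and RenderStats structs and the integer texture ids are stand-ins I have made up; in a real renderer the counter increment is where the SetTexture call would go.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of a system: one texture, many particles.
struct ParticleSystem {
    int textureId;              // assumed non-negative
    std::size_t particleCount;
};

// Stand-in for the renderer so the number of state changes can be counted.
struct RenderStats {
    int textureBinds = 0;
    std::size_t particlesDrawn = 0;
};

inline RenderStats renderSystems(std::vector<ParticleSystem> systems)
{
    // Put systems sharing a texture next to each other.
    std::sort(systems.begin(), systems.end(),
              [](const ParticleSystem& a, const ParticleSystem& b) {
                  return a.textureId < b.textureId;
              });

    RenderStats stats;
    int boundTexture = -1; // nothing bound yet
    for (const ParticleSystem& s : systems) {
        if (s.textureId != boundTexture) {
            boundTexture = s.textureId;
            ++stats.textureBinds;   // a real renderer would set the texture here
        }
        stats.particlesDrawn += s.particleCount; // draw the whole system in one go
    }
    return stats;
}
```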
Maybe consider enabling alpha testing. This can reject the fully transparent pixels of your particles before they are blended and written to the frame buffer.
If you are calculating the inverse view matrix for billboarding then ensure you only do this once per frame (or, if your implementation allows it, less often than once per frame). This matrix need not be calculated per particle per frame, although a lot of implementations I have seen in the past calculate it per particle!
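A sketch of the once-per-frame idea, under a couple of assumptions: the view matrix uses the Direct3D-style row-vector convention with an orthonormal rotation part, so the camera's world-space right and up axes can be read straight out of its first two columns instead of doing a full inverse. The BillboardBasis, basisFromView and billboardCorners names are hypothetical.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Camera axes in world space, computed once per frame.
struct BillboardBasis { Vec3 right, up; };

// Assumes v' = v * view (row vectors) and an orthonormal rotation part,
// in which case the inverse rotation is just the transpose.
inline BillboardBasis basisFromView(const float view[4][4])
{
    return { { view[0][0], view[1][0], view[2][0] },
             { view[0][1], view[1][1], view[2][1] } };
}

// Per particle: only a few multiply-adds, no matrix work at all.
inline std::array<Vec3, 4> billboardCorners(const Vec3& c, float h,
                                            const BillboardBasis& b)
{
    auto corner = [&](float sr, float su) {
        return Vec3{ c.x + (sr * b.right.x + su * b.up.x) * h,
                     c.y + (sr * b.right.y + su * b.up.y) * h,
                     c.z + (sr * b.right.z + su * b.up.z) * h };
    };
    return {{ corner(-1.0f, -1.0f), corner(1.0f, -1.0f),
              corner(1.0f, 1.0f),   corner(-1.0f, 1.0f) }};
}
```

Call basisFromView once at the start of the frame, then pass the same basis to billboardCorners for every particle.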
Ok - there are several more things you can do to optimise, but at least the above gives an idea as to what to look for, or be aware of. Hopefully this will help you out a bit!