Monday, November 14, 2016

NSDUH R-DAS resurrected as less useful, somewhat clunkier PDAS



Details here, but do some exploration:

http://pdas.samhsa.gov/#/?_k=6dmm20

I can see no STATE, REC, or ELG variables, nor the other useful constructed variables that were formerly in R-DAS (e.g., ungrouped age in years).

It looks as if PDAS draws on the same dataset as one would use for online SDA analyses, based on a couple of checking runs I did.

For example, here are the unweighted and weighted counts by age and sex for adults in SDA for 2013 and in PDAS for 2013. I am not seeing much difference.
SDA:
https://drive.google.com/open?id=0B9Ud1oy4LsimeVNsZE5leEdIUXM
PDAS:
https://drive.google.com/open?id=0B9Ud1oy4LsimRC1wOFoxU3VGVG8



Here is a pair of SDA and PDAS runs on crack flag and cocfyu, also with minor variations that suggest use of the same datafile.
SDA:
https://drive.google.com/open?id=0B9Ud1oy4LsimS0Z5U1ZwNXNTYm8

PDAS:
https://drive.google.com/open?id=0B9Ud1oy4LsimeUp0cWVMZGdWTkk

It would appear that PDAS functionality has been reduced in the shift of the subcontract from ICPSR to RTI, sadly.

I managed to trigger a "Network Request Failed" error message, as shown here, in an attempt to get an age-specific estimate of the probability of starting to drink in 2011 and continuing to drink in the month prior to assessment in 2013. (It always pays to push the system to its limits, sooner rather than later.) I am not sure whether there is a "confidentiality" rule at play, or something akin to a CPU limit being set. Someone should push it and then ask the help desk with a concrete example.

https://drive.google.com/open?id=0B9Ud1oy4LsimNWxTSml5clFfczg

To end on a positive note, it is now possible to export a PDAS table to a CSV file, and that is a plus. (Look at the bottom of the resulting PDAS table for a link to create the CSV file, near the documentation of the date of your run and the dataset used.)

We'll have to see whether we can get our friends at SAMHSA to restore the functionality and utility of PDAS to the level of the former R-DAS, but the lesson for today is that the restricted data portal has become more central to anyone wishing to harvest new evidence from this important public health research resource. Given a choice between PDAS as presently constituted versus offline SDA dataset analyses, I'd always choose the latter, unless someone can show me PDAS variables not in the SDA version of that dataset, or someone comes up with a good reason to favor PDAS. At present, I am not sure why it was commissioned. It seems redundant with online analyses via the ICPSR portal.

[No sign of a batch option. The interface is clunky point and click, and I could not find a way to type in known variable names, nor to recode a variable. Alas.

A backdoor for recoding might be found by using a CONTROL variable, as in my network failure example, while taking advantage of informatively combined variables. For example, there is a variable for cannabis CF conditioned on (gated by) use in the year prior to assessment. Put the CF variable in the Row position, put the age of first cannabis use in the Column position, and put the AGE2 variable in the Control position. In this way, unless network failure is encountered, you might be able to estimate the fine-grained age-specific occurrence of the CF for newly incident users who started two calendar years before the assessment year, versus the occurrence of CF for newly incident users who started in the calendar year before the year of assessment.

Here, the trick would be to take the analysis-weighted, age-specific numerators for each CF from PDAS, age by age. Repeat for MJFLAG to get the weighted number of ever users as of each age in that year, with approximations when AGE2 is binned. Then get the age-specific population count from the US Census, subtract the ever users from the age-specific census count, and add back in the weighted number of newly incident users so that the "at risk" count grows properly. Finally, form the ratio, age by age, with the weighted number of newly incident users with CF in the numerator and the size of the "at risk" population in the denominator.

The PDAS would give you this for 12-21-year-olds, and you could check your derived (concocted?) estimates against those values, but the approach would give you a way to study the age-specific cannabis CF incidence for age-by-age subgroups older than 21 years. Come to think of it, you can do this in SDA more readily, so why do it with PDAS until PDAS gets more variables not in SDA?]
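For anyone who wants to try the arithmetic, here is a minimal Python sketch of the "at risk" denominator and the age-by-age ratio described above. The function names and every number in it are made-up placeholders for illustration, not PDAS, SDA, or Census output; you would substitute the weighted counts from your own runs.

```python
from typing import Dict


def at_risk_count(census: float, ever_users: float, new_users: float) -> float:
    """Census count minus ever users, with newly incident users added back."""
    at_risk = census - ever_users + new_users
    if at_risk <= 0:
        raise ValueError("at-risk count must be positive; check the inputs")
    return at_risk


def cf_incidence_by_age(
    census: Dict[int, float],
    ever_users: Dict[int, float],
    new_users: Dict[int, float],
    new_users_with_cf: Dict[int, float],
) -> Dict[int, float]:
    """Ratio of newly incident users with the CF to the at-risk count, per age."""
    rates = {}
    for age in sorted(new_users_with_cf):
        denominator = at_risk_count(census[age], ever_users[age], new_users[age])
        rates[age] = new_users_with_cf[age] / denominator
    return rates


# Placeholder values only (hypothetical, NOT real estimates), keyed by age in years.
census = {22: 4_000_000.0, 23: 4_100_000.0}       # age-specific census counts
ever_users = {22: 1_500_000.0, 23: 1_600_000.0}   # weighted ever users (MJFLAG)
new_users = {22: 100_000.0, 23: 80_000.0}         # weighted newly incident users
new_users_with_cf = {22: 5_000.0, 23: 3_000.0}    # weighted new users with the CF

rates = cf_incidence_by_age(census, ever_users, new_users, new_users_with_cf)
for age, rate in rates.items():
    print(f"age {age}: {rate * 1000:.2f} per 1,000 at risk")
```

When AGE2 is binned, the same functions could be keyed by bin rather than by single year of age, with the census counts summed over each bin; that is an approximation, as noted above.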

Oh well.

