So, I happen to be sitting here with an HP fan who is formally trained in stats and related methodology. Her supposition is that constructing a good sample would have to involve a lot more randomness. So, for instance, going on all the major archives and taking 5% of all NC-17/M H/D fics and analyzing content.
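For anyone curious what that "take 5% from every archive" idea might look like in practice, here is a minimal sketch, not a tested pipeline. It assumes you already have a list of fic identifiers per archive; the `listing` dictionary, its IDs, and the `sample_fraction` helper are all made up for illustration.

```python
import random

def sample_fraction(fics_by_archive, fraction=0.05, seed=0):
    """Draw a simple random sample of `fraction` of the fics from each archive."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    sample = {}
    for archive, fic_ids in fics_by_archive.items():
        fic_ids = list(fic_ids)
        # At least one fic per non-empty archive, otherwise the rounded share.
        k = max(1, round(len(fic_ids) * fraction)) if fic_ids else 0
        sample[archive] = rng.sample(fic_ids, k)  # sampling without replacement
    return sample

# Hypothetical listing: placeholder IDs, not real archive data.
listing = {
    "ff.net": [f"ffn-{i}" for i in range(400)],
    "AO3": [f"ao3-{i}" for i in range(260)],
}
picked = sample_fraction(listing)
print({archive: len(ids) for archive, ids in picked.items()})
```

Sampling within each archive separately (rather than from one pooled list) keeps every venue represented in proportion to its size, which matters if the venues really do differ.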
(Statistician friend also adds that she would be interested to see if there's a difference by venue. She reads primarily on ff.net and in her experience they either switch or Harry tops. I read on AO3 and would say they mostly switch or do other stuff (but I also dgaf and can't be arsed to remember most of the time))
(Also re: the time involved in actual reading, I wonder if one could use python or other content analysis software, though that's sadly just above my methodological paygrade. Who knew this is why I'd want to have made the investment? Hmph.)
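As a rough idea of how the by-venue comparison could be tallied once fics have been coded by hand, here is a small Python sketch. The `coded` rows are invented placeholders, not findings, and the `shares_by_venue` helper and the category labels are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def shares_by_venue(coded_fics):
    """Turn (venue, category) pairs into per-venue proportions for each category."""
    counts = defaultdict(Counter)
    for venue, category in coded_fics:
        counts[venue][category] += 1
    return {
        venue: {cat: n / sum(tally.values()) for cat, n in tally.items()}
        for venue, tally in counts.items()
    }

# Made-up example rows; a real run would use the hand-coded sample.
coded = [
    ("ff.net", "switch"), ("ff.net", "Harry tops"),
    ("ff.net", "Harry tops"), ("ff.net", "switch"),
    ("AO3", "switch"), ("AO3", "other"),
    ("AO3", "switch"), ("AO3", "switch"),
]
shares = shares_by_venue(coded)
for venue, dist in shares.items():
    print(venue, dist)
```

Adding a year field to each row and grouping on (venue, year) instead would be one way to look at chronological trends with the same approach.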
But - doing that would take a tremendous amount of time, and it would be incredibly difficult to parse by year and establish chronological trends, so. That.
Also, so important to say, and I realize I haven't said it - this data is tremendously impressive!! Especially where dating is concerned. Avoiding errors would be impossible given the information you have to work with, and the time you've put into the dating is a gift. Seeing the chronological shift is among the most fascinating pieces of this to me. I could spend aaaaaagggggeeeees speculating about why that is. Among other things. Kind of like I'm doing. So, this data is a goldmine and I'm all for you doing anything you feel like doing with it and think it's really really wonderful to have! Whatever the methodological whatever, it's still scads more than we had, you know?
no subject
Date: 2015-03-20 04:06 am (UTC)