
Fun paper.

If I may highlight, though, your sample is completely flawed:

1. The close vote queue's backlog is such that most bad questions remain open and therefore don't ever get a chance to be deleted in the first place unless (as you've noted) a moderator randomly runs into them months later.

2. A problem experienced by SO is one-off accounts: users who get question-banned end up coming back for more with a new account, and leave the abandoned ones unmaintained. Where a normal user may end up deleting their downvoted questions, an abandoned account's user will not.

Put another way, your sample would be a heck of a lot larger if everything that should likely get deleted actually were.



Thanks.

> If I may highlight, though, your sample is completely flawed:

I would respectfully disagree. :)

> The close vote queue's backlog is such that most bad questions remain open and therefore don't ever get a chance to be deleted in the first place unless (as you've noted) a moderator randomly runs into them months later.

A question can be deleted even if it was never closed. Closing is just one of the ways a question ends up deleted (see Figure 1). In fact, only 14.38% of the deleted questions had been marked as 'Closed' (see Table 6). Your intuition about moderators randomly running into them was, however, found to be true. If you are suggesting that most 'closed' questions should be treated as future 'deleted' ones, that is a call I would not like to make, IMO. In any case, we found that deleted questions are poorer in quality than closed questions (see Section 4.5, subsection "Quality Pyramid"). In that light, it makes sense to treat them as separate entities.

TL;DR: Deleted questions are beyond repair, but closed questions could be patched up.

> A problem experienced by SO is one-off accounts: users who get question-banned end up coming back for more with a new account, and leave the abandoned ones unmaintained. Where a normal user may end up deleting their downvoted questions, an abandoned account's user will not.

This may be true, but it does not necessarily mean there is a problem with the dataset sample. It simply means that the problem is more challenging :)

> Put another way, your sample would be a heck of a lot larger if everything that should likely get deleted actually were.

Since not all closed questions get deleted, this may not necessarily be true. But since you brought up closed questions: we wrote a prequel to our deleted-questions paper, on the analysis and prediction of closed questions [0, 1]. You might like that as well!

[0] https://dl.dropboxusercontent.com/u/19882021/pdf/cosn-2013.p...

[1] http://cosn.acm.org/2013/files/Session7/Session7Paper4.pdf



