Page 1 of 1

Ideas for statistics, data, and input fields on all pedias

Posted: Sun Jan 01, 2017 2:40 pm
by is89
1.0 Statistics options and ideas regarding data:
There are some standard fields — but also some custom fields I added — that I wanted to add to statistics, but the displaying option wasn't fitting for the type of data.
For example:
  • Gamepedia: Game size [GB] (custom)
    Gamepedia: Age ratings
    Bookpedia: Audiobook length [hh:mm] (custom)
    All pedias: Price
1.1 Categories
The simplest option would be, to display data types like game size, price, and length in the same way as the page count statistic in Bookpedia (N/A; 201-300; 301-500; 501-1000; 1000+).
Image
Just without fixed categories (I'm assuming the page count has static categories, please excuse and correct me if I'm wrong), but automatically adjusting categories, depending on average or some other value. For example: If the majority of games is under 50GB, and only 1 or 2 above, there would be these categories:
  • N/A
    < 1GB
    1GB - 5GB
    5GB - 10GB
    10GB - 20GB
    20GB - 30GB
    30GB - 40GB
    40GB - 50GB
    >50GB
But if a certain percentage of all games (let's say 15%) is above 50GB, the categories adjust accordingly:
  • N/A
    < 1GB
    1GB - 5GB
    5GB - 10GB
    10GB - 15GB
    15GB - 25GB
    25GB - 50GB
    50GB - 75GB
    75GB -100GB
    >100GB

1.1 Histogram
But a graphical representation of the data would be better.
Something like this would be nice (e.g. "1€" stands for 0€ - 1€, "10€" means 5€ - 10€, and so on):
Image

Option 1: The bin width and number has to be entered manually.
Option 2: The bin width changes, depending on the distribution and total number of entries.
In the example above, the bin width is set to 5. The average count per bin is 1.7 (39 counts/22 bins), but it looks rather scattered with most bins above 50€ being empty. Now the program should divide the bin range in two halves and check, wether one of them has a significantly higher avg. count per bin than the other. In the example above, the avg. count per bin for the bottom half (1€ - 50€) would be 3.18, while the upper half (55€ - 100+€) only has 0.36. Now the bottom half keeps the bin width, while the upper half is reduced to as many bins as needed to achieve an average as close as possible to 3.18.

Code: Select all

4/x = 3.18

4/3.18 = x

x = 1.25 

1.25 ≈ 1 
So, the closest bin number to 1.25 would be 1, which means: one bin for the upper half. We now have:
  • N/A: 1 bin
    bottom (<1€): 1 bin
    price range (1€ - 50€): 10 bins
    top (>50€): 1 bin
A total of 13 bins.


Now let's say, the avg. count per bin is equal for both halves, but too low, resulting in scattered and empty bins. Summarizing one half into fewer bins isn't an option. Now the bin width has to be adjusted, to reach a higher average. For example: Still bin width 5, but only 17 entries. Range from 1€ - 60€, resulting in 13 bins (12 + 1 for N/A).
The average count per bin should be 1.5 or more — because below that, a lot of bins would just show 1, or a lot of them would show higher numbers, while many others are emtpy. This wouldn't be very useful as a graphical representation, because you would be less likely to see any tendencies or concentration of entries.

Code: Select all

17 entries / 13 bins  = 1.3 entries/bin

17/x = 1.5

17/1.5 = x

x = 11.33

11.33 ≈ 11
11 bins it is. We now have:
  • N/A: 1 bin
    price range (1€ - 60€): 11 bins
A total of 12 bins.

Option 3: I'm sure you have more insight into the possibilities of programming something like this and could work out an algorithm that works better than my suggestions.



1.3 Data interpretation:
For one of the fields I mentioned, the graphical representation doesnt' work as neatly: The age rating. Be it a game, movie, or even music rating.

If I have
  • PEGI: 3; 7; 12; 16; 18; RP (alsways entered as "PEGI {number}")
    USK: 0; 6; 12; 16; 18; RP (entered as "USK {number}" or "USK RP" or "USK Rating Pending")
    ESRB: 3; 6; 10; 13; 17; 18; RP or eC; E; E10+; T; M; AO; RP (entered as "ESRB {number}" or "ESRB {abbreviation}"
    Age rating for board games: any number, as they vary widely (entered as "{number}")
on the histogram, that's a fuckton of bars. Of course I could just enter the number without "PEGI" or "USK", but then the information would be lost, which rating board approved the game. And of course people will mainly use just one of the three rating boards, depending on their location. But still, some mixed ratings do occur, and then it would be nice to have ratings for the same age show up combined on the histogram.

Option 1: One solution could be: Before the values are used for the histogram, they are parsed to a number or other category:
  • N/A (when field is empty)
    Rating pending: change everything that contains either "RP" or "Rating pending" or just "pending" to Rating pending or — depending on space below the histogram — to a short RP.
    Numbers: change every expression with a number in it to just the number. "USK {number}" to "{number}". Example: "USK {16}" to "{16}". Maybe there could be a lookup table for common abbreviations such as "ESRB T", so that they get changed to their respective number. "ESRB {abbreviation}" to "{number}. Example: "ESRB {AO}" to "{18}"
    Other: No age number, e.g. parental advisory
For this option, there should be an escape character to suppress the parsing. For example, if I don't want "PEGI 12" and "USK 12" to be the same, I could put a backslash in front of one. That particular rating will be a new bar in the histogram, displayed with full name. Example: "\PEGI 12" will be displayed as "PEGI 12" below the histogram.

Option 2: The other option I had in mind was to add a second input field to the age rating field (which I have already seen on OSX, but I can imagine that it would be difficult to implement something similar for Pocketpedia on mobile). In the first input field you can enter whatever you want, it won't be used for the histogram, just for displaying purposes. In the second field the number or expression is entered.
But that second input would still need to be parsed in order to unify entries such as "RP" and "Rating pending" and "pending", or to interpret abbreviations such as "AO" or full expressions like "Teens" or "approved for teens" or "mature".
(Of course the categories listed above would stay the same and in the same order for the histogram.)
In order to choose wether to parse the input there would need to be a third field with a dropdown menu, or a checkbox alternatively. This would be the equivalent of the backslash of option 1.


1.4 New statistic:
There's a statistic of how many books I have read in a year. I think this has already been requested, but I'm not sure. I'll add it anyways:
It would be very cool to add another statistic of how many pages I have read in a year. And one for audiobooks (how many hours I've "read" in a year). Maybe show them side by side, if possible. Like this:

Image


2.0 Statistics options regarding appearance and behaviour
This applies to all the pedias.
At the moment, the statistics window opens in the same window as the library. If I have no item selected, it displays statistics for all of them. If I select a row of items, it shows statistics for them only. To change the items, I have to close the statistics window first. This way one doesn't get to see the change.
I would certainly appreciate it if there were some options in the settings to choose from, and I'd wager there are more people who would. One option to open statistics in a new, separate window. And for this option, another option to choose wether or not the statistics should refresh when I select or deselect different items. Or maybe, if statistics are opened in a new window, only allow manual refresh via button in that window. The button could be on the top bar where "options" and "done" are located.
Here is a scheme of how the setup could be arranged:

Image



2.1 Refresh statistics when data is changed
Regardless of the ideas at 2.0 it would make for a more fluent feeling of the application if the statistics refreshed the exact same moment I click "Save" in the editing window. Right now I have to close statistics and open them again, which is a little inconvenient.
Of course, if statistics are set to open in a new window and the setting is "Don't refresh automatically", this refresh after editing shouldn't override the settings.



3.0 Input fields
There are some input fields that would profit of multiple entries like the "genres" field has them. As I've only used Bookpedia and Gamepedia I don't know how it is with the others. But for games there are often multiple developers and 1-3 studios (publisher). The "publisher" and "developer" input field badly need this feature. For example Doom 3. Four developers: Splash Damage, Aspyr Media, Id Software, Vicarious Visions. It's not like I use the "developer" field in statistics, but others might — and it would make the data input as a whole more complete and fitting.







Foot note A few final remarks
I hope everything is complete and explained well enough. If there are missing parts or unfinished ideas in this post, I'm sorry for that. After reading the text too often I'm not able to spot any mistakes anymore. :oops:
I know, this is a lot, I accumulated this over many weeks, maybe even months of using the pedias.
I don't even know which of these ideas are possible, but there's no harm in asking. And even if they are possible, in the end you might just decide to not use them — for priority reasons or because it's too much work or for whatever reason there might be. And that's okay, because the apps are still awesome!
Also, you might need to brace yourself, another equally long post regarding Pocketpedia, and a short post regarding date formatting in the pedias might be coming in the next days...