I run a website (https://jpdb.io/) which has an Anki importer, so I deal with a lot of Anki databases that people send me after they fail to import, and yeah, Anki's database schema is kind of a mess, to be honest. (Which is to be expected for a program of Anki's age and with such a long development history.)
A few extra tidbits:
- A few versions back the cards' "ease" field (that is, how a card was graded) meant something different depending on which phase the card was in at the time (so sometimes "2" meant "hard" and sometimes "2" meant "okay"). It was finally fixed and AFAIK it's consistent in new versions, but apparently the migration didn't always work properly, and I still sometimes see databases where the grading is the other way around compared to what it's supposed to be, so I need to heuristically detect that this is the case and handle it.
- Initially JSON blobs were used to store a lot of data; relatively recently that was changed so that it's stored as proper tables, but not completely, so a lot of data's still in the blobs, but this time instead of JSON it's protobuf. (Which seems strange to me considering SQLite has native support for JSON.)
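To illustrate the first point, here is a minimal sketch of one way such a heuristic could work. It is not necessarily what any real importer does. It assumes the review log yields (type, ease) pairs where types 0 and 2 mean learning/relearning, and that under the old scheme learning cards only had three buttons (1 = again, 2 = good, 3 = easy), so a 4 never shows up there. The function names are made up for the example.

```python
from collections import Counter

# Assumption: these review "type" values mark learning/relearning entries.
LEARNING_TYPES = {0, 2}


def looks_like_old_grading(reviews):
    """Guess whether learning-phase grades use the old 3-button meaning.

    reviews: iterable of (review_type, ease) pairs.
    """
    counts = Counter(ease for rtype, ease in reviews if rtype in LEARNING_TYPES)
    if not counts:
        return False  # no learning-phase data, nothing to base a guess on
    # Old scheme: "4" is impossible while learning and "2" (good) dominates.
    return counts[4] == 0 and counts[2] > counts[3]


def normalize(reviews):
    """Remap old learning-phase grades onto the 4-button scale if needed."""
    reviews = list(reviews)
    if not looks_like_old_grading(reviews):
        return reviews
    remap = {1: 1, 2: 3, 3: 4}  # again stays, good -> 3, easy -> 4
    return [
        (rtype, remap.get(ease, ease)) if rtype in LEARNING_TYPES else (rtype, ease)
        for rtype, ease in reviews
    ]
```

A guess like this can be wrong for small or unusual review histories (someone who simply never presses "easy"), so in practice it would probably need a minimum sample size or extra signals.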
It's a good thing the schema's slowly being cleaned up, but unfortunately it's only done incrementally, so every time any little thing changes I need to add yet another special case to my importer to handle it, and often in various permutations too, because some databases are half-migrated Frankensteins. (Don't ask me how that happens; I don't know. Maybe it's an issue of people using outdated plugins with their Anki installation, or copying their database between multiple independent Anki implementations, or maybe the current phase of the moon's just wrong.)
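As a rough sketch of what "handling the permutations" can look like: check which layout is actually present instead of trusting a version number, and merge whatever is found. The table and column names below (`decks.name`, a JSON object in `col.decks`) are simplified stand-ins for illustration, not a faithful copy of Anki's schema.

```python
import json
import sqlite3


def table_exists(db, name):
    """Return True if a table with this name exists in the database."""
    row = db.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def read_deck_names(db):
    """Collect deck names from both layouts, tolerating half-migrated files."""
    names = set()
    # Assumed newer layout: one row per deck in a dedicated table.
    if table_exists(db, "decks"):
        names.update(name for (name,) in db.execute("SELECT name FROM decks"))
    # Assumed older layout: a JSON object keyed by deck id in col.decks.
    if table_exists(db, "col"):
        row = db.execute("SELECT decks FROM col").fetchone()
        if row and row[0]:
            try:
                blob = json.loads(row[0])
            except (TypeError, ValueError):
                blob = {}  # unreadable blob: ignore it, keep what we have
            if isinstance(blob, dict):
                names.update(
                    d["name"]
                    for d in blob.values()
                    if isinstance(d, dict) and "name" in d
                )
    return sorted(names)
```

Reading both sources and taking the union means a database that was only partly migrated still yields every deck, at the cost of possibly resurrecting stale entries from the old blob.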
I love JPDB, it's cool to randomly see you in the wild!
I can't personally use JPDB (due to my own niche learning strategy, not a flaw in JPDB), but I desperately want to be able to consume the underlying data. It's just that good -- the data that you've curated is unbeatable. If you ever provide a public API, I'll join your Patreon in a heartbeat.
I've thought about potentially tackling other languages in the future, and I think that would be fun to work on too, but alas, at this point I don't really have the resources to even be able to work on it as-is (since this is currently purely a spare time project, and my TODO list is already hundreds of items long), so I'd be just spreading myself way too thin.
> due to my own niche learning strategy, not a flaw in JPDB
Just for curiosity's sake - what kind of strategy is it, if I may ask? I have very ambitious plans for the future, so depending on what exactly it is, it might be possible someday.
I gave jpdb a try a while ago and found that while the dataset is incredible (and like others I'd pay just for it), the built-in tool doesn't work the way I need.
I have always had the most success with Anki and Wanikani when it comes to Japanese. Trying to add in yet another paradigm for learning is frustrating. I appreciate you've put a lot of effort into helping people move from those tools, but I don't want to.
The single biggest reason is offline access. Anki works on my phone on an airplane or in an area with no mobile service. (In Australia there are lots of those.)
I only started using WK seriously when I discovered the Android apps that let me do my reviews offline.
If you had a Patreon tier that allowed for Anki exports of your lists I'd sign up in a heartbeat, even if it only allowed 1 download per month or something similar. I mean, let's be honest, what possible valid need could I have for downloading the whole data set in one go...
Compared to the time it would take me to use Subs2srs across a season of a show I'd rather just give my money to you.
Well, there's nothing wrong with using Anki if it works for you! I know that a lot of folks need offline access and/or other features which Anki provides which I don't have, and that's totally fine.
I always ask this not because I necessarily want to convert people to use my thing, but because I always love to hear what features people need and what I can improve. Offline access is technically on my tentative roadmap, but very far off into the future, so indeed for anyone who needs that, Anki's the better choice.
> allowed for Anki exports
That's something that I'm planning to add very soon actually! Well, maybe not exactly Anki exports (I haven't yet researched what that would entail), but just generic functionality to be able to export the built-in decks as a .csv (which I'll be happy to tweak/improve to make it easier to import).
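For a sense of scale, a generic .csv export can be quite small. The following is only a sketch: the column names (`word`, `reading`, `meaning`) and the function name are hypothetical, and the actual export format hasn't been decided.

```python
import csv
import io


def export_deck_csv(entries, fieldnames=("word", "reading", "meaning")):
    """Serialize deck entries (dicts) to CSV text with a header row."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Missing fields become empty cells; extra keys are dropped.
        writer.writerow({key: entry.get(key, "") for key in fieldnames})
    return buf.getvalue()


if __name__ == "__main__":
    print(export_deck_csv([{"word": "猫", "reading": "ねこ", "meaning": "cat"}]))
```

Using the `csv` module rather than joining strings by hand takes care of quoting meanings that themselves contain commas or quotes.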
I'm following an eclectic strategy where I isolate and separately learn spoken and written Japanese. The process looks a little like this:
1. I start with a deck of Anki vocabulary notes that I want to acquire
2. Study begins with "Speech" Anki cards from these notes (The card front is audio-only, including a clip of the word and a clip of an example sentence. The back has the English definition & a helper image). I only consider a card as being "Good" once I am able to recall & replicate the pitch accent with a steady rhythm (I pipe back delayed audio from my microphone while I practice with a metronome running)
3. In parallel, I also do Kanji isolation study using KKLC
4. Each week, I manually enable new "Writing" Anki cards that come from the same set of notes (The writing is on the front. Only the word audio is on the back). I only enable a "Writing" Anki card if I have previously learned BOTH the component Kanji and the spoken word
5. I study my enabled "Writing" Anki cards in parallel with the other two tracks
I like this approach because I effectively have three separate learning tracks that I can switch between -- the variety keeps me motivated. It also helps train your ear to be able to distinguish homophones by pitch and leads you to think of 同訓異字 writings as variations of a spoken word, rather than as true homophones.
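The rule in step 4 is mechanical enough to automate. Here is a minimal sketch of it, assuming you can get the set of kanji already learned and the set of words already learned by ear out of your own tracking; the function name and inputs are made up for the example, and it does not touch Anki itself.

```python
def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block.

    This deliberately ignores extension blocks and the repetition mark 々.
    """
    return "\u4e00" <= ch <= "\u9fff"


def writing_cards_to_enable(words, learned_kanji, learned_spoken):
    """Return the words whose "Writing" card is ready to be enabled.

    A word qualifies only if it was already learned as a spoken word AND
    every kanji it contains has been learned in isolation.
    """
    ready = []
    for word in words:
        kanji_in_word = {ch for ch in word if is_kanji(ch)}
        if word in learned_spoken and kanji_in_word <= set(learned_kanji):
            ready.append(word)
    return ready
```

For example, with 食 and 学 learned as kanji and 食べる and 学校 learned by ear, only 食べる qualifies, because 校 is still missing for 学校.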
Wow, that's indeed a very niche learning strategy. (: I like it though!
I'd definitely like to expand the configurability of jpdb up to a point where you'll actually be able to do something like this in the future. Unfortunately that's not going to be anytime soon, so you're definitely better off sticking with what you have now. (The most immediate feature that I have planned soon-ish is pure kanji decks; the necessary customizability for the rest will come much later.)
I'm incredibly thankful that Anki exists; I don't think I would have ever learned Japanese to a high level without it. But having spent some time looking at its guts myself, it sure is a mess in there. I thought at one point about building something on top of Anki, but decided against it after discovering some of the same stuff that has already been mentioned.
The main Anki codebase is getting rewritten in Rust (from Python) so they'll probably clean up a whole lot of technical debt in the process and make future contributions easier.