Long March to the Middle Kingdom: 谢谢, 多邻国! Xiè Xie, Duō Lín Guó! (Thank You, Duolingo!)

I started learning Mandarin through Duolingo on 21 June 2021. Almost exactly 5 months later, on 20 November 2021, I completed all 88 lessons in the course, and 6 levels in each lesson.

Thank you, Duolingo! It's been an incredible learning experience, and I'm immensely grateful.

What next?

Well, I'm going to keep practising the lessons on Duolingo. The app relies on spaced repetition, a well-researched technique for getting a language into one's long-term memory, so this will be a great safety net to ensure I don't forget what I've learnt.

Of course, I have a few other resources that I will now pay more attention to, so that I keep progressing with my language learning. Du Chinese is an obvious one. I haven't been spending enough time on it for a while, since Duolingo has been taking up about 2 hours of my time every day. With that effort easing up, I can devote more time to Du Chinese and to other online resources. I will post about any interesting ones I find as I go along.

One other interesting thing I've done is download the full set of words that Duolingo covers in its course. Some good souls have made this publicly available as a spreadsheet here.

Being an IT person with some database skills, I loaded this word list into a PostgreSQL database on my computer. The following steps may be useful to anyone who wants to slice and dice the data in ways that a simple spreadsheet cannot do.

create table t_temp
(
hanzi varchar(100) not null,
pinyin varchar(100) not null,
toneless_roman varchar(100),
meaning text
);

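The psql command to load a CSV-formatted spreadsheet with four columns into the above table looks like this (assuming one has saved the spreadsheet as a CSV file with semicolons as delimiters instead of commas). Note that psql meta-commands are case-sensitive, so the command must be written in lowercase as \copy, and it reads the file from the machine where psql is running.

\copy t_temp from 't_temp.csv' csv header delimiter ';'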
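
Before transforming anything, it may be worth confirming that the load did what was expected. The following is only a minimal sketch against the t_temp table defined above; the aliases total_rows and rows_without_meaning are my own labels and do not come from the spreadsheet.

```sql
-- How many rows arrived, and how many of them lack a meaning?
select count(*) as total_rows,
       count(*) filter (where meaning is null or meaning = '') as rows_without_meaning
from t_temp;

-- Eyeball a few rows to confirm the columns landed in the right order.
select hanzi, pinyin, meaning
from t_temp
limit 5;
```

If the second query shows pinyin in the hanzi column (or similar), the column order in the CSV file probably differs from the table definition.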

Once the data is loaded, here's the neat thing you can do. Use the column 'toneless_roman' to hold the romanised pronunciation of each word - without the diacritical marks that represent the tones. I'll show you why in a moment.

begin transaction;

update t_temp set toneless_roman = pinyin;

update t_temp set toneless_roman = replace( toneless_roman, 'ā', 'a' );
update t_temp set toneless_roman = replace( toneless_roman, 'á', 'a' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǎ', 'a' );
update t_temp set toneless_roman = replace( toneless_roman, 'à', 'a' );

update t_temp set toneless_roman = replace( toneless_roman, 'ē', 'e' );
update t_temp set toneless_roman = replace( toneless_roman, 'é', 'e' );
update t_temp set toneless_roman = replace( toneless_roman, 'ě', 'e' );
update t_temp set toneless_roman = replace( toneless_roman, 'è', 'e' );

update t_temp set toneless_roman = replace( toneless_roman, 'ī', 'i' );
update t_temp set toneless_roman = replace( toneless_roman, 'í', 'i' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǐ', 'i' );
update t_temp set toneless_roman = replace( toneless_roman, 'ì', 'i' );

update t_temp set toneless_roman = replace( toneless_roman, 'ō', 'o' );
update t_temp set toneless_roman = replace( toneless_roman, 'ó', 'o' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǒ', 'o' );
update t_temp set toneless_roman = replace( toneless_roman, 'ò', 'o' );

update t_temp set toneless_roman = replace( toneless_roman, 'ū', 'u' );
update t_temp set toneless_roman = replace( toneless_roman, 'ú', 'u' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǔ', 'u' );
update t_temp set toneless_roman = replace( toneless_roman, 'ù', 'u' );

-- ü also takes tone marks (as in nǚ or lǜ); strip the tone but keep the ü so that "lü" stays distinct from "lu"
update t_temp set toneless_roman = replace( toneless_roman, 'ǖ', 'ü' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǘ', 'ü' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǚ', 'ü' );
update t_temp set toneless_roman = replace( toneless_roman, 'ǜ', 'ü' );

commit;
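
A quick way to check that no tone marks slipped through is to search the new column for any tone-marked vowel. This is a sketch that assumes the database uses UTF-8 encoding and that the spreadsheet stores each accented vowel as a single precomposed character rather than as a letter plus a combining mark.

```sql
-- Rows whose toneless_roman still contains a tone-marked vowel.
-- An empty result means every tone mark listed here was stripped.
select hanzi, pinyin, toneless_roman
from t_temp
where toneless_roman ~ '[āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ]';
```

Any rows that do come back point at characters the update statements have not handled yet.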

Now you can run neat queries, such as one that shows you all words that are pronounced "shi", whether the exact tonal pronunciation is shī, shí, shǐ or shì.
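
A query of that kind might look like the following sketch. It assumes that single-syllable entries are stored in lowercase and without surrounding spaces, which may or may not hold for the spreadsheet as downloaded.

```sql
-- All words whose toneless pronunciation is exactly "shi",
-- sorted by pinyin so that entries with the same tone sit together.
select hanzi, pinyin, meaning
from t_temp
where toneless_roman = 'shi'
order by pinyin;
```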

Even more useful is a query that shows you all words that contain any variant of "shi", including longer words in which it is just one syllable.
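
One way to write such a query is with a LIKE pattern, as sketched below. LIKE is case-sensitive in PostgreSQL, so if some entries turn out to be capitalised, ILIKE could be swapped in.

```sql
-- All words whose toneless pronunciation contains "shi" anywhere,
-- whether as the whole word or as one syllable of a longer word.
select hanzi, pinyin, meaning
from t_temp
where toneless_roman like '%shi%'
order by toneless_roman, pinyin;
```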

I'm going to be spending a lot of time mulling over the words I've learnt through Duolingo, and quite a bit of that rumination is going to involve SQL queries on these four columns to discover the semantic connections between words. In spite of its tremendous usefulness, Duolingo only provides the main meaning of each word, not its etymology or the literal meanings of compound words. I've posted earlier about some of the fascinating meanings I've discovered, and I'm sure my SQL database will help me discover many more.
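
As an illustration of the kind of query I have in mind, the sketch below pairs every single-character word with the longer words that contain that character, so that their meanings can be read side by side. The aliases (s, c, single_char, compound and so on) are my own names. If the list contains both 电 and 电脑, for instance, they would appear together in the output.

```sql
-- Pair each single-character word with the longer words that contain it.
select s.hanzi   as single_char,
       s.meaning as single_meaning,
       c.hanzi   as compound,
       c.meaning as compound_meaning
from t_temp s
join t_temp c
  on strpos(c.hanzi, s.hanzi) > 0   -- the compound contains the character
 and char_length(c.hanzi) > 1       -- and is longer than one character
where char_length(s.hanzi) = 1
order by s.hanzi, c.hanzi;
```

Characters that appear only inside compounds, and never as words on their own, will not show up with this approach.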

Best of all, my database will continue to grow even beyond the vocabulary provided by Duolingo, and I hope to use this as a learning aid indefinitely.

I will post about all the insights I gain from my explorations in this blog, of course.

But for now, 非常感谢，多邻国 fēicháng gǎnxiè, duō lín guó ("Thank you very much, Duolingo").

Saturday, 20 November 2021
