Over the last year, we have been using our own importer at DiscourseHosting. We think we have some conceptual advantages over the 'official' importers, but also a lot of disadvantages. We have been contemplating cleaning up our code and releasing it, but I think it would be nicer to take some of our concepts and re-use them in the official importers.
I am opening this topic to discuss the advantages and disadvantages, and to see whether you would be interested in a PR from us for this.
First of all, a comparison. Where the official importers are better:
The official importers are more cleanly coded.
The official importers are a bit faster.
The official importers have more features, like nested categories.
The official importers are, well, the official importers.
Where our importer is better:
The official importers are plural: there are multiple of them, and some things are implemented in one importer but not in another. We have a single importer that uses one YML config file per forum type (bbPress, Phorum, Vanilla, vBulletin). The importer takes the YML config file as a command-line argument.
This makes it faster and easier to create a new importer with our approach. Almost all code is reused; we only need to write new queries that map the source database to a generic import format.
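To illustrate what a "generic import format" can mean, here is a minimal Ruby sketch, assuming the shared code expects exactly the column aliases used in the vanilla.yml example. `ImportUser` and `to_import_user` are hypothetical names, not part of our importer or of Discourse.

```ruby
# Hypothetical generic user record. Each forum's get_user_query aliases its
# columns to these names (as vanilla.yml does), so the shared import code
# never needs to know the source schema.
ImportUser = Struct.new(
  :user_id, :username, :fullname, :email, :crypted_password,
  :lastvisit, :is_admin, :is_active, :discourse_id,
  keyword_init: true
)

# Convert one result row (a Hash with string keys) into an ImportUser,
# reporting any alias the forum-specific query forgot to provide.
def to_import_user(row)
  fields = ImportUser.members.map(&:to_s)
  missing = fields - row.keys
  unless missing.empty?
    raise KeyError, "get_user_query did not return: #{missing.join(', ')}"
  end

  ImportUser.new(**fields.to_h { |f| [f.to_sym, row[f]] })
end
```

Extra columns in the row are ignored, so a forum-specific query can select more than the generic format needs.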
The official importers have hardcoded MySQL credentials, source database names and table prefixes. Our importer reads those from the YML file, which makes it easier to change them while leaving the code itself untouched.
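A minimal sketch of the config handling, assuming the YML file is passed as the single command-line argument and that the MySQL connection is made with the mysql2 gem. `REQUIRED_KEYS`, `load_importer_config` and `connection_options` are hypothetical names for illustration.

```ruby
require "yaml"

# Keys this sketch treats as mandatory; the list is an assumption based on
# the vanilla.yml example in this post.
REQUIRED_KEYS = %w[
  sql_server sql_user sql_password sql_database
  get_user_query get_post_query
].freeze

# Load a forum-type config (e.g. vanilla.yml) and fail early if it is incomplete.
def load_importer_config(path)
  config = YAML.safe_load(File.read(path))
  raise ArgumentError, "#{path} is not a YAML mapping" unless config.is_a?(Hash)

  missing = REQUIRED_KEYS - config.keys
  unless missing.empty?
    raise ArgumentError, "#{path} is missing: #{missing.join(', ')}"
  end

  config
end

# Map the sql_* settings to the option names that the mysql2 gem's
# Mysql2::Client.new accepts, so no credentials live in the script itself.
def connection_options(config)
  {
    host: config["sql_server"],
    username: config["sql_user"],
    password: config["sql_password"],
    database: config["sql_database"]
  }
end

# Hypothetical entry point: ruby generic_import.rb vanilla.yml
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  options = connection_options(load_importer_config(ARGV.first))
  puts "Would connect to #{options[:database]} on #{options[:host]}"
  # With the mysql2 gem installed: client = Mysql2::Client.new(options)
end
```

Switching to another forum, or to another copy of the same database, then only means passing a different YML file.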
When we develop a new feature, it applies to all existing import scripts at once. You are using derived classes, but there is still quite a lot of logic in each derived class.
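Roughly, the shared logic can be a single loop that only takes its SQL from the config. This sketch assumes a `db` object whose `query(sql)` returns row hashes (as a mysql2 client does) and a block that creates the user in Discourse and returns the new ID. `GenericImporter` is a hypothetical class, not the Discourse base importer.

```ruby
# Hypothetical config-driven importer: nothing in here is forum-specific.
class GenericImporter
  def initialize(config, db)
    @config = config
    @db = db
  end

  # Import every user returned by the config's get_user_query.
  # The block creates the user in Discourse and returns the new user ID.
  # Returns the number of users processed.
  def import_users
    count = 0
    @db.query(@config["get_user_query"]).each do |row|
      discourse_id = yield(row)
      # process_user_query has two %d placeholders: new Discourse ID, source user ID.
      @db.query(format(@config["process_user_query"], discourse_id, row["user_id"]))
      count += 1
    end
    count
  end
end
```

A feature added to this loop (for example writing the ID back) immediately works for every forum type that has a config file.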
Our import script supports password migration using our plugin https://github.com/discoursehosting/discourse-migratepassword, so people can log in with their original password.
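As a sketch of the idea, the importer only has to carry the original password hash (the `crypted_password` alias in the config) over to the new user. This assumes the plugin reads the hash from a user custom field named `import_pass`; that name is an assumption to verify against the plugin's README, and `password_custom_fields` is a hypothetical helper.

```ruby
# Hypothetical helper: copy the source forum's password hash into a user
# custom field so a password-migration plugin can check it at login.
# The field name "import_pass" is an assumption; verify it against the
# discourse-migratepassword plugin.
PASSWORD_FIELD = "import_pass"

def password_custom_fields(row)
  hash = row["crypted_password"].to_s.strip
  return {} if hash.empty? # nothing to migrate for this user

  { PASSWORD_FIELD => hash }
end
```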
Our import script can write the Discourse ID back to the original database, which makes redirection scripts easier to implement on the original server. (You do it the other way around, but that isn't always the best solution for our customers, for instance when the original forum is at www.domain.com/forum and the new forum is at forum.domain.com.)
This also makes restarting the import much faster, since rows that already have a Discourse ID can be skipped.
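The lookup a redirection script on the old server would do is small once the ID is stored there. This is a Ruby sketch for illustration only; the real script would usually be written for the old forum's stack. It assumes old discussion URLs contain `/discussion/<DiscussionID>` and that Discourse resolves `/t/<topic id>`; `redirect_target` and the `lookup` callable are hypothetical.

```ruby
# Map an old discussion URL to its new Discourse URL.
# `lookup` stands in for running
#   SELECT discourse_id FROM gdn_discussion WHERE DiscussionID = ?
# and returns the stored ID, or nil when the discussion is unknown.
def redirect_target(old_path, new_base_url, lookup)
  match = old_path.match(%r{/discussion/(\d+)})
  return nil unless match

  topic_id = lookup.call(match[1].to_i).to_i # nil or 0 means "not imported"
  topic_id.positive? ? "#{new_base_url}/t/#{topic_id}" : nil
end
```

Because the mapping lives in the old database, this works even when the old and new forums are on different hostnames.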
An example of such a YML config script can be found below.
I would like to know whether you are interested in us creating a scripts/import_scripts/generic.rb script that uses our concepts but builds on your base class like all the other importers, or whether you deliberately chose your current approach and are not interested in this.
Example vanilla.yml
```yaml
sql_server: localhost
sql_user: root
sql_password: password
sql_database: XXXXX
discourse_admin: system
test_mode: false
max_errors: 1000
prepare_users_query: |
  ALTER TABLE gdn_user
    ADD COLUMN discourse_id INT NOT NULL DEFAULT '0';
prepare_posts_query: |
  ALTER TABLE gdn_comment
    ADD COLUMN discourse_id INT NOT NULL DEFAULT '0';
prepare_topics_query: |
  ALTER TABLE gdn_discussion
    ADD COLUMN discourse_id INT NOT NULL DEFAULT '0';
get_user_query: |
  SELECT
    u.UserID AS user_id,
    u.Name AS fullname,
    u.Name AS username,
    u.Password AS crypted_password,
    u.Email AS email,
    IF(u.DateLastActive = '', DateInserted, DateLastActive) AS lastvisit,
    u.Admin AS is_admin,
    1 AS is_active,
    u.discourse_id
  FROM gdn_user u
  WHERE u.Name != 'System'
    AND u.discourse_id != -1
get_post_query: |
  SELECT
    d.DiscussionID * 1000000 AS post_id,
    d.DiscussionID AS topic_id,
    d.Name AS topic_title,
    u.Name AS username,
    u.UserID AS user_id,
    u.discourse_id AS discourse_user_id,
    cat.Name AS category_name,
    d.DateInserted AS post_time,
    IFNULL(d.DateUpdated, d.DateInserted) AS post_edit_time,
    replace(replace(d.body,'\t', ''), '<br />', '\n') AS post_text,
    d.discourse_id
  FROM gdn_discussion d
  LEFT JOIN gdn_category cat ON cat.CategoryID = d.CategoryID
  LEFT JOIN gdn_user u ON u.UserID = d.InsertUserID
  WHERE d.discourse_id = 0 AND d.InsertUserID > 0
  GROUP BY d.DiscussionID
  UNION
  SELECT
    c.CommentID AS post_id,
    d.DiscussionID AS topic_id,
    d.Name AS topic_title,
    u.Name AS username,
    u.UserID AS user_id,
    u.discourse_id AS discourse_user_id,
    cat.Name AS category_name,
    c.DateInserted AS post_time,
    IFNULL(c.DateUpdated,c.DateInserted) AS post_edit_time,
    replace(replace(c.body,'\t', ''), '<br />', '\n') AS post_text,
    c.discourse_id
  FROM gdn_comment c
  LEFT JOIN gdn_discussion d ON d.DiscussionID = c.DiscussionID
  LEFT JOIN gdn_category cat ON cat.CategoryID = d.CategoryID
  LEFT JOIN gdn_user u ON u.UserID = c.InsertUserID
  WHERE c.discourse_id = 0 AND c.InsertUserID > 0
  GROUP BY c.CommentID
  ORDER BY post_time
get_likes_query: |
unique_topic_query: |
  SELECT discourse_id
  FROM gdn_discussion
  WHERE DiscussionID = %d
process_user_query: |
  UPDATE gdn_user
  SET discourse_id = %d
  WHERE UserID = %d
process_topic_query: |
  UPDATE gdn_discussion
  SET discourse_id = %d
  WHERE DiscussionID = %d
process_post_query: |
  UPDATE gdn_comment
  SET discourse_id = %d
  WHERE CommentID = %d
reset_topics_query: |
  UPDATE gdn_discussion
  SET discourse_id = 0
reset_posts_query: |
  UPDATE gdn_comment
  SET discourse_id = 0
reset_users_query: |
  UPDATE gdn_user
  SET discourse_id = 0
```
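One plausible way to use `unique_topic_query` while walking the rows of `get_post_query`, sketched in Ruby: if the row's discussion already has a `discourse_id` (for example from an earlier, interrupted run), the row becomes a reply to that topic; otherwise a new topic is created. This interpretation, the `topic_target` name and the `db` object (anything whose `query(sql)` returns row hashes with string keys) are assumptions for illustration.

```ruby
# Decide whether a row from get_post_query starts a new topic or replies
# to a topic that was already created in Discourse.
def topic_target(config, db, row)
  sql = format(config.fetch("unique_topic_query"), row.fetch("topic_id"))
  existing = db.query(sql).first
  discourse_topic_id = existing ? existing["discourse_id"].to_i : 0

  if discourse_topic_id.positive?
    [:reply, discourse_topic_id]
  else
    [:new_topic, nil]
  end
end
```

Since `get_post_query` is ordered by `post_time`, the opening post of a discussion is normally processed before its comments, so by the time a comment arrives `process_topic_query` has already stored the topic's ID.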