IN2P3 cluster worker nodes failed to start due to Innodb error

Details

    • Bug
    • Status: Won't Fix
    • Resolution: Done
    • None
    • Qserv
    • None
    • 4
    • DB_W16_02
    • Data Access and Database

    Description

      The following error occurs when starting MariaDB on a worker (with existing data from a 35 TB dataset, which was generated by MySQL):

      2016-02-13 22:02:36 139632684558144 [Note] InnoDB: Completed initialization of buffer pool
      2016-02-13 22:02:36 139632684558144 [ERROR] InnoDB: auto-extending data file ./ibdata1 is of a different size 640 pages (rounded down to MB) than specified in the .cnf file: initial 768 pages, max 0 (relevant if non-zero) pages!
      2016-02-13 22:02:36 139632684558144 [ERROR] InnoDB: Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was, and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!
      2016-02-13 22:02:36 139632684558144 [ERROR] Plugin 'InnoDB' init function returned error.
      2016-02-13 22:02:36 139632684558144 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2016-02-13 22:02:36 139632684558144 [Note] Plugin 'FEEDBACK' is disabled.
      2016-02-13 22:02:36 139632684558144 [ERROR] Unknown/unsupported storage engine: InnoDB
      2016-02-13 22:02:36 139632684558144 [ERROR] Aborting
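
      The page counts in the error message can be translated into file sizes. The following is a minimal sketch, assuming the default InnoDB page size of 16 KiB (the log does not state the page size actually in use); `pages_to_mib` is a hypothetical helper, not part of Qserv or MariaDB.

```shell
# Convert an InnoDB page count to MiB.
# Assumption: 16 KiB pages (the InnoDB default); the log does not
# state the page size, so pass a second argument to override it.
pages_to_mib() {
    pages=$1
    page_size_bytes=${2:-16384}
    echo $(( pages * page_size_bytes / 1048576 ))
}

pages_to_mib 640   # size of the existing ./ibdata1, in MiB
pages_to_mib 768   # initial size requested by the .cnf file, in MiB
```

      Under that assumption the existing ./ibdata1 is 10 MiB while the configuration asks for an initial size of 12 MiB, so the file on disk is smaller than what the server expects.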
      

          Activity

            It works if I remove the InnoDB files on the worker (which should not contain data):

            qserv@ccqserv126:/qserv/data$ /qserv/run/etc/init.d/mysqld start
            Starting MySQL
            [.ok 
            qserv@ccqserv126:/qserv/data$ ps x
               PID TTY      STAT   TIME COMMAND
                 1 ?        Ss     0:00 bash
               375 ?        S      0:00 /bin/sh /qserv/stack/Linux64/mariadb/10.1.11/bin/mysqld_safe --defaults-file=/qserv/run/etc/my.cnf --datadir=/qserv/data/mysql --p
               491 ?        Sl     0:00 /qserv/stack/Linux64/mariadb/10.1.11/bin/mysqld --defaults-file=/qserv/run/etc/my.cnf --basedir=/qserv/stack/Linux64/mariadb/10.1.
               534 ?        R+     0:00 ps x
            

            Fabrice Jammes (jammes) added the comment above.
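
            The comment does not say which files were removed or how. As a hedged sketch of a more cautious variant, the function below moves the InnoDB system files aside instead of deleting them. It assumes the files are named `ibdata1` and `ib_logfile*` and that the server is stopped first; `stash_innodb_files` and the backup path in the example are hypothetical.

```shell
# Move InnoDB system files out of a MySQL/MariaDB data directory
# instead of deleting them, so they can be restored if needed.
# Assumptions: the files are named ibdata1 and ib_logfile*, and the
# server is not running while this is executed.
stash_innodb_files() {
    datadir=$1
    backupdir=$2
    mkdir -p "$backupdir" || return 1
    for f in "$datadir"/ibdata1 "$datadir"/ib_logfile*; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || continue
        mv -- "$f" "$backupdir"/ || return 1
    done
}

# Example (the backup directory is a made-up path):
# stash_innodb_files /qserv/data/mysql /qserv/data/innodb-stash
```

            Files belonging to other storage engines in the data directory are left untouched.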

            From the name of the data file (ibdata1), it looks like the server is trying to extend the default system tablespace rather than a per-table tablespace (we have the innodb_file_per_table option set, so all regular tables go to their own separate files). I'm not sure what this failure means or why it happens on one host only. Is there a chance that this file was not removed for some reason when you switched from MySQL to MariaDB?

            Andy Salnikov (salnikov) added the comment above.

            Hi salnikov,

            Do you think we could add this to the workers' my.cnf, to make it more robust?

            default-storage-engine=myisam
            skip-innodb
            

            Indeed, it is silly to crash the MySQL server because of empty and useless InnoDB data.

            Fabrice Jammes (jammes) added the comment above.

            Removing the empty/unused InnoDB files on the workers solves the problem:

            fjammes@ccosvms0070:~/src/qserv-cluster/shmux (master %=)$ cat test-query.sh 
            . /qserv/stack/loadLSST.bash 
            setup mysqlclient
            time mysql --host ccqserv125 --port 4040 --user qsmaster LSST -e "SELECT ra, decl FROM Object WHERE deepSourceId = 2322920177142607;";
            fjammes@ccosvms0070:~/src/qserv-cluster/shmux (master %=)$ ./test-query.sh 
            +--------------------+--------------------+
            | ra                 | decl               |
            +--------------------+--------------------+
            | 29.308806347275485 | -86.30884046118973 |
            +--------------------+--------------------+
             
            real    0m1.301s
            user    0m0.006s
            sys     0m0.018s
            

            Fabrice Jammes (jammes) added the comment above.

            Appears to have been a transient failure based on invalid "left-over" db state from a previous run during development. Probably better to be warned of this in production rather than disabling the storage engine on the workers and not hearing about it.

            Fritz Mueller (fritzm) added the comment above.
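
            One way to be warned about such left-over state before the server aborts is a pre-start check on the data file. This is only a sketch under stated assumptions: it assumes the startup check fails when the auto-extending file is smaller than the configured initial size, which is what the log in the description suggests, and `check_ibdata_size` is a hypothetical helper that is not part of the Qserv init scripts.

```shell
# Warn when an InnoDB data file is smaller than the configured
# initial size (in MiB, rounded down, as in the InnoDB message).
# Assumption: the server refuses to start in exactly this case.
check_ibdata_size() {
    file=$1
    expected_mib=$2
    # No file yet: the server will create it, nothing to check.
    [ -f "$file" ] || return 0
    size_bytes=$(wc -c < "$file")
    actual_mib=$(( size_bytes / 1048576 ))
    if [ "$actual_mib" -lt "$expected_mib" ]; then
        echo "WARNING: $file is ${actual_mib} MiB, expected at least ${expected_mib} MiB" >&2
        return 1
    fi
    return 0
}

# Example (the expected size must match innodb_data_file_path in my.cnf):
# check_ibdata_size /qserv/data/mysql/ibdata1 12 || echo "left-over InnoDB state?"
```

            The check only reports the mismatch and leaves the decision to remove or keep the file to the operator.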

            People

              Assignee: Unassigned
              Reporter: Fabrice Jammes (jammes)
              Watchers: Andy Salnikov, Fabrice Jammes, Fritz Mueller, Jacek Becla (Inactive), John Gates