Issue upgrading MMC

Hi

I am having trouble getting started with the current IPMC code. I tried many different approaches without luck. In the end, I ran the simplest test I could think of, but even that seems to crash.

Here is what I did:

  • got a fresh clone of the ipmc-project repository
  • removed the file ipmc-user/user_mainfile.c
  • edited the config.xml file to resemble what I will need (very basic changes; I left out all the details and sensors, please see attached)
  • executed compile.py, which compiled without errors and returned some files, including hpm1all.img, which I use to program the IPMC
  • programmed our IPMC with that file and activated it
  • the IPMC immediately reverted to the previously used firmware, which suggests to me that the new code is segfaulting or failing in some similar way.

I am probably missing something obvious. I would appreciate some feedback.

Thanks

PS: Well, it seems I cannot attach anything here. I will just add the config.xml file below.

<?xml version="1.0" encoding="UTF-8"?>

<IPMC>
	<GeneralConfig>

		<DeviceID>0x12</DeviceID>
		<DeviceRevision>0x00</DeviceRevision>
		<ManufacturerID>0x000060</ManufacturerID>
		<ProductID>0x1236</ProductID>

		<ManufacturingDate>06/01/2017</ManufacturingDate>

		<BoardManuf>Cirly/Addax</BoardManuf>
		<BoardName>TEST_FRUFROLLBACK</BoardName>
		<BoardSN>00001</BoardSN>
		<BoardPN>P580050995</BoardPN>

		<ProductManuf>CERN</ProductManuf>
		<ProductName>IPMC-TestPAD</ProductName>
		<ProductPN>PN00001</ProductPN>
		<ProductSN>0000001</ProductSN>
		<ProductVersion type="major">1</ProductVersion>
		<ProductVersion type="minor">20</ProductVersion>

		<MaxCurrent>30.0</MaxCurrent>
		<MaxInternalCurrent>2.0</MaxInternalCurrent>

		<!-- Hardware -->
		<HandleSwitch active="LOW" inactive="HIGH" />

        <!-- <ResetOnWrongHAEn /> -->
        <!-- <PowerMonitoringEn /> -->
        <!-- <AlertMonitoringEn />-->

        <!-- Shutdown timeout in tens of ms (optional - if not defined: 10s) -->
        <shutdownTimeout>0</shutdownTimeout>

        <nonVolatileParams forced="false" />
	</GeneralConfig>

    <SerialInterfaces>
        <!--
            This part allows connecting the UART port to interfaces.

            The ports 0 to 2 are linked to the hardware:
                port 0: Edge connector (Tx: 57 / Rx: 60)
                port 1: Edge connector (Tx: 58 / Rx: 61)
                port 2: Optional UART (Tx: 75 / Rx: 76)

            Warning: Enabling port 2 will automatically set the GPIOs in UART mode!

            For each port, the following names can be used:
                "SOL": Serial Over Lan
                "SDI": Serial Debug Interface
                "PI": Payload Interface

            The baudrate can be set using the baudrate param. By default,
            it is configured to 115200b/s.
        -->
        <Connect port="0" name="SDI" baudrate="115200"/>
        <!-- <Connect port="1" name="PI"  baudrate="115200"/> -->
        <!-- <Connect port="2" name="SOL"  baudrate="115200" extended="true" /> -->

        <RedirectSDItoSOL/>

    </SerialInterfaces>

	<PowerManagement>

		<PowerONSeq>
			<step>PSQ_ENABLE_SIGNAL(CFG_PAYLOAD_DCDC_EN_SIGNAL)</step>
			<step>PSQ_END</step>
		</PowerONSeq>

		<PowerOFFSeq>
			<step>PSQ_DISABLE_SIGNAL(CFG_PAYLOAD_DCDC_EN_SIGNAL)</step>
			<step>PSQ_END</step>
		</PowerOFFSeq>

	</PowerManagement>

	<LANConfig>

		<MACAddr>0A:0A:0A:0A:0A:86</MACAddr>
		<NetMask>255.255.255.0</NetMask>
		<GatewayIP>192.138.1.3</GatewayIP>

        <UseFlashedMAC />
        <EnableDHCP />

		<IPAddrList> <!-- Default IP Addresses (used if DHCP is not active) -->
			<IPAddr slot_addr="default">192.168.1.34</IPAddr>
			<IPAddr slot_addr="0x41">192.168.1.20</IPAddr>
			<IPAddr slot_addr="0x42">192.168.1.21</IPAddr>
			<IPAddr slot_addr="0x43">192.168.1.22</IPAddr>
			<IPAddr slot_addr="0x44">192.168.1.23</IPAddr>
			<IPAddr slot_addr="0x45">192.168.1.24</IPAddr>
			<IPAddr slot_addr="0x46">192.168.1.25</IPAddr>
			<IPAddr slot_addr="0x47">192.168.1.26</IPAddr>
			<IPAddr slot_addr="0x48">192.168.1.27</IPAddr>
			<IPAddr slot_addr="0x49">192.168.1.28</IPAddr>
			<IPAddr slot_addr="0x4a">192.168.1.29</IPAddr>
			<IPAddr slot_addr="0x4b">192.168.1.30</IPAddr>
			<IPAddr slot_addr="0x4c">192.168.1.31</IPAddr>
			<IPAddr slot_addr="0x4d">192.168.1.32</IPAddr>
			<IPAddr slot_addr="0x4e">192.168.1.33</IPAddr>
			<IPAddr slot_addr="0x4f">192.168.1.34</IPAddr>
			<IPAddr slot_addr="0x50">192.168.1.35</IPAddr>
		</IPAddrList>

	</LANConfig>

	<AMCSlots>

		<AMC site="1">
			<PhysicalPort>1</PhysicalPort>
			<MaxCurrent>6.0</MaxCurrent>
			<PowerGoodTimeout>300</PowerGoodTimeout>
			<DCDCEfficiency>85</DCDCEfficiency>
		</AMC>

		<AMC site="2">
			<PhysicalPort>2</PhysicalPort>
			<MaxCurrent>6.0</MaxCurrent>
			<PowerGoodTimeout>300</PowerGoodTimeout>
			<DCDCEfficiency>85</DCDCEfficiency>
		</AMC>

		<AMC site="3">
			<PhysicalPort>3</PhysicalPort>
			<MaxCurrent>6.0</MaxCurrent>
			<PowerGoodTimeout>300</PowerGoodTimeout>
			<DCDCEfficiency>85</DCDCEfficiency>
		</AMC>

	</AMCSlots>

	<SensorList>
        <Sensors type="raw" global_define="CFG_SENSOR_MCP9801" function_name="SENSOR_MCP9801" rawType="MCP9801">
            <Sensor>
                <Name>Internal temp.</Name>

                <Type>Temperature</Type>
                <Units>degrees C</Units>

                <NominalReading>25</NominalReading>
                <NormalMaximum>60</NormalMaximum>
                <NormalMinimum>-10</NormalMinimum>

                <Point id="0" x="0" y="0" />
                <Point id="1" x="5" y="5" />

                <Thresholds>
                    <UpperNonRecovery>80</UpperNonRecovery>
                    <UpperCritical>60</UpperCritical>
                    <UpperNonCritical>40</UpperNonCritical>
                    <LowerNonRecovery>-20</LowerNonRecovery>
                    <LowerCritical>-10</LowerCritical>
                    <LowerNonCritical>0</LowerNonCritical>
                </Thresholds>

                <Params>
                    <p type="record_id"></p>  <!-- mandatory -->
                    <p type="user">0x090</p>
                    <p type="user">UCGH | LCGL</p>
                </Params>

                <AssertEvMask>0x0A80</AssertEvMask>
                <DeassertEvMask>0x7A80</DeassertEvMask>
                <DiscreteRdMask>0x3838</DiscreteRdMask>
                <AnalogDataFmt>2S_COMPL</AnalogDataFmt>
                <PosHysteresis>2</PosHysteresis>
                <NegHysteresis>2</NegHysteresis>
                <MaxReading>127</MaxReading>
                <MinReading>-128</MinReading>
            </Sensor>
        </Sensors>

        <!-- Example for GPIO sensors:
        <Sensors type="raw" global_define="CFG_SENSOR_GPIO " function_name="SENSOR_GPIO" rawType="GPIOSENSOR">
            <Sensor>
                <Name>GPIOSens Ex.</Name>

                <Type>Processor</Type>

                <Params>
                    <p type="record_id"></p>
                    <p type="user">0x1</p>
                    <p type="user">POWER_GOOD_12V</p>
                </Params>

                <DiscreteRdMask>0x000F</DiscreteRdMask>
            </Sensor>
        </Sensors>
        -->

        <!-- Example for payload sensors:
        <Sensors type="raw" global_define="CFG_SENSOR_PAYLOAD_THRESHOLD" function_name="SENSOR_PAYLOAD_THRESHOLD" rawType="PAYLOADSENSOR_THRESH">
            <Sensor>
                <Name>PayloadSens Ex.</Name>

                <Type>Temperature</Type>
                <Units>degrees C</Units>

                <NominalReading>25</NominalReading>
                <NormalMaximum>60</NormalMaximum>
                <NormalMinimum>-10</NormalMinimum>

                <Point id="0" x="0" y="0" />
                <Point id="1" x="5" y="5" />

                <Thresholds>
                    <UpperNonRecovery>80</UpperNonRecovery>
                    <UpperCritical>60</UpperCritical>
                    <UpperNonCritical>40</UpperNonCritical>
                    <LowerNonRecovery>-20</LowerNonRecovery>
                    <LowerCritical>-10</LowerCritical>
                    <LowerNonCritical>0</LowerNonCritical>
                </Thresholds>

                <Params>
                    <p type="record_id"></p>
                </Params>

                <AssertEvMask>0x0A80</AssertEvMask>
                <DeassertEvMask>0x7A80</DeassertEvMask>
                <DiscreteRdMask>0x3838</DiscreteRdMask>
                <AnalogDataFmt>2S_COMPL</AnalogDataFmt>
                <PosHysteresis>2</PosHysteresis>
                <NegHysteresis>2</NegHysteresis>
                <MaxReading>127</MaxReading>
                <MinReading>-128</MinReading>
            </Sensor>
        </Sensors>
        -->

	</SensorList>
</IPMC>
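
As an aside for anyone reusing this file: the LANConfig block can be sanity-checked with a few lines of Python before compiling. The sketch below is not part of the ipmc-project tooling; `check_lan_config` is a hypothetical helper and `SAMPLE` is a trimmed copy of the block above. It only checks that the gateway lies in the same subnet as each default IP address. With the values posted here it flags the gateway (192.138.1.3 against 192.168.1.x addresses), which looks like a typo. Whether the firmware uses the gateway at all when DHCP is enabled is an assumption left open here, and nothing in this thread suggests it is related to the rollback.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Trimmed-down copy of the <LANConfig> block from the config.xml above.
SAMPLE = """
<IPMC>
  <LANConfig>
    <NetMask>255.255.255.0</NetMask>
    <GatewayIP>192.138.1.3</GatewayIP>
    <IPAddrList>
      <IPAddr slot_addr="default">192.168.1.34</IPAddr>
      <IPAddr slot_addr="0x41">192.168.1.20</IPAddr>
    </IPAddrList>
  </LANConfig>
</IPMC>
"""


def check_lan_config(root):
    """Return warnings for every IPAddr whose subnet does not contain the gateway."""
    lan = root.find("LANConfig")
    if lan is None:
        return ["no <LANConfig> element found"]

    netmask = lan.findtext("NetMask").strip()
    gateway = ipaddress.ip_address(lan.findtext("GatewayIP").strip())

    warnings = []
    for node in lan.iter("IPAddr"):
        addr = node.text.strip()
        # strict=False derives the network from a host address plus netmask.
        network = ipaddress.ip_network(f"{addr}/{netmask}", strict=False)
        if gateway not in network:
            slot = node.get("slot_addr")
            warnings.append(f"slot {slot}: gateway {gateway} is outside {network}")
    return warnings


if __name__ == "__main__":
    for line in check_lan_config(ET.fromstring(SAMPLE)):
        print(line)
```

To check a real file, pass `ET.parse("config.xml").getroot()` instead of the inline sample.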

Dear Thiago,

Sorry for the delay, I am just back from vacation. As Ralf told you, some problems were found on the I2C bus with version 1.2 and are fixed in version 1.3. As the command might send a lot of data over the I2C bus, this could be the source of your issue.

Concerning your problem with going to 1.3, it is really weird, I’ve just tried again to compile and force and it looks ok. However, it could be because of a version checking issue that the rollback is automatically performed. Could you check what is printed on the serial debug interface when you activate the new version?

Thank you,
best,
Julian

Dear Julian,

Thanks for answering promptly after your return. I hope you enjoyed your time off.

Concerning your problem with going to 1.3, it is really weird, I’ve just tried again to compile and force and it looks ok.

Just to confirm, have you used the same config.xml I sent above?

However, it could be because of a version checking issue that the rollback is automatically performed. Could you check what is printed on the serial debug interface when you activate the new version?

I am afraid I cannot get this. The IPMC serial is connected to a Zynq device on the same blade, and I would only have access to the serial messages after the Zynq itself finishes booting. Can you explain a bit more about this version checking issue?

Also, these IPMCs are an old revision. Could this be the problem? Can you try with a similar one as well?

Thanks,

Thiago

Hi Thiago,

The XML looks good, and you should not have any problem using your binary file. When you move from 1.2 to 1.3, the firmware tries to recover the “non-volatile parameters”. However, these parameters changed between 1.2 and 1.3, so at the first boot the activate function cannot convert all of them and automatically performs a rollback. The upgrade can be forced by adding the “norollback” instruction when you activate the new firmware.

Connecting to the debug interface would give additional details and confirm that the rollback is caused by the non-volatile parameter issue. But if you flash with norollback and the firmware really does have a problem, you could end up needing to flash your IPMC again via JTAG.

Do you have a JTAG cable that you could use, or even a Raspberry Pi, which we now support for resetting the system? If you have a way to flash the module back, you can try adding the “norollback” instruction to the activate command.

Cheers,
Julian
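
For readers following along, here is a minimal sketch of what "activate with norollback" could look like. It assumes the upgrade is done with ipmitool's `hpm` subcommands (`hpm upgrade <file>` and `hpm activate [norollback]`); the thread does not say which tool was actually used. The helper `hpm_commands` is hypothetical, the host address is a placeholder, and any bridging options needed to reach the IPMC have to be supplied by the caller. The code only builds and prints the command lines; it does not run them.

```python
import shlex


def hpm_commands(host, image="hpm1all.img", bridge_args=(), norollback=False):
    """Build (but do not execute) ipmitool command lines for an HPM.1 upgrade.

    host        -- address of the shelf manager or IPMC (placeholder value)
    bridge_args -- extra ipmitool options (e.g. bridging), setup-specific
    norollback  -- append "norollback" to the activate step
    """
    base = ["ipmitool", "-H", host, "-P", "", *bridge_args]
    upgrade = base + ["hpm", "upgrade", image]
    activate = base + ["hpm", "activate"]
    if norollback:
        activate.append("norollback")
    return upgrade, activate


if __name__ == "__main__":
    # Placeholder address; replace with the one used in your crate.
    for cmd in hpm_commands("192.168.0.2", norollback=True):
        print(shlex.join(cmd))
```

As noted above, norollback removes the safety net: only use it once a recovery path (JTAG or the Raspberry Pi setup) is in place.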

Hi, Julian.

I bought the JTAG cable some time ago because I thought we could end up in this kind of situation, but I never actually used it. Do you have instructions on how to use it in the CERN IPMC context? (A Linux solution, please; I don't have any computer with another operating system around.)

Thanks,

Thiago

Hi Thiago,

Unfortunately, Pigeon Point only provides a solution for Windows, and I am not aware of a Linux equivalent. It may be possible to run the STAPL file on Linux, though.

However, we have a solution to flash the IPMC with a raspberry pi that is described here: IPMC v3 Image Upgrade Issue - #6 by jumendez

Cheers,
Julian

Thanks, Julian. Can you add this somewhere in the documentation? That way people can get the correct items from the start. I will try to get my hands on an RPi then, but it will stall the debugging for a few more days.

Cheers,

Thiago

Hi again, Julian.

I just visited the other issue and the picture is not accessible anymore. Can you please provide it again?

Thanks,
Thiago

Hi Thiago,

The issue is reported in the release note: v.1.3.1 · Tags · ep-ese-be-xtca / ipmc-project · GitLab

You can also find it in the documentation section of the cern-ipmc website: CERN-IPMC > Release note v.1.3

Cheers,
Julian

I’ve just tested the link and it works on my side, but the file takes a while to download. What kind of error are you seeing?

Cheers,
Julian

The picture in that link is not available:

Hi, Julian.

The issue is reported in the release note: v.1.3.1 · Tags · ep-ese-be-xtca / ipmc-project · GitLab

That is not what I meant. I think it would help developers to know what they need in order to recover an IPMC. In my case, I only just learned that I need an RPi for that. Had I known this earlier, I would have got one beforehand in case of unexpected results.

Thiago

The picture in that link is not available:

Now it is, thanks for fixing it =)

Hi, Julian.

I was able to boot the RPI with the image you provided, and I connected the RPI to the IPMC development bed following the pin mapping suggested. However, I am getting the following error:

pi@raspberrypi:~ $ flashipmc
[Progress] 10%
{'details': {'file': '/home/pi/ipmc-tester/ipmc-tester/../ipmc-config/stapl/20200928_cern_ipmc.stp'},
 'measurements': [{'data': {'s': 'Init', 'v': 244.45676803588867},
                   'name': 'timer',
                   'pass': 0}],
 'name': 'jam_player_program',
 'pass': -1,
 'summary': {'duration': 244.79293823242188,
             'exitcode': -1,
             'finalState': -1,
             'verbose': 'Init failed - device not found?'},
 'test_type_name': 'PROGRAM_v.1.0'}

BTW, the development kit powered up as it should. I will try to debug the setup further, but I was wondering if you have any suggestions since we are in a rush.

One more question: I assume that it will flash the IPMC with some default firmware, right? So in case of troubles, the procedure is: remove the IPMC from the blade, install it in the development kit, flash this default firmware and then return it to the blade to reprogram it as usual. Does this sound correct?

Thanks,

Thiago

Hi, Julian.

I was able to get the recovery setup to work (informing here so you don’t need to spend time on this). I was able to recover a bricked IPMC from more than 2 years ago caused by a norollback that failed (no idea why).

Now that I have a way to recover the cards in case of problems, I will proceed with using the norollback option you suggested above for the current issue.

Thiago

Hi Thiago,

Did you manage to switch to v.1.3? Did it solve your issue?

Cheers,
Julian

Dear Thiago,

I was just wondering where you are with updating the IPMC software from 1.2 to 1.3. Any news on the issue of using the IPMC for updating the MMC?

Cheers,
Ralf.

Hello,

I’m also from the NSW and I’m trying to look at the issue since we’re getting closer and closer to running and Thiago is quite busy with other work, too.

However, I’m quite new to IPMC. How should we update the IPMC software? When I run this command, I get:

ipmitool -H 192.168.0.2 -P "" -m 0x20 -t 0x72 -T 0x82 -b 7 -B 0 mc info
Device ID                 : 1
Device Revision           : 0
Firmware Revision         : 1.07
IPMI Version              : 1.5
Manufacturer ID           : 28688
Manufacturer Name         : Unknown (0x7010)
Product ID                : 528 (0x0210)
Product Name              : Unknown (0x210)
Device Available          : yes
Provides Device SDRs      : yes
Additional Device Support :
    Sensor Device
    FRU Inventory Device
    IPMB Event Generator

Rongkun

Hi,

I tried compiling with the v.1.3.1 tag of ep-ese-be-xtca / ipmc-project · GitLab. However, I get an error during UTF-8 decoding at the line strVar = strVar + chunk.decode("utf-8"):

Traceback (most recent call last):
  File "compile.py", line 302, in <module>
    main(USERNAME, PASSWORD)
  File "compile.py", line 270, in main
    link = CERNSSORequestInstance.post_stream(URL, data = values, files = files)
  File "compile.py", line 134, in post_stream
    strVar = strVar + chunk.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 0: unexpected end of data

I wonder if the url request returns incorrect data? Are you able to reproduce this error?

Another question: should we use false or true in <nonVolatileParams forced="false" />?

Best,
Rongkun

Hi Rongkun,

Can you please update your compile.py file to read

                strVar = strVar + chunk.decode("utf-8", "ignore")

in line 134?

This was mentioned in the issue "IPMC Compilation not working" and has been updated in the ep-ese-be-xtca/ipmc-project GitLab files.

Please let me know if it helps. Cheers,
Ralf
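
A note on why the error appears: 0xe2 is the first byte of a three-byte UTF-8 sequence, so "unexpected end of data" most likely means a multi-byte character was split across two streamed chunks. The "ignore" fix above silently drops such a split character. The sketch below shows an alternative that keeps it, using the standard library's incremental decoder. Only the one line of compile.py quoted in the traceback is known here, so `decode_stream` and the sample chunks are illustrative assumptions, not the project's code.

```python
import codecs


def decode_stream(chunks):
    """Decode an iterable of byte chunks as UTF-8, tolerating characters
    that are split across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = []
    for chunk in chunks:
        # Incomplete trailing bytes are buffered until the next chunk arrives.
        parts.append(decoder.decode(chunk))
    # Flush: anything still incomplete at the very end becomes U+FFFD.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    text = "progress \u2026 done"          # U+2026 encodes as e2 80 a6
    data = text.encode("utf-8")
    chunks = [data[:10], data[10:]]        # split right after the 0xe2 byte

    # Per-chunk decoding with "ignore" loses the split character.
    lossy = "".join(c.decode("utf-8", "ignore") for c in chunks)
    print(repr(lossy))
    print(repr(decode_stream(chunks)))
```

For progress output the dropped character is harmless, which is presumably why "ignore" was adopted upstream; the incremental decoder matters only if the streamed text needs to be preserved exactly.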