ETH官方钱包

前往
大廳
主題


夏洛爾 | 2022-11-27 02:50:06


Kobold Sentinel Stand V2

實驗?zāi)繕?biāo):
1.進(jìn)入站立瞬間後,由於其實可能仍處於不穩(wěn)定狀態(tài),要再進(jìn)入靜立狀態(tài)
2.進(jìn)入站立瞬間後,可能面向並沒有瞄準(zhǔn)目標(biāo),要轉(zhuǎn)向目標(biāo)
3.使用Clamp Reward避免快速自殺
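
For reference, Clamp Reward boils down to the pattern below. This is a minimal sketch with made-up names (ClampRewardSketch, AddPenalty, StepReward), not the actual agent class: penalties accumulate into a separate clampReward term, and the combined per-step reward is floored at zero, so ending the episode quickly can never pay better than continuing to stand.

// Minimal sketch of the Clamp Reward idea. Class and method names are
// illustrative; only clampReward and the -0.01f * damageCoef penalty
// come from the experiment code itself.
public class ClampRewardSketch
{
    float clampReward;   // penalties accumulated during the current step (<= 0)

    // Called, for example, once per damaged non-weak-point limb.
    public void AddPenalty(float damageCoef)
    {
        clampReward += -0.01f * damageCoef;
    }

    // Combine the shaping reward with the penalties, floored at zero, so
    // no reachable state yields a negative step reward worth escaping.
    public float StepReward(float shapingReward)
    {
        float reward = shapingReward + clampReward;
        if (reward < 0f) reward = 0f;
        clampReward = 0f;   // reset for the next decision step
        return reward;
    }
}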

實驗設(shè)計:
1.任何弱點觸地皆失敗 (尾巴和武器並非弱點)
2.非弱點肢體
if(koboldBodies[i].damageCoef > 0f){// AddReward(-0.01f * koboldBodies[i].damageCoef);clampReward += -0.01f * koboldBodies[i].damageCoef;}
3.
//Set: judge.endEpisode = true
//Set: judge.episodeLength = 10f
//Set: weapon, tail not weakness
//Set: useClampReward = true
if(weaknessOnGround)
{
    if(inferenceMode)
    {
        // In inference, hand off to the GetUp brain instead of failing
        brainMode = BrainMode.GetUp;
        SetModel("KoboldGetUp", getUpBrain);
        behaviorParameters.BehaviorType = BehaviorType.InferenceOnly;
    }
    else
    {
        //===Train Stand===
        AddReward(-1f);
        judge.outLife++;
        judge.Reset();
        return;
        //===Train Other===
        // brainMode = BrainMode.GetUp;
        // SetModel("KoboldGetUp", getUpBrain);
        // behaviorParameters.BehaviorType = BehaviorType.InferenceOnly;
    }
}
else if(koboldRoot.localPosition.y < -10f)
{
    // Fell off the stage
    if(inferenceMode)
    {
        brainMode = BrainMode.GetUp;
        SetModel("KoboldGetUp", getUpBrain);
        behaviorParameters.BehaviorType = BehaviorType.InferenceOnly;
    }
    else
    {
        //===Train Stand===
        AddReward(-1f);
        judge.outY++;
        judge.Reset();
        return;
        //===Train Other===
        // brainMode = BrainMode.GetUp;
        // SetModel("KoboldGetUp", getUpBrain);
        // behaviorParameters.BehaviorType = BehaviorType.InferenceOnly;
    }
}
else if(targetDistance > 500f)
{
    judge.Reset();
}
else
{
    targetSmoothPosition = targetPositionBuffer.GetSmoothVal();
    headDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldHeadRb.position);
    rootDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldRootRb.position);
    flatTargetVelocity = rootDir;
    flatTargetVelocity.y = 0f;
    targetDistance = flatTargetVelocity.magnitude;

    Vector3 flatLeftDir = Vector3.Cross(flatTargetVelocity, Vector3.up);
    lookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.up, headDir));
    upAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.right * -1f, Vector3.up));

    //Lean
    Vector3 leanDir = rootAimRot * flatTargetVelocity;
    spineLookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldSpine.forward, flatLeftDir));
    spineUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldSpine.right * -1f, leanDir));
    rootLookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldRoot.right * -1f, flatLeftDir));
    rootUpAngle = Mathf.InverseLerp(180f, 20f, Vector3.Angle(koboldRoot.up, leanDir));

    float velocityReward = GetVelocityReward(10f);
    float angularReward = GetAngularVelocityReward(15f);
    float standReward = (koboldLeftFeetBody.isStand? 0.5f : 0f) + (koboldRightFeetBody.isStand? 0.5f : 0f);

    // ForceSharping: thresholds tighten with time since landing
    if(Time.fixedTime - landingMoment > landingBufferTime)
    {
        bool outVelocity = velocityReward > Mathf.Lerp(1f, 0.3f, (Time.fixedTime - landingMoment - landingBufferTime)/3f);
        bool outAngularVelocity = angularReward > Mathf.Lerp(1f, 0.5f, (Time.fixedTime - landingMoment - landingBufferTime)/5f);
        bool outSpeed = outVelocity || outAngularVelocity;
        float aimLimit = Mathf.Lerp(0f, 0.7f, (Time.fixedTime - landingMoment - landingBufferTime)/5f);
        float aimLimit2 = Mathf.Lerp(0f, 0.85f, (Time.fixedTime - landingMoment - landingBufferTime)/3f);
        bool outDirection = lookAngle < aimLimit2 || upAngle < aimLimit2 || spineLookAngle < aimLimit2 || rootLookAngle < aimLimit2;
        bool outMotion = spineUpAngle < aimLimit || rootUpAngle < aimLimit;
        if(outSpeed || outDirection || outMotion)
        {
            AddReward(-1f);
            if(outSpeed)
            {
                // Debug.Log("outSpeed");
                // Debug.Log("outVelocity: " + outVelocity);
                // Debug.Log("outAngularVelocity: " + outAngularVelocity);
                judge.outSpeed++;
            }
            if(outDirection)
            {
                // Debug.Log("outDirection");
                // Debug.Log("lookAngle: " + lookAngle);
                // Debug.Log("upAngle: " + upAngle);
                // Debug.Log("spineLookAngle: " + spineLookAngle);
                // Debug.Log("rootLookAngle: " + rootLookAngle);
                judge.outDirection++;
            }
            if(outMotion)
            {
                // Debug.Log("outMotion");
                // Debug.Log("spineUpAngle: " + spineUpAngle);
                // Debug.Log("rootUpAngle: " + rootUpAngle);
                judge.outMotion++;
            }
            judge.Reset();
            return;
        }
    }

    //===Train Stand===
    lastReward = (1f-velocityReward) * 0.015f + (1f-angularReward) * 0.015f
        + (lookAngle + upAngle + spineLookAngle + spineUpAngle + rootLookAngle + rootUpAngle) * 0.008f
        + standReward * 0.01f
        + (1f - exertionRatio) * 0.002f;

    if(lookAngle > 0.9f && upAngle > 0.9f && spineLookAngle > 0.9f && rootLookAngle > 0.9f && velocityReward < 0.3f && angularReward < 0.5f && standReward > 0.9f)
    {
        //===Train Stand===
        // Debug.Log("Stand");
        lastReward += 0.01f;   // bonus for a clean stand
    }

    //===Train Stand===
    if(useClampReward)
    {
        lastReward = lastReward + clampReward;
        if(lastReward < 0f) lastReward = 0f;
    }
    totalReward += lastReward;
    AddReward(lastReward);
}
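
GetVelocityReward(10f) and GetAngularVelocityReward(15f) are not shown in the snippet. A plausible reconstruction, offered purely as an assumption, normalizes the root rigidbody's linear and angular speed into [0, 1] with Mathf.InverseLerp; that would be consistent with the checks against limits like 0.3f and with the shaping terms (1f - velocityReward) that reward stillness:

using UnityEngine;

// Assumed implementation of the two motion helpers (not the author's code).
// Both map a speed into [0, 1]: 0 = motionless, 1 = at or beyond the maximum.
public class MotionRewardSketch : MonoBehaviour
{
    public Rigidbody koboldRootRb;   // root rigidbody, as referenced above

    public float GetVelocityReward(float maxSpeed)
    {
        return Mathf.InverseLerp(0f, maxSpeed, koboldRootRb.velocity.magnitude);
    }

    public float GetAngularVelocityReward(float maxAngularSpeed)
    {
        return Mathf.InverseLerp(0f, maxAngularSpeed, koboldRootRb.angularVelocity.magnitude);
    }
}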

Roughly speaking:
1. Encourage facing the target and leaning forward, via ForceSharping
2. Encourage suppressing linear and angular velocity, via ForceSharping (distilled in the sketch after this list)
3. Encourage keeping both feet on the ground
4. Encourage suppressing exertion
5. Use ClampReward
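
ForceSharping, as used in items 1 and 2, is the pattern visible in the termination checks above: after a landing grace window, the pass/fail thresholds are annealed toward their strict values with Mathf.Lerp, and any violation ends the episode at -1 reward, so the policy is forced to sharpen over time. A distilled sketch (structure and constants taken from the snippet above; names simplified):

using UnityEngine;

// Distilled ForceSharping pattern: thresholds tighten with time since
// landing, and a violation terminates the episode with -1 reward.
public static class ForceSharpingSketch
{
    public static bool ShouldTerminate(
        float timeSinceLanding, float bufferTime,
        float velocityReward, float angularReward, float lookAngle)
    {
        if (timeSinceLanding <= bufferTime) return false;   // grace window

        float t = timeSinceLanding - bufferTime;
        // Velocity limit tightens from 1 (anything passes) to 0.3 over 3s.
        bool outVelocity = velocityReward > Mathf.Lerp(1f, 0.3f, t / 3f);
        // Angular velocity limit tightens from 1 to 0.5 over 5s.
        bool outAngular = angularReward > Mathf.Lerp(1f, 0.5f, t / 5f);
        // Required aim quality rises from 0 (ignored) to 0.85 over 3s.
        bool outDirection = lookAngle < Mathf.Lerp(0f, 0.85f, t / 3f);
        return outVelocity || outAngular || outDirection;
    }
}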

Experiment time:
Step: 5e7
Time Elapsed: 61833s (17.17hr)

實驗結(jié)果:
實驗結(jié)果為成功

狗頭人哨兵可以快速調(diào)整面向並進(jìn)入靜立狀態(tài)
但是有兩點不穩(wěn)定要素
1.目前看來有機(jī)會不是每次都很有效率
2.和道爾靜立3相同,靜立狀態(tài)未必能持續(xù)到Survived條件 (在包含F(xiàn)orceSharping條件下靜立10秒)

On the efficiency side, one possible cause is that size, a new variable, further spreads out the training steps, leaving the training insufficiently mature.
On the Survived side, several ForceSharping thresholds have already been relaxed, yet they are still exceeded at times;
this too may come down to insufficient training steps.

Overall, however, the performance is quite good and the standing-still state is very steady.
Even in the occasional exception, balance is recovered within a few motions.

One point worth attention, though, is LandingBufferTime.
The Curriculum currently converges it to 1s, but when GetUp hands off to Stand the agent can in fact still be in a very unstable state, such as hanging in the air, so a 1s observation window does seem slightly too strict.
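
If LandingBufferTime should stay looser than what the Curriculum converges to, one option is to clamp the curriculum value from below. The sketch uses the standard ML-Agents environment-parameter API; the key name "landing_buffer_time" and the 2s floor are illustrative assumptions, not values from this experiment:

using Unity.MLAgents;
using UnityEngine;

// Sketch: read landingBufferTime from the curriculum but keep a floor, so a
// GetUp -> Stand hand-off that starts in mid-air still has time to settle
// before ForceSharping kicks in. Key name and floor are assumptions.
public class LandingBufferSketch : Agent
{
    float landingBufferTime;

    public override void OnEpisodeBegin()
    {
        float fromCurriculum = Academy.Instance.EnvironmentParameters
            .GetWithDefault("landing_buffer_time", 3f);
        landingBufferTime = Mathf.Max(fromCurriculum, 2f);
    }
}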

The next experiment will therefore move on to the kobold chase experiment:
1. Run toward the target
2. Force Sharping will be used to guide the motion

創(chuàng)作回應(yīng)

更多創(chuàng)作